Thursday, January 11, 2007

Flexible Taxonomies

In "Web Analytics And Content Group Management," Gary Angel writes about the use of taxonomies in web analytics, stating that "no single taxonomy is likely to support a very wide range of analytic problems. [...] there are only taxonomies appropriate for more or fewer analytic problems."

He mentions the common yet limiting use of navigational taxonomy and points out that the analysis process/application should "be able to construct multiple 'point' taxonomies that can be used for specific analytic purposes [...] The combination of a graphical drag-and-drop interface, ability to apply regex rules and the ability to create analysis specific taxonomies on the fly [...] would make it relatively easy to 'manufacture' a taxonomy for analysis..."
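To make the idea of regex-driven "point" taxonomies concrete, here is a minimal Python sketch. It is not Angel's implementation or any particular product's feature; the rule sets, category names and URL paths are all invented for illustration. Each taxonomy is simply an ordered list of regex rules, so the same pages can be regrouped for a different analysis by swapping in a different list.

```python
import re

# Hypothetical rule sets: each taxonomy is an ordered list of (regex, category).
# The first taxonomy mirrors site navigation; the second groups by page content.
NAVIGATION_TAXONOMY = [
    (r"^/products/", "Products"),
    (r"^/support/", "Support"),
    (r"^/about/", "Company"),
]

CONTENT_TAXONOMY = [
    (r"(pricing|buy|checkout)", "Purchase intent"),
    (r"(faq|manual|troubleshoot)", "Self-service help"),
    (r"(spec|feature|compare)", "Product research"),
]


def classify(url_path, taxonomy, default="Unclassified"):
    """Return the category of the first rule whose pattern matches the path."""
    for pattern, category in taxonomy:
        if re.search(pattern, url_path):
            return category
    return default


def group_pages(url_paths, taxonomy):
    """Group paths by category under the given taxonomy."""
    groups = {}
    for path in url_paths:
        groups.setdefault(classify(path, taxonomy), []).append(path)
    return groups


# Made-up example pages.
PAGES = [
    "/products/phone/specs",
    "/products/phone/pricing",
    "/support/phone/faq",
    "/about/careers",
]

if __name__ == "__main__":
    # The same pages, viewed through two different taxonomies.
    print(group_pages(PAGES, NAVIGATION_TAXONOMY))
    print(group_pages(PAGES, CONTENT_TAXONOMY))
```

Because a taxonomy here is just data, an analyst could create or discard one per question without touching the page-tagging itself, which is roughly the flexibility Angel is asking for.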

Angel is proposing a method of flexible taxonomy creation, which would be useful not only in web analytics but also in the analysis of any type of document collection, whether business, scientific or humanities.

So how is a flexible taxonomy implemented? Angel mentions that a taxonomy often needs to be based on the content of a page instead of its function. The semantic content of a page can be estimated with techniques from computational linguistics, including natural language processing and measures such as semantic relatedness and the semantic differential.

Theoretically, a document (or even a paragraph) could be analyzed to determine its meaning. The resulting semantic data could then be used to build a taxonomy, either automatically, without user intervention, or, as Angel proposes, dynamically, allowing analysts to determine what characteristics define their custom taxonomic units.

For example, in building a taxonomy that describes communications technology, an analyst would undoubtedly include Apple's new iPhone. Depending on its commercial success, the present and future technology included in the iPhone could heavily influence the rest of the communications industry. Semantic analysis of documents describing the iPhone would reveal that its multi-touch interface is, in part, the result of research conducted by Jefferson Han at NYU. New taxonomic units could then be created that classify the various influences on a particular technology, such as market, military or academic.
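As a rough illustration of analyst-defined taxonomic units, the Python sketch below assigns a document to every unit whose defining terms it mentions. Plain keyword overlap stands in here for real semantic analysis, which would be far more involved; the unit names follow the influences listed above, while the term lists and the sample sentence are invented for this example.

```python
import re

# Hypothetical taxonomic units: each is defined by terms the analyst selects.
UNITS = {
    "academic": {"research", "university", "laboratory", "paper"},
    "market": {"sales", "price", "consumers", "competitors"},
    "military": {"defense", "army", "radar"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def assign_units(text, units, min_overlap=1):
    """Return, in sorted order, the units sharing at least min_overlap terms with the text."""
    words = tokenize(text)
    scores = {name: len(words & terms) for name, terms in units.items()}
    return sorted(name for name, score in scores.items() if score >= min_overlap)


# Made-up sample sentence.
doc = (
    "The multi-touch interface grew out of university research, "
    "and its price may pressure competitors."
)

if __name__ == "__main__":
    print(assign_units(doc, UNITS))
```

Since the units are an ordinary dictionary, an analyst could add, split or redefine one on the fly and re-run the assignment, without any change to the documents themselves.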

I greatly appreciate Angel's thoughtful remarks about flexible taxonomies. The fields of web analytics and document management obviously have much in common.

