Tuesday, October 20, 2009

Tag, path and weighted tag

Tags have been slowly replacing the paths/folders we used to use to organize articles, emails, etc.

In mathematical terms, tags can be understood through set theory. An information item can be considered an element, while each tag establishes a subset. You can, therefore, find information items that belong to the intersection of the 'tea' subset and the 'green' subset and, hopefully, some of them are about 'green tea'.
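To make the set view concrete, here is a minimal Python sketch. The item names and their tags are made up for illustration. Each tag defines the subset of items carrying it, and a multi-tag search is just the intersection of those subsets.

```python
# Hypothetical items, each tagged with a set of words.
items = {
    "article-1": {"green", "tea", "health"},
    "article-2": {"tea", "ceremony", "japan"},
    "article-3": {"green", "energy"},
}

def subset(tag):
    """The subset of items that carry the given tag."""
    return {item for item, tags in items.items() if tag in tags}

# Items in the intersection of the 'tea' subset and the 'green' subset.
result = subset("tea") & subset("green")
print(sorted(result))  # ['article-1']
```

Note that the intersection only guarantees both tags are present; whether an item is really about 'green tea' is not something plain sets can express.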

Paths/folders are more like ordered tuples: they can be thought of as giving more weight to the leading (parent) folders. For example, the path Education/Computer could describe an item aimed at the educational community that contains information about computer usage. On the other hand, Computer/Education could describe an item that is mainly for IT audiences but is related to education.
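A quick Python sketch shows the difference between treating the same keywords as tags (a set) and as a path (an ordered tuple). The two paths below are just the examples from the paragraph above.

```python
# The two example paths, split into ordered tuples of folder names.
path_a = tuple("Education/Computer".split("/"))  # educational audience, computer content
path_b = tuple("Computer/Education".split("/"))  # IT audience, education-related content

# As tags (sets), the order is lost and the two are indistinguishable.
print(set(path_a) == set(path_b))  # True

# As ordered tuples they differ: the leading folder tells us the main focus.
print(path_a == path_b)  # False
print(path_a[0], path_b[0])  # Education Computer
```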

In terms of classifying an information item, the path/folder approach can give more information about an item than tags do, assuming the same keywords are used. On the flip side, of course, classifying with paths is more involved. The other problem with paths is that, even though they provide weights, the weights are crude: effectively 1 for the parent folder and 0 for the child folder.

The other issue to consider is the use of phrases, in addition to words, for tagging. Sometimes tagging with individual words is not the same as tagging with a phrase. This applies to paths too!
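Here is a small Python illustration of why word tags and phrase tags can disagree. The two items are hypothetical: one is tagged with the phrase 'green tea', the other with the separate words 'green' (as in eco-friendly) and 'tea'.

```python
# Hypothetical items: one phrase-tagged, one word-tagged.
items = {
    "brewing-guide": {"green tea"},          # phrase tag
    "eco-tea-packaging": {"green", "tea"},   # word tags; 'green' means eco-friendly here
}

def tagged_with_all(*tags):
    """Items whose tag set contains every one of the given tags."""
    wanted = set(tags)
    return sorted(item for item, item_tags in items.items() if wanted <= item_tags)

# Searching by the words matches the packaging item, which is not about green tea.
print(tagged_with_all("green", "tea"))  # ['eco-tea-packaging']
# Searching by the phrase matches only the item actually about green tea.
print(tagged_with_all("green tea"))     # ['brewing-guide']
```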

A possible alternative to tags and paths is the weighted tag/phrase: in addition to plain tags/phrases, we can give them weights. The search mechanism can then use both the tags and their weights to provide better results. In addition, the weight of a phrase can be distributed to the words in the phrase when calculating the search weights.
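Below is a minimal Python sketch of how weighted tags might be searched. It is only one possible design, and several choices are my own assumptions rather than anything fixed above: a phrase keeps its full weight and also spreads it evenly over its words, the query is expanded the same way, and an item's score is the dot product of the two expanded weight maps. The items and weights are made up.

```python
from collections import defaultdict

def expand(weighted_tags):
    """Turn {tag_or_phrase: weight} into a {term: weight} map.

    Assumption: a phrase keeps its own weight and also distributes that
    weight evenly across its individual words."""
    terms = defaultdict(float)
    for tag, weight in weighted_tags.items():
        terms[tag] += weight
        words = tag.split()
        if len(words) > 1:
            for word in words:
                terms[word] += weight / len(words)
    return terms

def score(item_tags, query_tags):
    """Dot product of the expanded item weights and expanded query weights."""
    item_terms = expand(item_tags)
    query_terms = expand(query_tags)
    return sum(w * item_terms.get(term, 0.0) for term, w in query_terms.items())

# Hypothetical weighted tags for three items.
items = {
    "brewing-guide": {"green tea": 0.8, "health": 0.2},
    "tea-history": {"tea": 0.9, "china": 0.1},
    "eco-packaging": {"green": 0.7, "tea": 0.3},
}

query = {"green tea": 1.0}
ranked = sorted(items, key=lambda name: score(items[name], query), reverse=True)
print(ranked)  # ['brewing-guide', 'eco-packaging', 'tea-history']
```

Because the phrase also feeds weight to its words, a search for 'green tea' still gives partial credit to items tagged only with 'green' or 'tea', while an exact phrase match ranks highest.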

