Category Archives: folksonomy

This should probably go into a funstuff section somewhere, but I used a few moments of free time today to upload a script I wrote some time ago to GitHub. It’s a very simple piece of code that grabs images tagged with a specified word and, by looking at which tags appear together, creates a co-tag graph file in .gdf format. You can get it from here or run it here. To test how it scales (and to finally know what teens, apparently Tumblr’s main audience, dream of), I tried it with 500 sets of 20 images for the tag “dream”. This yields some 7K distinct tags, and after some filtering, this is what comes out:
Node size represents occurrence count, and color (blue => yellow => red) represents betweenness centrality. Apparently, love is still a thing out there. Nice.
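The script itself is not reproduced here, so what follows is only a minimal Python sketch of the same idea. It assumes each image contributes one list of tags and leaves out fetching the images from Tumblr. It counts how often each tag occurs and how often each pair of tags appears on the same image. It then computes unweighted betweenness centrality with Brandes' algorithm and writes a .gdf file that Gephi or GUESS can open. Several details are hypothetical choices, not taken from the original code: the function names, the `min_count` threshold standing in for "some filtering", and the `count`/`betweenness` column names.

```python
from collections import Counter, deque
from itertools import combinations


def normalize(tag):
    # Lowercase, collapse whitespace, and drop commas (they would break the comma-separated .gdf rows).
    return " ".join(tag.replace(",", " ").split()).lower()


def cotag_graph(posts, min_count=1):
    """Count tag occurrences and pairwise co-occurrences; posts is one tag list per image."""
    nodes, edges = Counter(), Counter()
    for tags in posts:
        unique = sorted({normalize(t) for t in tags} - {""})
        nodes.update(unique)
        for a, b in combinations(unique, 2):  # every tag pair on the same image
            edges[(a, b)] += 1
    # Hypothetical filter: drop rare tags and every edge touching them.
    keep = {t for t, c in nodes.items() if c >= min_count}
    nodes = Counter({t: c for t, c in nodes.items() if t in keep})
    edges = Counter({e: w for e, w in edges.items() if e[0] in keep and e[1] in keep})
    return nodes, edges


def betweenness(nodes, edges):
    """Unweighted betweenness centrality (Brandes' algorithm) for an undirected graph."""
    adj = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma) and predecessors.
        stack, preds = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected graph: each pair of endpoints was counted twice.
    return {n: v / 2 for n, v in bc.items()}


def to_gdf(nodes, edges, bc):
    """Serialize the co-tag graph as a GUESS/Gephi .gdf string."""
    lines = ["nodedef>name VARCHAR,count INTEGER,betweenness DOUBLE"]
    for tag, count in sorted(nodes.items()):
        lines.append(f"{tag},{count},{bc[tag]:.4f}")
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,weight INTEGER")
    for (a, b), w in sorted(edges.items()):
        lines.append(f"{a},{b},{w}")
    return "\n".join(lines) + "\n"
```

With 500 sets of 20 images, raising `min_count` (say to 5, an arbitrary illustrative value) prunes the long tail of one-off tags before the O(V·E) betweenness step, which is where most of the running time goes on a graph of some 7K nodes.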

This may become an actual tool further down the road, but maybe it’s already useful to somebody as is.

EDIT: Try it out here:

After finishing my paper for the forthcoming deep search book, I’ve gone back to programming a little and added a feature to termCloud Search, which is now at v0.4. The new “show relations” button highlights the eight terms that co-occur most frequently with a selected keyword. This is probably not the final form of the feature, but if you crank up the number of terms (with the “term+” button) and look at the relations between some of the less common words, quite interesting patterns already rise to the surface. My next Yahoo BOSS project, termZones, will try to use co-occurrence matrices built from many more results to map discourse clusters (sets of words that very often appear together), but this will take a little more time because I’ll have to read up on the algorithms to get it done…
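How termCloud Search actually computes its relations isn't described here, so this is a hypothetical Python sketch of a document-level version. For a selected keyword, it counts how many result snippets each other term shares with that keyword and returns the eight most frequent. The crude tokenizer (lowercase words of three or more letters, no stop-word list) and the function names are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    # Crude assumption: distinct lowercase words of 3+ letters; a real tool would also drop stop words.
    return set(re.findall(r"[a-z]{3,}", text.lower()))


def top_cooccurring(snippets, keyword, k=8):
    """Return up to k terms that appear in the most snippets together with keyword."""
    keyword = keyword.lower()
    counts = Counter()
    for snippet in snippets:
        terms = tokenize(snippet)
        if keyword in terms:
            terms.discard(keyword)
            counts.update(terms)  # each snippet counts at most once per term
    # Highest frequency first; ties broken alphabetically so the output is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Counting every term pair across all snippets, instead of only the pairs involving one keyword, would give a full co-occurrence matrix of the kind mentioned above for termZones.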

PS: termCloud Search was recently a “mashup of the day” at

You’ve probably already read it somewhere (like here or here): amazon has blundered a little bit. For a couple of hours, the search query “terrorist costume” brought up a single hit, a rubber mask with Obama’s face. I really don’t know how many people would have found out on their own, but there’s some buzz going now, and there actually is something worth pondering about the case. How it happened is easy to reconstruct: amazon lets users label products with tags (a folksonomy) and includes these tags in its general search engine. So somebody tagged the Obama mask with “terrorist” (“costume” was already a common keyword), and there you go.

What I find interesting about this is not that the matter would have any real political consequences, but that folk-tagging can be dragged in different directions as easily as anything else. I’m currently working on a talk for the Deep Search conference (running late, as so often these days), and I’ve been looking at Jimmy Wales’ project Wikia Search, which uses community feedback to re-rank results. The question for me is how such a system would be less vulnerable to manipulation or SEO than today’s dominant principle, link analysis. The amazon case shows quite well that when you enter a contested field, there is going to be fallout, and the reason there isn’t more of it already is probably that the masses are not yet aware of the potential for mischief. And I don’t see why the “wisdom of the crowd” principle (whether that is folksonomy, voting, result re-ranking, etc.) could not be hijacked by a determined individual or company that understands the algorithms that structure results (in the amazon case, you would have needed to know that user tags are used in the general search).
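amazon's actual search and ranking logic is not public. As a toy model, assuming a plain AND match over seller keywords plus user tags, this Python sketch shows why a single added tag was enough: “costume” already matched, and one user tag supplied the missing word. The `Product` class and the `search` function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    keywords: set[str]                                  # seller-supplied keywords (hypothetical model)
    user_tags: set[str] = field(default_factory=set)   # folksonomy tags added by customers


def search(catalog, query, include_user_tags=True):
    """Return names of products whose terms contain every query word (AND semantics)."""
    words = query.lower().split()
    hits = []
    for product in catalog:
        terms = set(product.keywords)
        if include_user_tags:
            terms |= product.user_tags  # user tags count exactly like seller keywords
        if all(word in terms for word in words):
            hits.append(product.name)
    return hits


catalog = [
    Product("rubber mask", {"mask", "costume", "halloween"}),
    Product("pirate outfit", {"pirate", "costume"}),
]
print(search(catalog, "terrorist costume"))  # no product has "terrorist" yet
catalog[0].user_tags.add("terrorist")       # one user adds one tag
print(search(catalog, "terrorist costume"))  # now the mask is the single hit
```

Setting `include_user_tags=False` makes the same query return nothing again, which is the whole difference between tags kept in their own space and tags fed into the general search.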
So what is really interesting about the Obama mask incident is how things will continue at amazon (and other folksonomy-based services): if user tags can be used to drive traffic to specific products, marketers will come in droves the moment the numbers are relevant…