…but you can go ahead and waste everybody else – according to Google’s suggest function at least:

This is interesting because it is very obvious that Google erases certain queries from its suggest function (porn, etc.), and the idea that the Internet would “make” suggestible teenagers kill themselves is a recurring, media-fed scare and, as a consequence, one of the few domains where censorship is nearly consensual. What I find interesting, though, is that all these other carnage scenarios do not get the DELETE FROM treatment, although one may argue that killing oneself is no more condemnable than killing somebody else.

But independently of this philosophical question (the only one worth pondering according to Camus, remember?), Google suggest is yet another way to query the closest thing to god there is: Google’s database; and the way certain queries are removed, most certainly by hand, is telling in itself (BTW, “je veux me” – the beginning of “I want to [kill] myself” – on google.fr DOES suggest that you may want to end your life)…
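Querying that database is trivial, by the way: the suggest endpoint that browsers use can be called directly. Here is a minimal sketch in Python – note that suggestqueries.google.com is an unofficial, undocumented endpoint that has historically returned JSON completions but could change or disappear at any time:

```python
# Minimal sketch: query Google's suggest endpoint directly.
# Unofficial and undocumented -- use at your own risk.
import json
import urllib.parse
import urllib.request

def suggestions(query, lang="en"):
    """Return Google's completion list for a query prefix."""
    params = urllib.parse.urlencode({"client": "firefox", "hl": lang, "q": query})
    url = "https://suggestqueries.google.com/complete/search?" + params
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        # Response format: [query, [suggestion, suggestion, ...]]
        return json.loads(response.read().decode(charset))[1]

print(suggestions("i want to"))               # which completions survive the filter?
print(suggestions("je veux me", lang="fr"))   # compare across languages
```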

Ars Technica is one of the reasons why I believe that there is a future for quality journalism online. Not only because they produce great copy but also because it is one of the few places on the Internet where I don’t want to start maiming myself when I accidentally stumble over the article comments. Ars talks about technology, sure, but there is more and more content on science and really great, well-researched pieces on wedge topics (“wedgy” mostly in the US, but spreading) like climate change and evolution. In this article on the basic conceptual differences between studying weather and climate, I stumbled over a comment that I would like to (and probably will) frame and hang on my wall. User Andrei Juan writes:

Regarding the author’s remarks made in the first few paragraphs of the article about comments and commenters, it seems to me that the number of people who post comments to online articles is (perhaps to a lesser extent here on ArsTechnica) usually much larger than the number of people whose education — formal or not — allows them to understand the article well, let alone make meaningful comments.

This is, I think, but one manifestation of many people’s tendency to express themselves in many more situations than when they have something to express. Turned into habit, this leads to confusions like the one discussed by the article, which are IMO a natural outcome of situations in which people who barely passed their high school math and physics tests develop their own opinions (or parrot those of their peers) about topics like dynamic systems. Moreover, put this together with the openness of an online “debate” — which lures people into feeling welcome to discussions where they’re utterly out of their depth yet don’t realize it — and another interesting specimen appears: the person who’s opinionated without really having an opinion.

On soccer fields, we hear these people blowing in vuvuzelas; in the comment sections of online articles though, that option is unavailable, so they’re only left with (ab)using the “Leave a comment” option. Could we, perhaps, eliminate most meaningless comments by adding a button labeled “Blow a vuvuzela” next to the one that says “Leave a comment”?…

In that sense, the highly disturbing “like” and “retweet” buttons one can find on so many sites now may actually have the benefit of preventing some people from posting a comment. Not as sophisticated as Slashdot‘s karma-based moderation system, but potentially effective…

Gabriel Tarde is a wellspring of interesting – and sometimes positively weird – ideas. In his 1899 article L’opinion et la conversation (reprinted in his 1901 book L’opinion et la foule), the French judge/sociologist makes the following comment:

Il n’y [dans un Etat féodal, BR] avait pas “l’opinion”, mais des milliers d’opinions séparées, sans nul lien continuel entre elles. Ce lien, le livre d’abord, le journal ensuite et avec bien plus d’efficacité, l’ont seuls fourni. La presse périodique a permis de former un agrégat secondaire et très supérieur dont les unités s’associent étroitement sans s’être jamais vues ni connues. De là, des différences importantes, et, entre autre, celles-ci : dans les groupes primaires [des groupes locales basés sur la conversation, BR], les voix ponderantur plutôt que numerantur, tandis que, dans le groupe secondaire et beaucoup plus vaste, où l’on se tient sans se voir, à l’aveugle, les voix ne peuvent être que comptées et non pesées. La presse, à son insu, a donc travaillé à créer la puissance du nombre et à amoindrir celle du caractère, sinon de l’intelligence.

After a quick survey, I haven’t found an English translation anywhere – there might be one in here – so here’s my own (taking some liberties to make it easier to read):

[In a feudal state, BR] there was no “opinion” but thousands of separate opinions, without any steady connection between them. This connection was only delivered by first the book, then, and with greater efficiency, the newspaper. The periodical press allowed for the formation of a secondary and higher-order aggregate whose units associate closely without ever having seen or known each other. Several important differences follow from this, amongst others, this one: in primary groups [local groups based on conversation, BR], voices ponderantur rather than numerantur, while in the secondary and much larger group, where people connect without seeing each other – blind – voices can only be counted and cannot be weighed. The press has thus unknowingly labored towards giving rise to the power of the number and reducing the power of character, if not of intelligence.

Two things are interesting here: first, Lazarsfeld, Berelson, and Gaudet’s classic study from 1944, The People’s Choice, and even more so Lazarsfeld’s canonical Personal Influence (with Elihu Katz, 1955) are seen as a rehabilitation of the significance (for the formation of opinion) of interpersonal communication at a time when media were considered all-powerful brainwashing machines by theorists such as Adorno and Horkheimer (Adorno actually worked with/for Lazarsfeld in the 1930s, where Lazarsfeld tried to force poor Adorno into “measuring culture”, which may have soured the latter on any kind of empirical inquiry, but that’s a story for another time). Tarde’s work on conversation (the first-order medium) is theoretically quite sophisticated – floating against the backdrop of Tarde’s theory of imitation as the basic mechanism of cultural production – and actually succeeds in thinking together everyday conversation and mass media without creating any kind of onerous dichotomy. L’opinion et la conversation would merit inclusion in any history of communication science, and it should come as no surprise that Elihu Katz actually published a paper on Tarde in 1999.

Second, the difference between ponderantur (weighing) and numerantur (counting) is at the same time rather self-evident – an object’s weight and its number are logically quite different things – and somewhat puzzling: it reminds us that while measurement does indeed create a universe of number where every variable can be compared to any other, the aspects of reality we choose to measure remain connected to a conceptual backdrop that is by itself neither numerical nor mathematical. What Tarde calls “character” is a person’s capacity to influence, to entice imitation, not the size of her social network.

I’m currently working on a software tool that helps with studying Twitter, and while sifting through the literature I came across this quote from a 2010 paper by Cha et al.:

We describe how we collected the Twitter data and present the characteristics of the top users based on three influence measures: indegree, retweets, and mentions.

Besides the immense problem of defining influence in nontrivial terms, I wonder whether many of the studies on (social) networks that pop up all over the place hope to weigh but end up counting again. What would it mean, then, to weigh a person’s influence? What kind of concepts would we have to develop, and what could serve as indicators? In our project we use the bit.ly API to look at clickstream referrers – if several people post the same link, who succeeds in getting the most people to click it? – but this may be yet another count that says little or nothing about how a link will be used/read/received by a person. Perhaps this is as far as the “hard” data can take us. But is that really a problem? The one thing I love about Tarde is how he can jump from a quantitative worldview to beautiful theoretical speculation and back with a smile on his face…
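To make the “counting” concrete: the three measures from the Cha et al. paper reduce to simple tallies over a tweet collection. A minimal sketch – the data layout is made up for illustration and is not the Twitter API’s actual format:

```python
# Sketch of the three count-based influence measures in Cha et al. (2010):
# indegree (followers), retweets received, mentions received.
from collections import Counter

tweets = [
    {"author": "alice", "retweet_of": None,    "mentions": ["bob"]},
    {"author": "bob",   "retweet_of": "alice", "mentions": []},
    {"author": "carol", "retweet_of": "alice", "mentions": ["alice", "bob"]},
]
followers = {"alice": {"bob", "carol"}, "bob": {"alice"}, "carol": set()}

indegree = Counter({user: len(f) for user, f in followers.items()})
retweets = Counter(t["retweet_of"] for t in tweets if t["retweet_of"])
mentions = Counter(m for t in tweets for m in t["mentions"])

for user in followers:
    print(user, indegree[user], retweets[user], mentions[user])
```

All three are numerantur through and through: nothing in these tallies distinguishes a click born of genuine imitation from an idle one.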

Over the last year, I have been reading loads of books in and on Information Science, paying special attention to key texts in the (pre)history of the discipline. Fritz Machlup and Una Mansfield’s monumental anthology The Study of Information (Wiley & Sons, 1983) has been a pleasure to read, and there are several passages in the foreword that merit a little commentary. I have always wondered why Shannon’s Mathematical Theory of Communication from 1948 has been such a reference point in the discipline I started out in, communication science. Dealing with purely technological problems and packed with formulas that very, very few social science scholars could make sense of, the whole thing seems like a misunderstanding. The simplicity and clarity of the schema on page two – which has been built into the canonical sender-receiver model – cannot be the only reason for the exceptional (mostly second- or third-hand) reception the text has enjoyed. In Machlup and Mansfield’s foreword one can find some strong words on the question of why a work on engineering problems that excludes even the slightest reference to matters of human understanding came to be cited in probably every single introduction to communication science:

“When scholars were chiefly interested in cognitive information, why did they accept a supposedly scientific definition of ‘information apart from meaning’? One possible explanation is the fact that they were impressed by a definition that provided for measurement. To be sure, measurement was needed for the engineering purposes at hand; but how could anybody believe that Shannon’s formula would also measure information in the sense of what one person tells another by word of mouth, in writing, or in print?
We suspect that the failure to find, and perhaps impossibility of finding, any ways of measuring information in this ordinary sense has induced many to accept measurable signal transmission, channel capacity, or selection rate, misnamed amount of information, as a substitute or proxy for information. The impressive slogan, coined by Lord Kelvin, that ‘science is measurement’ has persuaded many researchers who were anxious to qualify as scientists to start measuring things that cannot be measured. As if under a compulsion, they looked for an operational definition of some aspect of communication or information that stipulated quantifiable operations. Shannon’s formula did exactly that; here was something related to information that was objectively measurable. Many users of the definition were smart enough to realize that the proposed measure – perfectly suited for electrical engineering and telecommunication – did not really fit their purposes; but the compulsion to measure was stronger than their courage to admit that they were not operating sensibly.” (p. 52)
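For reference, the “Shannon formula” the authors are talking about is the entropy of a discrete source:

H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i

It measures, in bits, the average improbability of symbols drawn with probabilities p_i – and it depends on nothing but those probabilities, which is precisely what “information apart from meaning” amounts to.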

For Machlup & Mansfield – who, as trained (neoclassical) economists, should not be deemed closet postmodernists – this compulsion to measure is connected to implicit hierarchies in academia where mathematical rationality reigns supreme. A couple of pages further on, the authors’ judgment becomes particularly harsh:

“This extension of information theory, as developed for communication engineering, to other quite different fields has been a methodological disaster – though the overenthusiastic extenders did not see it, and some of them, who now know that it was an aberration, still believe that they have learned a great deal from it. In actual fact, the theory of signal transmission or activating impulses has little or nothing to teach that could be extended or applied to human communication, social behavior, or psychology, theoretical or experimental.” (p. 56)

Shannon himself avoided the term “information theory” and his conception of communication obviously had nothing to do with what the term has come to mean in the social sciences and general discourse. But the need to show that the social sciences could be “real” sciences in search of laws formulated in mathematical terms proved stronger than the somewhat obvious epistemological mismatch.

Like many classic texts, Machlup & Mansfield’s work offers a critique that is not based on dismissal or handbag relativism but on deep engagement with the complexities of the subject matter and long experience with interdisciplinary work, which necessarily makes one bump into unfamiliar concepts, methods, ontological preconceptions, modes of reasoning, vectors of explanation, and epistemological urges (what is your knowledge itch? how do you want to scratch it?). The Study of Information is a pleasure to read because it brings together very different fields without proposing some kind of unifying meta-concept or imperialist definition of what science – the quest for knowledge – should look like.

…is so much easier if you’ve got a couple of popular pages to advertise on…

[screenshot: chrome_suggest_march_2010]

…and another one…

[screenshot: chrome_suggest_april_2010]

…browser wars all over again…

When it comes to search interfaces, there are a lot of good ideas out there, but there is also a lot of potential for further experimentation. Search APIs are a great field for experimentation as they allow developers to play around with advanced functionality without forcing them to work on a heavy backend structure.

Together with Alex Beaugrand, a student of mine, I built (a couple of months ago) another little search mashup / interface that allows users to switch between a tag cloud view and a list / cluster mode. contextDigger uses the Delicious and Bing APIs to widen the search space with associated searches / terms and then Yahoo BOSS to download a thousand results that can be filtered through the interface. It uses the principle of faceted navigation to shorten the list: if you click on two terms, only the results associated with both of them will appear…
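The faceted filtering itself is almost trivially simple. A minimal sketch of the principle – the data layout is hypothetical, not the actual contextDigger code:

```python
# Faceted navigation sketch: each result carries a set of extracted terms;
# selecting several terms keeps only the results that carry all of them.

results = [
    {"url": "http://example.org/a", "terms": {"search", "interface", "api"}},
    {"url": "http://example.org/b", "terms": {"search", "cluster"}},
    {"url": "http://example.org/c", "terms": {"interface", "api", "cluster"}},
]

def facet_filter(results, selected):
    """Keep only results whose term set contains every selected term."""
    return [r for r in results if selected <= r["terms"]]

print(facet_filter(results, {"search"}))           # two results
print(facet_filter(results, {"search", "api"}))    # intersection: one result
```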

Since I started to play around with the latest (and really great, easy to use) version of the gephi graph visualization and analysis platform, I have developed an obsession with building .gdf output (.gdf is a graph description format that you can open with gephi) into everything I come across. The latest addition is a Facebook application called netvizz that creates a .gdf file describing either your personal network or the groups you are a member of.
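For those who want to roll their own: .gdf is a very simple plain-text format, essentially two CSV tables with typed headers. A minimal sketch of writing a small graph by hand (made-up data, not netvizz’s actual code):

```python
# Write a tiny friendship graph as .gdf, the format gephi opens directly.
nodes = {"u1": "Alice", "u2": "Bob", "u3": "Carol"}
edges = [("u1", "u2"), ("u1", "u3")]

with open("network.gdf", "w", encoding="utf-8") as f:
    # Node table: id and label columns.
    f.write("nodedef>name VARCHAR,label VARCHAR\n")
    for node_id, label in nodes.items():
        f.write("%s,%s\n" % (node_id, label))
    # Edge table: pairs of node ids.
    f.write("edgedef>node1 VARCHAR,node2 VARCHAR\n")
    for source, target in edges:
        f.write("%s,%s\n" % (source, target))
```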

There are of course many applications that let you visualize your network directly in Facebook, but by being able to download a file, you can choose your own visualization tool, play around with it, select and parameterize layout algorithms, change colors and sizes, rearrange by hand, and so forth. Toolkits like gephi are just so much more powerful than Flash toys…

[screenshot: my puny facebook network – gephi can process much larger graphs]

What’s rather striking about these Facebook networks is how much the shape is connected to physical and social mobility. If you look at my network, you can easily see the Klagenfurt (my hometown) cluster to the very right, my studies in Vienna in the middle, and my French universe on the left. The small grape on the top left documents two semesters of teaching at the American University of Paris…

Update: v0.2 of netvizz is out, allowing you to add some data for each profile. Next up is GraphML and Mondrian file support, more data for profiles, etc…

Update 2: netvizz currently only works with http and not https. I will try to move the app to a different server ASAP.

My colleague Theo Röhle and I went to the Computational Turn conference this week. While I would have preferred to hear a bit more on truly digital research methodology (in the fully scientific sense of the word “method”), the day was really quite interesting and the weather unexpectedly gorgeous. Most of the papers are available on the conference site, make sure to have a look. The text I wrote with Theo tried to structure some of the epistemological challenges and problems to take into account when working with digital methods. Here’s a tidbit:

…digital technology is set to change the way scholars work with their material, how they “see” it and interact with it. The question is, now, how well the humanities are prepared for these transformations. If there truly is a paradigm shift on the horizon, we will have to dig deeper into the methodological assumptions that are folded into the new tools. We will need to uncover the concepts and models that have carried over from different disciplines into the programs we employ today…

The question of how mathematics could lay the foundation for a machine that sustains such a wide variety of practices is really quite well understood from the point of view of the mathematical theory of computation. From a humanities standpoint, however, despite the number of texts commenting on the genius of key figures such as Gödel, Turing, Shannon, and Church, there is still a certain awkwardness when it comes to situating the key steps in mathematical reasoning that led up to the birth of the computer in the larger context of mathematics itself. One of the questions I find really quite interesting is the role of the formalist stance in mathematics.

In the philosophy of mathematics, there are many different positions. The realist stance, for example, holds that mathematical objects exist. For the platonist, they exist in some kind of extra-spatio-temporal realm of ideas. For the physicalist, they are intrinsically connected to material existence, even if that relationship is not necessarily simple. Then there is formalism, and this is where things get interesting. In a tale we can read in many social sciences and humanities books on the computer, the young Kurt Gödel smashes the coherent world of the “establishment” mathematician David Hilbert, inventing in the process the metamathematical tools that will later prove essential for the practical realization of computing machinery. What is most often overlooked in that story is that Hilbert’s formalist position was already an extremely important step in the preparation for what was to come. For Hilbert, the question of the ontological status of mathematical objects is already a no-go – truth is no longer defined via any kind of correspondence to an external system but as a function of the internal coherence of the symbolic system. As Bettina Heintz says, Hilbert’s work rendered mathematical concepts “self-sufficient” (autark) by liberating them from any kind of external benchmark and opening a purely mechanical world where symbolic machinery can be built at will, like in a game.

If we want to think about computing today, I think we should remember this break from an ontological concept of truth to a purely formalistic one (even if that means Gödel later put a pretty big crack in it). Because in a way, programming is like a “game” with formulas, and if the algorithm works, that means it is “true”. In this sense, Google’s PageRank algorithm is true. But without the reference to an external system, this “truth” is purely mechanical, internal. In a similar way, an algorithm’s claim to objectivity, impartiality, or neutrality should be seen as internal only. The moment we apply mathematics to the description of some external mechanism (gravity, for example), a second truth criterion intervenes, one that refers to the establishment of correspondence between the formal system and the external reality. In the same way, if an algorithm is applied to, let’s say, the filtering of information, the formal world of the game is mapped onto another world. There is an important difference, however. When mathematics is applied to physical phenomena, the gesture is descriptive and epistemological (verb: is). When an algorithm is applied to tasks such as information filtering, the gesture is prescriptive and political (verb: ought).
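To make this concrete, here is a minimal power-iteration PageRank – a textbook sketch, certainly not Google’s actual implementation. Note that the only criterion internal to the procedure is that the iteration converges; whether the resulting ranking is a good way to filter information is exactly the external, political question:

```python
# Textbook PageRank by power iteration. "Working" here means only that
# the rank vector converges -- a purely internal, formal criterion.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
        rank = new_rank
    return rank

print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```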

The fact that an automatic procedure works makes it true in a formal sense. The moment we apply it to a certain task, other criteria intervene. Hilbert’s formalism pulled mathematics away from the empirical world, and if we bring the two together again by writing software, the criteria by which we judge the quality of that action should be seen as political, because there are no mathematical criteria for judging the mapping of one world onto the other. No Hilbert to hold our hand…

Since Yahoo recently ~sold its search business to Microsoft (see this NYT article for details), a lot of people were asking themselves what would happen to the Yahoo search APIs, which are in fact some of the most powerful free tools out there to build search mashups with. As Simon Willison indicates in this blog post, some of them (Term Extraction and Contextual Web Search) are closing down at the end of August. Programmable Web lists 33 mashups that use the Term Extraction service, and these sites will either have to close down or start looking for alternatives. This highlights a problem that can be a true roadblock for developing applications that make heavy use of APIs. My own termcloud search and its spiced-up cousin contextdigger use Yahoo BOSS, and quite honestly, if MS kills that service, these experiments (and many others) will be gone for good, because Yahoo BOSS is the only search API that provides a list of extracted keywords for each delivered Web result.

If service providers can close APIs at will, developers might hesitate when deciding whether to put in the coding hours needed to build the latest mashup. But it is mashups that, over the last years, have really explored many of the directions left blank by “pure” applications. This creative force should be cherished, and I wonder whether there may be a need for something similar to Creative Commons for APIs – a legal construct that gives at least some basic rights to mashup developers…
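Until such a construct exists, the only defense is technical: keep the provider at arm’s length. A minimal sketch of that idea – all class and method names here are hypothetical, not a real client library:

```python
# Insulate a mashup from a single provider: code against a small interface
# and swap backends when an API is shut down.
from abc import ABC, abstractmethod

class SearchBackend(ABC):
    @abstractmethod
    def search(self, query: str, count: int = 10) -> list[dict]:
        """Return results as [{'url': ..., 'title': ..., 'terms': [...]}]."""

class BossBackend(SearchBackend):
    def search(self, query, count=10):
        # The provider-specific HTTP calls would live here, and only here.
        raise NotImplementedError("call the provider's API here")

def build_termcloud(backend: SearchBackend, query: str) -> dict[str, int]:
    """Count extracted terms across results, whatever the backend."""
    counts: dict[str, int] = {}
    for result in backend.search(query, count=100):
        for term in result.get("terms", []):
            counts[term] = counts.get(term, 0) + 1
    return counts
```

It would not have saved termcloud search – no other provider delivers extracted keywords per result – but it at least confines the damage to one class.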