Category Archives: critique

The use of computers in the humanities has a long and fine history. What is striking though is how lucid scholars reflected on their tools even in the earliest days. Here’s a beautiful citation by Irwin C. Lieb from a text published in the the inaugural issue of Computers in the Humanities, a journal started in 1966.

The great advances which have so far been made with computers have been in those fields where we find countable items or have ready substitutes for them. The real or seeming extraneousness of computer studies for the humanities is owed to the fact that, in the humanities, what are most important are, if items at all, items that we can’t count, or can count only most artificially. We know, for example, how little definite we mean in saying that we have two or three ideas, that there are four themes in a play, or that there were this or that number of historical events. Our “counting” is not the counting of items that were somehow there separate, waiting to be pointed out; it is a “counting” in which judgments themselves mark out what come to be the items that we count. Apart from the judgments, there are no separate items. Therefore, no technique of counting such items so as to yield, for the first time, a judgment or a summary is possible at all. But, granting that this sort of limitation is inescapable, computers could, it seems, still come to have a more vital use in the humanities than we have seen so far.

[…]

The suggestion, then, is that some of the simplest but most important work to be done in deepening the usefulness of computers for the humanities will be in imagining those schemas by which we will model what we know cannot be modeled undistortedly: — ideas, themes, events and even more importantly, insights, appraisals, and appreciations. There are, there must be, revealing models for all of these. And as we think of them, and then use them in the humanities, the achievement for us will come as we feel out just what the distortions are, as we make the right mistakes. For as we see them as mistakes, we will penetrate further and still more appreciate what we are most concerned to understand. With the possibilities for computer studies of depth and importance in the humanities seeming still so genuine, it would be a mistake, I think, to curtail our exploration of them soon.

When it comes to scrutinizing companies for their actions and policies concerning control over information, privacy issues, and market dominance in areas related to public debate, large media conglomerates have been the traditional objects of analysis. More recently, Internet giants such as Google and Facebook have been critically examined and when the hype levels off, Twitter will probably be the next on the list. Malcolm Gladwell’s recent piece in The New Yorker may very well be an indicator of things to come.

Whether the issues related to “social media” are important or not, I have the feeling that the debate overshadows questions and problem fields that may in fact be much more important. The most obvious case, in my view, is the debate on privacy on Facebook. While the matter is not irrelevant, I think that e.g. present and future state-run information systems such as the french EDVIGE, a central police database that assembles all kinds of personal information concerning select persons “of interest”, have been overshadowed by debate on whether your employer can see the pictures that document your drinking binges after somebody (you?) put them on the ‘Book. There is a certain disequilibrium in how Internet researchers and critics distribute their attention that has allowed all kinds of things to pass below the radar. But there is one event that has really shook me up recently, both because of its importance and the lack of outcry it garnered, at least in my echo chamber: the acquisition of the Reuters group by the Thomson corporation in 2008 and the creation of Thomson Reuters, an information giant second to none.

Thomson Reuters market divisions

Thomson Reuters market divisions

I have stumbled upon Thomson Reuters a couple of times over the last years: first, when I researched the history of citation indexing, I learned that Thomson Scientific had bought the Institute of Scientific Information (and their Web of Science citation index megabase from which things like the notorious Impact Factor are calculated) in 1992; then again when I noticed that the ClearForest API for term extraction had be renamed, remodeled, and rebranded as OpenCalais after Reuters bought the company in 2007; finally, last year, when I noticed that the Reuters video platform appeared more and more often in articles and links. When I finally started to look a little closer (NYSE:TRI) I was astounded to find a company with a market cap of $31B, annual revenues of $13B, and 55K+ employees all over the world. Yes, this is no Apple big, but still very, very big for a company that sells information.

I knew Reuters from my studies in communication science as the world’s biggest news agency (with roughly one and a half competitors: Associated Press and Agence France Presse) but I had never consciously registered the Thomson company – a Canadian Family business that went from the media (owning the London Times at one point) to publishing before transforming itself in a rather risky move into a digital information broker for all kinds of special fields (legal, health, finance, etc.). Reuters was a perfect match and I really wonder how that merger went through without too much hassle from the different regulatory bodies. Even more so when I found out that Reuters actually had devised a very spicy regulatory clause when it made its IPO in 1984: to avoid control over such a central source of information, no  single shareholder would be allowed to hold more than 15% of the companies stocks. Apparently, that clause was enacted at least once when Murdoch’s News Corporation (already holding 15%) bought a competitor that also owned a piece of Reuters and consequently had to shed stock to stay below the threshold. The merger effectively brought the new Reuters Thomson under full control (53%) of The Woodbridge Company, a private holding that represents the Thomson family.

Such control over a news agency (and the many more specialized services that are part of the giant’s portfolio) should give us pause in the best of times when media companies are swimming in resources, are able to pay good money for good journalism, and keep their own network of correspondents. But recent years have seen nothing but cost cutting in journalism, which has led to an even greater reliance on news agencies. I wager that Google News would work a lot less well if people actually started to write their own copy instead of remodeling Reuters’ and AP send outs.

But despite these rather traditional – but nonetheless crucial – concerns over media ownership and control, there is a second point that is somewhat closer to my area of expertise. I have recently been thinking a lot about how to best phrase criticism of the assumption that digital networks necessarily lead to decentralization. Thomson Reuters – but also other information giants such as Google and Facebook – is a great example for how digital technologies can lead to quite impressive cost reductions for economies of scale and, consequently, market concentration. These arguments should be taken into account:

  • While the barriers of entry to the Internet are really low (you can have your own blog in minutes), scaling up to millions of visitors is a real challenge. Building your own datacenter is a real bump in the learning curve and to get over it, you need  to make certain investments. But once you pass that bump, scaling suddenly becomes cheaper again because you have the knowledge ressources and experience that can now be applied to make the datacenter grow. One of Google’s strengths lies in this area and this immensely facilitates branching out into new information ventures. The same goes for Thomson Reuters: they master platform technology and distribution technologies for all kinds of contents and they can build on that mastery to add new things to serve information to a globalized planet. To use the language of the long tail: there may be more special interest information that can find an audience with shelve space becoming effectively unlimited; but there is also no longer a need for more than one shelve.
  • The same goes for a more elusive matter: the mastery of information. The database techniques and indexing tools we use to store information – as well as the search and data-mining algorithms – can be very easily transported from one domain to the next. While it may be (very) difficult to create useful search tools for medical information, once you have built them it is rather easy to adapt these tools to, let’s say the legal domain. Again, this is what makes Google strong: basic search technology can be applied to advertising, books, mail, product prices, and even video if you can do automatic transcription. With the acquisition of ClearForest, Thomson Reuters has class-leading in-house data-mining and this is not something you can get by simply posting a couple of job ads in the local newspaper. Data-mining is extremely useful in areas where fast decision-making is crucial but also when it comes to building powerful search tools. Again, these techniques can be applied to any number of fields and once you have the basics right you can just add new domains with very little cost.

These two points go a far way in explaining why the Internet has seen the lightning fast emergence of network giants over the last couple of years. I really don’t want to postulate yet another “law” of the Net but I believe that there is something to this idea of the bump: it’s easy to have a basic presence on the Web but it’s hard to scale up to a large audience and to use advanced computational techniques; but one you pass the bump, the economies of scale kick in and from there it seems like there are no barriers to growth. The Thomsons have certainly made that bet when they acquired Reuters and so far, it seems to work out quite nicely for them.

I hope we can find a means to extend critique from questions of ownership into the heart of the (informational) beast and come up with better ways to understand how the still ongoing shift to exclusively digital information affords new means of handling and exploiting that information – with organizational, economic, and political consequences. While that work is starting to take shape for consumer companies like Google that are in the spotlight, there is surprisingly little on invisible network giants like Thomson Reuters that cater mostly to professional clients.

Some debates are just so much older than our short forgetful minds allow us to recognize. In 1965 Jacques Barzun (still alive today at a biblical 102!) made the following statement:

What have the humanities been doing for thirty-five years except to do exactly what a computer would do, only with their own unaided card indexes and fountain pens? They have taken apart poetry, they have taken apart novels, they have counted images, they have followed symbols that are sometimes non-existent, they have destroyed their own subject matter by a pseudo-computer-like approach, and now they have only themselves to blame if they have to learn the tricks and the jargon of computerizing. (Jacques Barzun at a conference at Yale University, cited in. Taviss (ed.), The Computer Impact, 1970, p.199)

While I have not found the original document of Barzun’s talk, Bowler (ed.), Computers in Humanistic Research, 1967, p.232 has a summary of his three main points of critique:

First is the assumption of a false relation between the units defined and written and the reality they are supposed to represent. For example, 20 years ago, someone attempted to study genius by selecting names from Who’s Who in America, as being indicative of the quality of genius. Second is the fallacy of assessing importance by weight or numbers. The speaker mentioned a published census, again some 20 years ago, which indicated that the number of brownstone or frame houses in New York was much larger than the number of skyscrapers, giving the erroneous impression that the former represented the city’s characteristic architectural form. The third error is the attribution of meaning based upon only a partial study of the object in question. Two conspicuous examples of the faulty attribution of meaning to partial signs are the cases of machine translation and the objective tests given to school children and the people in business.

Would it be very hard to find contemporary examples that fit these three points?

Yesterday, Microsoft announced another step in their “long-term partnership” with Facebook. The two companies have had close ties since Microsoft invested a hefty sum in Facebook in 2007 and the former has managed advertisement on the latter’s site for quite a while. The “next step” will basically add a “social layer” to Bing search results (go to Ars Technica for a writeup or All Things Digital for a liveblog of the PR event) and this is actually a pretty big thing. Google has certainly taken contextual information into account when deciding which results to show and how to rank them: physical location, search history, and gmail contacts have been part of that process for a while, but the effects have been rather subtle.

Bing’s new features basically use the same technical layer as the Facebook boxes that popped up all over the Web about half a year ago (most modern browsers have plug-ins that allow you to block those by the way). If Bing detects the Facebook cookie while you’re on their site and adds a couple of features that allow you to interact with “friends” more easily. There are some basic convenience features but it is the “liked results” that are the most remarkable: results will use your contact’s “likes” to rank results. While we will have to wait to see how these features will pan out, social search may look something like this:

Bing social search interface

In this example, the first result is the announcement of a news article on the release of the DVD version of Iron Man 2 and this would be hardly a top-ranked result without the social layer. If Bing continues to make inroads on Google, the “like” button may take on additional importance for driving traffic and marketeers will most certainly device new ways to get people to “like” stuff – e.g. “press the button and win a free t-shirt”.

Cas Sunstein’s arguments on the dangers of echo chambers – “incestuous amplification” in social groups – will certainly be taken up again, and perhaps rightfully so: while the Internet remains a beautifully heterogeneous mess, the algorithmically sustained support for the logic of homophily (“birds of a feather…”) that can be observed in more and more places on the Web merits critical examination. While Diana Mutz’s work makes the inconvenient argument that “hearing the other side” of political debate may actually lead to less political engagement, our representative systems of democratic governance require a certain willingness to accept different political viewpoints (that always float on less clearly delineated cultural sensibilities) as sincere and legitimate. Also, adding a “friend” dimension to yet another dimension of the Web could be seen as a further reduction of the “publicness” that, according to Michael Schudson, caracterizes working democratic discourse. Being able to dissociate ourselves from our private entanglements and take into account the interests of those who do not ressemble us is perhaps the central prerequisite to successfully navigating a smaller planet.

Bing’s new features are certainly not the end of life as we know it but I believe that the privacy question – as important as it is – is covering a series of more difficult problems that sit at the heart of political life in the age of the Internet…

…is so much easier if you’ve got a couple of popular pages to advertise on…

chrome_suggest_march_2010

…and another one…

chrome_suggest_april_2010.JPG

…browser wars all over again…

My colleague Theo Röhle and  I went to the Computational Turn conference this week. While I would have preferred to hear a bit more on truly digital research methodology (in the fully scientific sense of the word “method”), the day was really quite interesting and the weather unexpectedly gorgeous. Most of the papers are available on the conference site, make sure to have a look. The text I wrote with Theo tried to structure some of the epistemological challenges and problems to take into account when working with digital methods. Here’s a tidbit:

…digital technology is set to change the way scholars work with their material, how they “see” it and interact with it. The question is, now, how well the humanities are prepared for these transformations. If there truly is a paradigm shift on the horizon, we will have to dig deeper into the methodological assumptions that are folded into the new tools. We will need to uncover the concepts and models that have carried over from different disciplines into the programs we employ today…

This spring worked on an R&D project that was really quite interesting but – as it happens with projects – took up nearly all of my spare time. La montre verte is based on the idea that pollution measurement can be brought down to street level if sensors can be made small enough to be carried around by citizens. Together with a series of partners from the private sector, the CiTu group of my laboratory came up with the idea to put an ozone sensor and a microphone (to measure noise levels) into a watch. That way, the device is not very intrusive and still in direct contact with the surrounding air. We built about 15 prototypes, based on the fact that currently, Paris’ air quality is measured by only a handful of (really high quality) sensors and even the low resolution devices we have in our watches should therefore be able to complement that data with a geographically more fine grained analysis of noise and pollution levels. The watch produces a georeferenced  measurement (a GPS is built into the watch) every second and transmits the data via Bluetooth to a Java application on a portable phone, which then sends every data packet via GPRS to a database server.

heatmapMy job in the project was to build a Web application that allows people to interact with and make sense of the data produced by the watches. Despite the help from several brilliant students from our professional Masters program, this proved to be a daunting task and I spent *at lot* of time programming. The result is quite OK I believe; the application allows users to explore the data (which is organized in localized “experiments”) in different ways, either in real-time or afterward. With a little more time (we had only about three month for the whole project and we got the hardware only days before the first public showcase) we could have done more but I’m still quite content with the result. Especially the heatmap (see image) algorithm was fun to program, I’ve never done a lot of visual stuff so this was new territory and a steep learning curve.

Unfortunately, the strong emphasis on the technological side and the various problems we had (the agile methods one needs for experimental projects are still not understood by many companies) cut down the time for reflection to a minimum and did not allow us to come up with a deeper analysis of the social and political dimensions of what could be called “distributed urban intelligence”. The whole project is embedded in a somewhat naive rhetoric of citizen participation and the idea that technological innovation can solve social problems, in this case matters of urban planning and local governance. A lesson I have learned from this is that the current emphasis in funding on short-term projects that bring together universities and the industry makes it very difficult to carve out an actual space for scientific practice between all the deadlines and the heavy technical demands. And by scientific practice, I mean a *critical* practice that does not only try to base specifications and prototyping on “scientifically valid” approaches to building tools and objects but which includes a reflection on social utility that takes a wider view than just immediate usefulness. In the context of this project, this would have implied a close look at how urban development is currently configured in respect to environmental concerns in order to identify structures of governance and chains of decision-making. This way, the whole project could have targeted issues more clearly and consciously, fine-tuning both the tools and the accompanying discourse to the social dimension it aimed at.

I think my point is that we (at least I) have to learn how to better include a humanities-based research agenda into very high-tech projects. We have known for a long time now that every technical project is in fact a socio-technical enterprise but research funding and the project proposals that it generates are still pretending that the “socio-” part is some fluffy coating that decorates the manly material core where cogs and wire produce tangible effects. As I programmer I know how difficult and time-consuming technical work can be but if there is to be a conscious socio-technical perspective in R&D we have to accept that the fluffy stuff takes even more time – if it is done right. And to do it right means not only reading every book and paper relevant to a subject matter but to take the time to reflect on methodology, to evaluate every step critically, to go back to the drawing board, and to include and to produce theory every step of the way. There is a cost to the scientific method and if that cost is not figured in, the result may still be useful, interesting, thought-provoking, etc. but it will not be truly scientific. I believe that we should defend these costs and show why they are necessary; if we cannot do so, we risk confining the humanities to liberal armchair commentary and the social sciences to ex-post usage analysis.

You’ve probably already read it somewhere (like here or here), amazon.com has blundered a little bit – for a couple of hours the search query “terrorist costume” brought up a single hit, a rubber mask with Obama’s face. I really don’t know how many people would have found out on their own but there’s some buzz going now and there actually is something worth pondering about the case. How it happened is quite easy to reconstruct: amazon allows users to label products (Folksonomy) and includes these tags into their general search engine. So somebody tagged the Obama mask with “terrorist” (“costume” was already a common keyword) and there you go. What I find interesting about this is not that there would be any real political consequence to this matter but the fact that folk-tagging can be as easily dragged into different directions as anything else. I’m currently working on a talk for the Deep Search conference (running late as so often these days) and I’ve been looking at Jimmy Wales’ project Wikia Search which uses community feedback in order to re-rank results. The question for me is how this system would be less pervasive to manipulation or SEO than today’s dominant principle, link analysis. The amazon case shows quite well that when you enter a contested field, there’s going to be fallout and the reason that there isn’t more of it already is probably because the masses are not yet aware of the mischief potential. And I don’t see how the “wisdom of the crowd” principle (whether that is folksonomy, voting, result re-ranking, etc.) cannot be hijacked by a determined individual or company that understands the workings of the algorithms that structure results (in the amazon case you would have needed to know that user tags are used in the general search). So what is really interesting about the Obama mask incident is how things continue at amazon (and other folksonomy based servives) – if user tags can be used to drive traffic to specific products, the marketeers will come in droves the moment the numbers are relevant…

A couple of days ago, Marissa Mayer, VP of  “Search Products & User Experience” over at Google posted a piece on “the future of search” and her conclusion is this:

So what’s our straightforward definition of the ideal search engine? Your best friend with instant access to all the world’s facts and a photographic memory of everything you’ve seen and know. That search engine could tailor answers to you based on your preferences, your existing knowledge and the best available information; it could ask for clarification and present the answers in whatever setting or media worked best.

It’s from Google’s official blog so everybody and the Denver Broncos (keyword used solely to scramble the document vector of this post) has already commented on it but here’s my 50 centimes.

The first thing that strikes me about Mayer’s definition of the ideal search engine is the “your best friend” thing. Why would I want to be friends with a search engine? This goes very much in the direction of “don’t be evil”, Google’s famous corporate motto, which is, in my view, based on the (erroneous) believe that questions of power can be reduced to questions of morals. “Your best friend” could mean that the search engine will know a lot about you but it will not tell your boss that you search for pr0n on a daily basis. If you live in China it might tell the authorities where you’re at but a friend would too, given the right incentive. The idea is that you can confide in your best friend, spill your dirty little secrets without having to fear that they will pop up somewhere on the blogosphere. So there’s the privacy issue and Mayer is suggesting that you can trust Google with the growing pool of data you leave in their (floating!) datacenters.

The second matter is more subtle and kind of revitalizes all the critique that has been written concerning Nicholas Negroponte’s idea of the “daily me”, most notably the concept of the “echo chamber” which holds that personalization results in people getting exposed only to the views that they already agree with. I am not sure whether such a situation is imminent, in fact, I agree with much of what David Weinberger says in this article, but given the fact that search has become such a pervasive practice, one cannot easily dismiss it. My real problem though is that personalization has become the dominant direction of search engine evolution when there are so many different paths to go down. Mayer actually talks about one:

Yet our presentation is still very linear (the results are just a list) and even (no one result is more important or larger than the next). What if the results page began to transform radically to really harness these different types of results into something that felt much more like an answer rather than just 10 independent guesses?

I find the idea of making the results page smarter very intriguing but not the conclusion of making it more “like an answer”. Why not add semantic clustering along the lines of Clusty, why not add the possibility to easily weight search terms or to better interact with the search results? I find the idea of rendering everything always more convenient and less of an effort quite troubling indeed. Why is there no button to the really useful cheat sheet on the main page? Has the idea of educating users become so completely unthinkable? I’d prefer to have more control over ranking and better means to refine my search and organize my results than a new best friend. Google has all the ingredients for delivering potentially great semantic mapping that would not give definite answers but a better overview of the heterogeneity of search results. Unfortunately, the idea of personalization seems to completely overshadow the more enlightened concept of augmentation.

Continuing in the direction of exploring statistics as an instrument of power more characteristic of contemporary society than means of surveillance centered on individuals, I found a quite beautiful citation by French sociologist Gabriel Tarde in his Les Lois de l’imitation (1890/2001, p.192f):

Si la statistique continue à faire des progrès qu’elle a faits depuis plusieurs années, si les informations qu’elle nous fournit vont se perfectionnant, s’accélérant, se régularisant, se multipliant toujours, il pourra venir un moment où, de chaque fait social en train de s’accomplir, il s’échappera pour ainsi dire automatiquement un chiffre, lequel ira immédiatement prendre son rang sur les registres de la statistique continuellement communiquée au public et répandue en dessins par la presse quotidienne.

And here’s my translation (that’s service, folks):

If statistics continues to make the progress it has made for several years now, if the information it provides us with continues to become more perfect, faster, more regular, steadily multiplying, there might come the moment where from every social fact taking place springs – so to speak – automatically a number that would immediately take its place in the registers of the statistics continuously communicated to the public and distributed in graphic form by the daily press.

When Tarde wrote this in 1890, he saw the progress of statistics as a boon that would allow a more rational governance and give society the means to discuss itself in a more informed, empirical fashion. Nowadays, online, a number springs from every social fact indeed but the resulting statistics are rarely a public good that enters public debate. User data on social networks will probably prove to be the very foundation of any business that is to be made with these platforms and will therefore stay jealously guarded. The digital town squares are quite private after all…