This preprint of a paper I have written about a year and a half ago, entitled Institutionalizing without Institutions? Web 2.0 and the Conundrum of Democracy, is the direct result of what I experienced as a major cultural destabilization. Born in Austria, living in France (and soon the Netherlands), and working in a field that has a strong connection with American culture and scholarship, I had the feeling that debates about the political potential of the Internet were strongly structured along national lines. I called this moral preprocessing.

This paper, which will appear in an anthology on Internet governance later this year, is my attempt to argue that it is not only technology which poses serious challenges, but rather the elusive and difficult concept of democracy. My impression was – and still is – that the latter term is too often used too easily and without enough attention paid to the fundamental contradictions and tensions that characterize this concept.

Instead of asking whether or not the Internet is a force of democratization, I wanted to show that this term, democratization, is complicated, puzzling, and full of conflict: a conundrum.

Published as: B. Rieder (2012). Institutionalizing without institutions? Web 2.0 and the conundrum of democracy. In F. Massit-Folléa, C. Méadel & L. Monnoyer-Smith (Eds.), Normative experience in internet politics (Collection Sciences sociales) (pp. 157-186). Paris: Transvalor-Presses des Mines.

Over the last couple of weeks, things have heated up considerably for Google – on the mobile side with the start of a patent war, but also in the search area, the core of the company’s business. Led by Senator Mike Lee (a Utah Republican), the US Senate’s Antitrust Subcommittee has started to probe into certain aspects of Google’s ranking mechanisms and potential cases of abuse and manipulation.

In a hearing on Wednesday, Lee confronted Eric Schmidt with accusations of tampering with results and the evidence the Senator presented was in fact very interesting because it raises the question of how to show or even prove that a highly complex algorithmic procedure “has been tampered with”. As you can see in this video, a scatter-plot from an “independent study” that compares the search ranking for three price comparison sites (Nextag, Pricegrabber, and Shopper) with Google Price Search using 650 shopping related queries. What we can see on the graph is that while there is considerable variation in ranking for the competitors (a site shows up first for one query and way down for another), Google’s site seems to consistently stick to place three. Lee makes this astounding difference the core of his argument and directly asks Schmidt: “These results are in fact the result of the same algorithm as the rankings for the other comparison sites?” The answer is interesting in itself as Schmidt argues that Google’s service is not a product comparison site but a “product site” and that the study basically compares apples to oranges (“they are different animals”). Lee then homes in on the “uncanny” statistical regularity and says “I don’t know whether you call this a separate algorithm or whether you’re reverse engineered a single algorithm, but either way, you’ve cooked it!” to which Schmidt replies “I can assure you that we haven’t cooked anything.”

According to this LA Times article, Schmidt’s testimony did not satisfy the senators and there’s open talk about bias and conflict of interest. I would like to add to add three things here:

1) The debate shows a real mismatch between 20th century concepts of both bias and technology and the 21st century challenge to both of these question that comes in the form of Google. For the senator, bias is something very blatant and obvious, a malicious individual going to the server room at night, tempering with the machinery, transforming the pure technological objectivity into travesty by inserting a line of code that puts Google to third place most of the time. The problem with this view is of course that it makes a clear and strong distinction between a “biased” and an “unbiased” algorithm and clearly misses the point that every ranking procedure implies a bias. If Schmidt says “We haven’t cooked anything!”, who has written the algorithm? If it comes to an audit of Google’s code, I am certain that no “smoking gun” in the form of a primitive and obvious “manipulation” will be found. If Google wants to favor its own services, there are much more subtle and efficient ways to do so – the company does have the best SEO team one could possibly imagine after all. There is simply no need to “cook” anything if you are the one who specifies the features of the algorithm.

2) The research method applied in the mentioned study however is really quite interesting and I am curious to see how far the Senate committee will be able to take the argument. The statistical regularity shown is certainly astounding and if the hearings attain a deeper level of technological expertise, Google may be forced to detail a significant portion of its ranking procedures to show how something like this can happen. It would, of course, be extremely simple to break the pattern by introducing some random element that does not affect the average rank but adds variation. That’s also the reason why I think that Lee’s argument will ultimately fizzle.

3) The core of the problem, I would argue, is not so much the question of manipulation but the fact that by branching into more and more commercial areas, Google finds itself in a market configuration where conflicts of interest are popping up everywhere they turn. As both a search business and an actor on many of the markets that are, at least in part, ordered by the visibility layering in search results, there is a fundamental and structural problem that cannot be solved by any kind of imagined technical neutrality. Even if there is no “in house SEO” going on, the mere fact that Google search prominently links to other company services could already be seen as problematic. In a sense, Senator Lee’s argument actually creates a potentially useful “way out”: if there is no evil line of code written in the dark of night, no “smoking gun”, then everything is fine. The systematic conflict of interest persists however, and I do not believe that more subtle forms of bias towards Google services could be proven or even be seriously debated in a court of law. This level of technicality, I would argue, is no longer (fully) in reach for this kind of causal demonstration. Not so much because of the complexity of the algorithms, but rather because the “state” of the machine includes the full structure of the dataset it is working on, which means the full index in this case. To understand what Google’s algorithms actually do, looking at these algorithms without the data is no longer enough. And the data is big. Very big.

As you can see, I am quite pessimistic about the possibility to bring the kind of argumentation presented by Senator Lee to a real conclusion. If the case against Microsoft is an indicator, I would argue that this pessimism is warranted.

I do believe that we need to concentrate much more on the principal conflicts of interest rather than actual cases of abuse that may be simply too difficult to prove. The fundamental question is really how far a search company that controls such a large portion of the global market should be allowed to be active in other markets. And, really, should a single company control the search market in the first place? Limiting the very potential for abuse is, in my view, the road that legislators and regulators should take, rather than picking a fight over technological issues that they simply cannot win in the long run.

EDIT: Google has compiled its own Guide to the Hearing. Interesting.

German publisher Heise Verlag is an international curiosity. It publishes a small number of highly influential computer-related magazines that give a voice to a tech ethos that is at the same time extremely competent in the subject matter (I’ve been a steady subscriber to c’t magazin for over 15 years now, and I am still baffled sometimes just how good it is) and very much aware of the social and political implications of computing (their online magazine Telepolis testifies to that).

Data protection and privacy are long-standing concerns of the heise editors and true to a spirit of society-oriented design, they have introduced a concept as well as a technical implementation of a two-step “like” button. Such buttons, by Facebook or other companies, have of course become a major vector of user-tracking on the Web. By using an iframe, every button loads some code from Facebook’s server and sends the referring url (e.g. http://nytimes.com/articlename/blabla) as an information. The iframe being hosted on the facebook.com domain, cross-site privacy protections can be circumvented, the url information connected to an identifier cookie and, consequently, to a user account. Plugins like the Priv3 project block these mechanisms but a) users have to have a heightened level of awareness to even consider installing something like this and b) the plugin interferes with convenient functions like Google search preferences.

Heise’s suggestion, which they already implemented on their own sites, is simple: websites can download a small bit of code that implements a two-step procedure: the “like” button is greyed out after the page first loads and there is no tracking happening. A first click on the button loads the “real” Facebook code, and the second click provides the usual functionality. The solution is very simple to implement and really a very minor inconvenience. Independently from the debate whether “like” buttons and such add any real value to the Web, this example shows that “social” features like these can be designed in a way that does not necessarily lead to pervasive user tracking.

The echo to this initiative has been very strong (check the Slashdot discussion here), especially in Germany, where privacy (or rather Datenschutz, a concept less centered on the individual but rather on the role of data in society) is an intensely debated issue, due to obvious historical reasons. Facebook apparently threatened to blacklist heise.de at a point, but has since then backpedaled. After all, c’t magazin prints around 600.000 issues of every number and is extremely influential in the German (and Dutch!) computer landscape. I am very curious to see how this story unfolds, because let’s be clear: Facebook’s earning potential is closely tied to its capacity to capture, enrich, and analyze user data.

This initiative – and the Heise ethos in general – underscores that a “respectable” and sober engineering culture does not exclude an explicit normative stance on social and political issues. And is shows that this stance can be translated into technical models, implemented, and shared, both as an idea and as code.

While riding my bike today, I listened to a very thought-provoking and enjoyable talk (LSE site / YouTube) given back in may at the LSE by Harvard law professor Gerald Frug, entitled  “The Architecture of Governance”. The argument basically revolves around the actual “design” or “architecture” of governance/government structures and, more precisely, the complicated relationship between local and central governments. While this is not a talk about technology, there is much to learn concerning how to think about the design of (political) systems – mechanisms for organizing collective decision-making – beyond the petty moralizing and finger-pointing that seems to have taken hold of large parts of public debate today in much of the Western world. What I find quite intriguing is that Krug pays so much attention to the particularities of how seemingly consensual ideas (“power to the local”) can be implemented with rather different potential outcomes. In that sense, “parameter details” and fine-print may have a much larger impact than one might think and it’s worth-while to talk about them and not just the grand questions of “participation” vs. “representation”, and so on. Good fun!

The entertaining platform for technology and design, TED, posted a talk by Area/Code chairman and co-Founder Kevin Slavin, entitled “How algorithms shape our world”:

There are a couple of interesting examples and ideas in there and the analogy between finance algorithms and the larger processing of “culture” is well argued. A fun 15 minutes – there’s even explosions in there!

While scholars often underline their commitment to non-deterministic conceptions of “effects”, models of causality in the human and social sciences can still be a bit simplistic sometimes. But a more subtle approach to causality would have to concede that, while most often cumulative and contradictory, lines of causation can sometimes be quite straightforward. Just consider this example from Commensuration as a Social Process, a great text from 1998 by Espeland and Stevens:

Faculty at a well-regarded liberal arts college recently received unexpected, generous raises. Some, concerned over the disparity between their comfortable salaries and those of the college’s arguably underpaid staff, offered to share their raises with staff members. Their offers were rejected by administrators, who explained that their raises were ‘not about them.’ Faculty salaries are one criterion magazines use to rank colleges. (p.313)

This is a rather direct effect of ranking techniques on something very tangible, namely salary. But the relative straightforwardness of the example also highlights a bifurcation of effects: faculty gets paid more, staff less. The specific construction of the ranking mechanism in question therefore produces social segmentation. Or does it simply reinforce the existing segmentation between faculty and staff that lead college evaluators to construct the indicators the way they did in the first place? Well, there goes the simplicity…

Simondon’s Du mode d’existence des objets techniques from 1958 is a most wondrous book. It is not only Simondon’s theory of technology in itself that fascinates me, but rather the intimate closeness with particular technical objects that resonates  through the whole text and marks a fundamental break with the greek heritage of thinking about technology as a unified and coherent force. When Simondon reasons over numerous pages on the difference between a diode and a triode, he accords significance to something that was considered insignificant by virtually every philosopher in history. By conferring a sense of dignity to technology, a certain profoundness, he is able to see heterogeneity and particularity where others before him just saw the declinations of the singular principle of techné. In a distinctly beautiful passage, Simondon argues that “technological thinking” itself is not totalizing but fragmenting:

“L’élément, dans la pensée technique, est plus stable, mieux connu, et en quelque manière plus parfait que l’ensemble ; il est réellement un objet, alors que l’ensemble reste toujours dans une certaine mesure inhérent au monde. La pensée religieuse trouve l’équilibre inverse : pour elle, c’est la totalité qui est plus stable, plus forte, plus valable que l’élément.” (Simondon 1958, p. 175)

And my translation:

“In technological thinking, it is the element that is more stable, better known and – in a certain sense – more perfect than the whole; it is truly an object, whereas the whole always stays inherent to the world to a certain extend. Religious thinking finds the opposite balance: here, it is the whole that is more stable, stronger, and more valid than the element.”

Philosophical thinking, according to Simondon, should strive to situate itself in the interval that separates the two approaches, technological thinking and religious thinking, concept and idea, plurality and totality, a posteriori and a priori. Here, the question of How? is not subordinate to the question of Why? because it is the former that connects us to the world that we inhabit as physical beings. Understanding technology means understanding how the two levels relate and constitute a world. There are two forms of ethics and two forms of knowledge that must be combined both intellectually and practically. Simondon obviously strives to do just that. I would argue that Philip Agre’s concept of critical technical practice is another attempt at pretty much the same challenge.

When it comes to analyzing and visualizing data as a graph, we most often select only one unit to represent nodes. When working with social networks, nodes commonly represent user accounts. In a recent post, I used Twitter hashtags instead and established links by looking at which hashtags occurred in the same tweets. But it is very much possible to use different “ontological” units in the same graph. Consider this example from the IPRI project (a click gives you the full map, a 14MB png file):

Here, I decided to mix Twitter user accounts with hashtags. To keep things manageable, I took only the accounts we identified as journalists that posted at least 300 tweets between February 15 and April 15 from the 25K accounts we follow. For every one of those accounts, I queried our database for the 10 hashtags most often tweeted by the user. I then filtered the graph to show only hashtags used by at least two users. I was finally left with 512 user accounts (the turquoise nodes, size is number of tweets) and 535 hashtags (the red nodes, size is frequency of use). Link strength represents the frequency with which a user tweeted a hashtag. What we get, is still a thematic map (libya, the regional elections, and japan being the main topics), but this time, we also see, which users were most strongly attached to these topics.

Mapping heterogeneous units opens up many new ways to explore data. The next step I will try to work out is using mentions and retweets to identify not only the level of interest that certain accounts accord to certain topics (which you can see in the map above), but the level of echo that an account produces in relation to a certain topic. We’ll see how that goes.

In completely unrelated news, I read an interesting piece by Rocky Agrawal on why he blocked tech blogger Robert Scoble from his Google+ account. At the very end, he mentions a little experiment that delicious.com founder Joshua Schachter did a couple of days ago: he asked his 14K followers on Twitter and 1.5K followers on Google+ to respond to a post, getting 30 answers the former and 42 from the latter. Sitting on still largely unexplored bit.ly click data for millions of urls posted on Twitter, I can only confirm that Twitter impact may be overstated by an order of magnitude…

In the beginning, it was all about the algorithm. PageRank and its “no humans involved” mantra dominated Google since its inception. In recent years however, Google has started to expand the role of “conceptual” knowledge in different areas of its services. The main search bar and its capacity to do all kinds of little tricks is a good example, but I was really quite astounded how seamless concept integration has become on my last trip to Google Translate:

There are many different ways of making sense of large datasets. Using network visualization is one of them. But what is a network? Or rather, which aspects of a dataset do we want to explore as a network? Even social services like Twitter can be graphed in many different ways. Friend/follower connections are an obvious choice, but retweets and mentions can be used as well to establish links between accounts. Hashtag similarity (two users who share a tag are connected, the more they share, the closer) is yet another method. In fact, when we shift from interactions to co-occurrences, many different things become possible. Instead of mapping user accounts, we can, for example, map hashtags: two tags are connected if they appear in the same tweet and the number of co-occurrences defines link strength (or “edge weight”). The Mapping Online Publics project has more ideas on this question, including mapping over time.

In the context of the IPRI research project we have been following 25K Twitter accounts from the French twittersphere. Here is a map (size: occurrence count / color: degree / layout: gephi with OpenOrd) of the hashtag co-occurrences for the 10.000 hashtags used most often between February 15 2011 and April 15 2011 (clicking on the image gets you the full map, 5MB):

The main topics over this period were the regional elections (“cantonales”) and the Arab spring, particularly the events in Libya. The japan earthquake is also very prominent. But you’ll also find smaller events leaving their traces, e.g. star designer Galliano’s antisemitic remarks in a Paris restaurant. Large parts of the map show ongoing topics, cinema, sports, general geekery, and so forth. While not exhaustive, this map is currently helping us to understand which topics are actually “inside” our dataset. This is exploratory data analysis at work: rather than confirming a hypothesis, maps like this can help us get a general understanding of what we’re dealing with and then formulate more precise questions from there.