Yesterday, Google introduced a new feature, which represents a substantial extension to how their search engine presents information and marks a significant departure from some of the principles that have underpinned their conceptual and technological approach since 1998. The “knowledge graph” basically adds a layer to the search engine that is based on formal knowledge modelling rather than word statistics (relevance measures) and link analysis (authority measures). As the title of the post on Google’s search blog aptly points out, the new features work by searching “things not strings”, because what they call the knowledge graph is simply a – very large – ontology, a formal description of objects in the world. Unfortunately, the roll-out is progressive and I have not yet been able to access the new features, but the descriptions, pictures, and video paint a rather clear picture of what product manager Johanna Wright calls the move “from an information engine to a knowledge engine”. In terms of the DIKW model (Data-Information-Knowledge-Wisdom), the new feature proposes to move up a layer by adding a box of factual information on a recognized object (the examples Google uses are the Taj Mahal, Marie Curie, Matt Groening, etc.) next to the search results. From the presentation, we can gather that the 500 million objects already referenced will include a large variety of things, such as movies, events, organizations, ideas, and so on.
This is really a very significant extension to the current logic and although we’ll need more time to try things out and get a better understanding of what this actually means, there are a couple of things that we can already single out:
- On a feature level, the fact box brings Google closer to “knowledge engines” such as Wolfram Alpha and as we learn from the explanatory video, this explicitly includes semantic or computational queries, such as “how many women won the Nobel Prize?” type of questions.
- If we consider Wikipedia to be a similar “description layer”, the fact box can also be seen as a competitor to everybody’s favorite encyclopedia, which is a further step into the direction of bringing information directly to the surface of the results page instead of simply referring to a location. This means that users do not have to leave the Google garden to find a quick answer. It will be interesting to see whether this will actually show up in Wikipedia traffic stats.
- The introduction of an ontology layer is a significant departure from the largely statistical and graph theoretical methods favored by Google in the past. While features based on knowledge modelling have proliferated around the margins (e.g. in Google Maps and Local Search), the company is now bringing them to the center stage. From what I understand, the selection of “facts” to display will be largely driven by user statistics but the facts themselves come from places like Freebase, which Google bought in 2010. While large scale ontologies were prohibitive in the past, a combination of the availability of crowd-sourced databases (Wikipedia, etc.), the open data movement, better knowledge extraction mechanisms, and simply the resources to hire people to do manual repairs has apparently made them a viable option for a company of Google’s size.
- Competing with the dominant search engine has just become a lot harder (again). If users like the new feature, the threshold for market entry moves up because this is not a trivial technical gimmick that can be easily replicated.
- The knowledge graph will most certainly spread out into many other services (it’s already implemented in the new Google Docs research bar), further boosting the company’s economies of scale and enhancing cross-navigation between the different services.
- If the fact box – and the features that may follow – becomes a pervasive and popular feature, Google’s participation in making information and knowledge accessible, in defining its shape, scope, and relevance, will be further extended. This is a reason to worry a bit more, not because the Google tools as such are a danger, but simply because of the levels of institutional and economic concentration the Internet has enabled. The company has become what Michel Callon calls an “obligatory passage point” in our relation to the Web and beyond; the knowledge graph has the potential to exacerbate the situation even further.
This is a development that looks like another element in the war for dominance on the Web that is currently fought at a frenetic pace. Since the introduction of actions into Facebook’s social graph, it has become clear that approaches based on ontologies and concept modelling will play an increasing role in this. In a world mediated by screens, the technological control of meaning – the one true metamedium – is the new battleground. I guess that this is not what Berners-Lee had in mind for the Semantic Web…
This preprint of a paper I wrote about a year and a half ago, entitled Institutionalizing without Institutions? Web 2.0 and the Conundrum of Democracy, is the direct result of what I experienced as a major cultural destabilization. Born in Austria, living in France (and soon the Netherlands), and working in a field that has a strong connection with American culture and scholarship, I had the feeling that debates about the political potential of the Internet were strongly structured along national lines. I called this moral preprocessing.
This paper, which will appear in an anthology on Internet governance later this year, is my attempt to argue that it is not only technology which poses serious challenges, but rather the elusive and difficult concept of democracy. My impression was – and still is – that the latter term is too often used too easily and without enough attention paid to the fundamental contradictions and tensions that characterize this concept.
Instead of asking whether or not the Internet is a force of democratization, I wanted to show that this term, democratization, is complicated, puzzling, and full of conflict: a conundrum.
Published as: B. Rieder (2012). Institutionalizing without institutions? Web 2.0 and the conundrum of democracy. In F. Massit-Folléa, C. Méadel & L. Monnoyer-Smith (Eds.), Normative experience in internet politics (Collection Sciences sociales) (pp. 157-186). Paris: Transvalor-Presses des Mines.
Over the last couple of weeks, things have heated up considerably for Google – on the mobile side with the start of a patent war, but also in the search area, the core of the company’s business. Led by Senator Mike Lee (a Utah Republican), the US Senate’s Antitrust Subcommittee has started to probe into certain aspects of Google’s ranking mechanisms and potential cases of abuse and manipulation.
In a hearing on Wednesday, Lee confronted Eric Schmidt with accusations of tampering with results and the evidence the Senator presented was in fact very interesting because it raises the question of how to show or even prove that a highly complex algorithmic procedure “has been tampered with”. As you can see in this video, Lee showed a scatter-plot from an “independent study” that compares the search rankings of three price comparison sites (Nextag, Pricegrabber, and Shopper) with those of Google Price Search across 650 shopping-related queries. What we can see on the graph is that while there is considerable variation in ranking for the competitors (a site shows up first for one query and way down for another), Google’s site seems to consistently stick to place three. Lee makes this astounding difference the core of his argument and directly asks Schmidt: “These results are in fact the result of the same algorithm as the rankings for the other comparison sites?” The answer is interesting in itself as Schmidt argues that Google’s service is not a product comparison site but a “product site” and that the study basically compares apples to oranges (“they are different animals”). Lee then homes in on the “uncanny” statistical regularity and says “I don’t know whether you call this a separate algorithm or whether you’ve reverse engineered a single algorithm, but either way, you’ve cooked it!” to which Schmidt replies “I can assure you that we haven’t cooked anything.”
According to this LA Times article, Schmidt’s testimony did not satisfy the senators and there’s open talk about bias and conflict of interest. I would like to add three things here:
1) The debate shows a real mismatch between 20th century concepts of both bias and technology and the 21st century challenge to both of these concepts that comes in the form of Google. For the senator, bias is something very blatant and obvious, a malicious individual going to the server room at night, tampering with the machinery, transforming the pure technological objectivity into travesty by inserting a line of code that puts Google in third place most of the time. The problem with this view is of course that it makes a clear and strong distinction between a “biased” and an “unbiased” algorithm and clearly misses the point that every ranking procedure implies a bias. If Schmidt says “We haven’t cooked anything!”, who has written the algorithm? If it comes to an audit of Google’s code, I am certain that no “smoking gun” in the form of a primitive and obvious “manipulation” will be found. If Google wants to favor its own services, there are much more subtle and efficient ways to do so – the company does have the best SEO team one could possibly imagine after all. There is simply no need to “cook” anything if you are the one who specifies the features of the algorithm.
2) The research method applied in the mentioned study however is really quite interesting and I am curious to see how far the Senate committee will be able to take the argument. The statistical regularity shown is certainly astounding and if the hearings attain a deeper level of technological expertise, Google may be forced to detail a significant portion of its ranking procedures to show how something like this can happen. It would, of course, be extremely simple to break the pattern by introducing some random element that does not affect the average rank but adds variation. That’s also the reason why I think that Lee’s argument will ultimately fizzle.
3) The core of the problem, I would argue, is not so much the question of manipulation but the fact that by branching into more and more commercial areas, Google finds itself in a market configuration where conflicts of interest are popping up everywhere they turn. As both a search business and an actor on many of the markets that are, at least in part, ordered by the visibility layering in search results, there is a fundamental and structural problem that cannot be solved by any kind of imagined technical neutrality. Even if there is no “in house SEO” going on, the mere fact that Google search prominently links to other company services could already be seen as problematic. In a sense, Senator Lee’s argument actually creates a potentially useful “way out”: if there is no evil line of code written in the dark of night, no “smoking gun”, then everything is fine. The systematic conflict of interest persists however, and I do not believe that more subtle forms of bias towards Google services could be proven or even be seriously debated in a court of law. This level of technicality, I would argue, is no longer (fully) in reach for this kind of causal demonstration. Not so much because of the complexity of the algorithms, but rather because the “state” of the machine includes the full structure of the dataset it is working on, which means the full index in this case. To understand what Google’s algorithms actually do, looking at these algorithms without the data is no longer enough. And the data is big. Very big.
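The remark in point 2) about breaking the pattern with a random element can be made concrete with a small sketch. This is a toy illustration under my own assumptions, not a description of how any real ranking system works: the function name `jitter_ranks`, the `max_shift` parameter, and the data (a site fixed at position three for 650 queries) are all hypothetical. The shifts are drawn in mirrored pairs so that they cancel out, which leaves the average rank untouched while the spread becomes non-zero.

```python
import random
import statistics


def jitter_ranks(ranks, max_shift=2, seed=0):
    """Return a copy of `ranks` with zero-sum integer shifts added.

    Hypothetical illustration only: shifts are drawn in mirrored pairs
    (+s, -s), so they cancel exactly and the average rank is unchanged.
    Assumes every rank is greater than `max_shift`, so no result drops
    below position 1.
    """
    rng = random.Random(seed)
    half = [rng.randint(-max_shift, max_shift) for _ in range(len(ranks) // 2)]
    shifts = half + [-s for s in half]
    if len(ranks) % 2:  # odd number of queries: one rank stays put
        shifts.append(0)
    rng.shuffle(shifts)
    return [r + s for r, s in zip(ranks, shifts)]


# Toy data: a site that sits at position three for all 650 queries.
fixed = [3] * 650
jittered = jitter_ranks(fixed)

print(statistics.mean(fixed), statistics.mean(jittered))      # same average rank
print(statistics.pstdev(fixed), statistics.pstdev(jittered))  # spread before and after
```

A scatter-plot of the jittered ranks would no longer show a flat line at position three, even though the average visibility is exactly the same. That is the sense in which a statistical regularity of this kind is weak evidence in either direction.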
As you can see, I am quite pessimistic about the possibility of bringing the kind of argumentation presented by Senator Lee to a real conclusion. If the case against Microsoft is an indicator, I would argue that this pessimism is warranted.
I do believe that we need to concentrate much more on the principal conflicts of interest rather than actual cases of abuse that may be simply too difficult to prove. The fundamental question is really how far a search company that controls such a large portion of the global market should be allowed to be active in other markets. And, really, should a single company control the search market in the first place? Limiting the very potential for abuse is, in my view, the road that legislators and regulators should take, rather than picking a fight over technological issues that they simply cannot win in the long run.
EDIT: Google has compiled its own Guide to the Hearing. Interesting.
While riding my bike today, I listened to a very thought-provoking and enjoyable talk (LSE site / YouTube) given back in May at the LSE by Harvard law professor Gerald Frug, entitled “The Architecture of Governance”. The argument basically revolves around the actual “design” or “architecture” of governance/government structures and, more precisely, the complicated relationship between local and central governments. While this is not a talk about technology, there is much to learn concerning how to think about the design of (political) systems – mechanisms for organizing collective decision-making – beyond the petty moralizing and finger-pointing that seems to have taken hold of large parts of public debate today in much of the Western world. What I find quite intriguing is that Frug pays so much attention to the particularities of how seemingly consensual ideas (“power to the local”) can be implemented with rather different potential outcomes. In that sense, “parameter details” and fine print may have a much larger impact than one might think and it’s worthwhile to talk about them and not just the grand questions of “participation” vs. “representation”, and so on. Good fun!
After having sparked a series of revolutions mostly on its own – socioeconomics is a thing of the 20th century anyway – Twitter is looking to finally make some money off that society-changing prowess. One of the steps in that direction is the set of new regulations for developers, or rather, the new regulations for those who want to develop a Twitter app but are no longer welcome to do so. As this Ars Technica piece describes, apps that provide features similar to those of Twitter’s own applications are no longer allowed; existing programs will be allowed to linger on, but new ones will be blocked. Ars cites a mail by developer Steve Streza on the twitter-dev mailing-list, here in full:
Twitter continues to make hostile and aggressive moves to alienate the third-party developers who helped make it the platform it is now. Today it’s third party Twitter clients. Tomorrow it’ll be URL shorteners and image/video hosts. Next it’ll be analytics and ads and who knows what else. Maybe you guys should spend some time improving the core of the service (uptime, reliability, bug fixes, etc.) rather than ingressing on the work of the thousands of developers who made Twitter an exciting place to be.
The story itself is not new. APIs are a great way for a company to experiment with new features and ideas without having to take any major risks themselves. Google led the way with Google Maps, slowly adding features to its service that had been pioneered by third party developers and deemed viable by users. Legally, there is not much to do about these practices (if they want to, companies can simply close down their web services, too) and it’s quite understandable that Twitter wants to control a value chain that promises to be quite profitable in the end. But for users and developers the reliance on private companies and closed systems is a big risk indeed. I’ve been working on a research project using Twitter data for over a year and while everything seems to be OK for the moment, what if our team suddenly gets locked out? Hundreds of hours down the drain?
When using proprietary services, you should be prepared for such things to happen but when I look at the role Twitter played in recent events in North Africa and the Middle East – it was a major conduit after all – and I think about that one company’s (well, there’s Facebook, too) ability to simply close the pipes, I can’t help but feel worried. While the Internet was presented as a herald of decentralization, its global span has actually allowed for a concentration and system lock-in that is quite unique in the history of communication.
I think I’m just going to stick to email after all…
The use of computers in the humanities has a long and fine history. What is striking though is how lucidly scholars reflected on their tools even in the earliest days. Here’s a beautiful citation by Irwin C. Lieb from a text published in the inaugural issue of Computers in the Humanities, a journal started in 1966.
The great advances which have so far been made with computers have been in those fields where we find countable items or have ready substitutes for them. The real or seeming extraneousness of computer studies for the humanities is owed to the fact that, in the humanities, what are most important are, if items at all, items that we can’t count, or can count only most artificially. We know, for example, how little definite we mean in saying that we have two or three ideas, that there are four themes in a play, or that there were this or that number of historical events. Our “counting” is not the counting of items that were somehow there separate, waiting to be pointed out; it is a “counting” in which judgments themselves mark out what come to be the items that we count. Apart from the judgments, there are no separate items. Therefore, no technique of counting such items so as to yield, for the first time, a judgment or a summary is possible at all. But, granting that this sort of limitation is inescapable, computers could, it seems, still come to have a more vital use in the humanities than we have seen so far.
The suggestion, then, is that some of the simplest but most important work to be done in deepening the usefulness of computers for the humanities will be in imagining those schemas by which we will model what we know cannot be modeled undistortedly: — ideas, themes, events and even more importantly, insights, appraisals, and appreciations. There are, there must be, revealing models for all of these. And as we think of them, and then use them in the humanities, the achievement for us will come as we feel out just what the distortions are, as we make the right mistakes. For as we see them as mistakes, we will penetrate further and still more appreciate what we are most concerned to understand. With the possibilities for computer studies of depth and importance in the humanities seeming still so genuine, it would be a mistake, I think, to curtail our exploration of them soon.
Some debates are just so much older than our short forgetful minds allow us to recognize. In 1965 Jacques Barzun (still alive today at a biblical 102!) made the following statement:
What have the humanities been doing for thirty-five years except to do exactly what a computer would do, only with their own unaided card indexes and fountain pens? They have taken apart poetry, they have taken apart novels, they have counted images, they have followed symbols that are sometimes non-existent, they have destroyed their own subject matter by a pseudo-computer-like approach, and now they have only themselves to blame if they have to learn the tricks and the jargon of computerizing. (Jacques Barzun at a conference at Yale University, cited in Taviss (ed.), The Computer Impact, 1970, p.199)
While I have not found the original document of Barzun’s talk, Bowler (ed.), Computers in Humanistic Research, 1967, p.232 has a summary of his three main points of critique:
First is the assumption of a false relation between the units defined and written and the reality they are supposed to represent. For example, 20 years ago, someone attempted to study genius by selecting names from Who’s Who in America, as being indicative of the quality of genius. Second is the fallacy of assessing importance by weight or numbers. The speaker mentioned a published census, again some 20 years ago, which indicated that the number of brownstone or frame houses in New York was much larger than the number of skyscrapers, giving the erroneous impression that the former represented the city’s characteristic architectural form. The third error is the attribution of meaning based upon only a partial study of the object in question. Two conspicuous examples of the faulty attribution of meaning to partial signs are the cases of machine translation and the objective tests given to school children and the people in business.
Would it be very hard to find contemporary examples that fit these three points?