When it comes to digital methods, one of the basic conundrums one encounters is the ambivalence between platform and practice. To phrase it in basic terms: are outcomes genuine human practice or simply artifacts of the platform’s affordances? There are different ways to approach this problem conceptually and I would go as far as saying that it is a false problem, since I do not think that there is something like unmediated human practice in the first place. The fact remains, however, that we may want to focus on one or the other for various reasons. My own interests lie squarely in understanding the technical dimension and this post introduces an approach to studying the algorithms at work in social media platforms with the help of digital methods.
While a number of scholars have recently been engaged in attempts to reverse engineer relevant algorithms, the objects I am interested in are clearly too complex and dynamic to reproduce the decision mechanisms involved – which, in any case, are probably in constant movement due to machine learning components being part of the larger procedure. My goal is actually more basic and the approach I want to present is largely descriptive in the sense that it does little more than propose a way to talk about the outcomes of algorithmic work, in this case of ranking mechanisms. By “talk about”, I first mean graphically and quantitatively, but the goal, in fact, is quite qualitative. While I have real sympathies for the desire to describe artifacts considered to be the apogee of exactness in exact terms, I think that we need to explore other directions as well. In any case, we constantly examine and analyze phenomena in ways that do not require formal descriptions. We can study the NY Times’ editorial decisions – which involve a lot of ranking and appreciation of value – in ways that do not include building a formal decision model and still make interesting observations. Maybe it is time to see how methods for describing social phenomena can be used to describe formal mechanisms and not the other way round. What I have in mind does not go very far in this direction, but it embraces description as its methodology.
To make this idea more concrete, I take YouTube (YT) as my example and focus on YT’s search ranking. When looking for the keyword [syria], for example, YT returns an ordered list of videos. How can we talk about the produced rankings here? One way would be to look into the factors YT itself communicates as relevant or turn to SEO blogs to gather attempts to identify the central variables. This is certainly interesting, but we could also just look at the results themselves. Using the YouTube Data Tools (YTDT), I have been collecting daily rankings for a number of keywords over the last few months, [syria] being one of them. This file contains the data for five days. The rows are videos ordered by result rank and there is also a viewcount for each video. The file looks like this:
A very basic way to start making sense of these results is to visualize them. To help with this, I built a small tool, RankFlow, which is explicitly designed for analyzing rankings over time. Here is a screenshot of a visualization of the data:
Every column is a day of videos and each column is ordered by result rank. The height of each block encodes the viewcount variable as a logarithm (to compress the vast differences in viewcount) while colors (from blue to red) indicate the unprocessed viewcount. The video with the highest viewcount actually only appears at rank 15 on the fifth day. What can we learn from such a basic visualization? First, absolute viewcount is obviously not the main ranking criterion. Second, rankings change quite a lot; between the second and the third day, for example, seven videos fall out of the top 15 and the video that comes in first on day three is again gone on day five. Third, there are a number of videos in the top ranks that have surprisingly low viewcounts. What I take from this case – and others I have looked at – is that YT probably uses a predictive ranking model that calculates something like a “chance to find an audience” metric (e.g. based on channels’ previous videos), places the video in the rankings, and – if it does not catch on – removes it again quite quickly (the top video on the first day is a good example of a video that does catch on). This is in stark contrast to the “authoritative” rankings on Google Search that change much less frequently and tend towards something like a stable consensus. On YT, the ranking mechanism seems to “care” much more about quick turnover, newness, and serendipity. Looking at a simple RankFlow can give us a pretty good idea of what is happening with a specific query and looking at a number of them can lead us to a more general assessment of output dynamics.
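The kind of turnover read off the visualization above (how many videos leave the top ranks from one day to the next) can also be counted directly. The following is only a minimal sketch: the function name and the video IDs are hypothetical placeholders, not data from the [syria] file, and each day is assumed to be a plain list of video IDs ordered by result rank.

```python
def top_n_turnover(day_a, day_b, n=15):
    """Compare the top n ranks of two days.

    day_a, day_b: lists of video IDs ordered by result rank (best first).
    Returns (dropped, entered): IDs that left the top n and IDs that are new in it.
    """
    top_a, top_b = day_a[:n], day_b[:n]
    set_a, set_b = set(top_a), set(top_b)
    dropped = [vid for vid in top_a if vid not in set_b]
    entered = [vid for vid in top_b if vid not in set_a]
    return dropped, entered


# Hypothetical rankings for two consecutive days (placeholder IDs).
day_two = ["a", "b", "c", "d", "e"]
day_three = ["f", "a", "g", "c", "h"]

dropped, entered = top_n_turnover(day_two, day_three, n=5)
print(dropped)  # ['b', 'd', 'e']
print(entered)  # ['f', 'g', 'h']
```

A count like this only registers presence or absence; it says nothing about videos that stay in the list but change rank.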
A second approach to describing ranking follows a direction that uses an algorithm to talk about another algorithm’s output. The problem with the above visualization is that it quickly gets very complicated to read and summarize when we start adding columns. But information scientists have been working on ways to produce quantitative measures to describe changes in rankings. At the bottom of the above visualization, you can see a number that tries to measure the change between each pair of consecutive days. There are many such measures available, but the one I found most intriguing came from a 2010 paper by William Webber, Alistair Moffat, and Justin Zobel. This was the one metric I found that would a) work with ranked lists where elements are not necessarily the same for each list (i.e. a video present on one day is no longer there on the next day), b) take into account changes in rank, not just presence or absence of an element, and c) attribute more value to changes at the top of the list than changes happening at the bottom. Rank-Biased Overlap (and its metrical form, Rank-Biased Distance) does just that. The RBD value between two days thus interprets changes in rank in a particular way and it condenses its interpretation into a single value. The higher the value, the more change. This is, of course, a reductionist gesture, but if we understand how the metric reduces, it can be extremely helpful to make sense of the “changiness” of rankings in a context where we have a lot of data. The algorithm (equation 32 in the paper, the “calc_rbo” function in my implementation) is not simple, but if you take some time to compare the visualization to the RBD values, you can get a basic feel for how it reacts to changes in rankings. This opens the door to more “macro” appreciations of changes in ranking and, interestingly, to comparison between platforms. A high average RBD value would indicate a tendency to fluctuate, a low value a preference for stability.
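To give a feel for how such a measure reacts, here is a minimal sketch of Rank-Biased Overlap and the distance derived from it. It is not the “calc_rbo” function mentioned above and not equation 32 itself; it implements what I understand to be the simpler extrapolated form for two lists of equal length, with RBD taken as 1 minus RBO. The function names, the default p = 0.9, and the example lists are assumptions made for illustration.

```python
def rbo_ext(s, t, p=0.9):
    """Extrapolated Rank-Biased Overlap for two equal-length rankings.

    s, t: lists of item IDs ordered by rank, without duplicates.
    p: persistence parameter in (0, 1); smaller values weight the top more.
    """
    if len(s) != len(t) or not s:
        raise ValueError("this sketch expects two non-empty lists of equal length")
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    k = len(s)
    weighted_sum = 0.0
    overlap = 0
    for d in range(1, k + 1):
        # Size of the intersection of the two prefixes of depth d.
        overlap = len(set(s[:d]) & set(t[:d]))
        # Agreement at depth d (overlap / d), discounted by p^d.
        weighted_sum += (overlap / d) * p ** d
    # Observed part plus extrapolation of the agreement at depth k.
    return (overlap / k) * p ** k + ((1 - p) / p) * weighted_sum


def rbd(s, t, p=0.9):
    """Rank-Biased Distance: 0 for identical rankings, 1 for disjoint ones."""
    return 1.0 - rbo_ext(s, t, p)


base = ["a", "b", "c", "d", "e"]
swap_top = ["b", "a", "c", "d", "e"]     # first two ranks swapped
swap_bottom = ["a", "b", "c", "e", "d"]  # last two ranks swapped

# A swap at the top counts for more than the same swap at the bottom.
print(rbd(base, swap_top) > rbd(base, swap_bottom))  # True
```

The parameter p controls how steeply the weights fall off with rank, so RBD values are only comparable across days or platforms when they are computed with the same p and the same list depth.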
Neither of these examples allows us to reverse engineer the actual algorithm(s) in question, but we need to get comfortable with the idea that this is not going to be an option in most cases anyway. Systematic description, however, still allows us to say something about the structure and dynamics of outputs and gives us an idea of the character or temperament of a ranking mechanism, for example. This post is just a starting point that I hope to turn into something more substantial in the future, but I hope it shows how relatively simple techniques can be employed to make potentially interesting findings.
After about two years of thinking and coding, my colleague Erik Borra and I are happy to announce that the Digital Methods Initiative Twitter Capture and Analysis Toolkit (DMI-TCAT) is finally available for download. DMI-TCAT runs in a LAMP environment, allows for capturing data in a number of different ways via both the streaming and search APIs, and provides a whole battery of analytical approaches to investigating tweet collections. For a more detailed description, check out the wiki on GitHub. There is also a paper (paywall, preprint will follow) that details the tool and the thinking behind it.
“Of course, in the study of such complicated phenomena as occur in biology and sociology, the mathematical method cannot play the same role as, let us say, in physics. In all cases, but especially where the phenomena are most complicated, we must bear in mind, if we are not to lose our way in meaningless play with formulas, that the application of mathematics is significant only if the concrete phenomena have already been made the subject of a profound theory.”
A. D. Aleksandrov, A General View of Mathematics. In: A. D. Aleksandrov, A. N. Kolmogorov, M. A. Lavrent’ev, Mathematics: Its Content, Methods and Meaning. Moscow 1956 (trans. 1964)
Over the last couple of weeks, things have heated up considerably for Google – on the mobile side with the start of a patent war, but also in the search area, the core of the company’s business. Led by Senator Mike Lee (a Utah Republican), the US Senate’s Antitrust Subcommittee has started to probe into certain aspects of Google’s ranking mechanisms and potential cases of abuse and manipulation.
In a hearing on Wednesday, Lee confronted Eric Schmidt with accusations of tampering with results and the evidence the Senator presented was in fact very interesting because it raises the question of how to show or even prove that a highly complex algorithmic procedure “has been tampered with”. As you can see in this video, Lee presented a scatter-plot from an “independent study” that compares the search ranking for three price comparison sites (Nextag, Pricegrabber, and Shopper) with Google Price Search using 650 shopping related queries. What we can see on the graph is that while there is considerable variation in ranking for the competitors (a site shows up first for one query and way down for another), Google’s site seems to consistently stick to place three. Lee makes this astounding difference the core of his argument and directly asks Schmidt: “These results are in fact the result of the same algorithm as the rankings for the other comparison sites?” The answer is interesting in itself as Schmidt argues that Google’s service is not a product comparison site but a “product site” and that the study basically compares apples to oranges (“they are different animals”). Lee then homes in on the “uncanny” statistical regularity and says “I don’t know whether you call this a separate algorithm or whether you’ve reverse engineered a single algorithm, but either way, you’ve cooked it!” to which Schmidt replies “I can assure you that we haven’t cooked anything.”
According to this LA Times article, Schmidt’s testimony did not satisfy the senators and there’s open talk about bias and conflict of interest. I would like to add three things here:
1) The debate shows a real mismatch between 20th century concepts of both bias and technology and the 21st century challenge to both of these concepts that comes in the form of Google. For the senator, bias is something very blatant and obvious, a malicious individual going to the server room at night, tampering with the machinery, transforming the pure technological objectivity into travesty by inserting a line of code that puts Google in third place most of the time. The problem with this view is of course that it makes a clear and strong distinction between a “biased” and an “unbiased” algorithm and clearly misses the point that every ranking procedure implies a bias. If Schmidt says “We haven’t cooked anything!”, who has written the algorithm? If it comes to an audit of Google’s code, I am certain that no “smoking gun” in the form of a primitive and obvious “manipulation” will be found. If Google wants to favor its own services, there are much more subtle and efficient ways to do so – the company does have the best SEO team one could possibly imagine after all. There is simply no need to “cook” anything if you are the one who specifies the features of the algorithm.
2) The research method applied in the mentioned study, however, is really quite interesting and I am curious to see how far the Senate committee will be able to take the argument. The statistical regularity shown is certainly astounding and if the hearings attain a deeper level of technological expertise, Google may be forced to detail a significant portion of its ranking procedures to show how something like this can happen. It would, of course, be extremely simple to break the pattern by introducing some random element that does not affect the average rank but adds variation. That’s also the reason why I think that Lee’s argument will ultimately fizzle.
3) The core of the problem, I would argue, is not so much the question of manipulation but the fact that by branching into more and more commercial areas, Google finds itself in a market configuration where conflicts of interest are popping up everywhere it turns. As both a search business and an actor on many of the markets that are, at least in part, ordered by the visibility layering in search results, Google faces a fundamental and structural problem that cannot be solved by any kind of imagined technical neutrality. Even if there is no “in house SEO” going on, the mere fact that Google search prominently links to other company services could already be seen as problematic. In a sense, Senator Lee’s argument actually creates a potentially useful “way out”: if there is no evil line of code written in the dark of night, no “smoking gun”, then everything is fine. The systematic conflict of interest persists, however, and I do not believe that more subtle forms of bias towards Google services could be proven or even be seriously debated in a court of law. This level of technicality, I would argue, is no longer (fully) in reach for this kind of causal demonstration. Not so much because of the complexity of the algorithms, but rather because the “state” of the machine includes the full structure of the dataset it is working on, which means the full index in this case. To understand what Google’s algorithms actually do, looking at these algorithms without the data is no longer enough. And the data is big. Very big.
As you can see, I am quite pessimistic about the possibility of bringing the kind of argumentation presented by Senator Lee to a real conclusion. If the case against Microsoft is an indicator, I would argue that this pessimism is warranted.
I do believe that we need to concentrate much more on the principal conflicts of interest rather than actual cases of abuse that may be simply too difficult to prove. The fundamental question is really how far a search company that controls such a large portion of the global market should be allowed to be active in other markets. And, really, should a single company control the search market in the first place? Limiting the very potential for abuse is, in my view, the road that legislators and regulators should take, rather than picking a fight over technological issues that they simply cannot win in the long run.
EDIT: Google has compiled its own Guide to the Hearing. Interesting.