Yearly Archives: 2010

If we want to understand the plethora of very specific roles computers play in today’s world, the question “What is software?” is inevitable. Many different answers have been articulated from different viewpoints and positions – creator, user, enterprise, etc. – in the networks of practices that surround digital objects. From a scholarly perspective, the question is often tied to another one, “Where does software come from?”, and is connected to a history of mathematical thought and the will/pressure/need to mechanize calculation. There we learn, for example, that the term “algorithm” is derived from the name of the Persian mathematician al-Khwārizmī and that mathematical textbooks from the Middle Ages used the term “algorism” to denote the basic arithmetic techniques – the ones we now learn in grammar school – that break down e.g. the multiplication of large numbers into a series of smaller operations. We learn first about Pascal, Babbage, and Lady Lovelace and then about Hilbert, Gödel, and Turing, about the calculation of projectile trajectories, about cryptography, the halting problem, and the lambda calculus. The heroic history of bold pioneers driven by an uncompromising vision continues into the PC (Engelbart, Kay, the Steves, etc.) and network (Engelbart again, Cerf, Berners-Lee, etc.) eras. These trajectories of successive invention (mixed with a sometimes exaggerated emphasis on elements from the arsenal of “identity politics”, counter-culture, hacker ethos, etc.) are an integral part of answering our twin questions, but they are not enough.

A second strand of inquiry has developed in the slipstream of the monumental work by economic historian Alfred D. Chandler Jr. (The Visible Hand), who placed the birth of computers and software in the flux of larger developments like industrialization (and particularly the emergence of the large-scale enterprise in the late 19th century), bureaucratization, (systems) management, and the general history of modern capitalism. The books by James Beniger (The Control Revolution), JoAnne Yates (Control through Communication and, more recently, Structuring the Information Age), James W. Cortada (most notably The Digital Hand in three volumes), and others deepened the economic perspective, while Paul N. Edwards’ The Closed World and Jon Agar’s The Government Machine look more closely at the entanglements between computers and government (bureaucracy). While these works supply a much-needed corrective to the heroic accounts mentioned above, they rarely go beyond the 1960s and do not aim at understanding the specifics of computer technology and software beyond their capacity to increase efficiency and control in information-rich settings (I have not yet read Martin Campbell-Kelly’s From Airline Reservations to Sonic the Hedgehog; the title is a downer but I’m really curious about the book).

Lev Manovich’s The Language of New Media is perhaps the most visible work of a third “school”, where computers (equipped with GUIs) are seen as media born from cinema and other analogue technologies of representation (remember Computers as Theatre?). Clustering around an illustrious theoretical neighborhood populated by McLuhan, Metz, Barthes, and many others, these works used to dominate the “XY studies” landscape of the 90s and early 00s before all the excitement went to Web 2.0, participation, amateur culture, and so on. This last group could be seen as a fourth strand, but people like Clay Shirky and Yochai Benkler focus so strongly on discontinuity that the question of historical filiation is simply not relevant to their intellectual project. History is there to be baffled by both present and future.

This list could go on, but I do not want to simply inventory work on computers and software; I want to make the following point: there is a pronounced difference between the questions “What is software?” and “What is today’s software?”. While the first one is relevant to computational theory, software engineering, analytical philosophy, and (curiously) cognitive science, there is no direct line from universal Turing machines to our particular landscape with the millions of specific programs written every year. Digital technology is so ubiquitous that the history of computing is caught up with nearly every aspect of the development of western societies over the last 150 years. Bureaucratization, mass communication, globalization, artistic avant-garde movements, transformations in the organization of labor, expert movements in public administrations, big science, library classifications, the emergence of statistics, minority struggles, two world wars and too many smaller conflicts to count, accounting procedures, stock markets and the financial crisis, politics from fascism to participatory democracy… – all of these elements can be examined in connection with computing, shaping the tools and being shaped by them in return. I am starting to believe that for the humanities scholar or the social scientist the question “What is software?” is only slightly less daunting than “What is culture?” or “What is society?”. One thing seems sure: we can no longer pretend to answer the latter two questions without bumping into the first one. The problem for the author, then, becomes to choose the relevant strands, to untangle the mess.

In my view, there is a case to be made for a closer look at the role the library and information sciences played in the development of contemporary software techniques, most obviously on the Internet, but not exclusively. While Bush’s Memex has perhaps been commented on somewhat beyond its actual relevance, the work done from the 1950s on by people such as Eugene Garfield (citation analysis), Calvin M. Mooers (information retrieval), Hans Peter Luhn (KWIC), Edgar F. Codd (the relational database), or Gerard Salton (the vector space model) has received little attention outside of specialist circles – despite the fact that our current ways of working with information (yes, this includes your Facebook profile, everything Google is doing, cloud computing, mobile applications, and all the other cool stuff Wired writes about) left the logic of the library catalog behind quite some time ago. This is also where today’s software comes from.

Books are great and I just finished another one that I wish I had read years ago: Alfred D. Chandler Jr. and James W. Cortada: A Nation Transformed by Information. How Information Has Shaped the United States from Colonial Times to the Present. Oxford: Oxford University Press, 2000 (Google Books). The fact that the leading historians of business (Chandler) and computing (Cortada) edited a book together is a setup for great things, and the book does not disappoint. Their concluding chapter proved particularly stimulating, especially the couple of pages in a section called “The Case for Software” (p. 290f). Here, the authors argue that while there have been many continuities in the development of IT over the last two centuries, software represents a major discontinuity because of 1) what it is, 2) how it came into the economy, and 3) how it was sold. The authors present quite a list of arguments, but two stand out:

First, software is diagnosed as being the “least capital-intensive and most knowledge-intensive of all information technologies to emerge” (p.290), which led to low barriers to market entry and immense opportunities for start-ups. Second, the fact that IBM chose to market the IBM PC as an open hardware platform and that Windows’ dominance provided a standardized platform for application development created a gigantic market where even niche products could find a considerable audience. For Chandler and Cortada, the “story” of software is not so much the epic battle between operating systems that we love to dwell on but the development of applications. Their story goes like this:

Although software development is very much a knowledge business, the personal commitment required to learn enough to write software is far less than is needed by a computer scientist who is developing either hardware or the next generation of computer chips. The teenager or college student who writes software and ultimately finds a distributor has far less training in the field than the engineer working on Intel’s future product line. Yet both arrive at the same point: they create a marketable product. Thus, in economic terms, software so far has required less intellectual capital, hence offering fewer knowledge barriers to new entrants. Will that change? Perhaps, but what occurred in the 1980s and 1990s is that the barriers to entry remained far lower than for any previous form of information technology and products. (p.296f)

So far so good. This account has been echoed repeatedly (my colleague Mirko Schäfer and I have been amongst the many), but Chandler and Cortada weave a pretty dense and economically sound argument. What is interesting, though, is the historical backdrop against which the emergence of software unfolds:

In the electronic-based industries on which the Information Age rests, opportunities for individual entrepreneurs to build long-term competitive enterprises also came primarily with the introduction of a new technology. But these opportunities only occurred three times. The first was in the early 1920s with the coming of broadcasting. The second opportunity occurred in the late 1960s and early 1970s, after the introduction of IBM’s System 360 and Digital’s PDP series greatly expanded computerized data processing for commercial activities. The third took place in the first half of the 1980s with the sudden and unexpected coming of the multi-billion-dollar microcomputer industry. Since the mid-1980s opportunities for entrepreneurial start-ups in hardware arose primarily in the production of specialized niche products or for providers of supplies and services to the large established core companies. So if history is any guide, a small number of large complex enterprises, particularly those experienced in building systems, will continue to lead in commercializing the hardware for today’s Information Age.

For the authors, software is different, for the reasons given above. Now, what seems to have happened over the last three years is something that is incrementally bringing software back to the “normal” course of history: the app store and the cloud. To provide a cloud-based service, a little coding skill is obviously no longer enough – building a datacenter is not that easy, and even cloud hosting services that scale well do not eliminate the need for handling the software logistics of a large user base and huge amounts of data. Mastering synchronization between cloud and client, handling different versions of data points, providing clients for various (mobile) platforms, etc. requires pretty serious skills and a team of experts. In short, the cloud makes software service development much more capital-intensive (Chandler and Cortada’s first argument) and quickly raises barriers to market entry. Just look at how many billions Microsoft dumped into search technology for some scraps of the market.

The app store story is a little more complex because – just like Windows – the iPhone SDK and store combo has created a market that is standardized and quite large, affording a new business model for many a developer. But with all the technical limitations (since I got an Android phone, the lack of a common file storage area on the iPad just feels very, very weird) and the filtering, I would argue that the logic of the app store (at least in its Apple version, but Google also has its kill switch) is halfway between the classic logic of operating systems and the television market, where independent studios and production companies sell content to the all-powerful networks. The independent journalist who sells copy to newspapers and magazines also comes to mind.

While the software market – despite the long-standing existence of software giants – continues to be a pretty diverse playing field, the process of commodification of software via the cloud and the app store may very well be a step away from software as usual, a kind of historic “normalization” to a situation where a limited number of companies (Google, Apple, Microsoft?) dominate or shape a large portion of the market for software.

…but you can go ahead and waste everybody else – according to Google’s suggest function at least:

This is interesting because it is very obvious that Google erases certain queries from their suggest function (porn, etc.), and the idea that the Internet would “make” suggestible teenagers kill themselves is a recurring and media-fed scare that, as a consequence, is one of the few domains where censoring is nearly consensual. What I find interesting, though, is that all these other carnage scenarios do not get the DELETE FROM treatment, although one may argue that killing oneself is no more condemnable than killing somebody else.

But independently of this philosophical question (the only one worth pondering, according to Camus, remember?), Google suggest is yet another way to query the closest thing to god there is: Google’s database – and to see how certain queries are removed, most certainly by hand (BTW, “je veux me” on google.fr DOES suggest that you may want to end your life)…

Ars Technica is one of the reasons why I believe that there is a future for quality journalism online. Not only because they produce great copy but also because it is one of the few places on the Internet where I don’t want to start maiming myself when I accidentally stumble over the article comments. Ars talks about technology, sure, but there is more and more content on science and really great, well-researched pieces on wedge topics (“wedgy” mostly in the US, but spreading) like climate change and evolution. In this article on the basic conceptual differences between studying weather and climate, I came across a comment that I would like to (and probably will) frame and hang on my wall. User Andrei Juan writes:

Regarding the author’s remarks made in the first few paragraphs of the article about comments and commenters, it seems to me that the number of people who post comments to online articles is (perhaps to a lesser extent here on ArsTechnica) usually much larger than the number of people whose education — formal or not — allows them to understand the article well, let alone make meaningful comments.

This is, I think, but one manifestation of many people’s tendency to express themselves in many more situations than when they have something to express. Turned into habit, this leads to confusions like the one discussed by the article, which are IMO a natural outcome of situations in which people who barely passed their high school math and physics tests develop their own opinions (or parrot those of their peers) about topics like dynamic systems. Moreover, put this together with the openness of an online “debate” — which lures people into feeling welcome to discussions where they’re utterly out of their depth yet don’t realize it — and another interesting specimen appears: the person who’s opinionated without really having an opinion.

On soccer fields, we hear these people blowing in vuvuzelas; in the comment sections of online articles though, that option is unavailable, so they’re only left with (ab)using the “Leave a comment” option. Could we, perhaps, eliminate most meaningless comments by adding a button labeled “Blow a vuvuzela” next to the one that says “Leave a comment”?…

In that sense, the highly disturbing “like” and “retweet” buttons one can find on so many sites now may actually have the virtue of preventing some people from posting a comment. Not as sophisticated as Slashdot‘s karma-based moderation system, but potentially effective…

Gabriel Tarde is a wellspring of interesting – and sometimes positively weird – ideas. In his 1899 article L’opinion et la conversation (reprinted in his 1901 book L’opinion et la foule), the French judge/sociologist makes the following comment:

Il n’y [dans un Etat féodal, BR] avait pas “l’opinion”, mais des milliers d’opinions séparées, sans nul lien continuel entre elles. Ce lien, le livre d’abord, le journal ensuite et avec bien plus d’efficacité, l’ont seuls fourni. La presse périodique a permis de former un agrégat secondaire et très supérieur dont les unités s’associent étroitement sans s’être jamais vues ni connues. De là, des différences importantes, et, entre autres, celles-ci : dans les groupes primaires [des groupes locaux basés sur la conversation, BR], les voix ponderantur plutôt que numerantur, tandis que, dans le groupe secondaire et beaucoup plus vaste, où l’on se tient sans se voir, à l’aveugle, les voix ne peuvent être que comptées et non pesées. La presse, à son insu, a donc travaillé à créer la puissance du nombre et à amoindrir celle du caractère, sinon de l’intelligence.

After a quick survey, I haven’t found an English translation anywhere – there might be one in here – so here’s my own (taking some liberties to make it easier to read):

[In a feudal state, BR] there was no “opinion” but thousands of separate opinions, without any steady connection between them. This connection was only delivered by first the book, then, and with greater efficiency, the newspaper. The periodical press allowed for the formation of a secondary and higher-order aggregate whose units associate closely without ever having seen or known each other. Several important differences follow from this, amongst others, this one: in primary groups [local groups based on conversation, BR], voices ponderantur rather than numerantur, while in the secondary and much larger group, where people connect without seeing each other – blind – voices can only be counted and cannot be weighed. The press has thus unknowingly labored towards giving rise to the power of the number and reducing the power of character, if not of intelligence.

Two things are interesting here: first, Lazarsfeld, Berelson, and Gaudet’s classic study from 1944, The People’s Choice, and even more so Lazarsfeld’s canonical Personal Influence (with Elihu Katz, 1955) are seen as a rehabilitation of the significance (for the formation of opinion) of interpersonal communication at a time when media were considered all-powerful brainwashing machines by theorists such as Adorno and Horkheimer (Adorno actually worked with/for Lazarsfeld in the 1930s, when Lazarsfeld tried to force poor Adorno into “measuring culture”, which may have soured the latter on empirical inquiry for good, but that’s a story for another time). Tarde’s work on conversation (the first-order medium) is theoretically quite sophisticated – floating against the backdrop of Tarde’s theory of imitation as the basic mechanism of cultural production – and actually succeeds in thinking together everyday conversation and mass media without creating any kind of onerous dichotomy. L’opinion et la conversation would merit inclusion in any history of communication science, and it should come as no surprise that Elihu Katz actually published a paper on Tarde in 1999.

Second, the difference between ponderantur (weighing) and numerantur (counting) is at the same time rather self-evident – an object’s weight and its number are logically quite different things – and somewhat puzzling: it reminds us that while measurement does indeed create a universe of number where every variable can be compared to any other, the aspects of reality we choose to measure remain connected to a conceptual backdrop that is by itself neither numerical nor mathematical. What Tarde calls “character” is a person’s capacity to influence, to entice imitation, not the size of her social network.

I’m currently working on a software tool for studying Twitter and, while sifting through the literature, I came across this quotation from a 2010 paper by Cha et al.:

We describe how we collected the Twitter data and present the characteristics of the top users based on three influence measures: indegree, retweets, and mentions.

Besides the immense problem of defining influence in non-trivial terms, I wonder whether many of the studies on (social) networks that pop up all over the place are hoping to weigh but end up counting again. What would it mean, then, to weigh a person’s influence? What kind of concepts would we have to develop, and what could be indicators? In our project we use the bit.ly API to look at clickstream referrers – if several people post the same link, who succeeds in getting the most people to click it? – but this may be yet another count that says little or nothing about how a link will be used/read/received by a person. Perhaps this is as far as the “hard” data can take us. But is that really a problem? The one thing I love about Tarde is how he can jump from a quantitative worldview to beautiful theoretical speculation and back with a smile on his face…
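To make the point concrete, here is a toy version of that kind of counting – made-up numbers and field names, nothing to do with our actual bit.ly-based pipeline:

from collections import defaultdict

# Hypothetical records: who posted a given link and how many clicks
# their copy attracted (entirely made-up data).
posts = [
    {"poster": "alice", "link": "http://bit.ly/abc", "clicks": 120},
    {"poster": "bob",   "link": "http://bit.ly/abc", "clicks": 45},
    {"poster": "carol", "link": "http://bit.ly/abc", "clicks": 44},
]

clicks = defaultdict(int)
for post in posts:
    clicks[post["poster"]] += post["clicks"]

# Ranking posters by the clicks they attracted: this is counting, not
# weighing -- it says nothing about how the link was read or received.
for poster, n in sorted(clicks.items(), key=lambda kv: kv[1], reverse=True):
    print(poster, n)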

Over the last year, I have been reading loads of books in and on information science, paying special attention to key texts in the (pre)history of the discipline. Fritz Machlup and Una Mansfield’s monumental anthology The Study of Information (Wiley & Sons, 1983) has been a pleasure to read, and there are several passages in the foreword that merit a little commentary. I have always wondered why Shannon’s Mathematical Theory of Communication from 1948 has been such a reference point in the discipline I started out in, communication science. Dealing with purely technological problems and packed with formulas that very, very few social science scholars could make sense of, the whole thing seems like a misunderstanding. The simplicity and clarity of the schema on page two – which has been built into the canonical sender-receiver model – cannot be the only reason for the exceptional (mostly second- or third-hand) reception the text has enjoyed. In Machlup & Mansfield’s foreword one can find some strong words on the question of why a work on engineering problems that excludes even the slightest reference to matters of human understanding came to be cited in probably every single introduction to communication science:

“When scholars were chiefly interested in cognitive information, why did they accept a supposedly scientific definition of ‘information apart from meaning’? One possible explanation is the fact that they were impressed by a definition that provided for measurement. To be sure, measurement was needed for the engineering purposes at hand; but how could anybody believe that Shannon’s formula would also measure information in the sense of what one person tells another by word of mouth, in writing, or in print?
We suspect that the failure to find, and perhaps impossibility of finding, any ways of measuring information in this ordinary sense has induced many to accept measurable signal transmission, channel capacity, or selection rate, misnamed amount of information, as a substitute or proxy for information. The impressive slogan, coined by Lord Kelvin, that ‘science is measurement’ has persuaded many researchers who were anxious to qualify as scientists to start measuring things that cannot be measured. As if under a compulsion, they looked for an operational definition of some aspect of communication or information that stipulated quantifiable operations. Shannon’s formula did exactly that; here was something related to information that was objectively measurable. Many users of the definition were smart enough to realize that the proposed measure – perfectly suited for electrical engineering and telecommunication – did not really fit their purposes; but the compulsion to measure was stronger than their courage to admit that they were not operating sensibly.” (p. 52)
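The “formula” in question, for reference, is Shannon’s measure of the average information produced by a source, defined over nothing but signal probabilities:

H = -\sum_{i=1}^{n} p_i \log_2 p_i

Measured in bits, H depends only on the probabilities p_i of the symbols a source emits – their statistical rarity, not their meaning – which is exactly the mismatch the foreword points to.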

For Machlup & Mansfield – who, as trained (neoclassical) economists, should not be deemed closet postmodernists – this compulsion to measure is connected to implicit hierarchies in academia where mathematical rationality reigns supreme. A couple of pages further on, the authors’ judgment becomes particularly harsh:

“This extension of information theory, as developed for communication engineering, to other quite different fields has been a methodological disaster – though the overenthusiastic extenders did not see it, and some of them, who now know that it was an aberration, still believe that they have learned a great deal from it. In actual fact, the theory of signal transmission or activating impulses has little or nothing to teach that could be extended or applied to human communication, social behavior, or psychology, theoretical or experimental.” (p. 56)

Shannon himself avoided the term “information theory” and his conception of communication obviously had nothing to do with what the term has come to mean in the social sciences and general discourse. But the need to show that the social sciences could be “real” sciences in search of laws formulated in mathematical terms proved stronger than the somewhat obvious epistemological mismatch.

Like many classic texts, Machlup & Mansfield’s work offers a critique that is not based on dismissal or handbag relativism but on deep engagement with the complexities of the subject matter and long experience with interdisciplinary work, which, necessarily, makes one bump into unfamiliar concepts, methods, ontological preconceptions, modes of reasoning, vectors of explanation, and epistemological urges (what is your knowledge itch? how do you want to scratch it?). The Study of Information is a pleasure to read because it brings together very different fields without proposing some kind of unifying meta-concept or imperialist definition of what science – the quest for knowledge – should look like.

…is so much easier if you’ve got a couple of popular pages to advertise on…

[screenshot: Chrome suggest, March 2010]

…and another one…

[screenshot: Chrome suggest, April 2010]

…browser wars all over again…

When it comes to search interfaces, there are a lot of good ideas out there, but also a lot of room for further experimentation. Search APIs are a great field for this, as they allow developers to play around with advanced functionality without forcing them to build a heavy backend structure.

Together with Alex Beaugrand, a student of mine, I built (a couple of months ago) another little search mashup / interface that allows users to switch between a tag cloud view and a list / cluster mode. contextDigger uses the delicious and Bing APIs to widen the search space using associated searches / terms, and then Yahoo BOSS to download a thousand results that can be filtered through the interface. It uses the principle of faceted navigation to shorten the list: if you click on two terms, only the results associated with both of them will appear…
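The faceted principle itself fits in a few lines. A minimal sketch – hypothetical data structures, not contextDigger’s actual code – assuming each result carries the set of terms it is associated with:

# Each result is tagged with the associated terms it matched; selecting
# several facets keeps only the results associated with ALL of them.
results = [
    {"url": "http://example.org/a", "terms": {"tag cloud", "visualization"}},
    {"url": "http://example.org/b", "terms": {"visualization"}},
    {"url": "http://example.org/c", "terms": {"tag cloud", "visualization", "search"}},
]

def facet_filter(results, selected):
    """Keep only the results associated with every selected term."""
    return [r for r in results if selected <= r["terms"]]

# Clicking "tag cloud" and "visualization" narrows the list to a and c.
print(facet_filter(results, {"tag cloud", "visualization"}))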

Since I started playing around with the latest (and really great, easy-to-use) version of the gephi graph visualization and analysis platform, I have developed an obsession with building .gdf output (.gdf is a graph description format that you can open with gephi) into everything I come across. The latest addition is a Facebook application called netvizz that creates a .gdf file describing either your personal network or the groups you are a member of.

There are of course many applications that let you visualize your network directly in Facebook, but by being able to download a file, you can choose your own visualization tool, play around with it, select and parameterize layout algorithms, change colors and sizes, rearrange by hand, and so forth. Toolkits like gephi are just so much more powerful than Flash toys…
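For the curious, writing such a file is simple. A minimal sketch – made-up friend data, not netvizz’s actual code – using the standard nodedef>/edgedef> header syntax of the GDF format that gephi reads:

# Write a tiny social graph as a .gdf file gephi can open.
friends = {"u1": "Alice", "u2": "Bob", "u3": "Carol"}
friendships = [("u1", "u2"), ("u2", "u3"), ("u1", "u3")]

with open("network.gdf", "w") as f:
    f.write("nodedef>name VARCHAR,label VARCHAR\n")
    for uid, label in friends.items():
        f.write(f"{uid},{label}\n")
    f.write("edgedef>node1 VARCHAR,node2 VARCHAR\n")
    for source, target in friendships:
        f.write(f"{source},{target}\n")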

my puny facebook network - gephi can process much larger graphs

What’s rather striking about these Facebook networks is how much the shape is connected to physical and social mobility. If you look at my network, you can easily see the Klagenfurt (my hometown) cluster to the very right, my studies in Vienna in the middle, and my French universe on the left. The small grape on the top left documents two semesters of teaching at the American University of Paris…

Update: v0.2 of netvizz is out, allowing you to add some data for each profile. Next up is GraphML and Mondrian file support, more data for profiles, etc…

Update 2: netvizz currently only works with http and not https. I will try to move the app to a different server ASAP.

My colleague Theo Röhle and I went to the Computational Turn conference this week. While I would have preferred to hear a bit more on truly digital research methodology (in the fully scientific sense of the word “method”), the day was really quite interesting and the weather unexpectedly gorgeous. Most of the papers are available on the conference site, make sure to have a look. The text I wrote with Theo tried to structure some of the epistemological challenges and problems to take into account when working with digital methods. Here’s a tidbit:

…digital technology is set to change the way scholars work with their material, how they “see” it and interact with it. The question is, now, how well the humanities are prepared for these transformations. If there truly is a paradigm shift on the horizon, we will have to dig deeper into the methodological assumptions that are folded into the new tools. We will need to uncover the concepts and models that have carried over from different disciplines into the programs we employ today…