2010
08.06

Books are great and I just finished another one that I wish I had read years ago: Alfred D. Chandler Jr. and James W. Cortada: A Nation Transformed by Information. How Information Has Shaped the United States from Colonial Times to the Present. Oxford: Oxford University Press, 2000 (Google Books). The fact that the leading historians on business (Chandler) and computing (Cortada) edit a book together is a setup for great things and the book does not disappoint. Their concluding chapter proved to be particularly stimulating, especially the couple of pages in a section called “The Case for Software” (p.290f). Here, the authors argue that while there have been many continuities in the development of IT over the last two centuries, software represents a major discontinuity because of 1) what it is, 2) how it came into the economy, and 3) how it was sold. There is quite a list of arguments the authors present, but two stand out:

First, software is diagnosed as being the “least capital-intensive and most knowledge-intensive of all information technologies to emerge” (p.290), which led to low barriers to market entry and immense opportunities for start-ups. Second, the fact that IBM chose to market the IBM-PC as an open hardware platform and Windows’ dominance as a standardized platform for application development created a gigantic market where even niche products could find a considerable audience. For Chandler and Cortada, the “story” of software is not so much the epic battle between operating systems that we love to dwell on but the development of applications. Their story goes like this:

Although software development is very much a knowledge business, the personal commitment required to learn enough to write software is far less than is needed by a computer scientist who is developing either hardware or the next generation of computer chips. The teenager or college student who writes software and ultimately finds a distributor has far less training in the field than the engineer working on Intel’s future product line. Yet both arrive at the same point: they create a marketable product. Thus, in economic terms, software so far has required less intellectual capital, hence offering fewer knowledge barriers to new entrants. Will that change? Perhaps, but what occurred in the 1980s and 1990s is that the barriers to entry remained far lower than for any previous form of information technology and products. (p.296f)

So far so good. This account has been echoed repeatedly (my colleague Mirko Schäfer and I have been amongst the many) but Chandler and Cortada weave a pretty dense and economically sound argument. What is interesting though is the historical backdrop against which the emergence of software unfolds:

In the electronic-based industries on which the Information Age rests, opportunities for individual entrepreneurs to build long-term competitive enterprises also came primarily with the introduction of a new technology. But these opportunities only occurred three times. The first was in the early 1920s with the coming of broadcasting. The second opportunity occurred in the late 1960s and early 1970s, after the introduction of IBM’s System 360 and Digital’s PDP series greatly expanded computerized data processing for commercial activities. The third took place in the first half of the 1980s with the sudden and unexpected coming of the multi-billion-dollar microcomputer industry. Since the mid-1980s opportunities for entrepreneurial start-ups in hardware arose primarily in the production of specialized niche products or for providers of supplies and services to the large established core companies. So if history is any guide, a small number of large complex enterprises, particularly those experienced in building systems, will continue to lead in commercializing the hardware for today’s Information Age.

For the authors, software is different, for the reasons given above. Now, what seems to have happened over the last three years is something that is bringing software incrementally back to the “normal” course of history: the app store and the cloud. To provide a cloud-based service, a little coding skill is obviously no longer enough – building a datacenter is not that easy and even cloud hosting services that scale well do not eliminate the need for handling the software logistics of a large user base and huge amounts of data. Mastering synchronization between cloud and client, handling different versions of data points, providing clients for various (mobile) platforms, etc. requires pretty neat skills and a team of experts. In short, the cloud makes software service development much more capital-intensive (Chandler and Cortada’s first argument) and quickly raises barriers to market entry. Just look at how many billions Microsoft dumped into search technology for some scraps of the market.

The app store story is a little more complex because – just like Windows – the iPhone SDK and store combo has created a market that is standardized and quite large, affording a new business model for many a developer. But with all the technical limitations (since I got an Android phone the fact that I don’t have a common file storage area on the iPad just feels very, very weird) and the filtering, I would argue that the logic of the app store (at least in its Apple version, but Google also has its kill switch) is halfway between the classic logic of operating systems and the television market where independent studios and production companies sell content to the all-powerful networks. The independent journalist who sells copy to newspapers and magazines also comes to mind.

While the software market – despite the long-standing existence of software giants – continues to be a pretty diverse playing field, the process of commodification of software via the cloud and the app store may very well be a step away from software as usual, a kind of historic “normalization” to a situation where a limited number of companies (Google, Apple, Microsoft?) dominate or shape a large portion of the market for software.

2010
07.22

…but you can go ahead and waste everybody else – according to Google’s suggest function at least:

This is interesting because it is very obvious that Google erases certain queries in their suggest function (porn, etc.) and the idea that the Internet would “make” suggestible teenagers kill themselves is a recurring and media-fed scare that, as a consequence, is one of the few domains where censoring is near consensual. What I find interesting though is that all these other carnage scenarios do not get the DELETE FROM treatment, although one may argue that killing oneself is not more condemnable than killing somebody else.

But independently of this philosophical question (the only one worth pondering according to Camus, remember?), Google suggest is yet another way to query the closest thing to god there is: Google’s database; and the way certain queries are removed, most certainly by hand (BTW, “je veux me” on google.fr DOES suggest that you may want to end your life)…

2010
07.10

Arstechnica is one of the reasons why I believe that there is a future for quality journalism online. Not only because they produce great copy but also because it is one of the few places on the Internet where I don’t want to start maiming myself when I accidentally stumble over the article comments. Ars talks about technology, sure, but there is more and more content on science and really great, well researched pieces on wedge topics (“wedgy” mostly in the US, but spreading) like climate change and evolution. In this article on the basic conceptual differences between studying weather and climate, I stumbled over a comment that I would like to (and probably will) frame and hang on my wall. User Andrei Juan writes:

Regarding the author’s remarks made in the first few paragraphs of the article about comments and commenters, it seems to me that the number of people who post comments to online articles is (perhaps to a lesser extent here on ArsTechnica) usually much larger than the number of people whose education — formal or not — allows them to understand the article well, let alone make meaningful comments.

This is, I think, but one manifestation of many people’s tendency to express themselves in many more situations than when they have something to express. Turned into habit, this leads to confusions like the one discussed by the article, which are IMO a natural outcome of situations in which people who barely passed their high school math and physics tests develop their own opinions (or parrot those of their peers) about topics like dynamic systems. Moreover, put this together with the openness of an online “debate” — which lures people into feeling welcome to discussions where they’re utterly out of their depth yet don’t realize it — and another interesting specimen appears: the person who’s opinionated without really having an opinion.

On soccer fields, we hear these people blowing in vuvuzelas; in the comment sections of online articles though, that option is unavailable, so they’re only left with (ab)using the “Leave a comment” option. Could we, perhaps, eliminate most meaningless comments by adding a button labeled “Blow a vuvuzela” next to the one that says “Leave a comment”?…

In that sense, the highly disturbing “like” and “retweet” buttons one can find on so many sites now may actually have the benefit of preventing some people from posting a comment. Not as sophisticated as Slashdot’s karma-based moderation system but potentially effective…

2010
07.03

Gabriel Tarde is a wellspring of interesting – and sometimes positively weird – ideas. In his 1899 article L’opinion et la conversation (reprinted in his 1901 book L’opinion et la foule), the French judge/sociologist makes the following comment:

Il n’y [dans un Etat féodal, BR] avait pas “l’opinion”, mais des milliers d’opinions séparées, sans nul lien continuel entre elles. Ce lien, le livre d’abord, le journal ensuite et avec bien plus d’efficacité, l’ont seuls fourni. La presse périodique a permis de former un agrégat secondaire et très supérieur dont les unités s’associent étroitement sans s’être jamais vues ni connues. De là, des différences importantes, et, entre autre, celles-ci : dans les groupes primaires [des groupes locales basés sur la conversation, BR], les voix ponderantur plutôt que numerantur, tandis que, dans le groupe secondaire et beaucoup plus vaste, où l’on se tient sans se voir, à l’aveugle, les voix ne peuvent être que comptées et non pesées. La presse, à son insu, a donc travaillé à créer la puissance du nombre et à amoindrir celle du caractère, sinon de l’intelligence.

After a quick survey, I haven’t found an English translation anywhere – there might be one in here – so here’s my own (taking some liberties to make it easier to read):

[In a feudal state, BR] there was no “opinion” but thousands of separate opinions, without any steady connection between them. This connection was only delivered by first the book, then, and with greater efficiency, the newspaper. The periodical press allowed for the formation of a secondary and higher-order aggregate whose units associate closely without ever having seen or known each other. Several important differences follow from this, amongst others, this one: in primary groups [local groups based on conversation, BR], voices ponderantur rather than numerantur, while in the secondary and much larger group, where people connect without seeing each other – blind – voices can only be counted and cannot be weighed. The press has thus unknowingly labored towards giving rise to the power of the number and reducing the power of character, if not of intelligence.

Two things are interesting here: first, Lazarsfeld, Berelson, and Gaudet’s classic study from 1944, The People’s Choice, and even more so Lazarsfeld’s canonical Personal Influence (with Elihu Katz, 1955) are seen as a rehabilitation of the significance (for the formation of opinion) of interpersonal communication at a time when media were considered all-powerful brainwashing machines by theorists such as Adorno and Horkheimer (Adorno actually worked with/for Lazarsfeld in the 1930s, when Lazarsfeld tried to force poor Adorno into “measuring culture”, which may have soured the latter on any empirical inquiry, but that’s a story for another time). Tarde’s work on conversation (the first order medium) is theoretically quite sophisticated – floating against the backdrop of Tarde’s theory of imitation as basic mechanism of cultural production – and actually succeeds in thinking together everyday conversation and mass-media without creating any kind of onerous dichotomy. L’opinion et la conversation would merit inclusion in any history of communication science and it should come as no surprise that Elihu Katz actually published a paper on Tarde in 1999.

Second, the difference between ponderantur (weighing) and numerantur (counting) is at the same time rather self-evident – an object’s weight and its number are logically quite different things – and somewhat puzzling: it reminds us that while measurement does indeed create a universe of number where every variable can be compared to any other, the aspects of reality we choose to measure remain connected to a conceptual backdrop that is by itself neither numerical nor mathematical. What Tarde calls “character” is a person’s capacity to influence, to entice imitation, not the size of her social network.

I’m currently working on a software tool that helps with studying Twitter and while sifting through the literature I came across this citation from a 2010 paper by Cha et al.:

We describe how we collected the Twitter data and present the characteristics of the top users based on three influence measures: indegree, retweets, and mentions.
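To make concrete what these three measures actually tally, here is a minimal Python sketch. It is not the procedure of Cha et al.; the follower list, the tweets, and the `retweet_of` field are hypothetical toy data invented for illustration. The point it shows is simply that all three "influence" measures are counts.

```python
import re
from collections import Counter

# Hypothetical toy data: (follower, followed) pairs and a handful of tweets.
FOLLOWS = [("ann", "bob"), ("cat", "bob"), ("dan", "bob"),
           ("bob", "ann"), ("dan", "ann")]
TWEETS = [
    {"author": "ann", "text": "reading Tarde again", "retweet_of": None},
    {"author": "cat", "text": "RT @ann reading Tarde again", "retweet_of": "ann"},
    {"author": "bob", "text": "RT @ann reading Tarde again", "retweet_of": "ann"},
    {"author": "dan", "text": "@bob have you seen this?", "retweet_of": None},
    {"author": "cat", "text": "@bob @ann coffee?", "retweet_of": None},
]


def influence_counts(follows, tweets):
    """Return three Counters: indegree, retweets and mentions per user."""
    # Indegree: how many accounts follow each user.
    indegree = Counter(followed for _, followed in follows)
    # Retweets: how often a user's tweets were passed on by others.
    retweets = Counter(t["retweet_of"] for t in tweets if t["retweet_of"])
    # Mentions: how often a user's name appears in other people's tweets.
    mentions = Counter()
    for tweet in tweets:
        if tweet["retweet_of"]:
            continue  # the "RT @user" marker is not counted as a mention
        mentions.update(re.findall(r"@(\w+)", tweet["text"]))
    return indegree, retweets, mentions


indegree, retweets, mentions = influence_counts(FOLLOWS, TWEETS)
for user in sorted(set(indegree) | set(retweets) | set(mentions)):
    print(user, indegree[user], retweets[user], mentions[user])
```

Each number answers "how many?" and none of them says anything about how a tweet was read or received.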

Besides the immense problem of defining influence in non-trivial terms, I wonder whether many of the studies on (social) networks that pop up all over the place are hoping to weigh but end up counting again. What would it mean, then, to weigh a person’s influence? What kind of concepts would we have to develop and what could be indicators? In our project we use the bit.ly API to look at clickstream referers – if several people post the same link, who succeeds in getting the most people to click it – but this may be yet another count that says little or nothing about how a link will be used/read/received by a person. Perhaps this is as far as the “hard” data can take us. But is that really a problem? The one thing I love about Tarde is how he can jump from a quantitative worldview to beautiful theoretical speculation and back with a smile on his face…

2010
07.02

Over the last year, I have been reading loads of books in and on Information Science, paying special attention to key texts in the (pre)history of the discipline. Fritz Machlup and Una Mansfield’s monumental anthology The Study of Information (Wiley & Sons, 1983) has been a pleasure to read and there are several passages in the foreword that merit a little commentary. I have always wondered why Shannon’s Mathematical Theory of Communication from 1948 has been such a reference point in the discipline I started out in, communication science. Talking about purely technological problems and pumped with formulas that very, very few social science scholars could make sense of, the whole thing seems like a misunderstanding. The simplicity and clearness of the schema on page two – which has been built into the canonical sender-receiver model – cannot be the only reason for the exceptional (mostly second or third hand) reception the text has enjoyed. In Machlup & Mansfield’s foreword one can find some strong words on the question of why a work on engineering problems that excludes even the slightest reference to matters of human understanding came to be cited in probably every single introduction to communication science:

“When scholars were chiefly interested in cognitive information, why did they accept a supposedly scientific definition of ‘information apart from meaning’? One possible explanation is the fact that they were impressed by a definition that provided for measurement. To be sure, measurement was needed for the engineering purposes at hand; but how could anybody believe that Shannon’s formula would also measure information in the sense of what one person tells another by word of mouth, in writing, or in print?
We suspect that the failure to find, and perhaps impossibility of finding, any ways of measuring information in this ordinary sense has induced many to accept measurable signal transmission, channel capacity, or selection rate, misnamed amount of information, as a substitute or proxy for information. The impressive slogan, coined by Lord Kelvin, that ’science is measurement’ has persuaded many researchers who were anxious to qualify as scientists to start measuring things that cannot be measured. As if under a compulsion, they looked for an operational definition of some aspect of communication or information that stipulated quantifiable operations. Shannon’s formula did exactly that; here was something related to information that was objectively measurable. Many users of the definition were smart enough to realize that the proposed measure – perfectly suited for electrical engineering and telecommunication – did not really fit their purposes; but the compulsion to measure was stronger than their courage to admit that they were not operating sensibly.” (p. 52)

For Machlup & Mansfield – who, as trained (neoclassical) economists, should not be deemed closet postmodernists – this compulsion to measure is connected to implicit hierarchies in academia where mathematical rationality reigns supreme. A couple of pages further, the authors’ judgment becomes particularly harsh:

“This extension of information theory, as developed for communication engineering, to other quite different fields has been a methodological disaster – though the overenthusiastic extenders did not see it, and some of them, who now know that it was an aberration, still believe that they have learned a great deal from it. In actual fact, the theory of signal transmission or activating impulses has little or nothing to teach that could be extended or applied to human communication, social behavior, or psychology, theoretical or experimental.” (p. 56)

Shannon himself avoided the term “information theory” and his conception of communication obviously had nothing to do with what the term has come to mean in the social sciences and general discourse. But the need to show that the social sciences could be “real” sciences in search of laws formulated in mathematical terms proved stronger than the somewhat obvious epistemological mismatch.

Like many classic texts, Machlup & Mansfield’s work offers a critique that is not based on dismissal or handbag relativism but on deep engagement with the complexities of the subject matter and long experience with interdisciplinary work, which, necessarily, makes one bump into unfamiliar concepts, methods, ontological preconceptions, modes of reasoning, vectors of explanation and epistemological urges (what is your knowledge itch? how do you want to scratch it?). The Study of Information is a pleasure to read because it brings together very different fields without proposing some kind of unifying meta-concept or imperialist definition of what science – the quest for knowledge – should look like.

2010
04.15

…is so much easier if you’ve got a couple of popular pages to advertise on…

chrome_suggest_march_2010

…and another one…

chrome_suggest_april_2010.JPG

…browser wars all over again…

2010
03.24

When it comes to search interfaces, there are a lot of good ideas out there, but there is also a lot of potential for further experimentation. Search APIs are a great field for experimentation as they allow developers to play around with advanced functionality without forcing them to work on a heavy backend structure.

Together with Alex Beaugrand, a student of mine, I have built (a couple of months ago) another little search mashup / interface that allows users to switch between a tag cloud view and a list / cluster mode. contextDigger uses the delicious and Bing APIs to widen the search space using associated searches / terms and then Yahoo BOSS to download a thousand results that can be filtered through the interface. It uses the principle of faceted navigation to shorten the list: if you click on two terms, only the results associated with both of them will appear…
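The filtering principle described above can be sketched in a few lines of Python. This is not contextDigger's actual code; the result records, their `terms` sets, and the function name `filter_by_terms` are assumptions made up for the example. It only illustrates the logical AND behind faceted navigation.

```python
# Hypothetical search results, each tagged with a set of associated terms.
RESULTS = [
    {"title": "Gephi tutorial", "terms": {"graph", "visualization", "software"}},
    {"title": "Tag clouds considered harmful", "terms": {"visualization", "search"}},
    {"title": "Faceted search primer", "terms": {"search", "interface"}},
    {"title": "Graph search interfaces", "terms": {"graph", "search", "interface"}},
]


def filter_by_terms(results, selected_terms):
    """Keep only the results associated with every selected term."""
    wanted = set(selected_terms)
    # Set inclusion: all selected terms must be among the result's terms.
    return [result for result in results if wanted <= result["terms"]]


# Clicking on "search" and then "interface" narrows the list to two results.
for result in filter_by_terms(RESULTS, ["search", "interface"]):
    print(result["title"])
```

With no term selected the full list comes back, and every additional click can only shorten it, which is what makes the interaction feel like digging.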

2010
03.22

Since I have started to play around with the latest (and really great, easy to use) version of the gephi graph visualization and analysis platform, I have developed an obsession to build .gdf output (.gdf is a graph description format that you can open with gephi) into everything I come across. The latest addition is a Facebook application called netvizz that creates a .gdf file describing either your personal network or the groups you are a member of.
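For readers who have never seen one, a .gdf file is plain text: a node section followed by an edge section. The Python sketch below writes a tiny one. It is not netvizz's code; the function name `to_gdf` and the sample network are hypothetical, and the header lines follow the GDF format as commonly documented for gephi. Labels containing commas would need quoting, which this sketch does not handle.

```python
def to_gdf(nodes, edges):
    """Serialize a small graph to GDF text.

    nodes: dict mapping node id -> label (labels must not contain commas)
    edges: iterable of (source id, target id) pairs
    """
    lines = ["nodedef>name VARCHAR,label VARCHAR"]
    for node_id, label in nodes.items():
        lines.append(f"{node_id},{label}")
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR")
    for source, target in edges:
        lines.append(f"{source},{target}")
    return "\n".join(lines) + "\n"


# Hypothetical three-person friendship network.
NODES = {"u1": "Anna", "u2": "Ben", "u3": "Chloe"}
EDGES = [("u1", "u2"), ("u2", "u3")]

with open("network.gdf", "w", encoding="utf-8") as handle:
    handle.write(to_gdf(NODES, EDGES))
```

The resulting network.gdf can then be opened through gephi's file dialog like any other graph file.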

There are of course many applications that let you visualize your network directly in Facebook but by being able to download a file, you can choose your own visualization tool, play around with it, select and parameterize layout algorithms, change colors and sizes, rearrange by hand, and so forth. Toolkits like gephi are just so much more powerful than Flash toys…

my puny facebook network - gephi can process much larger graphs

What’s rather striking about these Facebook networks is how much the shape is connected to physical and social mobility. If you look at my network, you can easily see the Klagenfurt (my hometown) cluster on the far right, my studies in Vienna in the middle, and my French universe on the left. The small cluster on the top left documents two semesters of teaching at the American University of Paris…

Update: v0.2 of netvizz is out, allowing you to add some data for each profile. Next up is GraphML and Mondrian file support, more data for profiles, etc…

2010
03.12

My colleague Theo Röhle and I went to the Computational Turn conference this week. While I would have preferred to hear a bit more on truly digital research methodology (in the fully scientific sense of the word “method”), the day was really quite interesting and the weather unexpectedly gorgeous. Most of the papers are available on the conference site; make sure to have a look. The text I wrote with Theo tried to structure some of the epistemological challenges and problems to take into account when working with digital methods. Here’s a tidbit:

…digital technology is set to change the way scholars work with their material, how they “see” it and interact with it. The question is, now, how well the humanities are prepared for these transformations. If there truly is a paradigm shift on the horizon, we will have to dig deeper into the methodological assumptions that are folded into the new tools. We will need to uncover the concepts and models that have carried over from different disciplines into the programs we employ today…

2010
02.26

The question of how mathematics could lay the foundation for a machine that sustains such a wide variety of practices is really quite well understood from the point of view of the mathematical theory of computation. From a humanities standpoint however, despite the number of texts commenting on the genius of key figures such as Gödel, Turing, Shannon, and Church, there is still a certain awkwardness when it comes to situating the key steps in mathematical reasoning that led up to the birth of the computer in the larger context of mathematics itself. One of the questions I find really quite interesting is the role of the formalist stance in mathematics.

In the philosophy of mathematics, there are many different positions. The realist stance for example holds that mathematical objects exist. For the platonist, they exist in some kind of extra spatio-temporal realm of ideas. For the physicalist, they are intrinsically connected to material existence, even if that relationship is not necessarily simple. Then there is formalism and this is where things get interesting. In a tale we can read in many social sciences and humanities books on the computer, there is the young Kurt Gödel who smashes the coherent world of the “establishment” mathematician David Hilbert, inventing the metamathematical tools that will later prove essential for the practical realization of computing machinery in the process. What is most often overlooked in that story is that Hilbert’s formalist position is already an extremely important step in the preparation for what is to come. For Hilbert, the question of the ontological status of mathematical objects is already a no-go – truth is no longer defined via any kind of correspondence to an external system but as a function of the internal coherence of the symbolic system. As Bettina Heintz says, Hilbert’s work rendered mathematical concepts “self-sufficient” (autark) by liberating them from any kind of external benchmark and opening a purely mechanical world where symbolic machinery can be built at will, like in a game.

If we want to think about computing today, I think we should remember this break from an ontological concept of truth to a purely formalistic one (even if that mean Gödel put a pretty big crack in it later on). Because in a way, programming is like a “game” with formulas and if the algorithm works, that means it is “true”. In this sense, Google’s PageRank algorithm is true. But without the reference to an external system, this “truth” is purely mechanical, internal. In a similar way, an algorithm’s claim to objectivity, impartiality, or neutrality should be seen as internal only. The moment we apply mathematics to the description of some external mechanism (gravity, for example), there is a second truth criterion that intervenes, which refers to the establishment of correspondence between the formal system and the external reality. In the same way, if an algorithm is applied to, let’s say the filtering of information, the formal world of the game is mapped onto another world. There is an important difference however. When mathematics is applied to physical phenomena, the gesture is descriptive and epistemological (verb: is). When an algorithm is applied to tasks such as information filtering, the gesture is prescriptive and political (verb: ought).

The fact that an automatic procedure works makes it true in a formal sense. The moment we apply it to a certain task, other criteria intervene. Hilbert’s formalism pulled mathematics from the empirical world and if we bring the two together again by writing software, the criteria by which we judge the quality of that action should be seen as political because there are no mathematical criteria to judge the mapping of one world onto the other. No Hilbert to hold our hand…
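The difference between the two kinds of "truth" can be made tangible with a small Python sketch of PageRank-style power iteration. This is the textbook formulation with a damping factor of 0.85, not Google's production system, and the four-page link graph is a hypothetical toy example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Textbook power iteration over a dict: page -> list of linked pages."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        # Every page passes its rank on in equal shares to the pages it links to.
        for source in nodes:
            targets = links.get(source, [])
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        converged = sum(abs(new_rank[node] - rank[node]) for node in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical toy web of four pages.
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(LINKS)

# Internal, formal checks: the scores form a distribution and the loop settles.
assert abs(sum(ranks.values()) - 1.0) < 1e-9
print(sorted(ranks, key=ranks.get, reverse=True))
```

The assertion can be checked mechanically and it holds. Whether the page that comes out on top is the one a reader should see first is a question no assertion in the code can settle.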