Category Archives: epistemology

The question of how mathematics could lay the foundation for a machine that sustains such a wide variety of practices is really quite well understood from the point of view of the mathematical theory of computation. From a humanities standpoint, however, despite the number of texts commenting on the genius of key figures such as Gödel, Turing, Shannon, and Church, there is still a certain awkwardness when it comes to situating the key steps in mathematical reasoning that led up to the birth of the computer in the larger context of mathematics itself. One of the questions I find really quite interesting is the role of the formalist stance in mathematics.

In the philosophy of mathematics, there are many different positions. The realist stance, for example, holds that mathematical objects exist. For the platonist, they exist in some kind of extra spatio-temporal realm of ideas. For the physicalist, they are intrinsically connected to material existence, even if that relationship is not necessarily simple. Then there is formalism, and this is where things get interesting. In a tale we can read in many social science and humanities books on the computer, the young Kurt Gödel smashes the coherent world of the “establishment” mathematician David Hilbert, inventing in the process the metamathematical tools that will later prove essential for the practical realization of computing machinery. What is most often overlooked in that story is that Hilbert’s formalist position is already an extremely important step in the preparation for what is to come. For Hilbert, the question of the ontological status of mathematical objects is already a no-go – truth is no longer defined via any kind of correspondence to an external system but as a function of the internal coherence of the symbolic system. As Bettina Heintz says, Hilbert’s work rendered mathematical concepts “self-sufficient” (autark) by liberating them from any kind of external benchmark and opening a purely mechanical world where symbolic machinery can be built at will, like in a game.

If we want to think about computing today, I think we should remember this break from an ontological concept of truth to a purely formalist one (even if Gödel later put a pretty big crack in it). Because in a way, programming is like a “game” with formulas, and if the algorithm works, that means it is “true”. In this sense, Google’s PageRank algorithm is true. But without the reference to an external system, this “truth” is purely mechanical, internal. In a similar way, an algorithm’s claim to objectivity, impartiality, or neutrality should be seen as internal only. The moment we apply mathematics to the description of some external mechanism (gravity, for example), a second truth criterion intervenes, which refers to the establishment of correspondence between the formal system and the external reality. In the same way, if an algorithm is applied to, let’s say, the filtering of information, the formal world of the game is mapped onto another world. There is an important difference, however. When mathematics is applied to physical phenomena, the gesture is descriptive and epistemological (verb: is). When an algorithm is applied to tasks such as information filtering, the gesture is prescriptive and political (verb: ought).

The fact that an automatic procedure works makes it true in a formal sense. The moment we apply it to a certain task, other criteria intervene. Hilbert’s formalism pulled mathematics away from the empirical world, and if we bring the two together again by writing software, the criteria by which we judge the quality of that action should be seen as political, because there are no mathematical criteria to judge the mapping of one world onto the other. No Hilbert to hold our hand…

Over the last couple of years, the social sciences have become increasingly interested in using computer-based tools to analyze the complexity of the social ant farm that is the Web. Issuecrawler was one of the first such tools, and today researchers are indeed using very sophisticated pieces of software to “see” the Web. Sciences-Po, one of those rather strange French institutions that were founded to educate the elite but now increasingly have to justify their existence by producing research, has recently hired Bruno Latour to head their new médialab, which will most probably head in that very direction. Given Latour’s background (and the fact that Paul Girard, a very competent former colleague at my lab, heads the R&D department), this should be really very interesting. I do hope that there will be occasion to tackle the most compelling methodological question when it comes to the application of computers (or mathematics in general) to analyzing human life, which is beautifully framed in a rather reluctant statement from 1889 by Karl Pearson, a major figure in the history of statistics:

“Personally I ought to say that there is, in my own opinion, considerable danger in applying the methods of exact science to problems in descriptive science, whether they be problems of heredity or of political economy; the grace and logical accuracy of the mathematical processes are apt to so fascinate the descriptive scientist that he seeks for sociological hypotheses which fit his mathematical reasoning and this without first ascertaining whether the basis of his hypotheses is as broad as that human life to which the theory is to be applied.” (cited in Stigler, Stephen M.: The History of Statistics. Harvard University Press, 1990, p. 304)

This spring I worked on an R&D project that was really quite interesting but – as it happens with projects – took up nearly all of my spare time. La montre verte is based on the idea that pollution measurement can be brought down to street level if sensors can be made small enough to be carried around by citizens. Together with a series of partners from the private sector, the CiTu group of my laboratory came up with the idea of putting an ozone sensor and a microphone (to measure noise levels) into a watch. That way, the device is not very intrusive and still in direct contact with the surrounding air. We built about 15 prototypes. Paris’ air quality is currently measured by only a handful of (really high quality) sensors, so even the low-resolution devices in our watches should be able to complement that data with a geographically more fine-grained analysis of noise and pollution levels. The watch produces a georeferenced measurement (a GPS is built into the watch) every second and transmits the data via Bluetooth to a Java application on a mobile phone, which then sends every data packet via GPRS to a database server.
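To give an idea of what flows through that pipeline, here is a minimal sketch of the kind of georeferenced record the watch produces and of the phone-side relay. The field names, the JSON transport, and the server URL are my assumptions for illustration; the actual project code ran as a Java application on the phone and looked quite different.

```python
# Hypothetical sketch of a one-per-second georeferenced measurement and the
# phone-side relay towards the database server. Field names, JSON transport,
# and the endpoint URL are assumptions, not the project's actual code.
import json
import time
import urllib.request

def make_measurement(lat, lon, ozone_ppb, noise_db, watch_id="proto-01"):
    """One georeferenced measurement, taken roughly once per second."""
    return {
        "watch_id": watch_id,
        "timestamp": time.time(),  # the real device uses GPS time
        "lat": lat,
        "lon": lon,
        "ozone_ppb": ozone_ppb,    # ozone sensor reading
        "noise_db": noise_db,      # microphone-based noise level
    }

def relay_batch(measurements, url="http://example.org/api/measurements"):
    """Forward a batch of measurements to the database server over HTTP."""
    data = json.dumps(measurements).encode("utf-8")
    req = urllib.request.Request(url, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status
```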

My job in the project was to build a Web application that allows people to interact with and make sense of the data produced by the watches. Despite the help of several brilliant students from our professional Masters program, this proved to be a daunting task and I spent *a lot* of time programming. The result is quite OK, I believe; the application allows users to explore the data (which is organized in localized “experiments”) in different ways, either in real time or afterward. With a little more time (we had only about three months for the whole project and we got the hardware only days before the first public showcase) we could have done more, but I’m still quite content with the result. The heatmap algorithm (see image) was especially fun to program; I’ve never done a lot of visual work, so this was new territory and a steep learning curve.
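For the curious, the basic technique is simple enough to fit in a few lines: every measurement spreads its value over nearby grid cells with a Gaussian falloff, and the accumulated grid is then mapped to a color scale. This is a reconstruction of the general idea, not the application’s actual code; grid size and kernel width are arbitrary assumptions.

```python
# Minimal sketch of a kernel-density heatmap over georeferenced measurements.
# A reconstruction of the general technique, not the actual application code;
# grid dimensions, kernel width, and the input format are assumptions.
import math

def heatmap(points, width=200, height=200, sigma=3.0):
    """points: list of (x, y, value) already projected onto grid coordinates.
    Each point spreads its value with a Gaussian falloff; normalizing the
    resulting grid gives the color scale."""
    grid = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)
    for x, y, value in points:
        cx, cy = int(round(x)), int(round(y))
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                px, py = cx + dx, cy + dy
                if 0 <= px < width and 0 <= py < height:
                    weight = math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
                    grid[py][px] += value * weight
    return grid
```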

Unfortunately, the strong emphasis on the technological side and the various problems we had (the agile methods one needs for experimental projects are still not understood by many companies) cut down the time for reflection to a minimum and did not allow us to come up with a deeper analysis of the social and political dimensions of what could be called “distributed urban intelligence”. The whole project is embedded in a somewhat naive rhetoric of citizen participation and the idea that technological innovation can solve social problems, in this case matters of urban planning and local governance. A lesson I have learned from this is that the current funding emphasis on short-term projects that bring together universities and industry makes it very difficult to carve out an actual space for scientific practice between all the deadlines and the heavy technical demands. And by scientific practice, I mean a *critical* practice that does not only try to base specifications and prototyping on “scientifically valid” approaches to building tools and objects but also includes a reflection on social utility that takes a wider view than just immediate usefulness. In the context of this project, this would have implied a close look at how urban development is currently configured with respect to environmental concerns in order to identify structures of governance and chains of decision-making. This way, the whole project could have targeted issues more clearly and consciously, fine-tuning both the tools and the accompanying discourse to the social dimension it aimed at.

I think my point is that we (at least I) have to learn how to better integrate a humanities-based research agenda into very high-tech projects. We have known for a long time now that every technical project is in fact a socio-technical enterprise, but research funding and the project proposals it generates still pretend that the “socio-” part is some fluffy coating that decorates the manly material core where cogs and wire produce tangible effects. As a programmer I know how difficult and time-consuming technical work can be, but if there is to be a conscious socio-technical perspective in R&D, we have to accept that the fluffy stuff takes even more time – if it is done right. And to do it right means not only reading every book and paper relevant to a subject matter but also taking the time to reflect on methodology, to evaluate every step critically, to go back to the drawing board, and to include and produce theory every step of the way. There is a cost to the scientific method and if that cost is not figured in, the result may still be useful, interesting, thought-provoking, etc., but it will not be truly scientific. I believe that we should defend these costs and show why they are necessary; if we cannot do so, we risk confining the humanities to liberal armchair commentary and the social sciences to ex-post usage analysis.

When talking about the politics of the social Web and particularly online networking, the first issue to come up is invariably the question of privacy and its counterpart, surveillance – big brother, corporations bent on world domination, and so on. My gut reaction has always been “yeah, but there’s a lot more to it than that”, and on this blog (and hopefully a book in a not so distant future) I’ve been trying to sort out some of the political issues that do not pertain to surveillance. For me, social networking platforms are relevant to politics more as marketing than as surveillance. Not that these tools cannot function quite formidably to spy on people, but it is my impression that contemporary governance relies on other principles more than on the gathering of intelligence about individual citizens (although it does that, too). But I’ve never been very pleased with most of the conceptualizations of “post-disciplinarian” mechanisms of power; even Deleuze’s Post-scriptum sur les sociétés de contrôle, although full of remarkable leads, does not provide a fleshed-out theoretical tool – and it does not fit well with recent developments in the Internet domain.

But then, a couple of days ago I finally started to read the lectures Foucault gave at the Collège de France between 1971 and 1984. In the 1977-1978 term the topic of that class was “Sécurité, Territoire, Population” (STP, Gallimard, 2004) and it holds, in my view, the key to a quite different perspective on how social networking platforms can be thought of as tools of governance involved in specific mechanisms of power.
STP can be seen as both an extension and a reevaluation of Foucault’s earlier work on the transition from punishment to discipline as the central form in the exercise of power around the end of the 18th century. The establishment of “good practices” is central to the notion of discipline, and disciplinary settings such as schools, prisons, or hospitals serve most of all as means for instilling these “good practices” into their subjects. Jeremy Bentham’s Panopticon – a prison architecture that allows a single guard to observe a large population of inmates from a central control point – has in a sense become the metaphor for a technology of power that, in Foucault’s view, is part of a much more complex arrangement of how sovereignty can be performed. Many a blog post has been dedicated to applying the concept to social networking online.

Curiously though, in STP, Foucault calls the Panopticon both modern and archaic, and he goes as far as dismissing it as the defining element of the modern mechanics of power; in fact, the whole course is organized around the introduction of a third logic of governance besides (and historically following) “punishment” and “discipline”, which he calls “security”. This third regime no longer focuses on the individual as a subject that has to be punished or disciplined but on a new entity, a statistical representation of all individuals, namely the population. The logic of security, in a sense, gives up on the idea of producing a perfect status quo by reforming individuals and begins to focus on the management of averages, acceptable margins, and homeostasis. With the development of the social sciences, society is perceived as a “natural” phenomenon in the sense that it has its own rules and mechanisms that cannot be so easily bent into shape by disciplinary reform of the individual. Contemporary mechanisms of power are, then, not so much based on the formatting of individuals according to good practices but rather on the management of the many subsystems (economy, technology, public health, etc.) that affect a population so that this population will refrain from starting a revolution. Foucault actually comes pretty close to what Ulrich Beck will call, eight years later, the Risk Society. The sovereign (Foucault speaks increasingly of “government”) assures its political survival no longer primarily through punishment and discipline but by managing risk by means of scientific arrangements of security. This means not only external risk but also risk produced by imbalances in the corps social itself.

I would argue that this opens another way of thinking about social networking platforms in political terms. First, we would look at something like Facebook in terms of population, not in terms of the individual. I would argue that governmental structures and commercial companies are only in rare cases interested in the doings of individuals – their business is with statistical representations of populations, because this is the level on which contemporary mechanisms of power (governance as opinion management, market intelligence, cultural industries, etc.) prefer to operate. And second – and this really is a very nasty challenge indeed – we would probably have to give up on locating power in specific subsystems (say, information and communication systems) and trace the interplay between the many different layers that compose contemporary society.

The concept of self-organization has recently made quite a comeback and I find myself making a habit of criticizing it. Quite generally I use this blog to sort things out in my head by writing about them and this is an itch that needs scratching. Fortunately, political scientist Steven Weber, in his really remarkable book The Success of Open Source, has already done all the work. On page 132 he writes:

Self-organization is used too often as a placeholder for an unspecified mechanism. The term becomes a euphemism for “I don’t really understand the mechanism that holds the system together.” That is the political equivalent of cosmological dark matter.

This seems right on target: self-organization is quite often just a means to negate organizing principles in the absence of an easily identifiable organizing institution. By speaking of self-organization we can skip closer examination and avoid the slow and difficult process of understanding complex phenomena. Weber’s second point is perhaps even more important in the current debate about Web 2.0:

Self-organization often evokes an optimistically tinged “state of nature” narrative, a story about the good way things would evolve if the “meddling” hands of corporations and lawyers and governments and bureaucracies would just stay away.

I would go even further and argue that the digerati philosophy pushed by Wired Magazine in particular equates self-organization with freedom and democracy. Much of the current thinking about Web 2.0 seems to be quite strongly infused by this mindset. But I believe that there is a double fallacy:

  1. Much of what is happening on the Social Web is not self-organization in the sense that governance is the result of pure micro-negotiations between agents; technological platforms lay the ground for and shape social and cultural processes that are certainly less evident than the organizational structures of the classic firm but are nonetheless mechanisms that can be described and explained.
  2. Democracy as a form of governance is really quite dependent on strong organizational principles, and the more participative a system becomes, the more complicated it gets. Organizational principles do not need to be institutional in the sense of the different bodies of government; they can be embedded in procedures, protocols, or even tacit norms. A code repository like SourceForge.net is quite a complicated system, and much of the organizational labor in Open Source is delegated to this and other platforms – coordinating the work effort among that many people would be impossible without it.

My guess is that the concept of self-organization as a “state of nature” narrative (nature = good) is much too often used to justify modes of organization that would imply a shift of power from traditional institutions of governance to the technological elite (the readers and editors of Wired Magazine). Researchers should therefore be wary of the term and, whenever it comes up, take an even closer look at the actual mechanisms at work. Self-organization is an explanandum (something that needs to be explained) and not an explanans (an explanation). This is why I find network science really very interesting. Growth mechanisms like preferential attachment allow us to give analytical content to the placeholder that is “self-organization” and to examine, albeit on a very abstract level, the ways in which dynamic systems organize (and distribute power) without central control.
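To make this concrete, here is a minimal sketch of the preferential attachment mechanism in the spirit of the Barabási–Albert model: new nodes link to existing nodes with a probability proportional to their degree, which already suffices to produce the heavy-tailed, power-concentrating structures that “self-organization” is so often invoked to wave away. The parameters and the starting core are arbitrary choices for illustration.

```python
# Minimal sketch of preferential attachment (Barabási-Albert style growth).
# New nodes attach to existing nodes with probability proportional to degree;
# the size of the initial core and the value of m are illustrative choices.
import random

def preferential_attachment(n_nodes, m=2, seed=None):
    """Grow a graph to n_nodes; each new node attaches to m existing nodes."""
    rng = random.Random(seed)
    nodes = list(range(m + 1))
    # Start from a small, fully connected core.
    edges = [(i, j) for i in nodes for j in nodes if i < j]
    # degree_pool lists each node once per unit of degree, so sampling from
    # it uniformly means sampling proportionally to degree.
    degree_pool = [n for edge in edges for n in edge]
    for new_node in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(degree_pool))
        for t in targets:
            edges.append((new_node, t))
            degree_pool.extend([new_node, t])
    return edges

# The degree distribution of a graph grown this way is heavily skewed:
# a few early nodes end up accumulating most of the links.
```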

This morning Jonah Bossewitch pointed me to an article over at Wired, authored by Chris Anderson, which announces “The End of Theory”. The article’s main argument in itself is not very interesting for anybody with a knack for epistemology – Anderson has apparently never heard of the induction / deduction discussion and has only a limited idea of what statistics does – but there is a very interesting question lurking somewhere behind all the Californian Ideology, and the following citation points right to it:

We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

One could point to the fact that the natural sciences have had their experimental side for quite a while (Roger Bacon advocated his scientia experimentalis in the 13th century) and that a laboratory is in a sense a pattern-finding machine where induction continuously plays an important role. What interests me more, though, is Anderson’s insinuation that statistical algorithms are not models. Let’s just look at one of the examples he uses:

Google’s founding philosophy is that we don’t know why this page is better than that one: If the statistics of incoming links say it is, that’s good enough. No semantic or causal analysis is required.

This is a very limited understanding of what constitutes a model. I would argue that PageRank does in fact rely very explicitly on a model which combines several layers of justification. In their seminal paper on Google, Brin and Page write the following:

PageRank can be thought of as a model of user behavior. We assume there is a “random surfer” who is given a web page at random and keeps clicking on links, never hitting “back” but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank.
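Read literally, this “model of user behavior” is explicit enough to be written down in a few lines. The following is a minimal sketch of the random-surfer computation as power iteration; the damping factor of 0.85 follows the original paper, while the handling of dangling pages and the fixed number of iterations are simplifications of my own.

```python
# Minimal sketch of the random-surfer model behind PageRank, as power iteration.
# The damping factor of 0.85 follows Brin and Page's paper; dangling-page
# handling and the fixed iteration count are simplifying assumptions.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outlinks: the bored surfer jumps anywhere.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Example: three pages where everybody links to "a".
print(pagerank({"a": ["b"], "b": ["a"], "c": ["a"]}))
```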

The assumption behind this graph-oriented justification is that people do not place links randomly but do so with purpose. Linking implies attribution of importance: we don’t link to documents we’re indifferent about. The statistical exploration of the huge graph that is the Web is indeed oriented by this basic assumption, and it adds the quite contestable rule that what is thought important by the greatest number of linkers shall be the most visible. I would, then, argue that there is no experimental method that is purely inductive, not even neural networks. Sure, on the mathematical side we can explore data without limitations concerning their dimensionality, i.e. the number of characteristics that can be taken into account; the method of gathering data, however, is always a process of selection that is influenced by some idea or intuition that at least implicitly has the character of a model. There is a deductive side to even the most inductive approach. Data is made, not given, and every projection of that data is oriented. To quote Fernando Pereira:

[W]ithout well-chosen constraints — from scientific theories — all that number crunching will just memorize the experimental data.

As Jonah points out, Anderson’s article is probably a straw man argument whose sole purpose is to attract attention, but it points to something that is really important: too many people think that mathematical methods for knowledge discovery (data mining, that is) are neutral and objective tools that will find what’s really there and show the world as it is without the stain of human intentionality; these algorithms are therefore not seen as objects of political inquiry. In this view, statistics is all about counting facts, and only higher layers of abstraction (models, theories, …) can have a political dimension. But it matters what we count and how we count.

In the end, Anderson’s piece is little more than the habitual prostration before the altar of emergence and self-organization. Just exchange the invisible hand for the invisible brain and you’ll get pop epistemology for hive minds…

When sites that involve any kind of ranking change their algorithm, there is usually a spectacle worth watching. When Google made some changes to its search algorithms in 2005, the company was sued by KinderStart.com (a search engine for kids, talk about irony), which went from PageRank riches to rags and lost 70% of its traffic in a day (the case was dismissed in 2007). When Digg finally gave in to a lot of criticism about organized front-page hijacking and changed the way story promotion works to include a measure of “diversity”, the regulars were vocally hurt and unhappy. What I find fascinating about the latter case is the technical problem-solving approach, which implied the programming of nothing less than diversity. It’s not that hard to understand how such a thing could work (think “anti-recommendation system” or “un-collaborative filtering”), but still, one has to sit back and appreciate the idea. We are talking about social engineering done by software engineers. Social problem = design problem.
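Digg never published the details of its promotion algorithm, so the following is only an illustration of the general “un-collaborative filtering” idea: discount a story whose supporters overlap too heavily with the supporters of stories already promoted. All names and parameters are hypothetical.

```python
# Illustration of the general "un-collaborative filtering" idea only -- Digg
# never published its actual promotion algorithm. A story is penalized when
# its voters overlap heavily with the voters of stories already promoted.
def promote_diverse(candidates, slots=5, penalty=1.0):
    """candidates: dict mapping story id -> set of user ids who voted for it.
    Returns up to `slots` stories, greedily preferring popular but diverse ones."""
    promoted = []
    covered_voters = set()
    for _ in range(min(slots, len(candidates))):
        def score(story):
            voters = candidates[story]
            overlap = len(voters & covered_voters) / max(len(voters), 1)
            return len(voters) * (1.0 - penalty * overlap)
        best = max((s for s in candidates if s not in promoted), key=score)
        promoted.append(best)
        covered_voters |= candidates[best]
    return promoted
```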

The very real-world effects of algorithms are quite baffling and since I started to read this book, I truly appreciate the ingenuity and complex simplicity that cannot be reduced to a pure “this is what I want to achieve and so I do it” narrative. There is a delta between the “want” and the “can” and the final system will be the result of a complex negotiation that will have changed both sides of the story in the end. Programming diversity means to give the elusive concept of diversity an analytical core, to formalize it and to turn it into a machine. The “politics” of a ranking algorithm is not only about the values and the project (make story promotion more diverse) but also a matter – to put it bluntly – of the state of knowledge in computer science. This means, in my opinion, that the politics of systems must be discussed in the larger context of an examination of computer science / engineering / design as in itself an already oriented project, based on yet another layer of “want” and “can”.

Thanks to Joris for pointing out that my blog was hacked. Damn you spammers.

 

The term “determine” is often used rather lightly by those who write about the political dimension of technology. At the same time the accusation of “technological determinism” – albeit sometimes right on target – is being used as a means to exclude discussion of technological parameters from the humanities and the social sciences. But what is actually meant by “technological determinism”? In my view, there are three basic forms of thinking about determinism when it comes to technology:

The first is very much connected to French anthropologist André Leroi-Gourhan and holds that technological evolution is largely self-determined. His notion of “tendance technique” takes its inspiration from evolutionary theory in the sense that technology evolves blindly but follows the paths carved out by the “choices” made throughout its phylogenesis (this has been called “cumulative causation” or “path dependency” by some). Leroi-Gourhan’s perspective has been developed further by Deleuze and Guattari in their concept of the “phylum” and, most notably, by philosopher of technology Gilbert Simondon (whose work is finally going to be translated into English, hopefully still in 2008), who sees the process of technological evolution as “concretization”, going from modular designs to ever more integrated forms. “Technological determinism” would mean, in this first sense, that technology is not the result of social, economic, or cultural processes but is largely independent, forcing the other sectors to adapt. Technology is determined by its inner logic.

A more colloquial meaning of technological determinism is, of course, connected to the Toronto school, namely Harold A. Innis and Marshall McLuhan. This stuff is so well known and overcommented that I don’t really want to get into it – let’s just say that technology, here, determines social process either by installing a specific rapport to space and time (Innis) or by establishing a certain equilibrium of the senses (McLuhan). You can find dystopian versions of the same basic concept in Ellul or Postman: technology determines society, to state matters bluntly.

I would argue that there is a third version of technological determinism which is, although not completely dissimilar, far more subtle than the last one. Heidegger’s framing of technology as Gestell (an outlook based on cold mathematical reasoning, the industrial destruction of more integrated ways of living, the exploitation of nature, etc.) opens up a question that has been taken up by a large number of people in design theory and practice: is technology determined to follow the logic of Gestell? In Heidegger’s perspective, technology is doomed to exert a dehumanizing force on being itself: the determinism here does not so much concern the relationship between technology and society as the essence (Wesen) of technology itself. A lot of thinking about design over the last thirty years has been based on the assumption that a different form of technology is possible: technology that would escape its destiny as Gestell and be emancipating instead of alienating. Discourse about information technology is indeed full of such hopes.

Although “technological determinism” refers most often to the second perspective, a closer examination of “what determines what” opens up a series of quite interesting questions that go beyond the vulgar interpretations of McLuhan’s writings. For those who still adhere to the idea that tools determine their use, here is a list of possible remedies:

  1. Look at design studies where determinism has been replaced by the quite elegant notion of affordance.
  2. Read more Actor-Network Theory.
  3. Think about what Roland Barthes meant by “interpretation”.
  4. Dust off your copy of Hall’s “encoding/decoding”.
  5. Work as a software developer and marvel at the infinity of ways users find to use, appropriate, and break your applications.

Two things currently stand out in my life: a) I’m working on an article on the relationship between mathematical network analysis and the humanities, and b) continental Europe is finally discovering Facebook. The fact that A is highly stimulating (some of the stuff I’m reading is just very decent scholarship; especially Mathématiques et Sciences humaines [mostly French, some English] is a source of wonder) and B quite annoying (no, I don’t miss kindergarten) is of little importance here; there is, however, a connection between the two that I would like to explore a little bit.

Part of the research I’m looking into is what has been called the “New Science of Networks” (NSN), a field founded mostly by physicists and mathematicians who started to quantitatively analyze very big networks belonging to very different domains (networks of acquaintance, the Internet, food networks, brain connectivity, movie actor networks, disease spread, etc.). Sociologists have worked with mathematical analysis and network concepts since at least the 1930s, but because of the limits of available data, the networks studied rarely went beyond hundreds of nodes. NSN, however, studies networks with millions of nodes and tries to come up with representations of structure, dynamics, and growth that are not just used to make sense of empirical data but also to build simulations and to develop models that are independent of specific domains of application.

Very large data sets have only become available in recent history: social network data used to be based on either observation or surveys and was thus inherently limited. Since the arrival of digital networking, a lot more data has been produced because many forms of communication or interaction leave analyzable traces. From newsgroups to trackback networks on blogs, very simple crawler programs suffice to produce matrices that include millions of nodes and can be played around with indefinitely, from all kinds of angles. Social network sites like Facebook or MySpace are probably the best example of data pools just waiting to be analyzed by network scientists (and marketers, but that’s a different story). This brings me to a naive question: what is a social network?
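Before getting to that question: the claim that “very simple crawler programs suffice” is not an exaggeration. As a minimal sketch (the input format is an assumption, and real crawls obviously need sparse storage), turning a pile of crawled link pairs into an analyzable adjacency structure takes only a few lines; everything methodologically interesting happens before this step, in deciding what counts as a node and a link.

```python
# Minimal sketch: turning crawled link pairs into a sparse adjacency structure.
# The input format of (source, target) pairs is an assumption; real crawls
# produce millions of such pairs, which is why sparse representations matter.
from collections import defaultdict

def build_adjacency(link_pairs):
    """link_pairs: iterable of (source, target) identifiers.
    Returns a node index and a dict-of-dicts adjacency matrix."""
    index = {}
    adjacency = defaultdict(dict)
    for source, target in link_pairs:
        for node in (source, target):
            if node not in index:
                index[node] = len(index)
        adjacency[index[source]][index[target]] = 1
    return index, adjacency

# Example: a tiny trackback network between three blogs.
index, adj = build_adjacency([("blogA", "blogB"), ("blogB", "blogC"),
                              ("blogC", "blogA"), ("blogA", "blogC")])
print(len(index), "nodes,", sum(len(row) for row in adj.values()), "edges")
```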

The problem of creating data sets for quantitative analysis in the social sciences is always twofold: a) what do I formalize, i.e. what are the variables I want to measure? b) how do I produce my data? The question is that of building a representation. Do my categories represent the defining traits of the system I wish to study? Do my measuring instruments truly capture the categories I decided on? In short: what to measure and how to measure it, categories and machinery. The results of mathematical analysis (which is not necessarily statistical in nature) will only begin to make sense if formalization and data collection were done with sufficient care. So, again, what is a social network?

Facebook (pars pro toto for the whole category, qua currently most annoying of the bunch) allows me to add “friends” to my “network”. By doing so, I am “digitally mapping out the relationships I already have”, as Mark Zuckerberg recently explained. So I am, indeed, creating a data model of my social network. Fifty million people are doing the same, so the result is a digital representation of the social connectivity of an important part of the Internet-connected world. From a social science research perspective, we could now ask whether Facebook’s social network (as database) is a good model of the social network (as social structure) it supposedly maps. This does, of course, depend on what somebody would want to study, but if you ask yourself whether Facebook is an accurate map of your social connections, you’ll probably say no. For the moment, the formalization and data collection that apply when people use a social networking site do not capture the whole gamut of our daily social interactions (work, institutions, groceries, etc.) and do not include many of the people who play important roles in our lives. This does not mean that Facebook would not be an interesting data set to explore quantitatively; but it means that there is still an important distinction between the formal model (data and algorithm, what? and how?) of “social network” produced by this type of information system and the reality of daily social experience.

So what’s my point? Facebook is not a research tool for the social sciences and nobody cares whether the digital maps of our social networks are accurate or not. Facebook’s data model was not created to represent a social system but to produce a social system. Unlike the descriptive models of science, computer models are performative in a very materialist sense. As Baudrillard argues, the question is no longer whether the map adequately represents the territory, but in which way the map is becoming the new territory. The data model in Facebook is a model in the sense that it orients rather than represents. The “machinery” is not there to measure but to produce a set of possibilities for action. The social network (as database) is set to change the way our social network (as social structure) works – to produce reality rather than map it. But much as we can criticize data models in research for not being adequate to the phenomena they try to describe, we can examine data models, algorithms and interfaces of information systems and decide whether they are adequate for the task at hand. In science, “adequate” can only be defined in connection to the research question. In design and engineering there needs to be a defined goal in order to make such a judgment. Does the system achieve what I set out to achieve? And what is the goal, really?

When looking at Facebook and what the people around me do with it, the question of what “the politics of systems” could mean becomes a little clearer: how does the system affect people’s social network (as social structure) by allowing them to build a social network (as database)? What’s the (implicit?) goal that informs the system’s design?

Social networking systems are in their infancy and both technology and uses will probably evolve rapidly. For the moment, at least, what Facebook seems to be doing is quite simply to sociodigitize as many forms of interaction as possible; to render the implicit explicit by formalizing it into data and algorithms. But beware, merry people of The Book of Faces! For in a database, “identity” and “consumer profile” are one and the same thing. And that might just be the design goal…

I have admired the work of Geoffrey Bowker and Susan Leigh Star for quite a while; their co-authored book Sorting Things Out, especially, is a major step towards understanding how systems of classification structure fields of perception and, consequently, action. The study of advanced technology is intrinsically related to information handling (in the largest sense, ranging from human cognition to information science): building categories, models, languages, and metaphors is a major part of designing information systems, and with the ongoing infiltration of society by IT, the process of formalization (i.e. the construction of analytical categories that translate our messy world into manageable symbolic representations) has become a major difficulty in many software projects that concern human work settings. Ontology is ontology indeed, but very often “reality as phenomenon” resists being turned into “reality as model” – our social world is too complex and incoherent to fit into tidy data models. The incongruity between the two explains why there are so many competing classifications, models, and theories in the humanities and social sciences: no single explanation can claim to adequately cover even a small section of the cultural world. Our research is necessarily cumulative and tentative.

The categories and models used to build information systems are only propositions too, but they are certainly not (only) descriptive in nature. There is a peculiar performativity to information structures that are part of software, because they do not only affect people on the level of “ideas have impacts”. A scientific theory has to be understood, at least in part, in order to make a difference. When PageRank, which is basically a theory of the production of relevancy, became an implemented algorithm, there was no need for people to understand how it worked in order for it to become effective. Information technology relies on the reliable but brainless causality of the natural world to in-form the cultural world.

Why am I writing about this? The University of Vienna (my first alma mater) is organizing a workshop [German] on search engines before Google. And “before” should be read as “before digital technology” (think “library catalogue”). This is a very good idea because, instead of obsessing about the “effects” that IT might have (or not) on “society”, I believe we should take a step back and look at the categories, models, and theories that our information technologies are based on. And as a first step that means going back in time and researching the intellectual genealogy behind these nasty algorithms. The abstract I sent in (four days late, shame on me) proposes to look at early developments in bibliometrics that led to the development of impact analysis, which is the main inspiration for PageRank.

The proposal is part of this project on mathematics and the humanities that I’m fantasizing about, but that’s a story for another day.