Two things currently stand out in my life: a) I’m working on an article on the relationship between mathematical network analysis and the humanities, and b) continental Europe is finally discovering Facebook. That a) is highly stimulating (some of the stuff I’m reading is just very decent scholarship; the journal Mathématiques et Sciences humaines [mostly French, some English] in particular is a source of wonder) and b) quite annoying (no, I don’t miss kindergarten) is of little importance here; there is, however, a connection between the two that I would like to explore a little bit.

Part of the research that I’m looking into is what has been called “The New Science of Networks” (NSN), a field founded mostly by physicists and mathematicians who started to quantitatively analyze very big networks from very different domains (networks of acquaintance, the Internet, food webs, brain connectivity, movie actor networks, disease spread, etc.). Sociologists have worked with mathematical analysis and network concepts since at least the 1930s, but because of the limits of available data, the networks studied rarely went beyond hundreds of nodes. NSN, however, studies networks with millions of nodes and tries to come up with representations of structure, dynamics, and growth that are not just used to make sense of empirical data but also to build simulations and develop models that are independent of specific domains of application.
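To make the idea of a domain-independent model concrete: the best-known example is probably preferential attachment, Barabási and Albert’s “rich get richer” growth model, where new nodes link to existing nodes with a probability proportional to their degree, producing the hubs observed in the Web, citation networks, and many other systems. A minimal sketch in Python (my toy version, not any reference implementation):

```python
import random

def preferential_attachment(n, m=2, seed=1):
    """Grow a graph: each new node attaches to m existing nodes,
    picked with probability proportional to their current degree."""
    random.seed(seed)
    edges = []
    targets = []          # each node appears once per edge end,
                          # so random.choice() is degree-biased
    core = range(m + 1)   # small fully connected starting core
    for i in core:
        for j in core:
            if i < j:
                edges.append((i, j))
                targets += [i, j]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(random.choice(targets))
        for t in chosen:
            edges.append((new, t))
            targets += [new, t]
    return edges

degree = {}
for a, b in preferential_attachment(10000):
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
print("max degree:", max(degree.values()))  # a few hubs dominate
```

The same thirty lines will generate a passable citation network, acquaintance network, or link graph; that indifference to domain is precisely what the physicists brought to the table.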

Very large data sets have only become available quite recently: social network data used to be based on either observation or surveys and was thus inherently limited in size. Since the arrival of digital networking, a lot more data has been produced, because many forms of communication or interaction leave analyzable traces. From newsgroups to trackback networks on blogs, very simple crawler programs suffice to produce matrices that include millions of nodes and can be played around with indefinitely, from all kinds of angles. Social network sites like Facebook or MySpace are probably the best example of data pools just waiting to be analyzed by network scientists (and marketers, but that’s a different story). This brings me to a naive question: what is a social network?
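Before tackling that question, a quick aside on just how simple such a crawler can be. The sketch below assumes a hypothetical fetch_links() function standing in for whatever actually extracts outgoing links (trackbacks, friend lists, hyperlinks) from a page:

```python
from collections import deque

def crawl(seeds, fetch_links, limit=1000000):
    """Breadth-first crawl that returns a sparse adjacency list
    {page: [linked pages]}. fetch_links(url) is a stand-in for
    any real link extractor (HTTP fetch + HTML parsing)."""
    adjacency = {}
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(adjacency) < limit:
        url = queue.popleft()
        links = fetch_links(url)
        adjacency[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return adjacency

# Toy stand-in for the network being crawled:
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(crawl(["a"], lambda u: toy_web.get(u, [])))
```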

The problem of creating data sets for quantitative analysis in the social sciences is always twofold: a) what do I formalize, i.e. what are the variables I want to measure? b) how do I produce my data? The question is that of building a representation. Do my categories represent the defining traits of the system I wish to study? Do my measuring instruments truly capture the categories I decided on? In short: what to measure and how to measure it, categories and machinery. The results of mathematical analysis (which is not necessarily statistical in nature) will only begin to make sense if formalization and data collection have been done with sufficient care. So, again, what is a social network?

Facebook (pars pro toto for the whole category, qua currently most annoying of the bunch) allows me to add “friends” to my “network”. By doing so, I am “digitally mapping out the relationships I already have”, as Mark Zuckerberg recently explained. So I am, indeed, creating a data model of my social network. Fifty million people are doing the same, so the result is a digital representation of the social connectivity of an important part of the Internet-connected world. From a social science research perspective, we could now ask whether Facebook’s social network (as database) is a good model of the social network (as social structure) it supposedly maps. This does, of course, depend on what somebody would want to study, but if you ask yourself whether Facebook is an accurate map of your social connections, you’ll probably say no. For the moment, the formalization and data collection that apply when people use a social networking site do not capture the whole gamut of our daily social interactions (work, institutions, groceries, etc.) and do not include many of the people that play important roles in our lives. This does not mean that Facebook would not be an interesting data set to explore quantitatively; but it means that there still is an important distinction between the formal model (data and algorithm, what? and how?) of “social network” produced by this type of information system and the reality of daily social experience.
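To see how thin the formalization actually is, consider what a friend list boils down to as a data structure. The sketch below is hypothetical (Facebook’s actual schema is obviously richer), but the point stands: “friendship” becomes a symmetric, unweighted, binary relation.

```python
# Hypothetical minimal "friend" model: a symmetric, unweighted,
# binary relation between user identifiers.
friends = set()

def add_friend(a, b):
    friends.add(frozenset((a, b)))

add_friend("me", "close colleague I see every day")
add_friend("me", "classmate I have not spoken to in ten years")

# Both ties are now formally identical. Nothing in the model
# encodes tie strength, frequency or context of contact, or the
# many relations (family, grocer, institutions) never entered.
print(len(friends))  # 2 indistinguishable "friendships"
```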

So what’s my point? Facebook is not a research tool for the social sciences and nobody cares whether the digital maps of our social networks are accurate or not. Facebook’s data model was not created to represent a social system but to produce a social system. Unlike the descriptive models of science, computer models are performative in a very materialist sense. As Baudrillard argues, the question is no longer whether the map adequately represents the territory, but in which way the map is becoming the new territory. The data model in Facebook is a model in the sense that it orients rather than represents. The “machinery” is not there to measure but to produce a set of possibilities for action. The social network (as database) is set to change the way our social network (as social structure) works – to produce reality rather than map it. But much as we can criticize data models in research for not being adequate to the phenomena they try to describe, we can examine data models, algorithms and interfaces of information systems and decide whether they are adequate for the task at hand. In science, “adequate” can only be defined in connection to the research question. In design and engineering there needs to be a defined goal in order to make such a judgment. Does the system achieve what I set out to achieve? And what is the goal, really?

When looking at Facebook and what the people around me do with it, the question of what “the politics of systems” could mean becomes a little clearer: how does the system affect people’s social network (as social structure) by allowing them to build a social network (as database)? What’s the (implicit?) goal that informs the system’s design?

Social networking systems are in their infancy and both technology and uses will probably evolve rapidly. For the moment, at least, what Facebook seems to be doing is quite simply to sociodigitize as many forms of interaction as possible; to render the implicit explicit by formalizing it into data and algorithms. But beware, merry people of The Book of Faces! For in a database, “identity” and “consumer profile” are one and the same thing. And that might just be the design goal…

I have admired the work of Geoffrey Bowker and Susan Leigh Star for quite a while; their co-authored book Sorting Things Out, especially, is a major step towards understanding how systems of classification structure fields of perception and, consequently, action. The study of advanced technology is intrinsically related to information handling (in the largest sense, ranging from human cognition to information science): building categories, models, languages, and metaphors is a major part of designing information systems, and with the ongoing infiltration of society by IT, the process of formalization (i.e. the construction of analytical categories that translate our messy world into manageable symbolic representations) has become a major difficulty in many software projects that concern human work settings. Ontology is ontology indeed, but very often “reality as phenomenon” does resist being turned into “reality as model” – our social world is too complex and incoherent to fit into tidy data models. The incongruity between the two explains why there are so many competing classifications, models, and theories in the humanities and social sciences: no single explanation can claim to adequately cover even a small section of the cultural world. Our research is necessarily cumulative and tentative.

The categories and models used to build information systems are only propositions too, but they are certainly not (only) descriptive in nature. There is a peculiar performativity to information structures that are part of software, because they do not affect people only on the level of “ideas have impact”. A scientific theory has to be understood, at least in part, in order to make a difference. When PageRank, which is basically a theory on the production of relevancy, became an implemented algorithm, there was no need for people to understand how it worked in order for it to become effective. Information technology relies on the reliable but brainless causality of the natural world to in-form the cultural world.
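The point is easy to miss because the algorithm itself is so compact. Here is a bare-bones power-iteration version of PageRank following the published formula (damping factor d = 0.85); a sketch, obviously, not Google’s production system:

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Power iteration over {node: [outgoing links]}: each node's
    rank is a base share plus damped contributions from the ranks
    of the nodes linking to it."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1.0 - damping) / n for node in nodes}
        for node, links in adjacency.items():
            if links:
                share = damping * rank[node] / len(links)
                for target in links:
                    new[target] += share
            else:
                for target in nodes:  # dangling node: spread evenly
                    new[target] += damping * rank[node] / n
        rank = new
    return rank

web = {"a": ["b"], "b": ["a", "c"], "c": ["a"], "d": ["a"]}
print(pagerank(web))  # "a" ends up on top: it receives most links
```

A theory of relevancy compressed into a loop: nobody querying a search engine needs to understand it for it to sort their world.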

Why am I writing about this? The University of Vienna (my first alma mater) is organizing a workshop [German] on search engines before Google. And “before” should be read as “before digital technology” (think “library catalogue”). This is a very good idea, because instead of obsessing about the “effects” that IT might have (or not) on “society”, I believe we should take a step back and look at the categories, models, and theories that our information technologies are based on. And as a first step that means going back in time and researching the intellectual genealogy behind these nasty algorithms. The abstract I sent in (four days late, shame on me) proposes to look at early developments in bibliometrics that led to the development of impact analysis, which is the main inspiration for PageRank.
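To give an idea of the genealogy: Garfield’s classic two-year impact factor is, at heart, just a normalized citation count (the notation below is mine),

$$
\mathrm{IF}_J(y) \;=\; \frac{C_J(y;\, y-1,\, y-2)}{P_J(y-1,\, y-2)}
$$

where $C_J(y;\, y-1,\, y-2)$ counts the citations received in year $y$ by items that journal $J$ published in the two preceding years, and $P_J(y-1,\, y-2)$ counts those citable items. PageRank keeps the basic intuition – being cited confers standing – but makes it recursive: a citation counts for more when it comes from a source that is itself highly cited.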

The proposal is part of this project on mathematics and the humanities that I’m fantasizing about, but that’s a story for another day.