…is so much easier if you’ve got a couple of popular pages to advertise on…
…and another one…

…browser wars all over again…
When it comes to search interfaces, there are a lot of good ideas out there, but there is also a lot of potential for further experimentation. Search APIs are a great field for experimentation as they allow developers to play around with advanced functionality without forcing them to work on a heavy backend structure.
Together with Alex Beaugrand, a student of mine, I built (a couple of months ago) another little search mashup / interface that allows users to switch between a tag cloud view and a list / cluster mode. contextDigger uses the delicious and Bing APIs to widen the search space using associated searches / terms and then Yahoo BOSS to download a thousand results that can be filtered through the interface. It uses the principle of faceted navigation to shorten the list: if you click on two terms, only the results associated with both of them will appear…
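For readers who want to see the faceted filtering idea in code, here is a minimal sketch in Python. It is not the code behind contextDigger; the result structure (a dict with a "title" and a list of "terms") and the function name filter_by_terms are assumptions made up for the illustration. The principle is simply a subset test: a result stays in the list only if it carries every term that has been clicked.

```python
from typing import Iterable


def filter_by_terms(results: list[dict], selected: Iterable[str]) -> list[dict]:
    """Keep only the results associated with every selected term."""
    wanted = {term.lower() for term in selected}
    # A result passes when the selected terms are a subset of its own terms.
    return [r for r in results if wanted <= {t.lower() for t in r["terms"]}]


# Hypothetical search results, each with its extracted terms.
results = [
    {"title": "A", "terms": ["graph", "network", "gephi"]},
    {"title": "B", "terms": ["graph", "search"]},
    {"title": "C", "terms": ["network", "graph"]},
]

narrowed = filter_by_terms(results, ["graph", "network"])
print([r["title"] for r in narrowed])  # ['A', 'C']
```

Each additional click can only shorten the list, and with no term selected the full list comes back unchanged.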
Since I started to play around with the latest (and really great, easy to use) version of the gephi graph visualization and analysis platform, I have developed an obsession with building .gdf output (.gdf is a graph description format that you can open with gephi) into everything I come across. The latest addition is a Facebook application called netvizz that creates a .gdf file describing either your personal network or the groups you are a member of.
There are of course many applications that let you visualize your network directly in Facebook, but by being able to download a file, you can choose your own visualization tool, play around with it, select and parameterize layout algorithms, change colors and sizes, rearrange by hand, and so forth. Toolkits like gephi are just so much more powerful than Flash toys…
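To give an idea of how little it takes to produce such a file, here is a small Python sketch that writes a node and an edge section in the .gdf layout. It is not the netvizz code; the function name to_gdf and the sample data are invented for the illustration, and the sketch only covers the two most basic columns. Since commas separate columns, the sketch simply replaces commas in labels with spaces, which is a simplification rather than a statement about how the format handles quoting.

```python
def to_gdf(nodes: dict[str, str], edges: list[tuple[str, str]]) -> str:
    """Serialize a graph as .gdf text: a node section, then an edge section."""
    lines = ["nodedef>name VARCHAR,label VARCHAR"]
    for node_id, label in nodes.items():
        # Simplification: drop commas so a label cannot break the column layout.
        lines.append(f"{node_id},{label.replace(',', ' ')}")
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR")
    lines.extend(f"{source},{target}" for source, target in edges)
    return "\n".join(lines) + "\n"


# Hypothetical three-person network.
nodes = {"n1": "Alice", "n2": "Bob", "n3": "Carol"}
edges = [("n1", "n2"), ("n2", "n3")]

gdf_text = to_gdf(nodes, edges)
print(gdf_text)
```

Writing gdf_text to a file with a .gdf extension should be enough to open the graph in gephi and start playing with layouts.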

my puny facebook network - gephi can process much larger graphs
What’s rather striking about these Facebook networks is how much the shape is connected to physical and social mobility. If you look at my network, you can easily see the Klagenfurt (my hometown) cluster on the far right, my studies in Vienna in the middle, and my French universe on the left. The small cluster on the top left documents two semesters of teaching at the American University of Paris…
Update: v0.2 of netvizz is out, allowing you to add some data for each profile. Next up is GraphML and Mondrian file support, more data for profiles, etc…
Update 2: netvizz currently only works with http and not https. I will try to move the app to a different server ASAP.
My colleague Theo Röhle and I went to the Computational Turn conference this week. While I would have preferred to hear a bit more on truly digital research methodology (in the fully scientific sense of the word “method”), the day was really quite interesting and the weather unexpectedly gorgeous. Most of the papers are available on the conference site; make sure to have a look. The text I wrote with Theo tried to structure some of the epistemological challenges and problems to take into account when working with digital methods. Here’s a tidbit:
…digital technology is set to change the way scholars work with their material, how they “see” it and interact with it. The question is, now, how well the humanities are prepared for these transformations. If there truly is a paradigm shift on the horizon, we will have to dig deeper into the methodological assumptions that are folded into the new tools. We will need to uncover the concepts and models that have carried over from different disciplines into the programs we employ today…
The question of how mathematics could lay the foundation for a machine that sustains such a wide variety of practices is really quite well understood from the point of view of the mathematical theory of computation. From a humanities standpoint, however, despite the number of texts commenting on the genius of key figures such as Gödel, Turing, Shannon, and Church, there is still a certain awkwardness when it comes to situating the key steps in mathematical reasoning that led up to the birth of the computer in the larger context of mathematics itself. One of the questions I find really quite interesting is the role of the formalist stance in mathematics.
In the philosophy of mathematics, there are many different positions. The realist stance, for example, holds that mathematical objects exist. For the platonist, they exist in some kind of extra spatio-temporal realm of ideas. For the physicalist, they are intrinsically connected to material existence, even if that relationship is not necessarily simple. Then there is formalism and this is where things get interesting. In a tale we can read in many social sciences and humanities books on the computer, there is the young Kurt Gödel who smashes the coherent world of the “establishment” mathematician David Hilbert, inventing in the process the metamathematical tools that will later prove essential for the practical realization of computing machinery. What is most often overlooked in that story is that Hilbert’s formalist position is already an extremely important step in the preparation for what is to come. For Hilbert, the question of the ontological status of mathematical objects is already a no-go – truth is no longer defined via any kind of correspondence to an external system but as a function of the internal coherence of the symbolic system. As Bettina Heintz says, Hilbert’s work rendered mathematical concepts “self-sufficient” (autark) by liberating them from any kind of external benchmark and opening a purely mechanical world where symbolic machinery can be built at will, like in a game.
If we want to think about computing today, I think we should remember this break from an ontological concept of truth to a purely formalistic one (even if Gödel put a pretty big crack in it later on). Because in a way, programming is like a “game” with formulas and if the algorithm works, that means it is “true”. In this sense, Google’s PageRank algorithm is true. But without the reference to an external system, this “truth” is purely mechanical, internal. In a similar way, an algorithm’s claim to objectivity, impartiality, or neutrality should be seen as internal only. The moment we apply mathematics to the description of some external mechanism (gravity, for example), there is a second truth criterion that intervenes, which refers to the establishment of correspondence between the formal system and the external reality. In the same way, if an algorithm is applied to, let’s say, the filtering of information, the formal world of the game is mapped onto another world. There is an important difference, however. When mathematics is applied to physical phenomena, the gesture is descriptive and epistemological (verb: is). When an algorithm is applied to tasks such as information filtering, the gesture is prescriptive and political (verb: ought).
The fact that an automatic procedure works makes it true in a formal sense. The moment we apply it to a certain task, other criteria intervene. Hilbert’s formalism pulled mathematics from the empirical world and if we bring the two together again by writing software, the criteria by which we judge the quality of that action should be seen as political because there are no mathematical criteria to judge the mapping of one world onto the other. No Hilbert to hold our hand…
Since Yahoo recently ~sold its search business to Microsoft (see this NYT article for details), a lot of people were asking themselves what would happen to the Yahoo search APIs, which are in fact some of the most powerful free tools out there to build search mashups with. As Simon Wilson indicates in this blog post, some of them (Term Extraction and Contextual Web Search) are closing down at the end of August. Programmable Web lists 33 mashups that use the Term Extraction service and these sites will either have to close down or start looking for alternatives. This highlights a problem that can be a true roadblock for developing applications making heavy use of APIs. My own termcloud search and its spiced up cousin contextdigger use Yahoo BOSS and quite honestly, if MS kills that service, these experiments (and many others) will be gone for good, because Yahoo BOSS is the only search API that provides a list of extracted keywords for each delivered Web result.
If service providers can close APIs at will, developers might hesitate when deciding whether to put in the necessary coding hours to build the latest mashup. But it is mashups that over the last years have really explored many of the directions left blank by “pure” applications. This creative force should be cherished and I wonder if there may be a need for something similar to Creative Commons for APIs – a legal construct that gives at least some basic rights to mashup developers…
Over the last couple of years, the social sciences have been increasingly interested in using computer-based tools to analyze the complexity of the social ant farm that is the Web. Issuecrawler was one of the first of such tools and today researchers are indeed using very sophisticated pieces of software to “see” the Web. Sciences-Po, one of these rather strange French institutions that were founded to educate the elite but which now have to increasingly justify their existence by producing research, has recently hired Bruno Latour to head their new médialab, which will most probably head in that very direction. Given Latour’s background (and the fact that Paul Girard, a very competent former colleague at my lab, heads the R&D department), this should be really very interesting. I do hope that there will be occasion to tackle the most compelling methodological question when it comes to the application of computers (or mathematics in general) to analyzing human life, which is beautifully framed in a rather reluctant statement from 1889 by Karl Pearson, a major figure in the history of statistics:
“Personally I ought to say that there is, in my own opinion, considerable danger in applying the methods of exact science to problems in descriptive science, whether they be problems of heredity or of political economy; the grace and logical accuracy of the mathematical processes are apt to so fascinate the descriptive scientist that he seeks for sociological hypotheses which fit his mathematical reasoning and this without first ascertaining whether the basis of his hypotheses is as broad as that human life to which the theory is to be applied.” cit. in. Stigler, Stephen M.: The History of Statistics. Harvard University Press, 1990 p. 304
This spring I worked on an R&D project that was really quite interesting but – as it happens with projects – took up nearly all of my spare time. La montre verte is based on the idea that pollution measurement can be brought down to street level if sensors can be made small enough to be carried around by citizens. Together with a series of partners from the private sector, the CiTu group of my laboratory came up with the idea to put an ozone sensor and a microphone (to measure noise levels) into a watch. That way, the device is not very intrusive and still in direct contact with the surrounding air. We built about 15 prototypes. Since Paris’ air quality is currently measured by only a handful of (really high quality) sensors, even the low resolution devices we have in our watches should be able to complement that data with a geographically more fine grained analysis of noise and pollution levels. The watch produces a georeferenced measurement (a GPS is built into the watch) every second and transmits the data via Bluetooth to a Java application on a mobile phone, which then sends every data packet via GPRS to a database server.
My job in the project was to build a Web application that allows people to interact with and make sense of the data produced by the watches. Despite the help from several brilliant students from our professional Masters program, this proved to be a daunting task and I spent *a lot* of time programming. The result is quite OK I believe; the application allows users to explore the data (which is organized in localized “experiments”) in different ways, either in real-time or afterward. With a little more time (we had only about three months for the whole project and we got the hardware only days before the first public showcase) we could have done more but I’m still quite content with the result. The heatmap (see image) algorithm was especially fun to program; I’ve never done a lot of visual stuff so this was new territory and a steep learning curve.
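For the curious, the basic idea behind turning georeferenced measurements into a heatmap can be sketched in a few lines of Python. This is not the algorithm used in the project, just one simple way to do it: lay a grid over the area, drop every measurement into its cell, and average per cell. The tuple layout (latitude, longitude, value), the function name grid_average, and the cell size of 0.001 degrees are all assumptions made for the illustration.

```python
import math
from collections import defaultdict


def grid_average(samples, cell_deg=0.001):
    """Average measurement values per grid cell.

    samples:  iterable of (lat, lon, value) tuples (assumed layout).
    cell_deg: side length of a grid cell in degrees (hypothetical default).
    Returns a dict mapping (row, col) cell indices to the mean value.
    """
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum of values, count]
    for lat, lon, value in samples:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        totals[cell][0] += value
        totals[cell][1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}


# Made-up readings: two close together, one a few hundred meters away.
samples = [
    (48.8566, 2.3522, 40.0),
    (48.8568, 2.3524, 60.0),
    (48.8605, 2.3522, 10.0),
]

heat = grid_average(samples)
for cell, mean in sorted(heat.items()):
    print(cell, mean)
```

A real heatmap would then map each cell's mean to a color and probably smooth between neighboring cells, which is where the visual work starts.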
Unfortunately, the strong emphasis on the technological side and the various problems we had (the agile methods one needs for experimental projects are still not understood by many companies) cut down the time for reflection to a minimum and did not allow us to come up with a deeper analysis of the social and political dimensions of what could be called “distributed urban intelligence”. The whole project is embedded in a somewhat naive rhetoric of citizen participation and the idea that technological innovation can solve social problems, in this case matters of urban planning and local governance. A lesson I have learned from this is that the current emphasis in funding on short-term projects that bring together universities and the industry makes it very difficult to carve out an actual space for scientific practice between all the deadlines and the heavy technical demands. And by scientific practice, I mean a *critical* practice that does not only try to base specifications and prototyping on “scientifically valid” approaches to building tools and objects but which includes a reflection on social utility that takes a wider view than just immediate usefulness. In the context of this project, this would have implied a close look at how urban development is currently configured in respect to environmental concerns in order to identify structures of governance and chains of decision-making. This way, the whole project could have targeted issues more clearly and consciously, fine-tuning both the tools and the accompanying discourse to the social dimension it aimed at.
I think my point is that we (at least I) have to learn how to better include a humanities-based research agenda into very high-tech projects. We have known for a long time now that every technical project is in fact a socio-technical enterprise but research funding and the project proposals that it generates are still pretending that the “socio-” part is some fluffy coating that decorates the manly material core where cogs and wires produce tangible effects. As a programmer I know how difficult and time-consuming technical work can be but if there is to be a conscious socio-technical perspective in R&D we have to accept that the fluffy stuff takes even more time – if it is done right. And to do it right means not only reading every book and paper relevant to a subject matter but also taking the time to reflect on methodology, to evaluate every step critically, to go back to the drawing board, and to include and to produce theory every step of the way. There is a cost to the scientific method and if that cost is not figured in, the result may still be useful, interesting, thought-provoking, etc. but it will not be truly scientific. I believe that we should defend these costs and show why they are necessary; if we cannot do so, we risk confining the humanities to liberal armchair commentary and the social sciences to ex-post usage analysis.
After having finished my paper for the forthcoming deep search book I’ve been going back to programming a little bit and I’ve added a feature to termCloud search, which is now v0.4. The new “show relations” button highlights the eight terms with the highest co-occurrence frequency for a selected keyword. This is probably not the final form of the feature but if you crank up the number of terms (with the “term+” button) and look at the relations between some of the less common words, there are already quite interesting patterns being swept to the surface. My next Yahoo BOSS project, termZones, will try to use co-occurrence matrices from many more results to map discourse clusters (sets of words that appear very often together), but this will need a little more time because I’ll have to read up on algorithms to get that done…
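The counting behind a feature like “show relations” can be sketched in Python as follows. This is not the termCloud search code; it assumes that each search result comes with a list of extracted keywords, and the function name top_cooccurring and the sample data are invented for the illustration. Two terms co-occur when they appear in the keyword list of the same result, and the function returns the k terms that do so most often with the selected keyword.

```python
from collections import Counter


def top_cooccurring(results, keyword, k=8):
    """Return the k terms that most often share a result with `keyword`.

    results: list of keyword lists, one list per search result (assumed layout).
    Returns a list of (term, count) pairs, most frequent first.
    """
    counts = Counter()
    for terms in results:
        unique = set(terms)  # count each term at most once per result
        if keyword in unique:
            counts.update(unique - {keyword})
    return counts.most_common(k)


# Hypothetical keyword lists for four results.
results = [
    ["search", "api", "yahoo"],
    ["search", "api"],
    ["search", "mashup", "api"],
    ["cloud", "tag"],
]

related = top_cooccurring(results, "search")
print(related)
```

Doing this for every pair of terms instead of a single keyword yields a full co-occurrence matrix, which is the kind of structure the cluster mapping mentioned above would start from.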
PS: termCloud Search was recently a “mashup of the day” at programmableweb.com…
Programmable Web just pointed to a really interesting mashup competition. Sunlight Labs announced the Apps for America contest and the idea is to attract programmers who will use a series of data APIs to “make Congress more accountable, interactive and transparent”. Among the criteria, two stand out:
- Usefulness to constituents for watching over and communicating with their members of Congress
- Potential impact of ethical standards on Congress
The design goal is accountability and that indeed is a perfect case for society-oriented design. While people in Europe love to scold the US for their lack of data protection and privacy laws, just looking at the APIs the contest proposes to use makes me salivate for something similar in France. If you look at the Capitol Words API for example, just imagine the kind of discourse analysis one could build on that. Representations of what is said in Congress that make the data digestible and bring at least some of the debate potentially closer to citizens. The whole thing is just a really great idea…