<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Politics of Systems &#187; epistemolgy</title>
	<atom:link href="http://thepoliticsofsystems.net/category/epistemolgy/feed/" rel="self" type="application/rss+xml" />
	<link>http://thepoliticsofsystems.net</link>
	<description>Thoughts on Software, Power, and Digital Method</description>
	<lastBuildDate>Thu, 17 May 2012 08:10:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Google&#8217;s knowledge graph: here comes the ontology</title>
		<link>http://thepoliticsofsystems.net/2012/05/googles-knowledge-graph-here-comes-the-ontology/</link>
		<comments>http://thepoliticsofsystems.net/2012/05/googles-knowledge-graph-here-comes-the-ontology/#comments</comments>
		<pubDate>Thu, 17 May 2012 08:04:49 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[actor-network theory]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[critique]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[economy]]></category>
		<category><![CDATA[epistemolgy]]></category>
		<category><![CDATA[ontologies]]></category>
		<category><![CDATA[search engines]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=479</guid>
		<description><![CDATA[Yesterday, Google introduced a new feature, which represents a substantial extension to how their search engine presents information and marks a significant departure from some of the principles that have underpinned their conceptual and technological approach since 1998. The &#8220;knowledge graph&#8221; basically adds a layer to the search engine that is based on formal knowledge]]></description>
			<content:encoded><![CDATA[<p>Yesterday, Google <a href="http://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html">introduced </a>a new feature, which represents a substantial extension to how their search engine presents information and marks a significant departure from some of the principles that have underpinned their conceptual and technological approach since 1998. The &#8220;knowledge graph&#8221; basically adds a layer to the search engine that is based on formal knowledge modelling rather than word statistics (relevance measures) and link analysis (authority measures). As the title of the post on Google&#8217;s search blog aptly points out, the new features work by searching &#8220;things not strings&#8221;, because what they call the knowledge graph is simply a &#8211; very large &#8211; <a href="http://en.wikipedia.org/wiki/Ontology_%28information_science%29">ontology</a>, a formal description of objects in the world. Unfortunately, the roll-out is progressive and I have not yet been able to access the new features, but the descriptions, pictures, and video paint a rather clear picture of what product manager Johanna Wright calls the move &#8220;from an information engine to a knowledge engine&#8221;. In terms of the <a href="http://en.wikipedia.org/wiki/DIKW">DIKW model</a> (Data-Information-Knowledge-Wisdom), the new feature proposes to move up a layer by adding a box of factual information on a recognized object (the examples Google uses are the Taj Mahal, Marie Curie, Matt Groening, etc.) next to the search results. From the presentation, we can gather that the 500 million objects already referenced will include a large variety of things, such as movies, events, organizations, ideas, and so on.</p>
<p><img class="alignleft" title="Knowledge Graph" src="http://4.bp.blogspot.com/-6CZW79UMwyg/T7PKsKaiyyI/AAAAAAAAJK0/yj5a8qKknQg/s2000/marie%2Bcurie.png" alt="" width="640" height="286" /></p>
<p>This is a very significant extension to the current logic, and although we&#8217;ll need more time to try things out and get a better understanding of what this actually means, there are a couple of things that we can already single out:</p>
<ul>
<li>On a feature level, the fact box brings Google closer to &#8220;knowledge engines&#8221; such as <a href="http://www.wolframalpha.com/">Wolfram Alpha</a>, and as we learn from the explanatory video, this explicitly includes semantic or computational queries of the &#8220;how many women won the Nobel Prize?&#8221; type.</li>
<li>If we consider Wikipedia to be a similar &#8220;description layer&#8221;, the fact box can also be seen as a competitor to everybody&#8217;s favorite encyclopedia, which is a further step in the direction of bringing information directly to the surface of the results page instead of simply referring to a location. This means that users do not have to leave the Google garden to find a quick answer. It will be interesting to see whether this will actually show up in Wikipedia traffic stats.</li>
<li>The introduction of an ontology layer is a significant departure from the largely statistical and graph-theoretical methods favored by Google in the past. While features based on knowledge modelling have proliferated around the margins (e.g. in Google Maps and Local Search), the company is now bringing them to center stage. From what I understand, the selection of &#8220;facts&#8221; to display will be largely driven by user statistics, but the facts themselves come from places like <a href="http://www.freebase.com/">Freebase</a>, which Google bought in 2010. While large-scale ontologies were prohibitively expensive in the past, a combination of the availability of crowd-sourced databases (Wikipedia, etc.), the open data movement, better knowledge extraction mechanisms, and simply the resources to hire people to do manual repairs has apparently made them a viable option for a company of Google&#8217;s size.</li>
<li>Competing with the dominant search engine has just become a lot harder (again). If users like the new feature, the threshold for market entry moves up because this is not a trivial technical gimmick that can be easily replicated.</li>
<li>The knowledge graph will most certainly spread out into many other services (it&#8217;s already implemented in the new Google Docs <a href="http://arstechnica.com/business/2012/05/google-docs-new-sidebar-makes-research-faster/">research bar</a>), further boosting the company&#8217;s economies of scale and enhancing cross-navigation between the different services.</li>
<li>If the fact box &#8211; and the features that may follow &#8211; becomes a pervasive and popular feature, Google&#8217;s participation in making information and knowledge accessible, in defining its shape, scope, and relevance, will be further extended. This is a reason to worry a bit more, not because the Google tools as such are a danger, but simply because of the levels of institutional and economic concentration the Internet has enabled. The company has become what Michel Callon calls an &#8220;obligatory passage point&#8221; in our relation to the Web and beyond; the knowledge graph has the potential to exacerbate the situation even further.</li>
</ul>
<p>This development looks like another element in the war for dominance on the Web that is currently being fought at a frenetic pace. Since the introduction of <a href="https://developers.facebook.com/docs/opengraph/">actions</a> into Facebook&#8217;s social graph, it has become clear that approaches based on ontologies and concept modelling will play an increasing role in this. In a world mediated by screens, the technological control of <em>meaning</em> &#8211; the one true metamedium &#8211; is the new battleground. I guess that this is not what Berners-Lee had in mind for the Semantic Web&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2012/05/googles-knowledge-graph-here-comes-the-ontology/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>simple causality, ranking, and segmentation</title>
		<link>http://thepoliticsofsystems.net/2011/07/simple-causality-ranking-and-segmentation/</link>
		<comments>http://thepoliticsofsystems.net/2011/07/simple-causality-ranking-and-segmentation/#comments</comments>
		<pubDate>Thu, 28 Jul 2011 11:05:54 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[epistemolgy]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[technological determinism]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=373</guid>
		<description><![CDATA[While scholars often underline their commitment to non-deterministic conceptions of &#8220;effects&#8221;, models of causality in the human and social sciences can still be a bit simplistic sometimes. But a more subtle approach to causality would have to concede that, while most often cumulative and contradictory, lines of causation can sometimes be quite straightforward. Just consider]]></description>
			<content:encoded><![CDATA[<p>While scholars often underline their commitment to non-deterministic conceptions of &#8220;effects&#8221;, models of causality in the human and social sciences can still be a bit simplistic sometimes. But a more subtle approach to causality would have to concede that, while most often cumulative and contradictory, lines of causation can sometimes be quite straightforward. Just consider this example from <em>Commensuration as a Social Process</em>, a great <a href="http://steinhardt.nyu.edu/scmsAdmin/uploads/000/341/Commensuration.pdf">text</a> from 1998 by Espeland and Stevens:</p>
<blockquote><p>Faculty at a well-regarded liberal arts college recently received unexpected, generous raises. Some, concerned over the disparity between their comfortable salaries and those of the college&#8217;s arguably underpaid staff, offered to share their raises with staff members. Their offers were rejected by administrators, who explained that their raises were &#8216;not about them.&#8217; Faculty salaries are one criterion magazines use to rank colleges. (p.313)</p></blockquote>
<p>This is a rather direct effect of ranking techniques on something very tangible, namely salary. But the relative straightforwardness of the example also highlights a bifurcation of effects: faculty gets paid more, staff less. The specific construction of the ranking mechanism in question therefore produces social segmentation. Or does it simply reinforce the existing segmentation between faculty and staff that led college evaluators to construct the indicators the way they did in the first place? Well, there goes the simplicity&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2011/07/simple-causality-ranking-and-segmentation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>a fragment of simondon</title>
		<link>http://thepoliticsofsystems.net/2011/07/a-fragment-of-simondon/</link>
		<comments>http://thepoliticsofsystems.net/2011/07/a-fragment-of-simondon/#comments</comments>
		<pubDate>Sat, 16 Jul 2011 08:20:05 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[epistemolgy]]></category>
		<category><![CDATA[method]]></category>
		<category><![CDATA[philosophy]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=366</guid>
		<description><![CDATA[Simondon&#8217;s Du mode d&#8217;existence des objets techniques from 1958 is a most wondrous book. It is not only Simondon&#8217;s theory of technology in itself that fascinates me, but also the intimate closeness with particular technical objects that resonates through the whole text and marks a fundamental break with the Greek heritage of thinking about technology as a]]></description>
			<content:encoded><![CDATA[<p>Simondon&#8217;s <em>Du mode d&#8217;existence des objets techniques</em> from 1958 is a most wondrous book. It is not only Simondon&#8217;s theory of technology in itself that fascinates me, but also the intimate closeness with particular technical objects that resonates through the whole text and marks a fundamental break with the Greek heritage of thinking about technology as a unified and coherent force. When Simondon reasons over numerous pages on the difference between a diode and a triode, he accords significance to something that was considered insignificant by virtually every philosopher in history. By conferring a sense of <em>dignity</em> to technology, a certain <em>profoundness</em>, he is able to see heterogeneity and particularity where others before him just saw the declinations of the singular principle of <em>techné</em>. In a distinctly beautiful passage, Simondon argues that &#8220;technological thinking&#8221; itself is not totalizing but fragmenting:</p>
<blockquote><p>&#8220;L&#8217;élément, dans la pensée technique, est plus stable, mieux connu, et en quelque manière plus parfait que l&#8217;ensemble ; il est réellement un <em>objet</em>, alors que l&#8217;ensemble reste toujours dans une certaine mesure inhérent au monde. La pensée religieuse trouve l&#8217;équilibre inverse : pour elle, c&#8217;est la totalité qui est plus stable, plus forte, plus valable que l&#8217;élément.&#8221; (Simondon 1958, p. 175)</p></blockquote>
<p>And my translation:</p>
<blockquote><p>&#8220;In technological thinking, it is the element that is more stable, better known and &#8211; in a certain sense &#8211; more perfect than the whole; it is truly an <em>object</em>, whereas the whole always stays inherent to the world to a certain extent. Religious thinking finds the opposite balance: here, it is the whole that is more stable, stronger, and more valid than the element.&#8221;</p></blockquote>
<p>Philosophical thinking, according to Simondon, should strive to situate itself in the interval that separates the two approaches, technological thinking and religious thinking, concept and idea, plurality and totality, <em>a posteriori</em> and <em>a priori</em>. Here, the question of <em>How? </em>is not subordinate to the question of <em>Why? </em>because it is the former that connects us to the world that we inhabit as physical beings. Understanding technology means understanding how the two levels relate and constitute a <em>world</em>. There are two forms of ethics and two forms of knowledge that must be combined both intellectually and practically. Simondon obviously strives to do just that. I would argue that Philip Agre&#8217;s concept of <a href="http://polaris.gseis.ucla.edu/pagre/critical.html">critical technical practice</a> is another attempt at pretty much the same challenge.</p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2011/07/a-fragment-of-simondon/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>goldman-sachs, software critic</title>
		<link>http://thepoliticsofsystems.net/2011/05/goldman-sachs-software-critic/</link>
		<comments>http://thepoliticsofsystems.net/2011/05/goldman-sachs-software-critic/#comments</comments>
		<pubDate>Tue, 03 May 2011 10:23:21 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[economy]]></category>
		<category><![CDATA[epistemolgy]]></category>
		<category><![CDATA[mathematics]]></category>
		<category><![CDATA[technological determinism]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=332</guid>
		<description><![CDATA[In August 2010, Edinburgh sociologist Donald MacKenzie (whose book An Engine, not a Camera is an outstanding piece of scholarship) wrote an article in the Financial Times titled Unlocking the Language of Structured Securities where he discusses a software suite for financial analysis called Intex and compares it to a language that allows one to see]]></description>
			<content:encoded><![CDATA[<p>In August 2010, Edinburgh sociologist Donald MacKenzie (whose book <em><a href="http://books.google.com/books?id=M3x5tvAwzrQC">An Engine, not a Camera</a></em> is an outstanding piece of scholarship) wrote an <a href="http://www.sps.ed.ac.uk/__data/assets/pdf_file/0007/53998/ftaug10.pdf">article</a> in the Financial Times titled <em>Unlocking the Language of Structured Securities</em> where he discusses a software suite for financial analysis called <a href="http://www.intex.com/main/">Intex</a> and compares it to a <em>language</em> that allows one to see and interact with the world in certain ways rather than others. MacKenzie describes his first encounter with Intex as a moment of revelation that quickly turned into doubt:</p>
<blockquote><p>The psychological effect was striking: for the first time, I felt I could understand mortgage-backed securities. Of course, my new-found confidence was spurious. The reliability of Intex’s output depends entirely on the validity of the user’s assumptions about prepayment, default and severity. Nevertheless, it is interesting to speculate whether some of the pre-crisis vogue for mortgage-backed securities resulted from having a system that enabled neophytes such as myself to feel they understood them.</p></blockquote>
<p>While MacKenzie does not go as far as imputing the recent financial crisis to a piece of software, he points out that Intex is not recursive in its mode of analysis: when evaluating a complex financial asset, for example one of the now (in)famous <a href="http://en.wikipedia.org/wiki/Collateralized_debt_obligations">CDOs</a> that are made up of other assets, themselves combining further values, and so on, Intex does not follow the trail down to the basic entities (the individual mortgage) but calculates risk only from the rating of the asset in question. MacKenzie argues that Goldman-Sachs&#8217; 2006 decision to basically get out of mortgage-backed securities may well be a result of their commitment to go beyond available tools by implementing a (very costly) &#8220;bottom-up&#8221; approach that builds its evaluation of an asset by calculating up from the basic units of value. The house-of-cards character of these financial instruments could become visible by changing tools and thereby changing perspective or language. Software makes it possible to implement very different practices or languages and to make them pervasive; but how does a company choose one strategy over another? What are the organizational and &#8220;cultural&#8221; factors that led Goldman-Sachs to change its approach? These may be the truly challenging questions here, although they may never get answered. But they lead to a methodological lesson.</p>
<p>The particular strength of systems like Intex lies in their capacity to black-box evaluation strategies behind a neat interface that allows users to immediately operate on the underlying models, weaving these models into their decisions and practices. Conceptually, we understand the ways in which software shapes action better and better, but the empirical complexity of concrete settings is positively daunting even outside of the realm of financial markets. What I take from MacKenzie&#8217;s work is that in order to understand the role of software, we have to be very familiar with the specific terrain a system is embedded in, instead of bringing overarching assumptions to the table. Software is a means for building structure and this building is always happening in particular organizational settings that are certainly caught up in larger trends but also full of local challenges, politics, and knowledge. Programs are at the same time a structuring backdrop to practice and part of a strategic repertoire that actors have at their disposal.</p>
<p>The case of financial software indicates that market behavior standardizes around available tools, which leads to the systemic delegation of certain decision processes to software makers. This may result in a particular type of herd behavior and potentially in imbalance and crisis. Somewhat ironically, it is Goldman-Sachs that showed the potential of going against the grain by questioning programmed wisdom. That the company recently paid $550M in fines for abusing their analytical advantage by betting against a CDO they were selling to customers as an investment indicates that ethics and cunning are unfortunately two different pairs of shoes&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2011/05/goldman-sachs-software-critic/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>studying culture, the Google way</title>
		<link>http://thepoliticsofsystems.net/2011/03/studying-culture-the-google-way/</link>
		<comments>http://thepoliticsofsystems.net/2011/03/studying-culture-the-google-way/#comments</comments>
		<pubDate>Wed, 02 Mar 2011 12:01:44 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[database]]></category>
		<category><![CDATA[epistemolgy]]></category>
		<category><![CDATA[method]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=280</guid>
		<description><![CDATA[While there are probably a lot of people that have stumbled over the Google Ngram Viewer, it is safe to assume that fewer have read the paper (Science, January 2011) by Michel et al. that documents the project and gives a good idea of the kind of &#8220;big iron&#8221; science we can expect to capture quite]]></description>
			<content:encoded><![CDATA[<p>While there are probably a lot of people who have stumbled upon the <a href="http://ngrams.googlelabs.com/">Google Ngram Viewer</a>, it is safe to assume that fewer have read the <a href="http://www.sciencemag.org/content/331/6014/176.full">paper</a> (Science, January 2011) by Michel et al. that documents the project and gives a good idea of the kind of &#8220;big iron&#8221; science we can expect to capture quite a lot of attention over the next couple of years. According to the 14 authors (one being &#8220;The Google Books Team&#8221;, another <a href="http://pinker.wjh.harvard.edu/">Steven Pinker</a>), the project &#8211; fittingly termed <a href="http://www.culturomics.org/">culturomics</a> &#8211; is based on a sample of 5,195,769 books, which apparently represents roughly 4% of all the books ever published. The easiest way to show the scope of what the researchers aim to do is to quote the abstract in full:</p>
<blockquote><p>We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of ‘culturomics,’ focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.</p></blockquote>
<p>Next to the sheer size of the corpus, there are several things that are quite remarkable with this project:</p>
<p>1) While the paper is full of graphs, it is immensely interesting that many of the measurements taken can be &#8220;reenacted&#8221; with the <a href="http://ngrams.googlelabs.com/">Ngram Viewer</a>. In a passage that diagnoses &#8220;a greater focus on the present&#8221; in more recent publications, the authors show that the half-life (i.e. the number of years it takes for a date to get to half the frequency value of an initial peak) of dates gets much shorter over time. We can easily graph the result ourselves:<a href="http://ngrams.googlelabs.com/graph?content=1850%2C1900%2C1950%2C1970%2C1980%2C1990&amp;year_start=1800&amp;year_end=2008&amp;corpus=0&amp;smoothing=3"><img class="alignnone size-full wp-image-282" title="Ngram Viewer" src="http://thepoliticsofsystems.net/wp-content/uploads/2011/03/Screen-shot-2011-03-02-at-10.40.05-.png" alt="" width="580" /></a>This possibility to query the data ourselves (as well as the comprehensive data sharing) represents quite a change in how we can relate to the results as scholars and while only the most well-funded projects will be able to provide a &#8220;companion&#8221; data-tool, there is a real epistemological shift underway. From a teaching perspective, the hands-on approach may actually be even more valuable.</p>
<p>2) We increasingly have very comprehensive data sets at our disposal that can be used as concept markers in very different contexts. In this case, the authors used 740,000 names of persons from Wikipedia to study different aspects of fame. But one could easily imagine using <a href="http://www.geonames.org/">GeoNames</a> to perform a similar survey of the ebb and flow of geographic prominence. I am quite sure that linguists will soon bring together the Ngram data with <a href="http://wordnet.princeton.edu/">WordNet</a> to study concept evolution and other things.</p>
<p>3) While the examples developed in the article are fascinating &#8211; and there will certainly be many more &#8211; the epistemological horizon is quite vague for the moment. There is no question that historical linguistics will have a field day plunging into the data, but the intellectual rationale behind the project of culturomics is still a bit thin:</p>
<blockquote><p>Culturomics is the application of high-throughput data collection and analysis to the study of human culture. Books are a beginning, but we must also incorporate newspapers, manuscripts, maps, artwork, and a myriad of other human creations. Of course, many voices—already lost to time—lie forever beyond our reach.</p>
<p>Culturomic results are a new type of evidence in the humanities. As with fossils of ancient creatures, the challenge of culturomics lies in the interpretation of this evidence.</p></blockquote>
<p>I would argue that it is not so much the interpretation of evidence that represents a challenge but the integration of these new computer-based approaches into meaningful research agendas that ask non-trivial questions. While it may be interesting to be able to attach a number to the competence of Nazi censorship efforts, this competence was never very much in doubt, and while numbers and graphs may confer an aura of scientific respectability, the findings will most probably not add anything to our understanding of National Socialism.</p>
<p>While it is increasingly unpopular to cite Snow&#8217;s <a href="http://www.nytimes.com/2009/03/22/books/review/Dizikes-t.html">Two Cultures</a>, this early proposal for a quantitative approach to culture (in its historic dimension) will give rise to all kinds of polemics, misunderstandings, and demarcation efforts. The public availability of a query tool is, however, a real reason for hope: humanities scholars will be able to try it out for themselves and, with a bit of luck, we will have a broader view on its usefulness for cultural analysis in a couple of months.</p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2011/03/studying-culture-the-google-way/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>counting and counting</title>
		<link>http://thepoliticsofsystems.net/2010/11/counting-and-counting/</link>
		<comments>http://thepoliticsofsystems.net/2010/11/counting-and-counting/#comments</comments>
		<pubDate>Thu, 11 Nov 2010 17:26:14 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[critique]]></category>
		<category><![CDATA[epistemolgy]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=246</guid>
		<description><![CDATA[The use of computers in the humanities has a long and fine history. What is striking though is how lucidly scholars reflected on their tools even in the earliest days. Here&#8217;s a beautiful quotation from Irwin C. Lieb, taken from a text published in the inaugural issue of Computers and the Humanities, a journal started]]></description>
			<content:encoded><![CDATA[<p>The use of computers in the humanities has a long and fine history. What is striking though is how lucidly scholars reflected on their tools even in the earliest days. Here&#8217;s a beautiful quotation from Irwin C. Lieb, taken from a <a href="http://www.springerlink.com/content/ut48k13858644j32/">text</a> published in the inaugural issue of <em><a href="http://www.springerlink.com/content/100251/">Computers and the Humanities</a></em>, a journal started in 1966.</p>
<blockquote><p>The great advances which have so far been made with computers have been in those fields where we find countable items or have ready substitutes for them. The real or seeming extraneousness of computer studies for the humanities is owed to the fact that, in the humanities, what are most important are, if items at all, items that we can&#8217;t count, or can count only most artificially. We know, for example, how little definite we mean in saying that we have two or three ideas, that there are four themes in a play, or that there were this or that number of historical events. Our &#8220;counting&#8221; is not the counting of items that were somehow there separate, waiting to be pointed out; it is a &#8220;counting&#8221; in which judgments themselves mark out what come to be the items that we count. Apart from the judgments, there are no separate items. Therefore, no technique of counting such items so as to yield, for the first time, a judgment or a summary is possible at all. But, granting that this sort of limitation is inescapable, computers could, it seems, still come to have a more vital use in the humanities than we have seen so far.</p>
<p>[...]</p>
<p>The suggestion, then, is that some of the simplest but most important work to be done in deepening the usefulness of computers for the humanities will be in imagining those <em>schemas </em>by which we will model what we know cannot be modeled undistortedly: &#8212; ideas, themes, events and even more importantly, insights, appraisals, and appreciations. There are, there must be, revealing models for all of these. And as we think of them, and then use them in the humanities, the achievement for us will come as we feel out just what the distortions are, as we make the right mistakes. For as we see them as mistakes, we will penetrate further and still more appreciate what we are most concerned to understand. With the possibilities for computer studies of depth and importance in the humanities seeming still so genuine, it would be a mistake, I think, to curtail our exploration of them soon.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2010/11/counting-and-counting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Jacques Barzun on the Digital Humanities</title>
		<link>http://thepoliticsofsystems.net/2010/10/jacques-barzun-on-the-digital-humanities/</link>
		<comments>http://thepoliticsofsystems.net/2010/10/jacques-barzun-on-the-digital-humanities/#comments</comments>
		<pubDate>Fri, 15 Oct 2010 07:01:31 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[computing]]></category>
		<category><![CDATA[critique]]></category>
		<category><![CDATA[epistemolgy]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=232</guid>
		<description><![CDATA[Some debates are just so much older than our short forgetful minds allow us to recognize. In 1965 Jacques Barzun (still alive today at a biblical 102!) made the following statement: What have the humanities been doing for thirty-five years except to do exactly what a computer would do, only with their own unaided card]]></description>
			<content:encoded><![CDATA[<p>Some debates are just so much older than our short forgetful minds allow us to recognize. In 1965 Jacques Barzun (still alive today at a biblical 102!) made the following statement:</p>
<blockquote><p>What have the humanities been doing for thirty-five years except to do exactly what a computer would do, only with their own unaided card indexes and fountain pens? They have taken apart poetry, they have taken apart novels, they have counted images, they have followed symbols that are sometimes non-existent, they have destroyed their own subject matter by a pseudo-computer-like approach, and now they have only themselves to blame if they have to learn the tricks and the jargon of computerizing. (Jacques Barzun at a conference at Yale University, cited in Taviss (ed.), The Computer Impact, 1970, p.199)</p></blockquote>
<p>While I have not found the original document of Barzun&#8217;s talk, Bowler (ed.), Computers in Humanistic Research, 1967, p.232 has a summary of his three main points of critique:</p>
<blockquote><p>First is the assumption of a false relation between the units defined and written and the reality they are supposed to represent. For example, 20 years ago, someone attempted to study genius by selecting names from <em>Who&#8217;s Who in America</em>, as being indicative of the quality of genius. Second is the fallacy of assessing importance by weight or numbers. The speaker mentioned a published census, again some 20 years ago, which indicated that the number of brownstone or frame houses in New York was much larger than the number of skyscrapers, giving the erroneous impression that the former represented the city&#8217;s characteristic architectural form. The third error is the attribution of meaning based upon only a partial study of the object in question. Two conspicuous examples of the faulty attribution of meaning to partial signs are the cases of machine translation and the objective tests given to school children and the people in business.</p></blockquote>
<p>Would it be very hard to find contemporary examples that fit these three points?</p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2010/10/jacques-barzun-on-the-digital-humanities/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>graphs based on word vector similarity and the trickyness of parameters</title>
		<link>http://thepoliticsofsystems.net/2010/10/graphs-based-on-word-vector-similarity-and-the-trickyness-of-parameters/</link>
		<comments>http://thepoliticsofsystems.net/2010/10/graphs-based-on-word-vector-similarity-and-the-trickyness-of-parameters/#comments</comments>
		<pubDate>Sun, 10 Oct 2010 08:38:52 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[epistemolgy]]></category>
		<category><![CDATA[mathematics]]></category>
		<category><![CDATA[network theory]]></category>
		<category><![CDATA[social networks]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=197</guid>
		<description><![CDATA[What is a link? From a methodology standpoint, there is no answer to that question but only the recognition that when using graph theory and associated software tools, we project certain aspects of a dataset as nodes and others as links. In my last post, I &#8220;projected&#8221; authors from the air-l list as nodes and]]></description>
			<content:encoded><![CDATA[<p>What is a link? From a methodology standpoint, there is no answer to that question but only the recognition that when using graph theory and associated software tools, we project certain aspects of a dataset as nodes and others as links. In my <a href="http://thepoliticsofsystems.net/2010/10/06/one-network-and-four-algorithms/">last post</a>, I &#8220;projected&#8221; authors from the air-l list as nodes and mail-reply relationships as links. In the example below, I still use authors as nodes but links are derived from a similarity measure of a statistical analysis of each poster&#8217;s mails. Here are two <a href="http://gephi.org/">gephi</a> graphs:</p>
<p><a href="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/airl_vectorspace_graph.png"><img class="alignnone size-full wp-image-199" title="airl_vectorspace_graph_small" src="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/airl_vectorspace_graph_small.png" alt="" width="500" height="276" /></a></p>
<p>If you are interested in the technique, it&#8217;s a simple similarity measure based on the <a href="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/airl_vectorspace_graph.png">vector-space model</a> and my amateur computer scientist&#8217;s PHP implementation can be found <a href="http://code.google.com/p/vectorspacesimilarity/">here</a>. The fact that the two posters who changed their &#8220;from:&#8221; text have both of their accounts close together (can you find them?) is a good indication that the algorithm is not <em>completely</em> botched. The words floating on the links in the right-hand graph are those that contribute the highest value to the similarity calculation for each pair, i.e. words that both linked authors use relatively often while being generally rare in the whole corpus. Elis Godard and Dana Boyd, for example, have both written on air-l about Ron Vietti, a pastor who (rightfully?) thinks the Internet is the devil, and because very few other people mentioned the holy warrior, the word &#8220;vietti&#8221; is the highest-value &#8220;binder&#8221; between the two.</p>
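<p>For readers who want to see the principle in a few lines, here is a minimal sketch in Python (not the PHP implementation linked above). The whitespace tokenizer, the TF-IDF weighting, the three toy &#8220;authors&#8221;, and all function names are assumptions made for illustration; the point is only that a word shared by two authors but rare in the corpus dominates their similarity score.</p>

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each author to a {word: tf-idf weight} vector.

    docs: dict of author -> all of that author's text as one string.
    """
    counts = {author: Counter(text.lower().split()) for author, text in docs.items()}
    n_docs = len(counts)
    # Document frequency: in how many authors' texts does a word occur?
    df = Counter(word for c in counts.values() for word in c)
    return {
        author: {w: tf * math.log(n_docs / df[w]) for w, tf in c.items()}
        for author, c in counts.items()
    }

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v[w] for w, weight in u.items() if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_binder(u, v):
    """Shared word that contributes most to the dot product, or None."""
    shared = [w for w in u if w in v]
    return max(shared, key=lambda w: u[w] * v[w], default=None)

# Toy corpus (made up): "internet" and "research" occur everywhere,
# so they carry no weight; "vietti" is shared by only two authors.
docs = {
    "a": "vietti pastor internet research",
    "b": "vietti devil internet research",
    "c": "internet research methods survey",
}
vectors = tfidf_vectors(docs)
sim_ab = cosine(vectors["a"], vectors["b"])
sim_ac = cosine(vectors["a"], vectors["c"])
binder_ab = top_binder(vectors["a"], vectors["b"])
```

<p>In this toy corpus, authors a and b end up more similar than a and c, and the &#8220;binder&#8221; between a and b is the rare shared word.</p>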
<p>What is important in networks that are the result of heavily iterative processing is that the algorithms used to create them are full of parameters, and changing one of these parameters just a little bit may (!) have larger repercussions. In the example above I actually calculate a similarity measure between every pair of nodes (roughly 60^2 / 2 results), but in order to make the graph somewhat readable I inserted a threshold that boils it down to 637 links. The missing measures are not taken into account in the physics simulation that produces the layout &#8211; although they may (!) be significant. I changed the parameter a couple of times to get the graph &#8220;right&#8221;, i.e. to find a good compromise between link density for simulation and readability. But look at what happens when I raise the threshold so that only the 100 strongest similarity measures survive:</p>
<p><a href="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/airl_vectorspace_harsh.png"><img class="alignnone size-full wp-image-209" title="airl_vectorspace_harsh_small" src="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/airl_vectorspace_harsh_small.png" alt="" width="300" height="298" /></a></p>
<p>First, a couple of nodes disconnect, two binary stars form around the &#8220;from:&#8221; changers and the large component becomes a lot looser. Second, Jeremy Hunsinger loses the highest PageRank to Chris Heidelberg. Hunsinger had more links when lower similarity scores were taken into account, but when things get rough in the network world, bonding is better than bridging. What is <em>result</em> and what is <em>artifact</em>?</p>
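<p>The effect of such a cut-off can be sketched with a toy example in Python. The similarity scores are made up, and a plain link count stands in for PageRank, so this is an illustration of the mechanism under those assumptions rather than a reconstruction of the graphs above.</p>

```python
from collections import Counter

# Hypothetical similarity scores between four authors (made-up numbers).
similarity = {
    ("A", "B"): 0.90,
    ("C", "D"): 0.80,
    ("A", "C"): 0.35,
    ("A", "D"): 0.30,
    ("B", "C"): 0.10,
    ("B", "D"): 0.05,
}

def strongest_links(similarity, k):
    """Keep only the k pairs with the highest similarity."""
    ranked = sorted(similarity.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])

def degrees(links):
    """Count how many surviving links touch each node."""
    counts = Counter()
    for a, b in links:
        counts[a] += 1
        counts[b] += 1
    return counts

loose = degrees(strongest_links(similarity, 4))  # generous cut-off
harsh = degrees(strongest_links(similarity, 2))  # only the two strongest links
```

<p>With the generous cut-off, A is the best-connected node because of its many weak ties; with the harsh one, the graph falls apart into two tightly bound pairs and A is no more central than anyone else.</p>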
<p>Most advanced algorithmic techniques are riddled with such parameters and getting a &#8220;good&#8221; result not only implies fiddling around a lot (how do I clean the text corpus, what algorithms to look for what kind of structures or dynamics, what parameters, what type of representation, here again, what parameters, and so on&#8230;) but also having implicit ideas about what kind of result would be &#8220;plausible&#8221;. The back and forth with the &#8220;algorithmic microscope&#8221; is always floating against a backdrop of &#8220;domain knowledge&#8221; and this is one of the reasons why the idea of a science based purely on data analysis is positively absurd. I believe that the key challenge is to stay clear of methodological monoculture and to articulate different approaches together whenever possible.</p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2010/10/graphs-based-on-word-vector-similarity-and-the-trickyness-of-parameters/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>one network and four algorithms</title>
		<link>http://thepoliticsofsystems.net/2010/10/one-network-and-four-algorithms/</link>
		<comments>http://thepoliticsofsystems.net/2010/10/one-network-and-four-algorithms/#comments</comments>
		<pubDate>Wed, 06 Oct 2010 13:54:26 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[epistemolgy]]></category>
		<category><![CDATA[network theory]]></category>
		<category><![CDATA[social networks]]></category>
		<category><![CDATA[softwareproject]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=176</guid>
		<description><![CDATA[The Association of Internet Researchers (AOIR) is an important venue if you&#8217;re interested in, like the name indicates, Internet research. But it is also a good primary source if one wants to inquire into how and why people study the Internet, which aspects of it, etc. Conveniently for the lazy empirical researcher that I am,]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://aoir.org">Association of Internet Researchers</a> (AOIR) is an important venue if you&#8217;re interested in, like the name indicates, Internet research. But it is also a good primary source if one wants to inquire into how and why people study the Internet, which aspects of it, etc. Conveniently for the lazy empirical researcher that I am, the AOIR has an <a href="http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org">archive</a> of its mailing-list, which has about 22K mails posted by 3K addresses, enough for a little playing around with the impatient person&#8217;s tool, the algorithm. I have downloaded the data and I hope I can motivate some of my students to build something interesting with it, but I just had to put it into <a href="http://gephi.org/">gephi</a> right away. Some of the tools we&#8217;ll hopefully build will concentrate more on text mining but using an address as a node and a mail-reply relationship as a link, one can easily build a social graph.</p>
<p>I would like to take this example as an occasion to show how different algorithms can produce quite different views on the same data:</p>
<p><a title="4 network layout algorithms" href="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/4-algos-big.png"><img class="alignnone size-full wp-image-177" title="4 algos small" src="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/4-algos-small.png" alt="" width="500" height="396" /></a></p>
<p>So, these are the air-l posters with more than 60 messages posted since 2001. Node size indicates the number of posts, a node&#8217;s color (from blue to red) shows its connectivity in the graph (click on the image to see a much larger version). Link strength, i.e. number of replies between two people, is taken into account. You can download the full <a href="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/social_undirected.gdf">.gdf</a> here. The only difference between the four graphs is the layout algorithm used (Force Atlas, Force Atlas with attraction distribution, Yifan Hu, and Fruchterman Reingold). You can instantly notice that Yifan Hu pushes nodes with low link count much more strongly to the periphery than the others, while Fruchterman Reingold as always keeps its symmetrical sphere shape, suggesting a more harmonious picture than the rest. Force Atlas&#8217; attraction distribution feature will try to differentiate between <a href="http://en.wikipedia.org/wiki/Hubs_and_authorities">hubs and authorities</a>, pushing the former to the periphery while keeping the latter in the center; just compare Barry Wellman&#8217;s position over the different graphs.</p>
<p>I&#8217;ll probably repeat this experiment with a more segmented graph, but I think this already shows that layout algorithms are not just innocently rendering a graph readable. Every method puts some features of the graph to the forefront and the capacity for critical reading is as important as the willingness for &#8220;critical use&#8221; that does not gloss over the differences in tools used.</p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2010/10/one-network-and-four-algorithms/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Reading Technology 1: Edgar F. Codd, A Relational Model of Data for Large Shared Data Banks, 1970</title>
		<link>http://thepoliticsofsystems.net/2010/09/reading-technology-1-edgar-f-codd-a-relational-model-of-data-for-large-shared-data-banks-1970/</link>
		<comments>http://thepoliticsofsystems.net/2010/09/reading-technology-1-edgar-f-codd-a-relational-model-of-data-for-large-shared-data-banks-1970/#comments</comments>
		<pubDate>Fri, 01 Oct 2010 06:46:06 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[computing]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[epistemolgy]]></category>
		<category><![CDATA[reading technology]]></category>
		<category><![CDATA[technological determinism]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=170</guid>
		<description><![CDATA[This blogpost is somewhat of an experiment that I hope will turn into a series. I have started to work seriously on a book that will suggest a somewhat different take on understanding computing and particularly contemporary software deployed on the Internet. A large part of that work consists of historical analysis and in this]]></description>
			<content:encoded><![CDATA[<p>This blogpost is somewhat of an experiment that I hope will turn into a series. I have started to work seriously on a book that will suggest a somewhat different take on understanding computing and particularly contemporary software deployed on the Internet. A large part of that work consists of historical analysis and in this context I am (re)reading many of the seminal papers of the information and computer sciences. What is striking about these texts is not only their content but their far-reaching influence on the landscape of technological concepts and, often enough, on the actual technological developments that followed. Writing software today is in most cases an articulation that takes place in an extremely dense space of established languages, APIs, frameworks, and libraries but also of concepts, methodologies, best practices, tacit assumptions, strategies, and community rules. There is so much &#8220;old&#8221; in every &#8220;new&#8221; but many concepts have become so pervasive, so dominant that we no longer see them as the particularities they in fact are. Being canonical, they become second nature. But many of these path-defining moments can be retraced and given the pervasiveness of computers today, an archeology of computing is, in a way, an archeology of our culture.</p>
<p>One of the ways to do such an archeology may simply consist in trying to read seminal computer and information science papers sideways, not (only) as technological proposals, but as political and cultural <em>projects</em> that combine a (most often critical) analysis of a status quo with a prescriptive take on what a more ideal setting could/should look like. Technology is, in that sense, a way of relating to society, a means of contributing that is political in a very different way than the traditional arenas of governance and debate. What I would like to suggest is that this aspect of technological writing (science papers but also reports, RFCs, norms, proposals, documentation, etc.) is not examined nearly enough, particularly when it comes to techniques that are related to software. Our view of technology is still very much shaped by the physical machine &#8211; the box, the screen, the keyboard &#8211; perhaps also because these physical parts are closer to our bodies, more visible and easier to integrate into the cognitive practices of a culture that, paradoxically, is able to produce extremely sophisticated mechanisms while being quite inept when it comes to understanding the role technical objects play in constituting its very fabric.</p>
<p>In my view, the central mistake is to assimilate technology to <em>techné</em> and be done with it. Perhaps I am wrong, but I cannot shake the feeling that very few scholars in the humanities and social sciences are prepared to accord to technological creation the same depth, complexity, variety, the same imbrication in society, the same amount of &#8220;humanity&#8221; as literature or artistic creation in general. This unwillingness to really engage technology beyond the surface leads to the familiar reflex-like reactions, both positive and negative, that seem to dominate public debates on &#8220;hot&#8221; topics like social networking, privacy on the Internet, or computer games.</p>
<p>So what I am looking for is a different way of understanding technology that subscribes neither to an engineering perspective concerned with function nor to a purely &#8220;culturalist&#8221; analysis that sees only imaginaries, symbols, and metaphors, thereby risking losing the machine in the machine. So, today, first try and why not start with a big one.</p>
<p>In 1970, <a href="http://en.wikipedia.org/wiki/Edgar_F._Codd">Edgar F. Codd</a>, a British computer scientist who moved to the US in the 1940s, published one of the most influential papers in the history of computer science, <em>A Relational Model of Data for Large Shared Data Banks</em> (<a href="http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf">available here</a>, doi:<a href="http://dx.doi.org/10.1145%2F362384.362685">10.1145/362384.362685</a>), in which he proposed a concept for the construction of database systems built around the central idea of <em>separating the logical organization of information</em> from the way it is stored on a physical storage medium. While the usefulness of such a separation may seem very obvious from today&#8217;s viewpoint, Codd&#8217;s paper stirred a virulent debate and his employer, IBM, was quite reluctant when it came to turning the proposal into a product (it took eight years for the first relational database system to make it to the market). When discussing Codd&#8217;s work, we should be very suspicious of the popular narratives of technological development as a series of inventions, or worse, <em>ideas</em>. To separate logical organization from physical storage had been a common practice in libraries for a long time: the library catalogue, in combination with some basic shelf logistics, allows for very different ways of recording books &#8211; alphabetically, by subject, and so on. But technologies are not simply ideas; Gene Roddenberry did not <em>invent</em> beaming. As science and technology studies have shown many times, a successful scientific &#8220;discovery&#8221; or a technological &#8220;invention&#8221; is somewhat of a &#8220;perfect storm&#8221;: many pieces have to fall into place, many different actors have to be mobilized, and most often there is talking, writing, demonstrating, debating, and a whole lot of fuss.
As computer history shows, having an idea (Babbage) or even building a functioning machine (Zuse) may simply not be enough to establish a technology. Since the industrial revolution, technologies are increasingly often systems that require logistics, markets, organizational reform, or an <a href="http://en.wikipedia.org/wiki/Installed_base">installed user base</a>. In our case, the really interesting thing is not necessarily the abstract <em>idea</em> for what has become today&#8217;s omnipresent relational database, but the way Codd <em>builds</em> an idea into a technological concept, as an argument as well as a potential system. To start, let&#8217;s quote the abstract in full:</p>
<blockquote><p>Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information.<br />
Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user&#8217;s model. (p. 377)</p></blockquote>
<p>First of all, who are these users that have to be &#8220;protected&#8221;? In 1970, this is obviously not (yet) the manager sitting in front of a screen and keyboard but rather the application programmer who will implement the &#8220;query, update, and report&#8221; functions every larger organization relies on for management. These users/programmers had been forced to make changes in storage structures whenever requirements changed in a significant way. This was not just an onerous task but also a source of potentially crippling problems, as every adaptation risked breaking existing applications. Without explicit reference, Codd&#8217;s work is directly related to what has come to be known as the &#8220;<a href="http://en.wikipedia.org/wiki/Software_crisis">software crisis</a>&#8221; that led to the emergence of software engineering. The separation of systems into black-boxed modules that communicated via well-specified interfaces was one of the solutions put forward to counter the explosion of complexity that followed the introduction of computers into large-scale, real-world (business) organizations. Seen in this light, the relational model and the concept of &#8220;data independence&#8221; (p. 377) together form an extremely powerful agent for the division of labor that cleanly separates the engineering of a database system from the specification of data structures, adding to the groundwork for the concept of end-user software that we know today.</p>
<p>So what is Codd&#8217;s proposal? For a reader trained in the humanities trying to read a paper like the one in question (even the first half, which does not use any formal notation), adaptations to the habitual reading style have to be made to get something useful out of it. Much like mathematics, computer science deploys language quite differently than the humanities (except for analytical philosophy): language, here, is not (only) narrative and argumentative, it aims at building a <em>demonstration</em>, which is most certainly a rhetorical form, but a very formal one that follows a convention consisting of laying out a space of thinking through a series of very precise definitions, which often attribute quite specific significations to words taken from everyday language. Miss one of these definitions and the whole pyramid crumbles. In Codd&#8217;s case, the basic building block is the concept of <em>relation</em> (taken from mathematical set theory, like most reasoning about databases), which designates a basic form for structuring data where every abstract entity is composed of a series of attributes. This data structure can be &#8220;filled&#8221; with entries (rows). If you&#8217;re familiar with SQL (today&#8217;s standard query language, derivative of Codd&#8217;s work), relation (or rather <em>relationship</em>, the unordered version of relation in Codd&#8217;s paper; nowadays, relation is used for Codd&#8217;s relationship and I&#8217;ll follow that convention) is simply the structure of a table. In practice, Codd suggests building databases that represent all data in a form that looks like this:</p>
<pre>students:  name  email           major
           Jack  jack@email.com  history
           Mary  mary@email.com  science</pre>
<p>Here, students is a relation composed of three attributes (name, email, major). Jack is a row (entry), Mary is another one. What was new in this definition is obviously not the notion of the table, but rather the idea to define a relation as a purely abstract and unordered structure, a logical construct that did not specify in any way how it was to be stored on a physical medium. An important indicator for this decoupling is Codd&#8217;s comment that &#8220;the ordering of rows is immaterial&#8221; (p. 379). Without stating it explicitly, Codd shifts the construction of order from the storage to the query. More on this later.</p>
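<p>One way to get a feel for &#8220;unordered&#8221; is to model the relation as a mathematical set. The following Python sketch is only an analogy (the Student type and variable names are assumptions, and a real database stores its data very differently): two sets with the same rows are the same relation, and order only appears once somebody asks for it.</p>

```python
from collections import namedtuple

# One row of the relation: a tuple of named attributes.
Student = namedtuple("Student", ["name", "email", "major"])

# A relation modeled as a set of rows: no duplicates and no inherent order.
students = {
    Student("Jack", "jack@email.com", "history"),
    Student("Mary", "mary@email.com", "science"),
}
same_students = {
    Student("Mary", "mary@email.com", "science"),
    Student("Jack", "jack@email.com", "history"),
}

# Order is constructed at "query" time, and differently for each question.
by_name = sorted(students, key=lambda row: row.name)
by_major_desc = sorted(students, key=lambda row: row.major, reverse=True)
```

<p>The two sets compare as equal although their rows were written down in a different sequence; the sorted lists are products of the question asked, not properties of the stored data.</p>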
<p>The second key concept is the notion of <em>primary key</em> and its corollary, the<em> foreign key</em>. Let&#8217;s add a primary key to our table:</p>
<pre>students:  id   name  email           major
           1    Jack  jack@email.com  history
           2    Mary  mary@email.com  science</pre>
<p>The primary key is a way of addressing a row of data unambiguously (student #1 is Jack and no other student, keys have to be unique). The idea of a foreign key means to simply use a primary key in another table. Instead of doubling information (which may lead to all kinds of update problems as well as storage overhead), we&#8217;re simply &#8220;pointing&#8221; from one table to another. Take the relation (table) &#8220;grades&#8221;:</p>
<pre>grades:   students.id  english  history  geography
          1            C        C        C
          2            B        B        B</pre>
<p>In this case, students.id (relation.attribute is the notation we still use today) is the foreign key linking to the primary key of our &#8220;students&#8221; relation. In practice this means that Jack had all Cs and Mary all Bs in the three classes they took. Codd shows that using this concept of primary/foreign key, very complex organizations of data can be produced while keeping the basic principles very simple. While both of the dominant models of the time, the tree and network models, were based on data hierarchies (that had to be rebuilt if informational practices changed), the relational model is much more flexible.</p>
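<p>The two tables and the link between them can be tried out with Python&#8217;s built-in sqlite3 module. This is a sketch under assumptions: the primary key column is called id, the foreign key column student_id, and SQLite stands in for any relational system; none of these names come from Codd&#8217;s paper.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE students (
        id    INTEGER PRIMARY KEY,  -- unique address of a row
        name  TEXT,
        email TEXT,
        major TEXT
    );
    CREATE TABLE grades (
        student_id INTEGER REFERENCES students(id),  -- foreign key
        english    TEXT,
        history    TEXT,
        geography  TEXT
    );
    INSERT INTO students VALUES (1, 'Jack', 'jack@email.com', 'history'),
                                (2, 'Mary', 'mary@email.com', 'science');
    INSERT INTO grades VALUES (1, 'C', 'C', 'C'), (2, 'B', 'B', 'B');
""")

# Join the two relations through the key instead of duplicating student data.
rows = conn.execute("""
    SELECT students.name, grades.english, grades.history, grades.geography
    FROM students JOIN grades ON grades.student_id = students.id
    ORDER BY students.name
""").fetchall()

# A grade row pointing at a student who does not exist is rejected.
try:
    conn.execute("INSERT INTO grades VALUES (99, 'A', 'A', 'A')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

<p>The join reassembles &#8220;Jack had all Cs&#8221; from two separate tables, while the rejected insert shows the key doing its work as a constraint and not just as a label.</p>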
<p>To put things into perspective: most of the world&#8217;s structured data is currently organized according to this basic form. I would guess that despite the current NoSQL hype (companies like Google and Facebook use even simpler and highly customized data structures for ultra-high speed access) more than 90% of all Web applications have a database backend based on one of the many implementations of the relational model, e.g. Oracle, MS SQL Server, MySQL, PostgreSQL, to name just a few. But data organization is only the first half of the proposal.<br />
The next step in Codd&#8217;s paper is to reflect on a language that would allow for data retrieval and manipulation by <em>addressing the logical organization of the data</em> rather than its physical storage. Rather than specifying the physical location of the data, saying &#8220;I want the entries from address 0x00000 to address 0xfffff&#8221; (and we would have to know these addresses beforehand!), we could simply ask for all the entries in the table students. Remember that above, I indicated that Codd declared entry order as &#8220;immaterial&#8221;? This is because the ordering of data is no longer (merely) a property of the archive. Ordering is done in the language we use to get the data: &#8220;I want all the students, sorted alphabetically by name&#8221; (SQL: SELECT * FROM students ORDER BY name). The data structure has of course to be prepared for the kind of queries we will want to make, but in our example, I could group my list by major, sort it by email, or, by &#8220;joining&#8221; our two tables, order by grade average. More elaborate queries would allow me to select the 25% of students with the best grade average or to plot the grade evolution over the years if I have that data.<br />
A data retrieval and manipulation language would have to do more than just query and this quote summarizes the requirements:</p>
<blockquote><p>A set so specified may be fetched for query purposes only, or it may be held for possible changes. Insertions take the form of adding new elements to declared relations without regard to any ordering that may be present in their machine representation. Deletions which are effective for the community (as opposed to the individual user or sub- communities) take the form of removing elements from declared relations. (p. 382)</p></blockquote>
<p>These are the four building blocks of every database system I have worked with (again using SQL): SELECT (query a database using different parameters for searching and ordering, e.g. get all students with a certain grade average), INSERT (insert new data into a table, e.g. add a new student into students), UPDATE (change data, e.g. change a student&#8217;s grade after accepting a bribe), DELETE (erase data, e.g. expel a student for offering you a bribe). Such a language &#8211; Codd will propose the <em>Alpha</em> language in the 1970s but IBM&#8217;s SQL (structured query language; Larry Ellison of Oracle actually was the first to bring a SQL-based product to the market and consequently became one of the richest people on the planet) largely won out &#8211; would again &#8220;protect&#8221; the user from having to interact with anything but the data organization specified in the terms of the relational model.<br />
In the rest of the paper, Codd tackles a series of problems that could arise in the implementation of actual systems (and what we would call a &#8220;storage engine&#8221; today) based on the relational model, but this part is less interesting for my purposes.</p>
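<p>For the curious, the four operations and the &#8220;order lives in the query&#8221; idea fit into a short Python sketch using the built-in sqlite3 module. The table layout, the third student, and the variable names are assumptions for illustration only.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, major TEXT)")

# INSERT: add rows without saying where or in what order they are stored.
conn.executemany(
    "INSERT INTO students (name, major) VALUES (?, ?)",
    [("Mary", "science"), ("Jack", "history"), ("Zoe", "history")],
)

# SELECT: ordering and grouping are requested in the query itself.
by_name = [row[0] for row in conn.execute("SELECT name FROM students ORDER BY name")]
by_major = conn.execute(
    "SELECT major, COUNT(*) FROM students GROUP BY major ORDER BY major"
).fetchall()

# UPDATE: change data in place.
conn.execute("UPDATE students SET major = ? WHERE name = ?", ("science", "Jack"))

# DELETE: remove rows from the relation.
conn.execute("DELETE FROM students WHERE name = ?", ("Zoe",))

remaining = conn.execute("SELECT name, major FROM students ORDER BY name").fetchall()
```

<p>Nowhere in these statements does an address, a file, or a storage order appear; the program only ever talks to the logical table.</p>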
<p>I would like, however, to propose a couple of comments that may help putting things into a larger perspective:</p>
<p>1) The central critique of Codd&#8217;s proposals came from programmers and engineers who abhorred the loss of control (and potentially performance) over the actual organization of data storage on the physical medium and the dangers such a black-boxing may pose to data integrity in the case of dysfunction or accident. But in the 1980s the demands for more flexibility and cost control won the day, driven by lower hardware costs and better techniques for securing data. This evolution towards layering, modularity, and a general &#8220;abstraction&#8221; from the hardware has happened in all fields of computing and, indeed, the loss of control and visibility is most often the prime concern. In a sense, software has followed a similar trajectory as social organization, from community to society (and back, whenever there is a new frontier to homestead), that is from small-scale teams and organizations to the large-scale efforts of companies like Microsoft or Oracle. Abstraction techniques like Codd&#8217;s played a central role here as enablers of division of labor. It also permitted &#8211; and this is crucial &#8211; a much tighter integration between management processes and information technology. The moment information structures are &#8220;liberated&#8221; from questions of physical storage, they can be implemented in flexible, end-user friendly software packages, which makes it possible for management to interact much more directly with data. The rise of Business Intelligence and Decision Support Systems would have been much less spectacular without the relational model turning &#8220;information&#8221; into the malleable material it has become.</p>
<p>2) While I am of course tempted to write something like &#8220;The decoupling of the logical structure of data from physical storage and the immense power and flexibility afforded by query languages have led to the emergence of late-modern network economies.&#8221;, this would be too quick and easy. The relational database, the powerful query languages, and the business control and intelligence functions they enable are certainly a central part of the informational infrastructure that supports contemporary economic organization. Data, once collected, can be <em>interrogated</em> from every possible angle and automatic reporting (which is no more than a series of very elaborate SQL queries over a large number of tables) has introduced incredible speed into business processes, while <em>keeping up an illusion of control</em>. Illusion, because just like any formal model of reality, data and query models are necessarily reductionist. At the same time, databases are themselves part of a much longer trend in management that started with systems management in the late 19th century. We&#8217;re snowballing from one information age to the next and technologies like the relational model are as much enablers as results, causes and effects.</p>
<p>3) The relational database is part of a much larger transformation in how documents, information, and knowledge are handled. From the library catalog to documentation centers and further on to data banks, information retrieval, and data mining, we see a steady growth in the attention being paid to the logistics, organization, and &#8220;exploitation&#8221; of an ever faster growing mountain of texts, images, sounds, and so forth. The relational model not only helps with classic tasks such as storage and retrieval, it shares in the birth of what could be called the &#8220;automated production of knowledge&#8221;, i.e. the creation of new information from cross-referencing, comparing, statistically examining, synthesizing, and representing large quantities of information. Whether these automated processes (think reporting, data mining, etc.) produce &#8220;real&#8221; knowledge is a rather stale question; it is much more important to emphasize how businesses and other organizations have come to depend on these tools for everyday management and decision-making. Query languages built on Codd&#8217;s proposal constitute the foundation for these developments.</p>
<p>There would be much more to say about Codd&#8217;s work and the relational database but I want to close by going back to the initial question about reading computer science from a humanities perspective. A classic analysis of language and use of metaphors would probably have proceeded quite differently and would have homed in on things like the &#8220;protection&#8221; of users or citations such as this footnote:</p>
<blockquote><p>Naturally, as with any data put into and retrieved from a computer system, the user will normally make far more effective use of the data if he is aware of its meaning. (p. 380)</p></blockquote>
<p>Imaginaries are indeed important aspects of an archeology of computing but even in written form, computer science is, in a way, <em>always looking elsewhere</em>, beyond the text, and Codd points to this &#8220;elsewhere&#8221; in his last paragraph:</p>
<blockquote><p>Nevertheless, the material presented should be adequate for experienced systems programmers to visualize several approaches. (p. 387)</p></blockquote>
<p>What Codd asks the reader to visualize is the <em>laboratory</em> of computer science, the site where things come together, the <em>working system</em>. While the discursive aspects are certainly important, I feel that function is central to the poetics of the technical sciences, and if we want to understand their cultural significance, we have to read them both as texts and as functional blueprints.</p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2010/09/reading-technology-1-edgar-f-codd-a-relational-model-of-data-for-large-shared-data-banks-1970/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

