<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Politics of Systems &#187; algorithms</title>
	<atom:link href="http://thepoliticsofsystems.net/category/algorithms/feed/" rel="self" type="application/rss+xml" />
	<link>http://thepoliticsofsystems.net</link>
	<description>Thoughts about Software, Power, and Digital Method</description>
	<lastBuildDate>Wed, 11 Jan 2012 09:11:08 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>what a judge thinks about google instant</title>
		<link>http://thepoliticsofsystems.net/2012/01/10/what-a-judge-thinks-about-google-instant/</link>
		<comments>http://thepoliticsofsystems.net/2012/01/10/what-a-judge-thinks-about-google-instant/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 13:28:07 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[search engines]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=424</guid>
		<description><![CDATA[In the middle of December, a French appeals court published its verdict in a case concerning Google&#8217;s instant/autocomplete/suggest feature and the company was fined $65K. After the holidays, a couple of publications (e.g. searchengineland and Ars Technica) picked up the story and as in every case where French legislation diverges from US sensibilities the comment]]></description>
			<content:encoded><![CDATA[<p>In the middle of December, a French appeals court published its verdict in a case concerning Google&#8217;s instant/autocomplete/suggest feature and the company was fined $65K. After the holidays, a couple of publications (e.g. <a href="http://searchengineland.com/google-instant-costs-google-65000-in-france-106136">searchengineland</a> and <a href="http://arstechnica.com/tech-policy/news/2012/01/french-court-frowns-on-google-autocomplete-issues-65000-fine.ars">Ars Technica</a>) picked up the story and, as in every case where French legislation diverges from US sensibilities, the comment sections erupted with chauvinistic righteousness. What was the case about? Here is the full text of a notice by the Courthouse News Service:</p>
<blockquote><p>A French court fined Google $65,000 because the search engine&#8217;s autocomplete function prompts the French word for crook when users type the name of a certain company. Lyonnaise de Garantie, an insurance company, said staffers at Google should have monitored linked words better. Google had argued that it was not liable since the word, added under Google Suggest, was the result of an automatic algorithm and did not come from human thought. A Paris court ruled against Google, however, pointing out that the search engine ignored requests to remove the offending word &#8211; &#8220;escroc,&#8221; which means crook in French. In addition to the fine, Google must also remove the term from searches associated with Lyonnaise de Garantie.</p></blockquote>
<p>Unfortunately, this is basically all the information that circulated in English. But it&#8217;s always interesting to take a closer look at how lawmakers and judges approach information-systems-as-media questions, and so I went to read the <a href="http://www.legalis.net/spip.php?page=jurisprudence-decision&amp;id_article=3303">text</a> of the actual verdict.<br />
There are a couple of points that are really quite remarkable here and that make the case much more interesting than it appears. Google basically made three arguments:</p>
<ul>
<li>We are an American company and therefore&#8230; (I will not go into the questions that are not specific to Web search.)</li>
<li>The suggest feature is purely &#8220;informatic&#8221; and does not represent an &#8220;intellectual act&#8221;, a &#8220;value judgement&#8221; or an &#8220;opinion&#8221;. (This is the common argument, nothing new here.)</li>
<li>The &#8220;average internet user&#8221; knows that search suggestions are not <em>content</em>. In fact, users do not make any interpretations independently from search results. There is &#8220;no confusion in their minds&#8221; about the difference. (Finally, things are getting more interesting!)</li>
</ul>
<p>The judge however did not see things this way and made a series of quite remarkable observations:</p>
<ul>
<li>If the process is fully automated, how does Google remove &#8220;offensive&#8221; and &#8220;vulgar&#8221; terms from the suggestion lists? Obviously, intervention is possible and regularly applied, even for content &#8211; such as vulgarity &#8211; that is not illegal. So why not in this case?</li>
<li>While it would certainly be difficult to find all cases where individuals or companies are put in a bad light in a suggest list, Google was perfectly aware in this case, because the company in question had contacted them repeatedly.</li>
<li>While the procedure may be automatic, the phrase “Lyonnaise de Garantie escroc” is a human judgement and its circulation on the net is made possible by the machinery. Using algorithms is just another way of &#8220;organizing and presenting human thought&#8221;.</li>
<li>The phrase already appears at the moment when one types “Lyonnaise de G” and this &#8220;suddenness&#8221; has the effect of &#8220;imposing the expression&#8221; on the user.</li>
<li>When looking at the results for the query, they do not explain why the term &#8220;escroc&#8221; is attributed to the company, i.e. the content does not signal any facts that would justify the term.</li>
</ul>
<p>Now these are some interesting arguments and while I am not qualified to comment on the validity of the judgement, there is a stark contrast between Google&#8217;s and the judge&#8217;s framing of the question. While Google makes an ontological argument (&#8220;an algorithm cannot have an opinion&#8221;), the judge pushes that argument into the background and bases the verdict on the question &#8220;can Google be bothered to remove a text that is injurious?&#8221;. The answer is &#8220;yes&#8221;, because a) intervention is obviously possible and b) they were made aware by the plaintiff. It also treats the &#8220;instant&#8221; feature as living up to its former name: &#8220;suggest&#8221;.</p>
<p><a href="http://thepoliticsofsystems.net/wp-content/uploads/2012/01/Screen-Shot-2012-01-10-at-14.26.45-.png"><img class="alignleft size-thumbnail wp-image-427" title="Screen Shot 2012-01-10 at 14.26.45" src="http://thepoliticsofsystems.net/wp-content/uploads/2012/01/Screen-Shot-2012-01-10-at-14.26.45--150x150.png" alt="" width="150" height="150" /></a>While regulation of &#8220;indecency&#8221; is much less pronounced in Europe than in the US, libel laws are of course much stricter, but I do not want to comment on that. What I find thoroughly fascinating about this case is that legal professionals are forced to form opinions about questions as ambiguous as algorithmic agency. By choosing to judge outcomes rather than methodology, the judge in this case (and the judges who treated it in the first instance) has created a precedent that may affect the use of statistical and other techniques that often produce unforeseeable effects. On the other hand, the verdict is largely based on the fact that the plaintiff&#8217;s requests for removal were ignored. Google is by no means forced to police suggest features in the future.</p>
<p>Automated information systems order information very differently from manually compiled catalogs or category systems. They produce different forms of &#8220;intelligence&#8221; and it is difficult to think about their directness in terms of opinion or partisanship. What just happened in this case however is that, at least on a legal level, the gap between the two elements was closed a little bit. The judge did not require Google to put the algorithm on a leash but told them to pick up its mess.</p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2012/01/10/what-a-judge-thinks-about-google-instant/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>a two-click like button for more privacy</title>
		<link>http://thepoliticsofsystems.net/2011/09/03/a-two-click-like-button-for-more-privacy/</link>
		<comments>http://thepoliticsofsystems.net/2011/09/03/a-two-click-like-button-for-more-privacy/#comments</comments>
		<pubDate>Sat, 03 Sep 2011 08:46:46 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[critique]]></category>
		<category><![CDATA[economy]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[social networks]]></category>
		<category><![CDATA[society oriented design]]></category>
		<category><![CDATA[surveillance]]></category>
		<category><![CDATA[web 2.0]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=391</guid>
		<description><![CDATA[German publisher Heise Verlag is an international curiosity. It publishes a small number of highly influential computer-related magazines that give a voice to a tech ethos that is at the same time extremely competent in the subject matter (I&#8217;ve been a steady subscriber to c&#8217;t magazin for over 15 years now, and I am still]]></description>
			<content:encoded><![CDATA[<p>German publisher <a href="http://www.heise.de">Heise Verlag</a> is an international curiosity. It publishes a small number of highly influential computer-related magazines that give a voice to a tech ethos that is at the same time extremely competent in the subject matter (I&#8217;ve been a steady subscriber to <a href="http://www.heise.de/ct/">c&#8217;t magazin</a> for over 15 years now, and I am still baffled sometimes just how good it is) and very much aware of the social and political implications of computing (their online magazine <a href="http://www.heise.de/tp/">Telepolis</a> testifies to that).</p>
<p>Data protection and privacy are long-standing concerns of the Heise editors and, true to a spirit of <a href="http://thepoliticsofsystems.net/category/sod/">society-oriented design</a>, they have introduced a concept as well as a technical implementation of a two-step &#8220;like&#8221; button. Such buttons, by Facebook or other companies, have of course become a major vector of user-tracking on the Web. By using an iframe, every button loads some code from Facebook&#8217;s server and sends the referring URL (e.g. http://nytimes.com/articlename/blabla) along with the request. Because the iframe is hosted on the facebook.com domain, cross-site privacy protections can be circumvented and the URL connected to an identifier cookie and, consequently, to a user account. Plugins like the <a href="http://priv3.icsi.berkeley.edu/">Priv3</a> project block these mechanisms but a) users have to have a heightened level of awareness to even consider installing something like this and b) the plugin interferes with convenient functions like Google search preferences.</p>
<p><img class="alignright" src="http://www.heise.de/ct/imgs/04/7/0/5/4/3/7/2klick-funktion-d8dc12ea2ce13316.png" alt="" width="200" height="200" />Heise&#8217;s suggestion, which they already implemented on their own sites, is simple: websites can download a small bit of code that implements a two-step procedure. The &#8220;like&#8221; button is greyed out when the page first loads and no tracking happens. A first click on the button loads the &#8220;real&#8221; Facebook code, and the second click provides the usual functionality. The solution is very simple to implement and really only a very minor inconvenience. Independently of the debate about whether &#8220;like&#8221; buttons and such add any real value to the Web, this example shows that &#8220;social&#8221; features like these can be designed in a way that does not necessarily lead to pervasive user tracking.</p>
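<p>The two-step logic described above can be illustrated with a small model. This is only a sketch in Python, not Heise&#8217;s actual code (which runs in the browser); the class <code>TwoClickButton</code>, the <code>load_widget</code> callback and the <code>fake_loader</code> stand-in are hypothetical names introduced for illustration.</p>

```python
class TwoClickButton:
    """Toy model of a two-step "like" button (an illustration, not Heise's code)."""

    def __init__(self, load_widget):
        # Hypothetical callback that would fetch the third-party widget.
        self.load_widget = load_widget
        # Greyed out: nothing has been requested from the third party yet.
        self.state = "inactive"
        self.widget = None

    def click(self):
        if self.state == "inactive":
            # First click: only now is the third-party widget loaded.
            self.widget = self.load_widget()
            self.state = "active"
            return "widget loaded"
        # Later clicks are passed on to the real widget.
        return self.widget()


requests = []  # records every simulated request to the social network


def fake_loader():
    # Stands in for the request that would reach the third-party server.
    requests.append("GET widget")
    return lambda: "liked"


button = TwoClickButton(fake_loader)
```

<p>In this model, creating the button leaves <code>requests</code> empty, the first call to <code>button.click()</code> records a single request, and only the second call triggers the widget itself. The point of the design is that no contact with the third party happens before the user explicitly asks for it.</p>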
<p>The response to this initiative has been very strong (check the Slashdot discussion <a href="http://slashdot.org/story/11/09/03/0115241/Heises-Two-Clicks-For-More-Privacy-vs-Facebook">here</a>), especially in Germany, where privacy (or rather <em>Datenschutz</em>, a concept less centered on the individual but rather on the role of data in society) is an intensely debated issue, for obvious historical reasons. Facebook apparently threatened to blacklist heise.de at one point, but has since <a href="http://www.heise.de/newsticker/meldung/Facebook-beschwert-sich-ueber-datenschutzfreundlichen-2-Klick-Button-2-Update-1335658.html">backpedaled</a>. After all, c&#8217;t magazin prints around 600,000 copies of every issue and is extremely influential in the German (and Dutch!) computer landscape. I am very curious to see how this story unfolds, because let&#8217;s be clear: Facebook&#8217;s earning potential is closely tied to its capacity to capture, enrich, and analyze user data.</p>
<p>This initiative &#8211; and the Heise ethos in general &#8211; underscores that a &#8220;respectable&#8221; and sober engineering culture does not exclude an explicit normative stance on social and political issues. And it shows that this stance can be translated into technical models, implemented, and shared, <em>both as an idea and as code</em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2011/09/03/a-two-click-like-button-for-more-privacy/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>algorithms shaping the world on TED</title>
		<link>http://thepoliticsofsystems.net/2011/07/29/algorithms-shaping-the-world-on-ted/</link>
		<comments>http://thepoliticsofsystems.net/2011/07/29/algorithms-shaping-the-world-on-ted/#comments</comments>
		<pubDate>Fri, 29 Jul 2011 08:56:05 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[economy]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=376</guid>
		<description><![CDATA[The entertaining platform for technology and design, TED, posted a talk by Area/Code chairman and co-Founder Kevin Slavin, entitled &#8220;How algorithms shape our world&#8221;: There are a couple of interesting examples and ideas in there and the analogy between finance algorithms and the larger processing of &#8220;culture&#8221; is well argued. A fun 15 minutes &#8211;]]></description>
			<content:encoded><![CDATA[<p>The entertaining platform for technology and design, <a href="http://www.ted.com">TED</a>, posted a <a href="http://www.ted.com/talks/kevin_slavin_how_algorithms_shape_our_world.html">talk</a> by <a href="http://areacodeinc.com/">Area/Code</a> chairman and co-Founder Kevin Slavin, entitled &#8220;How algorithms shape our world&#8221;:</p>
<p><!--copy and paste--><object width="526" height="374" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="wmode" value="transparent" /><param name="bgColor" value="#ffffff" /><param name="flashvars" value="vu=http://video.ted.com/talk/stream/2011G/Blank/KevinSlavin_2011G-320k.mp4&amp;su=http://images.ted.com/images/ted/tedindex/embed-posters/KevinSlavin-2011G.embed_thumbnail.jpg&amp;vw=512&amp;vh=288&amp;ap=0&amp;ti=1194&amp;lang=eng&amp;introDuration=15330&amp;adDuration=4000&amp;postAdDuration=830&amp;adKeys=talk=kevin_slavin_how_algorithms_shape_our_world;year=2011;theme=a_taste_of_tedglobal_2011;theme=new_on_ted_com;theme=what_s_next_in_tech;theme=to_boldly_go;event=TEDGlobal+2011;tag=Technology;tag=complexity;tag=computers;tag=social+change;&amp;preAdTag=tconf.ted/embed;tile=1;sz=512x288;" /><param name="src" value="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" /><param name="pluginspace" value="http://www.macromedia.com/go/getflashplayer" /><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><embed width="526" height="374" type="application/x-shockwave-flash" src="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" allowFullScreen="true" allowScriptAccess="always" wmode="transparent" bgColor="#ffffff" 
flashvars="vu=http://video.ted.com/talk/stream/2011G/Blank/KevinSlavin_2011G-320k.mp4&amp;su=http://images.ted.com/images/ted/tedindex/embed-posters/KevinSlavin-2011G.embed_thumbnail.jpg&amp;vw=512&amp;vh=288&amp;ap=0&amp;ti=1194&amp;lang=eng&amp;introDuration=15330&amp;adDuration=4000&amp;postAdDuration=830&amp;adKeys=talk=kevin_slavin_how_algorithms_shape_our_world;year=2011;theme=a_taste_of_tedglobal_2011;theme=new_on_ted_com;theme=what_s_next_in_tech;theme=to_boldly_go;event=TEDGlobal+2011;tag=Technology;tag=complexity;tag=computers;tag=social+change;&amp;preAdTag=tconf.ted/embed;tile=1;sz=512x288;" pluginspace="http://www.macromedia.com/go/getflashplayer" allowfullscreen="true" allowscriptaccess="always" /></object></p>
<p>There are a couple of interesting examples and ideas in there and the analogy between finance algorithms and the larger processing of &#8220;culture&#8221; is well argued. A fun 15 minutes &#8211; there&#8217;s even explosions in there!</p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2011/07/29/algorithms-shaping-the-world-on-ted/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>somebody slipped me an ontology into my algorithm</title>
		<link>http://thepoliticsofsystems.net/2011/07/11/somebody-slipped-me-an-ontology-into-my-algorithm/</link>
		<comments>http://thepoliticsofsystems.net/2011/07/11/somebody-slipped-me-an-ontology-into-my-algorithm/#comments</comments>
		<pubDate>Mon, 11 Jul 2011 07:55:18 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[ontologies]]></category>
		<category><![CDATA[search engines]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=348</guid>
		<description><![CDATA[In the beginning, it was all about the algorithm. PageRank and its &#8220;no humans involved&#8221; mantra dominated Google since its inception. In recent years however, Google has started to expand the role of &#8220;conceptual&#8221; knowledge in different areas of its services. The main search bar and its capacity to do all kinds of little tricks]]></description>
			<content:encoded><![CDATA[<p>In the beginning, it was all about the algorithm. PageRank and its &#8220;no humans involved&#8221; mantra dominated Google from its inception. In recent years, however, Google has started to expand the role of &#8220;conceptual&#8221; knowledge in different areas of its services. The main search bar and its capacity to do all kinds of little tricks is a good example, but on my last trip to Google Translate I was really quite astounded by how seamless concept integration has become:</p>
<p><a href="http://thepoliticsofsystems.net/wp-content/uploads/2011/07/smart-google-translate1.png"><img class="alignnone size-full wp-image-353" title="smart google translate" src="http://thepoliticsofsystems.net/wp-content/uploads/2011/07/smart-google-translate1.png" alt="" width="557" height="389" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2011/07/11/somebody-slipped-me-an-ontology-into-my-algorithm/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>goldman-sachs, software critic</title>
		<link>http://thepoliticsofsystems.net/2011/05/03/goldman-sachs-software-critic/</link>
		<comments>http://thepoliticsofsystems.net/2011/05/03/goldman-sachs-software-critic/#comments</comments>
		<pubDate>Tue, 03 May 2011 10:23:21 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[economy]]></category>
		<category><![CDATA[epistemology]]></category>
		<category><![CDATA[mathematics]]></category>
		<category><![CDATA[technological determinism]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=332</guid>
		<description><![CDATA[In August 2010, Edinburgh Sociologist Donald MacKenzie (whose book An Engine, not a Camera is an outstanding piece of scholarship) wrote an article in the Financial Times titled Unlocking the Language of Structured Securities where he discusses a software suite for financial analysis called Intex and compares it to a language that allows to see]]></description>
			<content:encoded><![CDATA[<p>In August 2010, Edinburgh sociologist Donald MacKenzie (whose book <em><a href="http://books.google.com/books?id=M3x5tvAwzrQC">An Engine, not a Camera</a></em> is an outstanding piece of scholarship) wrote an <a href="http://www.sps.ed.ac.uk/__data/assets/pdf_file/0007/53998/ftaug10.pdf">article</a> in the Financial Times titled <em>Unlocking the Language of Structured Securities</em>, in which he discusses a software suite for financial analysis called <a href="http://www.intex.com/main/">Intex</a> and compares it to a <em>language</em> that allows one to see and interact with the world in certain ways rather than others. MacKenzie describes his first encounter with Intex as a moment of revelation that quickly turned into doubt:</p>
<blockquote><p>The psychological effect was striking: for the first time, I felt I could understand mortgage-backed securities. Of course, my new-found confidence was spurious. The reliability of Intex’s output depends entirely on the validity of the user’s assumptions about prepayment, default and severity. Nevertheless, it is interesting to speculate whether some of the pre-crisis vogue for mortgage-backed securities resulted from having a system that enabled neophytes such as myself to feel they understood them.</p></blockquote>
<p>While MacKenzie does not go as far as imputing the recent financial crisis to a piece of software, he points out that Intex is not recursive in its mode of analysis: when evaluating a complex financial asset, for example one of the now (in)famous <a href="http://en.wikipedia.org/wiki/Collateralized_debt_obligations">CDOs</a> that are made up of other assets, themselves combining further values, and so on, Intex does not follow the trail down to the basic entities (the individual mortgage) but calculates risk only from the rating of the asset in question. MacKenzie argues that Goldman-Sachs&#8217; 2006 decision to basically get out of mortgage-backed securities may well be a result of their commitment to go beyond available tools by implementing a (very costly) &#8220;bottom-up&#8221; approach that builds its evaluation of an asset by calculating up from the basic units of value. The house-of-cards character of these financial instruments could become visible by changing tools and thereby changing perspective or language. Software makes it possible to implement very different practices or languages and to make them pervasive; but how does a company choose one strategy over another? What are the organizational and &#8220;cultural&#8221; factors that led Goldman-Sachs to change its approach? These may be the truly challenging questions here, although they may never get answered. But they lead to a methodological lesson.</p>
<p>The particular strength of systems like Intex lies in their capacity to black-box evaluation strategies behind a neat interface that allows users to immediately operate on the underlying models, weaving these models into their decisions and practices. Conceptually, we understand the ways in which software shapes action better and better but the empirical complexity of concrete settings is positively daunting even outside of the realm of financial markets. What I take from MacKenzie&#8217;s work is that in order to understand the role of software, we have to be very familiar with the specific terrain a system is embedded in, instead of bringing overarching assumptions to the table. Software is a means for building structure and this building is always happening in particular organizational settings that are certainly caught up in larger trends but also full of local challenges, politics, and knowledge. Programs are at the same time a structuring backdrop for practice and part of a strategic repertoire that actors have at their disposal.</p>
<p>The case of financial software indicates that market behavior standardizes around available tools, which leads to the systemic delegation of certain decision processes to software makers. This may result in a particular type of herd behavior and potentially in imbalance and crisis. Somewhat ironically, it is Goldman-Sachs that showed the potential of going against the grain by questioning programmed wisdom. That the company recently paid $550M in fines for abusing its analytical advantage by betting against a CDO it was selling to customers as an investment indicates that ethics and cunning are unfortunately two very different things&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2011/05/03/goldman-sachs-software-critic/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>mapping wikipedia: going english</title>
		<link>http://thepoliticsofsystems.net/2011/04/11/mapping-wikipedia-going-english/</link>
		<comments>http://thepoliticsofsystems.net/2011/04/11/mapping-wikipedia-going-english/#comments</comments>
		<pubDate>Mon, 11 Apr 2011 09:32:43 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[method]]></category>
		<category><![CDATA[network theory]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=315</guid>
		<description><![CDATA[After trying to map the French version of Wikipedia a couple of days ago, I&#8217;ve played around with the much bigger English version (the dbpedia file I worked with contains 130M links between Wikipedia pages in a cool 20GB) this week-end and thanks to a rare lucid moment I was able to transform that thing]]></description>
			<content:encoded><![CDATA[<p>After trying to <a href="http://thepoliticsofsystems.net/2011/04/09/mapping-wikipedia-my-god-its-full-of-sports/">map</a> the French version of Wikipedia a couple of days ago, I&#8217;ve played around with the much bigger English version (the dbpedia file I worked with contains 130M links between Wikipedia pages in a cool 20GB) this week-end and thanks to a rare lucid moment I was able to transform that thing into a <a href="http://thepoliticsofsystems.net/wp-content/uploads/2011/04/network_100_page_links_en_1of5.nt.gdf.7z">.gdf</a> that is small enough to be opened in <a href="http://gephi.org">gephi</a>. I settled for the 45K pages with the most links (undirected) and started mapping. All three maps I built use the OpenOrd layout algorithm (1000 iterations). The first uses the modularity measure for &#8220;community&#8221; detection and colors text accordingly (click on the image for a <em>very</em> large version):<br />
<a href="http://thepoliticsofsystems.net/wp-content/uploads/2011/04/wikipedia_map_en_100_text_color.png"><img src="http://thepoliticsofsystems.net/wp-content/uploads/2011/04/wikipedia_map_en_100_text_color_small.png" alt="" /></a></p>
<p>The second uses a grey color scale to express the degree (number of links) of a page:</p>
<p><a href="http://thepoliticsofsystems.net/wp-content/uploads/2011/04/wikipedia_map_en_100_text_grey.png"><img src="http://thepoliticsofsystems.net/wp-content/uploads/2011/04/wikipedia_map_en_100_text_grey_small.png" alt="" /></a></p>
<p>Finally, the same map, but with a different color scale (light blue =&gt; yellow =&gt; red):</p>
<p><a href="http://thepoliticsofsystems.net/wp-content/uploads/2011/04/wikipedia_map_en_100_text_heatmap.png"><img src="http://thepoliticsofsystems.net/wp-content/uploads/2011/04/wikipedia_map_en_100_text_heatmap_small.png" alt="" /></a></p>
<p>Every version helps with certain readability issues and you can download all three of the maps as a <a href="http://thepoliticsofsystems.net/wp-content/uploads/2011/04/wikipedia_map_en_100_text_layers.psd.7z">big .psd</a> so you can easily switch between the different modes.</p>
<p>When comparing these maps with their <a href="http://thepoliticsofsystems.net/2011/04/09/mapping-wikipedia-my-god-its-full-of-sports/">French counterpart</a>, there are several things that are quite remarkable:</p>
<ul>
<li>Most importantly, there is no cluster that I would qualify as &#8220;common culture&#8221; or &#8220;shared knowledge&#8221;. There is most certainly a large, dense zone at the center but while the French one draws in all kinds of topics, this version has worldwide country information only. I would prudently argue that the English version of Wikipedia shows a more globalized picture of the world, even if there is a large zone of pages on the left that deals with the United States. It&#8217;s a bigger and more heterogeneous world that emerges, but there still is a dominant player.</li>
<li>Sports is even bigger in the English version, and typically American sports (baseball, NASCAR, etc.) show up on the left in smaller, denser clusters compared to the gigantic football (soccer) area from the center to the bottom right.</li>
<li>The Sciences are smaller but entertainment (TV, popular music, comic books, video games, etc.) is much more present. At least at this level of observation.</li>
<li>There are some seriously &#8220;strange&#8221; clusters, such as the dense yellow zone on the far right halfway between top and center that shows a group of Russian painters I have never heard of. Not that I&#8217;m an expert but I&#8217;ve found little trace of <em>any</em> other painters. This shows the weakness of my selection method by link degree &#8211; if there was a way to select nodes by page-views, the results would probably be very different, at least for our Russian painters. But it also shows that despite having become a rather respectable Encyclopedia with a quite classic subject outlook, Wikipedia still is a space for off-the-track topics and for communities that are so passionate about a certain subject that they will groom it and grow it.</li>
</ul>
<p>I plan on releasing the scripts used to build these maps in the future but I want to try out a couple more things before that, most particularly a version that only takes into account in-links, which should reduce the presence of certain &#8220;distributor&#8221; pages (&#8220;events in 2010&#8221;, &#8220;people alive&#8221;, etc.).</p>
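<p>Until then, the selection step described in this post (keeping only the pages with the most links, counted without regard to direction) could be sketched roughly as follows. This is not the script used for the maps: the function name <code>top_pages_by_degree</code>, the <code>keep</code> parameter and the tiny link list are assumptions made up for illustration, and a real run over the dbpedia file would need to stream the data rather than hold it in a list.</p>

```python
from collections import Counter


def top_pages_by_degree(links, keep):
    """Return the `keep` best-connected pages and the links among them.

    `links` is a list of (source, target) page-name pairs. Degree is
    undirected: every link counts once for each of its two endpoints,
    so reciprocal links are counted twice (a simplification).
    """
    degree = Counter()
    for source, target in links:
        if source == target:
            continue  # ignore self-links
        degree[source] += 1
        degree[target] += 1

    # Keep only the pages with the highest degree.
    kept = {page for page, _ in degree.most_common(keep)}

    # Keep only links whose two endpoints both survived the selection.
    kept_links = [
        (s, t) for s, t in links
        if s != t and s in kept and t in kept
    ]
    return kept, kept_links


# Made-up toy data standing in for the real page-link file.
links = [
    ("France", "Paris"),
    ("France", "Europe"),
    ("Paris", "Europe"),
    ("Paris", "Louvre"),
    ("Obscure stub", "France"),
]
pages, kept_links = top_pages_by_degree(links, keep=3)
```

<p>On the toy data, the three best-connected pages are kept and the two poorly linked ones drop out together with their links. Counting only <code>target</code> in the loop would give the in-link variant mentioned above.</p>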
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2011/04/11/mapping-wikipedia-going-english/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>mapping wikipedia: my god, it&#8217;s full of sports</title>
		<link>http://thepoliticsofsystems.net/2011/04/09/mapping-wikipedia-my-god-its-full-of-sports/</link>
		<comments>http://thepoliticsofsystems.net/2011/04/09/mapping-wikipedia-my-god-its-full-of-sports/#comments</comments>
		<pubDate>Sat, 09 Apr 2011 11:12:01 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[method]]></category>
		<category><![CDATA[network theory]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=302</guid>
		<description><![CDATA[Edit: a map of the English Wikipedia is here. Wikipedia is a fascinating object for way too many reasons. The way it is produced, the place it has taken in society, its size and evolution, and many other aspects are truly remarkable. Studying Wikipedia has become a discipline in itself and while there may be]]></description>
			<content:encoded><![CDATA[<p>Edit: a map of the English Wikipedia is <a href="http://thepoliticsofsystems.net/2011/04/11/mapping-wikipedia-going-english/">here</a>.</p>
<p>Wikipedia is a fascinating object for way too many reasons. The way it is produced, the place it has taken in society, its size and evolution, and many other aspects are truly remarkable. Studying Wikipedia has become a discipline in itself and while there may be certain signs of fatigue on the editing front, there is still much to learn and to discover. I have recently started to take an interest in looking at the way knowledge is structured in different contexts and the availability of certain <a href="https://wiki.digitalmethods.net/Dmi/ToolDatabase?cat=DeviceCentric&amp;subcat=Wikipedia">tools</a> and <a href="http://wiki.dbpedia.org/Downloads36">datasets</a> makes Wikipedia a perfect object for scrutiny. If only it weren&#8217;t that <em>big</em>. Still, it&#8217;s the 21st century and computers <em>are</em> getting really fast, so why not try mapping Wikipedia? All of it.</p>
<p>There are different ways to start such a project, but simply taking the link structure is probably the most obvious. This allows for bypassing the internal taxonomy and may lead to a more &#8220;organic&#8221; expression of underlying knowledge structures. Unfortunately, computers are not <em>that</em> fast &#8211; at least not mine &#8211; and so I had to make two concessions: I took a non-English variant (I settled for French) and reduced the number of nodes to a (barely) manageable amount. The final <a href="http://thepoliticsofsystems.net/wp-content/uploads/2011/04/wp_network_100.gdf.7z">graph file</a> (.gdf &#8211; do not even think about working with it with less than 4GB of RAM) was built by taking pages that had at least 100 connections with other pages. From an initial 183K pages and 11.5M links I went down to a more manageable 40K and 2M respectively. To make things workable, I chose to visualize the page names only, no nodes, no edges. The result looks like this (click on the image for a <em>very</em> big .png):<br />
<a href="http://thepoliticsofsystems.net/wp-content/uploads/2011/04/wikipedia_map_fr_100_text_color.png"><img src="http://thepoliticsofsystems.net/wp-content/uploads/2011/04/wikipedia_map_fr_100_text_color_small.png" alt="" /></a><br />
Reliable <a href="http://gephi.org">gephi</a> not only did the graph layout (OpenOrd plugin, 1000 iterations) but also dutifully detected &#8220;communities&#8221; in the network, which worked really well. And here is a version in elegant grayscale, this time without community detection:<br />
<a href="http://thepoliticsofsystems.net/wp-content/uploads/2011/04/wikipedia_map_fr_100_text.png"><img src="http://thepoliticsofsystems.net/wp-content/uploads/2011/04/wikipedia_map_fr_100_text_small.png" alt="" /></a><br />
The graph shows a big dense zone in the middle that is quite unreadable but composed of world history, politics, geography, and other elements that constitute a core set of knowledge elements that are highly interlinked. While France plays an important role here, these elements are actually very globalized and include countries from all over the world. Could we interpret this as a field of &#8220;common&#8221; or &#8220;shared&#8221; knowledge? A set of topics that transcend specialization and form the very core of what our culture considers essential?</p>
<p>Just to the right of the very center, there is a rather visible (in orange) cluster on the United States. Around the center you&#8217;ll find major historic events and periods (WWII, middle ages, renaissance, etc.). The arts are on the right (mostly music) and France&#8217;s most popular art form &#8211; Cinema &#8211; starts at the top right, in a highly dense orange cluster and goes to the top left, tellingly fusing with theatre. The Sciences form a rather strange blue band that goes from the center top to the top right.</p>
<p>And then there is sports. I was a bit surprised by how much of it there is and how well the clustering and community detection work for identifying individual fields &#8211; football, tennis, car racing, and so on. The second surprise was how few &#8220;geek&#8221; subjects appear on the map. There is a digital technology cluster on the top right but I haven&#8217;t found any traces of the legendary Star Trek cluster. In the end, French Wikipedia appears to be a rather classic encyclopedia if you look at it from a subject angle. Could we use such maps to compare subject prominence between cultures?</p>
<p>Obviously, the method for mapping Wikipedia has to be refined to make maps more readable but the results are actually already quite telling. Let&#8217;s see whether the same approach can work for the English version &#8211; which is a cool 10 times bigger&#8230;</p>
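<p>For readers who want to try the node reduction described above, here is a minimal sketch of a degree filter. It is not the script used for these maps: the edge-list input format and the function name are assumptions made for illustration, and it counts in-links and out-links together in a single pass over the full graph.</p>

```python
from collections import Counter


def filter_by_degree(edges, min_degree=100):
    """Keep only pages with at least min_degree connections, and the links between them.

    edges: iterable of (source, target) page-title pairs (hypothetical input format).
    Degree is counted once on the full graph, in-links and out-links combined.
    """
    edges = list(edges)
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    # Pages that pass the threshold
    kept_nodes = {page for page, count in degree.items() if count >= min_degree}
    # Links survive only if both endpoints survive
    kept_edges = [(s, t) for s, t in edges if s in kept_nodes and t in kept_nodes]
    return kept_nodes, kept_edges
```

<p>Note that after such a filter some surviving pages may end up with fewer than 100 links inside the reduced graph, since their neighbours may have been removed; whether to repeat the filter until nothing changes is a design choice.</p>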
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2011/04/09/mapping-wikipedia-my-god-its-full-of-sports/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Google downranking nasty merchants, but how?</title>
		<link>http://thepoliticsofsystems.net/2010/12/02/google-downranking-nasty-merchants-but-how/</link>
		<comments>http://thepoliticsofsystems.net/2010/12/02/google-downranking-nasty-merchants-but-how/#comments</comments>
		<pubDate>Thu, 02 Dec 2010 08:58:00 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[search engines]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=262</guid>
		<description><![CDATA[The Official Google Blog has recently written about changes to the ranking procedure that were introduced after a NYT article reported on an online retailer that had apparently found out that being nasty to your customers would help get good search rankings because all of the complaints and bad user reviews would get you links]]></description>
			<content:encoded><![CDATA[<p>The Official Google Blog has recently <a href="http://googleblog.blogspot.com/2010/12/being-bad-to-your-customers-is-bad-for.html">written about</a> changes to the ranking procedure that were introduced after a <a href="http://www.nytimes.com/2010/11/28/business/28borker.html?_r=2&amp;pagewanted=all">NYT article</a> reported on an online retailer that had apparently found out that being nasty to your customers would help get good search rankings because all of the complaints and bad user reviews would get you links and boost PageRank. While Google denies that this logic would work, they have added a ranking layer to their search results that specifically targets online merchants. The interesting thing about the blog post is that the author details several things that the company could have done but didn&#8217;t do while revealing very little about what the &#8220;algorithmic solution&#8221; they implemented actually consists of. From the post:</p>
<blockquote><p>Instead, in the last few days we developed an algorithmic solution which detects the merchant from the Times article along with hundreds of other merchants that, in our opinion, provide an extremely poor user experience. The algorithm we incorporated into our search rankings represents an initial solution to this issue, and Google users are now getting a better experience as a result.</p></blockquote>
<p>While I do not believe that transparency is the prime solution to the gatekeeper issues surrounding search, this paragraph really is strikingly vague. Has Google compiled a list of merchants that are systematically downranked? How is this list compiled? What does &#8220;in our opinion&#8221; mean? Is this &#8220;opinion&#8221; expressed in the form of an algorithmic procedure (one could imagine using the <a href="http://microformats.org/wiki/hreview">hReview microformat</a> to collect reviews on merchants)?</p>
<p>We&#8217;ll probably not get any answers to these questions but the case shows how murky the whole ranking thing has become: in an ever-growing online world, search visibility has extremely important financial ramifications (despite the social media hype) and I believe that companies like Google will increasingly rely on human judgment as a complement to algorithmic procedures (which are just another form of human judgment BTW). This will certainly lead to more legal activity around ranking in the future because courts still understand human meddling a lot better than software design&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2010/12/02/google-downranking-nasty-merchants-but-how/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>graphs based on word vector similarity and the trickiness of parameters</title>
		<link>http://thepoliticsofsystems.net/2010/10/10/graphs-based-on-word-vector-similarity-and-the-trickyness-of-parameters/</link>
		<comments>http://thepoliticsofsystems.net/2010/10/10/graphs-based-on-word-vector-similarity-and-the-trickyness-of-parameters/#comments</comments>
		<pubDate>Sun, 10 Oct 2010 08:38:52 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[epistemolgy]]></category>
		<category><![CDATA[mathematics]]></category>
		<category><![CDATA[network theory]]></category>
		<category><![CDATA[social networks]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=197</guid>
		<description><![CDATA[What is a link? From a methodology standpoint, there is no answer to that question but only the recognition that when using graph theory and associated software tools, we project certain aspects of a dataset as nodes and others as links. In my last post, I &#8220;projected&#8221; authors from the air-l list as nodes and]]></description>
			<content:encoded><![CDATA[<p>What is a link? From a methodology standpoint, there is no answer to that question but only the recognition that when using graph theory and associated software tools, we project certain aspects of a dataset as nodes and others as links. In my <a href="http://thepoliticsofsystems.net/2010/10/06/one-network-and-four-algorithms/">last post</a>, I &#8220;projected&#8221; authors from the air-l list as nodes and mail-reply relationships as links. In the example below, I still use authors as nodes but links are derived from a similarity measure of a statistical analysis of each poster&#8217;s mails. Here are two <a href="http://gephi.org/">gephi</a> graphs:</p>
<p><a href="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/airl_vectorspace_graph.png"><img class="alignnone size-full wp-image-199" title="airl_vectorspace_graph_small" src="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/airl_vectorspace_graph_small.png" alt="" width="500" height="276" /></a></p>
<p>If you are interested in the technique, it&#8217;s a simple similarity measure based on the <a href="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/airl_vectorspace_graph.png">vector-space model</a> and my amateur computer scientist&#8217;s PHP implementation can be found <a href="http://code.google.com/p/vectorspacesimilarity/">here</a>. The fact that the two posters who changed their &#8220;from:&#8221; text have both of their accounts close together (can you find them?) is a good indication that the algorithm is not <em>completely</em> botched. The word floating on each link in the right graph is the one that contributes the highest value to the similarity calculation, which means that it is used relatively often by both of the linked authors while being generally rare in the whole corpus. Elis Godard and Dana Boyd for example have both written on air-l about Ron Vietti, a pastor who (rightfully?) thinks the Internet is the devil and because very few other people mentioned the holy warrior, the word &#8220;vietti&#8221; is the highest value &#8220;binder&#8221; between the two.</p>
<p>What is important in networks that are the result of heavily iterative processing is that the algorithms used to create them are full of parameters and changing one of these parameters just a little bit may (!) have large repercussions. In the example above I actually calculate a similarity measure between every pair of nodes (roughly 60^2 / 2 results) but in order to make the graph somewhat readable I inserted a threshold that boils it down to 637 links. The missing measures are not taken into account in the physics simulation that produces the layout &#8211; although they may (!) be significant. I changed the parameter a couple of times to get the graph &#8220;right&#8221;, i.e. to find a good compromise between link density for simulation and readability. But look at what happens when I raise the threshold so that only the 100 strongest similarity measures survive:</p>
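<p>To make the idea of a &#8220;binder&#8221; word concrete, here is a small sketch of one common variant of the vector-space model: tf-idf weights combined with cosine similarity. This is an assumption on my part and not a port of the PHP implementation linked above; the function names and the input format (one token list per author) are made up for the example.</p>

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict mapping an author name to a list of lowercase tokens (hypothetical input)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each word once per author
    vectors = {}
    for author, tokens in docs.items():
        counts = Counter(tokens)
        # Term frequency times inverse document frequency;
        # a word used by every author gets weight log(1) = 0.
        vectors[author] = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return vectors


def similarity(vec_a, vec_b):
    """Return (cosine similarity, shared word that contributes most to the dot product)."""
    shared = set(vec_a).intersection(vec_b)
    products = {word: vec_a[word] * vec_b[word] for word in shared}
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if not products or norm_a == 0 or norm_b == 0:
        return 0.0, None
    binder = max(products, key=products.get)
    return sum(products.values()) / (norm_a * norm_b), binder
```

<p>With this weighting, a word that both authors use often but that few others use dominates the dot product, which is the behaviour described for &#8220;vietti&#8221; above.</p>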
<p><a href="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/airl_vectorspace_harsh.png"><img class="alignnone size-full wp-image-209" title="airl_vectorspace_harsh_small" src="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/airl_vectorspace_harsh_small.png" alt="" width="300" height="298" /></a></p>
<p>First, a couple of nodes disconnect, two binary stars form around the &#8220;from:&#8221; changers and the large component becomes a lot looser. Second, Jeremy Hunsinger loses the highest PageRank to Chris Heidelberg. Hunsinger had more links when lower similarity scores were taken into account, but when things get rough in the network world, bonding is better than bridging. What is <em>result</em> and what is <em>artifact</em>?</p>
<p>Most advanced algorithmic techniques are riddled with such parameters and getting a &#8220;good&#8221; result not only implies fiddling around a lot (how do I clean the text corpus, which algorithms to use for which kinds of structures or dynamics, what parameters, what type of representation, here again, what parameters, and so on&#8230;) but also having implicit ideas about what kind of result would be &#8220;plausible&#8221;. The back and forth with the &#8220;algorithmic microscope&#8221; is always floating against a backdrop of &#8220;domain knowledge&#8221; and this is one of the reasons why the idea of a science based purely on data analysis is positively absurd. I believe that the key challenge is to stay clear of methodological monoculture and to articulate different approaches together whenever possible.</p>
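<p>The effect of the threshold can be explored with a few lines of code. The sketch below keeps only the strongest links and reports which nodes lose all of their connections; it expresses the parameter as a number of links to keep rather than as a cut-off value, which is an assumption made for simplicity, and the function name and input format are hypothetical.</p>

```python
def strongest_links(scores, keep=100):
    """scores: dict mapping (author_a, author_b) pairs to a similarity value.

    Returns the `keep` highest-scoring pairs and the set of nodes left without any link.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    kept = dict(ranked[:keep])
    all_nodes = {node for pair in scores for node in pair}
    linked_nodes = {node for pair in kept for node in pair}
    return kept, all_nodes - linked_nodes
```

<p>Running this with several values of <em>keep</em> and comparing the sets of disconnected nodes is a cheap way to see how sensitive a graph is to this single parameter before any layout or ranking algorithm is applied.</p>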
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2010/10/10/graphs-based-on-word-vector-similarity-and-the-trickyness-of-parameters/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>one network and four algorithms</title>
		<link>http://thepoliticsofsystems.net/2010/10/06/one-network-and-four-algorithms/</link>
		<comments>http://thepoliticsofsystems.net/2010/10/06/one-network-and-four-algorithms/#comments</comments>
		<pubDate>Wed, 06 Oct 2010 13:54:26 +0000</pubDate>
		<dc:creator>Bernhard</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[epistemolgy]]></category>
		<category><![CDATA[network theory]]></category>
		<category><![CDATA[social networks]]></category>
		<category><![CDATA[softwareproject]]></category>

		<guid isPermaLink="false">http://thepoliticsofsystems.net/?p=176</guid>
		<description><![CDATA[The Association of Internet Researchers (AOIR) is an important venue if you&#8217;re interested in, as the name indicates, Internet research. But it is also a good primary source if one wants to inquire into how and why people study the Internet, which aspects of it, etc. Conveniently for the lazy empirical researcher that I am,]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://aoir.org">Association of Internet Researchers</a> (AOIR) is an important venue if you&#8217;re interested in, as the name indicates, Internet research. But it is also a good primary source if one wants to inquire into how and why people study the Internet, which aspects of it, etc. Conveniently for the lazy empirical researcher that I am, the AOIR has an <a href="http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org">archive</a> of its mailing-list, which has about 22K mails posted by 3K addresses, enough for a little playing around with the impatient person&#8217;s tool, the algorithm. I have downloaded the data and I hope I can motivate some of my students to build something interesting with it, but I just had to put it into <a href="http://gephi.org/">gephi</a> right away. Some of the tools we&#8217;ll hopefully build will concentrate more on text mining but using an address as a node and a mail-reply relationship as a link, one can easily build a social graph.</p>
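<p>As a rough illustration of how such a social graph can be derived from a mail archive, here is a minimal sketch. The mail records with <em>id</em>, <em>sender</em> and <em>in_reply_to</em> keys are a hypothetical schema, not the format of the air-l archive, and the code is not the script used for the graphs in this post.</p>

```python
from collections import Counter


def build_reply_graph(mails):
    """mails: list of dicts with 'id', 'sender' and 'in_reply_to' keys (hypothetical schema).

    Returns a Counter mapping an unordered pair of addresses to the number of
    replies exchanged between them, i.e. a weighted undirected graph.
    """
    sender_of = {mail["id"]: mail["sender"] for mail in mails}
    weights = Counter()
    for mail in mails:
        parent = sender_of.get(mail.get("in_reply_to"))
        if parent is None or parent == mail["sender"]:
            # Thread starter, reply to a mail missing from the archive, or self-reply
            continue
        # Sort the pair so that A replying to B and B replying to A share one link
        weights[tuple(sorted((mail["sender"], parent)))] += 1
    return weights
```

<p>Each address that appears in a pair becomes a node and each pair a link whose weight is the number of replies; treating the graph as undirected discards the information about who replied to whom, which is a choice and not a necessity.</p>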
<p>I would like to take this example as an occasion to show how different algorithms can produce quite different views on the same data:</p>
<p><a title="4 network layout algorithms" href="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/4-algos-big.png"><img class="alignnone size-full wp-image-177" title="4 algos small" src="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/4-algos-small.png" alt="" width="500" height="396" /></a></p>
<p>So, these are the air-l posters with more than 60 messages posted since 2001. Node size indicates the number of posts, a node&#8217;s color (from blue to red) shows its connectivity in the graph (click on the image to see a much larger version). Link strength, i.e. number of replies between two people, is taken into account. You can download the full <a href="http://thepoliticsofsystems.net/wp-content/uploads/2010/10/social_undirected.gdf">.gdf</a> here. The only difference between the four graphs is the layout algorithm used (Force Atlas, Force Atlas with attraction distribution, Yifan Hu, and Fruchterman Reingold). You can instantly notice that Yifan Hu pushes nodes with low link count much more strongly to the periphery than the others, while Fruchterman Reingold as always keeps its symmetrical sphere shape, suggesting a more harmonious picture than the rest. Force Atlas&#8217; attraction distribution feature will try to differentiate between <a href="http://en.wikipedia.org/wiki/Hubs_and_authorities">hubs and authorities</a>, pushing the former to the periphery while keeping the latter in the center; just compare Barry Wellman&#8217;s position over the different graphs.</p>
<p>I&#8217;ll probably repeat this experiment with a more segmented graph, but I think this already shows that layout algorithms are not just innocently rendering a graph readable. Every method puts some features of the graph to the forefront and the capacity for critical reading is as important as the willingness for &#8220;critical use&#8221; that does not gloss over the differences in tools used.</p>
]]></content:encoded>
			<wfw:commentRss>http://thepoliticsofsystems.net/2010/10/06/one-network-and-four-algorithms/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

