A couple of weeks ago, Google released App Engine, a Web hosting platform that makes the company’s extensive knowledge in datacenter technology available to the general public. The service is free for the moment (including 500MB of data storage and a quite generous allotment of CPU cycles) but a commercial service is in preparation. Apps use Google Accounts, Google’s account system, for user identification and are currently limited to (lovely) Python as programming language.

I don’t want to write about the usual Google über alles matter but rather restate an idea I proposed in a paper in 2005. When criticizing search engine companies, authors generally demand more inclusive search algorithms, less commercial results, transparent ranking algorithms or non-commercial alternatives to the dominant service(s). This is all very important but I fear that a) there cannot be search without bias, b) transparency would not reduce the commercial coloring of search results, and c) open source efforts would have difficulties mustering the support on the hardware and datacenter front to provide services to billions of users and effectively take on the big players. In 2005 I suggested the following:

Instead of trying to mechanize equality, we should obligate search engine companies to perform a much less ambiguous public service by demanding that they grant access to their indexes and server farms. If users have no choice but to place confidence in search engines, why not ask these corporations to return the trust by allowing users to create their own search mechanisms? This would give the public the possibility to develop search algorithms that do not focus on commercial interest: search techniques that build on criteria that render commercial hijacking very difficult. Lately we have seen some action to promote more user participation and control, but the measures undertaken are not going very far. From a technical point of view, it would be easy for the big players to propose programming frameworks that allow writing safe code for execution in their server environment; the conceptual layers already are modules and replacing one search (or representation) module with another should not be a problem. The open source movement as part of the civil society has already proven its capabilities in various fields and where control is impossible, choice might be the only answer. To counter complete fragmentation and provide orientation, we could imagine that respected civic organizations like the FSF endorse specific proposals from the chaotic field of search algorithms that would emerge. In France, television networks have to invest a percentage of their revenue in cinema, why not make search engine companies dedicate a percentage of their computer power to algorithms written by the public? This would provide the necessary processing capabilities to civil society without endangering the business model of those companies; they could still place advertising and even keep their own search algorithms a secret. But there would be alternatives – alternative (noncommercial) viewpoints and hierarchies – to choose from.

I believe that the Google App Engine could be the technical basis for what could be called the Google Search Sandbox, a hosting platform equipped with either an API to the company’s vast indexes or even something as simple as a means to change weights for parameters in the existing set of algorithms. A simple JSON input like {"shop":"-1", "checkout":"-1", "price":"-1", "cart":"-1", "bestseller":"-1"} could be enough to, for example, eliminate Amazon pages from the result list. SEOing for these scripts would be difficult because there would be many different varieties (one of the first would be bernosworld.google.com – we aim to displease! no useful results guaranteed!). It is of course not in Google’s best interest to implement something like this because many scripts might direct users away from commercial pages using AdSense, the foundation of the company’s revenue stream. But this is why we have governments. Hoping for or even legislating more transparency and “inclusive” search might be less effective than people wish. I demand access to the index!
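To make the weights idea a bit more tangible, here is a minimal Python sketch of how such a JSON input could be applied to a result list. Nothing here is an actual Google API: the result records, the `base_score` field and the function names `load_weights`, `adjusted_score` and `rerank` are all hypothetical, and the scoring rule (add the weight once for every occurrence of a weighted term) is just one simple assumption among many possible ones.

```python
import json
import re

def load_weights(raw_json):
    """Parse a weight spec like {"shop": "-1"} into {term: float}."""
    return {term.lower(): float(value)
            for term, value in json.loads(raw_json).items()}

def adjusted_score(result, weights):
    """Base score plus the weight of every weighted term found in the text."""
    words = re.findall(r"[a-z0-9]+", result["text"].lower())
    return result["base_score"] + sum(weights.get(w, 0.0) for w in words)

def rerank(results, raw_json):
    """Return the results sorted by adjusted score, best first."""
    weights = load_weights(raw_json)
    return sorted(results, key=lambda r: adjusted_score(r, weights),
                  reverse=True)

# The weight spec from the post; the two results below are made-up data.
WEIGHTS = ('{"shop":"-1", "checkout":"-1", "price":"-1", '
           '"cart":"-1", "bestseller":"-1"}')

results = [
    {"url": "http://shop.example/book", "base_score": 3.0,
     "text": "Bestseller price: add to cart and proceed to checkout in our shop"},
    {"url": "http://library.example/essay", "base_score": 2.0,
     "text": "An essay on the history of the book"},
]

for r in rerank(results, WEIGHTS):
    print(r["url"])
```

With these made-up numbers the shop page drops below the essay even though it started with the higher base score. A real sandbox would of course need access to the engine's own scores and document text, which is exactly what is not available today.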

Post filed under algorithms, search engines, society oriented design.

2 Comments

  1. While well short of full access to the index, Yahoo’s new SearchMonkey platform is a step toward the kind of open search platform you describe. Google’s App Engine, on the other hand, appears to me to be geared toward easily scaling up (and wedding to the Google platform) standard database-driven websites, not opening up search. In fact, Google’s trajectory has been to gradually provide *less* access to search data, as for example when they replaced their REST search API with a less flexible but more easily controlled (by Google) JavaScript API. (The REST API is now back, but with very restrictive terms of service that essentially disallow using it for anything other than displaying Google search results on a web page.) Yahoo perceives a potential strategic advantage in building an open platform where Google refuses to be fully open; time will tell if they are correct or not.

  2. bernhard says:

    Hi Ryan,

    True, SearchMonkey is interesting and handy (played around with it a little bit some time ago) but it really doesn’t go that much further than the old Google SOAP API (I don’t think they ever went REST, no?) in the sense that you cannot bypass the company’s ranking. These services just allow for application level access to the normal search interface. The SearchMonkey overview states:

    “SearchMonkey does NOT enable you to reorder results on a search page. You can use SearchMonkey to change your search result display so that they are more attractive and useful, but SearchMonkey does not change algorithmic rankings.”

    Sure, you can use AND NOT operators to exclude certain terms or rerank results (I used that in a tool I built for my thesis) but you cannot directly access the index of crawled sites without passing through a ranking layer. What would be an interesting compromise though would be to have access to thousands or tens of thousands of results with one API call (through REST or SOAP or whatever) so you could do some serious reranking. But neither Yahoo nor Google allow that :-(.

    For the Google Sandbox I have in mind, you’d have to run code on Google’s servers and that’s why I find the App platform interesting. But sure, currently it’s definitely not about opening up search. Just wishful thinking…
