Sunday, February 6, 2011

Microsoft's Copying Habits

As a Googler, I've been following the copygate debacle with interest. Even though I am biased towards Google and tend to side with them, I do think there are good reasons to do so in this case. From all the coverage I've been reading, I feel there are a few claims being made by Microsoft and their supporters that are not being scrutinized properly.

One of Microsoft's main defenses is that the data it uses is voluntarily disclosed by Internet Explorer and Bing Toolbar users performing searches on Google. The Mountain View Search Company, it says, has no ownership rights over that data.

This argument is as credible as a college applicant's claim that his application essay can't be called a copy because it was derived from the voluntary, and detailed, account provided by the girlfriend of the original's author, who has read the source piece. Using indirect sources, instead of the original material, doesn't make the derivative work any less of a copy. I don't think that argument would fly in front of the application assessment committee.

Another defense Microsoft brings up is the fact that link mining is a general solution they apply to searches performed by users across all kinds of websites, not just Google. Amazon and eBay are their prime examples. The flaw in this argument is that neither Amazon nor eBay is in the business of monetizing search results. Amazon executives probably cheer Bing's search signal technology, since it likely means more of Amazon's books sold to Bing users.

But Google is in the business of monetizing search results. In fact, I argue that, if Bing's technology doesn't special-case Google, it should. It should add specific logic to either exclude links harvested from google.com (and other web search engines) or offer links to the original results page when a competing search algorithm's results are the strongest signal it has (as with the infamous torsoraphy query).

To help illustrate how potentially damaging this practice is, allow me to provide a straightforward extrapolation of the consequences of Bing's copying technology for the web search ecosystem.

Imagine that a search start-up begins making gradual advancements towards semantic understanding of the web, being able to answer natural language queries with increasing accuracy. If using the competition's results as input to one's search algorithms is regarded as acceptable practice, the start-up's differentiated results would, in no time, begin showing up in the results of the big search companies, neutralizing all of its competitive advantage and squashing the incentive for innovation in the process.

Yes, I know that Google is not a start-up and that spelling correction is not semantic understanding, but the practical and moral implications are the same, nonetheless.