Friday, July 25, 2008

CNET on BrowseRank: An informative article with a nonsensical premise

It's great to see a well-written, informative article about an innovation in search ranking, "Microsoft tries to one-up Google PageRank," published at a major tech news outlet like CNET. Stephen Shankland's piece on Microsoft's BrowseRank is definitely all that. It thoughtfully discusses the idea that people's click behavior is a powerful voting mechanism for assessing web page relevance, and a very different one from the link graph that PageRank uses. All good.
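To make that contrast concrete, here's a toy sketch in Python. The link graph, the browsing sessions, the dwell times, and the damping factor are all invented for illustration; the "browse score" here is just normalized dwell time, a crude stand-in for click behavior rather than the actual BrowseRank model.

```python
# Toy contrast between a link-graph score (PageRank-style) and a
# click-behavior score. Pages, links, and sessions are made up.

def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its mass evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

def browse_score(sessions):
    """Crude behavior score: each page's share of total dwell seconds."""
    totals = {}
    for session in sessions:
        for page, dwell_seconds in session:
            totals[page] = totals.get(page, 0.0) + dwell_seconds
    grand_total = sum(totals.values())
    return {page: t / grand_total for page, t in totals.items()}

if __name__ == "__main__":
    links = {                  # who links to whom (hypothetical)
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],
    }
    sessions = [               # (page, dwell seconds) per visit (hypothetical)
        [("d", 120), ("c", 15)],
        [("d", 200), ("a", 10)],
        [("b", 5), ("d", 90)],
    ]
    pr = pagerank(links)
    bs = browse_score(sessions)
    print("link-graph ranking:    ", sorted(pr, key=pr.get, reverse=True))
    print("click-behavior ranking:", sorted(bs, key=bs.get, reverse=True))
```

In this made-up example, page "d" attracts almost no links but holds visitors the longest, so the two signals disagree about which page deserves the top spot. That disagreement is exactly why click behavior is interesting as a complementary vote.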

However, it's maddening to see yet another article on a search-ranking innovation framed around a naive or deliberately misleading premise, one that perpetuates a persistent misunderstanding of how search works and what makes search better. Maybe it's just a standard media trope to essentialize any topic to the point of parody, but I'm so tired of seeing pieces that fetishize "The Algorithm" as some singular magical trump card by which search is won and lost. Combining scientific models to produce a ranking function is difficult, obscure, and incredibly important, so to some extent I can understand why the media keeps writing stories focused on it. But it's a bit like watching a Saturn V take off for the moon, turning confidently to your neighbor, and saying, "That thing takes off because NASA figured out The Engine... I hear the Russians are working on something better than The Engine."

Please. Great search is made up of a handful of major areas of competence:
* Scaled aggregation of content. It doesn't matter how good your matching might be if you don't have what the user wants in your cupboard of goodies.
* Scaled user voting behavior to assess value. Whether this means the ability to crawl and assess hyperlinks, access to user clicks and the ability to assess them, or (as Udi Manber rightly says) one of hundreds of other variously valuable and tractable signals, you need access to behavior metadata, crystallized in one form or another.
* A scientific process and platform for running many experiments to fine-tune the value of those voting behavior signals (a toy version of that loop is sketched after this list).
* A technological platform to rapidly and cost-effectively perform this mind-boggling level of computation, faster than the answers and the questions are evolving on a global scale.
* A bunch of really great scalability engineers to build that platform.
* A bunch of really great search scientists to conceive, build, and test models on a continuous basis.
* Oh, and a very effective monetization effort to pay for all of that incredibly expensive infrastructure, people, and time.
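Here is the promised sketch of the experimentation loop, purely illustrative: the signal names, weights, relevance labels, and the precision-at-1 metric are my own stand-ins, not anything from the CNET article or from any engine's actual ranking function.

```python
# Minimal sketch of "combine signals, then test a weighting offline".
# All signal names, weights, and judged labels below are hypothetical.

def score(doc_signals, weights):
    """Linear combination of per-document signals into one ranking score."""
    return sum(weights[name] * value for name, value in doc_signals.items())

def rank(docs, weights):
    """Order documents by their combined score, best first."""
    return sorted(docs, key=lambda d: score(d["signals"], weights), reverse=True)

def precision_at_1(docs, weights):
    """Offline metric: did the top-ranked document carry a 'relevant' label?"""
    return 1.0 if rank(docs, weights)[0]["relevant"] else 0.0

if __name__ == "__main__":
    # One judged query with three candidate documents (made-up numbers).
    docs = [
        {"signals": {"link_score": 0.9, "click_score": 0.2, "text_match": 0.5}, "relevant": False},
        {"signals": {"link_score": 0.3, "click_score": 0.8, "text_match": 0.7}, "relevant": True},
        {"signals": {"link_score": 0.4, "click_score": 0.4, "text_match": 0.6}, "relevant": False},
    ]
    # Two candidate weightings to compare in an offline experiment.
    candidates = [
        {"link_score": 1.0, "click_score": 0.2, "text_match": 0.5},
        {"link_score": 0.4, "click_score": 1.0, "text_match": 0.5},
    ]
    for weights in candidates:
        print(weights, "-> precision@1 =", precision_at_1(docs, weights))
```

In a real engine the signal count runs into the hundreds and evaluation happens against huge judged sets and live traffic, but the loop is the same shape: combine signals, measure, adjust, repeat.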

It sounds a lot more like General Motors circa 1955 than it does "genius in a garage cooking up the next great thing," doesn't it? Perhaps that's why the media always falls into this trap: the brilliant loner or breakthrough insight that changes the world is such a powerful narrative hook, whereas a story about discipline, competence, and process is just kind of... boring.

The reality of world-class search today is that it's big, complicated, and multi-faceted. It has matured into a technology discipline all its own, and its advances will tend to be subtle and hard to explain. Remember that the next time you read an excited story about the Next Great Algorithm.

Update: I've written a new post that channels my grumpiness to productive ends, with a few more thoughts on the data-driven web and an index to my various posts on search relevance. Reporters, please read them all... twice!
