Monday, April 24, 2006

Search Trends: The Search For User Intent

There are several interesting trends afoot in the world of search. As I have previously written, concepts as disparate as editorial news and personalization are simply different ways to get to a better search answer. I'm now ready to extend this thesis a bit further, and claim that the next big leap in search is going to be driven by a superior ability to acquire and understand user intent.

In any query:response system, the quality of the output is fundamentally limited by the information contained in the query. In a mathematical sense, it's impossible to have more significant digits in your answer than you had in your input data. No matter how well you tune your response algorithm, not only will "Garbage In, Garbage Out" always hold true, but the more general case of "lack of precision in, lack of precision out" will limit how good your results can be.

On a general search engine (Google, Yahoo, etc.) an inbound user query can, quite literally, be about anything. Presented with a blank box that can encompass anything, the user then types in an average of 2.4 words, and the search engine must give back an ordered list of 10 relevant results (having worked very hard to remove spam, porn, etc.) that may approximate what the user is looking for. Suppose the user types in "Peru". Is the user a 4th grader writing a report for school? A college student planning a backpacking trip? A consultant searching for economic data related to a project? A Peruvian looking for things related to their home country in their local area? All these, and many more, are possible.

It's a very tough problem, and a key reason that search quality has ground to a halt, or at least stopped improving dramatically, is that in this completely general context, it's very hard to understand more about the user's intent than they grudgingly give you in their 2.4 words.

So how can search engines solve this problem of paucity of intent?

Answer #1 is to verticalize. Zvents, my startup, is an example of this trend -- as are Simply Hired, Kosmix, and many others. When someone types "Peru" into Zvents, we know -- simply because we're a search engine for local events, and nothing else -- that the user is looking for something to do in their local area having to do with Peru. For Google to get that kind of intent, the user would have to type "something to do in my local area having to do with Peru" or a similarly complex query. Zvents gets a great deal of intent information simply by being a specialist. When someone shows up in our search interface -- either directly, or via a media partner -- we can be confident they're looking for stuff to do. Simply Hired, similarly, would know that the user was looking for jobs either in Peru, or related to it; and Kosmix, under its healthcare filter, could infer that the user cared about health issues related to Peru.
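To make the point concrete, here is a minimal sketch of how the same one-word query carries more intent when it arrives at a vertical engine. Everything in it is hypothetical and for illustration only: the `Intent` record, the function names, and the sample location are assumptions, not a description of how Zvents or any other engine is actually built.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intent:
    """Hypothetical record of what an engine knows about a request."""
    terms: str
    category: Optional[str] = None   # what kind of thing the user wants
    location: Optional[str] = None   # where they want it


def general_intent(query: str) -> Intent:
    # A general engine only sees the words the user typed.
    return Intent(terms=query.strip())


def vertical_intent(query: str, vertical: str, user_location: str) -> Intent:
    # A vertical engine fills in category and location from its own context,
    # without the user typing anything extra.
    return Intent(terms=query.strip(), category=vertical, location=user_location)


def known_fields(intent: Intent) -> int:
    # Count how many pieces of intent are actually populated.
    return sum(v is not None for v in (intent.terms, intent.category, intent.location))


general = general_intent("Peru")
vertical = vertical_intent("Peru", vertical="local events", user_location="San Francisco")
print(known_fields(general), known_fields(vertical))
```

The user typed the same single word in both cases; the extra fields come entirely from where the query was typed.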

Answer #2 is to personalize. Google is the most interesting case emerging at the moment. Go to the Google homepage and in the upper right you'll find a link to the new personalized home. It's a HUGE departure for Google -- which has built a multi-billion dollar business on a purely "visitor" experience -- to be moving to a registered user experience. It's likely that Google is doing this to get a much clearer sense of search intent, based on personal past search history. At least one very senior technical person involved in the Google personalized home page has a background in mining extremely large log files to extract business intelligence -- exactly the kind of resume you'd want if you were attempting to derive intent from search history. I've also had some discussions with folks at Yahoo (who are also very happy to let you log in while searching) about the extremely sophisticated state information that Yahoo maintains as you move through the Yahoo portal, all aimed at improving the targeting of both ads and products.

Answer #3 is to get more intent directly from users. Many of you might recall the original concept of Ask Jeeves, which was to "answer your questions in plain English". In a recent conversation with some folks at Ask, they mentioned that one of the reasons they'd retired Jeeves was that delivering on this "brand promise" was nearly impossible -- but that because of that historical promise, their typical search query was vastly longer than those at the other big search engines. That raises the question -- what if you could successfully parse natural language English like "something to do in my local area having to do with Peru"? It's an incredibly hard problem, but the ability to accurately extract enough user intent to deliver a truly better search result would yield enormous market benefits. I know of at least one current serious search startup that is taking exactly this approach -- building a general-purpose search engine that can better parse complex intent, and accurately respond to it. If they succeed in their quest, and establish a "brand promise" similar to the original Jeeves, they'll have a significant advantage in actually giving users what they want.

There are other answers, but they look less and less like search. Aggregate Knowledge, for instance, is a matching engine for supplementary navigation based on heterogeneous item:item matching in a "people who viewed this, also viewed that..." sense. At scale, AgKnow can derive intent from the cumulative behavior of multiple users who were exposed to similar information -- and as their scale increases further, they could slice even more finely to begin to distinguish path dependencies as well. I think we'll see an explosion of search alternatives in addition to advances in all three categories mentioned above, as dozens of companies large and small strive to crack the problem of too much information, not enough user intent.
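For readers who want to see what "people who viewed this, also viewed that" looks like in its simplest form, here is a minimal sketch based on plain co-occurrence counting over viewing sessions. It is an assumption-laden toy, not Aggregate Knowledge's actual method: the function names, the session data, and the item names are all made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_views(sessions):
    """Count, for each item, how often every other item was viewed in the same session."""
    co_views = defaultdict(Counter)
    for items in sessions:
        # set() so that repeat views within one session count only once
        for a, b in permutations(set(items), 2):
            co_views[a][b] += 1
    return co_views


def also_viewed(co_views, item, k=3):
    """Return up to k items most often co-viewed with `item` (empty list if unseen)."""
    counts = co_views.get(item, Counter())
    return [other for other, _ in counts.most_common(k)]


# Hypothetical sessions: each list is the set of pages one visitor looked at.
sessions = [
    ["peru-travel-guide", "machu-picchu-tour", "lima-restaurants"],
    ["peru-travel-guide", "machu-picchu-tour"],
    ["peru-travel-guide", "peru-gdp-report"],
]

co_views = build_co_views(sessions)
print(also_viewed(co_views, "peru-travel-guide", k=1))  # -> ['machu-picchu-tour']
```

Nobody in this example typed a query at all; whatever intent is recovered comes from what other visitors did after seeing the same page.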
