Jesse Stay's excellent post on the search potential of Facebook's Lexicon has inspired me to put down a few quick thoughts on Facebook's nearly unlimited potential to capture the future of what John Battelle calls the "database of intentions".
Google's extraordinary accomplishment is that they used superb statistical analysis to make some vague sense out of the complete mishmash that makes up the flat-text Web. But while that accomplishment is considerable, at the end of the day, they're still dealing with mush.
Facebook's great opportunity is that everything within Facebook is structured; and increasingly, users express their intentions against this structured data at scale in a way that can be very productively mined -- for product improvement, for user retention, for advertising. For insight.
Riddle yourself this: You have 200 Facebook friends. They are all pretty active. Does your FB feed actually show every single event from every single one of them? No, it doesn't. FB is algorithmically determining what is most interesting to you - dynamically - based on how much attention you pay to what those users do, and how you interact with them. Facebook knows how much you care about each of your friends. It knows whether you pay more attention to people near or far, to men or to women, to people you work with, went to high school with, or went to college with. It knows because you explicitly describe all those relationships, in a way that Google can never grasp no matter how world-beating its science and how vast its server farms.
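To make that concrete, here is a minimal sketch of attention-weighted feed ranking. This is not Facebook's actual algorithm -- the interaction types, weights, and data shapes are all invented for illustration -- but it shows the basic idea: score each friend's event by how much attention you've historically paid to that friend.

```python
# Hypothetical sketch (not Facebook's real ranking system): order a feed
# by the viewer's historical attention to each friend.

def affinity(interactions):
    # Weight interaction types by assumed signal strength; these
    # weights are invented for illustration.
    weights = {"comment": 3.0, "like": 1.0, "profile_view": 0.5}
    return sum(weights.get(kind, 0.0) * count
               for kind, count in interactions.items())

def rank_feed(events, interaction_history):
    # Show events from the friends the viewer interacts with most.
    return sorted(
        events,
        key=lambda e: affinity(interaction_history.get(e["friend"], {})),
        reverse=True,
    )

events = [
    {"friend": "alice", "story": "posted photos"},
    {"friend": "bob", "story": "changed status"},
]
history = {
    "alice": {"comment": 4, "like": 10},  # affinity 22.0
    "bob": {"like": 1},                   # affinity 1.0
}

feed = rank_feed(events, history)  # alice's event ranks first
```

The point is that every input to this scoring function -- who commented on whom, who viewed whose profile -- is structured data Facebook collects natively, not something to be statistically inferred from flat text.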
Or consider the Lexicon graphs that Jesse highlights in his post. Google Trends can handily generate one of those for you from their painstakingly de-mishmashed dataset. But they can't tell you the demographic breakdown of that interest, because they don't know who's male and who's female. Nor do they know whether that interest is coming from people directly associated with the topic in question -- for instance, whether searches for Ohio State, my alma mater, come from its own students and alumni.
Here's the Ohio State Lexicon graph, which I have annotated to show the precision of Facebook's read on the importance of a topic:
Here's a graph for the term 'Football' as a proxy, from the new Lexicon, which doesn't yet allow analysis of arbitrary search terms.
As you can see, FB could allow you to slice and dice the 'Ohio State' search by any number of associations -- male vs. female, by age, and whether the person had attended Ohio State. Google can't do that. No one else can do that, because no one else has assembled a gigantic graph of defined and structured entities within which users apply their attention and annotation.
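A toy example makes the contrast plain. Because every Facebook profile carries structured attributes, a search record can be grouped by any of them. The records and field names below are entirely invented; this is just a sketch of the kind of slicing structured data permits.

```python
# Invented data: each search record carries structured profile
# attributes the network already knows about the searcher.
searches = [
    {"term": "Ohio State", "sex": "F", "age": 21, "school": "Ohio State"},
    {"term": "Ohio State", "sex": "M", "age": 35, "school": "Michigan"},
    {"term": "Ohio State", "sex": "F", "age": 19, "school": "Ohio State"},
]

def slice_by(records, attr):
    # Group search counts by any structured attribute -- male vs.
    # female, school attended, age bracket, and so on. A flat-text
    # corpus offers no comparable field to group on.
    counts = {}
    for r in records:
        counts[r[attr]] = counts.get(r[attr], 0) + 1
    return counts

by_sex = slice_by(searches, "sex")  # {'F': 2, 'M': 1}
alumni = sum(1 for r in searches if r["school"] == "Ohio State")  # 2
```

Each additional profile attribute is just another key to group by -- no statistical inference required.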
The implications for local search alone boggle my mind - that's food for another post.
It's worth noting that Lexicon is really, really slow right now. My hat is off to FB for making it work at all -- I assume that some implementation of Cassandra is behind the current Lexicon, and one reason they may not be allowing open-text searching in the new Lexicon is that, while they push the envelope developing it, they're crunching big batch jobs in Hadoop over a limited set of terms for the more sophisticated analysis presented there. Zvents has developed some pretty sophisticated internal analytics based on Hypertable, and I'm familiar with the challenges this sort of slice-and-dice presentation poses -- they are considerable.
Google has taken the statistical analysis of flat text about as far as it can go. The question is, what next? Powerset attempted one approach: semantic analysis of that same flat text. We'll see whether Microsoft and Powerset can make a go of that -- the jury is still out on whether it adds value in a computationally and commercially tractable manner. But in the meantime, my bet is on Facebook -- because the information potential of a structured system is vastly greater than that of a flat corpus, and it is far more amenable to parsing.
Internet, watch out. Here comes Facebook.