I’m still near the starting point in my travels through recommendation services and their underlying algorithms. It’s always a great help therefore to meet a more experienced knowledge hiker returning from the other direction who can offer a better sense of the terrain ahead.
We received a comment from Sachin Kamdar, founder of recommendation startup Parse.ly, in response to a post last week on Freebase and knowledge networks that gave us just such an insight.
Kamdar’s point is that you can get pretty far—but not all the way, of course—by extracting patterns from datasets. Even a simple pattern matching algorithm can be useful.
Parse.ly, by the way, employs both data mining techniques and language processing in generating its recommendations.
So how far can you go with pattern matching and a little semantic analysis?
To find out we tried Google Sets.
Last week I wrote about Freebase, a network database of knowledge, whose complex relationships can be queried to generate something pretty close to suggestions. Unlike other recommendation software, Freebase doesn’t explicitly require the crowd to provide user profiles—lists of related items—that are the food for the underlying engines.
Freebase is a large, sprawling database of knowledge about films, people, books, and other topic areas. It does use a Wikipedia model of content generation: the crowd populates the database.
Freebase is now part of Google and reflects this search company’s ambitious plans in mining semantic value from raw web content.
We may be able to get a glimpse of Google’s larger goals. Hidden in the backroom of Google Labs, where the Googlers reveal their not-ready-for-general-use projects, is an app called Google Sets.
It’s a pretty simple interface: you enter a list of related items, and Sets then tries to extrapolate from this subset and predict entries in a larger set. Think of it as Google’s proto-suggestion service.
So how does this thing work? Since this is Google, the initial list of words entered is used as search keys to pull up web pages. In a YouTube video I happened upon, Peter Norvig, Director of Google Research, explains at a very high level how by using these keywords as a “seed pattern,” Sets can guess broader patterns, pull out semantic value, and then do more precise searching.
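To make the “seed pattern” idea a little more concrete, here is a toy sketch in Python. To be clear, this is my own guess at the general flavor, not Google's actual algorithm, which hasn't been published in detail as far as I know. It assumes we already have a pile of lists scraped from web pages (the hypothetical `PAGES` below is made-up sample data), and it simply scores each candidate item by how many seed items appear alongside it in the same list. The function name `expand_set` and its parameters are illustrative only.

```python
from collections import Counter

# Hypothetical stand-in for "lists found on web pages"; not real crawl data.
PAGES = [
    ["So What", "Freddie Freeloader", "Blue in Green", "All Blues", "Flamenco Sketches"],
    ["So What", "All Blues", "Milestones", "Round Midnight"],
    ["Blue in Green", "Waltz for Debby", "Flamenco Sketches"],
    ["Yesterday", "Let It Be", "Hey Jude"],
]


def expand_set(seeds, pages, top_n=5):
    """Suggest items that tend to share a list with the seed items.

    Assumption: co-occurrence counting is used here as a simple proxy for
    whatever pattern matching the real service performs.
    """
    seed_set = set(seeds)
    scores = Counter()
    for page in pages:
        items = set(page)
        overlap = len(seed_set & items)
        if overlap == 0:
            continue  # this list has nothing to do with the seeds
        # Every non-seed item on the page gets credit for each seed it sits beside.
        for item in items - seed_set:
            scores[item] += overlap
    # Highest score first; ties are broken alphabetically so results are stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    print(expand_set(["So What", "Blue in Green"], PAGES, top_n=3))
```

Even this crude counting never needs to be told what the seeds have in common: items that share lists with them float to the top, and unrelated lists contribute nothing. A real system would presumably add the semantic analysis and far larger data that a sketch like this leaves out.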
To test Sets, I gave it a list of track titles from Miles Davis’s albums (see above). No mention of “Miles Davis” in any of my keywords.
From the results Google Sets generated, the software clearly figured out that “Miles Davis” and “jazz” were the overarching concepts, and it was able to then pull up other relevant titles.
A very credible performance.
The other key thing to remember is that search patterns do well when they have lots of data to work with and the processing power to match the data size. Obviously, this is one of Google’s advantages.
Startups such as Parse.ly are riding the right wave, IMO. By combining semantic value with user profile information they should be able to generate some very, very good suggestions.