R is Everywhere (Just Hidden)

After losing to Watson at IBM’s booth during the Predictive Analytics World conference, I stopped by Revolution Analytics’ digs to regain my strength. Revolution, by the way, provides support (consulting, training) for the R statistical language and offers enterprise-class analysis products for wrangling “big data” sets.

While chatting with Mike Minelli, their VP of Sales, I had to ask what place R had in social network startups. After all, these no-business-model-yet entities pine for the elusive hockey stick growth curve that will lead to hundreds of thousands of visitors. While I’ll often hear about programming languages, APIs, and various libraries at demo and pitch events, no one ever talks in detail about the analytics tools, like R, used in massaging and ultimately monetizing the collected social data.

It turns out that R is well known in the startup world; it is just not a sexy thing to chat about, kind of like boasting about Excel spreadsheets. Minelli told me that Foursquare and OMGPOP, the multi-player game site, are just two of R’s more prominent users. One initiative by Revolution that may raise R’s profile among marketers, analysts, and perhaps data-oriented journalists is a friendlier web-based interface that will soon be introduced.

Then along came Kirk Mettler, whose title is “The Fixer” at his consulting firm Big Computing, to fill in more details of R as a hidden but widely used data analysis engine.

As a member of the pre-Internet generation, I recall SPSS ruling statistical problem solving. Mettler (Revolution Analytics’ former COO) told me that recent comp sci and science-oriented graduates have invariably learned R in their courses, so it’s practically assumed that R is the go-to tool in the startup world.

Skilled R developers are so prevalent that some larger companies have had trouble finding trained analysts of SPSS vintage.

After I returned from the conference, I took a peek at Big Computing’s site to get a taste of what’s being done with R.

There are impressive open-source projects out there, including fast calculation of SVDs and parallel computation of statistics for ginormous data. Check out the cool stuff section of the Big Computing site for details and inspiration.