One of the more interesting apps I came across last month during Gigaom Structure 2014 is called BIME. Imported from France, BIME is the work of a startup from Montpellier. Those who attended this Big Data event probably don’t want to hear those two words again for a while.
For me, though, what was missing from all the presentations at Structure was a way to see how Big Data could work for non-data scientists–marketers, analysts, general business types, data journalists, and the occasional empirically minded blogger. And this is where BIME comes in.
I received a quick demo of BIME from Rachel Delacour, the company’s CEO. It’s hard to encompass BIME’s powers within a brief tour of this web-based app, so after seeing it in action, I was curious to try it out on my own. The app’s strengths are aggregating huge datasets and letting users tap into the power of cloud-based datastore services, particularly Google’s BigQuery, Amazon’s Redshift, and Vertica. If you’re a large company–BIME’s target customer–you can use BIME to connect with existing internal corporate databases as well.
For my own experiments as a non-corporate entity, I rented a little bit of space in Google’s cloud. BIME excels in creating connections to different datastore formats (MySQL, SQL Server), flat files (spreadsheets, Google Drive, Box, Dropbox, etc.), and online services (Omniture, Facebook, Google Analytics). And then, by using its special QueryBlender for doing database “joins” across these datasets, you get to directly manipulate the information, work that would normally require lots of IT support. After trying BIME, I found it to be a heady experience.
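For readers who haven’t done a “join” by hand, here is a minimal Python sketch of the idea: rows from a database table are matched to rows from a flat file on a shared key. This is not how QueryBlender is implemented; the table, the column names, and the `blend` function are all hypothetical, and an in-memory SQLite database stands in for a real datastore.

```python
import sqlite3

# Hypothetical database source: an in-memory SQLite table of orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 12.5), (3, 40.0)],
)

# Hypothetical flat-file source: rows as they might come from a spreadsheet.
spreadsheet_rows = [
    {"customer_id": 1, "region": "East"},
    {"customer_id": 2, "region": "West"},
]


def blend(conn, spreadsheet_rows):
    """Inner-join database orders to spreadsheet rows on customer_id."""
    region_by_customer = {
        row["customer_id"]: row["region"] for row in spreadsheet_rows
    }
    blended = []
    query = "SELECT customer_id, amount FROM orders ORDER BY rowid"
    for customer_id, amount in conn.execute(query):
        region = region_by_customer.get(customer_id)
        if region is not None:  # drop orders with no matching spreadsheet row
            blended.append(
                {"customer_id": customer_id, "amount": amount, "region": region}
            )
    return blended


blended_rows = blend(conn, spreadsheet_rows)
```

Customer 3 has no row in the spreadsheet, so that order is dropped, which is what an inner join does. A tool like BIME hides this bookkeeping behind a point-and-click interface.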
BIME, by the way, is smart about offloading calculations to the database servers–for example, using backend SQL engines to sum columns–and about caching results, either in the cloud or partially in your own browser, to speed up processing.
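To make those two ideas concrete, here is a small Python sketch, again using an in-memory SQLite database as a stand-in. The `sales` table and the `total_for_region` function are made up for illustration, and BIME’s actual caching layer surely works differently; the point is only that the sum is computed inside the database and that a repeated request never reaches the database at all.

```python
import sqlite3
from functools import lru_cache

# Hypothetical backend: an in-memory SQLite table of sales.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 10.0), ("East", 15.0), ("West", 7.5)],
)


@lru_cache(maxsize=128)
def total_for_region(region):
    """Ask the database for the sum; only a single number comes back."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
        (region,),
    ).fetchone()
    return row[0]


east_total = total_for_region("East")  # computed by the SQL engine
east_again = total_for_region("East")  # answered from the in-process cache
```

The alternative, pulling every row over the network and adding them up locally, gives the same answer but moves far more data, which matters once a table runs to millions of rows.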
Just to see what was possible, I first connected to a sample word database of Shakespeare’s complete works that I found in Google BigQuery. There are all kinds of data analysis that one can do with these results: comparing word distributions among the different tragedies, finding words that were used only once in a play, or perhaps showing a power-law distribution, also known as Zipf’s Law, which often shows up in large collections of text. Yes, I’m very aware that even this minor exercise would have required the resources of a university’s tech support department pre-Google.
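Two of those analyses are easy to sketch in Python. The rows below are invented and use an assumed (play, word, count) layout, not real Shakespeare counts or the actual BigQuery schema; the function names are hypothetical too.

```python
from collections import Counter

# Invented rows in an assumed (play, word, count) layout.
rows = [
    ("hamlet", "the", 6),
    ("hamlet", "king", 3),
    ("hamlet", "ghost", 1),
    ("macbeth", "the", 4),
    ("macbeth", "king", 2),
    ("macbeth", "dagger", 1),
]


def words_used_once(rows, play):
    """Return the words that appear exactly once in the given play."""
    return sorted(word for p, word, count in rows if p == play and count == 1)


def rank_frequency(rows):
    """Return (rank, word, total_count) across all plays, most frequent first."""
    totals = Counter()
    for _play, word, count in rows:
        totals[word] += count
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(totals.most_common(), start=1)
    ]


hapax = words_used_once(rows, "hamlet")
ranked = rank_frequency(rows)
```

Zipf’s Law says a word’s frequency is roughly proportional to one over its rank, so rank times count should stay roughly constant down the list. Plotting rank against count on log-log axes is the usual way to check whether real text follows the pattern.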
I decided to try something a little more ambitious. A few weeks back, I was at a Hacks/Hackers meetup where I heard The New York Times’ Bob Gebeloff talk about analyzing US census data with the University of Minnesota’s IPUMS. Inspired by his presentation, I decided to use BIME to connect with IPUMS census results from 2012.
I ended up working with a dataset of over three million rows of occupation stats across 50 US states, which is Big Data enough for me. My goal was to compare the number of espresso workers, as represented by US occupation code 438 (“food counter”), with metal workers, occupation code 634 (“tool and die workers”). It shouldn’t come as a surprise that we’re in a post-industrial society. Conclusion: there are a lot more people employed in restaurants than in factory support, and I found only two states, Ohio and Minnesota, where it wasn’t a complete mismatch in the numbers.
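The comparison itself boils down to a grouped count. Here is a rough Python sketch of it, using a handful of invented records in an assumed (state, occupation code) layout; only the two occupation codes come from my actual experiment, and the sketch counts raw rows without any of the sample weighting a careful census analysis would apply.

```python
from collections import defaultdict

FOOD_COUNTER = 438  # "food counter" occupation code
TOOL_AND_DIE = 634  # "tool and die workers" occupation code

# Invented records in an assumed (state, occupation_code) layout.
records = [
    ("OH", 438), ("OH", 634), ("OH", 634),
    ("NY", 438), ("NY", 438), ("NY", 438), ("NY", 634),
    ("NY", 512),  # some other occupation, ignored below
]


def counts_by_state(records, codes):
    """Count rows per state for each occupation code of interest."""
    counts = defaultdict(lambda: dict.fromkeys(codes, 0))
    for state, code in records:
        if code in codes:
            counts[state][code] += 1
    return dict(counts)


def food_to_tool_ratio(state_counts):
    """Food-counter workers per tool-and-die worker, or None if there are none."""
    tool = state_counts[TOOL_AND_DIE]
    return state_counts[FOOD_COUNTER] / tool if tool else None


counts = counts_by_state(records, (FOOD_COUNTER, TOOL_AND_DIE))
ratios = {state: food_to_tool_ratio(c) for state, c in counts.items()}
```

A ratio near or below one is the kind of state I was hunting for, where the two occupations are at least in the same ballpark.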
BIME also provides several basic visualizations of your data (similar to Tableau), including bar, pie, bubble, line, and even geo charts. While I was a little stumped by the bubble parameters, which would have supported a better visualization of my data, I’m sure someone with more data science chops could have gotten it to work.
I had a few minor glitches along the way, which I attribute to my lack of understanding of BigQuery. Overall, I think that BIME more than delivers on its ability to give data analysis to non-data scientists and certainly allows corporate employees to do some very sophisticated work without having to deal with an IT department.
BIME offers a free trial period, and pricing starts at $180 per month. For companies, or more likely departments within larger organizations, that want to analyze huge datasets, BIME is worth exploring.
Nice write-up, especially the census data. Would you share the dataset in BigQuery with others too? That would be cool.
Thanks!