Decoding the FCC (with Help from CrowdFlower)

Until recently, when confronted with a very limited budget for a critical task that had to be completed yesterday, you had few options. You could face mutiny by squeezing more person-hours from your staff or else turn to budget-busting temp agencies.

I recently took on the job of analyzing some of FCC Chairman Genachowski’s public statements on broadband policy as recorded on YouTube. My goal was to gain more insight into the policy course that’s been roughly charted in the National Broadband Plan. (Well, someone had to do this, and it might as well be The Technoverse.)

We don’t have much staff, and we have limited dollars. What to do?

I’ve written about CrowdFlower, the crowdsourcing solution company. Why not put them to work for me?

As a novice user of CrowdFlower, I was slightly uneasy about trusting work to an anonymous mob. So I had to remind myself of the philosophical underpinnings of the crowd.

I won’t repeat in this space the well-told story about the father of crowd wisdom, Mr. Francis Galton (1822-1911), and the English fairgoers whose guesses at the weight of an ox he collected.

The key to understanding how a large number of imperfect measurements works out to a very accurate result is variability: averaging the results from a large sample pool reduces the uncertainty (see the strong law of large numbers).

Yes, I’m familiar with the mean-median controversy swirling around Galton’s data, but bear with me.

In other words, one person’s hunch can be all over the place, but take a large number of guesstimates and the “noisiness” is reduced: the averaged result fits into a much tighter band.
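To see the tighter band in numbers, here is a minimal simulation sketch. Everything in it is my own assumption for illustration: the true weight, the size of each person’s error, and the idea that guesses are independent, unbiased, and bell-shaped. It is not Galton’s data.

```python
import random
import statistics


def crowd_estimate(true_value, noise_sd, crowd_size, rng):
    """Average crowd_size independent noisy guesses of true_value."""
    guesses = [rng.gauss(true_value, noise_sd) for _ in range(crowd_size)]
    return statistics.mean(guesses)


def spread_of_estimates(crowd_size, trials=2000, true_value=1200.0,
                        noise_sd=100.0, seed=0):
    """Standard deviation of the crowd average across many simulated crowds.

    true_value and noise_sd are made-up numbers, not historical figures.
    """
    rng = random.Random(seed)
    estimates = [
        crowd_estimate(true_value, noise_sd, crowd_size, rng)
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Larger crowds should give a smaller spread around the true value.
    for n in (1, 10, 100):
        print(n, round(spread_of_estimates(n), 1))
```

Under these assumptions the spread of the crowd average shrinks roughly with the square root of the crowd size, so a hundred guessers land in a band about a tenth as wide as a lone guesser. If the guesses share a common bias, averaging won’t remove it.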

Companies such as CrowdFlower are well aware of this phenomenon. In fact, one way to improve the accuracy of the job I handed off to the crowd is to request more “judgements” or reviews by the workers.

Back to the work I had pushed off to CrowdFlower. My original idea was to ask the crowd to analyze some FCC documents. Realizing that there are limits to the crowd, and even to telecom-savvy attorneys, I decided on an easier task that I thought would have more predictive value.

I asked the crowd to listen to Chairman Genachowski’s public utterances while monitoring for certain words or phrases (“network neutrality”, “open internet”, etc.) and to report back a timestamp for each.

CrowdFlower had worked out that my 12 videos, or “units” of work, would require 50 reviews by workers to achieve a good quality level. (FYI, CrowdFlower allows a single worker to report multiple results for the same unit.)

To reach better accuracy, I seeded my order with two CrowdFlower-style Gold questions. These are videos for which I already knew the timestamps of the keywords. Workers who failed these special questions were designated as less trustworthy: it’s really a way to tighten variability by giving less weight to some members of the crowd.
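The weighting idea can be sketched in a few lines. This is only my illustration of the general technique, not CrowdFlower’s actual scoring method, which I haven’t seen. The function names, the two-second tolerance, and the worker names and timestamps are all hypothetical.

```python
def worker_trust(gold_answers, gold_truth, tolerance=2.0):
    """Score each worker by the fraction of gold questions answered
    within `tolerance` seconds of the known timestamp.

    A gold question the worker skipped counts as a miss.
    """
    trust = {}
    for worker, answers in gold_answers.items():
        hits = sum(
            1
            for unit, timestamp in answers.items()
            if abs(timestamp - gold_truth[unit]) <= tolerance
        )
        trust[worker] = hits / len(gold_truth)
    return trust


def weighted_timestamp(judgments, trust):
    """Trust-weighted mean of the timestamps reported for one unit.

    judgments is a list of (worker, timestamp) pairs. Returns None when
    no reporting worker has any trust.
    """
    total_weight = sum(trust.get(worker, 0.0) for worker, _ in judgments)
    if total_weight == 0:
        return None
    weighted_sum = sum(trust.get(worker, 0.0) * ts for worker, ts in judgments)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical data: two gold videos with known keyword timestamps (seconds).
    gold_truth = {"gold_1": 65.0, "gold_2": 130.0}
    gold_answers = {
        "alice": {"gold_1": 64.0, "gold_2": 131.0},
        "bob": {"gold_1": 65.5, "gold_2": 200.0},
        "carol": {"gold_1": 10.0, "gold_2": 300.0},
    }
    trust = worker_trust(gold_answers, gold_truth)
    judgments = [("alice", 42.0), ("bob", 45.0), ("carol", 90.0)]
    print(trust)
    print(weighted_timestamp(judgments, trust))
```

In this made-up example, the worker who missed both gold questions contributes nothing, so the combined timestamp stays close to what the two more reliable workers reported instead of being dragged toward the outlier.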

The cost of an assignment that would take me hours of elapsed time to complete? Under $15.

I’ll report the results of my FCC analysis next week.
