It is interesting and vaguely rewarding in a bloggy kind of way to watch The New York Times start connecting the dots and realizing that phone numbers are only part of what the NSA is collecting in its digital mining operation. I am, of course, referring to yesterday’s front page article on how the NSA is building an “enhanced” social network. I “called” this last month in my post on Paul Revere and Social Attribute networks, in which I boldly predicted as much.
I didn’t really call this, of course: any sociologist, computer scientist, or other member of the social network science community probably saw this coming after the first leaks appeared about the NSA’s vacuuming up of metadata and phone numbers. Those who want to catch up very quickly can review my “Paul Revere” post. But anyone who wants a somewhat technical, but nicely written overview of link prediction, which is really what this is all about, can read this from Jon Kleinberg, founding father of social networking science.
Here are a few of my own notes on the article:
The agency can augment the communications data with material from public, commercial and other sources, including bank codes, insurance information, Facebook profiles, passenger manifests, voter registration rolls and GPS location information, as well as property records and unspecified tax data, according to the documents
Translation: The NSA created a social attribute network (SAN) by adding a rich set of attributes to the social network they built from scooping up email addresses and phone numbers.
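To make the idea concrete, here is a minimal sketch in Python of what a social attribute network (SAN) might look like as a data structure: person-to-person links from communications data, plus attribute nodes attached to each person. The class name, method names, and sample values are all hypothetical, chosen only for illustration; nothing here reflects how the NSA actually stores its data.

```python
from collections import defaultdict


class SocialAttributeNetwork:
    """Toy SAN: social links between people plus attribute links per person.

    All names here are illustrative assumptions, not a real system's API.
    """

    def __init__(self):
        # person -> set of people they have communicated with
        self.social = defaultdict(set)
        # person -> set of (kind, value) attribute nodes
        self.attributes = defaultdict(set)

    def add_contact(self, a, b):
        # Social links are undirected: a call or email connects both parties.
        self.social[a].add(b)
        self.social[b].add(a)

    def add_attribute(self, person, kind, value):
        # "Enrichment": attach an attribute node such as a bank code or location.
        self.attributes[person].add((kind, value))

    def people_with(self, kind, value):
        # Everyone linked to the same attribute node.
        return {p for p, attrs in self.attributes.items() if (kind, value) in attrs}


san = SocialAttributeNetwork()
san.add_contact("alice", "bob")               # from phone metadata
san.add_attribute("alice", "bank_code", "X1")  # hypothetical enrichment source
san.add_attribute("bob", "bank_code", "X1")
print(san.people_with("bank_code", "X1"))
```

The point of the sketch is that once attributes become nodes, two people can be connected through a shared attribute even if they never called or emailed each other.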
A series of agency PowerPoint presentations and memos describe how the N.S.A. has been able to develop software and other tools — one document cited a new generation of programs that “revolutionize” data collection and analysis — to unlock as many secrets about individuals as possible.
Translation: They are building on a very deep foundation of existing software and algorithms for analyzing giant networks. And no doubt they have excellent, cheap computing power not available to typical academic researchers.
Analysts can exploit that information to develop a portrait of an individual, one that is perhaps more complete and predictive of behavior than could be obtained by listening to phone conversations or reading e-mails, experts say.
Translation: The NSA programs attempt to predict new attribute links by analyzing a person’s neighbors in the social network along with the existing attributes connected to those neighbors. There’s no mention in the Times article about the accuracy of the predictions, e.g., how many false positives come up. I’m waiting for Times editors to start asking computer science types rather than law professors about this. The current research shows that this technique is nowhere near highly reliable: good enough for consumer use, say, by Facebook, but as the basis for a massive intelligence program?
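As a rough illustration of attribute link prediction, the sketch below scores a candidate attribute for a person by the fraction of that person’s neighbors who already have it. This neighbor-vote heuristic is a deliberately simple stand-in, assumed here for illustration; it is not the method described in the leaked documents, and the function name and sample data are hypothetical.

```python
def neighbor_vote_score(social, attributes, person, attribute):
    """Fraction of a person's neighbors that already carry the attribute.

    social:     dict mapping person -> set of neighboring people
    attributes: dict mapping person -> set of known attributes
    Returns a score in [0, 1]; 0.0 if the person has no neighbors.
    """
    neighbors = social.get(person, set())
    if not neighbors:
        return 0.0
    hits = sum(1 for n in neighbors if attribute in attributes.get(n, set()))
    return hits / len(neighbors)


# Hypothetical toy data: alice has three contacts, two of whom share an attribute.
social = {"alice": {"bob", "carol", "dave"}}
attributes = {"bob": {"club_member"}, "carol": {"club_member"}, "dave": set()}

score = neighbor_vote_score(social, attributes, "alice", "club_member")
print(round(score, 2))
```

A score like this is only a guess about alice. Any threshold chosen to turn it into a yes-or-no prediction will flag some people who do not have the attribute, which is exactly the false-positive question the article leaves unasked.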
Knowing things like the number someone just dialed or the location of the person’s cellphone is going to allow them to assemble a picture of what someone is up to. It’s the digital equivalent of tailing a suspect.
Translation: This is the “birds of a feather” principle, or, to use the fancier name, homophily. See McPherson.
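One simple way to see homophily in data is to measure how often connected people share a label. The sketch below computes the fraction of links whose two endpoints have the same label; the function name and the toy network are assumptions made up for this example.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share the same label.

    edges:  iterable of (person_a, person_b) pairs
    labels: dict mapping person -> label (e.g. a city or an employer)
    Edges with an unlabeled endpoint are skipped. Returns 0.0 if none remain.
    """
    same = 0
    total = 0
    for a, b in edges:
        if a not in labels or b not in labels:
            continue
        total += 1
        if labels[a] == labels[b]:
            same += 1
    return same / total if total else 0.0


# Hypothetical toy network: two of the three links join people with the same label.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
labels = {"a": "x", "b": "x", "c": "y", "d": "y"}
print(round(edge_homophily(edges, labels), 2))
```

The higher this fraction is compared with what random mixing would give, the more a person’s contacts reveal about that person, which is what makes guessing from neighbors possible at all.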
The data is automatically computed to speed queries and discover new targets for surveillance.
Translation: It’s an inference or machine learning tool that lets you explore various ideas and theories and no doubt reports back with a likelihood estimate.
A 2009 PowerPoint presentation provided more examples of data sources available in the “enrichment” process, including location-based services like GPS and TomTom, online social networks, billing records and bank codes for transactions in the United States and overseas.
Translation: More specifics about the kind of attributes they are scooping up. And of course this is all coming from a PowerPoint, which I’m guessing was used to market this program within the NSA. And we all know how reflective of reality organizational PowerPoints are.