Interview with NYU-Poly’s Professor Keith Ross

Last month, I had the chance to interview Professor Keith Ross about his paper describing a novel profiling attack against Facebook users. The attack’s goal is to determine the student makeup of a US high school. I described the actual details of the paper’s hacking algorithm here. When I first read about Ross and his colleague’s work, it seemed to me that it was impossible for a Facebook stranger to learn that level of detail about teenagers who were not even FB-style friends.

But Professor Ross, NYU-Poly Professor of Computer Science, showed not only that it was possible, but also, through his experimental results, that the algorithm makes for a feasible and highly productive attack. The attack’s success depends heavily on a pool of 10- to 12-year-olds who initially lie about their age to gain entry into Facebook, leading a few years down the road to public Facebook profiles with lots of information, even though the students are still, say, sophomores and juniors. It’s these “adult” high-schoolers that Ross exploits–by leveraging their friends lists–to discover the rest of the graduating class they belong to.

In reading the interview below, keep in mind that Facebook actually avoids having to comply with the Children’s Online Privacy Protection Act or COPPA by refusing to accept legal minors–children below the age of 13. If the site did accept minors, the registration mechanism would have to include a COPPA-mandated process involving parental approval.

It’s Ross’s contention that in a world without COPPA, in which legal minors wouldn’t have to deal with an age restriction at registration, it wouldn’t be necessary for students to lie, and this particular attack would not be as successful. While Ross may be narrowly right on this point, as you’ll see in the interview, Facebook’s privacy policies also come in for serious criticism.

I was just curious about other work you’ve done that may have led to this current research?

“We looked at problems that were somewhat less exciting but led into this current work. One was that we collected the public profiles of over 1 million users in NYC. Very often in the public profile, age and gender are not specified. So we looked at different ways of trying to infer the age and gender of all 1 million users. We had two papers on that last year.

The larger area we’re interested in is the issue of “third-party” inference. We’re not the only researchers working on this– there are other research teams now looking at this from different angles. The basic issue is: now and ten years into the future, what can third parties infer about us? Clearly they’ll be able to create profiles about us — but how extensive and detailed will those profiles be?

Those are the kinds of questions we’re trying to answer. When we were doing that last year–trying to infer the age of people–we noticed that inferring children’s ages is a different problem altogether. And then we put it on the back burner. Then we came back to it, and that’s how we got into all this current research.”

In the research you were talking about, did you leverage age information you can get from voter registration rolls?

“We haven’t actually touched that. We allude to it in our paper, but we didn’t do that yet. We do intend to do that in a follow-up paper we’re working on now. We actually obtained the voter registration records from one city, and we’re going to be obtaining them from some others as well. Voter registration records do not seem very difficult to get. We paid $35 to get the records for an entire county.”

That’s always a small piece of some of these efforts to determine more information about someone: at some point voting records come in. They give the full birth date, I believe, and of course address.

“Right, the voter registration records don’t include children. What we’re going to do is first use the last name of the child and then try to correlate that with the last name in the voter registration record. And then for the cases where we can access the child’s friend list, we search for their parents as friends. We’ll do this because there may be multiple matches for a name like ‘Smith’ and so if we can have the exact name of the parent we can match with greater certainty.”

What scares me about Facebook and social media is how much information is out there. To get back to your research, I was a little unclear about why you thought that by lying about their age, children increase their chances of being discovered in the attack you describe?

“First thing is that Facebook does take measures to protect the privacy of minors. In particular, when a minor’s registered age is under 18, no matter how the minor configures his or her privacy settings, only a very limited amount of information is available in the public profile. That information is only a picture, a name, maybe gender, and occasionally networks. Nothing else is available — no friend list is available, no high school name is mentioned. None of that is publicly available.

Indirectly because of the COPPA law, many children under 13 lie about their age when they register. Sometimes they’ll say right off the bat they are over 18, or they’ll say they are a few years older than they are and when they turn, say, 16, Facebook will consider them over 18. Either situation can occur. As soon as the registered age becomes 18 or older, Facebook considers them adults. And if a stranger visits one of their public profiles, they’ll no longer have the privacy protection provided by Facebook. By lying they increase their own privacy risks!

But there are still a lot of kids who aren’t lying– they are minors and registered as such. Although these people are protected by Facebook, since they may have friends who are lying minors, we can circumvent Facebook’s protection mechanisms and profile them nevertheless. By collecting all the friend lists and then doing statistical processing on the friend lists, we infer the other students, and then build profiles–we find high school, city, and of course friends.”

Why does COPPA make things worse?

“If COPPA didn’t exist, most kids wouldn’t lie about their ages, and their privacy would then be protected by Facebook.”

I have some quibbles about that hypothesis. One of the problems is that Facebook mixes minors and “hardened” adults in the same social network, which may be more of a root cause.

“In some ways, but it’s hard to split the two groups: people under 18 want to be friends with people over 18 and vice-versa.”

Does the COPPA law come up right away, when you register, where they’re asking you to verify your age?

“When you go to create a Facebook account, you have to put in your name and full birth date. And if you put in a birth date that puts you below 13, Facebook comes back and tells you you’re not permitted to register.”

One point I wanted to ask you about is what seems to be a flaw in Facebook– you can be above the age of 18 as far as Facebook is concerned and still list that you’re graduating from a high school in the future?

“That’s a good point. Here’s what’s happening. When you first register you have to provide your birthdate, which FB is going to use to let you get in. After that, in the second phase, you fill out profile information, and you can go ahead and fill out whatever you want– your high school, your current city and year of graduation. And it can be a contradiction, as you said!

You can say you’re 22 and yet graduating in 2014. Facebook doesn’t seem to be looking at this contradiction. It’s just ignoring that for the time being. So one measure Facebook could take is look at those contradictions and do something about it.”
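The consistency check Ross suggests can be sketched in a few lines of Python. This is only an illustration of the idea, not anything Facebook is known to run: the function names, the assumed typical graduation age of 18, and the two-year tolerance are all hypothetical choices made for this example.

```python
from datetime import date

# Assumption for illustration: US students typically finish high school at about 18.
TYPICAL_GRADUATION_AGE = 18
# Assumption for illustration: allow some slack for early or late graduates.
TOLERANCE_YEARS = 2


def implied_graduation_age(birth_date: date, graduation_year: int) -> int:
    """Approximate age the user would be in the listed graduation year."""
    return graduation_year - birth_date.year


def is_contradictory(birth_date: date, graduation_year: int,
                     tolerance: int = TOLERANCE_YEARS) -> bool:
    """Flag profiles whose registered birth date and listed high school
    graduation year imply an implausible graduation age."""
    gap = abs(implied_graduation_age(birth_date, graduation_year)
              - TYPICAL_GRADUATION_AGE)
    return gap > tolerance


# Registered as born in 1990 but listing a 2014 graduation: flagged.
print(is_contradictory(date(1990, 5, 1), 2014))
# Registered as born in 1996 and listing a 2014 graduation: consistent.
print(is_contradictory(date(1996, 5, 1), 2014))
```

A flagged profile would not prove that the user lied about their age, but it could be a signal for applying the stricter privacy defaults that registered minors already get.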

Even without lying there’s the problem of recent alumni, who are older than 18, perhaps just in college, but still have friends in high school who are minors. It’s another way an attacker can get in. The person doesn’t have to lie.

“In the paper we look at that case. It’s true you can find recent alumni who have friends in the high school, and you can use that to discover some of the students in the high school. But you get a lot less! We tried to quantify this– for the worlds with and without COPPA.

One other point I’d like to add. We’re looking now, in a follow-up study, at middle school kids. Most 18- or 19-year-olds have very few friends in middle schools. In a world without COPPA it would be very hard to find middle school students. But in a world with COPPA, you can find the middle school kids, though not as easily as in the case of high schools.”

Of course, an attacker can always lie about his age and infiltrate a network. But what’s interesting about your study is the attack is passive–you don’t have to create fake profiles to get the information. Are there any lessons for other social networks?

“This could apply to other sites, ones that involve children– say Google Plus–and popular social networks in other countries. In China there’s “RenRen” with something like 400 million users. And a similar kind of attack could be done in RenRen.”