June 14, 2013 at 4:27 AM ET
You may want to stay anonymous online — that doesn't mean you can. Data crunching has grown so sophisticated and powerful, privacy researchers now warn that tracing identities from a pool of supposedly "anonymized" data is not just a possibility, it's a certainty.
"It is depressingly hard to try to anonymize data in a way that resists identification by a committed adversary," Arvind Narayanan, a privacy researcher at Princeton University, told NBC News. Nevertheless, Narayanan and others are testing ways to protect the identity of Internet citizens. Sure, users can adjust their own behaviors to be less trackable, but true anonymity would require the participation of companies — and some new technologies.
Anonymous? No such thing
In the late 1990s, Massachusetts' Group Insurance Commission released the medical data of 135,000 state employees for research, after removing sensitive information like names, addresses and Social Security numbers. Latanya Sweeney, then an MIT grad student, re-identified Governor William Weld's health data by correlating his birthdate and zip code with public voting records. Sweeney has since shown that you can identify 87 percent of all Americans with just birthdate, zip code and gender.
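To see why stripping names is not enough, here is a minimal sketch of the kind of cross-referencing described above. Every record in it is invented for illustration, and the field names and the `reidentify` helper are assumptions of this sketch, not Sweeney's actual method or data.

```python
from collections import defaultdict

# Toy "anonymized" medical release: names removed, three fields kept.
# All records here are made up for illustration.
medical = [
    {"birthdate": "1950-03-12", "zip": "02139", "gender": "M", "diagnosis": "condition A"},
    {"birthdate": "1950-03-12", "zip": "02139", "gender": "F", "diagnosis": "condition B"},
    {"birthdate": "1972-11-02", "zip": "02139", "gender": "M", "diagnosis": "condition C"},
]

# Toy public voter roll: names plus the same three fields.
voters = [
    {"name": "Voter One", "birthdate": "1950-03-12", "zip": "02139", "gender": "M"},
    {"name": "Voter Two", "birthdate": "1950-03-12", "zip": "02139", "gender": "F"},
    {"name": "Voter Three", "birthdate": "1972-11-02", "zip": "02139", "gender": "M"},
    {"name": "Voter Four", "birthdate": "1972-11-02", "zip": "02139", "gender": "M"},
]

QUASI_IDS = ("birthdate", "zip", "gender")


def key(record):
    """Combine the three fields into one lookup key."""
    return tuple(record[field] for field in QUASI_IDS)


def reidentify(medical_rows, voter_rows):
    """Return {name: diagnosis} for medical rows that match exactly one voter."""
    names_by_key = defaultdict(list)
    for voter in voter_rows:
        names_by_key[key(voter)].append(voter["name"])

    matches = {}
    for row in medical_rows:
        names = names_by_key.get(key(row), [])
        if len(names) == 1:  # a unique match pins the record to one person
            matches[names[0]] = row["diagnosis"]
    return matches


matches = reidentify(medical, voters)
```

In this toy data, two of the three medical records match exactly one voter and are re-identified. The third matches two voters, so it stays ambiguous.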
And online trackers, planted by about 100 companies, collect a lot more than that. "Longitudinal data" is gathered over weeks and months, and includes what ads you’re clicking on, or what products you’re buying. Advertisers are keen to mine this data for what the online ad folks call behavioral targeting.
"When you look at data that extensive, that ends up being a very accurate profile of a person," says Narayanan — who himself once identified a batch of "anonymous" Netflix users by their IMDB postings.
Less data, more privacy
Alas, the path to truly anonymous online living leads straight through the heart of corporate America.
One approach is for companies to just collect less data. Narayanan and a few colleagues demonstrated one way this could play out. They built a browser extension called Adnostic, which tracks your browsing behavior but keeps that data on your computer. A company could use it to profile your behavior and serve you appropriate ads without beaming your information back to its servers or selling it to anyone else.
If companies want to share collected data, Narayanan proposes that they sequester it, so that analysts need to make explicit queries to get information. "You can monitor the queries that people are running. If the analysts are doing something malicious there's a chance that they will be found out," he said.
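The sequestering idea can be sketched in a few lines. The `AuditedStore` class, its method names and the sample rows below are all hypothetical, a rough illustration of "explicit queries plus a log" rather than any system Narayanan describes.

```python
import time


class AuditedStore:
    """Hypothetical wrapper: analysts get query results, never the raw rows."""

    def __init__(self, rows):
        self._rows = list(rows)  # raw data stays inside the object
        self.audit_log = []      # every query is recorded here

    def count_where(self, analyst, field, value):
        """Answer a counting query and log who asked what, and when."""
        self.audit_log.append({
            "analyst": analyst,
            "query": f"count where {field} == {value!r}",
            "time": time.time(),
        })
        return sum(1 for row in self._rows if row.get(field) == value)

    def queries_by(self, analyst):
        """Let a reviewer list the queries a given analyst has run."""
        return [e["query"] for e in self.audit_log if e["analyst"] == analyst]


# Invented sample data for illustration only.
store = AuditedStore([
    {"zip": "02139", "clicked_ad": "shoes"},
    {"zip": "02139", "clicked_ad": "books"},
    {"zip": "10001", "clicked_ad": "shoes"},
])
shoe_clicks = store.count_where("analyst_a", "clicked_ad", "shoes")
```

A log like this does not prevent misuse by itself. It only gives a reviewer something to inspect, which is the "chance that they will be found out" in the quote above.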
A third approach is a slick mathematical process called "differential privacy." When linked together, identifiers like birth dates and zip codes form a unique identity "fingerprint" for a person. When a differential privacy algorithm is applied to a data set, those links get blurred, and bits of data can no longer be traced to their source. This would let companies or researchers conduct "sophisticated data analyses," whether for marketing or public health purposes, "while having some sort of mathematical guarantee against a privacy breach," Narayanan explains.
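The article does not say which algorithm is meant, so as an assumption this sketch uses one standard textbook construction, the Laplace mechanism, on a simple counting query. The function names and the `epsilon` parameter are choices of this sketch. A smaller `epsilon` means more noise and stronger privacy.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon, rng=None):
    """Return a noisy count of rows matching `predicate`.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Invented sample data: 40 people in one zip code, 60 in another.
rows = [{"zip": "02139"}] * 40 + [{"zip": "10001"}] * 60
noisy = private_count(rows, lambda r: r["zip"] == "02139", epsilon=1.0)
```

Any single answer is off by a small random amount, so one person's presence in the data is hard to infer from it, yet the answers still average out to the true count of 40.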
Differential privacy is now applied in situations where sensitive data needs to be shared for a common good. For example, the OnTheMap project, hosted by the U.S. Census Bureau, makes anonymized data publicly available while keeping sensitive information about citizens private.
Differential privacy could be applied to targeted advertising, says Adam Smith, associate professor of computer science at Penn State University. In current ad-targeting systems, "even if I trust Microsoft or Google to do the right thing with my data, Google may be inadvertently leaking my data" to third parties, he told NBC News.
But though the concept has been in development for more than a decade, the tools aren't quite ready for the market yet. Also, there is still no economic incentive for companies that collect, store and share Web-tracking data to use any of these options. Perhaps if more businesses like the DuckDuckGo search engine, whose motto is "We don't track you," gain popularity, an incentive will arise.
Politics and 'Dissent'
For some Internet users, anonymity is not optional. Political activists in countries with less tolerant Internet laws and government-controlled ISPs stand to land in jail (or worse) if their online postings are associated with their identities. Tools like Tor, software that lets you browse, send email and exchange instant messages behind a wall of anonymity, offer some privacy, but researchers say trackers can bypass such technology.
"The adversary has changed," Joan Feigenbaum, professor of computer science at Yale University, told NBC News. "The relevant adversary has become much stronger." To keep the new generation of online activists anonymous, Feigenbaum, her colleague Bryan Ford, and a group of other researchers built an anonymous online communication tool code-named "Dissent."
They claim that Dissent is extremely resistant to traffic analysis, which examines not the data itself but the flow of data through a network. This kind of detection is now available to authoritarian regimes, and software like Tor has shown vulnerabilities to it.
Dissent is designed for communication delivered by groups. For example, "Everybody knows the message came from some member of the group, but no one knows which member it came from," Ford, an assistant professor of computer science at Yale, explained. Even in the case of persistent surveillance — in fact, even if an enemy infiltrates a group — Dissent can keep identities secret.
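One classic technique with the property Ford describes is the dining-cryptographers network, or DC-net. The article does not detail Dissent's internals, so treat the sketch below as a textbook illustration of "someone in the group sent this, but nobody can tell who," not as Dissent's actual protocol. The function name and parameters are assumptions of this sketch.

```python
import secrets
from functools import reduce
from itertools import combinations


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def dc_net_round(n_members, sender, message, nbytes=16):
    """Simulate one DC-net round; return (broadcasts, recovered_message)."""
    if len(message) > nbytes:
        raise ValueError("message does not fit in one round")
    padded = message.ljust(nbytes, b"\0")

    # Every pair of members shares a secret random pad.
    pads = {pair: secrets.token_bytes(nbytes)
            for pair in combinations(range(n_members), 2)}

    broadcasts = []
    for member in range(n_members):
        out = bytes(nbytes)
        for pair, pad in pads.items():
            if member in pair:
                out = xor_bytes(out, pad)   # mix in each pad this member shares
        if member == sender:
            out = xor_bytes(out, padded)    # only the sender adds the message
        broadcasts.append(out)

    # Each pad appears in exactly two broadcasts, so all pads cancel out
    # and only the message remains.
    combined = reduce(xor_bytes, broadcasts)
    return broadcasts, combined.rstrip(b"\0")


broadcasts, recovered = dc_net_round(n_members=4, sender=2, message=b"hello")
```

Each member's broadcast looks like random bytes by itself. Only the combination of all broadcasts reveals the message, and it carries no mark of which member added it.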
The software has been under development for two years and has been prototyped for testing, but the team doesn't yet recommend that folks trust their lives to it.
"Collaborators have been using Dissent and have been independently testing it under attack conditions," Ford told NBC News. That's shaken out a few vulnerabilities but the foundation on which Dissent was built remains solid.
Microsoft, Apple, Google, Facebook and Yahoo aren't likely to make changes to the way they collect or share data overnight. Some online tools politely request third-party trackers to stop, but such requests are like a "gentleman's anonymity based on a handshake," Ford said. Of course, if you're signed into a service like Facebook that asks for your real name up front, you've already checked anonymity at the door.