July 6, 2009 at 7:59 PM ET
There's a new reason to worry about the security of your Social Security number. It turns out the numbers can be guessed with relative ease.
A group of researchers at Carnegie Mellon University say they've discovered patterns in the issuance of the numbers that make it relatively easy to deduce them using publicly available information and some basic statistical analysis.
The research could have far-ranging implications for financial institutions and other firms that rely on Social Security numbers to ward off identity theft. It could also unleash a wave of criminal imitators who will try to duplicate the research.
Details of the research were published Monday in the Proceedings of the National Academy of Sciences journal and will be explained at the annual Black Hat computer hacker convention in Las Vegas later this month.
The report means companies and other agencies should once and for all stop using Social Security numbers as passwords or unique identifiers, said Professor Alessandro Acquisti, who authored the report.
"We keep living as if they are secure, a secret," he said. "They're not a secret."
The Social Security Administration says SSNs are issued using a complex process that is effectively random, making them impossible to guess in practical terms. But Acquisti and fellow researcher Ralph Gross used public lists of Social Security numbers to look for patterns. They found several. The two say they can guess the first 5 digits of the Social Security number of anyone born after 1988 within two guesses, knowing only birth date and location. The last four digits, while harder to guess, can be had within a few hundred guesses in many situations -- a trivial hurdle for criminals using automated tools.
"Someone filling out credit card applications using a Web site and a botnet could easily succeed (in getting someone's number)," he said.
'Public should not be alarmed'
Acquisti shared the report with the Social Security Administration's office before publication. He said he could not disclose what steps the agency is taking in response to the research.
The Social Security Administration played down the discovery. In a statement to msnbc.com, Social Security spokesman Mark Lassiter called any suggestion that Acquisti had cracked the code for predicting Social Security numbers "a dramatic exaggeration."
"The public should not be alarmed by this report because there is no foolproof method for predicting a person's Social Security Number," the statement read.
But privacy expert Daniel Solove, a law professor at George Washington University who reviewed the report, called the discovery a "really big deal."
"If you have a password and you can readily figure it out, that's absurd," he said. "This paper points out just how ridiculous it is that we think there's a way to really keep Social Security numbers confidential. There effectively is no way you can keep them totally confidential. It's just not possible."
How it works
Acquisti said he's discovered simple patterns in the Social Security numbering system. They involve the elusive concept of randomness. To most people, a number is either random or it's not. But to mathematicians, randomness is a sliding scale. Developing perfectly random numbers -- a core problem in cryptography -- is nearly impossible. Often, software programs designed to create random numbers erroneously spit them out with a faintly distinguishable pattern. With a large enough sample, the numbers begin to form clusters. Even a small discovery of such a cluster can make an enormous difference to someone trying to crack a crypto code, making predictions of supposedly random numbers an order of magnitude easier.
That's what the Carnegie Mellon researchers found.
A completely random guess at a 9-digit SSN should be a one in one billion chance. But instead, their newly educated guesses have narrowed the odds down to roughly 1 in 1,000. Making matters worse, because of changes in the way the numbers have been issued since 1988, the numbers are getting easier and easier to guess as time passes. In one example, the researchers said, they can uncover a Delaware resident's 9-digit SSN within 10 guesses about 5 percent of the time.
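The arithmetic behind those odds can be sketched in a few lines of Python. This is a simplified model, not the researchers' method: it assumes every candidate number is equally likely and that an attacker never repeats a guess. The function name `hit_probability` is made up for illustration.

```python
from fractions import Fraction

def hit_probability(candidates: int, guesses: int) -> Fraction:
    """Chance that at least one of `guesses` distinct guesses is correct,
    assuming all `candidates` values are equally likely (a simplification)."""
    if candidates <= 0:
        raise ValueError("candidates must be positive")
    return Fraction(min(guesses, candidates), candidates)

FULL_SPACE = 10 ** 9      # every possible 9-digit string
NARROWED_SPACE = 1_000    # roughly what the article says the patterns leave

print(hit_probability(FULL_SPACE, 1))      # 1/1000000000
print(hit_probability(NARROWED_SPACE, 1))  # 1/1000
print(FULL_SPACE // NARROWED_SPACE)        # 1000000, a million-fold reduction
```

Real numbers are not equally likely, which is the point of the research, so the flat model only shows the scale of the change.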
The SSN is actually broken up into three parts -- the first three digits are the "area number," the next two are the "group number" and the last four are the "serial number." The Social Security Administration already offers considerable information about the first part of the number. The area number is based on the ZIP code used in the application for an SSN. High-population states have many area numbers -- New York has 85, for instance -- but many others, like Delaware, have only one.
The other two parts of the number, however, are assigned in a way that the Social Security Administration believes makes it nearly impossible for someone to guess. But the Carnegie Mellon work shows they are not.
Acquisti took the largest publicly available list of SSNs -- the agency's Death Master File, which publishes numbers of the deceased to make them hard for impostors to use -- and sorted the list by state and date of birth. Immediately, it became clear that the second portion -- the group number -- was sequentially issued and also trivial to guess. For example, every SSN issued in Pennsylvania during 1996 contains the middle two digits 76.
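For readers who think in code, the three-part layout can be sketched as a small Python helper. The names `SSNParts` and `split_ssn` are invented for this illustration, and "123-45-6789" is a placeholder, not a real number.

```python
import re
from typing import NamedTuple

class SSNParts(NamedTuple):
    area: str    # first three digits
    group: str   # middle two digits
    serial: str  # last four digits

# Nine digits, with or without the customary dashes.
_SSN_PATTERN = re.compile(r"(\d{3})-?(\d{2})-?(\d{4})")

def split_ssn(text: str) -> SSNParts:
    """Split a nine-digit string into area, group and serial numbers."""
    match = _SSN_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError("expected nine digits, optionally written as AAA-GG-SSSS")
    return SSNParts(*match.groups())

print(split_ssn("123-45-6789"))
# SSNParts(area='123', group='45', serial='6789')
```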
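The sort-and-look step described above can be sketched roughly as follows. The records here are invented for illustration (only the Pennsylvania 1996 value of 76 comes from the article), the tuple layout is an assumption, and the researchers' actual formula is not public, so this shows only the general idea of tallying group numbers by state and year.

```python
from collections import Counter, defaultdict

# Invented records for illustration only: (state, birth_year, group_number).
records = [
    ("PA", 1996, "76"),
    ("PA", 1996, "76"),
    ("PA", 1996, "76"),
    ("PA", 1995, "74"),
    ("PA", 1995, "74"),
    ("PA", 1995, "72"),
]

def most_common_group(rows):
    """Map each (state, birth_year) to its most frequent group number and
    the share of records in that bucket that carry it."""
    buckets = defaultdict(Counter)
    for state, year, group in rows:
        buckets[(state, year)][group] += 1
    summary = {}
    for key, counts in buckets.items():
        group, hits = counts.most_common(1)[0]
        summary[key] = (group, hits / sum(counts.values()))
    return summary

for key, (group, share) in sorted(most_common_group(records).items()):
    print(key, group, f"{share:.0%}")
```

When one group number dominates a bucket, as 76 does in the made-up 1996 rows, the middle two digits stop being a secret for anyone born in that state and year.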
That made guessing the first 5 digits of someone's SSN easy in some cases. During a test, the group was able to predict the first five digits of Vermont residents born in 1995 with 90 percent accuracy.
That's important, because there are many ways to determine the last four digits of someone's Social Security number. Some data brokers sell truncated SSNs, with either the first five or the last four numbers visible to the purchaser. And many financial firms use those numbers as a PIN code for verification.
Also, countless customer service operators ask for the last four digits when consumers call for help. Any agent who knows where and when a caller was born could quickly amass a large set of complete Social Security numbers.
The report contains even more bad news.
The serial numbers -- the last four digits -- can often be guessed using formulas and patterns, he said. It turns out that the Social Security Administration doesn't utilize true randomization to create serial numbers. For example, a graph plotting the numbers issued to Oregon residents in 1996 (shown below) reveals bands that cluster around certain numbers. In fact, there are five discernible lines. A truly random issue would show dots scattered throughout the chart.
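One rough way to put a number on "bands versus scattered dots" is a chi-square statistic against a flat distribution. The sketch below uses simulated serial numbers, not the Oregon data, and the five cluster centers are arbitrary. It is not the researchers' analysis, only an illustration of how clustering shows up in a simple test.

```python
import random

def chi_square_uniform(serials, bins=20, space=10_000):
    """Pearson chi-square statistic of `serials` against a flat distribution
    over `bins` equal-width buckets covering 0..space-1."""
    width = space // bins
    counts = [0] * bins
    for s in serials:
        counts[min(s // width, bins - 1)] += 1
    expected = len(serials) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated, not real: 1,000 serials drawn evenly from 0000-9999.
scattered = [rng.randrange(10_000) for _ in range(1_000)]

# Simulated, not real: 1,000 serials bunched around five arbitrary centers,
# loosely echoing the five bands the article describes.
centers = [1_000, 3_000, 5_000, 7_000, 9_000]
banded = [
    min(9_999, max(0, round(rng.gauss(rng.choice(centers), 100))))
    for _ in range(1_000)
]

print("scattered:", round(chi_square_uniform(scattered), 1))
print("banded:   ", round(chi_square_uniform(banded), 1))
```

The banded sample should score far higher than the scattered one, because half the buckets are crowded and the other half are nearly empty.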
With additional analysis, Acquisti said, the researchers were able to discern that the serial numbers are issued sequentially, in a way that ties them to the holder's birth date.
"The SSA believes that scheme is so complex that it's sufficiently random," he said. "We show it is way less random than apparently they believe." As a result, instead of the four digits yielding a 1 in 10,000 chance of guessing correctly, he said he can improve the odds to at least 1 in 1,000, and in some cases, far better than that.
The Social Security Administration seems to agree with Acquisti on this issue. In its statement to msnbc.com, the agency said that "for reasons unrelated to this report, the agency has been developing a system to randomly assign SSNs. This system will be in place next year."
Birth dates easy to obtain
For now, an attacker who wanted to guess someone's SSN would still need a birthday and hometown, but these data points are readily available from a number of sources. Many people volunteer such information on social networking sites like Facebook. Voter registration lists and other public databases also include such information, and it is often available for a small charge (or free) from data brokers that operate on the Internet.
There are additional challenges in guessing SSNs for residents born before 1988, because many older Americans did not receive a Social Security number at birth -- so their hometown and their Social Security number application ZIP code might differ. But beginning that year -- in a move ironically intended to combat fraud -- the Social Security Administration began forcing many families to order SSNs at birth, thereby eliminating one more element of chance for a would-be SSN guesser. It's far easier to guess SSNs for anyone born in 1988 or later, Acquisti said.
The formula for issuing the numbers is, in fact, not designed to withstand attacks from cryptography experts or mathematicians. It was invented in 1936 as a simple numbering system for paper file cabinets.
"This was before there were computers," Acquisti said. "SSNs were never designed for the purpose we use them."
The group is not disclosing the precise formula, because doing so would be akin to publishing the list of all Social Security numbers. But Acquisti suggested one "provocative" strategy that government officials might take: setting a date in the future -- perhaps in three to five years -- when all SSNs are made public, so companies and government agencies stop using SSNs for security purposes.
He called current efforts to protect Social Security numbers from public view "well intentioned, but misguided."
The researchers recommend that the Social Security Administration immediately implement a much more random formula for generating SSNs. But that won't help the millions of Americans whose SSNs are now easily guessable. For that, there is only one answer, the report says:
"Industry and policy-makers may need, instead, to finally reassess our perilous reliance on SSNs for authentication and on consumers' impossible duty to protect them," it said.