Until Wednesday morning, you'd probably never heard of something called "XKeyscore," a program that the National Security Agency itself describes as its "widest reaching" means of gathering data from across the Internet. According to reports shared by NSA leaker Edward Snowden with the Guardian, XKeyscore adds to the other recent revelations about the NSA's surveillance programs: by using it, "analysts can also search by name, telephone number, IP address, keywords, the language in which the Internet activity was conducted or the type of browser used."
David Brown, who co-authored the recent book "Deep State: Inside the Government Secrecy Industry" under the pseudonym D.B. Grady, told NBC News Wednesday the main value of XKeyscore is that it serves as a first point of collection for massive amounts of data the NSA can now cull from digital activities, such as a person's email or Web browsing.
"I like to think of it as plumbing," Brown said. "The pipes come in through XKeyscore, which then diverts the data through different channels, because there's just an awful lot of data."
Basically, XKeyscore gives analysts a tool by which they can pluck individual data points out of a massive indexed database. It collects a wealth of Web activity from unencrypted Web traffic (typically, where a Web address starts with "HTTP" instead of "HTTPS") and serves as a first stop in a larger data collection and mining process that can then pinpoint subjects (say, suspected terrorists) for further inquiry.
Quantity is a crucial factor here: the Guardian noted in Wednesday's report that the sheer amount of "communications accessible through programs such as XKeyscore is staggeringly large." Indeed, one of the slides from a set of XKeyscore training documents shared by the Guardian showed that in a single 30-day period last year, the data included "at least 41 billion total records."
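To put that figure in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes the records were spread evenly across the 30 days, which the slide does not say, and the function and variable names are purely illustrative.

```python
# Figures quoted from the training slide described above.
TOTAL_RECORDS = 41_000_000_000  # "at least 41 billion total records"
PERIOD_DAYS = 30

SECONDS_PER_DAY = 24 * 60 * 60


def average_rates(total_records, period_days):
    """Return (records per day, records per second), assuming an even spread."""
    per_day = total_records / period_days
    per_second = per_day / SECONDS_PER_DAY
    return per_day, per_second


per_day, per_second = average_rates(TOTAL_RECORDS, PERIOD_DAYS)
print(f"{per_day:,.0f} records per day")
print(f"{per_second:,.0f} records per second")
```

Under that even-spread assumption, the slide's total works out to roughly 1.37 billion records a day, or about 15,800 every second.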
"The XKeyscore system is continuously collecting so much Internet data that it can be stored only for short periods of time," the Guardian said. "Content remains on the system for only three to five days," while metadata (the data behind the data, such as email headers or the location from which you last accessed your email) "is stored for 30 days. One document explains: 'At some sites, the amount of data we receive per day (20+ terabytes) can only be stored for as little as 24 hours.'"
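The retention windows the Guardian describes can be pictured as a simple age check. The sketch below is hypothetical and says nothing about how the NSA's software actually works: the names are invented for illustration, and the content window uses the upper end of the reported "three to five days" range.

```python
from datetime import datetime, timedelta

# Retention windows as reported by the Guardian. Using five days for
# content (the top of the "three to five days" range) is an assumption.
RETENTION = {
    "content": timedelta(days=5),
    "metadata": timedelta(days=30),
}


def is_retained(kind, collected_at, now):
    """Return True if a record of this kind is still inside its window."""
    return now - collected_at <= RETENTION[kind]


# Example: a record collected one week before an arbitrary reference date.
now = datetime(2013, 7, 31)
week_old = now - timedelta(days=7)

print(is_retained("content", week_old, now))   # False: older than 5 days
print(is_retained("metadata", week_old, now))  # True: within 30 days
```

In other words, a week after an email is sent, its contents would already have aged out of a system like this, while the metadata about it would still be searchable.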
That's where additional databases come in. One NSA database known as "Pinwale," for instance, stores recorded signals for up to five years. Meanwhile, metadata goes into a database known as MARINA.
But even with these channels in place, Brown said that there's simply too much information to process right now.
"One of the things in (the) article was that the NSA can't just pull up every email that's been sent through America Online or whatever," Brown said. "There's just too much data."
The NSA, he said, "is playing the long game here. They've got this data today, but they don't need to process it today" with a data cataloguing system like XKeyscore. "That data can sit around until the technology is there" to automate its processing, Brown said.
Yannick LeJacq is a contributing writer for NBC News who has also covered technology for Kill Screen, The Wall Street Journal and The Atlantic. You can follow him on Twitter at @YannickLeJacq and reach him by email at: Yannick.LeJacq@nbcuni.com.