As I kind of expected, when I suggested on Friday that we could now foresee an age where mass surveillance by governments was so cheap and easy that it would be effectively impossible to prevent, the main objection was that while it might be easy to collect essentially unlimited information, it would be impossible to process it, and hence that the threat of some sort of grand Big Brother database was overblown. I’d like to respond to that in detail.
One of the most chilling things about the leaked documents, at least to people who know a little of the (alleged) history of the NSA, is the following detail, which kind of got slipped in unnoticed, probably even by the journalists who wrote it up. The following text is courtesy of the Washington Post:
The Silicon Valley operation works alongside a parallel program, code-named BLARNEY, that gathers up “metadata” — technical information about communications traffic and network devices — as it streams past choke points along the backbone of the Internet. BLARNEY’s top-secret program summary, set down in the slides alongside a cartoon insignia of a shamrock and a leprechaun hat …
It’s probably a coincidence rather than a deliberate homage, but the shamrock in the logo of PRISM’s sister program BLARNEY isn’t the first time that symbol has figured prominently in NSA operations. SHAMROCK, as it happens, was one of the codenames for a program which ran from the late 1940s to the mid-1970s, under which the NSA is said to have collected essentially every international telegram entering or leaving the United States.
As you may have guessed by now, my decision to use text messages as the example for building a hypothetical surveillance database wasn’t entirely accidental. Text messages are the informalized descendants of telegrams. And the problem is the same. At the time, NSA reportedly took in millions of messages per day, but chucked out 90% of them without even reading them. We can probably assume a similar triage is applied today.
That said, while the amount of data to be collected has increased exponentially, so has the capacity to process that information — to the point that we’re really not sure what processing capacity the NSA actually has. This was brought home to me a year ago when a professional social media website, LinkedIn, had its list of user passwords leaked. The passwords were stored in a hashed form (unsalted SHA-1, as it turned out), which theoretically limited the damage.
Which sounds good, but by last year a fairly run-of-the-mill personal computer could already brute-force passwords hashed in that format at a rate of about one billion guesses per second. And a purpose-built machine, assembled from commercial off-the-shelf parts on a five-figure budget, could make 63 billion guesses per second against the hash format LinkedIn used. That’s with a five-figure budget. NSA has an eleven-figure budget.
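To get a feel for what 63 billion guesses per second means in practice, here is a minimal sketch of the arithmetic. The guess rate is the one quoted above for the five-figure rig; the password-space sizes are my own illustrative assumptions, not anything from the leak.

```python
# Rough time-to-exhaust figures at the brute-force rate quoted above.
# RIG_RATE is the 63 billion guesses/sec cited for a five-figure,
# off-the-shelf GPU machine working on unsalted hashes.

RIG_RATE = 63e9  # guesses per second

def exhaust_time_seconds(alphabet_size: int, length: int, rate: float = RIG_RATE) -> float:
    """Seconds needed to try every password of the given length."""
    return alphabet_size ** length / rate

# 8 lowercase letters: 26^8 ≈ 2.1e11 guesses
print(exhaust_time_seconds(26, 8))         # ≈ 3.3 seconds
# 8 mixed-case letters and digits: 62^8 ≈ 2.2e14 guesses
print(exhaust_time_seconds(62, 8) / 3600)  # ≈ 1 hour
```

In other words, at that rate the entire space of eight-character alphanumeric passwords falls in about an hour — on a hobbyist budget.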
That’s all a side issue, of course, because the NSA isn’t breaking encrypted passwords when it comes to Skype, Facebook, Google, etc. At least according to the latest leaks, it doesn’t need to break anything: it already has an all-access pass.
Not being a computer scientist, I’m really not prepared to say what is possible and what isn’t here. I’m just trying to point out that the cost and difficulty of obtaining, storing, and processing huge amounts of data have fallen exponentially, and, more importantly, that there is every reason to believe they will continue falling exponentially in the future. You reach a point where collecting a massive database of personal communications is so cheap that it becomes implausible to assume a large and secretive intelligence bureaucracy won’t step over the line, especially when its opponents consist of a compliant opposition, a lacklustre media, and an uninterested public.
All that said, let’s return to my hypothetical text message database for a moment. If NSA wants to perform the equivalent of about one billion operations for every text message in the United States, and it had a supercomputer equal to the fastest publicly documented computer on the planet in order to do this task, then by my back-of-the-napkin math it could run through the whole list in about eight minutes.*
After that, it could move on to some other equally daunting task, like maybe running through every message posted on Facebook over the previous day.
* 42 text messages per day times 350 million Americans times 1 billion operations, divided by 30 petaflops — roughly the computing capacity of Tianhe-2 in China — comes out to around 490 seconds.
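For anyone who wants to check my napkin, the footnote’s arithmetic is spelled out below. Every figure comes from the text itself: 42 texts per person per day, 350 million people, a billion operations per message, and 30 petaflops for Tianhe-2.

```python
# The back-of-the-napkin calculation from the footnote, step by step.
texts_per_day = 42 * 350e6        # ≈ 1.47e10 messages per day
ops_total = texts_per_day * 1e9   # ≈ 1.47e19 operations in total
flops = 30e15                     # 30 petaflops (Tianhe-2, roughly)
seconds = ops_total / flops
print(seconds)                    # 490.0 — roughly eight minutes
```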