The NSA Searches Ten Times as Much of the Internet as It Said It Does
The NSA's assurances that it surveils as little of the web as possible rest on the tiny percentages it cites. One key percentage, though, is off by an order of magnitude.
The National Security Agency assured Americans last week that it only surveils a tiny percentage of the web data it collects. But it turns out the NSA screwed up the math, and that percentage was off by an order of magnitude.
That error is in a document released by the agency on the heels of the president's speech earlier this month announcing measures to review NSA surveillance. We described the math at stake last week, but the pertinent section is this:
Unfortunately, if you do the math suggested by that paragraph, you don't get that tiny percentage, 0.00004 percent, or 4 parts per 10 million. It's actually 0.0004 percent, with one fewer zero, or 10 times as much as the NSA suggested. It's ten dimes on the basketball court, not one. (See the math at the bottom of this post.)
That's significant largely because of the weight the NSA puts on its percentages. In a New York Times article last Friday, the agency used similar tiny numbers to respond to The Washington Post's blockbuster report indicating that it had repeatedly violated Americans' privacy. (We spotted this via Mother Jones' Kevin Drum.)
The official, John DeLong, the N.S.A. director of compliance, said that the number of mistakes by the agency was extremely low compared with its overall activities. The report showed about 100 errors by analysts in making queries of databases of already-collected communications data; by comparison, he said, the agency performs about 20 million such queries each month.
Twenty million queries a month, as Drum points out, works out to a lot of daily queries (or, if you prefer, database searches). It's about 666,000 a day, in fact, in a 30-day month. That's more than seven queries every second. (How the NSA defines "query" in this context isn't clear.) It's also significant in the context of the amount of data the NSA processes. Each day, using the 0.025 percent of 1.6 percent figure above, the government reviews about 7.304 terabytes of data. If you're curious, the ratio of data reviewed to number of queries is about 12.2 megabytes, meaning that the government sets aside 12 megabytes for every query it runs.
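For readers who want to check the query rate themselves, here is a minimal sketch of that arithmetic. It assumes a 30-day month, which is the assumption that reproduces the roughly 666,000-a-day figure; the variable names are illustrative and not anything the NSA uses.

```python
# Monthly query count as cited by the NSA's director of compliance.
QUERIES_PER_MONTH = 20_000_000

# Assumption: a 30-day month. A 31-day month gives about 645,000 a day instead.
DAYS_IN_MONTH = 30
SECONDS_PER_DAY = 24 * 60 * 60

per_day = QUERIES_PER_MONTH / DAYS_IN_MONTH
per_second = per_day / SECONDS_PER_DAY

print(f"{per_day:,.0f} queries per day")
print(f"{per_second:.1f} queries per second")
```

Under that assumption the rate lands between seven and eight queries every second.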
Between the second quarter of 2011 and the first of 2012, the NSA committed about 7.5 privacy violations each day. Which was the NSA's point: of the 20 million queries a month, only a tiny, tiny percentage violate Americans' privacy. But a tiny percentage of a big number gives you seven privacy violations every 24 hours.
The NSA's incorrect 0.00004 percent figure was picked up by a variety of outlets, including CNN and the Daily Mail. Ten times a very, very small number is still a very, very small number, but it's a small number that represents 10 times as much surveillance as the NSA originally indicated.
Update, 5:00 p.m.: Vanee' Vines of the NSA responded to our question about the calculation over email:
Our figure is valid; the classified information that goes into the number is more complicated than what’s in your calculation.
We asked for further clarification of the discrepancy between the numbers. Vines replied:
Our overall number is valid. I’m not sure why you’re calling this a “discrepancy” when the number in the white paper is valid.
| Daily internet traffic | 1,826 petabytes, or 1,826,000 terabytes |
| Amount the NSA "touches" | 29,216 terabytes (.016 * 1826000) |
| Amount selected for review | 7.304 terabytes (.00025 * 29216) |
| Review amount as percentage of daily | 0.0004 percent (7.304 / 1826000) |
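The same calculation can be run as a short script. This is only a sketch of the arithmetic in the table, using the figures quoted in this post (1,826 petabytes of daily traffic, 1.6 percent "touched," 0.025 percent of that selected for review); the variable names are ours.

```python
# Figures quoted in this post; names are illustrative.
DAILY_TRAFFIC_TB = 1_826_000   # 1,826 petabytes expressed in terabytes
TOUCHED_SHARE = 0.016          # 1.6 percent of traffic the NSA "touches"
REVIEWED_SHARE = 0.00025       # 0.025 percent of that is selected for review
NSA_STATED_PERCENT = 0.00004   # the percentage in the NSA's document

touched_tb = DAILY_TRAFFIC_TB * TOUCHED_SHARE
reviewed_tb = touched_tb * REVIEWED_SHARE

# Convert the reviewed fraction of daily traffic to a percentage.
reviewed_percent = reviewed_tb / DAILY_TRAFFIC_TB * 100

print(f"Touched:  {touched_tb:,.0f} TB")
print(f"Reviewed: {reviewed_tb:.3f} TB")
print(f"Reviewed share of daily traffic: {reviewed_percent:.4f} percent")
print(f"Ratio to the NSA's stated figure: {reviewed_percent / NSA_STATED_PERCENT:.0f}x")
```

The step that is easy to miss is the final multiplication by 100: the raw fraction is 0.000004, which is 0.0004 percent, not 0.00004 percent.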