The Wrong Way to Figure Out If the NSA Is Abusing Its Power

Simply comparing the number of queries to the number of rules violations doesn't really tell us very much.


Commentators as diverse as Ben Wittes, Kevin Drum and Jennifer Rubin have tried to make sense of NSA rules violations in recent days partly by comparing the number of violations with the overall number of queries, as if the error rate were a useful metric.

The NSA itself is encouraging the same analytic framework:

In a conference call with reporters Friday, NSA Compliance Director John DeLong repeatedly said that the agency takes compliance seriously and that the audit's existence proved that. "People need to understand there's no willful violations here," he said. The mistakes are in the "parts-per-million or parts-per-billion range," he said. "We really do look for them, detect them and correct them."

Added DeLong: "No one at NSA, not me or anyone else, thinks they are okay."

When pressed, he said there have been willful violations, but the number is "minuscule . . . a couple over the past decade."

He also said the agency makes 20 million queries a month of its databases.

For purposes of this article, let's set aside the incompleteness of the recent report on NSA rule-breaking, and the certainty that there are many more incidents of rule-breaking that we know nothing about. Let's also assume, for the sake of argument, that the NSA has yet to commit any "abuses," and discuss its surveillance generally, rather than just the phone dragnet.

What I want to focus on is the strangeness of comparing the number of NSA queries with the number of violations, as if a low percentage of violations were enough to reassure us about NSA behavior.

According to the Washington Post, some NSA violations involve individuals whose cell phones were surveilled legally while they were overseas, but who entered the U.S. without the NSA realizing it. There is information we don't have about these violations, and it's easy to misunderstand the complicated, fragmentary information that we do have, but as best I can tell, one "violation" in this case would seem to mean that a single individual was affected.

By way of contrast, consider another sort of violation:

In one instance, the NSA decided that it need not report the unintended surveillance of Americans. A notable example in 2008 was the interception of a "large number" of calls placed from Washington when a programming error confused the U.S. area code 202 for 20, the international dialing code for Egypt, according to a "quality assurance" review that was not distributed to the NSA's oversight staff.

That error also counts as one violation. But it seems to have affected thousands, maybe hundreds of thousands, of people in the U.S. city where data could most easily be abused for political purposes.

Now consider a hypothetical. Say that the NSA carried out 1,000 queries in a year, and the only violation involved a foreigner who brought his cell phone over from China. Now say that it carried out 5,000 queries in a year, and the only violation affected most residents of Washington, D.C. Surely the NSA of hypothetical one is doing better than the NSA of hypothetical two.

But you wouldn't know it from the "queries to violations" ratio, which makes the second NSA look five times better: one violation per 5,000 queries rather than one per 1,000.
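To make the arithmetic of the hypothetical concrete, here is a minimal Python sketch. The number of people affected in each scenario is an illustrative assumption, not a reported figure: one person for the traveler from China, and a hypothetical 500,000 standing in for "most residents of Washington, D.C."

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    queries: int
    violations: int
    people_affected: int  # assumed, illustrative figure; not reported data

    def violation_rate(self) -> float:
        # Share of all queries that were rules violations.
        return self.violations / self.queries


scenarios = [
    Scenario("Hypothetical one (traveler from China)", 1_000, 1, 1),
    Scenario("Hypothetical two (most of Washington, D.C.)", 5_000, 1, 500_000),
]

for s in scenarios:
    # The ratio and the harm point in opposite directions.
    print(f"{s.name}: {s.violation_rate():.2%} of queries, "
          f"{s.people_affected:,} people affected")
```

By the ratio alone (0.10 percent versus 0.02 percent), the second NSA looks better; by the count of people whose privacy was violated, it is far worse.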

Now imagine a non-accidental violation. Say that an NSA analyst queries the surveillance agency's data to play a joke on his best friend, to stalk his ex-wife, or to blackmail a U.S. senator. The seriousness of the resulting violation would vary depending on which one it was, but any of them would be alarming.

So our analyst goes ahead and stalks his ex, committing one violation. And then he spends the rest of the month making legitimate queries. Does our notion of how alarming or serious his transgression was depend at all on whether the rest of his month involves 50 or 50,000 legitimate queries?

Hypothetical examples can only take us so far. But it seems obvious to me that the relevant metric isn't the percentage of violations committed, whether mild or serious, accidental or intentional.

What's relevant is how many Americans have had their privacy rights violated.

As the Washington Post stated, "There is no reliable way to calculate from the number of recorded compliance issues how many Americans have had their communications improperly collected, stored or distributed by the NSA." That is the information that Americans deserve. And I've yet to hear a plausible explanation for why it would unduly endanger national security to make that information public. But I am inclined to think it would reflect poorly on the NSA.

The Obama Administration tends to release classified information that makes it look good.

Doesn't analyzing what little information we do have by comparing NSA queries with violations create a perverse incentive? Say the agency's director, General Keith Alexander, finds out that violations doubled in September 2013 compared with September of last year. So he calls up a computer whiz in his employ, orders him to have the machines carry out 10 million automated queries in North Korea, and suddenly the NSA can proudly report that its September violation-to-query ratio is the lowest ever.

Am I missing something? Or does queries-to-violations tell us very little about how worried we should be?