Body Counting

Why even the most-dubious statistics influence our thinking

How many Iraqis have died because of the American invasion? It would be nice to know the local price of Saddam Hussein’s ouster, five years on. Many researchers have produced estimates. Unfortunately, these range from 81,020 to 1 million. The wide variance, of course, speaks to considerable uncertainty, although the individual figures are often absurdly precise.

The figure most often quoted, and until recently regarded by many as the most scientific, comes from a study published in The Lancet, a prominent British medical journal, just before the 2006 election. That study, which made headlines worldwide and was cited by war opponents from Ted Kennedy to Al-Jazeera, found that a shocking 601,027 Iraqis had died violent deaths since the U.S. invasion. But the timing of the study’s publication and the size of its estimate have attracted a great deal of criticism; its authors, mostly researchers at Johns Hopkins University, have been accused of everything from bias to outright fraud.

Research by the World Health Organization, published in January in The New England Journal of Medicine, has cast further doubt. It covered basically the same time period and used similar statistical techniques, but with a much larger sample and more-rigorous interview methods. It found that the Lancet study’s violent-death count was roughly four times too high. This has a familiar ring to it. A smaller study, released by the Johns Hopkins team in 2004, had been quickly contradicted by a larger UN survey suggesting that it had overstated excess mortality by, yes, about a factor of four.

“Conflict epidemiology,” the study of war’s health effects, is by its nature an inexact science. War and anarchy are not friends to careful, by-the-book research. We have little idea how many people now live in Iraq; ascertaining the number who have died there is a tall order. And huge disparities in death estimates are not unique to the conflict in Iraq; cluster sampling, the best-regarded survey technique for use in war-torn places, has produced estimates in other conflict zones, such as Darfur, that vary by factors of three or more.

All casualty studies have problems. But the Johns Hopkins study’s methodology was particularly troublesome. The number of neighborhoods the team sampled was just above the minimum needed for statistical significance, and the field interviewers rushed through their work. The interviewers were also given some discretion over which households they surveyed, a practice generally regarded as unwise. And though such latitude calls for closer-than-normal supervision of field interviewers, the Johns Hopkins team seems to have provided little. Any of these choices can be defended because of the dangers, and the authors have said as much, claiming, basically, that this was the best they could do in a bad situation.

But that raises an unwelcome question: If this is the best we can do, should we be doing this at all? Cluster sampling was developed for studying vaccination; it has never been validated for mortality. Because of the wide variance in the estimates it produces, some researchers are now questioning its usefulness.

Yet though its compromises made it particularly unreliable, the Lancet study remains the most widely known. Its conclusions were the earliest and most shocking of the scientific estimates and thus generated enormous media attention. The more-careful counts that followed prompted fewer, and less prominent, articles. There’s little doubt that the larger number will live on for years in the writings of antiwar activists. But the rest of us, too, were influenced by it, perhaps more than we realize. We will have to live with its legacy.

Most data create what cognitive scientists call “anchoring effects”: we fixate on numbers we’ve heard, even if they’re arbitrary or wrong. In one 1970s experiment, Amos Tversky and Daniel Kahneman (whose work won a Nobel Prize) famously picked a number at random in front of their subjects, by spinning a wheel, and then asked them to guess whether the percentage of African nations in the UN was higher or lower than that number. Next, they asked for a hard estimate of the actual percentage. The higher the random number, the higher the final estimate tended to be, even though the first number had been obviously irrelevant.

These effects persist, infecting our related views, even when the “facts” are subsequently discredited. In one study, for example, experimenters gave students false, negative information about a teacher, but then told them it was incorrect. Nonetheless, when subsequently asked to evaluate that teacher, the students generally turned in worse ratings than did students in a control group that had not heard the bogus information.

We anchor most strongly on the first number we hear, particularly when it is shocking and precise—like, say, 601,027 violent deaths in Iraq. And even when such a number is presented only as a central estimate in a wide range of statistical possibilities (as the Lancet study’s figure was), we tend to ignore the range, focusing instead on the lovely, hard number in the middle. Human beings are terrible at dealing with uncertainty, and besides, headlines seldom highlight margins of error.

When information supports positions we already hold, we of course tend to accept it less critically; when the opposite is true, we can be quite good at shutting the information out. “Motivated reasoning” is a mighty force, as anyone who has argued politics in a bar at 2 a.m. can attest. Scientists have observed the process, using a functional MRI machine to peer into the brain while it processes political statements, and their report is unsurprising. When we are assessing neutral statements, activity is concentrated in the areas that control higher reasoning. But when we process statements with political valence, suddenly our emotional cortices light up as well. Indeed, some research indicates that the emotion precedes, and governs, the higher cognition—that logic is, literally, an afterthought.

But cognitive bias is not limited to partisans; we all anchor on the numbers we hear. The Lancet article’s central estimate exerts a gravitational pull on even its harshest critics, who seem to be mentally benchmarking their estimates by how much they differ from that 601,027. Others who are not motivated to disprove that number tend to orbit even closer.

Once people make an estimate, they have a strong tendency to confirm it. If I ask you whether it is plausible that there are 600,000 Canada geese in Chicago, your thought process might go something like this: Big lake … a lot of parks … very near Canada … OK, sure. Once you’ve said yes to that 600,000 figure, psychological studies show, you’ll continue “recruiting evidence” for it, perhaps noticing an article on a goose refuge near the city. Eventually you’ll wind up surrounded by a little army of facts that support the theory. What most people don’t do is look for ways to falsify it: Shouldn’t the geese still be in Florida at this time of year?

This psychological quirk can create motivated reasoning even in an initially disinterested observer. By the time we’ve finished affirming the figure’s plausibility, it has become ours, and we’ll fight to defend it. Being challenged—say, arguing with a skeptical friend—now makes us dig in. Once mustered, the troops are hard to disperse.

All of this calls into question the idea that even a flawed study is better than no study. Like most people, I believe that more information is usually better; when facts or theories conflict, air the differences and let the facts fight it out. But not every number is a fact. And when the data fall below some threshold of quality, it’s better to have no numbers at all.

When researchers try to collect data in the heat of conflict, the necessary compromises make shocking outliers more likely. Yet early, messy studies are first to press, so the worse the data, the more likely we are to hear about it. Even when bad numbers are overthrown, the correction often comes too late. Appallingly high and, it turns out, inaccurate estimates of deaths resulting from sanctions were among the reasons advanced for invading Iraq. And ultimately, inflated early estimates of casualties can trivialize the very problem they were meant to highlight. When the initial estimates of 250,000 dead in the former Yugoslavia were revised downward, one conflict researcher complains, suddenly the number became “only” 100,000.

Witness the Johns Hopkins team’s critics, who triumphantly waved the WHO results at their opponents. But even if “only” 150,000 people have been killed by violence in Iraq, that’s a damn high price. Conversely, few of the study’s supporters expressed much pleasure at the news that an extra 450,000 people might be walking around in Iraq. After a year and a half of bitter argument, all that anyone seemed interested in was proving they had been right. In counting, we somehow lost track of the mountain of dead bodies piling up beneath our numbers.