A new report revises the civil war's casualty count upward by tens of thousands -- but even that might be too low.
On Wednesday morning, when U.N. High Commissioner for Human Rights Navi Pillay publicly stated that 60,000 people had been killed during Syria's ongoing civil war, it seemed like an already-vague and often-quoted estimate of 40,000 deaths had been revised upward for intuitive but more or less arbitrary reasons. The 40,000 number has been kicked around for months; nudging it a few tens of thousands upward in light of an intensifying and seemingly intractable conflict seemed like a perfectly acceptable, if imprecise, reflection of reality. In truth, Pillay's 60,000 number was arrived at through a painstaking and highly systematic process. And even then, the already staggering number is largely useful as an indicator of broader horrors unfolding within the country.
"This isn't an estimate," emphasized Patrick Ball, chief scientist and vice president of the human rights program at Benetech, a Silicon Valley startup that develops and applies open-source technology to human rights-related uses. "We have the names of 59,000-odd people. Let's be really clear here. This is a very conservative undercount."
In a report commissioned by Pillay's office, Ball and a team of computer scientists and developers attempted to determine how many of the 147,000 total deaths documented by various on-the-ground observers and reporting networks, such as the Violations Documentation Center, the Syrian Network for Human Rights and even the government of Syria itself, could be connected to non-redundant names, places and dates. Ball explained that Benetech created a computer program designed to filter out overlapping pieces of information in the available datasets. Their greatest methodological challenge was developing a program that could master the intricacies of Syrian Arabic, and then determine which of the 147,000 deaths reported across the various datasets corresponded to the same individual.
"With each individual record you have to match it to the other 147,000 records," said Ball. "So there's actually 147,000-squared possible combinations. That's the rough approximation. Computationally, it's a very complicated problem."
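Ball's "147,000-squared" figure can be checked with quick arithmetic. The sketch below is only a back-of-the-envelope illustration, not part of Benetech's software; it assumes every record is compared against every other, and shows that counting each distinct pair once roughly halves the total while leaving it in the billions.

```python
# Back-of-the-envelope count of record comparisons for n reported deaths.
n = 147_000  # total reported deaths across all sources, per the article

ordered_pairs = n * n               # Ball's "147,000-squared" rough approximation
unordered_pairs = n * (n - 1) // 2  # each distinct pair of records compared once

print(f"{ordered_pairs:,}")    # 21,609,000,000
print(f"{unordered_pairs:,}")  # 10,804,426,500
```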
Ball said Benetech had already built similar programs for conflicts in places as diverse as Kosovo, Colombia and Timor-Leste, protocols which sweep the available data in order to tell, for instance, if one source's "Jim" was likely the same individual as another source's "Jimmy." Once the organization had created a similar process for Syrian Arabic sources, it could compare all reported deaths to every other reported death, using a variety of data points to filter out redundancies. Names were hardly the only challenge. The program had to account for the sometimes vague ways in which people process time -- for instance, was a Jimmy killed on October 3 the same person as a Jim reported killed "a few days ago" in early October?
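To make the "Jim" and "Jimmy" example concrete, here is a minimal sketch of what such comparisons might look like. It is not Benetech's method: the record layout, the 0.7 name threshold, the year 2012, the place name and the use of Python's standard `difflib` are all assumptions made for illustration, and real Arabic names would need far more careful handling. A vague report such as "a few days ago in early October" is modeled as a window of possible dates rather than a single day.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass
class DeathRecord:
    # Hypothetical layout; the real datasets are not described in this detail.
    name: str
    earliest: date  # earliest plausible date of death
    latest: date    # latest plausible date of death
    place: str


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dates_compatible(a: DeathRecord, b: DeathRecord) -> bool:
    """True when the two records' date windows overlap."""
    return a.earliest <= b.latest and b.earliest <= a.latest


def likely_same_person(a: DeathRecord, b: DeathRecord,
                       name_threshold: float = 0.7) -> bool:
    """Flag two records as probable duplicates (threshold is an assumption)."""
    return (
        a.place == b.place
        and dates_compatible(a, b)
        and name_similarity(a.name, b.name) >= name_threshold
    )


# A precise report and a vague one ("early October"), both illustrative.
exact = DeathRecord("Jimmy", date(2012, 10, 3), date(2012, 10, 3), "Homs")
vague = DeathRecord("Jim", date(2012, 10, 1), date(2012, 10, 10), "Homs")
print(likely_same_person(exact, vague))  # True
```

Treating dates as windows is one simple way to let an exact report and an approximate one agree without forcing either source to be more precise than it is.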
"The software uses comparators to figure out what the human beings are doing," said Ball. Through a process called "semi-supervised machine learning," Benetech trained computers to effectively filter through an enormous volume of reported information on deaths in Syria. Around eight months later, they produced a final non-redundant dataset of 59,648 names.
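Once pairs of records have been flagged as probable duplicates, they still have to be collapsed into one entry per person. The article does not say how Benetech did this; the sketch below uses a standard union-find structure as one plausible approach, with made-up record indices and a hypothetical function name. If record 0 matches record 1 and record 1 matches record 3, all three are treated as the same individual.

```python
def group_duplicates(num_records, matched_pairs):
    """Group record indices so that matched records share one group."""
    parent = list(range(num_records))  # each record starts as its own group

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for i in range(num_records):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Five illustrative records; 0, 1 and 3 describe the same person.
groups = group_duplicates(5, [(0, 1), (1, 3)])
print(len(groups))  # 3
```

The number of groups, not the number of raw records, is what a non-redundant count like 59,648 would correspond to under this approach.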
There's plenty to suggest that the report dramatically undercounts the number of actual deaths, something that the document's authors are careful to point out. The report includes a timeline of deaths by week -- but concedes that a decrease in the number of reported deaths might belie an increase in violence. An apparently less violent week might indicate "that documentation has weakened over time, which would mean that violence has increased even more than shown," the report says. John Page, a Virginia-based IT professional who runs Syria Tracker, a site that sorts through tips and news reports to determine the scope of the Syria conflict, said that there is some precedent for this. In February 2012, the Bab Amar neighborhood in Homs experienced some of the worst fighting of the war up to that point. The local death toll as reported to Syria Tracker actually plunged.
"I wish I could say the number of deaths went down, but I think the reporting network was wiped out," Page said.
The Benetech report is only a reflection of available data -- not a projection, estimate or demographic study. But there is information in the dataset itself that points toward a higher -- and maybe even much higher -- number of dead. Ball says that the age of up to three-fourths of recorded victims is missing. Yet a heavy proportion of the victims whose age could be determined were between 20 and 30 years old. Meanwhile, only 7.5 percent of the identified victims were female; all of the more than 2,500 dead reported by the Syrian government are men. The dataset skews heavily toward males of traditional fighting age. Either this means that a large percentage of the people killed in the Syria conflict are combatants, or that the current documentation undercounts the number of civilian dead.
The data hint at another reason to view Benetech's confirmed 60,000 deaths as a low, if systematically established, figure. A large share of the report's data come from organizations with some kind of formal connection to the Syrian opposition. Benetech only counted victims for whom they had a name, date and place of death, and the vast majority of deaths are confirmed through multiple sources. But opposition groups could be undercounting deaths in areas they can't access -- Page, for instance, noted that casualty reporting from the Deir Zur area is currently lacking compared to other parts of the country, despite evidence of shelling and heavy combat. The opposition might not have accurate data on the number of government forces killed over the course of the war, and they might even have incentives to deliberately undercount combatant casualties on both sides.
A spokesperson for the Free Syria Database, a human rights monitor whose work was consulted for the Benetech report, raises another reason to believe that available counts are low: "Since the start, many families bury their dead and don't mention the names because of fear that the family members will be targeted by the government," the group said in an email, adding that they "think the number of government force deaths are the most under-reported because it is in the interest of both sides to keep them reported as being low."
There are even reported deaths in the dataset that the Benetech report doesn't count at all. The 60,000 number doesn't include any death that lacks a corresponding name, date or location. Neither does it account for anecdotal reports: a death based only on a secondhand description of an event ("three people were killed in Homs on Wednesday," for example) did not make it into the final total.
There are also deaths that, by virtue of the very limited aims of the Benetech report (namely, weeding out duplicates in existing reports), fall outside the project's scope. Ball explained that his report didn't count excess deaths -- deaths above the peacetime death rate that are not necessarily violent, yet still attributable to environmental conditions imposed by constant war. Determining the excess death rate would require demographic or epidemiological surveys akin to The Lancet's highly controversial Iraq mortality estimate from 2006. The International Rescue Committee counted 5.4 million excess deaths during the Democratic Republic of Congo's ongoing conflict, and studies have found that only 2 percent of the war's victims were killed as a result of direct violence.
As Page explained, there are clearly public health issues at play in Syria that a straight body count would miss. Page says Syria Tracker is trying to "track disease outbreak"; for instance, "lack of garbage pickup in Aleppo has created a huge problem there." Even when the conflict ends, establishing a population baseline for an excess death study could prove difficult: Page said the last reliable population count of the country is from 2006, and the country's Alawite population hasn't been tallied since the 1930s.
The Benetech report is the best available account of the number of deaths that human rights monitors definitively know about. It is not an estimate or even a reliable death toll, and it is not designed to be. For those who have been following the conflict from afar, this raises a sobering possibility. There may be no conflict in history so painstakingly recorded in real time. Finding up-to-date video or photographs on social media is effortless; massacres, shellings and attacks become worldwide news almost as soon as they happen. Even then, the true scope of the country's tragedy remains unknown. This week's report should provide future death estimates with a firm and scientifically established starting count. But the final number will likely be much, much higher than that.