A new report revises the civil war's casualty count upward by tens of thousands -- but even that might be too low.
On Wednesday morning, when U.N. High Commissioner for Human Rights Navi Pillay publicly stated that 60,000 people had been killed during Syria's ongoing civil war, it seemed like an already-vague and often-quoted estimate of 40,000 deaths had been revised upward for intuitive but more or less arbitrary reasons. The 40,000 number has been kicked around for months; nudging it a few tens of thousands upward in light of an intensifying and seemingly intractable conflict seemed like a perfectly acceptable, if imprecise, reflection of reality. In truth, Pillay's 60,000 number was arrived at through a painstaking and highly systematic process. And even then, the already staggering number is largely useful as an indicator of broader horrors unfolding within the country.
"This isn't an estimate," emphasized Patrick Ball, chief scientist and vice president of the human rights program at Benetech, a Silicon Valley startup that develops and applies open-source technology to human rights-related uses. "We have the names of 59,000-odd people. Let's be really clear here. This is a very conservative undercount."
In a report commissioned by Pillay's office, Ball and a team of computer scientists and developers attempted to determine how many of the 147,000 total deaths documented by various on-the-ground observers and reporting networks, such as the Violations Documentation Center, the Syrian Network for Human Rights and even the government of Syria itself, could be connected to non-redundant names, places and dates. Ball explained that Benetech created a computer program designed to filter out overlapping pieces of information in the available datasets. Their greatest methodological challenge was developing a program that could master the intricacies of Syrian Arabic, and then determine which of the 147,000 deaths reported across the various datasets corresponded to the same individual.
"With each individual record you have to match it to the other 147,000 records," said Ball. "So there's actually 147,000-squared possible combinations. That's the rough approximation. Computationally, it's a very complicated problem."
Ball said Benetech had already built similar programs for conflicts in places as diverse as Kosovo, Colombia and Timor-Leste, protocols which sweep the available data in order to tell, for instance, if one source's "Jim" was likely the same individual as another source's "Jimmy." Once the organization had created a similar process for Syrian Arabic sources, it could compare all reported deaths to every other reported death, using a variety of data points to filter out redundancies. Names were hardly the only challenge. The program had to account for the sometimes vague ways in which people process time -- for instance, was a Jimmy killed on October 3 the same person as a Jim reported killed "a few days ago" in early October?
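The kind of comparison described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Benetech's software: the `DeathRecord` fields, the 0.7 similarity threshold, the placeholder location "Town A" and the idea of storing a vague date as a range are all hypothetical choices made for this example, and the name comparison uses the generic string matcher in Python's standard library rather than anything tuned for Syrian Arabic.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass
class DeathRecord:
    """One reported death (hypothetical fields, for illustration only)."""
    name: str
    location: str
    earliest: date  # start of the plausible date range for the death
    latest: date    # end of the plausible date range


def name_similarity(a: str, b: str) -> float:
    """Return a 0-to-1 similarity score between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dates_overlap(a: DeathRecord, b: DeathRecord) -> bool:
    """True if the two records' plausible date ranges share at least one day."""
    return a.earliest <= b.latest and b.earliest <= a.latest


def likely_same_person(a: DeathRecord, b: DeathRecord,
                       name_threshold: float = 0.7) -> bool:
    """Flag two records as probable duplicates (threshold is an assumption)."""
    return (
        name_similarity(a.name, b.name) >= name_threshold
        and a.location.lower() == b.location.lower()
        and dates_overlap(a, b)
    )


# A precise report and a vague one ("a few days ago" in early October).
precise = DeathRecord("Jimmy", "Town A", date(2012, 10, 3), date(2012, 10, 3))
vague = DeathRecord("Jim", "Town A", date(2012, 10, 1), date(2012, 10, 7))
other = DeathRecord("Robert", "Town A", date(2012, 10, 3), date(2012, 10, 3))

print(likely_same_person(precise, vague))  # True
print(likely_same_person(precise, other))  # False
```

A fixed threshold like this is the crudest possible rule. The approach Ball describes learns how to weigh such comparisons from examples rather than relying on hand-picked cutoffs.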
"The software uses comparators to figure out what the human beings are doing," said Ball. Through a process called "semi-supervised machine learning," Benetech trained computers to effectively filter through an enormous volume of reported information on deaths in Syria. Around eight months later, they produced a final non-redundant dataset of 59,648 names.
There's plenty to suggest that the report dramatically undercounts the number of actual deaths, something that the document's authors are careful to point out. The report includes a timeline of deaths by week -- but concedes that a decrease in the number of reported deaths might mask an increase in violence. An apparently less violent week might indicate "that documentation has weakened over time, which would mean that violence has increased even more than shown," the report says. John Page, a Virginia-based IT professional who runs Syria Tracker, a site that sorts through tips and news reports to determine the scope of the Syria conflict, said that there is some precedent for this. In February 2012, the Baba Amr neighborhood in Homs experienced some of the worst fighting of the war up to that point. Yet the local death toll as reported to Syria Tracker actually plunged.
"I wish I could say the number of deaths went down, but I think the reporting network was wiped out," Page said.
The Benetech report is only a reflection of available data -- not a projection, estimate or demographic study. But there is information in the dataset itself that points towards a higher -- and maybe even much higher -- number of dead. Ball says that the age of up to three-fourths of recorded victims is missing. Yet a heavy proportion of the victims whose age could be determined were between 20 and 30 years old. Meanwhile, only 7.5 percent of the identified victims were female; all of the more than 2,500 dead reported by the Syrian government are men. The dataset skews heavily towards males of traditional fighting age. This means either that a large percentage of the people killed in the Syria conflict are combatants, or that the current documentation undercounts the number of civilian dead.