In a report commissioned by Pillay's office, Ball and
a team of computer scientists and developers attempted to determine how many of the 147,000 total deaths documented by various on-the-ground observers and
reporting networks, such as the Violations Documentation Center, the Syrian Network for Human Rights and even the government of Syria itself, could be connected to non-redundant names,
places and dates. Ball explained that Benetech created a computer program designed to filter out overlapping pieces of information in the available
datasets. Their greatest methodological challenge was developing a program that could master the intricacies of Syrian Arabic, and then determine which of
the 147,000 deaths reported across the various datasets corresponded to the same individual.
"With each individual record you have to match it to the other 147,000 records," said Ball. "So there's actually 147,000-squared possible
combinations. That's the rough approximation. Computationally, it's a very complicated problem."
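Ball's back-of-the-envelope figure can be checked with a few lines of arithmetic. The sketch below is only an illustration of the scale he describes. It assumes each record is compared once against every other record, and the variable names are invented for this example.

```python
# Rough size of the comparison problem for 147,000 reported deaths.
n_records = 147_000

# Ball's "147,000-squared" approximation: every record against every record.
ordered_pairs = n_records ** 2

# Comparing A with B is the same as comparing B with A, and no record needs
# to be compared with itself, so the number of distinct pairs is n*(n-1)/2.
distinct_pairs = n_records * (n_records - 1) // 2

print(f"{ordered_pairs:,}")   # 21,609,000,000
print(f"{distinct_pairs:,}")  # 10,804,426,500
```

Either way the count runs into the billions, which is why checking every pair by hand is out of the question.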
Ball said Benetech had already built similar programs for conflicts in places as diverse as Kosovo, Colombia and Timor-Leste. These protocols
sweep the available data to tell, for instance, whether one source's "Jim" was likely the same individual as another source's "Jimmy." Once the
organization had created a similar process for Syrian Arabic sources, it could compare all reported deaths to every other reported death, using a variety
of data points to filter out redundancies. Names were hardly the only challenge. The program had to account for the sometimes vague ways in which people
report time -- for instance, was a Jimmy killed on October 3 the same person as a Jim reported killed "a few days ago" in early October?
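To make the "Jim" versus "Jimmy" problem concrete, here is a toy comparison in Python. It is not Benetech's software. The record fields, the 0.7 similarity threshold, the year and the use of the standard library's `difflib` string matcher are all assumptions chosen for illustration. A vague report such as "a few days ago" is represented as a range of possible dates.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class DeathRecord:
    """Hypothetical record layout; a vague date is stored as a range."""
    name: str
    place: str
    earliest: date
    latest: date


def name_similarity(a: str, b: str) -> float:
    """Score from 0.0 to 1.0 for how alike two names are, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dates_compatible(a: DeathRecord, b: DeathRecord) -> bool:
    """True when the two possible date ranges overlap."""
    return a.earliest <= b.latest and b.earliest <= a.latest


def likely_same_person(a: DeathRecord, b: DeathRecord,
                       name_threshold: float = 0.7) -> bool:
    """Flag a pair as a probable duplicate (threshold is an assumed value)."""
    return (
        a.place == b.place
        and dates_compatible(a, b)
        and name_similarity(a.name, b.name) >= name_threshold
    )


# One source gives an exact date; another says "a few days ago" in early October.
jimmy = DeathRecord("Jimmy", "Homs", date(2012, 10, 3), date(2012, 10, 3))
jim = DeathRecord("Jim", "Homs", date(2012, 10, 1), date(2012, 10, 5))
other = DeathRecord("Omar", "Homs", date(2012, 10, 3), date(2012, 10, 3))

print(likely_same_person(jimmy, jim))
print(likely_same_person(jimmy, other))
```

A real system would need comparators tuned to Arabic names and many more fields, and a fixed threshold like this one is exactly the kind of hand-set rule that machine learning is meant to replace.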
"The software uses comparators to figure out what the human beings are doing," said Ball. Through a process called "semi-supervised machine learning,"
Benetech trained computers to effectively filter through an enormous volume of reported information on deaths in Syria. Around eight months later, they
produced a final non-redundant dataset of 59,648 names.
There's plenty to suggest that the report dramatically undercounts the number of actual deaths, something that the document's authors are careful to point out. The report includes a timeline of deaths by week -- but concedes that a decrease in the number of reported deaths might mask an increase in
violence. An apparently less violent week might indicate "that documentation has weakened over time, which would mean that violence has increased even
more than show," the report says. John Page, a Virginia-based IT professional who runs Syria Tracker, a site that sorts
through tips and news reports to determine the scope of the Syria conflict, said that there is some precedent for this. In February 2012, the Baba Amr
neighborhood in Homs experienced some of the worst fighting of the war up to that point. The local death toll as reported to Syria Tracker actually
plunged.