In 2001, an analyst in the DNA unit of Arizona’s state crime laboratory noticed something interesting. Two seemingly unrelated individuals—one white and one black—shared the same two markers at nine of the 13 places in the standard DNA profile. Yet that particular genetic profile should have been exceedingly rare.

According to the standard method of computing how often one might expect to encounter a particular DNA profile in the population at large, known as the “random match probability,” a non-Hispanic white person plucked at random from the population had only a 1 in 754 million chance of having that profile. For African Americans, the figure was 1 in 561 billion. Yet here, in a database of fewer than 100,000 people, the profile was appearing twice, and in people of different races.

The DNA-unit analyst wrote up a quick summary of her findings and submitted it to a major international forensic-DNA conference. Her observations came to the attention of a public defender in San Francisco, who held a master’s degree in genetics and was in the midst of defending a California man, John Puckett, accused of a rape and murder from decades earlier. Police had collected forensic evidence in 1972, when a nurse was found sexually assaulted and fatally stabbed, but DNA typing was still decades away. The case sat open until, more than 30 years later, investigators dusted off the badly degraded DNA samples, tested them, and ran the results through the state database. A partial match linked the then-70-year-old, wheelchair-bound John Puckett to the only testable evidence: sperm found on the body. On the basis of this match, prosecutors charged Puckett with murder.

Puckett’s defense lawyer contacted the Arizona lab for more information about its findings, but the head of the lab denied the request. After a court issued a subpoena compelling the lab to disclose its findings, the analyst who had found the matching nine-locus pair testified that she had actually found ninety others within the database. When the lab offered no explanation for why supposedly one-in-a-trillion events were happening so regularly, the court ordered it to conduct a full search of the known-offender database and report back all matching pairs.

Ultimately, the lab’s report showed that such matches were far from rare. The Arizona database held only 65,493 people, each identified by the two markers at each of the 13 places that made up his or her DNA profile. Yet 122 pairs of people shared the same genetic markers at nine of the 13 places, and some even shared markers at 10, 11, or 12 places. It’s like assuming that you have an essentially unique identifier, such as a 26-digit string made up of your birthday, bank-account number, and Social Security number, only to learn that a significant number of people share most of those digits, in the same order, as you.

As news of these unexpected pairings swept the nation, lawyers in other cities pressed for similar searches. If there were 122 matches in a 65,000-person database, how many might turn up in the 11-million-person national database? But rather than embrace the inquiry, the FBI called the Arizona results “misleading” and “meaningless” and suppressed the findings. FBI leaders reprimanded the Arizona lab, claiming that disclosing the results violated its agreement with the FBI, and threatened to cut off access to the national database for any lab that conducted such a study on its own.

Why were the findings from the Arizona lab so explosive? The answer turns half on an understanding of math, and half on an understanding of law. And as is so often the case with forensic evidence, the gap between those two worlds proved critical.

* * *

At the time of the Arizona findings, state and national DNA databases had started to blossom. In its early days, most people thought of DNA testing as a tool to confirm the identity of a person whom police had already identified as a suspect in a crime. But it was on the brink of becoming something much more significant. The idea of “big data,” using vast networks of computers to churn through unprecedented amounts of information, was on the cusp of taking off. Even fingerprints, of which law-enforcement agencies held an incredible trove, were not commonly searched by computer until 1999.

The FBI built a computerized network for its large national repository of DNA profiles—known as the Combined DNA Index System, or CODIS—and then built software to look for associations between all the profiles it contained. This meant that a new kind of “cold hit” case—one, like John Puckett’s, that was prompted by genetic identity rather than conventional investigative leads—came to the fore.

Some cold hit cases become “hot” immediately upon investigation, with the non-genetic evidence falling into place. But some cold hit cases, like Puckett’s, stay cold. None of the many fingerprints found at the scene matched him.

Yet prosecutors were willing to press for a conviction based on the DNA match alone. Puckett mostly matched the description given by the sole eyewitness: he was the right age, gender, and race, and had been in the area at the time. He had also sexually assaulted three women around the same period, the convictions that had landed him in the database in the first place.

This case illustrates the importance of the 2001 Arizona findings and of the national debate they set off among mathematicians, lawyers, and forensic scientists. The simple explanation for the seemingly improbable matches, which a forensic or statistical expert would see straight away but police, prosecutors, and testifying lab analysts would not, lies in a mathematical parable known as the birthday problem: How many people must a group contain for there to be a better than 50 percent chance that two of them share a birthday? Intuition suggests a very large group, but the correct answer is only 23.
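The 23-person answer is easy to check. The short Python sketch below is an illustration added here, not part of the original analysis. It assumes 365 equally likely birthdays and ignores leap years. It multiplies together the chances that each new person in the group avoids every birthday already taken, then subtracts that product from one.

```python
def prob_shared_birthday(group_size: int, days: int = 365) -> float:
    """Probability that at least two people in the group share a birthday,
    assuming each of `days` birthdays is equally likely (no leap years)."""
    prob_all_distinct = 1.0
    for k in range(group_size):
        # The (k+1)-th person must avoid the k birthdays already taken.
        prob_all_distinct *= (days - k) / days
    return 1.0 - prob_all_distinct


def smallest_group_over(threshold: float = 0.5, days: int = 365) -> int:
    """Smallest group size whose shared-birthday probability exceeds `threshold`."""
    n = 1
    while prob_shared_birthday(n, days) <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    print(smallest_group_over())               # 23
    print(round(prob_shared_birthday(22), 3))  # 0.476
    print(round(prob_shared_birthday(23), 3))  # 0.507
```

The probability climbs quickly because each newcomer can collide with everyone already present. A group of just 23 people already contains 253 distinct pairs.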

It’s key to note that the birthday problem asks a different question from how likely it is that a person picked at random on the street has a particular birthday. In the same way, the difference between “Does anyone in the database match anyone else?” and “Does anyone in the database match this evidence?” explains why nine-locus matches were likely to be common in a large database like Arizona’s.
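A quick calculation makes the distinction concrete. The illustrative Python sketch below contrasts the two questions. It assumes equally likely birthdays, and its pair count simply applies the combination formula to the 65,493 profiles reported for Arizona. The pair count is not an estimate of how many matches the lab should have found, because that would also depend on allele frequencies. It only shows how quickly the number of comparisons outgrows figures like 1 in 754 million.

```python
from math import comb


def prob_someone_matches_me(group_size: int, days: int = 365) -> float:
    """Chance that at least one of the other group_size - 1 people shares
    one specific birthday (yours), assuming equally likely birthdays."""
    return 1.0 - ((days - 1) / days) ** (group_size - 1)


def pairwise_comparisons(database_size: int) -> int:
    """Number of distinct pairs when every profile is compared with every other."""
    return comb(database_size, 2)


if __name__ == "__main__":
    # "Does anyone match this evidence?": about 6 percent in a group of 23.
    print(round(prob_someone_matches_me(23), 3))  # 0.059
    # "Does anyone match anyone else?": every pair among 65,493 profiles.
    print(pairwise_comparisons(65_493))           # 2144633778
    # Letting the nine matching loci be any nine of the 13 multiplies the chances again.
    print(comb(13, 9))                            # 715
```

Roughly 2.1 billion pairwise comparisons is nearly three times the 754 million in the denominator of the rare-profile figure. Allowing any nine of the 13 loci to count as a match widens the net further still.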

Even so, cases around the country routinely proceeded on the basis of only a nine-locus database match, treated by lawyers and courts alike as conclusive proof of guilt.

In John Puckett’s case, that’s exactly what happened. Before the trial, the prosecutor proposed to tell the jury the random match probability, calculated as 1 in 1.1 million. Puckett’s defense lawyer pressed the court to let her present an alternative match statistic: 1 in 3.

The defense’s alternative statistic, known as the “database match probability” (DMP), accounts for the difference between a truly random match and a match found by searching a finite pool of candidates, such as the profiles in a database. DMP was put forward as the proper method in 1996 by a blue-ribbon panel of experts at the National Academy of Sciences, in what is considered the single most authoritative report on DNA evidence in criminal cases. But there were other ways to present the statistical significance of Puckett’s match. Another approach, and probably the one most helpful to the jury, would have been to ask, “Of all the men who lived in the metropolitan area at the time of the killing and who were the right age to have committed the offense, how many would likely match the crime scene evidence?” In Puckett’s case, this approach, nicknamed the “n*p” statistic, indicated that at least two other people living in the area at that time would have matched the evidence.
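Both the database match probability and the n*p statistic are simple multiplications. They differ only in what the random match probability is multiplied by. The Python sketch below is illustrative only. It treats DMP as the random match probability times the number of profiles searched, which is how the 1996 panel’s adjustment is commonly summarized, and n*p as the random match probability times the size of the relevant population. The 1 in 1.1 million figure comes from Puckett’s trial. The database size and population sizes are hypothetical round numbers, not the figures from the case.

```python
# Random match probability quoted at Puckett's trial: 1 in 1.1 million.
RANDOM_MATCH_PROB = 1 / 1_100_000

# Hypothetical size, for illustration only; not the size of the database actually searched.
HYPOTHETICAL_DATABASE_SIZE = 350_000


def database_match_probability(random_match_prob: float, profiles_searched: int) -> float:
    """DMP as an N * p adjustment: the random match probability multiplied by
    the number of profiles searched. Read as a probability only while it stays well below 1."""
    return random_match_prob * profiles_searched


def expected_matches(random_match_prob: float, population_size: int) -> float:
    """The n*p statistic: how many people in a population of this size
    would be expected to share the profile by chance."""
    return random_match_prob * population_size


if __name__ == "__main__":
    dmp = database_match_probability(RANDOM_MATCH_PROB, HYPOTHETICAL_DATABASE_SIZE)
    print(f"DMP: {dmp:.3f}, roughly 1 in {1 / dmp:.1f}")  # DMP: 0.318, roughly 1 in 3.1

    # Hypothetical pool sizes of men of the right age living in the area.
    for pool in (500_000, 1_000_000, 2_500_000):
        print(pool, round(expected_matches(RANDOM_MATCH_PROB, pool), 2))
```

With these made-up inputs, the pools yield about 0.45, 0.91, and 2.27 expected chance matches. Because N × p is also the expected number of coincidental hits in the database, it is best read as a count rather than a probability once it approaches 1.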

Each of these statistics gives the DNA-database match a very different significance. Yet all are legitimate in one way or another, and statisticians have not reached consensus on which one deserves priority within the criminal-justice system. Some defense lawyers have argued that this disagreement requires courts to reject database-match cases altogether. Others have sought additional confirmatory testing or, at the very least, the presentation of conflicting statistics.

As U.S. databases continue to expand and cold-hit searches multiply, this disagreement becomes increasingly important. A 2014 report by the European Network of Forensic Science Institutes spelled it out in plain language: “[a]s DNA-databases become larger, the chance of finding adventitious matches also increases, especially with partial and mixed profiles and DNA-profiles of relatives, which have higher random match probabilities.” The organization recommended that additional DNA testing be done in cases where a database match is the only thing linking someone to a crime. It also recommended that database managers keep a record of the number of adventitious matches, along with the conditions under which they were found (such as the size of the database and the number of searches), for future analysis.

Right now, the only “bad” cold hits that receive attention are those in which law enforcement seriously blunders: cases in which the suspects are fortunate enough to have ironclad alibis. For example, in 2000 in the U.K., police used a six-locus match to arrest a 49-year-old man for a burglary that occurred 200 miles from his home. One account placed the rarity of that profile at 1 in 37 million. Trouble was, the man was severely disabled by late-stage Parkinson’s disease and physically incapable of committing the crime. Additional testing eventually exonerated him.

The judge in Puckett’s case ruled that the jury should hear only part of the story recounted here. Jurors heard only the prosecution’s statistic: a 1 in 1.1 million chance that a person picked at random would match the crime-scene DNA. They never heard that Puckett had been picked out by a nonrandom trawl through a police database. They also never heard about the Arizona matches, or that sharing alleles at nine loci is not uncommon. They did not learn that, even using the government’s own probability statistic, around 40 other people in California would be expected to match the crime-scene evidence, or that, according to the database-match statistic endorsed by that 1996 report (the bible of forensic DNA), the probability of a match in the database the government searched was 1 in 3. Nor did they learn that two other people in the area would likely have matched the same evidence.

Just over five years after the trial, the FBI announced that the tables it had created to compute DNA statistics, the data that the Arizona matches had called into question, contained errors. In cases with plentiful, high-quality DNA, the mistakes appear negligible. But in cases with incomplete results, like Puckett’s, their effects are dramatic. Last month, one jurisdiction in Texas reported that a DNA-match probability computed with the erroneous tables was 1 in a billion; when corrected, the figure was 1 in 100.

But the news came too late for Puckett: By the time the FBI admitted its mistake, he had already begun serving his sentence of life without parole.

This article is adapted from Erin Murphy's book, Inside the Cell: The Dark Side of Forensic DNA.