Richard Berk likes to think he knows what criminals will do—even before they know. The statistics professor, who teaches at the University of Pennsylvania, was recently willing to show off his skills. “What is the highest-risk age for re-offending?” he asked. I hazarded the early 20s, and was quickly corrected. “Teens,” he responded. “Actually, the [rate of re-offending] falls off very quickly in the early 20s.” But the trend line doesn’t hold, Berk explained. Violent activity starts to increase again in individuals a decade or so older. “You’re picking up the domestic-violence offenders,” Berk surmised. “They need someone to beat up on, and they’re in their late 30s.”
This sort of behavioral analysis is at the center of Berk’s expanding work as something of a crime predictor—a number cruncher whose algorithms are helping police and corrections officials forecast recidivism. The Pennsylvania Board of Probation and Parole, for instance, has been working with the professor for the past two years.
The state’s prison system, which for years has exceeded its capacity, paroles nearly 10,000 inmates every year. The parole board relies upon interviews with offenders, a survey, recommendations from prison officials, and victim statements to decide whom to free. The cost of mistakes is high: in 2010, six months after he was released on parole, Michael Ballard killed four people in eastern Pennsylvania. After he was caught, Ballard told the investigating officer to “blame the parole board.”
The public did. But against the backdrop of statewide calls for reform, Berk was already quietly working on a fix: an algorithm that could spit out a prediction of how likely it is that a would-be parolee will re-offend. Berk had begun building a similar algorithm for Philadelphia’s criminal-justice system in 2006, the year Philadelphia logged the highest murder rate among major cities. At the time, Philadelphia’s Adult Probation and Parole Department had 295 officers supervising nearly 50,000 individuals. The department asked Berk to predict which of the 50,000 would commit a serious crime within two years. “Our vision was that every single person, when they walked through the door, would be scored by a computer,” says Ellen Kurtz, the department’s director of research. The department would then use the score—low-, medium-, or high-risk—to decide how intensively to supervise released offenders. Officers assigned to low-risk individuals would handle up to 400 cases, and those monitoring high-risk offenders would have about 50.
Drawing from criminal databases dating to the 1960s, Berk initially modeled the Philadelphia algorithm on more than 100,000 old cases, relying on three dozen predictors, including the perpetrator’s age, gender, neighborhood, and number of prior crimes. To develop an algorithm that forecasts a particular outcome—someone committing murder, for example—Berk applied a subset of the data to “train” the computer on which qualities are associated with that outcome. “If I could use sun spots or shoe size or the size of the wristband on their wrist, I would,” Berk said. “If I give the algorithm enough predictors to get it started, it finds things that you wouldn’t anticipate.” Philadelphia’s parole officers were surprised to learn, for example, that the crime for which an offender was sentenced—whether it was murder or simple drug possession—does not predict whether he or she will commit a violent crime in the future. Far more predictive are the age at which he (yes, gender matters) committed his first crime, and how recently his latest offense occurred—the earlier the first crime and the more recent the last, the greater the chance of another offense.
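The train-then-forecast workflow Berk describes can be sketched in miniature. Everything below is invented for illustration—the records, the field names, and the deliberately simple nearest-centroid rule—and bears no resemblance in scale or sophistication to Berk’s actual system, which learns from tens of thousands of cases.

```python
# Toy version of the workflow described above: "train" on historical
# cases, then forecast new ones. All data and the classification rule
# are illustrative assumptions, not Berk's method.

# Each historical case: (age at first offense, months since last
# offense, re-offended within two years?)
HISTORY = [
    (14, 3, True), (15, 6, True), (16, 2, True), (17, 8, True),
    (24, 40, False), (28, 60, False), (31, 55, False), (22, 30, False),
]

def centroid(cases):
    """Average the predictor values across a group of cases."""
    n = len(cases)
    return (sum(c[0] for c in cases) / n, sum(c[1] for c in cases) / n)

def train(history):
    """Learn one centroid per outcome from the training cases."""
    pos = centroid([c for c in history if c[2]])
    neg = centroid([c for c in history if not c[2]])
    return pos, neg

def forecast(model, age_first, months_since_last):
    """Flag the case whose predictors sit closer to the re-offender group."""
    pos, neg = model
    dist = lambda c: (c[0] - age_first) ** 2 + (c[1] - months_since_last) ** 2
    return "high-risk" if dist(pos) < dist(neg) else "low-risk"

model = train(HISTORY)
# An early first offense plus a recent last offense lands near the
# re-offender centroid; the reverse pattern lands near the other group.
print(forecast(model, 15, 4))    # high-risk
print(forecast(model, 30, 50))   # low-risk
```

Even this toy captures the pattern the parole officers found surprising: the charge of record never enters the model, while age at first offense and recency of the last one drive the forecast.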
Risk assessment in the justice system isn’t new. In 1927, Ernest Burgess, a sociologist at the University of Chicago, drew on the records of 3,000 parolees in Illinois to estimate an individual’s likelihood of recidivism. Today, the LSI-R, a 54-question survey developed in Canada (the same one used by the Pennsylvania parole board), and COMPAS, a similar tool created by a Michigan-based company, are the most popular of hundreds of risk-assessment instruments. But Berk’s methods may represent a significant advance. “I use tens of thousands of cases to build the system, [as well as] asymmetric costs of false positives and false negatives, real tests of forecasting accuracy, the discovery of new forecasting relationships, and yes, machine learning,” he said. How do the old methods stack up against Berk’s? “It’s like comparing a Ford Focus to a Ferrari,” he told me.
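One piece of Berk’s list—the “asymmetric costs of false positives and false negatives”—has a standard textbook form that is easy to sketch. If releasing someone who goes on to re-offend (a false negative) is treated as, say, ten times as costly as over-supervising someone who would not have (a false positive), the probability cutoff for flagging a case drops well below 50 percent. The cost ratio and the helper names below are illustrative assumptions, not figures from Berk’s work.

```python
# Sketch of cost-sensitive thresholding: the textbook idea behind
# "asymmetric costs of false positives and false negatives."
# The 10:1 cost ratio is an assumption chosen for illustration.

def risk_threshold(cost_fp, cost_fn):
    """Probability cutoff that minimizes expected cost.

    A case is flagged when its predicted probability of re-offending
    exceeds this cutoff. The costlier a missed re-offender (false
    negative) relative to a needless flag (false positive), the
    lower the cutoff drops.
    """
    return cost_fp / (cost_fp + cost_fn)

def classify(prob_reoffend, cost_fp=1.0, cost_fn=10.0):
    """Label a case given a model's predicted probability."""
    cutoff = risk_threshold(cost_fp, cost_fn)
    return "high-risk" if prob_reoffend > cutoff else "low-risk"

# With a missed re-offender weighted 10x a false alarm, the cutoff
# falls to about 0.09, so even a 15% predicted probability is flagged.
print(round(risk_threshold(1.0, 10.0), 4))  # 0.0909
print(classify(0.15))                        # high-risk
print(classify(0.05))                        # low-risk
```

Under symmetric costs the same formula gives the familiar 0.5 cutoff; the asymmetry is what tilts the system toward caution.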
Berk’s expertise is being sought at nearly every stage of the criminal-justice process. Maryland is running an algorithm like Philadelphia’s that predicts who under supervision will kill—or be killed. The state has asked Berk to develop a similar algorithm for juveniles. He is also mining data from the Occupational Safety and Health Administration to forecast which businesses nationwide are most likely to be breaking OSHA rules. Back in Philadelphia, he is introducing statistics to the district attorney’s office, helping prosecutors decide which charges to pursue and whether to ask for bail. He may also work with the Pennsylvania sentencing commission to help determine whether and how long to incarcerate those convicted of crimes.
Is this a good thing? Berk’s algorithms evaluate offenders not as individuals, but as members of a group, about whom certain statistical probabilities exist. But most of us believe that individuals should be punished for what they do, not who they are. Consider race. In Berk’s experience, no institution has used it as a predictor, but it can enter the algorithm indirectly. Philadelphia, for example, factors in zip code, which often correlates with race.
Moreover, Philadelphia’s algorithm—like most other risk-assessment tools—relies heavily on variables related to the perpetrator’s criminal record. “When you live in a world in which juveniles are much more likely to be stopped—or, if stopped, be arrested, or, if arrested, be adjudicated—if they are black, then all of the indicators associated with prior criminal history are going to be serving effectively as a proxy for race,” said Bernard Harcourt, a law and political-science professor at the University of Chicago, who wrote Against Prediction: Profiling, Policing, and Punishing in an Actuarial Age. By using prior record to predict dangerousness, he insisted, “you just inscribe the racial discrimination you have today into the future.”
Ellen Kurtz has a ready response. “The commission of crime is not randomly or evenly distributed in our society,” she told me. “If you wanted to remove everything correlated with race, you couldn’t use anything. That’s the reality of life in America.” Harcourt counters that actuarial prediction inserts race into the analysis by over-sampling from a high-offending population.
In September, the Supreme Court appeared ready to take sides in the debate. It issued a last-minute stay of Duane Buck’s impending execution in Texas, saying that it would consider reviewing an appeal from his lawyers objecting to an expert’s testimony about Buck’s future dangerousness. Two months later, however, the Court decided not to review the appeal and lifted the stay, thereby allowing the expert’s testimony to stand. What, precisely, did Buck’s lawyers say the expert did wrong? He testified that blacks are more likely to commit violence.