Doctors Still Struggle to Diagnose a Condition That Kills More Americans Than Stroke

Can computers crack the code of sepsis?

[Illustration: a cross filled with electronic-health-record jargon. Credit: Joanne Imperio / The Atlantic]

This article was originally published in Undark Magazine.

Ten years ago, 12-year-old Rory Staunton dove for a ball in gym class and scraped his arm. He woke up the next day with a 104-degree Fahrenheit fever, so his parents took him to the pediatrician and eventually the emergency room. It was just the stomach flu, they were told. Three days later, Rory died of sepsis after bacteria from the scrape infiltrated his blood and triggered organ failure.

“How does that happen in a modern society?” his father, Ciaran Staunton, asked me.

Each year in the United States, sepsis kills more than a quarter million people—more than stroke, diabetes, or lung cancer. One reason for all this carnage is that if sepsis is not detected in time, it’s essentially a death sentence. Consequently, much research has focused on catching sepsis early, but the condition’s complexity has plagued existing clinical support systems—electronic tools that use pop-up alerts to improve patient care—with low accuracy and high rates of false alarm.

That may soon change. Back in July, Johns Hopkins researchers published a trio of studies in Nature Medicine and npj Digital Medicine showcasing an early-warning system that uses artificial intelligence. The system caught 82 percent of sepsis cases and significantly reduced mortality. While AI—in this case, machine learning—has long promised to improve health care, most studies demonstrating its benefits have been conducted using historical data sets. Sources told me that, to the best of their knowledge, when used on patients in real time, no AI algorithm has shown success at scale. Suchi Saria, the director of the Machine Learning and Healthcare Lab at Johns Hopkins University and the senior author of the studies, said in an interview that the novelty of this research is how “AI is implemented at the bedside, used by thousands of providers, and where we’re seeing lives saved.”

The Targeted Real-Time Early Warning System scans through hospitals’ electronic health records—digital versions of patients’ medical histories—to identify clinical signs that predict sepsis, alert providers about at-risk patients, and facilitate early treatment. Leveraging vast amounts of data, TREWS provides real-time patient insights and a unique level of transparency in its reasoning, according to the Johns Hopkins internal-medicine physician Albert Wu, a co-author of the study.

Wu says that this system also offers a glimpse into a new age of medical electronization. Since their introduction in the 1960s, electronic health records have reshaped how physicians document clinical information; nowadays, however, these systems primarily serve as “an electronic notepad,” he added. With a series of machine-learning projects on the horizon, both from Johns Hopkins and other groups, Saria says that using electronic records in new ways could transform health-care delivery, providing physicians with an extra set of eyes and ears—and helping them make better decisions.

It’s an enticing vision, but one in which Saria, the CEO of the company developing TREWS, has a financial stake. This vision also discounts the difficulties of implementing any new medical technology: Providers might be reluctant to trust machine-learning tools, and these systems might not work as well outside controlled research settings. Electronic health records also come with many existing problems, from burying providers under administrative work to risking patient safety because of software glitches.

Saria is nevertheless optimistic. “The technology exists; the data is there,” she says. “We really need high-quality care-augmentation tools that will allow providers to do more with less.”


Currently, there’s no single test for sepsis, so health-care providers have to piece together their diagnoses by reviewing a patient’s medical history, conducting a physical exam, running tests, and relying on their own clinical impressions. Given such complexity, over the past decade, doctors have increasingly leaned on electronic health records to help diagnose sepsis, mostly by employing rules-based criteria: if this, then that.

One such example, known as the SIRS criteria, says a patient is at risk of sepsis if at least two of four clinical signs (body temperature, heart rate, breathing rate, and white-blood-cell count) are abnormal. This broadness, although helpful for catching the various ways sepsis might present itself, triggers countless false positives. Take a patient with a broken arm: “A computerized system might say, ‘Hey, look, fast heart rate, breathing fast.’ It might throw an alert,” says Cyrus Shariat, an ICU physician at Washington Hospital in California. The patient almost certainly doesn’t have sepsis but would nonetheless trip the alarm.
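The “if this, then that” logic of a SIRS-style rule can be sketched in a few lines of Python. This is a simplified illustration, not any hospital’s actual alerting code: the cutoffs are the commonly cited SIRS thresholds rather than figures from the studies discussed here, the sketch leaves out the less common variants of the criteria, and the `Vitals` class and the patient values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vitals:
    temp_c: float        # body temperature, degrees Celsius
    heart_rate: int      # beats per minute
    resp_rate: int       # breaths per minute
    wbc_k_per_ul: float  # white-blood-cell count, thousands per microliter


def sirs_flags(v: Vitals) -> list[str]:
    """Return the names of the SIRS signs that fall outside the usual cutoffs."""
    flags = []
    if v.temp_c > 38.0 or v.temp_c < 36.0:
        flags.append("temperature")
    if v.heart_rate > 90:
        flags.append("heart_rate")
    if v.resp_rate > 20:
        flags.append("resp_rate")
    if v.wbc_k_per_ul > 12.0 or v.wbc_k_per_ul < 4.0:
        flags.append("wbc")
    return flags


def sirs_alert(v: Vitals) -> bool:
    """Fire an alert when at least two of the four signs are abnormal."""
    return len(sirs_flags(v)) >= 2


# Hypothetical patient in pain from a broken arm: no fever, normal white count,
# but a fast heart rate and fast breathing.
broken_arm = Vitals(temp_c=37.0, heart_rate=112, resp_rate=24, wbc_k_per_ul=8.0)
print(sirs_flags(broken_arm))  # ['heart_rate', 'resp_rate']
print(sirs_alert(broken_arm))  # True
```

The rule has no way of knowing why the heart is racing, which is how a broken arm ends up tripping a sepsis alarm.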

These alerts also appear on providers’ computer screens as a pop-up, which forces them to stop whatever they’re doing to respond. So, despite these rules-based systems occasionally reducing mortality, there’s a risk of alert fatigue, where health-care workers start ignoring the flood of irritating reminders. According to M. Michael Shabot, a surgeon and the former chief clinical officer of Memorial Hermann Health System, “It’s like a fire alarm going off all the time. You tend to be desensitized. You don’t pay attention to it.”

Already, electronic records aren’t particularly popular among doctors. In a 2018 survey, 71 percent of physicians said that the records greatly contribute to burnout, and 69 percent said that they take valuable time away from patients. A 2016 study found that, for every hour spent on patient care, physicians have to devote two extra hours to electronic health records and desk work. James Adams, the chair of the Department of Emergency Medicine at Northwestern University, calls electronic health records a “congested morass of information.”

But Adams also says that the health-care industry is at an inflection point and has a chance to transform these records. An electronic record doesn’t have to simply involve a doctor or nurse putting data in, he says; instead, it “needs to transform to be a clinical-care-delivery tool.” With their universal deployment and real-time patient data, electronic records could warn providers about sepsis and various other conditions, but that will require more than a rules-based approach.

What doctors need, according to Shabot, is an algorithm that can integrate various streams of clinical information to offer a clearer, more accurate picture when something’s wrong.


Machine-learning algorithms work by looking for patterns in data to predict a particular outcome, like a patient’s risk of sepsis. Researchers train the algorithms on existing data sets, which helps the algorithms create a model for how that world works and then make predictions on new data sets. The algorithms can also actively adapt and improve over time, without human intervention.
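For readers who want to see the train-then-predict idea in miniature, here is a toy sketch in Python. It is not the TREWS model, whose internals this article does not describe; it is a one-variable logistic regression, a standard textbook technique, fitted to a handful of made-up numbers. Every name and value in it is an assumption chosen for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any number into a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, lr=0.1, epochs=2000):
    """Fit a weight and bias by plain gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # prediction minus true label
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_risk(model, x: float) -> float:
    """Return the model's estimated probability for a new, unseen value."""
    w, b = model
    return sigmoid(w * x + b)


# Made-up "historical" data: one rescaled measurement per patient (x) and
# whether that patient turned out to have the condition (y = 1) or not (y = 0).
train_x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

model = train(train_x, train_y)

# Predictions on values the model never saw during training.
print(predict_risk(model, 1.8) > 0.5)   # True: resembles the y = 1 patients
print(predict_risk(model, -1.8) > 0.5)  # False: resembles the y = 0 patients
```

A real clinical system draws on many more inputs and far more data, but the shape is the same: learn from past records, then score new ones.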

TREWS follows this general mold. The researchers first trained the algorithm on historical electronic-records data so that it could recognize early signs of sepsis. After testing on those records showed that TREWS could have identified patients with sepsis hours before they actually got treatment, the algorithm was deployed inside hospitals to influence patient care in real time.

Saria and Wu published three studies on TREWS. The first tried to determine how accurate the system was, whether providers would actually use it, and whether its use led to earlier sepsis treatment. The second went a step further to see if using TREWS actually reduced patient mortality. And the third asked 20 providers who had tested the tool what they thought about machine learning, including which factors facilitate trust and which hinder it.

In these studies, TREWS monitored patients in the emergency department and inpatient wards, scanning through their data—vital signs, lab results, medications, clinical histories, and provider notes—for early signals of sepsis. (Providers could do this themselves, Saria says, but it might take them about 20 to 40 minutes.) If the system suspected organ dysfunction based on its analysis of millions of other data points, it flagged the patient and prompted providers to confirm sepsis, dismiss the alert, or temporarily pause the alert.

“This is a colleague telling you, based upon data and having reviewed all this person’s chart, why they believe there’s reason for concern,” Saria says. “We very much want our frontline providers to disagree, because they have ultimately their eyes on the patient.” And TREWS continuously learns from these providers’ feedback. Such real-time improvements, as well as the diversity of data TREWS considers, are what distinguish it from other electronic-records tools for sepsis.

In addition to these functional differences, TREWS doesn’t alert providers with incessant pop-up boxes. Instead, the system uses a more passive approach, with alerts arriving as icons on the patient list that providers can click on later. Initially, Saria was worried this might be too passive: “Providers aren’t going to listen. They’re not going to agree. You’re mostly going to get ignored.” However, clinicians responded to 89 percent of the system’s alerts. One physician interviewed for the third study described TREWS as less “irritating” than the previous rules-based system.

Saria says that TREWS’s high adoption rate shows that providers will trust AI tools. But Fei Wang, an associate professor of health informatics at Weill Cornell Medicine, is more skeptical about how these findings will hold up if TREWS is deployed more broadly. Although he calls these studies first-of-a-kind and thinks their results are encouraging, he notes that providers can be conservative and resistant to change: “It’s just not easy to convince physicians to use another tool they are not familiar with,” Wang says. Any new system is a burden until proven otherwise. Trust takes time.

TREWS is further limited because it knows only what has been entered into the electronic health record; the system is not actually at the patient’s bedside. As one emergency-department physician put it, in an interview for the third study, the system “can’t help you with what it can’t see.” And even what it can see is filled with missing, faulty, and out-of-date data, according to Wang.

But Saria says that TREWS’s strengths and limitations complement those of health-care providers. Although the algorithm can analyze massive amounts of clinical data in real time, it will always be limited by the quality and comprehensiveness of the electronic health record. The goal, Saria adds, is not to replace physicians, but to partner with them and augment their capabilities.


The most impressive aspect of TREWS, according to Zachary Lipton, an assistant professor of machine learning and operations research at Carnegie Mellon University, is not the model’s novelty, but the effort it must have taken to deploy it on 590,736 patients across five hospitals over the course of the study. “In this area, there is a tremendous amount of offline research,” Lipton says, but relatively few studies “actually make it to the level of being deployed widely in a major health system.” It’s so difficult to perform research like this “in the wild,” he adds, because it requires collaborations across various disciplines, from product designers to systems engineers to administrators.

As such, by demonstrating how well the algorithm worked in a large clinical study, TREWS has joined an exclusive club. But this uniqueness may be fleeting. Duke University’s Sepsis Watch algorithm, for one, is currently being tested across three hospitals following a successful pilot phase, with more data forthcoming. In contrast with TREWS, Sepsis Watch uses a type of machine learning called deep learning. Although this can provide more powerful insights, how the deep-learning algorithm comes to its conclusions is unexplainable—a situation that computer scientists call the black-box problem. The inputs and outputs are visible, but the process in between is impenetrable.

On the one hand, there’s the question of whether this is really a problem: Doctors don’t always know how drugs work, Adams says, “but at some point, we have to trust what the medicine is doing.” Lithium, for example, is a widely used, effective treatment for bipolar disorder, but nobody really understands exactly how it works. If an AI system is similarly useful, maybe interpretability doesn’t matter.

Wang suggests that that’s a dangerous conclusion. “How can you confidently say your algorithm is accurate?” he asks. After all, it’s difficult to know anything for sure when a model’s mechanics are a black box. That’s why TREWS, a simpler algorithm that can explain itself, might be a more promising approach. “If you have this set of rules,” Wang says, “people can easily validate that everywhere.”

Indeed, providers trusted TREWS largely because they could see descriptions of the system’s process. Of the clinicians interviewed, none fully understood machine learning, but that level of comprehension wasn’t necessary.


In machine learning, although the specific algorithmic design is important, the results have to speak for themselves. By catching 82 percent of sepsis cases and reducing time to antibiotics by 1.85 hours, TREWS ultimately reduced patient deaths. “This tool is, No. 1, very good; No. 2, received well by clinicians; and No. 3, impacts mortality,” Adams says. “That combination makes it very special.”

However, Shariat, the ICU physician at Washington Hospital in California, was more cautious about these findings. For one, these studies only compared patients with sepsis who had the TREWS alert confirmed within three hours to those who didn’t. “They’re just telling us that this alert system that we’re studying is more effective if someone responds to it,” Shariat says. A more robust approach would have been to conduct a randomized controlled trial, the gold standard of medical research, in which half of patients got TREWS in their electronic record while the other half didn’t. Saria says that randomization would have been difficult to do given patient-safety concerns, and Shariat agrees. Even so, he says that the absence of such a trial “makes the data less rigorous.”

Shariat also worries that the sheer volume of alerts, with about two out of three being false positives, might contribute to alert fatigue—and potentially overtreatment with fluids and antibiotics, which can lead to serious medical complications such as pulmonary edema and antibiotic resistance. Saria acknowledges that TREWS’s false-positive rate, although lower than that of existing electronic-health-record systems, could certainly improve, but says it will always be crucial for clinicians to continue to use their own judgment.
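What “two out of three being false positives” means in practice can be worked out with some back-of-the-envelope arithmetic, sketched here in Python. This is an illustration only. It assumes, for simplicity, that the 82 percent catch rate and the two-thirds false-positive share can both be applied to the same hypothetical group of 100 patients with sepsis, which may not match how the studies measured them. The function name and the patient count are invented for the example.

```python
def alert_breakdown(sepsis_cases: int, catch_rate: float, false_alert_share: float) -> dict:
    """Estimate alert counts from a catch rate and the share of alerts that are false."""
    caught = sepsis_cases * catch_rate
    # If a given share of all alerts is false, the caught cases make up the rest.
    total_alerts = caught / (1.0 - false_alert_share)
    return {
        "caught": round(caught),
        "missed": round(sepsis_cases - caught),
        "total_alerts": round(total_alerts),
        "false_alerts": round(total_alerts - caught),
    }


# Hypothetical group of 100 patients who truly have sepsis.
result = alert_breakdown(sepsis_cases=100, catch_rate=0.82, false_alert_share=2 / 3)
print(result)
# {'caught': 82, 'missed': 18, 'total_alerts': 246, 'false_alerts': 164}
```

Under these assumptions, catching 82 of 100 sepsis patients would come with roughly 164 alerts on patients who don’t have the condition, which is the volume Shariat is worried about.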

The studies also have a conflict of interest: Saria is entitled to revenue distribution from TREWS, as is Johns Hopkins. “If this goes prime time, and they sell it to every hospital, there’s so much money,” Shariat says. “It’s billions and billions of dollars.”

Saria maintains that these studies went through rigorous internal and external review processes to manage conflicts of interest, and that the vast majority of study authors don’t have a financial stake in this research. Regardless, Shariat says it will be crucial to have independent validation to confirm these findings and ensure the system is truly generalizable.

The Epic Sepsis Model, a widely used algorithm that scans through electronic records but doesn’t use machine learning, is a cautionary example here, according to David Bates, the chief of general internal medicine at Brigham and Women’s Hospital. He explains that the model was developed at a few health systems with promising results before being deployed at hundreds of others. The model then deteriorated, missing two-thirds of patients with sepsis and having a concerningly high false-positive rate. “You can’t really predict how much the performance is going to degrade,” Bates says, “without actually going and looking.”

Despite the potential drawbacks, Orlaith Staunton, Rory’s mother, told me that TREWS could have saved her son’s life. “There was complete breakdown in my son’s situation,” she said; none of his clinicians considered sepsis until it was too late. An early-warning system that alerted them about the condition, she added, “would make the world of difference.”

After Rory’s death, the Stauntons started the organization End Sepsis to ensure that no other family would have to go through their pain. In part because of their efforts, New York State mandated that hospitals develop sepsis protocols, and the CDC launched a sepsis-education campaign. But none of this will ever bring back Rory, Ciaran Staunton said: “We will never be happy again.”

This research is personal for Saria as well. Almost a decade ago, her nephew died of sepsis. By the time it was discovered, there was nothing his doctors could do. “It all happened too quickly, and we lost him,” she says. That’s precisely why early detection is so important—life and death can be mere minutes away. “Last year, we flew helicopters on Mars,” Saria says, “but we’re still freaking killing patients every day.”