America’s Most Reliable Pandemic Data Are Now at Risk

The Biden administration has to make a choice: Should it undo a vital system that Trump’s health department created?

Illustration: A cross made of white numbers in columns fades away against a red background. (Katie Martin / The Atlantic)

When a hospital is in trouble, the signs are unmistakable. The number of COVID-19 admissions rises quickly. The number of patients who remain hospitalized grows steadily—and the bar to be admitted gets higher. The percentage of patients in intensive-care units increases. Supplies run low. As an ICU nears capacity, sick people get less care than they would have. More people suffer, and more people die. Right now, in Alabama, Arizona, and California—Los Angeles, especially—this is exactly what’s happening. We know this because of the data system that’s now in place.

But until recently, we did not have this national picture. Who had the most COVID-19 patients? Which hospitals’ ICUs were overrun? Who had staffing shortages? No one could say. Even assuming that the federal government could have executed a competent pandemic response, it couldn’t know where help was needed.

The government needed a national hospital-data system. So multiple teams scrambled to build one. In a short time, control over this hospitalization data became one of the most hotly contested elements of the American response, as fears of Trump-administration meddling cast doubt on the Department of Health and Human Services. Now the Biden administration is poised to take over as the country faces the worst surge yet, and hospitalization data may be the most important information it will have in the fight to save lives. The administration must decide where those data will live.

Since July, these data have been routed through the Department of Health and Human Services, but some officials inside the CDC are trying to regain control. It might seem obvious that the CDC, the traditional repository of infectious-disease information, should win this intragovernmental battle, but the reality is much more complicated. The current, HHS-run system works—unlike so much else in the response—and with these data flowing in, the federal government can dispatch help to hospitals that need it. If the new administration changed that system, it would be setting aside the best available data about the pandemic, and gambling that it could build a better system when it cannot afford to lose.

Since March, I’ve run the COVID Tracking Project with Erin Kissane, dozens of staffers, and hundreds of volunteers. We have pieced together national data sets on tests, cases, hospitalizations, and deaths by compiling the information that states publish. The hospitalization data that we’ve pulled from the states became the de facto national standard for the majority of the pandemic. Our team has made hundreds of contacts with local, state, and federal officials to clarify what the numbers on all those dashboards actually mean. And through that work, we’ve been able to compare what states say is happening with whatever the federal government publishes.

Hospitalization data reveal the condition of the country’s hospitals: COVID-19 admissions, currently hospitalized patients, ICU availability, and access to personal protective equipment and other supplies. In July, the Department of Health and Human Services directed hospitals to send information directly to an HHS database, bypassing the CDC, which prompted a series of articles in The New York Times casting doubt on the HHS system. There was reason to worry: HHS officials had tried to pressure infectious-disease experts, including Anthony Fauci, to echo President Donald Trump’s misleading public messaging about the pandemic. And Secretary Alex Azar and Deborah Birx, the White House’s coronavirus-response coordinator, were seen as beholden to Trump. Anonymous CDC officials said the change had been a surprise, and insinuated that perhaps the data would be manipulated.

But what really happened is widely, wildly misunderstood. Although the CDC did not respond to multiple requests for comment on this story, internal communications show that the agency agreed to the change because of the limitations of its own system. And while the switch was rocky at first, over time the HHS system has become the most reliable source of federal pandemic data.

“I’m not going to pretend that the data wasn’t messy at first, but the aspiration was valuable, and particularly over the last few months, you could see the data getting better and better,” David Rubin, the director of the PolicyLab at Children’s Hospital of Philadelphia, who has worked extensively with COVID-19 data, told me. “I think it would be a grave mistake to throw it out and go back to what we were doing before.”

At the COVID Tracking Project, we were initially dismayed by the HHS changeover, but we watched closely as the system stabilized and began to become more reliable. In a series of analyses that we ran over the past several months, we came to nearly the opposite conclusion of other media outlets. The hospitalization data coming out of HHS are now the best and most granular publicly available data on the pandemic. This information has changed the response to the pandemic for the better.

“Hospitals are now beginning to see how folks from Operation Warp Speed are using the data to identify specific shortages of specific supplies and reach out: ‘Are you okay? Can you get them from your supplier? Or can we help you in some way?’” Nancy Foster, the vice president for quality and patient-safety policy at the American Hospital Association, told me.

Amid the United States’ overall failure to contain the pandemic, the small data team at HHS did a good thing. Biden’s team did not respond to a request for comment on this story, but starting on Wednesday his administration will have the power to choose what happens to this hospital data. Disrupting the flow now—when 124,000 people are hospitalized with COVID-19 and more than 3,300 people are dying each day—is a risk the country doesn’t need to and should not take.

Pieces of data do not simply exist. They must be extracted from reality and processed into usable forms. From the molecules of the virus on up, measurements have to be taken and facts tabulated. These numbers have to flow from labs and medical examiners, hospitals and public-health departments, into larger systems, where they then get summarized into statistics.

For example, most states identify most cases electronically, based on lab results. But not every state’s electronic reporting is equal. Some use advanced electronic-case-reporting protocols, while others rely at least in part on forms sent via fax. America’s system is incredibly decentralized, with thousands of different sources of data, and it is rife with idiosyncrasies and potential points of failure. Its complexity and heterogeneity are key weaknesses in U.S. public-health surveillance.

It didn’t have to be this way. Years ago, the CDC prioritized data modernization as part of a plan to be ready for a possible pandemic, and the agency appeared to be making good progress. In 2019, a group of public-health experts even ranked the United States No. 1 out of every country in the world for pandemic preparedness, including data collection. We now know that the CDC and the rest of the federal government were not ready to confront the real thing.

“As a country, we are really underprepared for large, real-time data collection and sharing,” Nahid Bhadelia, an infectious-disease physician at Boston University School of Medicine, told me. “And real-time outbreak analytics? Well, that’s like asking a Model T to compete on the Autobahn.”

In the chaotic early days of the coronavirus crisis, the United States probably confirmed only 10 percent—or perhaps as little as 5 percent—of cases, though no one will ever know for sure. Even now some deaths are being reported weeks after they actually occur. Media outlets and government officials often say, as shorthand, “3,000 people died from COVID-19 yesterday,” but that actually means 3,000 deaths were reported yesterday. Though we usually don’t know exactly, the people represented by that number may have died two, five, 15, or 50 days ago.

Hospital data began in even worse shape than testing, case, or death data. COVID-Net, a system for estimating hospital strain, drew on a network of only about 250 hospitals in 14 states. It did not provide granular national data.

Beginning in March, different pieces of the federal government tried to stand up hospital-data systems. The CDC took a system created for tracking infections transmitted in hospitals, the National Healthcare Safety Network, and jerry-rigged it to take in COVID-19 patient data. HHS contracted with a small health-care-IT firm, Teletracking, to create a similar system. And Deborah Birx’s team worked with FEMA’s National Response Coordination Center, HHS, and the CDC to contract with Palantir, which built software called HHS Protect.

Hospitals or their intermediaries—such as state hospital associations—could send information to any of the three systems, and eventually that data would drop into HHS Protect.

Much has been made of the decision to use Palantir for HHS Protect, not least because one of Palantir’s co-founders, Peter Thiel, is a high-profile Trump supporter. The concern was reasonable enough. But HHS officials say they went with the company because the CDC already worked with Palantir. In fact, HHS Protect is an offshoot of another system Palantir produced, known as DCIPHER Cloud, which began under President Barack Obama. “It was really about using what was already in-house,” Kevin Duvall, the deputy chief data officer at HHS, told me.

Throughout the spring, hospitals and states worked to create systems for reporting data to the federal government. States published their own accounting, too, which we gathered at the COVID Tracking Project. Those state hospitalization data did not match what we saw the federal authorities reporting. When we looked at May and June, we could see the CDC estimates for hospitalizations bouncing up and down. They looked like a seismograph during an earthquake. Given that the states were reporting fairly smooth curves, we concluded that the fluctuations in the CDC data did not reflect reality, but were artifacts of the reporting process. If fewer hospitals reported to the CDC on a given day, the reported number of hospitalizations could fall, even if there were still sick patients in those facilities. It was impossible to know for certain, but the state data were almost certainly more reliable.
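To see how a reporting gap alone can produce a dip, consider a minimal sketch. The hospital names and patient counts are invented for illustration, and the function `reported_total` is hypothetical, not part of any federal system.

```python
def reported_total(patients_by_hospital, reporting):
    """Sum patients only over the hospitals that submitted a report that day."""
    return sum(n for h, n in patients_by_hospital.items() if h in reporting)

# Hypothetical hospitals and counts, for illustration only.
patients = {"A": 120, "B": 80, "C": 50}

monday = reported_total(patients, {"A", "B", "C"})  # everyone reports
tuesday = reported_total(patients, {"A", "C"})      # hospital B misses a day

print(monday, tuesday)
```

The number of sick patients is identical on both days, yet the total drops by 80 on the second day simply because one hospital did not file.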

In mid-July, as the Sun Belt teemed with infections, members of the White House Coronavirus Task Force realized that they needed to ask hospitals a new question: How much remdesivir did they have on hand? (Clinical trials had shown that the drug was more effective when administered earlier in the course of a COVID-19 infection, not later, after more severe illness had set in.) According to correspondence obtained by The Atlantic from a source who requested anonymity because they were not authorized to speak about the communication, the Data Strategy and Execution Workgroup at HHS—the team tasked with providing data for the federal coronavirus response—requested that the CDC add a single data field about remdesivir to the National Healthcare Safety Network (NHSN), its hospital-data-collection system. This is what you might think of as a new column in a spreadsheet, but officials ran into a problem: The CDC staff said that change would take more than three weeks, at a time when hospitalizations were approaching the highest levels of the pandemic to that date.

NHSN was an old system, snapped together from other IT components in 2005 to track infections spreading in hospitals. Hospitals were familiar with it, and it came with a preexisting $60 million contract with a major federal contractor, Leidos, to keep it running. But it had not been built for the kind of flexible emergency response that this unprecedented pandemic required. The request to add the data field went all the way to Sherri Berger, chief operating officer of the CDC. But the word came back: The field could not be added faster. So the CDC gave the team within HHS the go-ahead to change the reporting system itself.

This switch to HHS Protect was rushed—hospitals had just five days to figure out the new system before it went live—and hospital reporting fell rapidly, according to a dashboard HHS maintains. The change caused hiccups in state data, too, and the COVID Tracking Project noticed major reporting problems right as hospitalizations were peaking in hard-hit regions.

There were other ominous signs of malfunction, or worse: A previously public dashboard showing hospital capacity blinked offline. A big story ran in The New York Times suggesting that the changeover had surprised the CDC and focusing on the possibility of political interference with the data. A CNBC headline read, “Coronavirus Data Has Already Disappeared After Trump Administration Shifted Control From CDC.” No one seemed to believe what the CDC’s director, Robert Redfield, said at a press conference: “In order to meet this need for flexible data gathering, CDC agreed that we needed to remove NHSN from the collection process.”

The idea that the Trump administration would try to suppress COVID-19 data was not far-fetched, but the HHS staffers I spoke with said that public perception was misguided. The people on the team were not Trump-administration loyalists, but civil servants from across the federal government. Its leader, Amy Gleason, came from the U.S. Digital Service, a signature achievement of the Obama administration that brings technologists into the government from private industry.

“It’s truly interagency,” Gleason said. “Every day I work with people from 13 different agencies and components, side by side.”

As outside pressure mounted, they were scrambling to build a complicated system in a moment of national crisis.

“I know there have been lots of stories written about the relationship of CDC to HHS. But I will say this: We weren’t prepared from the data perspective for the challenge that awaited us,” said Rubin from the Children’s Hospital of Philadelphia. “The systems for influenza based on sentinel surveillance were not sufficient for a pandemic of this magnitude, so creating a public-health war room is a noble goal. The question is why we didn’t have something like this previously.”

Immediately after the switch-over to HHS Protect, the discrepancies between the federal data and state data really could be enormous. On some days in late July, HHS reported 200 percent more hospitalized patients than some jurisdictions themselves were reporting. While this was disturbing, an overcount of hospitalizations, making the pandemic seem more severe, was also a sign that the problems were unlikely to be purely political in nature.

As the HHS data became public, we at the COVID Tracking Project found that news organizations and many public-health professionals were continuing to rely on our hospitalization numbers, even though an official government entity now provided similar data. HHS had major logistical problems to deal with. After the changeover, the team had essentially no hospitals reporting all of the data requested every day for the month of July. Many hospitals were unhappy that there had been disruptive changes. But Nancy Foster credited Amy Gleason with putting a moratorium on tweaks to the system. She also built a troubleshooting team with hospital liaisons drawn from staff at the CDC, HHS, and other parts of the federal government. “It was really under Amy Gleason’s leadership that the folks from HHS started to work with states and other data intermediaries between hospitals and HHS Protect to understand where there were glitches in their processes and to help the states straighten those out,” Foster said.

Jim Jirjis, the chief health-information officer at HCA Healthcare, which runs 185 hospitals across the country, considers the HHS effort highly competent. “The fact that there was listening and the ability to pivot and change was very, very reassuring that our government can do a really good job of modifying in the middle of a pandemic,” Jirjis told me.

The improvements didn’t happen all at once. The federal government still had not released the granular data that it was receiving from hospitals and that underlay the state statistics. Civil servants across the government might have been striving to understand the spread of COVID-19 with great specificity, but their work was not reaching the public.

At the COVID Tracking Project, we were keenly aware of how little information the public was receiving. And we, like many other people, worried that HHS officials would attempt to influence the data. While hospitalization data were trickling out, other information remained locked up inside the government.

“As soon as COVID became a political issue, the administration willingly withheld data that showed how severe COVID was spreading in our communities,” says Ryan Panchadsaram, the former deputy chief technology officer of the United States under Obama and a co-founder of COVID Exit Strategy, which tracks the government’s response. “While internal reports were highlighting the ‘red zones’ and ‘areas of concern,’ the president and vice president continued to share that the reaction to COVID was ‘overblown.’”

So at the end of the summer, we decided to look for signs of cooking the books in the federal hospitalization data. First, we simply looked to see if there were obviously political patterns in the data—say, red states with lower hospitalization numbers than anticipated, or overall depressed numbers. We didn’t see anything like that. Then we ran statistical tests looking at the variance in data from different states.

What we found surprised us: The data that were flowing through HHS were much less spiky than what had flowed primarily through NHSN. In fact, at least on initial inspection, the HHS data looked a lot like our patchwork of data from states, which for the most part was not riddled with weird jumps or unexplained phenomena that were obviously not reflective of reality. When cases rose, hospitalizations did shortly thereafter. As the HHS data came to resemble the state data, we began to suspect that perhaps the HHS data had, as we put it in an internal report on August 20, “enormous potential to be the Federal numbers we’ve always wanted.”
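One simple way to put a number on "spiky" is the average size of the day-over-day swings in a series. The sketch below is an illustration under that assumption, not a description of the tests the COVID Tracking Project actually ran. The function name `spikiness` and both sample series are invented.

```python
from statistics import mean

def spikiness(series):
    """Mean absolute day-over-day change, as a fraction of the previous day's value."""
    changes = [abs(b - a) / a for a, b in zip(series, series[1:]) if a > 0]
    return mean(changes)

# Made-up numbers for illustration only; not real hospitalization counts.
smooth = [1000, 1020, 1045, 1060, 1085, 1100]
jumpy = [1000, 700, 1150, 650, 1200, 800]

print(f"smooth: {spikiness(smooth):.3f}")
print(f"jumpy:  {spikiness(jumpy):.3f}")
```

A series that climbs steadily scores close to zero, while one that lurches up and down from day to day scores far higher, even if both have a similar overall level.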

Stitching together state reporting into a national data set is an incredibly research-intensive way to produce those statistics. We have to figure out precisely what information 56 states and territories are reporting, and even then, we cannot guarantee perfectly comparable data. HHS, for its part, simply asked hospitals to report all confirmed and suspected COVID-19 hospitalizations in the same way, creating a consistent and standardized data set. Once hospitals learned the system, the data solidified. Jason Salemi, an epidemiologist at the University of South Florida, described the changes as “amazing improvements.”

“For a long while, there was very little help from federal data—it was a massive disappointment and failure to serve the public at a time when such information was direly needed,” Salemi told me. Since then, HHS “has stepped up to the challenge in a major way.”

Some critiques of the HHS-generated information have called its accuracy into question. There are many data sets in HHS Protect that originate in many different places, so we cannot speak to all of them. However, the COVID Tracking Project can check HHS against the state reports. In late November, we found that the data had come to match almost perfectly. Not all states report precisely the same way, and the COVID Tracking Project runs one day behind HHS, but after we took those factors into account, we found that HHS and state data were now falling within 2 percent of each other. If the HHS data were off, then the data produced by every state were also off.
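The comparison described above can be sketched in a few lines: shift one series by a day so the dates line up, then compute the percent difference for each day. This is a simplified illustration, not the Project's actual code. The function `percent_gap`, the assumption that a state figure at position `i + 1` describes the same day as the federal figure at position `i`, and all of the numbers are invented.

```python
def percent_gap(federal, state_total, lag_days=1):
    """Percent difference per day after shifting the state series by lag_days.

    Assumes both lists are daily counts starting on the same date, and that
    state_total[i + lag_days] describes the same day as federal[i].
    """
    pairs = zip(federal, state_total[lag_days:])
    return [abs(f - s) / s * 100 for f, s in pairs if s > 0]

# Made-up daily counts for illustration only.
federal = [100, 110, 120, 130]
state_total = [95, 101, 109, 121, 131]

gaps = percent_gap(federal, state_total)
print([round(g, 2) for g in gaps])
print(all(g < 2 for g in gaps))
```

Without the one-day shift, two series that agree closely can look further apart than they are, which is why the lag has to be accounted for before judging the match.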

For the week of December 28, the most recent data available, 96 percent of hospitals reported every data point to HHS every day. The interagency team led by HHS has done what had seemed impossible: gotten nearly every hospital in America to tell the federal government what’s going on.

“This pandemic shined a bright light on the data gaps we had in our understanding of the magnitude, spread, and burden of disease, across each community, county, city, and state,” Irum Zaidi, the White House deputy coronavirus-response coordinator and chief epidemiologist, told me. “The system we needed and have set up makes every patient visible across the U.S. in order to provide the limited resources such as remdesivir, supplies, and staffing to every rural and urban hospital.”

As the data improved, they became more and more available to the public. First up, HHS published the “metadata” about how facilities were reporting. This let us see for the first time how many hospitals were reporting. Then it released staffing-shortage details. Bit by bit, as the fall turned to winter, HHS published much of the crucial data that critics of the administration had been asking for. The capstone came last month, when HHS released data for every hospital in the country, exactly the kind of granular information that is necessary to understand where hospital systems are being overwhelmed. And given the general disaster of COVID-19 in America, there are many places that desperately need help from the federal government to secure supplies and shore up staffing.

This data set is not perfect—no data set is—but it is the best available. “I am heartened to see facility-level [information] because we can also get a sense of how the same facilities are doing over time,” BU’s Nahid Bhadelia said. “This level of granularity also allows researchers to create a better evidence base for policy recommendations.”

Examining the COVID Tracking Project’s map of the HHS release, one can zoom in on Dallas, say, and find data about how full any given hospital’s ICU is that week. It provides an unprecedented look at precisely how much pressure COVID-19 places on our health-care systems. The White House’s Coronavirus Task Force is now using this kind of information to dispatch medical support teams to hard-hit areas.

This week, a new administration will take control of the HHS system, and is facing public pressure to change it once again. Switching back to the National Healthcare Safety Network, at this point, would likely undo the progress the HHS data team has made in the past six months, at the worst moment of the pandemic so far. “Going forward, it’ll be important for the next administration to pick up the baton and build off of what’s been created for this response,” Ryan Panchadsaram told me. “Is it perfect? No. But it is better than what we had before.”

“My feeling is do not make any changes unless they are absolutely necessary,” Foster said. “Change is disruption.”