We Now Can See a Virus Mutate Like Never Before

An illustration of an eyeball looking at genetic sequencing.
Getty / Paul Spella / The Atlantic

In the beginning, there was one.

The first genome for the virus causing a mysterious illness we had not yet named COVID-19 was shared by scientists on January 10, 2020. That single genome alerted the world to the danger of a novel coronavirus. It was the basis of new tests as countries scrambled to find the virus within their own borders. And it became the template for vaccines, the same ones now making their way to millions of people every day. That first coronavirus genome may have been the most important 30,000 letters published in all of 2020.

Since then, the number of sequenced genomes has simply exploded, to 700,000. In just over a year, the virus that causes COVID-19 has become the most sequenced virus of all time—soaring past such longtime contenders as HIV and influenza. Thousands of coronavirus genomes are sequenced around the world every day; several were generated in just the minute it’s taken for you to read these three paragraphs. “It’s been a revolution,” says Judith Breuer, a virologist at University College London.

We are now living through the first pandemic in human history where scientists can sequence fast and furiously enough to track a novel virus’s evolution in real time, and to act decisively on that information. Viruses constantly acquire mutations, or genetic typos, and occasionally they mutate into a variant of interest. It was sequencing that identified a distinct and more transmissible variant in the U.K. It was sequencing that prompted stricter lockdowns in response. And it’s now sequencing that is tracking the spread of variants, including those first found in South Africa and Brazil, that have mutations blunting immunity from vaccines and previous infections.

Without sequencing, we might have seen mysterious new surges or reinfections, only to speculate wildly and retrospectively about why; now scientists can trace these epidemiological trends to letter-by-letter changes in the coronavirus’s genetic code—and do it fast enough to influence policy, potentially slowing the variants before they take over. This systematic sequencing of positive COVID-19 tests is called genomic surveillance.

The rise of genomic surveillance has caught even scientists a bit by surprise. If they had known that variants would become front-page news, they might have come up with better names than B.1.1.7, 20I/501Y.V1, and VOC 202012/01, which are, confusingly, all names for the same U.K. variant. (No wonder the public has continued to simply call it “the U.K. variant,” despite official admonitions to avoid geographic names.) The WHO is just now discussing a new variant-naming scheme. Such a scheme does not exist, because we’d never had to talk about variants in public before—because we’d never identified variants quickly enough to matter for pandemic response before.

The inscrutable variant names are a small, if illustrative, example of the challenges that come with yanking genomic surveillance out of academic labs and thrusting it into public health. Sequencing is easy these days, scientists told me over and over. Analysis is hard. And, with a few notable exceptions, persuading the public to make real sacrifices based on some changing letters in a viral genome is harder still. The pandemic has shown the power of genomic surveillance but also exposed the challenges of wielding it. In the future, when we have learned the lessons of COVID-19, genomic surveillance of diseases might become routine and bureaucratic. That could be the pandemic’s biggest scientific legacy.

But this time, in this pandemic, the successes of genomic surveillance have depended on human intuition and human foresight and an occasional lucky break.

The story of the linked discoveries of the U.K. and South Africa variants is one such example. Last winter, Tulio de Oliveira, a bioinformatician at the University of KwaZulu-Natal in South Africa, began watching a cluster of mysterious pneumonia cases in China. And when that first viral genome hit the internet in January, he began preparing his lab to sequence the coronavirus in South Africa.

De Oliveira was one of a small cadre of scientists worldwide who already had significant experience with this sort of thing. He had previously sequenced the genomes of viruses, such as Zika, dengue, chikungunya, and yellow fever, that circulate in South America. (De Oliveira is originally from Brazil.) Others have done similar work on influenza, Ebola, West Nile, and Lassa fever. But for the most part, these studies tended to be small and retrospective—and they went at the leisurely pace of academia. By the time the results were published, the outbreak was typically long over.

This new coronavirus was an emergency, and de Oliveira wanted to get ahead of it. Within weeks, his team had finished writing the software for assembling genomes of the new virus, flown in collaborators from Brazil, and stocked up on lab chemicals in anticipation of the shutdown of air travel. De Oliveira envisioned a systematic sequencing of COVID-19 samples across South Africa. To get there, he had to secure legal and ethical permission to collect positive COVID-19 samples from the government testing labs. These samples then had to be reprocessed for sequencing. Typical COVID-19 tests speed-read only a few snippets of the coronavirus’s genome; sequencing means reading all 30,000 letters.

By March of last year, when COVID-19 began popping up in South Africa, de Oliveira’s team was ready to go. A key question for genomic surveillance in the early days was simply how COVID-19 had gotten into the country. So scientists used genomes to reconstruct the virus’s path: The coronavirus had been introduced multiple times, mostly from Europe. Over the course of 2020, the pandemic waxed and waned, and de Oliveira and his colleagues kept gathering sequences.

In November, an entirely new pattern appeared. Doctors in South Africa’s Eastern Cape told de Oliveira that cases were spiking again, seemingly out of nowhere. Had the virus changed? Could it have mutated? His team moved quickly to sequence samples from 50 clinics in the region within a week, and they found a surprising lack of diversity. The samples from all 50 clinics were closely related, with almost all the same mutations. They looked like one variant. Because he had data from the previous seven months, de Oliveira knew that this was strange; normally, if he sampled 50 clinics, he might find 30 or 40 different versions of a virus. And because he had data from all over the country, he could see that this variant was now creeping into other regions. All those months of sequencing had paid off, but the news was bad. The virus really had changed. And the new variant was taking over.

In early December, de Oliveira shared some preliminary results with an old colleague, Andrew Rambaut, who specializes in the evolution of new viruses. The two of them had overlapped at Oxford some 15 years ago; Rambaut is now at the University of Edinburgh, in the U.K. De Oliveira flagged one particular mutation, called N501Y, which sits in a key region of the spike protein that binds directly to human cells.

On de Oliveira’s tip, Rambaut began scouring coronavirus sequences in the U.K., which has the biggest and best-funded genomic-surveillance program in the whole world. (De Oliveira had in fact modeled South Africa’s surveillance program on it. “But, of course, we have much less resources,” he said, so South Africa was sequencing only a fraction as many samples as the U.K. was.) Six thousand miles away, N501Y showed up in the U.K. database too—in sequences from Kent, where cases were rising despite an ongoing lockdown.

Within weeks, the U.K. announced the discovery of its own more transmissible variant, which had arisen independently of the South Africa variant. The news reverberated around the world. Many countries, including the U.S., moved to restrict U.K. travel and look for the variant within their own borders.

The U.K. has the largest COVID-19 sequencing program in the world, in no small part because it already had many of the top experts in this field. “We all work with the Brits because they’re so damn good,” Kristian Andersen, a microbiologist at Scripps Research in San Diego, says. The U.K. has single-handedly generated more than one-third of all genomes for this coronavirus, thanks to a consortium of academic and public labs called COVID-19 Genomics UK, or COG-UK.

The idea for COG-UK came to Sharon Peacock, a microbiologist at the University of Cambridge, as the first wave began hitting Europe. On March 4, she emailed five colleagues to talk. “I wanted, really, a sanity check,” she told me. It was not yet clear that the virus would explode into an uncontrolled pandemic. Italy had not yet locked down. Life was still going on as normal. Was an unprecedented national surveillance effort really worth it? Scientists didn’t think that the coronavirus mutated very fast. Sequencing might find nothing more than a pile of meaningless mutations.

But Peacock’s colleagues agreed that national genomic surveillance was worth trying. Sequencing, which yields much more information than a simple positive-or-negative test, could at least give them a detailed look at how the virus spreads.

The consortium didn’t necessarily set out to look for variants. Variants were a theoretical possibility, but no one knew when and where they might arise. So initially, researchers were more interested in how the coronavirus was getting into the country: largely through other European countries such as Italy, Spain, and France, it turns out, rather than directly from China. And they were interested in how the virus was spreading, in ways big and small. One study found a COVID-19 cluster among six dialysis patients who all had appointments on the same days of the week. Their viruses were very similar to one another but distinct from a lineage circulating elsewhere in the hospital, which suggested that these dialysis patients were all infected the same way, either during dialysis or during shared transportation to the hospital. Crucially, by ruling out transmission from elsewhere in the hospital, officials could target the underlying problem rather than issue onerous blanket policies. The hospital closed the dialysis waiting room, spread out the patients more, and enforced universal masking during transportation.

But as case numbers skyrocketed, the question of how the virus was getting into the country became kind of moot. The virus was already everywhere. And more cases meant more sequences—so, so many sequences. “We haven’t used sequencing on this scale ever,” Peacock told me.

No human can possibly go through the thousands of coronavirus genomes generated daily to look for hints of troublesome mutations. Even the computational tools for comparing genomes are buckling under the weight of all the available data. Scientists often map related genomes as branches in an evolutionary tree. But at a certain point, these trees become unmanageable. “Our analysis tools aren’t built for doing this,” says Emma Hodcroft, a molecular epidemiologist at the University of Bern and a co-developer of Nextstrain, an open-source project that visualizes pathogen sequences. The tip that helped lead to the discovery of the U.K. variant came, after all, from de Oliveira, who sequenced a smaller, more manageable number of samples in South Africa, on the suggestion of frontline doctors.

A tight link between doctors, public-health officials, and sequencing labs is crucial for the success of genomic surveillance. Doctors and public-health officials can notice trends on the ground that tip off genomists, who can in turn detect mutations that make the case for new policies. But these worlds aren’t necessarily used to talking to one another. And the trees that genomic-surveillance experts prefer for visualizing related genomes are hard to interpret, because incomplete sampling means whole branches are missing.

In fact, even experts have had trouble interpreting the data. Last February, scientists in Washington State detected the first case of local transmission of COVID-19, which they presumed, from its viral sequence, to be descended from a case of a man who had returned from China six weeks earlier. This suggested that the virus had been silently spreading in Washington for more than a month. It was a major wake-up call. But as additional sequencing has filled in missing branches, a more complete—and different—picture of those early days has emerged. More likely than not, the second case was not descended from the first. A separate introduction of COVID-19 likely seeded Washington’s local transmission chain.

This is still a challenge on smaller scales too, such as when a hospital tries to pinpoint the origins of its outbreak. Scenario A, in which staff are infecting one another on coffee breaks, requires different interventions than Scenario B, in which multiple patients are passing the virus on to staff in a particular ward. Breuer, the virologist at University College London, is leading a COG-UK study to see whether genomic surveillance can indeed help hospitals identify and fix lapses in infection control.

Raw viral sequences or trees are pretty useless for hospitals, though; her team has instead devised automated reports that translate the data into a probability that one case is related to others in the same hospital. These reports have taken a while to get right, Breuer acknowledges, and they can still be better. Meanwhile, hospitals are themselves figuring out how to integrate this new data stream into their policies. It’s all the more difficult when they’re busy just trying to keep up with the crush of the pandemic. “When you get overwhelmed with cases, it becomes very hard for teams to be looking at data,” she says. “The reports become less useful, because people just don’t have time to look at them.”

This is the practical problem that genomic surveillance has repeatedly run into during this pandemic. Scientists can generate as much data as they want, but getting the public to act on that information is a whole different challenge.

When the U.K. and South Africa sounded the alarm about new variants, countries around the world reacted in radically different ways.

On one end of the spectrum, Denmark has taken them very seriously. It’s attempting to sequence nearly every positive sample in the country, thanks to a robust national surveillance program that plugs into its national health-care system. “That’s one really, really big advantage. We have four or five big hospitals, and once you have contacted them, you basically have everything in Denmark,” says Mads Albertsen, a microbiologist at Aalborg University who has reconfigured his lab for COVID-19 sequencing. Since December, Denmark has been watching the proportion of the U.K. variant, B.1.1.7, steadily rise in its sequencing data.

As a result, the country implemented a strict lockdown, even though total COVID-19 cases have fallen since December. Shops stayed closed, as did schools for fifth graders and older. The number of people allowed to gather was cut from 10 to five. “If we didn’t have the numbers on B.1.1.7, we would have opened the society a lot more,” Albertsen says. Denmark only began easing some restrictions on March 1 as part of a slow, calculated reopening. The country still has one of the lowest infection rates in Europe. “I’ve been very glad the government is using the data I generated on variants,” Albertsen told me. “But it’s also a bit scary that they put so much on our shoulders.” Genomics has never before been responsible for the data that govern the opening and closing of entire countries.

The United States has taken a much more lackadaisical approach to the variants. When I asked Andersen, the microbiologist at Scripps Research, if the U.S. is acting on genomic-surveillance information, he answered bluntly, “No, we’re not.” (Andersen, who is from Denmark, is also advising the Danish government.) He pointed out that the same problem has come up over and over again in the pandemic: The U.S. seems unable to plan for what will happen if the variants become dominant; we react only to what’s happening right now. States with growing proportions of B.1.1.7, such as Florida, aren’t placing new restrictions the way Denmark has. And last week Texas and Mississippi rolled back their mask mandates.

Like other parts of the pandemic response, sequencing in the U.S. has been extremely piecemeal. Andersen is working with the public-health department in San Diego, where they are sequencing about 2 percent of all confirmed cases—a tiny fraction compared with the U.K. or Denmark but decent compared with the rest of the U.S. Good coverage in any area of the U.S. is probably due to the sheer will of a particular scientist or lab.

At the University of Michigan, Adam Lauring told me that he’s personally driving around, collecting samples from testing labs to get them to sequencing machines. It’s part of an effort to track COVID-19 on campus, and the university’s discovery of B.1.1.7 among student athletes prompted a two-week suspension of the sports program. At Louisiana State University, Jeremy Kamil has sequenced about 2,000 genomes after getting the university’s vice chancellor on board and cobbling together various sources of funding. But the wealth of data isn’t necessarily translating into public-health policy. For example, says Kamil, if he finds a nursing home with multiple separate introductions of COVID-19, he has no way of telling the facility to step up its infection-control measures. Sequencing just hasn’t been part of the public-health toolkit before. “Public-health officials did not know what we do, and how it would be useful, and how the data could be acted upon,” says Vaughn Cooper, a microbiologist who runs a sequencing facility in Pittsburgh that processes Kamil’s samples. “That’s not a criticism. This is still newish technology.”

Alaska is one state where the health department has made sequencing a priority, but it has had a hard time getting doctors and hospitals to send in samples. “Providers want to know what’s in it for them,” says Jayme Parker, the state’s laboratory manager. And the answer, unfortunately, is not a lot. Parker can’t formally tell the doctors if their patient was infected with one of the variants, because her lab lacks the clinical certification. And even if she could, doctors can’t do much with the information; it doesn’t change treatment for COVID-19. (The one exception is that monoclonal-antibody therapy may be less effective against the Brazil and South Africa variants, though those are still rare in the U.S., and antibody-therapy use is also low.) When the department does get samples, they come on different types of swabs in different types of media in different containers. A lot of handling is necessary to get them ready for batch sequencing. “At the beginning, I remember getting urine cups with saline in them with a swab,” she says. “That got very unruly.” In Denmark, by contrast, Albertsen told me that the testing labs consolidate all the positive samples into tidy plates with 96 small wells, pretty much ready to go.

Genomic surveillance, Parker points out, is useful on a population level, but it requires persuading people to pitch in on an individual level. With the U.S.’s fragmentary health-care and public-health systems, it means connecting countless individual dots: physically getting the samples to the right lab, sorting out all the legal and ethical paperwork, and then interpreting the results for the public. The longer it takes to connect those dots, the more time it takes to get a sample through the sequencing pipeline. If officials aren’t seeing the data on B.1.1.7 prevalence until, say, a month after a patient first caught the variant, that limits how quickly they can tailor a response. If they sequence within days, however, then they can theoretically prioritize tracing a B.1.1.7 patient’s contacts to slow the variant down. But the laggier genomic surveillance is, the less useful it is for public health. At some point, it just becomes academic again.

Moreover, sequencing data are only useful when connected to epidemiological data. To figure out if variants are spreading or causing more severe illness, you also need to know when people got sick, where they are, how sick they got, and how these patterns change over time. In recent weeks in the U.S., scientists have detected several variants with possibly worrisome mutations in California, New York, and Louisiana, but they lack the context to assess whether they are real threats. The U.K. and South Africa were able to suss out the importance of their variants so quickly because of their systematic surveillance programs.

Recently, the CDC has begun to step up its national genomic-surveillance effort. It has added contracts with national testing companies such as Helix. And last month, the Biden administration announced $200 million as a down payment for the CDC’s sequencing efforts. Andersen said he hopes the U.S. gets to sequencing 2 percent of all positives nationally, with a network of sentinel sites that can sequence at a higher volume in certain locations.

Building this infrastructure for the first time is hard; building it on the fly, even more so. When the next pandemic virus strikes, though, genomic surveillance could be ready to go—so ready that the next virus might quickly overtake COVID-19 as the most sequenced virus of all time. Or even better, the virus will be contained, and we’ll never have enough cases to sequence 700,000 genomes in a year again.