Ten years ago, a team of scientists published the first genome of Aedes aegypti, the infamous mosquito that spreads Zika, dengue fever, and yellow fever. It was a valiant effort, but also a complete mess. Rather than being tidily bundled into the insect’s three pairs of chromosomes, its DNA was scattered among 36,000 small fragments, many of which were riddled with gaps and errors. But last week, researchers led by Erez Lieberman Aiden at the Baylor College of Medicine announced that they had finally knitted those pieces into a coherent whole, a victory that should help scientists who study Aedes and the diseases it carries.
This milestone is about more than mosquitoes. The team succeeded by using a technique called Hi-C, which allows scientists to assemble an organism’s genome quickly, cheaply, and accurately. To prove that point, the team used Hi-C to piece together a human genome from scratch for just $10,000; by contrast, the original Human Genome Project took $4 billion to accomplish the same feat. “It’s very clear that this is the way that you want to be doing it,” says Olga Dudchenko, who was part of Aiden’s team. “At least in the foreseeable future, there’s no method that can compete,” adds her colleague Sanjit Singh Batra.
This technique should make it easier to map the genome of any species—especially those that have never been sequenced before. “I am very excited about its potential,” says Catherine Peichel from the University of Bern, who has used it to sequence the three-spine stickleback—a small fish. “I have been telling everyone I know to use it.”
The word “genome” has become so commonplace that it’s easy to forget how difficult it can be to sequence one—even now. When geneticists decipher an organism’s DNA, they do so in fits and starts, rather than in one continuous burst from start to finish. The result is a lot of short pieces, or “reads,” which must then be assembled. Sometimes, that’s easy: If two reads have a lot of overlap, they probably fit next to each other. But it’s much harder when genomes include long repetitive stretches. Assembling these is like solving a jigsaw puzzle filled with blue sky; it’s a royal pain to work out where each piece fits in. That’s why the Aedes genome was so fragmented. It is full of repetitive sections. And that’s where Hi-C comes in.
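The overlap rule described above can be sketched in a few lines of Python. This is a toy illustration only, not the software any of the teams in this story used: the function names, the `min_overlap` threshold, and the three made-up reads are all assumptions chosen for the example. It repeatedly glues together whichever two pieces share the longest overlap, and gives up when nothing overlaps enough.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the two pieces that share the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps enough; the remaining pieces stay separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical reads, invented for this example.
example_reads = ["ATGGCGT", "GCGTACC", "TACCTTA"]
assembled = greedy_assemble(example_reads)
unrelated = greedy_assemble(["AAAA", "CCCC"])
```

With these three reads, each piece overlaps the next by four letters, so the sketch stitches them into a single sequence. The two unrelated reads share nothing and are left as two separate fragments, which is roughly what happens on a much larger scale when repetitive DNA makes the overlaps ambiguous.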
Aiden and his colleague Job Dekker created the technique in 2009 for a completely different purpose—to study the shape of the human genome. Each of our cells contains around two meters of DNA, which somehow packs into a compartment just six millionths of a meter wide. To fit, the long one-dimensional DNA strands fold into a tight three-dimensional ball. Aiden and Dekker developed Hi-C to study these folds: It freezes the entire genome in place, and reveals which bits of DNA are touching each other in three-dimensional space.
As it happens, this information also reveals how far apart two bits of DNA are likely to be in the one-dimensional string—which is really useful for assembling genomes. Think about that jigsaw puzzle. If you have two identical pieces of blue sky, you may not know where they go, but Hi-C can tell you that they have 15 pieces between them. Gather enough of that information, and you can put the whole sky together.
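To make the jigsaw idea concrete, here is a minimal sketch in Python, assuming only the principle described above: pieces that touch often are probably close together along the string. It is not the method Aiden's team or anyone else in this story implemented. The function `order_by_contacts`, the piece names, and the contact counts are all hypothetical, and real Hi-C data is far noisier than this.

```python
def order_by_contacts(contacts):
    """Line up pieces so that frequently touching pieces end up adjacent.

    `contacts` maps a frozenset of two piece names to a contact count.
    """
    pieces = set()
    for pair in contacts:
        pieces |= pair

    def count(a, b):
        return contacts.get(frozenset((a, b)), 0)

    # Seed the chain with the most strongly linked pair.
    first, second = sorted(max(contacts, key=contacts.get))
    chain = [first, second]
    unplaced = pieces - {first, second}

    while unplaced:
        # Choose the unplaced piece with the strongest link to either end
        # (sorted() keeps the choice deterministic if counts ever tie).
        best = max(
            sorted(unplaced),
            key=lambda p: max(count(chain[0], p), count(chain[-1], p)),
        )
        if count(chain[0], best) > count(chain[-1], best):
            chain.insert(0, best)
        else:
            chain.append(best)
        unplaced.remove(best)
    return chain


# Invented counts for four pieces whose true order is A, B, C, D:
# neighbours touch most often, distant pieces touch rarely.
toy_contacts = {
    frozenset(("A", "B")): 90,
    frozenset(("B", "C")): 85,
    frozenset(("C", "D")): 80,
    frozenset(("A", "C")): 30,
    frozenset(("B", "D")): 28,
    frozenset(("A", "D")): 10,
}
order = order_by_contacts(toy_contacts)
```

On this toy input the sketch recovers the order A, B, C, D. A greedy chain like this is the simplest possible design and would be easily misled by noisy counts, so it should be read as an illustration of why contact frequencies help, not as a working assembler.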
Two teams, one led by Dekker and the other by the University of Washington’s Jay Shendure, accomplished this in 2013, using Hi-C to assemble human, fly, and mouse genomes. A year later, a Chinese team did the same for a plant—the mustard cress. And in 2016, an American group used Hi-C to assemble the genome of a goat named Papadum. Goats had been sequenced before, but as with the Aedes mosquito, their genomes were messy, fragmented, and incomplete drafts. By contrast, Papadum’s genome was a polished work of art, with the fewest gaps of any mammalian genome to date. Adam Phillippy, one of the project leaders, jokingly called it the Greatest Of All Time (GOAT).
So Dudchenko and Batra are hardly the first to use Hi-C to assemble genomes, but they made two improvements. They developed software that can work off much shorter reads, so they can effectively solve jigsaw puzzles with 10,000 tiny pieces rather than 1,000 large ones. They also developed a better way of correcting errors along the way. “Their work significantly improves over the older projects,” says Michael Schatz from Cold Spring Harbor Laboratory, who worked on the original draft of the Aedes genome. “The assemblies they produce are an enormous improvement over the version I helped to publish in 2007.”
Being able to assemble genomes using short reads is useful because such reads are cheap. With Dudchenko and Batra’s methods, scientists can assemble many genomes on tight budgets, which is good news for several projects that are aiming to sequence 10,000 vertebrates, 10,000 birds, and 5,000 arthropods.
But Erich Jarvis from Rockefeller University, who is leading one of these dauntingly ambitious projects, says the future lies in pairing Hi-C with new sequencing technologies that can read longer stretches of DNA. That would provide higher-quality pieces for Hi-C to then stitch together. That’s what mosquito researcher Leslie Vosshall and her colleagues are trying to do for Aedes. She notes that Dudchenko started with the decade-old reads from the 2007 project, and that Vosshall’s own team is now working to re-sequence those before “applying the Hi-C magic,” she says. “We won’t be done with this chapter in the history of mosquito biology until that is fixed.”
“It has taken a while for this approach to catch its sails, because most groups did not have much experience with Hi-C, but that’s changed in the last year, as the technique has become easier to implement,” says Shendure. Indeed, at least two companies now offer it as a service. “I think this is really going to be the way genome assembly is done going forward. It’s a nice example of a technology proving useful for goals that are different than those for which it was originally developed.”