This technique should make it easier to map the genome of any species, especially those that have never been sequenced before. “I am very excited about its potential,” says Catherine Peichel from the University of Bern, who has used it to assemble the genome of the three-spine stickleback, a small fish. “I have been telling everyone I know to use it.”
The word “genome” has become so commonplace that it’s easy to forget how difficult it can be to sequence one—even now. When geneticists decipher an organism’s DNA, they do so in fits and starts, rather than in one continuous burst from start to finish. The result is a lot of short pieces, or “reads,” which must then be assembled. Sometimes, that’s easy: If two reads have a lot of overlap, they probably fit next to each other. But it’s much harder when genomes include long repetitive stretches. Assembling these is like solving a jigsaw puzzle filled with blue sky; it’s a royal pain to work out where each piece fits in. That’s why the Aedes genome was so fragmented. It is full of repetitive sections. And that’s where Hi-C comes in.
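For readers who want to see the overlap idea in action, here is a minimal Python sketch of greedy assembly. It repeatedly glues together the two reads that share the longest end-to-start overlap. The function names `overlap` and `greedy_assemble`, and the three-letter minimum overlap, are hypothetical choices for illustration. Real assemblers rely on far more elaborate graph-based methods and must cope with sequencing errors, which this toy ignores.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: keep merging the pair of reads with the biggest overlap."""
    reads = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        # Compare every ordered pair of distinct reads.
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            # Nothing overlaps any more: the remaining pieces are the contigs.
            return reads
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


reads = ["ATGGCG", "GCGTAC", "TACCTA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTA']
```

Feed it reads cut from a sequence that contains a repeat, and several pairs tie for the longest overlap; the greedy rule then simply picks one, which is exactly how repetitive stretches lead assemblies astray.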
Aiden and his colleague Job Dekker created the technique in 2009 for a completely different purpose—to study the shape of the human genome. Each of our cells contains around two meters of DNA, which somehow packs into a compartment just six millionths of a meter wide. To fit, the long one-dimensional DNA strands fold into a tight three-dimensional ball. Aiden and Dekker developed Hi-C to study these folds: It freezes the entire genome in place, and reveals which bits of DNA are touching each other in three-dimensional space.
As it happens, this information also reveals how far apart two bits of DNA are likely to be in the one-dimensional string, because stretches that sit close together along the string touch each other far more often than distant ones. That is really useful for assembling genomes. Think about that jigsaw puzzle. If you have two identical pieces of blue sky, you may not know where they go, but Hi-C can tell you that roughly 15 pieces separate them. Gather enough of that information, and you can put the whole sky together.
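The puzzle analogy can be made concrete with a toy model. The Python sketch below assumes, as a simplification, that the number of Hi-C contacts between two contigs (assembled stretches of sequence) falls off as one over their distance along the chromosome, loosely mirroring the decay seen in real Hi-C data. It then tries every possible order and keeps the one that places frequently touching contigs closest together. The names `simulate_contacts`, `layout_cost`, and `order_contigs` are hypothetical, and this brute-force search is not the method used by any of the teams described here; it only works for a handful of contigs, since the number of possible orders grows factorially.

```python
from itertools import permutations


def simulate_contacts(true_order, scale=120):
    """Toy Hi-C counts (assumption): contigs d steps apart touch scale // d times."""
    contacts = {name: {} for name in true_order}
    for i, a in enumerate(true_order):
        for j, b in enumerate(true_order):
            if i != j:
                contacts[a][b] = scale // abs(i - j)
    return contacts


def layout_cost(order, contacts):
    """Penalty for an order: each pair's contact count times how far apart it is placed."""
    cost = 0
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            cost += contacts[order[i]][order[j]] * (j - i)
    return cost


def order_contigs(contacts):
    """Brute-force search for the cheapest order; only feasible for a few contigs."""
    names = sorted(contacts)
    best = min(permutations(names), key=lambda order: layout_cost(order, contacts))
    return list(best)


# Hypothetical example: four contigs whose names hide their true order.
true_order = ["c2", "c0", "c3", "c1"]
contacts = simulate_contacts(true_order)
print(order_contigs(contacts))  # ['c1', 'c3', 'c0', 'c2']: the true order, read backwards
```

Because contact counts say nothing about direction along the chromosome, an order and its mirror image score identically, so the sketch can recover the layout only up to reversal.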
Two teams, one led by Dekker and the other by the University of Washington’s Jay Shendure, accomplished this in 2013, using Hi-C to assemble human, fly, and mouse genomes. A year later, a Chinese team did the same for a plant—the mustard cress. And in 2016, an American group used Hi-C to assemble the genome of a goat named Papadum. Goats had been sequenced before, but as with the Aedes mosquito, their genomes were messy, fragmented, and incomplete drafts. By contrast, Papadum’s genome was a polished work of art, with the fewest gaps of any mammalian genome to date. Adam Phillippy, one of the project leaders, jokingly called it the Greatest Of All Time (GOAT).
So Dudchenko and Batra are hardly the first to use Hi-C to assemble genomes, but they made two improvements. They developed software that can work off much shorter reads, so they can effectively solve jigsaw puzzles with 10,000 tiny pieces rather than 1,000 large ones. They also developed a better way of correcting errors along the way. “Their work significantly improves over the older projects,” says Michael Schatz from Cold Spring Harbor Laboratory, who worked on the original draft of the Aedes genome. “The assemblies they produce are an enormous improvement over the version I helped to publish in 2007.”