Date: 2018-03-20

Fifteen years ago this April, scientists announced that the human genome sequence was complete. I regret to inform you that this is not true. If you have been misled, it is because many scientists themselves have long ignored the last unassembled regions of human DNA, which consist mostly of short, repeating sequences that do not look like genes. “These huge gaps still remain,” says Karen Miga, a genomics researcher at the University of California, Santa Cruz. That’s because it has been impossible to sequence and assemble those repeating stretches of DNA, until now. In a major milestone, Miga and her colleagues reveal the complete 300,000-letter sequence of one of those odd, poorly understood regions: the centromere of the Y chromosome.

It’s astonishing that a centromere sequence has never been assembled before, given how fundamental centromeres are. Chromosomes are tightly packed structures of DNA, and the centromere is a specialized region on each one. When a cell divides, thread-like proteins attach to the centromere to pull chromosomes apart. Without functioning centromeres, cells can end up with too few or too many chromosomes, as in Down syndrome. Malfunctioning centromeres have also been linked to diseases such as cancer.

“Here’s this region on every chromosome that is absolutely essential,” says Beth Sullivan, a molecular biologist at Duke who was not involved in the study. “You’d think we’d know a lot about the centromere.” Yet centromeres have been tough to crack. They contain similar or even identical sequences that are perhaps 170 letters long and repeated hundreds or thousands of times. Traditional sequencing machines chop up a strand of DNA into short pieces that are “read” and then assembled like a puzzle. “The problem with centromeres is all the pieces look the same. It’ll be like putting together a puzzle of the Sahara Desert,” says Sullivan. Biologists studying genes have the benefit of reams of gene-sequence information, but those studying centromeres have essentially been stuck in the pre-sequencing days of the 1990s.

Enter nanopore sequencing, a new technology that can read longer stretches of DNA. Miga and her colleagues decided to tackle centromeres with it. Nanopore sequencing still cannot span the hundreds of thousands of letters of the Y chromosome’s centromere in one go. But it yields fewer, bigger puzzle pieces, so the sequence is much easier to assemble.

The Y chromosome centromere Miga and her colleagues sequenced and assembled came from an anonymous man in Buffalo, New York, whose DNA was also used for most of the Human Genome Project. The sequence didn’t contain too many surprises. That’s good news, because it means nanopore sequencing, still a relatively new technique, isn’t coughing up errors. And it opens the door to more centromere sequencing. “To me, this is just the bedrock of future analysis,” says Miga.