“Here’s this region on every chromosome that is absolutely essential,” says Beth Sullivan, a molecular biologist at Duke who was not involved in the study. “You’d think we’d know a lot about the centromere.”
Yet centromeres have been tough to crack. They contain similar or even identical sequences that are perhaps 170 letters long and repeated hundreds or thousands of times. Traditional sequencing machines chop up a strand of DNA into short pieces that are “read” and then assembled like a puzzle. “The problem with centromeres is all the pieces look the same. It’ll be like putting together a puzzle of the Sahara Desert,” says Sullivan. Biologists studying genes have the benefit of reams of gene-sequence information, but those studying centromeres have essentially been stuck in the pre-sequencing days of the 1990s.
In comes nanopore sequencing, a new technology that can read longer stretches of DNA. Miga and her colleagues decided to tackle centromeres with it. Nanopore sequencing still cannot span the hundreds of thousands of letters of the Y chromosome’s centromere in one go. But it yields fewer, bigger puzzle pieces, which makes the sequence much easier to assemble.
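The puzzle-piece problem can be made concrete with a toy model. The sketch below is an illustration under stated assumptions, not the method Miga's team used: it builds a made-up "centromere" from 50 copies of a random 170-letter unit, plants two single-letter variants in it, and then asks what fraction of reads of a given length match more than one position. The function names, the copy count, the variant positions, and the read lengths (100 and 5,000 letters) are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

MONOMER_LEN = 170  # repeat-unit length mentioned in the article
COPIES = 50        # toy value; real arrays hold hundreds or thousands of copies


def substitute(seq, pos):
    """Return seq with the letter at pos swapped for a different letter."""
    alphabet = "ACGT"
    new_letter = alphabet[(alphabet.index(seq[pos]) + 1) % 4]
    return seq[:pos] + new_letter + seq[pos + 1:]


def toy_centromere(seed=0):
    """Build a tandem repeat array with two single-letter variants (hypothetical data)."""
    rng = random.Random(seed)
    monomer = "".join(rng.choice("ACGT") for _ in range(MONOMER_LEN))
    array = monomer * COPIES
    # Plant one variant in copy 10 and one in copy 35, at different offsets.
    array = substitute(array, 10 * MONOMER_LEN + 40)
    array = substitute(array, 35 * MONOMER_LEN + 120)
    return array


def ambiguous_fraction(sequence, read_len):
    """Fraction of all possible error-free reads that match more than one position."""
    reads = [sequence[i:i + read_len]
             for i in range(len(sequence) - read_len + 1)]
    counts = Counter(reads)
    return sum(1 for read in reads if counts[read] > 1) / len(reads)


centromere = toy_centromere()
print("short reads (100 letters):", ambiguous_fraction(centromere, 100))
print("long reads (5,000 letters):", ambiguous_fraction(centromere, 5000))
```

In this toy, nearly every short read fits in many places, because only the few reads that happen to cover a variant are distinctive. Every 5,000-letter read covers at least one variant, so each fits in exactly one place. Real reads contain errors and real centromeres are far longer, so this only gestures at why bigger pieces help.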
The Y chromosome centromere Miga and her colleagues sequenced and assembled came from an anonymous man in Buffalo, New York, whose DNA was also used for most of the Human Genome Project. The sequence didn’t contain too many surprises. That’s good news, because it means nanopore sequencing, still a relatively new technique, isn’t coughing up errors. And it opens the door to more centromere sequencing. “To me, this is just the bedrock of future analysis,” says Miga.
Sequencing one centromere is a technical curiosity, but sequencing many centromeres is where the really interesting results will come from. For example, the Y chromosome has long been used to study past human migrations and map genetic variation. Centromeres add another layer to the data because they vary so much. Not only do the letters of the underlying repeated sequences change, but the length of the centromere on a given chromosome can vary as much as 20-fold from person to person. “If you want to look at a human variation, I think this is the place to look,” says Steve Henikoff, who studies centromeres at the Fred Hutchinson Cancer Research Center. He called the new study a “landmark” in the study of centromeres.
Scientists will want to look at the centromeres of other chromosomes, too. Miga started with the Y chromosome simply because it was the easiest. Its centromere is only hundreds of thousands of letters long, whereas the centromere on chromosome 17, which Sullivan studies, is 4 million letters long. Defects in the chromosome 17 centromere have been linked to diseases, most notably breast cancer. If scientists could fully sequence this long centromere, they could understand how subtle changes (like minor typos in the sequence or the order of repeats) affect centromere function.