Let’s say you have a patient with a severe inherited muscle disorder, the kind that Daniel MacArthur from the Broad Institute of Harvard and MIT specializes in. They’re probably a child, with debilitating symptoms and perhaps no diagnosis. To discover the gene(s) that underlie the kid’s condition, you sequence their genome, or perhaps just their exome: the 1 percent of their DNA that codes for proteins. The results come back, and you see tens of thousands of variants—sites where, say, the usual A has been replaced by a T, or the typical C is instead a G.
You’d then want to know if those variants have ever been associated with diseases, and how common they are in the general population. (The latter is especially important because most variants are so common that they can’t plausibly be the culprits behind rare genetic diseases.) “To make sense of a single patient’s genome, you need to put it in the context of many people’s genomes,” says MacArthur. In an ideal world, you would compare all of a patient’s variants against “every individual who has ever been sequenced in the history of sequencing.”
This is not that world, at least not yet. When MacArthur launched his lab in 2012, he started by sequencing the exomes of some 300 patients with rare muscle diseases. But he quickly realized that he had nothing decent to compare them against. It has never been easier, cheaper, or quicker to sequence a person’s genome, but interpreting those sequences is tricky, absent a comprehensive reference library of human genetic variation. No such library existed, or at least nothing big or diverse enough. So, MacArthur started making one.