This 'Genome Hacker' Is Building Family Trees With Millions of Branches

Thanks to computer-aided genealogical analysis, your family may have 43,000,000 members. 


There may be a new record for the largest family tree ever assembled. The thing dates back to the 15th century. It is made up of 13 million individuals. And it is only one part of an even larger collection of genealogical information: a collection compiled by the computational biologist Yaniv Erlich and stored not in albums or on walls, but in machines. Presented at the annual meeting of the American Society of Human Genetics in Boston, and discussed in the journal Nature, the mega-repository could offer a new way for researchers to analyze the relationships between human genotypes and phenotypes: between, essentially, nature and nurture.

In the past, such expansively branched informational trees would have been painstaking to cultivate. We have documentation, sure, of family relationships and the traits associated with them—church records, hospital logs, that kind of thing—but gathering those documents for analysis took time. Assembling genealogical data for even just a few thousand individuals, Erlich noted during his ASHG presentation, could take years.

So here's where the hacking comes in. Erlich and his team, rather than gathering those data themselves, went to a more streamlined source: a genealogy website with 43 million public profiles. Those profiles offered a wealth of information, typically including not just individuals' birth and death dates, but also the places where they were born and died. Occasionally, they even contained photos uploaded by the site's users.

What resulted, in turn, was an extensive collection of trait-and-gene information, ripe for analysis. And it was from that collection that Erlich and his colleagues were able to compile what Nature calls "a single uber-pedigree" involving some 13 million individuals. "We Are Family," as performed by a huge swath of humanity. (But performed anonymously: In making that and similar pedigrees available to other researchers, Erlich and his team stripped names from the data to protect individuals' privacy.)

So what does a database like that, the family tree digitized, get us? For one thing, it allows for a kind of longitudinal analysis of given traits, helping researchers gain insight into the nature-vs.-nurture aspects of those traits as they played out over time. It can also shed light on how traits are, ultimately, controlled. Given a trait like fertility, say, are there a few genes that exert broad influence, or is fertility influenced by many genes that each have smaller effects? It might also help us understand inherited diseases. (See, for example, the Iceland-based genetics firm deCODE, which is taking advantage of the country's famously rich genealogical data to help identify genetic signatures that can influence diseases and their treatment.)

For all that, Nature notes, it's unclear how, exactly, researchers will use the database for their own purposes. ("Some scientists at the meeting expressed enthusiasm for the project," as Nature's Heidi Ledford puts it, "but were hard-pressed to come up with a specific experiment using the data.") Put another way, though, the biggest uses for the results of Erlich's genome-hacking may simply be yet to come. And those uses would rely on developments that are cultural as much as scientific: on medical records being stored and analyzed in digital, and potentially public, forms. Imagine Erlich's database being linked to individual medical information. Imagine it being linked to DNA sequence data. As Nancy Cox, a human geneticist at the University of Chicago, tells Ledford: "We’ve really only begun to scratch the surface of what these kinds of pedigrees can tell us."

Via Nature