Zhang: So what are you going to study now with the combination of DNA and genealogy data?
Erlich: Even better, we also have phenotypes now. Since October, we have allowed users to fill out surveys about themselves. So we have the genealogy, the surveys, and the DNA. Our surveys are modeled after the U.K. Biobank surveys. We're asking, did you have a heart attack? Are your parents suffering from Alzheimer's?
About a year ago, Joe Pickrell and I had a paper in Nature Genetics on a genome-wide association study by proxy. Say we want to look at genes related to Alzheimer's in our data set. If I ask our users whether they have Alzheimer's, they are healthy people; otherwise they wouldn't be buying the test. So for certain diseases, it's quite hard to get the information. What we showed is that you can ask users about their first-degree relatives [parents, siblings, and children]. Because you share half of your genome with each of them, you lose half of the signal, but so many people can answer the question that you get back the power needed to implicate genetic variants.
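To make the trade-off concrete, here is a minimal Python sketch under a deliberately simplified model, which is an assumption and not the method from the paper: the association test's signal grows as the square root of the sample size times the effect size, and a case reported through a first-degree relative carries half the effect. Under that model, halving the effect means roughly four times as many respondents are needed to recover the same power. The helper name `approx_power`, the effect sizes, and the sample sizes are hypothetical illustrations; 5e-8 is the conventional genome-wide significance threshold.

```python
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def approx_power(n: int, effect: float, alpha: float = GENOME_WIDE_ALPHA) -> float:
    """Approximate power of a two-sided association test (hypothetical helper).

    Simplified model (an assumption): the test statistic is normal with
    mean sqrt(n) * effect, where effect is in standardized units.
    The negligible opposite tail is ignored.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = (n ** 0.5) * effect
    return 1 - std_normal.cdf(z_crit - shift)


direct_effect = 0.06              # illustrative effect when users report their own disease
proxy_effect = direct_effect / 2  # a relative shares half the genome -> half the signal

direct = approx_power(10_000, direct_effect)
proxy_same_n = approx_power(10_000, proxy_effect)
proxy_4x_n = approx_power(40_000, proxy_effect)  # 4x respondents offsets the halved effect

print(f"direct, n=10,000: {direct:.3f}")
print(f"proxy,  n=10,000: {proxy_same_n:.3f}")
print(f"proxy,  n=40,000: {proxy_4x_n:.3f}")
```

Under these assumptions the proxy cohort that is four times larger matches the power of direct reports, which is the sense in which many respondents "get back" the lost signal. Real by-proxy analyses face further complications, such as the accuracy of family-history reports, that this sketch ignores.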
Zhang: Let’s talk about privacy. Senator Chuck Schumer recently held a press conference calling for more scrutiny of DNA tests. You have a history of thinking about DNA and privacy, so how has that informed MyHeritage’s practices?
Erlich: It's part of the challenge of this new era. At MyHeritage, we take it very seriously. We allow people to delete profiles. There are settings: you can make your profile private or public. People can delete their DNA data, and we'll even go to the lab and wash away the tube. So we take these things very seriously.
But if you asked me whether I'd rather share my genealogy, my cellphone records, or my search-engine records, I would share my genealogy.
Zhang: In fact, you’ve put your own genome online.
Erlich: Because I feel like I don't have a lot to risk in general. If you compare your search-engine data, the data your ISP sees, or your bank account with your genome, your genome is actually not very interesting, I think.
Zhang: In 2013, you actually published a paper finding that it’s possible to identify some DNA donors from publicly available information. I think this study still gets talked about a lot. Do you think it changed anything?
Erlich: I think it changed the way policy makers think about how we communicate risks to participants. Previously, the prevailing thought was that we just promised them everything would be okay. Now we promise 100 percent effort, but we are also learning.
I think the other interesting thing is that in 2013, people didn't understand why I did this study. I got many questions: "Why even do something like that?" Now that we've matured into this data-intensive world, it has become very clear that this is the right research to do.
Zhang: That study was actually inspired by a mother-and-son pair who tracked down the son’s anonymous sperm donor using consumer genealogy databases, right?
Erlich: Yeah, the mother had worked at Cold Spring Harbor [Laboratory in New York] 20 years before, and she contacted Cold Spring Harbor. I did my Ph.D. at Cold Spring Harbor, so I met her. I was like, "Wow, that's crazy." It was really mind-blowing.