“The vision for providing the data to any bona fide researcher without preferential access was really a game changer,” says Manny Rivas, a biostatistician at Stanford University. Rivas, who is an assistant professor, notes that it is a real boon for junior faculty, who haven’t had years to amass their own data. The availability of a data set as rich and deep as U.K. Biobank democratizes genetics research.
On top of this shared data set, several research groups have now built freely available tools to help other scientists make use of U.K. Biobank’s data. Marchini’s group made a web browser dedicated to parsing genetic and brain data from U.K. Biobank. Albert Tenesa, from the University of Edinburgh, created GeneATLAS, which accounts for family members in the database, whose presence usually screws up the math used to find links between genetic variants and disease. Rivas made the Global Biobank Engine, which is essentially a search engine for genes potentially associated with any disease. The Global Biobank Engine, in turn, is partly based on calculations done by Ben Neale, a geneticist at the Broad Institute, who looked at nearly 2,500 traits and disorders and how they corresponded with genetic variants in the U.K. Biobank.
(Unlike U.K. Biobank’s full data, these tools are accessible to anyone with an internet connection, but they show only aggregate data, so study participants should not be individually identifiable.)
In the past, looking at how a single trait corresponded with a set of genetic variants could be a paper in itself. This kind of analysis is called a genome-wide association study, or GWAS. Neale’s group did 2,500 GWASs in a single day, and he didn’t even bother to write a paper. It’s a blog post on his website. Neale says it didn’t quite feel like a discrete journal article. It’s more a starting point for scientists interested in specific genes or traits. He’s since heard from both pharmaceutical companies and academic researchers using his GWAS data.
Tenesa, who uploaded a preprint describing GeneATLAS on bioRxiv in August, says he has also heard from a couple dozen researchers using the tool. Some have asked him to run calculations for specific traits. This is happening as he’s still working to publish a paper about GeneATLAS in an official journal. It’s the way things are now. “When I get my email from Nature Genetics these days, and they tell you what papers have just been published, I’ve often seen the papers nine months earlier on bioRxiv,” says Marchini.
But is there such a thing as too fast? Jeffrey Barrett, a geneticist at the Sanger Institute, has cautioned against hastily posting preprints based on a quick GWAS. “I understand why,” he says. “It’s the quickest way to get out the stamp that you’ve done this analysis first.” But it’s easy to miss possible artifacts or mistakes in a data set as big and complex as this one. And now that it’s easy to identify genetic variants linked to a disorder, says Barrett, simply enumerating the variants doesn’t add much value. U.K. Biobank has made genetics research easier, but it is also raising the bar.