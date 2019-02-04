Yet, the search took 256 computers working together for an entire weekend, says Zamin Iqbal, a computational genomicist at the European Bioinformatics Institute, who collaborates with Public Health England. The researchers there did find colistin resistance among their 24,000 samples, and eventually, countries all over the world found it, too.

Why did this process take so long? The computers at Public Health England had to open up and search the sequencing files of 24,000 genomes one by one. If Google had to search every page on the internet for the word “pie” every time you searched for “pie,” that search would also take forever. Instead, Google is constantly indexing pages. If a blog post is written about “pie,” Google files that post under the “pie” entry in its index. So when someone comes along looking for pie recipes, it just has to serve up the pages under the “pie” entry. That’s part of the reason why a Google search takes less than a second.
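To make the contrast concrete, here is a minimal Python sketch of the two approaches: scanning every genome for a query, versus building an index once and looking the query up. It is an illustration of the general indexing idea only, not the code used by Public Health England or Google. The sample names, the toy sequences, and the choice to index short overlapping chunks of DNA (often called k-mers) of length 4 are all assumptions made for the example.

```python
from collections import defaultdict

def kmers(sequence, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]

def scan_search(genomes, query):
    """Slow path: open every genome and look for the query inside it."""
    return sorted(name for name, seq in genomes.items() if query in seq)

def build_index(genomes, k):
    """Done once, up front: file each genome under every k-mer it contains."""
    index = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index

def index_search(index, query, k):
    """Fast path: keep only genomes filed under every k-mer of the query.

    This is a candidate filter. A genome could hold all the k-mers in
    different places, so a real tool would confirm the hits afterwards.
    """
    hits = None
    for kmer in kmers(query, k):
        names = index.get(kmer, set())
        hits = set(names) if hits is None else hits & names
        if not hits:
            break
    return sorted(hits or [])

# Hypothetical toy data, invented for this example.
genomes = {
    "sample_a": "ATGGCGTACGTTAGC",
    "sample_b": "TTGACCGTACGTTAA",
    "sample_c": "GGGCCCAAATTTGGG",
}
query = "CGTACGTT"

index = build_index(genomes, k=4)
slow_hits = scan_search(genomes, query)
fast_hits = index_search(index, query, k=4)
```

The scan touches every genome on every search, so its cost grows with the size of the collection. The index pays that cost once, and each later search only reads the handful of entries the query points to.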

So Iqbal decided to build a Google of sorts for bacterial and viral genomes. He and his colleagues downloaded all available genomes (nearly 500,000 at the time) from a public database called the European Nucleotide Archive. The 170,000-gigabyte dataset took six whole weeks to download. Then, the team indexed the data. The resulting tool is called BIGSI, for BItsliced Genomic Signature Index.
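The article does not describe how BIGSI works internally, so the following is only a rough Python sketch of what a "bitsliced signature" index can look like, under stated assumptions. Each genome gets a fixed-length Bloom filter, a compact fingerprint of the short DNA chunks (k-mers) it contains. The fingerprints are stored side by side as columns, and a query reads only the few rows its own k-mers hash to, then ANDs them together. The filter size, the number of hashes, the k-mer length, the use of SHA-256, and the sample data are all hypothetical values chosen for illustration.

```python
import hashlib

NUM_ROWS = 1024   # Bloom filter length per genome (toy value, an assumption)
NUM_HASHES = 3    # hash functions per k-mer (an assumption)
K = 4             # k-mer length (an assumption)

def kmers(sequence, k=K):
    """Yield every overlapping substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]

def row_positions(kmer):
    """Map a k-mer to NUM_HASHES row numbers using salted SHA-256."""
    rows = []
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
        rows.append(int.from_bytes(digest[:8], "big") % NUM_ROWS)
    return rows

def build_bitsliced_index(genomes):
    """Build one integer per row; bit c of a row belongs to genome c."""
    names = sorted(genomes)
    rows = [0] * NUM_ROWS
    for column, name in enumerate(names):
        for kmer in kmers(genomes[name]):
            for r in row_positions(kmer):
                rows[r] |= 1 << column
    return names, rows

def search(names, rows, sequence):
    """Return genomes whose filters contain every k-mer of the sequence.

    Bloom filters never miss a genome that truly contains the k-mers,
    but they can occasionally report one that does not.
    """
    query_kmers = list(kmers(sequence))
    if not query_kmers:
        return []
    hits = (1 << len(names)) - 1  # start with every genome as a candidate
    for kmer in query_kmers:
        for r in row_positions(kmer):
            hits &= rows[r]
    return [name for column, name in enumerate(names) if (hits >> column) & 1]

# Hypothetical toy data, invented for this example.
genomes = {
    "sample_a": "ATGGCGTACGTTAGC",
    "sample_b": "TTGACCGTACGTTAA",
    "sample_c": "GGGCCCAAATTTGGG",
}
names, rows = build_bitsliced_index(genomes)
candidates = search(names, rows, "CGTACGTT")
```

The appeal of this layout is that a search never opens a genome at all. It reads a few rows and combines them with a bitwise AND, so the work depends mostly on the length of the query rather than on how much sequence data sits behind the index.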

Searching for colistin resistance through nearly 500,000 sequences now takes just a few seconds.

Suppose a patient has an unusual brain infection, says Jennifer Gardy, a genomic epidemiologist who until recently was at the University of British Columbia and who was not involved with the project. Suppose it’s a pathogen the doctor doesn’t recognize. Before, the pathogen’s particular sequence might have been hiding in one of those 500,000 genomes. But a mountain of data is only as good as your ability to search it. “We can now go back and look through all of the DNA, through all of the other experiments that had done sequencing. Loads and loads of DNA,” Gardy says. For the first time, it’s possible to easily answer a question as simple as: “Have we seen this thing before?”

Since Iqbal and his co-authors started sharing their project—making a demo version of BIGSI available online, posting a non-peer-reviewed paper on the website bioRxiv, giving talks—they’ve been hearing from researchers who’ve started to use it. After Andrew Page, a bioinformatics researcher now at the Quadram Institute, learned about the tool, he walked back to his office and fired it up. Page was interested in a particular plasmid, a round loop of DNA, that helps make typhoid fever bacteria drug resistant. This plasmid seemed to have popped up out of the blue in Pakistan.

“Within two seconds I got a list of twenty other samples where they were seen,” says Page. The plasmid wasn’t just in other typhoid bacteria. It was in soil bacteria, animal bacteria, and E. coli, painting a much more complex picture of how resistance plasmids emerge and get swapped between different bacterial species.