Maximizing the potential of our increasingly vast base of scientific knowledge
We aren't yet at the stage where we can loose computers upon the stores of human knowledge and have them return a week later with discoveries that would supplant those of Einstein or Newton in our scientific pantheon. But computational methods are already helpful. Working in concert with people -- we are still needed to sort the wheat from the chaff -- computer programs and automated techniques can connect scientific areas that ought to be speaking to each other yet haven't, stitching fields together until the connections between them become clear.
In the fall of 2010, a team of scientists in the Netherlands published the first results of a project called CoPub Discovery. Their previous work had involved the creation of a massive database based on the co-occurrence of words in articles. If two papers both have the terms "p53" and "oncogenesis," for example, they would be linked more strongly than papers with no key terms in common. CoPub Discovery involved creating a new program that mines their database for unknown relationships between genes and diseases.
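To make the idea of co-occurrence concrete, here is a minimal sketch in Python. It is not CoPub's actual code or scoring method, which the text does not describe; the function names and the three toy "papers" are invented for illustration, and the link strength used here (a simple count of shared key terms) is an assumption.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(papers):
    """Count how many papers mention each unordered pair of key terms."""
    counts = Counter()
    for terms in papers:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


def shared_terms(paper_a, paper_b):
    """Return the key terms two papers have in common."""
    return set(paper_a) & set(paper_b)


# Toy data (hypothetical): each paper is reduced to its set of key terms.
papers = [
    {"p53", "oncogenesis", "apoptosis"},
    {"p53", "oncogenesis"},
    {"insulin", "diabetes"},
]

counts = cooccurrence_counts(papers)
print(counts[("oncogenesis", "p53")])        # pair appears in two papers
print(shared_terms(papers[0], papers[1]))    # two terms in common
print(shared_terms(papers[0], papers[2]))    # nothing in common
```

In this toy version the first two papers are linked by two shared terms, while the third shares none with either. A real system would presumably weight such counts statistically rather than use them raw.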
Essentially, CoPub Discovery automates the detection of relationships between thousands of genes and thousands of diseases, gene pathways, and even the effectiveness of different drugs. Doing this automatically allows many possible discoveries to be detected. In addition, CoPub Discovery has a careful system of checks designed to sift out false positives -- instances where the program might say there is an association when there really isn't.