A Chemist Uses Google's Algorithm to Determine the Structure of Molecules

Scientists want to find a use for PageRank in the world beyond the web.

Chemistry is a science of links. Atomic units, made into more than they are by attractions and connections and bonds, all in a constant flurry of motion and change.
Sounds a little bit like the Web, doesn't it?
Aurora Clark, an associate professor of chemistry at Washington State University, thinks so. She noticed that there's a probabilistic nature to molecular links: Some are stronger, and more likely to materialize, than others. That is, broadly, the logic behind the core algorithm of Google search: PageRank. While the mathematical ingredients of Google's secret sauce are closely guarded, the algorithm's broad approach -- the quantification and prioritization of links to determine a structure based on mutuality and relevance -- is ripe for the borrowing.
Clark and colleagues Barbara Logan Mooney and L. Rene Corrales, in a paper published in The Journal of Computational Chemistry, have done that borrowing. They've developed moleculaRnetworks, a toolkit that applies PageRank to chemistry and can be used to determine molecular shapes and chemical reactions ... minus the expense (and occasional danger) of lab experiments.

The model -- per the paper, an "integrated graph theoretic and data mining tool to explore solvent organization in molecular simulation" -- focuses on hydrogen bonds in water. "We take PageRank," Clark explains, "and we say that two water molecules are like two Web pages, and that their hydrogen-bonding interaction is like a hyperlink. And then we map that onto millions and millions and millions of water molecules. And from that we get a picture of the entire water network."

Furthermore, "from that connectivity, and that network picture, we can actually predict chemical activity."

Or, as the paper explains it:

moleculaRnetworks contains novel analysis algorithms and techniques unavailable elsewhere. These include graph theory-based analyses of network structure, including clustering of solvent molecules, and connectivity information such as PageRank (PR). PR, most famously used by Google to evaluate the importance of websites on the Internet, is used here as a descriptor of H-bonding structure and is unique to this toolkit. The scripts further use PR to instantaneously identify the geometric organization of the solvent about the solute, so that the dynamics of solvent shells can be monitored ....
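To make the analogy concrete, here is a minimal Python sketch of standard PageRank run on a tiny hydrogen-bond network. It is not the moleculaRnetworks code, whose internals the article doesn't describe. The five-molecule network, the `pagerank` function, the treatment of each bond as a two-way link, and the 0.85 damping factor (the value commonly quoted for the original web algorithm) are all assumptions made for illustration.

```python
def pagerank(edges, n, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on an undirected graph with n nodes.

    edges: list of (i, j) pairs; here each pair stands for a hydrogen bond
    between water molecules i and j (an illustrative assumption).
    """
    neighbors = [set() for _ in range(n)]
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)

    rank = [1.0 / n] * n  # start with rank spread evenly
    for _ in range(max_iter):
        # Rank held by unbonded molecules is redistributed to everyone.
        dangling = sum(rank[i] for i in range(n) if not neighbors[i])
        new = [(1 - damping) / n + damping * dangling / n] * n
        # Each molecule shares its rank equally among its bond partners.
        for i in range(n):
            for j in neighbors[i]:
                new[j] += damping * rank[i] / len(neighbors[i])
        converged = sum(abs(a - b) for a, b in zip(new, rank)) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical toy network: molecule 0 is bonded to molecules 1, 2 and 3;
# molecule 4 has no hydrogen bonds at all.
hbonds = [(0, 1), (0, 2), (0, 3)]
ranks = pagerank(hbonds, 5)
for molecule, score in enumerate(ranks):
    print(f"molecule {molecule}: {score:.3f}")
```

In this toy case the well-connected molecule 0 ends up with the highest score and the unbonded molecule 4 with the lowest, which is the sense in which a PageRank value can serve as a descriptor of a molecule's place in the bonding network.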

Though the algorithm uses water as its test case, the molecule's ubiquity in living things means that moleculaRnetworks has potential uses beyond chemistry. The model could, Clark says, help scientists to understand how diseases spread in the human body -- and, therefore, to understand how medicines might be optimized to fight them. More broadly, the algorithm offers a nice lesson in the power of cross-pollination between technology and academia -- a link that Google, of course, has embodied from the start.

Image: The Journal of Computational Chemistry.