Translating Shakespeare Into DNA

Could the solution to data storage turn out to be the carrying capacity of our own genetic material?


In an unexpected confluence of furtive love, lust, and nucleotides, researchers have proposed a new method for storing data: transforming information, including poetry, into DNA.

Only William Shakespeare could truly appreciate the scientists' choice of his own Sonnets to demonstrate their high-tech prowess. It's a moment where some of the finest couplets and rhymes ever written literally merge with the basic chemistry of life, and 400-year-old lines of verse are translated into As, Cs, Gs and Ts.


This comes as the challenge to store the tsunami of big data in medicine is taking on Shakespearean dimensions, with hard drives and magnetic tapes struggling to keep up.

The researchers didn't merely transpose English letters into genetic code. They made real DNA. Instead of coding for proteins, though, these info-molecules code for, say, the first two lines of the famous "Sonnet 18":

Shall I compare thee to a summer's day? Thou art more lovely and more temperate...

Transposed into DNA using a code devised by Nick Goldman and Ewan Birney of the European Bioinformatics Institute, near Cambridge, UK, the first three words of the sonnet's second line, "Thou art more," become a short string of As, Cs, Gs and Ts.


As reported in Nature, each letter in the words "Thou art more" is first converted into a digital code, in this case base-3 digits (0, 1 and 2) rather than binary zeros and ones, and then translated into a short run of DNA letters. For instance, the "T" in "Thou" becomes TAGAT.
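
To make the letter-to-DNA step concrete, here is a minimal Python sketch of the general idea: turn each character's byte into base-3 digits, then write each digit as one of the three bases that differ from the previous base, so no DNA letter ever repeats back to back (long runs of one base are hard to synthesize and read accurately). The fixed six-digit width and the base-selection rule are assumptions for illustration; the published scheme uses a Huffman code, so this sketch will not reproduce the TAGAT quoted for "T".

```python
# Minimal sketch of a text-to-DNA code in the spirit of Goldman and Birney's scheme.
# Assumptions for illustration: each byte becomes a fixed six base-3 digits ("trits")
# instead of a Huffman code, and the base-selection rule below is made up.

BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256, so six trits cover every byte value


def byte_to_trits(value: int) -> list[int]:
    """Write one byte (0-255) as six base-3 digits, most significant first."""
    trits = []
    for _ in range(TRITS_PER_BYTE):
        trits.append(value % 3)
        value //= 3
    return trits[::-1]


def trits_to_byte(trits: list[int]) -> int:
    """Inverse of byte_to_trits."""
    value = 0
    for t in trits:
        value = value * 3 + t
    return value


def allowed_bases(prev: str) -> list[str]:
    """The three bases that differ from the previous one, in a fixed order."""
    return [b for b in BASES if b != prev]


def encode(text: str, start: str = "A") -> str:
    """Turn ASCII text into DNA in which no base is repeated back to back."""
    prev, out = start, []
    for byte in text.encode("ascii"):
        for trit in byte_to_trits(byte):
            prev = allowed_bases(prev)[trit]  # the trit picks one of three choices
            out.append(prev)
    return "".join(out)


def decode(dna: str, start: str = "A") -> str:
    """Recover the text by asking which of the three allowed bases was used."""
    prev, trits = start, []
    for base in dna:
        trits.append(allowed_bases(prev).index(base))
        prev = base
    chunks = (trits[i:i + TRITS_PER_BYTE] for i in range(0, len(trits), TRITS_PER_BYTE))
    return bytes(trits_to_byte(c) for c in chunks).decode("ascii")


if __name__ == "__main__":
    dna = encode("Thou art more")
    print(len(dna))  # 78: 13 characters x 6 bases each
    print(dna[:12])
    assert decode(dna) == "Thou art more"
```

Because each digit only chooses among three bases, the alphabet is effectively base 3, which is why the intermediate code is ternary rather than binary.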

Goldman and Birney first dreamed up the project in a pub in Hamburg, jotting down their ideas on a napkin. They later studied past efforts to store data in DNA and set out to eliminate the kinds of errors that had tripped up earlier coding schemes.

Once they had created their new DNA language, they sent the A, C, G and T versions of Shakespeare's 154 sonnets to Agilent Technologies in Santa Clara, California. The company synthesized the DNA strands and shipped them back to the scientists, who sequenced them and reconstructed every line with 100 percent accuracy.

The researchers also created DNA strands storing part of Martin Luther King Jr.'s 1963 "I Have a Dream" speech, plus the famous 1953 Nature paper by James Watson and Francis Crick that first described the double-helix structure of DNA.

"I think it's a really important milestone," said Harvard geneticist George Church in Nature. Last year, Church encoded in DNA a draft of his recent book, Regenesis, and some other materials.

This approach may also save us from drowning in our own information. In 2011, IBM estimated that 90 percent of the world's data had been produced in the previous two years. This year, the total amount of data in the world is expected to reach 2.7 zettabytes (a zettabyte is a sextillion, or 10^21, bytes).

Storing and processing all of this information takes millions of servers in vast warehouses, a grid that draws about 30 billion watts of electricity, roughly the output of 30 nuclear power plants, according to a recent article in The New York Times. Estimates of the electricity needed to keep servers running, which adds to greenhouse gas emissions, range from 1.5 percent to 2.2 percent of all electricity consumed in the U.S.
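
The 30-billion-watt figure is a rate of power draw, not an amount of energy. This short Python sketch, which assumes the load runs around the clock all year, converts it into energy per year and checks the nuclear-plant comparison: 30 plants sharing 30 billion watts works out to about 1 gigawatt each, roughly the output of one large reactor.

```python
# Back-of-the-envelope check of the data-center power figures quoted above.
# Assumption: the 30 billion watts is drawn continuously, all year round.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years


def annual_energy_twh(power_watts: float) -> float:
    """Energy used in one year, in terawatt-hours, by a constant power draw."""
    return power_watts * HOURS_PER_YEAR / 1e12


def watts_per_plant(total_watts: float, plants: int) -> float:
    """Average output each power plant would need to supply."""
    return total_watts / plants


total = 30e9  # 30 billion watts
print(f"{annual_energy_twh(total):.1f} TWh per year")         # 262.8 TWh per year
print(f"{watts_per_plant(total, 30) / 1e9:.0f} GW per plant")  # 1 GW per plant
```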

Compare this with DNA. One Shakespeare sonnet rendered in DNA weighs only 0.3 millionths of a millionth of a gram (0.3 picograms). One gram of DNA could hold the data from more than a million CDs. Put another way, using George Church's method of encoding words in DNA, one gram could store 700 terabytes of data in a chunk small enough to fit on your fingertip.
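
The CD comparison and the 700-terabyte figure are two views of the same density, provided one assumes a CD holds about 700 megabytes (a common capacity, and an assumption here) and uses decimal units. A quick Python check:

```python
# Check that "about a million CDs" and "700 terabytes per gram" agree.
# Assumptions: a CD holds about 700 MB, and 1 TB = 1,000,000 MB (decimal units).

CD_CAPACITY_MB = 700
TB_PER_GRAM = 700  # figure quoted for George Church's method


def cds_per_gram(tb_per_gram: float, cd_mb: float = CD_CAPACITY_MB) -> float:
    """How many CDs' worth of data one gram of DNA would hold."""
    return tb_per_gram * 1_000_000 / cd_mb


print(f"{cds_per_gram(TB_PER_GRAM):,.0f} CDs per gram")  # 1,000,000 CDs per gram
```

Under these assumptions the two figures describe the same storage density, so the CD comparison is consistent with the 700-terabyte estimate.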

Plus, DNA stored under the right conditions (cold, dark and dry) could last for thousands of years. The evidence comes from DNA recovered from long-extinct species such as mammoths and Neanderthals, enough of which has survived tens of thousands of years to be read today.

You won't, however, be storing your favorite Led Zeppelin tunes or your Facebook profile pictures in DNA anytime soon. Encoding data costs around $12,400 per megabyte, according to Nature, plus about $220 to decode it back into, say, "Whole Lotta Love," that shot of you vacationing in Maui, or "Sonnet 18."
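
At those rates the bill grows in step with file size. This Python sketch applies the quoted prices to two hypothetical file sizes (the 3 MB and 5 MB figures are illustrative assumptions, not from Nature) and assumes the $220 decoding charge is also per megabyte, like the writing charge.

```python
# Rough cost of writing a file into DNA and reading it back once.
# Assumptions: both quoted prices are per megabyte; file sizes are hypothetical.

ENCODE_USD_PER_MB = 12_400  # writing (synthesis), as quoted from Nature
DECODE_USD_PER_MB = 220     # reading back (sequencing), as quoted from Nature


def round_trip_cost(size_mb: float) -> float:
    """Dollars to encode size_mb megabytes into DNA and decode them once."""
    return size_mb * (ENCODE_USD_PER_MB + DECODE_USD_PER_MB)


for label, size_mb in [("3 MB vacation photo", 3), ("5 MB song", 5)]:
    print(f"{label}: ${round_trip_cost(size_mb):,.0f}")
# 3 MB vacation photo: $37,860
# 5 MB song: $63,100
```

Nearly all of the cost sits on the writing side, which is why synthesis prices, more than sequencing prices, decide whether DNA storage becomes practical.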

Cost will have to drop considerably to make this technology viable. Sequencing and retrieving processes will also need to improve dramatically if DNA "servers" are to be used for anything other than archival storage of data that does not need to be accessed quickly.

This makes a future "age of genoformation" unlikely to arrive anytime soon. Yet as sequencing costs plummet, I suspect this technology will, within a decade or two, move far beyond storing 400-year-old sonnets and offer a new take on the notion that "brevity is the soul of wit," at least when it comes to storing big data.