How Far Should Life’s Genetic Alphabet Be Stretched?

With the technology to add new bases to DNA, scientists now have to figure out if it’s possible to improve on nature’s genetic code.

Illustration: Eric Nyquist / Quanta Magazine

With recent innovations in gene editing, it may seem as if the field of synthetic biology is just starting to make strides into science-fiction territory. But for several decades, scientists have been cultivating ways to create novel forms of life with basic biochemical components and properties far removed from anything found in nature. In particular, they’re working to expand the number of amino acids—the building blocks of the proteins that perform the cell’s functions—in life’s stockpile.

In November, a group of researchers announced some of their greatest progress yet. But that breakthrough has also provided the opportunity to reflect on how and why they are looking to improve on nature at all—and what challenges they may face in turning those successes into more than demonstrations. A long history of theoretical work, after all, suggests that natural evolutionary forces settled on the genetic code universal to most organisms for good reason.

The impetus to engineer a more extensive code comes with several long-term goals. With more amino acids, it becomes possible to synthesize artificial proteins that could in principle serve as drugs or industrial enzymes that act more efficiently, effectively, and precisely. Artificial proteins could also tell us more about how natural proteins work, by demonstrating how their structure informs their activity and function.

Other applications of the research include conferring virus resistance to specific cells, for use in vaccines or transplants, and manufacturing novel materials endowed with desirable attributes like the ability to withstand high temperatures or pressures.

* * *

A research team at the Scripps Research Institute in California has now brought us closest to achieving these aims by designing bacterial cells that can replicate, transcribe, and translate an artificial DNA base pair. For nearly 20 years, the scientists painstakingly worked out how to add two new custom-made letters to the genome’s natural four-letter vocabulary, integrate them into the cell, and synchronize a complex series of processes to make that expanded vocabulary meaningful. The resulting protein made use of an amino acid that the cell wouldn’t normally employ.

The work, published in Nature, represents one of several ongoing efforts to increase the number of amino acids that DNA encodes. Take any organism on Earth, and its DNA and RNA have four nucleotide bases, or letters (usually abbreviated as A, T, C, and G in DNA; in RNA, another base, U, takes the place of T). Those letters constitute an alphabet that ultimately spells out how to make proteins. But for that to happen, the cell first has to read and translate that alphabet, using a set of rules—the genetic code—to decipher its meaning.

Basically, the cell’s protein-making machinery reads a sequence of DNA as a sentence composed entirely of three-letter words called codons. Codons name amino acids to add sequentially to a protein. With four nucleotide bases at the cell’s disposal, 64 codons are possible: One to six codons specify each of the 20 natural amino acids most commonly used, and three tell the cell to stop building the protein.
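The arithmetic in that paragraph can be checked directly. The sketch below builds the standard codon table (written with T rather than U, as in DNA) and counts how the 64 codons are shared out. The variable names are illustrative, and the compact 64-character string is simply the standard code listed in T, C, A, G order.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each three-letter codon to its one-letter amino acid symbol.
CODON_TABLE = {
    "".join(letters): aa
    for letters, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_amino_acid = Counter(CODON_TABLE.values())
stop_codons = codons_per_amino_acid.pop("*")  # set the stop signals aside

print(len(CODON_TABLE))                      # 64 codons in total
print(len(codons_per_amino_acid))            # 20 amino acids
print(min(codons_per_amino_acid.values()),
      max(codons_per_amino_acid.values()))   # 1 6
print(stop_codons)                           # 3
```

Methionine and tryptophan are the amino acids with a single codon each; leucine, serine, and arginine have six.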

By adding a fifth and sixth letter to DNA—which the Scripps researchers, led by Floyd Romesberg, a chemist, have informally labeled as X and Y—the number of available codons explodes to 216.

The Scripps team’s accomplishment does not stand alone. Steven Benner, a chemist at the Foundation for Applied Molecular Evolution in Florida, and his colleagues have made a 12-letter genetic alphabet (although they have not put their new base pairs into a living cell). In both cases, having more bases offers lots of latitude to bring nonstandard amino acids into proteins with never-before-seen forms and functions.

Moreover, expanding the number of bases isn’t the only way to get more amino acids. George Church, the prominent geneticist at Harvard University known for his entrepreneurial endeavors in biotech, is spearheading an effort to free up redundant codons that now specify natural amino acids and reassign them to noncanonical ones. And Jason Chin, a biochemist at the Medical Research Council Laboratory of Molecular Biology in England, has created a ribosome (the cell’s protein-producing factory) that reads codons made up of four letters, not three.
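The codon counts behind these approaches all come from one formula: the number of letters raised to the power of the codon length. A minimal sketch follows (the function name is made up for illustration, and the 12-letter case assumes three-letter codons, which the article does not specify):

```python
def codon_count(n_bases: int, codon_length: int = 3) -> int:
    """Number of distinct codons available for a given alphabet size and codon length."""
    return n_bases ** codon_length

print(codon_count(4))      # natural alphabet, triplet codons: 64
print(codon_count(6))      # with X and Y added: 216
print(codon_count(12))     # a 12-letter alphabet, if read in triplets: 1728
print(codon_count(4, 4))   # natural alphabet, four-letter codons: 256
```

Either route, more letters or longer codons, leaves far more codons than the 20 standard amino acids and the stop signal require, which is where the room for nonstandard amino acids comes from.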

* * *

Playing with the parameters that define the natural genetic code—four nucleotide bases, three-letter codons, 20 amino acids—leads back to questions raised decades ago about how that code evolved and whether it is optimal. Might having six bases be better than four? Do 21 amino acids do more for the cell than 20? What about 25? “These were questions that were un-askable until very recently,” says Stephen Freeland, an evolutionary biologist at the University of Maryland at Baltimore County, who has run theoretical studies on the comparative fitness of the genetic code. Now that expanded codes are a technological reality, scientists can for the first time start thinking about answering them experimentally.

Researchers studying the genetic code have gradually determined that its codon–amino acid assignments are decidedly not random. They instead seem to be a product of natural selection, optimized to generate a favorable degree of genetic diversity, as well as to help safeguard the organism’s cells against the kinds of errors that tend to occur most frequently during the process of protein synthesis.

The code achieves this in a number of clever ways. Codons that denote the same amino acid, for example, tend to differ only by the nucleotide in their third position, because that’s where the cell’s translation machinery is most likely to make a mistake. (Take glutamic acid, for instance, which is specified by both GAG and GAA.) Even codons for different amino acids that have two of their three letters in common tend to translate into amino acids that share key chemical properties. As a result, common genetic errors will still leave proteins folding mostly as they should, and retaining their correct function.
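The third-position pattern is easy to see by brute force. The sketch below is an illustration, not a method taken from any of the studies mentioned. It tries every possible single-letter change in every codon of the standard table and reports how often the meaning stays the same at each position. Stop codons are treated as one more "meaning," which is an assumption made for simplicity.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(letters): aa
    for letters, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def synonymous_fraction(position: int) -> float:
    """Share of single-base changes at `position` (0, 1, or 2) that leave a codon's meaning unchanged."""
    unchanged = total = 0
    for codon, meaning in CODON_TABLE.items():
        for base in BASES:
            if base == codon[position]:
                continue  # not a change
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            unchanged += CODON_TABLE[mutant] == meaning
    return unchanged / total

for position in range(3):
    print(f"position {position + 1}: {synonymous_fraction(position):.3f}")
```

Under these assumptions, about two-thirds of third-position changes are silent, against only a few percent at the first position and about 1 percent at the second.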

Computational experiments, including ones by Freeland, have compared the resilience of the real genetic code with that of potential alternatives, in which codons were assigned arbitrarily to amino acids. Nature’s genetic code outperformed nearly all of them. “For what we have,” says Chang Liu, a synthetic biologist at the University of California at Irvine, “it’s better than a one-in-a-million code.”
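A toy version of that kind of experiment fits in a few dozen lines. The sketch below is not a reconstruction of Freeland's work. It makes several assumptions of its own: it scores a code by the average squared change in hydropathy (Kyte-Doolittle scale) caused by single-letter errors, it weights all errors equally, and it generates alternatives by shuffling which amino acid owns which block of synonymous codons while leaving the stop codons in place.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
REAL_CODE = {
    "".join(letters): aa
    for letters, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Kyte-Doolittle hydropathy values (an assumed stand-in for "key chemical properties").
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def error_cost(code: dict) -> float:
    """Mean squared hydropathy change over all single-base errors between sense codons."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for position, base in product(range(3), BASES):
            if base == codon[position]:
                continue
            mutant_aa = code[codon[:position] + base + codon[position + 1:]]
            if mutant_aa == "*":
                continue  # errors that create a stop codon are ignored in this sketch
            total += (HYDROPATHY[aa] - HYDROPATHY[mutant_aa]) ** 2
            count += 1
    return total / count

def shuffled_code(rng: random.Random) -> dict:
    """Reassign the 20 amino acids at random among the existing codon blocks; stops stay put."""
    originals = sorted(HYDROPATHY)
    reassigned = originals[:]
    rng.shuffle(reassigned)
    swap = dict(zip(originals, reassigned))
    swap["*"] = "*"
    return {codon: swap[aa] for codon, aa in REAL_CODE.items()}

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 2000
real_cost = error_cost(REAL_CODE)
better = sum(error_cost(shuffled_code(rng)) < real_cost for _ in range(trials))
print(f"cost of the real code: {real_cost:.2f}")
print(f"random codes that beat it: {better} of {trials}")
```

How rare the real code looks depends heavily on the chosen property and on how errors are weighted, so a simple run like this should not be expected to reproduce a "one-in-a-million" figure.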

But while “the genetic code is a very long way from random,” Freeland says, “it’s also a very long way from perfect.” That is, it may be locally optimal—the best of the many, many codes made possible by the chemistry of 20 amino acids—but that doesn’t necessarily mean it’s globally best. “What Darwinism does,” Benner says, “is to search locally in the sequence space. You get by with what works.”

* * *

The ability to increase the number of base pairs or amino acids changes the rules of that game entirely. Because even a binary system of bases would have been incredibly efficient, many researchers posit that primitive cellular life began with a single pair of bases, and evolved a second pair only after cellular systems became more complex and sophisticated, and a higher information density in DNA became advantageous. But why stop at four? “Would upgrading to six or eight bases be augmenting this?” Freeland asks. “You’d get even more information per length of genetic segment. It would be very interesting to see the ramifications of that, to see if it would actually make something better and more efficient.”
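Freeland's point about information per length can be put in numbers. In the sketch below (the function names are illustrative), each position in a strand with n equally likely bases carries log2(n) bits. A second function finds the shortest codon that could still distinguish 21 meanings, namely 20 amino acids plus a stop signal.

```python
import math

def bits_per_base(n_bases: int) -> float:
    """Maximum information carried by one position, assuming all bases are equally likely."""
    return math.log2(n_bases)

def min_codon_length(n_bases: int, meanings: int = 21) -> int:
    """Shortest codon length that gives at least `meanings` distinct codons."""
    length = 1
    while n_bases ** length < meanings:
        length += 1
    return length

for n in (2, 4, 6, 8):
    print(n, round(bits_per_base(n), 3), min_codon_length(n))
# 2 1.0 5
# 4 2.0 3
# 6 2.585 2
# 8 3.0 2
```

By this measure, a two-letter alphabet would need five-letter codons to name 20 amino acids and a stop signal, while six or eight letters would in principle manage with two.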

Some argue that six (or more) bases could in fact be worse: Mutations might become too common, and cells would have difficulty doing damage control. Simulations in one study suggested that populations of organisms using two base pairs (that is, four bases) would not only have optimal replication accuracy but would also evolve most efficiently and reach the highest levels of fitness.

But that argument comes with a caveat, according to Romesberg: Without knowledge of the selective pressures that existed billions of years ago when the code would have been evolving—without a clear picture of how rapidly the environment was changing or what competition looked like—it’s impossible to make such judgments about mutation rate. The same general argument could apply to whether three is the best number of letters for constituting a codon. “When you don’t understand the problem, it’s very hard to theorize about it,” he wrote in an email.

Ultimately, then, the argument comes down to whether the observed number of amino acids (20 in most organisms, although some organisms code for 21 or 22) is optimal. At the very least, 20 is “good enough,” Freeland says. That number enabled the emergence of all living organisms, and their adaptation to every extreme environment thrown their way. The 20 genetically encoded amino acids are spread fairly evenly over a wide range of hydrophobicity, size, and electronegativity values.

But would adding more colors to the palette improve anything? Some say no—that having 20, 21, or 22 is a “Goldilocks” scenario, and that their properties are already spread out enough to allow proteins to be fantastically varied while also evolving efficiently.

Others disagree, and are holding out for evidence that they hope is soon to arrive. According to Benner, our DNA’s nucleotides are not as stable as they could be, and having an expanded alphabet could very well have a positive effect if the additions are well chosen.

“It’s conceivable that on a long evolutionary timescale, having additional amino acids would be advantageous, allowing the host to adapt in new ways,” Liu says. “But it would be an entirely new chemistry that’s difficult to predict.”

Freeland agrees, noting that evidence suggests life began with a smaller handful of amino acids and gradually enlarged its inventory. “There’s nothing magic about 20 amino acids,” he says. “It’s not clear to me what advantage there would be to going beyond that, though. I’m not saying it couldn’t get more optimal. It’s just that it’s already good enough.”

Major innovations in the genetic code might also have difficulty taking hold because its rules are, as many researchers describe them, effectively “frozen.” Once organisms began to flourish with three-letter codons, Benner says, it became hard for anything that deviated from that system to compete.

* * *

Right now, the expanded synthetic genetic codes don’t offer much in the way of competition: They clearly perform worse than the natural one. Romesberg’s semisynthetic organisms replicate less efficiently and experience a greater rate of mutation. Their codons are not as stable, in part because the artificial X-Y base pair tends to mutate to a natural one relatively quickly. Romesberg’s lab is working on ways to overcome those problems. “Nature has had a lot more time to figure it out,” he says. Besides, optimization isn’t necessarily the goal, given that the experiments being done to incorporate noncanonical amino acids are exploratory and geared toward applications, not theoretical research. In fact, Church says, although he and his colleagues would want their experimental organisms to be robust enough to thrive in the lab, they would want the cells’ new genetic code to be “slightly more brittle” to increase the odds of lethal mutations if the cells were to escape.

That said, when technology catches up to help ensure the stability and precision of the genetic codes designed by Romesberg, Benner, and Church, it will be possible to test whether having a greater number of amino acids is better. For the time being, they’re just getting started. As Freeland puts it, “We’re kind of in the Wild West now, aren’t we?”