There’s a Mystery Machine That Sculpts the Human Genome

Geneticists can’t see this machine, but they can see its works—and they say it might be the key to reshaping the genome.

Jason Ku and Erik Demaine

Genomes are so regularly represented as strings of letters—As, Gs, Cs, and Ts—that it’s easy to forget that they aren’t just abstract collections of data. They exist in three dimensions. They are made of molecules. They are physical objects that take up space—a lot of space.

Consider that the human genome is longer than the average human. It consists of around two meters of DNA, which must somehow fit inside the nuclei of cells, and those nuclei are about 200,000 times narrower than the DNA is long.
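To get a feel for those figures, a quick back-of-the-envelope calculation helps. The sketch below uses only the two numbers quoted above (two meters, a 200,000-fold ratio); the variable names are illustrative, and the result is a rough implied width rather than a measured one.

```python
# Figures taken from the text: about 2 m of DNA, nucleus ~200,000 times narrower.
dna_length_m = 2.0
ratio = 200_000

# Implied nucleus width, converted from meters to micrometers.
nucleus_width_m = dna_length_m / ratio
nucleus_width_um = nucleus_width_m * 1e6

print(f"Implied nucleus width: about {nucleus_width_um:.0f} micrometers")
```

In other words, two meters of thread has to pack into a space roughly ten millionths of a meter across.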

So it folds. And it folds in such a way that any given stretch can be easily unfolded, so the genes within it can be read and used. Knots are verboten, and anyone who has ever shoved headphones into their pockets will know how hard it is to scrunch an extremely long thread into a ball without knotting anything.

In the 1970s, biochemists showed that this feat of extreme origami begins when DNA is wrapped around proteins called histones, creating what looks like a string of beads. This reduces the packing problem, but doesn’t come close to solving it. The wrapped DNA must be folded and twisted in ever more complicated (and as yet unknown) ways. Eventually, it forms large loops.

The loops aren’t just a packing solution. They also bring genes into close contact with distant sequences that turn them on or off. So, the 3-D form of the genome also dictates its function. And to really understand how genes are used (and how they are misused in cases of disease), we need to appreciate the genome as a looping, twisting, physical entity, rather than just a string of letters.

In 2014, a team led by Erez Lieberman Aiden at Baylor College of Medicine took important steps towards this goal by creating an unprecedentedly detailed 3-D map of the human genome. These genetic cartographers used a technique called Hi-C to embalm the genome and identify regions that interact with one another. Using this method, they identified a grand total of 10,000 loops—far fewer than the millions that were thought to exist.

They also showed that the loops obey certain rules. Most tend to be short. They occur in the same places whether you’re looking at a neuron or a skin cell, or a human cell or mouse cell. And they almost always associate with a protein called CTCF, which acts as a fastener. In theory, two CTCF proteins will bind to separate stretches of DNA and then lock together, creating a loop and holding it in place.

But when Aiden’s team looked at CTCF more closely, they found a huge surprise. The protein recognizes and sticks to specific DNA sequences, which act as its landing pads. These sequences point in a particular direction, which means that a pair of them could, in principle, line up in any of four possible ways. They don’t. In reality, they almost always line up in just one of the four orientations, pointing towards each other in what study co-leader Eric Lander described as “a genomic yin and yang.”
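The four possibilities are easy to list. The sketch below is only an illustration: the arrow notation and the labels "convergent," "divergent," and "tandem" are naming choices made here, not terms taken from the team's description.

```python
from itertools import product

# '>' = landing pad pointing rightward along the DNA, '<' = pointing leftward.
# This arrow notation is an illustrative assumption.
ORIENTATIONS = (">", "<")

def classify(left, right):
    """Name the arrangement of a left-hand and a right-hand landing pad."""
    if left == ">" and right == "<":
        return "convergent"  # pointing towards each other
    if left == "<" and right == ">":
        return "divergent"   # pointing away from each other
    return "tandem"          # both pointing the same way

# Enumerate every combination of two directed landing pads.
pairs = {(l, r): classify(l, r) for l, r in product(ORIENTATIONS, repeat=2)}
for (l, r), name in pairs.items():
    print(l, r, name)
```

Only one of the four combinations has the two pads facing each other, and that is the arrangement the loops almost always show.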

“That was a total bombshell,” says Suhas Rao, a student who worked on the project. He, like many others, had assumed that loops form when two stretches of free-floating DNA randomly find each other and are fastened by a pair of CTCF proteins. But that can’t be right. If it were, the CTCF landing sequences would align in all four possible orientations, rather than the very specific one that Rao saw in his data. The loops must be forming in a completely different way, one that’s deliberate and controlled.

Rao and fellow student Adrian Sanborn think that the key to this process is a cluster of proteins called an “extrusion complex,” which looks like a couple of Polo mints stuck together. The complex assembles on a stretch of DNA so that the long molecule threads through one hole, forms a very short loop, and then passes through the other one. Then, true to its name, the complex extrudes the DNA, pushing both strands outwards so that the loop gets longer and longer. And when the complex hits one of the CTCF landing sites, it stops, but only if the sites are pointing in the right direction.
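The logic of that proposal can be captured in a toy model. What follows is a minimal sketch, not the team's simulation: the DNA is reduced to a row of numbered positions, the function name `extrude` and its parameters are hypothetical, and the assumption that each side of the complex simply halts at the end of the row is made here only to keep the example finite.

```python
def extrude(length, motifs, load_site):
    """Toy one-dimensional loop extrusion.

    length    -- number of positions on the DNA (hypothetical units)
    motifs    -- dict mapping position -> '>' or '<' (direction the site points)
    load_site -- position where the two-ring complex assembles
    Returns (left_anchor, right_anchor) once both sides have stopped.
    """
    left = right = load_site
    left_stopped = right_stopped = False
    while not (left_stopped and right_stopped):
        # The left-moving side halts only at a site pointing right, into the loop.
        # Assumption: it also halts at the end of the row.
        if not left_stopped:
            if motifs.get(left) == ">" or left == 0:
                left_stopped = True
            else:
                left -= 1
        # The right-moving side halts only at a site pointing left, into the loop.
        if not right_stopped:
            if motifs.get(right) == "<" or right == length - 1:
                right_stopped = True
            else:
                right += 1
    return left, right

# Hypothetical landing sites: the one at 20 points the "wrong" way for a
# left-moving ring, so a complex loaded at 25 slides past it.
sites = {10: ">", 20: "<", 30: "<", 40: ">"}
loop_from_25 = extrude(50, sites, 25)
loop_from_15 = extrude(50, sites, 15)
print(loop_from_25, loop_from_15)
```

However the complex is loaded, the loop it leaves behind is anchored by a pair of sites that face each other, which is the pattern seen in the data.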

This explanation is almost perfect. It accounts for everything that the team have seen in their work: why the loops don’t get tangled, and why the CTCF landing sites align the way they do. “This is an important milestone in understanding the three dimensional structure of chromosomes, but like most great papers, it raises more questions than it provides answers,” says Kim Nasmyth, a biochemist at the University of Oxford who first proposed the concept of an extrusion complex in 2001.

The big mystery, he says, is how the loops actually grow. Is there some kind of ratcheting system that stops the DNA from sliding back? Is such a system even necessary? And “even when we understand how loops are created, we still need to understand what they are doing for the genome,” Nasmyth adds. “It’s very early days.”

And then there’s the really big problem: No one knows if the extrusion complex exists.

Since Nasmyth conceived of it, no one has yet proved that it’s real, let alone worked out which proteins it contains. CTCF is probably part of it, as is a related protein called cohesin. Beyond that, it’s a mystery. It’s like a ghostly lawnmower, whose presence is inferred by looking at a field of freshly shorn grass, or the knife that we only know about by studying the stab wounds. It might not actually be a thing.

Except: The genome totally behaves as if the extrusion complex were a thing. Rao and Sanborn created a simulation that predicts the structure of the genome on the basis that the complex is real and works the way they think it does.

These predictions were so accurate that the team could even re-sculpt the genome at will. They started playing around with the CTCF landing pads, deleting, flipping, and editing these sequences using a powerful gene-editing technique called CRISPR. In every case, their simulation predicted how the changes would alter the 3-D shape of the genome, and how it would create, move, or remove the existing loops. And in every case, it was right.
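To see why knowing the landing pads might be enough to anticipate such changes, consider a deliberately simplified predictor. This is a sketch under stated assumptions, not the team's model: it pairs each right-pointing pad with its immediate neighbor when that neighbor points left, and the function names `predicted_loops`, `delete_motif`, and `flip_motif` and the positions are all hypothetical.

```python
def predicted_loops(motifs):
    """Pair each right-pointing pad with the next pad if that one points left.

    motifs -- dict mapping position -> '>' or '<'
    """
    sites = sorted(motifs.items())
    return [(a, b) for (a, da), (b, db) in zip(sites, sites[1:])
            if da == ">" and db == "<"]

def delete_motif(motifs, pos):
    """Return a copy of the pads with the one at `pos` removed."""
    return {p: d for p, d in motifs.items() if p != pos}

def flip_motif(motifs, pos):
    """Return a copy of the pads with the one at `pos` pointing the other way."""
    flipped = dict(motifs)
    flipped[pos] = "<" if motifs[pos] == ">" else ">"
    return flipped

# Hypothetical starting arrangement with two facing pairs.
wild_type = {10: ">", 20: "<", 30: ">", 40: "<"}

before = predicted_loops(wild_type)
after_delete = predicted_loops(delete_motif(wild_type, 20))
after_flip = predicted_loops(flip_motif(wild_type, 30))

print("unedited:  ", before)
print("delete 20: ", after_delete)
print("flip 30:   ", after_flip)
```

Even in this crude version, removing or reversing a single pad changes which loops are predicted, which is the kind of edit-and-predict test the team ran for real.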

“Our model requires very little knowledge beyond where CTCF is binding, but it tells us where the loops will be,” says Rao. “It now allows us to do genome surgery, where we can reengineer the genome on a large scale.”

This predictive power has several applications. Remember that loops allow seemingly innocuous stretches of DNA to control the activity of distant genes. If biologists can understand the principles behind these interactions, and predict their outcomes, they can more efficiently engineer new genetic circuits.

“There’s a growing appreciation that some diseases are related to how the genome is oriented rather than just a mutation,” adds Rao. “This is a little speculative, but there might be diseases where you could go in, put a loop back, and fix the problem.”

“The ability to read out the 3-D structure of a genome is improving rapidly,” the team writes. “As shown by our genome-editing experiments, it may now be possible not only to read genome folding patterns but also to write them.”

Najeeb Marc Tarazi, Adrian Sanborn, and Erez Lieberman