Mission accomplished — or close enough, anyway.
That was the message scientists sent to the world in 2003 when they announced that the human genome had been sequenced and assembled and was essentially complete, with a few seemingly minor gaps.
In reality, the effort to quantify and identify the genetic code that makes us all human, which cost the U.S. government billions of dollars, remained a rough draft and at least 8 percent short of being finished.
Some of the largest, most repetitive and complex pieces of the DNA puzzle remained in the dark — until now.
Buoyed by powerful new sequencing technology, a loose collaborative of about 100 scientists announced Thursday they’d filled in the gaps, completing a single human genome from one end to the other and opening new, promising lines of research in areas where scientists have been wandering around in the dark.
The genome’s sequencing was first shared publicly more than a year ago, but the results of a full accounting, now vetted and in use by researchers across the world, were published for the first time Thursday in a peer-reviewed journal. Six new articles in the journal Science describe the complete sequencing effort and additional analysis of its impacts.
“It’s done, and it’s correct, and it’s been through all those levels of vetting,” said Adam Phillippy, a computational biologist at the National Human Genome Research Institute, and a leader of the recent effort. “We’re optimistic there might be keys to human evolution and what makes us uniquely human.”
This legwork could one day help researchers identify the genetic causes of disorders, untangle the mysteries of what drives some cells to become cancerous and explain how different groups of people developed different traits over time, such as the ability to thrive at high altitude.
“It’s a landmark,” said Steve Henikoff, a molecular biologist and a professor at the Fred Hutchinson Cancer Research Center and the University of Washington, who was not involved in the project.
From lines to pages
Assembling a genome is akin to “taking a book, ripping it up into tiny pieces and matching it together again,” said Megan Dennis, an assistant professor who studies human genetics and genomics at UC Davis Health, who contributed to the sequencing effort.
First, researchers must chop the DNA up into short fragments. Then, it gets processed and read bit by bit.
Once the DNA is sliced into pieces, it is difficult to know where each strand came from, so scientists must “stitch that DNA together in a computational way,” Dennis said.
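For readers who want to see what that computational stitching can look like, here is a minimal sketch in Python. It is not the method the researchers used. It is a toy "greedy overlap" approach, and the function names, the `min_len` threshold and the sentence fragments are all invented for illustration. Following Dennis's book analogy, it joins torn-up pieces of a sentence by repeatedly merging the two pieces whose ends overlap the most.

```python
from typing import List, Optional, Tuple


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` (an arbitrary toy threshold) count as 0.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the pieces that remain once no two of them overlap.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best: Tuple[int, Optional[int], Optional[int]] = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # nothing left to join
        merged = frags[i] + frags[j][size:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags


# Hypothetical "ripped-up page": three overlapping scraps, out of order.
scraps = ["own fox jumps", "the quick br", "ick brown f"]
print(greedy_assemble(scraps))
```

The sketch works here because each scrap overlaps its neighbor in only one way. Real assembly software has to cope with sequencing errors and with billions of letters, which is well beyond this example.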
During the 2000s, DNA sequencing technology could only produce short fragments of genetic code — about 500 base pairs, or letters, at a time.
But some regions of the human genome are extremely repetitive, almost like a book page with words repeated many times.
“Repetitive elements exist in many different places. It’s hard to know where they belong,” Dennis said. For years, scientists just had to leave those pages — and their understanding of the genome — blank.
In recent years, new technology that creates longer reads of DNA has changed the game entirely. New machines can produce hundreds of thousands of base pairs in a single chunk.
The advances have allowed researchers to fill in the genome’s missing pieces.
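A small Python sketch can illustrate why read length matters in repetitive regions. Everything in it is made up for the example: the toy sequence, the repeated unit and the helper name `find_placements` are assumptions, and real genomes and real reads are vastly larger. The idea is only that a short read lying entirely inside a repeat fits in several places, while a longer read that reaches the unique sequence on either side fits in just one.

```python
from typing import List


def find_placements(genome: str, read: str) -> List[int]:
    """Return every start position where `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)  # allow overlapping matches
    return positions


# Toy "genome" (invented): unique flank + 5 copies of a repeat + unique flank.
REPEAT_UNIT = "GATTACA"
TOY_GENOME = "ACGTTGCA" + REPEAT_UNIT * 5 + "TTGACCGT"

# A short read that falls entirely inside the repeat.
short_read = REPEAT_UNIT * 2

# A long read that spans the whole repeat plus a little of each flank.
long_read = TOY_GENOME[4:47]

print("short read fits at:", find_placements(TOY_GENOME, short_read))
print("long read fits at: ", find_placements(TOY_GENOME, long_read))
```

In this toy case the short read fits at four different positions and the long read at exactly one, which is the ambiguity Dennis describes and the reason longer reads help resolve it.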
“It would have been unthinkable 20 years ago to have this technology,” Phillippy said. Suddenly, researchers could order and place into context those repetitive parts of the genome.
“Those sequences have genes … there’s very important functions contained within those regions.”
A pandemic project
The idea to finish the genome grew organically.
A perfectionist at heart, Phillippy had always been bothered that the human genome remained incomplete.
About five years ago, he teamed up with Karen Miga, an assistant professor in the biomolecular engineering department at the University of California, Santa Cruz, to finish the job.
When they got stuck, they reached out for help. The project began to snowball, accumulating about 100 scientific contributors and growing into what’s now called the Telomere-to-Telomere project, a name that refers to the end caps of chromosomes.
When the pandemic hit, the pace of research only accelerated, with researchers communicating from dingy basements on the communication platform Slack and over Zoom calls.
“2020 was a crazy year for many reasons. It gave us something to focus on,” Phillippy said.
Ultimately, the researchers pieced together the entire genetic code for a single version of a genome. That genome — which was derived decades ago from cell tissue that contains the genetic information of a single sperm — does not represent any human who ever lived because it only contains one set of paternal chromosomes.
The completed code will now form the backbone of new genomic research and become a new, finished reference for comparison.
Theory and practice
The completed genome opens new avenues for research.
For decades, scientists have been poring over the 92 percent of the genome available, probing it to find genetic variations that could be causing diseases.
“We have a good grasp of what variation looks like in those regions, but we have no idea about the other 8 percent,” Phillippy said.
Now, researchers are reanalyzing their old data against the new reference genome, trying to tease out new clues from what had been missing.
“We identified many more, tens of thousands, if not hundreds of thousands, of new variants,” Dennis said. “Some of them fall within genes that encode proteins and some of those genes are medically important, clinically important, and contribute to diseases.”
The new genome reference also enables further study of how centromeres work.
Centromeres are structures in the middle of chromosomes that are filled with repeating sequences of code and integral to the cell division process. They’re historically among the least understood parts of the genome because they contain so much tedious, dense coding.
“We don’t understand the underlying mechanism of the evolution of centromeres,” Henikoff said. “All of a sudden in the past year as the data have been coming out, we’ve been learning a lot more about centromeres.”
Using the new genome, researchers can better study how centromere proteins assemble and what happens when they change or lose function.
“Centromere dysfunction can be a serious driver in cancer,” Henikoff said. Until now, “we’ve been hampered because we haven’t had a reference sequence.”
Further study of newly sequenced portions of the genome could also help scientists better understand how humans evolved particular traits, such as the bigger brains that sent them down a genetically distinct path from their great ape ancestors.
“The things that make our frontal cortex bigger come from the genes that map in these repetitive regions,” said Evan Eichler, a professor in the department of genome sciences at the University of Washington School of Medicine and also part of the research collaborative.
Advances in genomic sequencing technology could drive a renaissance of medical breakthroughs, the researchers say.
“I’m more excited about what we don’t know and the opportunities for discovery,” Miga said.
Phillippy said his next goal is to streamline the sequencing process to make it cheaper, more efficient and broadly available. He also plans to sequence genetic code with both paternal and maternal chromosomes. Sequencing broadly among people from many backgrounds will help describe the world’s genetic diversity and home in on important genetic variations, he said.
He envisions a world in which everyone has access to their genetic data, which could help provide individualized information about what diseases doctors should watch for or which drugs to prescribe.
“Within 10 years, getting a complete, perfectly accurate human genome will be a routine part of health care and it will be cheap enough that it won’t be a second thought — an under $1,000 lab test,” Phillippy said. “You’ll have the complete genome in your pocket.”