Let a thousand genomes bloom

Genetic researchers in China, Britain and the United States are teaming up to unravel the full genetic code of at least 1,000 people around the world - an unprecedented scientific project that could cost tens of millions of dollars and eventually reveal the roots of hundreds of diseases.

"The 1000 Genomes Project will examine the human genome at a level of detail that no one has done before," Richard Durbin of Britain's Wellcome Trust Sanger Institute, who is the project consortium's co-chair, said in today's announcement. "Such a project would have been unthinkable two years ago. Today, thanks to amazing strides in sequencing technology, bioinformatics and population genomics, it is now within our grasp."

The project will build on the foundation created for HapMap, a similarly international gene-decoding effort. HapMap charted genetic differences between various geographical populations by looking at variations in single "letters" of genetic code, known as single nucleotide polymorphisms, or SNPs. This time, researchers will analyze the full volume of human genetic information - which runs to a length of 3 billion letters, or roughly the entire English-language content of Wikipedia.

Using HapMap and other genetic databases, researchers already have identified about 100 regions of the genome that are associated with increased risk for diseases ranging from cancer and diabetes to cystic fibrosis and Huntington's disease. But in order to track down exactly what goes wrong and how to fix it, researchers generally have to go through another circuitous round of genetic sequencing.

Taking the shortcut

The 1000 Genomes Project is aimed at providing a shortcut: The organizers of the effort figure that by mapping at least 1,000 full human genomes, they should be able to catalog the variants that appear in 1 percent or more of the global population across most of the genome. Within specific genes, the precision should be even better, catching variations down to the 0.5 percent level.

That would improve the sensitivity of disease discovery efforts by a factor of five for the full genome, and by a factor of 10 or more within gene regions, said Francis Collins, who headed the Human Genome Project and is now director of the federally funded National Human Genome Research Institute.

Once the project's database is filled out, researchers could use genome-wide association studies to narrow down an area that appeared to be associated with a disease. Then they could consult the catalog for the assorted variations within that region. Finally, they could run studies to figure out whether - and exactly how - particular variations contribute to the disease in question.

The data will be made freely available to researchers around the world, starting in 2011 or so, via the National Center for Biotechnology Information, the European Bioinformatics Institute and the Beijing Genomics Institute in Shenzhen.

Who's involved?

The first samples for the 1000 Genomes Project will be coming from specimens already collected for the HapMap project and the extended HapMap set. The DNA is not linked to personal medical data, but rather to ethnic/geographical populations: Yoruba in Nigeria, Japanese in Tokyo, Chinese in Beijing, Utah residents with northern European ancestry, Luhya and Maasai in Kenya, Toscani in Italy, Gujarati Indians in Houston, Chinese in metropolitan Denver, Mexican-Americans in Los Angeles and African-Americans in the Southwest.

The project is getting major support from the institutes headed by Durbin and Collins, as well as from the Beijing Genomics Institute. A variety of American institutes and universities will be working through the National Human Genome Research Institute's Large-Scale Sequencing Network - and more institutions may join the international consortium as time goes on.

Based on current rates, the cost of sequencing so many genomes would amount to at least $350 million, and perhaps more than $500 million. Earlier this month, Massachusetts-based Knome and the Beijing Genomics Institute announced that they were pairing up to do whole-genome sequencing for 20 people, with a price tag starting at $350,000 per genome. (You think that's expensive? BGI did the first Chinese personal genome last year for $1.3 million.)

Over the next three years, the 1000 Genomes Project is aiming to bring the cost down to a tenth of the current rate - for a total cost of between $30 million and $50 million - by employing new sequencing technologies with greater efficiency.
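For readers who want to check the math, here is a rough back-of-the-envelope sketch in Python. It assumes the $350,000-per-genome price quoted above scales linearly to 1,000 genomes, and that "a tenth of the current rate" means dividing the total by 10. The function names are made up for illustration and are not taken from the project.

```python
# Back-of-the-envelope cost check using figures quoted in the article.
# Assumption: cost scales linearly with the number of genomes.

GENOMES = 1_000                      # "at least 1,000" genomes
PRICE_PER_GENOME_USD = 350_000       # Knome/BGI starting price per genome
HIGH_ESTIMATE_USD = 500_000_000      # "perhaps more than $500 million"


def total_cost(price_per_genome: int, genomes: int = GENOMES) -> int:
    """Total cost in dollars at a flat per-genome price."""
    return price_per_genome * genomes


def reduced_cost(total: int, factor: int = 10) -> int:
    """Cost after cutting the rate by the given factor (10 = 'a tenth')."""
    return total // factor


if __name__ == "__main__":
    low_total = total_cost(PRICE_PER_GENOME_USD)
    print(f"Current-rate low estimate:  ${low_total:,}")
    print(f"Current-rate high estimate: ${HIGH_ESTIMATE_USD:,}")
    print(f"At a tenth of the rate:     ${reduced_cost(low_total):,} "
          f"to ${reduced_cost(HIGH_ESTIMATE_USD):,}")
```

Under those assumptions, the two current-rate estimates shrink to figures that sit inside the $30 million to $50 million range the organizers are citing.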

The road ahead

The first year of the international effort will be taken up with pilot projects, aimed at finding out which combination of low-resolution and high-resolution sequencing will work the best. Then, during the scheduled two-year production phase, researchers hope to churn out an average of 8.2 billion DNA bases per day - the equivalent of more than two full human genomes every 24 hours.
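The "more than two full human genomes" comparison is easy to verify with a short Python sketch. It uses the 3 billion-letter genome length and the 8.2 billion-bases-per-day target quoted in this article. The two-year total is only illustrative: it assumes 365-day years and steady output every day, neither of which the project has promised.

```python
# Throughput check using figures quoted in the article.

GENOME_LENGTH_BASES = 3_000_000_000   # "3 billion letters"
DAILY_OUTPUT_BASES = 8_200_000_000    # "8.2 billion DNA bases per day"


def genomes_per_day(daily_bases: int,
                    genome_length: int = GENOME_LENGTH_BASES) -> float:
    """Daily output expressed in genome-length equivalents."""
    return daily_bases / genome_length


def production_phase_total(daily_bases: int, years: int = 2,
                           days_per_year: int = 365) -> int:
    """Total bases over the production phase.

    Assumption: constant daily output and 365-day years.
    """
    return daily_bases * years * days_per_year


if __name__ == "__main__":
    print(f"Genome equivalents per day: "
          f"{genomes_per_day(DAILY_OUTPUT_BASES):.2f}")
    print(f"Bases over two years:       "
          f"{production_phase_total(DAILY_OUTPUT_BASES):,}")
```

The daily figure works out to about 2.7 genome-lengths of raw sequence, which squares with the "more than two" claim. Note that this counts raw bases read, not finished genomes.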

"When up and running at full speed, this project will generate more sequence in two days than was added to public databases for all of the past year," the University of Oxford's Gil McVean, one of the co-chairs of the consortium's analysis group, said in today's announcement.

Will the project hasten the day when your genome is an open book, revealing your predisposition to suffer deadly diseases - and perhaps to do dastardly deeds? The project's organizers say that they're deploying a phalanx of ethicists to guard against abuses, and that the privacy of genetic donors will be preserved. What do you think? Learn more about the project from the 1000 Genomes Web site, as well as this advance report from Nature, then weigh in with your comments below.

Update for 10:50 a.m. ET: Nature's follow-up report says some scientists fear the project's goals are too ambitious for its budget and timeline. The report also quotes Knome's George Church as saying the project might not be ambitious enough, because the database won't link genetic variants directly with disease data. The project organizers held back from doing that due to privacy concerns - and also because they felt the medical applications were best left to follow-up studies. More food for thought...