Geneticists have finally assembled the genome of the huge loblolly pine. With more than 23 billion base pairs, it's seven times the size of a human's, and the largest one ever sequenced. But what makes its genetic code, like the tree itself, so big?
The loblolly pine is a major source of timber and paper products, so understanding its genetic code may help improve the tree's longevity and resistance to disease. But the sheer amount of DNA that needed to be sequenced meant that assembling it with the techniques used for the human genome would have taken years.
Clever computer scientists and geneticists reduced the burden by a factor of 100 with two techniques. First, they began by sequencing only the DNA in the haploid tissue of a pine seed, meaning there was only one set of chromosomes to put together.
Second, they let a supercomputer assemble large chunks of genome that it was fairly sure were contiguous, then discard the millions of pieces already accounted for in these "super reads."
It's a lot like the way people approach jigsaw puzzles. Get the edges and a few big features done, and then you can focus on the small pieces that you're sure belong elsewhere. In sequencing, the benefit is even greater, because there are millions of duplicate "puzzle piece" sequences, and once one is found to fit somewhere, the rest can be discarded.
And as it turned out, those duplicates made up a full 82 percent of the genome. It seems that millions of years of short sequences copying and recopying themselves have made the genome something of a mess.
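To make the discarding step concrete, here is a toy sketch in Python. It is not the team's actual software: the function name, the sample sequences, and the simple substring test are all assumptions chosen for illustration, and a real assembler would also have to handle sequencing errors and reverse-complement strands.

```python
def discard_covered_reads(reads, super_reads):
    """Split reads into those still needed and those already covered.

    A read counts as "covered" if it appears verbatim inside any super read.
    This exact-substring test is a simplifying assumption for illustration.
    """
    kept, discarded = [], []
    for read in reads:
        if any(read in super_read for super_read in super_reads):
            discarded.append(read)  # already accounted for, like a duplicate puzzle piece
        else:
            kept.append(read)       # still has to be placed somewhere
    return kept, discarded


# Hypothetical toy data: one assembled chunk and five short reads.
super_reads = ["ACGTACGTTAGC"]
reads = ["ACGTAC", "GTTAGC", "CGTACG", "TTTTGG", "ACGTAC"]

kept, discarded = discard_covered_reads(reads, super_reads)
fraction_discarded = len(discarded) / len(reads)

print("kept:", kept)
print("discarded:", discarded)
print("fraction discarded:", fraction_discarded)
```

In this made-up example, four of the five reads fall inside the assembled chunk, so only one piece is left to place. The same idea, applied to millions of repetitive reads, is what shrinks the puzzle.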
The findings are published in the journal Genetics. What's next for the team, led by Steven Salzberg at Johns Hopkins and James Yorke at the University of Maryland? Sequencing the even more convoluted genome of the sugar pine: 35 billion base pairs.