Massive Autism Genome Database Finds Home on Google’s Cloud

A major autism research group is pairing up with Google to put an enormous archive of genetic data online.

Ten thousand genomes from people on the autism spectrum will be made available for scientists worldwide to peruse and analyze as part of the joint venture.

Autism Speaks, an organization that funds and advocates for autism research, maintains a private collection of DNA samples for use in studies. But handling, storing, and analyzing all that information isn't easy: the data for a single genome can be as large as a hundred gigabytes.

"That adds up quickly," said Robert Ring, chief science officer at Autism Speaks, in an email to NBC News. "We expect the total size of the completed database of 10,000 genomes along with all the associated clinical data to be on the petabyte scale."
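
A rough back-of-the-envelope check bears out that figure. The sketch below assumes every genome reaches the 100-gigabyte upper bound and uses decimal units; both are simplifying assumptions rather than numbers from Autism Speaks, and the clinical data is left out because no size was given for it.

```python
# Back-of-the-envelope estimate of raw genome storage.
# Assumptions (illustrative, not from Autism Speaks): every genome hits the
# 100 GB upper bound, and units are decimal (1 PB = 1,000,000 GB).
GENOMES = 10_000
GB_PER_GENOME = 100      # upper bound quoted for a single genome
GB_PER_PB = 1_000_000    # decimal petabyte


def total_petabytes(genomes: int, gb_per_genome: float) -> float:
    """Return the total storage, in petabytes, for the given genome count."""
    return genomes * gb_per_genome / GB_PER_PB


print(total_petabytes(GENOMES, GB_PER_GENOME))  # 1.0
```

Ten thousand genomes at up to 100 gigabytes each comes to about one petabyte before any clinical records are added.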

To address this, the organization has partnered with Google's Cloud Platform. Comparing millions of genes and sequences is a job for supercomputers, and Google has plenty of those. The company will store the bulky genomes and provide the processing power needed to analyze all that data.

What do they hope to find? Cataloging mutations and genetic anomalies and linking them to different autistic behaviors could help identify kids on the spectrum early, while comparing that data to other sets might illuminate possible causes or treatments. Different researchers will have different priorities, but anyone doing research into autism will likely want to dip into this database for one reason or another.

"We will be providing open access to the qualified researchers who agree to basic terms that involve responsible use of the data for research purposes," Ring wrote. "At the end of the day it is our goal to make the data available to the research community as quickly as possible."

When it comes to medical research, more data is almost always better. But don't expect any breakthroughs right away; it's taken years to examine just a fraction of the genome library.