Massive Autism Genome Database Finds Home on Google's Cloud

A major autism research group is pairing up with Google to put an enormous archive of genetic data online.

Ten thousand genomes from people on the autism spectrum will be made available for scientists worldwide to peruse and analyze as part of the joint venture.

Autism Speaks, an organization that funds and advocates for autism research, maintains a private collection of DNA samples for use in studies. But handling, storing, and analyzing all that information isn't easy: the data for a single genome can be as large as a hundred gigabytes.

"That adds up quickly," said Robert Ring, chief science officer at Autism Speaks, in an email to NBC News. "We expect the total size of the completed database of 10,000 genomes along with all the associated clinical data to be on the petabyte scale."
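
The arithmetic behind that estimate is easy to check. As a rough sketch, assuming every genome reaches the 100-gigabyte upper bound mentioned above and using decimal units (both are assumptions for illustration, not figures supplied by Autism Speaks), the raw sequence data alone would come to about one petabyte:

```python
# Back-of-the-envelope storage estimate using the article's figures.
GENOMES = 10_000        # planned number of genomes (from the article)
GB_PER_GENOME = 100     # upper bound per genome (from the article)
GB_PER_PB = 1_000_000   # assumption: decimal units, 1 PB = 1,000,000 GB


def total_petabytes(genomes: int, gb_per_genome: float) -> float:
    """Return the total storage in petabytes for the given genome count."""
    return genomes * gb_per_genome / GB_PER_PB


print(total_petabytes(GENOMES, GB_PER_GENOME))  # 1.0
```

The associated clinical data Ring mentions would add to that total, though the article gives no figure for it.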

To address this, the organization has partnered with Google's Cloud Platform. Comparing millions of genes and sequences is a job for supercomputers, and Google has plenty of those. The company will store the bulky genomes and provide the processing power needed to analyze all that data.

What do they hope to find? Cataloging mutations and genetic anomalies and linking them to different autistic behaviors could help identify kids on the spectrum early, while comparing that data to other sets might illuminate possible causes or treatments. Different researchers will have different priorities, but anyone doing research into autism will likely want to dip into this database for one reason or another.

"We will be providing open access to the qualified researchers who agree to basic terms that involve responsible use of the data for research purposes," Ring wrote. "At the end of the day it is our goal to make the data available to the research community as quickly as possible."

When it comes to medical research, more data is almost always better. But don't expect any breakthroughs right away; it's taken years to examine just a fraction of the genome library.