The U.S. National Science Foundation has funneled $15 million into research projects aimed at organizing massive amounts of digital data, the foundation announced Wednesday (Oct. 3). If successful, the research projects may help scientists learn more from the data they already collect. "There are enormous opportunities to extract knowledge from these large-scale, diverse data sets," Suzi Iacono, a senior adviser at the foundation, said in a statement.
Over the past several years, sensors and other technology have improved to the point that they're collecting more data than traditional computer software is able to process. Space telescopes are now gathering enormous amounts of data, for example, while the U.S. National Institutes of Health manages the complete genetic data of more than 1,700 people. In the future, such information collections could help scientists make better decisions and predictions, Iacono said.
Finding important patterns in large amounts of data often requires new, specialized computer programs, however. In March, the White House announced the U.S. government would invest more than $200 million, across several agencies, in research on how best to analyze very large datasets. The $15 million dispensed Wednesday is part of that larger initiative.
Among the projects getting funded is a system that will constantly gather new data and analyze it quickly. Previous systems were good at either quick analysis or getting fresh data, but not both. Researchers aim to make an index that's 200 times faster than current indexes, according to Rutgers University, where one of the project's scientists teaches computer science.
Another project focuses on helping scientists find the exact data they need in a larger set. Computer scientists plan to write a program that makes suggestions by finding patterns and recognizing relationships between the users of one dataset.
A third funding recipient is a system designed to comb through DNA data, looking for trends. The project's engineers want users to be able to enter the hypothesis they want to test and receive an estimate from the system of how much data they need to analyze for the test to be scientifically sound.