Computer program helps decode ancient texts

An ancient, indecipherable text from the Indus Valley region is slowly being decoded with the help of a computer program, according to recent research.
A collection of written texts of the ancient inhabitants of the Indus River Valley region. Although the meaning of the symbols in these texts has long eluded scientists, computers are helping researchers slowly decode it. (J. M. Kenoyer)
Source: Discovery Channel

Though it has yet to decipher this mysterious script, the program may help decode other ancient texts whose meanings have long since been forgotten.

"The computer program operates on sequences of symbols, so it can be used to learn a statistical model of any set of unknown or known texts," said Rajesh Rao, a University of Washington professor of computer science and co-author of the paper published in the Proceedings of the National Academy of Sciences (PNAS).

"In fact, such statistical models have been used to analyze a wide variety of sequences ranging from DNA and speech to economic data."

Roughly 5,000 seals, tablets and amulets, filled with about 500 different symbols, were created somewhere between 2600 and 1900 B.C. by a people living in the Indus River Valley.

Despite numerous attempts to decipher the symbols, a full translation has long eluded scientists. In fact, one recent paper cast doubt on whether the Indus Valley script was a written language at all, suggesting it was instead a set of political or religious symbols.

To start the search for what meaning the text might hold, American and Indian scientists input the symbols into a computer program and then ran a statistical analysis of the symbols and where they appear in the texts.

With that information, the program can do several things: create new, hypothetical Indus Valley texts, fill in missing symbols in existing texts, and tell the scientists whether a particular text is likely to have been generated by their computer model.

"We used the latter to show that the Indus texts that have been discovered in West Asia are statistically very different from the texts found in the Indus Valley," said Rao, "suggesting that the Indus people used their script to represent different content or language when living in a foreign land."
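The three capabilities described above (generating new texts, filling in missing symbols, and scoring how well a text fits the model) can be illustrated with a very small statistical model of symbol sequences. The sketch below assumes a first-order Markov (bigram) model with add-alpha smoothing; the article does not specify the researchers' exact model, and the symbol names ("fish", "jar" and so on), the toy corpus and the class `BigramModel` are all hypothetical stand-ins, not the actual Indus data or the researchers' code.

```python
import math
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical markers for the start and end of a text


class BigramModel:
    """Toy first-order Markov model over symbol sequences (an assumed model)."""

    def __init__(self, texts, alpha=1.0):
        self.alpha = alpha  # add-alpha smoothing: unseen pairs keep a small probability
        self.counts = defaultdict(Counter)
        self.vocab = {END}
        for text in texts:
            self.vocab.update(text)
            seq = [START, *text, END]
            for prev, cur in zip(seq, seq[1:]):
                self.counts[prev][cur] += 1

    def prob(self, prev, cur):
        """Smoothed probability that symbol `cur` follows symbol `prev`."""
        seen = self.counts.get(prev, Counter())
        total = sum(seen.values())
        return (seen[cur] + self.alpha) / (total + self.alpha * len(self.vocab))

    def log_likelihood(self, text):
        """Higher values mean the text looks more like the training texts."""
        seq = [START, *text, END]
        return sum(math.log(self.prob(p, c)) for p, c in zip(seq, seq[1:]))

    def fill_missing(self, text):
        """Return the symbol that best fills the single None gap in `text`."""
        i = text.index(None)
        candidates = sorted(self.vocab - {END})
        return max(
            candidates,
            key=lambda s: self.log_likelihood(text[:i] + [s] + text[i + 1:]),
        )

    def generate(self, rng, max_len=10):
        """Sample a new, hypothetical text one symbol at a time."""
        symbols = sorted(self.vocab)
        out, prev = [], START
        while len(out) < max_len:
            weights = [self.prob(prev, s) for s in symbols]
            cur = rng.choices(symbols, weights=weights)[0]
            if cur == END:
                break
            out.append(cur)
            prev = cur
        return out


# Made-up corpus: each inner list stands for the symbols on one seal.
corpus = [
    ["fish", "jar", "arrow"],
    ["fish", "jar", "man"],
    ["fish", "jar", "arrow"],
    ["wheel", "jar", "arrow"],
]
model = BigramModel(corpus)

new_text = model.generate(random.Random(0))          # a new hypothetical text
filled = model.fill_missing(["fish", None, "arrow"])  # restore a damaged symbol
typical = model.log_likelihood(["fish", "jar", "arrow"])
unusual = model.log_likelihood(["arrow", "jar", "fish"])

print("generated:", new_text)
print("best fill:", filled)
print("typical vs. unusual log-likelihood:", typical, unusual)
```

In this toy setting, a text whose symbol order differs from the training texts receives a much lower log-likelihood, which is the kind of comparison that could flag texts from one region as statistically different from texts found in another.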

For Asko Parpola, a professor at the University of Helsinki and an expert on the Indus Valley script, the PNAS research helps prove that the symbols are indeed an early written language. It does little, however, to decipher the text.

The written language of the ancient inhabitants of the Indus Valley might never be decoded, according to Parpola. But computer modeling could help reveal the meaning of unknown languages, said Marcelo Montemurro, a scientist at the University of Manchester.

Using modern texts to validate this theory, Montemurro and his colleagues used computers and information theory to find the main topic of written works including Charles Darwin's "On the Origin of Species" and Herman Melville's "Moby Dick." Not surprisingly, words like "species," "selection" and "islands" were some of the top ten words in "Origin of Species."

Montemurro now wants to test his model on an undeciphered medieval text known as the Voynich manuscript.

"The text is not long, but these methods can be applied so we can at least obtain a list of special words that would presumably convey the overall meaning of the texts," said Montemurro.

The technique "separates words like 'a' and 'the' that are frequent but not functional from words that presumably convey the overall meaning of the texts," said Montemurro. With the most significant symbols identified, scientists could then study those symbols intensively to decipher the language more quickly.
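One way to picture how frequent filler words might be separated from topic words is to look at how evenly each word is spread through a text. The sketch below is a simplified illustration, not Montemurro's published method: it assumes the text is cut into equal chunks, measures the entropy of each word's distribution over those chunks, and compares it with the average entropy seen when the same words are randomly shuffled. The function names, the chunk count, the number of shuffles and the sample sentence are all assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def chunk_entropies(tokens, n_parts):
    """Entropy (in bits) of each word's distribution over n_parts equal chunks."""
    size = len(tokens) // n_parts
    chunks = [Counter(tokens[i * size:(i + 1) * size]) for i in range(n_parts)]
    totals = Counter()
    for chunk in chunks:
        totals.update(chunk)
    entropy = {}
    for word, n in totals.items():
        h = 0.0
        for chunk in chunks:
            if chunk[word]:
                p = chunk[word] / n
                h -= p * math.log2(p)
        entropy[word] = h
    return entropy, totals


def keyword_scores(tokens, n_parts=4, shuffles=20, seed=0):
    """Score words by how much more clustered they are than in shuffled text."""
    size = len(tokens) // n_parts
    tokens = tokens[: size * n_parts]  # drop any remainder so chunks are equal
    actual, totals = chunk_entropies(tokens, n_parts)

    rng = random.Random(seed)
    baseline = defaultdict(float)
    for _ in range(shuffles):
        shuffled = tokens[:]
        rng.shuffle(shuffled)
        shuffled_entropy, _totals = chunk_entropies(shuffled, n_parts)
        for word, h in shuffled_entropy.items():
            baseline[word] += h / shuffles

    # Weight the entropy gap by relative frequency, so that a word seen only
    # once (entropy 0 in both cases) scores exactly zero.
    return {w: totals[w] / len(tokens) * (baseline[w] - actual[w]) for w in totals}


# Made-up sample: "whale" is clustered in one passage, "the" is spread evenly.
text = (
    "the whale saw the whale and the whale "
    "the ship left the port and the crew "
    "the crew ate the bread and the fish "
    "the ship met the storm and the crew"
)
scores = keyword_scores(text.split())
top_word = max(scores, key=scores.get)
print("highest-scoring word:", top_word)
```

Evenly spread words such as "the" gain nothing from this comparison, because shuffling cannot spread them any more evenly than they already are, while a word concentrated in one passage stands out.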

For now, however, the Indus Valley script, the Voynich manuscript and many other texts remain undeciphered, but scientists are hopeful that computers will eventually help decode the symbols on them.

"There are some who say the (Indus Valley) script can never be deciphered without a bilingual text like the Rosetta Stone or really long texts," said Rao.

"I am however optimistic that given a few more years, we may be able to at least narrow down the language family of the script by using computer analysis to gain an in-depth understanding of the underlying grammar."