A group of Israeli researchers has built a computer algorithm to decode one of the most important books in Western culture: the Bible.
The results accord generally with the consensus of scholars that the book contains writing styles defined as "priestly" and "non-priestly."
The scientists developed an algorithm able to analyze the writing styles found in different parts of the "five books of Moses," or Pentateuch, that is, Genesis, Exodus, Leviticus, Numbers and Deuteronomy.
The algorithm compared which word from each set of synonyms (called a synset) appeared in each block of text, along with "function" words such as prepositions. It then looked at the distribution of the most common words in the Bible. By measuring how similar any two blocks were in these choices, it was able to group the blocks according to the style they were written in.
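The grouping idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: the synsets and sample chunks are hypothetical English stand-ins for Hebrew roots, each chunk is reduced to the set of synonyms it happens to use, chunks are compared with Jaccard similarity, and average-linkage agglomerative clustering (a swapped-in technique) merges the most similar groups until `k` remain. Function words and the follow-up pass over common words are left out.

```python
from itertools import combinations

# Hypothetical toy synsets: each meaning maps to interchangeable words.
SYNSETS = {"say": ["said", "spake"], "see": ["saw", "beheld"], "go": ["went", "departed"]}

# Hypothetical chunks: the first two favor one set of synonyms, the last two the other.
CHUNKS = [
    "and moses said that he saw the mountain",
    "then they went and said nothing",
    "and moses spake when he beheld the mountain",
    "then they departed and spake no more",
]

def synonym_choices(chunk, synsets):
    """Return the (synset, word) choices that a chunk actually makes."""
    words = set(chunk.lower().split())
    return {(name, w) for name, options in synsets.items() for w in options if w in words}

def jaccard(a, b):
    """Overlap of two choice sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(chunks, synsets, k):
    """Average-linkage clustering of chunks into k groups by synonym choice."""
    feats = [synonym_choices(c, synsets) for c in chunks]
    clusters = [[i] for i in range(len(chunks))]
    while len(clusters) > k:
        best = None
        # combinations keeps x < y, so deleting index y after updating x is safe.
        for (x, a), (y, b) in combinations(enumerate(clusters), 2):
            sim = sum(jaccard(feats[i], feats[j]) for i in a for j in b) / (len(a) * len(b))
            if best is None or sim > best[0]:
                best = (sim, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

print(cluster(CHUNKS, SYNSETS, k=2))  # → [[0, 1], [2, 3]]
```

Note that the number of groups `k` has to be supplied up front; the clustering itself never decides how many styles there are.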
The synonyms were identified using Hebrew roots that were translated the same way in the King James Version, based largely on the work of the 19th-century scholar James Strong.
Computer scientist Moshe Koppel of Bar-Ilan University, a member of the team that developed the algorithm, noted one interesting result: the synonyms for "God" weren't that important. "Some of the (synonyms) that do the heavy lifting on the Pentateuch had been noted before by scholars, but the most famous synset -- names of God -- actually didn't help at all."
That may sound counterintuitive, but Koppel said there are about 150 different synsets, so the fact that one of them, even one of historical significance, doesn't help determine authorship isn't that shocking.
To test the algorithm, the researchers used it on two well-known books of the Bible, Jeremiah and Ezekiel, which scholars agree were written by different authors. They cut both texts into chunks and mixed the chunks together at random. The algorithm separated the two with nearly 99 percent accuracy, demonstrating that the method worked.
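The shape of that test can be sketched as follows. This is a hedged illustration, not the team's code: the chunk size, the random seed, and the scoring rule are assumptions. Because a clustering algorithm produces unnamed groups, the sketch scores a prediction by trying every matching of groups to the true sources and keeping the best one.

```python
import random
from itertools import permutations

def mix(book_a, book_b, chunk_size, seed=0):
    """Cut two texts into chunks of chunk_size words and shuffle them together,
    keeping each chunk's true source ("A" or "B") as a hidden label."""
    labelled = []
    for label, text in (("A", book_a), ("B", book_b)):
        words = text.split()
        for start in range(0, len(words), chunk_size):
            labelled.append((" ".join(words[start:start + chunk_size]), label))
    random.Random(seed).shuffle(labelled)
    chunks = [c for c, _ in labelled]
    labels = [l for _, l in labelled]
    return chunks, labels

def accuracy(predicted, labels):
    """Fraction of chunks assigned correctly under the best matching of
    unnamed clusters to true sources (assumes no more clusters than sources)."""
    clusters = sorted(set(predicted))
    sources = sorted(set(labels))
    best = 0
    for perm in permutations(sources, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for p, t in zip(predicted, labels))
        best = max(best, hits)
    return best / len(labels)

print(accuracy([1, 1, 0, 0], ["A", "A", "B", "B"]))  # → 1.0
print(accuracy([0, 0, 0, 1], ["A", "A", "B", "B"]))  # → 0.75
```

In a real run, the predicted cluster numbers would come from whatever clustering step is applied to the shuffled chunks.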
Koppel stressed that the algorithm can't say exactly how many authors the Bible has (or doesn't have). But it can say where styles change, and that alone can shed light on debates over authorship. Generally speaking, current scholarship divides the Pentateuch into two writing styles: priestly and non-priestly. In most passages the algorithm divided the text the same way, which would seem to support that division.
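The distinction between "where styles change" and "how many authors" can be made concrete with a tiny sketch. Assuming each chunk of text, in reading order, has already been given a style label (the labels below are hypothetical), the change points are simply the positions where the label differs from the previous chunk's. The output says nothing about who wrote each stretch or how many people were involved.

```python
def style_boundaries(labels):
    """Return indices i where chunk i has a different style label from chunk i - 1."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

# Hypothetical per-chunk labels in reading order.
labels = ["P", "P", "non-P", "non-P", "non-P", "P"]
print(style_boundaries(labels))  # → [2, 5]
```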
But there was one big caveat: the researchers had to tell the algorithm how many stylistic "families" to split the text into. Asking for two gave a result that agreed generally with scholarly consensus, but asking for more than two produced divisions that strayed from it.
University of Pennsylvania professor of linguistics Mark Liberman, who wasn't connected with the research, noted that the big innovation was the use of synsets rather than just the location or frequency of words.
"The key to making such methods work is to hit on features (words or constructions or word-senses or whatever) that genuinely differentiate the authors," he said. "In their experiment on un-munging Jeremiah and Ezekiel, they found that word distributions did not work well; but synonym choice (as estimated in a clever way) did work."
That could make the algorithm useful for analyzing other historical texts, because it uses criteria that are not subject to interpretation. Ignoring what the writer "meant," it can quickly zero in on what was actually written. It can also pick up subtler changes in word use and distribution than a human can, since it can check more than a hundred synonym sets almost instantly.