A new study warns that less-common languages are in danger of disappearing from the Internet.
Languages including Icelandic, Latvian, Lithuanian and Maltese simply have too few speakers to gain a foothold, and too few examples online to power translation engines. While they are among those at the highest risk of digital extinction, no language other than English is safe. Even Dutch, French, German, Italian and Spanish were shown to have no better than "moderate support" when it came to the resources that fuel increasingly sophisticated technology such as speech-to-text and voice-controlled devices.
The study, "Europe's Languages in the Digital Age," was carried out by META-NET, a European nonprofit that aims to future-proof at least 30 of the 80 languages spoken in Europe. META-NET has designated today (Sept. 26) as The European Day of Languages.
To see how well languages are represented digitally, the researchers assessed language technology software, including spell and grammar checkers, virtual personal assistants such as Siri on the iPhone, online translators such as Google Translate, and car navigation systems.
Languages are often automatically translated by comparing each new sentence against thousands of sentences previously translated by people and stored in a database. The better the match, the more accurate the result. But statistical methods are doomed to fail in the case of languages with smaller pools of sample data, the study said.
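As a rough illustration of that matching idea, here is a minimal Python sketch. It is a toy under stated assumptions, not how Google Translate or any production engine works: the three-sentence "database," the names TRANSLATION_MEMORY and best_match, and the use of simple character similarity from Python's standard difflib module are all hypothetical stand-ins for the statistical methods the study describes.

```python
from difflib import SequenceMatcher

# Hypothetical toy database of sentences already translated by people
# (English -> German). Real systems store vastly more sentence pairs.
TRANSLATION_MEMORY = {
    "where is the train station": "wo ist der Bahnhof",
    "where is the hotel": "wo ist das Hotel",
    "i would like a coffee": "ich hätte gern einen Kaffee",
}


def best_match(sentence, memory):
    """Return (stored_sentence, stored_translation, similarity) for the
    closest stored sentence, or None if the database is empty."""
    if not memory:
        return None
    query = sentence.lower()
    # Score every stored sentence by character-level similarity (0.0 to 1.0).
    scored = (
        (SequenceMatcher(None, query, source).ratio(), source, target)
        for source, target in memory.items()
    )
    score, source, target = max(scored)
    return source, target, score


if __name__ == "__main__":
    for query in ("Where is the train station", "I would like a tea"):
        source, target, score = best_match(query, TRANSLATION_MEMORY)
        print(f"{query!r} -> closest: {source!r} ({score:.2f}) -> {target!r}")
```

A sentence that is already in the database matches with a similarity of 1.0. Remove entries from the database and the best available match for a new sentence gets worse, which is the situation the study describes for languages with little sample data.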
"The gap between 'big' and 'small' languages still keeps widening," Georg Rehm, co-editor of the study, said in a statement. "We have to make sure that we equip all smaller and under-resourced languages with the needed base technologies, otherwise these languages are doomed to digital extinction."
META-NET says the gaps in technology across European languages must be overcome to establish a single digital market, one where language does not hamper the flow of information. But the barriers are huge, the group says. The deep-rooted English-language focus of most research and development is one big hurdle. In addition, Europe's oldest languages are imperiled by inherent difficulties, as well as by little interest and insufficient funding on the part of tech companies.