Using Google to study languages

One of the many advantages of technology today is that we can better observe and analyse anything and everything. Researchers from the Royal Society of London have used the Google Books Ngram corpus to analyse 8 million books, which is about 6 percent of all books ever published according to Google’s own estimates. Because it draws on Google’s massive database, the study is shaping up to be the largest of its kind yet.

The researchers decided to look at written language because it is more conservative in its expressions than spoken language; there are also more records of it.

Ironically, the researchers themselves had a hard time understanding each other.

The lead author was Søren Wichmann, a Dane working at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany. The co-authors were Valery Solovyev, a linguist at Kazan Federal University in the Republic of Tatarstan in Russia, and Vladimir Bochkarev, an astrophysicist also at Kazan with an interest in languages. The study took place at the Kazan Linguistics Laboratory.

The research didn’t go smoothly because Wichmann did not speak Russian and Bochkarev did not speak English. Wichmann’s wife helped with occasional translations, but more often they used Google Translate, which apparently wasn’t very helpful.

One result of the study was that it showed how languages were shaped by culture

Words that were once specialised can take on a broader meaning, and vice versa. One example is ‘hound’, which was once the general English word for a dog. Today ‘hound’ refers to a specific type of dog.

In contrast, the word ‘vodka’ is replacing the word ‘liquor’ in some places: a specific word is taking on a general meaning, which is the reverse of what happened to ‘hound’.

Wichmann said: “Any major change in society will change the frequency of words.”

The researchers found that languages change at similar rates, and that change is normally slow enough to be measured in half-centuries unless something intervenes, such as war. When war breaks out, languages change rapidly, usually taking on new words that are specific to the conflict.

During the Victorian era, when Britain was a stable empire, language changed steadily. That lasted until the beginning of the 20th century, when times grew more chaotic and language change sped up.

The research also showed that, from around 1850, British English and American English drifted apart. For the first half of the 19th century British English was the same as American English, although its vocabulary lagged behind American English by about 20 years: new words entered the American lexicon and only appeared in British English 20 years later. However, British English began to catch up with the arrival of mass media from 1950, and today the two types of English are more similar than ever.

The study also revealed why some languages are more difficult to learn than others

The researchers noted that languages contain something linguists call a ‘kernel lexicon’: the list of words that together make up 75 percent of what is written in a language. When learning a new language, starting with its kernel lexicon and mastering it will help you understand much of its literature.
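As a rough illustration of the idea, the sketch below finds the smallest set of most-frequent words that covers 75 percent of the word occurrences in a text. The function name `kernel_lexicon`, the naive tokeniser and the toy sentence are all assumptions made for this example. It is not the researchers’ own method, which worked from the Google Books Ngram counts.

```python
import re
from collections import Counter


def kernel_lexicon(text, coverage=0.75):
    """Return the most frequent words that together reach `coverage` of all word occurrences."""
    # Naive tokeniser (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    kernel = []
    covered = 0
    # Walk from the most to the least frequent word until the target share is reached.
    for word, count in counts.most_common():
        if covered >= coverage * total:
            break
        kernel.append(word)
        covered += count
    return kernel


sample = "the cat sat on the mat and the dog sat on the cat"
print(kernel_lexicon(sample))
```

In the toy sentence, four words (‘the’, ‘cat’, ‘sat’ and ‘on’) account for 10 of the 13 word occurrences, so they form its kernel. A real corpus would need far more careful tokenisation than this.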

The English language has a kernel lexicon of 2,400 words out of a total of around 600,000 words. Russian has a kernel lexicon of 24,000 words, even though its total vocabulary is only about a sixth the size of that of English. Without knowing at least 21,000 of the words in the kernel lexicon, a reader will find most Russian writing incomprehensible.
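To put those figures side by side, the short calculation below works out what share of each language’s total vocabulary the kernel lexicon represents. It uses only the numbers quoted above. The Russian total of about 100,000 words is derived here from ‘a sixth’ of the English total, so treat it as an approximation.

```python
# Figures quoted in the article.
english_total = 600_000
english_kernel = 2_400
russian_kernel = 24_000

# Derived from "a sixth" of the English total; an approximation.
russian_total = english_total // 6

english_share = english_kernel / english_total
russian_share = russian_kernel / russian_total

print(f"English kernel: {english_share:.1%} of the vocabulary")
print(f"Russian kernel: {russian_share:.1%} of the vocabulary")
print(f"The Russian kernel is {russian_kernel // english_kernel} times the size of the English one")
```

On these figures, a learner of English needs well under 1 percent of the vocabulary to cover the kernel, while a learner of Russian needs roughly a quarter of it.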

Sometimes the change in the words we use tells us more than we think. For example, Wichmann pointed out that, in recent years, the word ‘divorce’ has been used more frequently than the word ‘marry’.

Article source.
