Does Google Translate speak "like a 10-year-old"?

Nov 7, 2011 by

[Thanks to Martin W. Lewis for inspiration for this post]

In several earlier posts (see here, here and here), I’ve already touched on the topic of Google Translate and its failures to… translate. But the argument continues, with more and more GT propaganda pieces appearing in the popular media on a regular basis. Here’s one of the latest examples: an article by Jeremy Kingsley in Slate. According to Mr. Kingsley, Google Translate “already speaks 57 languages as well as a 10-year-old” — but does it?!

The typical defence of Google Translate advocates is that it allows one to “get the gist of it”, but as I showed in the earlier posts, to say that, one has to define “the gist” very loosely. Here’s an additional example:

One question though: what interests our government have promoted to integer island. Nothing in their history, customs and Muslim clan, political and economic … neighbor, do not close to us except to ostracize them fly abroad. This department will remain a burden for our country. So to answer this question: why this referendum proposed pipe?

Did you get the gist of it? This is the GT-produced French-to-“English” “translation” (in fact, it’s neither English, nor a translation) of a comment from a forum discussion of Mayotte becoming an overseas department (département d’outre-mer) of France (the event itself is discussed in detail in an excellent post in Martin W. Lewis’s GeoCurrents blog).

Judging by GT’s translation, the commentator is not pleased with Mayotte’s incorporation into France, but why? It is hardly clear from this feeble attempt at translation. So here’s the original French passage:

“Une question tout de même : quels intérêts notre gouvernement a-t-il favorisés pour intéger cette île . Rien dans leur histoire , moeurs musulmanes et claniques , contexte politique et économique voisin… , ne les rapproche de nous sauf pour ostraciser l’étranger qui les vole . Ce département restera un poids pour notre pays. Donc qui répondra à cette question : pourquoi avoir proposé ce référendum pipé ?”

And here’s a human-made translation:

A question all the same: what interests has our government encouraged to integrate this island? Nothing in their history, Muslim and clannish morals, or local political or economic context moves them closer to us except to ostracize the foreigner who robs them. This department will remain a weight on our country. So who will respond to this question: why was this loaded referendum proposed?

This exercise in translation, GT and human, highlights the falsity of Mr. Kingsley’s statement (which he argues for throughout the article) that GT “speaks like a 10-year-old”. Nothing could be further from the truth. In fact, GT and a human child (speaking any language, it doesn’t matter which) handle language completely differently. And it shows in the results.

As the passage above shows, GT handles “the big words” best, rather than “the small words” or the grammar. It correctly translates gouvernement, politique, économique, ostraciser, référendum (what 10-year-old really knows such words?!). In contrast, it fails to translate moeurs, contexte and vole; misses a typo in intéger; and mishandles the idiomaticity of pipé, among other things.

The reason that GT can handle “the big words”, which would get you a lot of points in Scrabble, but not the “small words” is directly related to length. Longer words tend to be newer and less frequent in the language (as words “shrivel” with age and use). Such words are also less likely to be polysemous (i.e. have multiple meanings) or be part of homonym or homograph sets (homonyms are words that are pronounced the same but mean different things, e.g. mussel and muscle; homographs are spelled the same, but are pronounced and interpreted differently, e.g. the verb tear and the noun tear). Statistically speaking, the longer the word, the less likely it is to coincide in pronunciation (or in spelling) with another word. And, of course, the fewer different meanings, the easier it is to “translate”.

This is illustrated beautifully with the verb voler (vole in the above French passage). As a transitive verb, it means ‘to steal, to rob’ and as an intransitive verb, it means ‘to fly’. Any human — including a 10-year-old and even younger children — will be able to choose the correct translation because humans process the structure of the sentence. We just can’t help it; GT just can’t do it. In our passage, the subject of vole is the relative pronoun qui and the object is a pronominal clitic les (which appears before the verb, as pronominal clitics are known to do in French). Thus, this verb here is unmistakably ‘to rob’, not ‘to fly’.
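The rule at work in this example can be sketched in a few lines of Python. This is a toy illustration only, not a description of how GT or any real parser works: the function name gloss_voler, the short lists of verb forms and clitics, and the whitespace tokenization are all assumptions made for the example. It checks only for an object clitic immediately before the verb, so it would miss a transitive use with a full noun-phrase object after the verb.

```python
# Toy sketch: choose a gloss for French "voler" from sentence structure.
# The word lists below are hypothetical and deliberately incomplete.
OBJECT_CLITICS = {"le", "la", "les", "l'"}
VOLER_FORMS = {"vole", "voles", "volons", "volez", "volent"}


def gloss_voler(tokens):
    """Return 'rob' if a form of voler follows an object clitic, else 'fly'.

    Returns None when no form of voler occurs in the token list.
    """
    for i, tok in enumerate(tokens):
        if tok.lower() in VOLER_FORMS:
            # A direct-object clitic right before the verb signals the
            # transitive reading ('to steal, to rob').
            has_object_clitic = i > 0 and tokens[i - 1].lower() in OBJECT_CLITICS
            return "rob" if has_object_clitic else "fly"
    return None


print(gloss_voler("l’étranger qui les vole".split()))  # rob
print(gloss_voler("l’oiseau vole".split()))            # fly
```

Even this crude check gets the passage right where GT does not, because it looks at what stands next to the verb rather than at the verb alone.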

This also goes to show that, contrary to Mr. Kingsley’s claim that “there are more exceptions, qualifications, and ambiguities than rules and laws to follow”, ambiguities are often subject to rules too, even if these rules are more subtle than Mr. Kingsley would like.

Let’s note also that “the big words” are less likely to be grammatically irregular than “the small words”, which again — given the inability of GT to handle grammar — makes “the big words” easier to translate. Hence, the future tense form of the verb rester ‘to remain’ (restera in the passage) is translated by GT properly whereas the future tense form of the verb répondre ‘to respond’ (répondra in the passage) is not.

Furthermore, GT fails to analyze and/or render the syntactic structure of the original passage. It mishandles a direct question in the first sentence; two instances of coordinate structures in the second sentence; two instances of transitive structures (treated by GT as intransitives), also in the second sentence; another instance of a direct question in the fourth sentence; a complex analytical tense, also in the fourth sentence; and a participial modifier at the very end. In fact, only the third sentence, Ce département restera un poids pour notre pays, is translated correctly, or indeed into anything resembling grammatical English.

In contrast to GT, children handle grammar even if they don’t understand some of “the big words”. All the aspects of grammar that GT fails to “understand” (questions, coordinations, modification structures, verb-argument structures) are easily understood by 10-year-olds and even younger children. In fact, an average 5-year-old can do better than mistake a transitive verb for an intransitive one, even if the child does not know what either meaning (transitive or intransitive) is. Moreover, there is clear evidence for so-called syntactic bootstrapping: children learn the meaning of words that they don’t yet know by working them out from their syntactic context, much as we process Lewis Carroll’s famous lines:

’Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

In other words, although children acquiring their native tongue do not (yet) know the terminology like “subjects, nouns, verbs”, they do — contrary to Mr. Kingsley — “deconstruct sentence structure” and figure out the patterns behind those structures.

So “will Google’s computers understand language better than humans?” — hardly, if they don’t even attempt to understand language, but simply to find the statistically best matches.



  • Ran

    That GT example reminds me less of a ten-year-old than of the word salad sometimes seen in schizophrenia. But I suppose "already speaks 57 languages as well as some people with severe mental illness" doesn't have the same ring to it!

  • Margaret

    Very interesting. GT's translations remind me more of an adult language learner than a child learning his or her native language. The big words often have enough similarity to a word or root in a known language, but the little words, which include the troublesome verbs that must be conjugated and tricky prepositions, are crucial to meaning. The entire gist of a sentence can shift on the weight of a mere "little word," and I have even seen GT fail to include a negation in its so-called translation so that "I do not believe" becomes "I believe"!!

  • Asya Pereltsvaig

    @Ran: Good point! Thanks!

  • Asya Pereltsvaig

    @Margaret: Good point, I agree! Thank you!

  • Anonymous

    Very interesting post. As always, when it comes to Google Translate, we (translators) all agree on that.

    Nowadays, it is important to let people know how bad it can be and, certainly, posting this type of blog post can be helpful. It is central for clients to rely on a good and professional translation agency whenever they need some work to be done.

    The hard thing to do is to convince them of it!

  • Asya Pereltsvaig

    @Anonymous: Thank you for your comment and for the link to your agency. If I am going to introduce advertizing on my blog, will you consider advertizing here?

  • shivakumar

    Let's face it, what Google is trying to do with Translate is to take over a whole field of global business while filling its data vaults even more.
    So yeah, machine translators are ok when all u need is a quick understanding of some rather simple text and the BIG words in it. Using anything but a professional translator or translation agency is still a sure way to run your cross-border business endeavour into the ground.

  • Asya Pereltsvaig

    @shivakumar: Thank you for your comment and the link!

  • Christine

    Google Translate has developed into something quite useful over the years, and is now making heavy use of human input as well to perfect the results, but Google is still miles away from providing professional translations. For that you will have to turn to human translation professionals, and I do not see that changing in the next couple of years. Far from it: I believe translation and localization will become an even more popular field over the course of the next decade or so…. and it will not be dominated by machine translation.

    That said… it is good to know that our friend Google is there whenever all we need is basic understanding. I do not speak Chinese at all, for instance, but thanks to Google Translate I occasionally enjoy reading a blog post in Chinese. THAT is a wonderful invention indeed….

    • Asya Pereltsvaig

      In some cases it might be useful, but how do you know that the “translation” you get is adequate?

  • Asya Pereltsvaig

    Thanks!