Almost lost amid all the talk about the controversy over Google’s plans to digitise millions of books, possibly infringing authors’ copyright in the process, was Google’s announcement last week that its instant translation function will be available to users of its Gmail service. The idea is that Gmail users can instantly translate an email into any of 41 languages at the click of a mouse button. Sounds pretty good, doesn’t it? The horrible thing is that it actually is rather good. Google’s instant translation technology is based on statistical machine translation, which, rather than using rigid rules to define how sentences should be translated, performs statistical analyses of large corpora (collections of natural text) to work out how to translate. The result is better translations and fewer mistakes than rule-based systems typically produce.
I’ve used Google’s translation tool for various things over the past year or so: in my translation technology classes to demonstrate how far MT has come over the decades, to quickly decipher websites in languages I don’t speak, and even to book hotels by email. I’ve been impressed by some pretty decent translations, even though it still gets some things spectacularly wrong or simply doesn’t translate them at all. Google is the first to admit that its system isn’t perfect, claiming only that, at the very least, users will be able to get the gist of a text. Fair enough. At least the company is honest and realistic about the capabilities and limitations of its product.

I am slightly worried, however, that over time people will settle for “pretty decent” and stop demanding high-quality translations. Obviously nobody in their right mind would dream of using machine translation for important texts, but if clunky, unidiomatic and incomplete translations become the norm for the small things, might we become blind to these foibles and start to consider MT for important things too? Just look at how the “text speak” used in SMS messages has made its way into normal writing. Could mangled, machine-translated language (Google-ese, if you will) eventually become accepted as “proper” usage? So while I’m all in favour of advances in machine translation, from both a linguist’s and a nerd’s point of view, maybe it should come with a health warning against overuse.