Motherboard reported on Reddit users who had discovered that typing a series of nonsense letters into Google Translate (or even just the word “dog” over and over) spat out a translation that read like some sort of religious prophecy. These “translations” spawned an entire subreddit of people hunting for messages, many of them creepy, several of them probably faked.
People like seeing the creepy side of the algorithm and using it as an “end is nigh” signal that somehow the machine means us harm. To combat the negativity, I’ve done my best to generate positive affirmations using nonsense text translated from Somali.
Sean Colbath, a senior scientist at BBN Technologies, told Motherboard that it might be the translator’s algorithm trying to find order in the chaos: “If they tried to build a model out of that stuff, it may be that the model simply throws a hail-mary pass (pun semi-intended) and barfs out a piece of its training.” He emphasized that he was speaking for himself rather than for his employer.
The interesting thing is how many of the chunks of “TranslateGate” translation seem to be pulled from documents online. Some people are even speculating that they might have been pulled from private messages, though Google has denied this. From Reddit:
The data theory: this theory proposes that the AI used in Google Translate is taking its training data and repeating it when asked to translate the nonsense that was inputted
The AI theory: the AI theory says that the gibberish put in can be a broken-up question or statement to ask an AI, using obscure languages as a backdoor
The crawler theory: first proposed by u/Muhnamajeff, this theory explains that Google uses various little “robots” to gather website data for search results; Translate may be using crawlers to gather translation data from website HTML and text
EDIT 12/25/17 — u/Muhnamajeff has proven that this could indeed be coming from private messages too. Google has yet to even acknowledge this HUGE breach in privacy. Here’s the post. https://www.reddit.com/r/TranslateGate/comments/7m4yt4/i_went_translating_for_a_bit_found_some_weird/?utm_source=reddit-android
EDIT 12/25/17 — Google has acknowledged the bug and is sure that it is not coming from private messages. you can read more about this here
Google Translate has long relied on a method called “statistical machine translation.” Statistical machine translation works by gathering as much parallel text as possible, meaning the same content written in two languages. Translate then looks for patterns across those paired texts to work out how to translate something similar in the future. For this purpose, Google uses works that have been translated into as many languages as possible, like UN documents and the Bible.
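To make the pattern-finding idea concrete, here is a minimal sketch in Python. It is a toy and not Google's actual system: the tiny English-French corpus and the names `PARALLEL`, `train`, and `translate_word` are all made up for illustration. It simply counts which target words show up alongside each source word in paired sentences and picks the most frequent one.

```python
from collections import Counter, defaultdict

# Hypothetical toy parallel corpus: (source sentence, target sentence).
PARALLEL = [
    ("the dog", "le chien"),
    ("the cat", "le chat"),
    ("a dog", "un chien"),
    ("a cat", "un chat"),
]


def train(pairs):
    """Count how often each source word co-occurs with each target word."""
    cooc = defaultdict(Counter)
    for src, tgt in pairs:
        for s in src.split():
            for t in tgt.split():
                cooc[s][t] += 1
    return cooc


def translate_word(cooc, word):
    """Return the target word seen most often alongside `word`, or None."""
    if word not in cooc:
        return None  # no parallel evidence for this word at all
    return cooc[word].most_common(1)[0][0]


if __name__ == "__main__":
    model = train(PARALLEL)
    for w in ["dog", "cat", "the", "zzz"]:
        print(w, "->", translate_word(model, w))
```

Real statistical systems use far more careful alignment and scoring than raw co-occurrence counts, but the principle is the same: the more paired text there is, the more reliable the counts become.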
For some languages, bilingual texts are so scarce that there’s not much to pull from, so when you input nonsense into the machine, the machine gives you back its best guess at what you’re statistically trying to say. Google has also been rolling out Neural Machine Translation methods that can help with the accuracy of the language translation process. In the end, this isn’t really a “glitch” per se, but the machine trying to parse nonsense using all the nonsense that has come before.
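The “best guess” behavior can be sketched the same way. The Python toy below is an illustration under stated assumptions, not a description of how Google Translate works: it assumes a translator that must always output something fluent, a hypothetical one-entry lexicon (`PHRASE_TABLE`), and a made-up scrap of target-language training text (`TARGET_TEXT`). When none of the input words are known, it falls back on the most likely word sequence from its training text, which ends up being a training sentence repeated back.

```python
from collections import Counter, defaultdict

# Hypothetical target-side training text for a language pair with little data.
TARGET_TEXT = [
    "blessed are the peacemakers",
    "blessed are the peacemakers",
    "the harvest was good",
]

# Hypothetical lexicon; "aaa" is a placeholder token, not a real word.
PHRASE_TABLE = {"aaa": "dog"}


def train_bigrams(sentences):
    """Count which word follows which in the training text."""
    bigrams = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            bigrams[prev][nxt] += 1
    return bigrams


def most_fluent_sentence(bigrams, max_len=20):
    """Greedily follow the most common next word from the sentence start."""
    words, current = [], "<s>"
    for _ in range(max_len):
        current = bigrams[current].most_common(1)[0][0]
        if current == "</s>":
            break
        words.append(current)
    return " ".join(words)


def translate(text, phrase_table, bigrams):
    """Translate known words; with no known words, emit the fluent fallback."""
    known = [phrase_table[w] for w in text.split() if w in phrase_table]
    if known:
        return " ".join(known)
    return most_fluent_sentence(bigrams)


if __name__ == "__main__":
    lm = train_bigrams(TARGET_TEXT)
    print(translate("aaa", PHRASE_TABLE, lm))
    print(translate("xq zzk xq zzk", PHRASE_TABLE, lm))
```

With gibberish as input, nothing in the lexicon matches, so the output is driven entirely by what the model has seen before. If most of that text comes from a single kind of document, the “translation” will sound like that document.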