Technology Review (09/25/13)
Google researchers have developed a translation technique based on vector space mathematics. Instead of relying on versions of the same document in different languages, the technique uses data mining to model the structure of a single language and then compares it with the structure of another language. The method rests on the notion that all languages must describe a similar set of ideas, and so need similar words to do so. The researchers found a way to represent a language through the relationships between its words. The set of all these relationships, or the language space, can be visualized as a set of vectors pointing from one word to another; linguists recently have found that these vectors can be treated mathematically. Translating one language into another then becomes the mathematical task of finding the transformation that converts one vector space into the other. To map the vector spaces, the researchers use a small, human-built bilingual dictionary that pairs the same set of words in the two languages; these pairs lay the groundwork for the linear transformation. The mapping can then be applied to the larger language spaces. The researchers note that although their method is simple, it achieves almost 90 percent precision for English-Spanish translation, and is equally effective with less closely related languages, such as English and Vietnamese.
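The mapping step can be illustrated with a minimal sketch. It assumes toy random vectors in place of real word vectors, an ordinary least-squares fit for the linear transformation, and a cosine-similarity lookup for the final word choice; the function names `learn_mapping` and `translate`, the word lists, and the hidden matrix are all hypothetical and are not taken from the researchers' work.

```python
import numpy as np


def learn_mapping(src_vecs, tgt_vecs):
    """Fit W by least squares so that src_vecs @ W approximates tgt_vecs."""
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W


def translate(vec, W, tgt_words, tgt_matrix):
    """Map a source vector into the target space and return the
    target word whose vector has the highest cosine similarity."""
    mapped = vec @ W
    sims = (tgt_matrix @ mapped) / (
        np.linalg.norm(tgt_matrix, axis=1) * np.linalg.norm(mapped)
    )
    return tgt_words[int(np.argmax(sims))]


# Toy data (assumption): 3-dimensional random "English" vectors, and
# "Spanish" vectors produced by an unknown linear map of the same points.
rng = np.random.default_rng(0)
en_words = ["one", "two", "three", "four", "five", "cat", "dog", "horse"]
es_words = ["uno", "dos", "tres", "cuatro", "cinco", "gato", "perro", "caballo"]
en_vecs = rng.normal(size=(8, 3))
hidden_map = rng.normal(size=(3, 3))
es_vecs = en_vecs @ hidden_map

# The small bilingual dictionary: only the first five word pairs are known.
dictionary = slice(0, 5)
W = learn_mapping(en_vecs[dictionary], es_vecs[dictionary])

# Apply the learned mapping to words that were not in the dictionary.
for i in range(5, 8):
    print(en_words[i], "->", translate(en_vecs[i], W, es_words, es_vecs))
```

In this noise-free toy setup the two spaces differ by an exact linear map, so five dictionary pairs are enough to recover it. Real word vectors are only approximately related in this way, which is why the reported precision is below 100 percent.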