Machine translation

Machine translation, sometimes referred to by the abbreviation MT (not to be confused with computer-aided translation, machine-aided human translation (MAHT), or interactive translation), is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another.

On a basic level, MT performs simple substitution of words in one natural language for words in another, but that alone usually cannot produce a good translation of a text, because recognition of whole phrases and their closest counterparts in the target language is needed. Solving this problem with corpus and statistical techniques is a rapidly growing field that is leading to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies.
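
The difference between word substitution and phrase recognition can be illustrated with a minimal sketch. The two lookup tables and both function names below are invented for illustration and are not taken from any real system; the phrase-aware version simply prefers the longest known phrase before falling back to single words.

```python
# Toy English-to-French tables; every entry is an illustrative assumption.
WORD_TABLE = {"i": "je", "am": "suis", "hungry": "affamé"}
PHRASE_TABLE = {("i", "am", "hungry"): "j'ai faim"}


def translate_word_by_word(sentence):
    """Replace each source word independently; unknown words pass through."""
    return " ".join(WORD_TABLE.get(w, w) for w in sentence.lower().split())


def translate_with_phrases(sentence, max_len=3):
    """Greedy longest match: try a known phrase first, then single words."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASE_TABLE:
                out.append(PHRASE_TABLE[chunk])
                i += n
                break
        else:
            # No phrase starts here, so fall back to word substitution.
            out.append(WORD_TABLE.get(words[i], words[i]))
            i += 1
    return " ".join(out)


print(translate_word_by_word("I am hungry"))   # literal rendering
print(translate_with_phrases("I am hungry"))   # idiomatic rendering
```

With these tables the word-by-word function yields the literal "je suis affamé", whereas the phrase-aware function yields the idiomatic "j'ai faim".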

Current machine translation software often allows for customisation by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text.

Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators and, in a very limited number of cases, can even produce output that can be used as is (e.g., weather reports).
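
One way such user input might be exploited is sketched below. The square-bracket marking convention, the toy lexicon, and the function names are all assumptions made for this example; real systems use their own markup. Words the user has bracketed are copied through untouched, while everything else goes through the ordinary lookup.

```python
import re

# Toy English-to-French lexicon (illustrative only).
TOY_LEXICON = {"mr": "M.", "is": "est", "a": "un", "baker": "boulanger"}


def toy_translate_word(token):
    """Look the word up case-insensitively; unknown words pass through."""
    return TOY_LEXICON.get(token.lower(), token)


def translate_with_marked_names(text):
    """Copy [bracketed] tokens unchanged and translate everything else.

    The bracket convention is hypothetical; punctuation attached to a
    bracketed token is not handled in this sketch.
    """
    out = []
    for token in text.split():
        marked = re.fullmatch(r"\[(.+)\]", token)
        out.append(marked.group(1) if marked else toy_translate_word(token))
    return " ".join(out)


print(translate_with_marked_names("Mr Baker is a baker"))
print(translate_with_marked_names("Mr [Baker] is a baker"))
```

Without the marking, the surname is wrongly translated as a common noun; with it, the name survives intact.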

The progress and potential of machine translation have been much debated throughout its history. Since the 1950s, a number of scholars have questioned the possibility of achieving fully automatic machine translation of high quality. Some critics claim that there are in-principle obstacles to automating the translation process.

History



The idea of machine translation may be traced back to the 17th century. In 1629, René Descartes proposed a universal language, with equivalent ideas in different tongues sharing one symbol. The Georgetown experiment (1954) involved fully automatic translation of over sixty Russian sentences into English. The experiment was a great success and ushered in an era of substantial funding for machine-translation research. The authors claimed that within three to five years, machine translation would be a solved problem.

Real progress was much slower, however, and after the ALPAC report (1966), which found that the ten-year-long research had failed to fulfill expectations, funding was greatly reduced. Beginning in the late 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for machine translation.

The idea of using digital computers for translation of natural languages was proposed as early as 1946 by A. D. Booth and possibly others. Warren Weaver wrote an important memorandum "Translation" in 1949. The Georgetown experiment was by no means the first such application, and a demonstration was made in 1954 on the APEXC machine at Birkbeck College (University of London) of a rudimentary translation of English into French. Several papers on the topic were published at the time, and even articles in popular journals (see for example Wireless World, Sept. 1955, Cleave and Zacharov). A similar application, also pioneered at Birkbeck College at the time, was reading and composing Braille texts by computer.

Translation process



The human translation process may be described as:
  1. Decoding the meaning of the source text; and
  2. Re-encoding this meaning in the target language.


Behind this ostensibly simple procedure lies a complex cognitive operation. To decode the meaning of the source text in its entirety, the translator must interpret and analyse all the features of the text, a process that requires in-depth knowledge of the grammar, semantics, syntax, idioms, etc., of the source language, as well as the culture of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the target language.

Therein lies the challenge in machine translation: how to program a computer that will "understand" a text as a person does, and that will "create" a new text in the target language that "sounds" as if it has been written by a person.

This problem may be approached in a number of ways.

Approaches



Machine translation can use a method based on linguistic rules, which means that words will be translated in a linguistic way: the most suitable (orally speaking) words of the target language will replace the ones in the source language.

It is often argued that the success of machine translation requires the problem of natural language understanding to be solved first.

Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. According to the nature of the intermediary representation, an approach is described as interlingual machine translation or transfer-based machine translation. These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.
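
The analyse, transfer, and generate stages described above can be sketched in a few lines. This is a toy under stated assumptions: the four-word lexicon, the part-of-speech tags, and the single reordering rule (English adjective-noun becomes French noun-adjective) are invented for illustration, and a real rule-based system would need far richer morphology and many more rules.

```python
# Toy lexicon: source word -> (part-of-speech tag, target word).
# All entries are illustrative assumptions.
LEXICON = {
    "the": ("DET", "le"),
    "black": ("ADJ", "noir"),
    "cat": ("NOUN", "chat"),
    "sleeps": ("VERB", "dort"),
}


def analyse(sentence):
    """Analysis: build a tagged intermediate representation.

    Raises KeyError for words missing from the toy lexicon.
    """
    return [(LEXICON[w][0], w) for w in sentence.lower().split()]


def transfer(tagged):
    """Transfer: apply one structural rule, ADJ NOUN -> NOUN ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][0] == "ADJ" and out[i + 1][0] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def generate(tagged):
    """Generation: emit the target-language word for each node."""
    return " ".join(LEXICON[word][1] for _, word in tagged)


def translate(sentence):
    return generate(transfer(analyse(sentence)))


print(translate("The black cat sleeps"))
```

For the sample sentence the pipeline prints "le chat noir dort"; without the transfer step the output would keep the English word order.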

Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what a native speaker of another language has written. The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods. But then, the grammar methods need a skilled linguist to carefully design the grammar that they use.

To translate between closely related languages, a technique referred to as shallow-transfer machine translation may be used.

Rule-based


The rule-based machine translation paradigm includes transfer-based machine translation, interlingual machine translation and dictionary-based machine translation paradigms.
Transfer-based machine translation
Interlingual
Interlingual machine translation is one instance of rule-based machine-translation approaches. In this approach, the source language, i.e. the text to be translated, is transformed into an interlingual, i.e. source-/target-language-independent representation. The target language is then generated out of the interlingua.

Dictionary-based
Machine translation can use a method based on dictionary entries, which means that the words will be translated as they are by a dictionary.

Statistical


Statistical machine translation tries to generate translations using statistical methods based on bilingual text corpora, such as the Canadian Hansard corpus, the English-French record of the Canadian parliament, and EUROPARL, the record of the European Parliament. Where such corpora are available, impressive results can be achieved translating texts of a similar kind, but such corpora are still very rare. The first statistical machine translation software was CANDIDE from IBM. Google used SYSTRAN for several years, but switched to a statistical translation method in October 2007. Google then trained its system on approximately 200 billion words from United Nations materials, which improved the accuracy of its translations.
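
How word translation probabilities can be learned from a bilingual corpus is sketched below, using a simplified form of the word-alignment model usually called IBM Model 1, trained by expectation-maximisation. The three-sentence German-English corpus and the function name are assumptions for illustration, the usual NULL source word is omitted, and nothing here is claimed to reflect how any system named above was actually built.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(e, f)] = P(target word e | source word f) by EM.

    `pairs` is a list of (source_tokens, target_tokens) sentence pairs.
    Simplification: no NULL word on the source side.
    """
    src_vocab = {f for src, _ in pairs for f in src}
    tgt_vocab = {e for _, tgt in pairs for e in tgt}
    # Start from a uniform distribution over target words.
    t = {(e, f): 1.0 / len(tgt_vocab) for e in tgt_vocab for f in src_vocab}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected alignment counts.
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise the counts into probabilities.
        for (e, f) in t:
            t[(e, f)] = count[(e, f)] / total[f]
    return t


corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
table = train_ibm_model1(corpus)
for f in ("das", "haus", "buch", "ein"):
    best = max(("the", "house", "book", "a"), key=lambda e: table[(e, f)])
    print(f, "->", best)
```

Even on this tiny corpus the co-occurrence statistics are enough to pair each German word with its most likely English counterpart; real systems rely on corpora of millions of sentence pairs.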

Example-based


The example-based machine translation (EBMT) approach was proposed by Makoto Nagao in 1984. It is often characterised by its use of a bilingual corpus as its main knowledge base, at run-time. It is essentially a translation by analogy and can be viewed as an implementation of the case-based reasoning approach of machine learning.
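
The retrieval step of translation by analogy can be sketched as a nearest-example search over a bilingual corpus held in memory at run-time. The example pairs, the word-overlap (Jaccard) similarity, and the function names are assumptions chosen for brevity; a full EBMT system would also adapt the retrieved translation to the differences between the input and the matched example, which this sketch does not attempt.

```python
# Tiny bilingual example base (illustrative English-French pairs).
EXAMPLES = [
    ("how much is that red umbrella", "combien coûte ce parapluie rouge"),
    ("where is the station", "où est la gare"),
    ("i would like a coffee", "je voudrais un café"),
]


def jaccard(a, b):
    """Word-set overlap: size of intersection divided by size of union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def closest_example(sentence):
    """Return the stored (source, target) pair most similar to the input."""
    words = sentence.lower().split()
    return max(EXAMPLES, key=lambda ex: jaccard(words, ex[0].split()))


source, target = closest_example("where is the hotel")
print(source, "->", target)
```

For the input "where is the hotel" the closest stored example is "where is the station"; the remaining work of an EBMT system would be to replace the translation of "station" with that of "hotel".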

Hybrid MT


Hybrid machine translation (HMT) leverages the strengths of statistical and rule-based translation methodologies. Several MT companies (Asia Online, LinguaSys, Systran, PangeaMT, UPV) are claiming to have a hybrid approach using both rules and statistics. The approaches differ in a number of ways:
  • Rules post-processed by statistics: Translations are performed using a rules-based engine. Statistics are then used in an attempt to adjust or correct the output from the rules engine.
  • Statistics guided by rules: Rules are used to pre-process data in an attempt to better guide the statistical engine. Rules are also used to post-process the statistical output to perform functions such as normalization. This approach has a lot more power, flexibility and control when translating.
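
The second arrangement, statistics guided by rules, can be pictured as a three-stage pipeline. Everything in the sketch below is hypothetical: the normalisation rules, the probability table standing in for a trained statistical engine, and the function names are invented for illustration and do not describe any vendor's product.

```python
import re


def pre_rules(text):
    """Rule stage before the engine: collapse whitespace and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def toy_statistical_engine(text):
    """Stand-in for a trained engine: choose the most probable target word.

    The candidate table and its probabilities are made up for this sketch.
    """
    table = {
        "good": [("bon", 0.7), ("bien", 0.3)],
        "morning": [("matin", 0.9), ("matinée", 0.1)],
    }
    return " ".join(
        max(table.get(word, [(word, 1.0)]), key=lambda cand: cand[1])[0]
        for word in text.split()
    )


def post_rules(text):
    """Rule stage after the engine: capitalise and add a final full stop."""
    text = text[:1].upper() + text[1:]
    return text if text.endswith(".") else text + "."


def hybrid_translate(text):
    return post_rules(toy_statistical_engine(pre_rules(text)))


print(hybrid_translate("  Good   MORNING "))
```

The rule stages here only normalise the text on either side of the statistical choice; swapping the order of the components would give the first arrangement, rules post-processed by statistics.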

Disambiguation



Word-sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by Yehoshua Bar-Hillel. He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word. Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches.

Shallow approaches assume no knowledge of the text. They simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.
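
A shallow approach of this kind can be sketched as a vote among the words surrounding the ambiguous word. The co-occurrence counts below are invented for illustration (in practice they would be gathered from a corpus), and the function name and window size are likewise assumptions.

```python
# Hypothetical co-occurrence counts: for each French rendering of "bank",
# how often a given context word appeared near it in some corpus.
SENSE_CONTEXT = {
    "bank": {
        "banque": {"money": 5, "account": 4, "loan": 3},
        "rive": {"river": 6, "water": 3, "fishing": 2},
    }
}


def disambiguate(word, sentence, window=3):
    """Pick the sense whose context counts best match the nearby words.

    Only the first occurrence of `word` is considered; ties go to the
    sense listed first.
    """
    words = sentence.lower().split()
    idx = words.index(word)
    context = words[max(0, idx - window):idx] + words[idx + 1:idx + 1 + window]
    scores = {
        sense: sum(counts.get(w, 0) for w in context)
        for sense, counts in SENSE_CONTEXT[word].items()
    }
    return max(scores, key=scores.get)


print(disambiguate("bank", "he sat on the bank of the river"))
print(disambiguate("bank", "she opened an account at the bank"))
```

The first sentence selects "rive" because "river" falls inside the window, and the second selects "banque" because of "account". No understanding of the text is involved, which is both the strength and the limitation of the approach.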

The late Claude Piron, a long-time translator for the United Nations and the World Health Organization, wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved:
Why does a translator need a whole workday to translate five pages, and not an hour or two? ..... About 90% of an average text corresponds to these simple conditions. But unfortunately, there's the other 10%. It's that part that requires six [more] hours of work. There are ambiguities one has to resolve. For instance, the author of the source text, an Australian physician, cited the example of an epidemic which was declared during World War II in a "Japanese prisoner of war camp". Was he talking about an American camp with Japanese prisoners or a Japanese camp with American prisoners? The English has two senses. It's necessary therefore to do research, maybe to the extent of a phone call to Australia.


The ideal deep approach would require the translation software to do all the research necessary for this kind of disambiguation on its own; but this would require a higher degree of AI than has yet been attained. A shallow approach which simply guessed at the sense of the ambiguous English phrase that Piron mentions (based, perhaps, on which kind of prisoner-of-war camp is more often mentioned in a given corpus) would have a reasonable chance of guessing wrong fairly often. A shallow approach that involves "ask the user about each ambiguity" would, by Piron's estimate, only automate about 25% of a professional translator's job, leaving the harder 75% still to be done by a human.

Named entities



Related to named entity recognition in information extraction.

Applications


There are now many software programs for translating natural language, several of them online, such as:
  • Anusaaraka, a free, open-source machine translation system from English to Hindi, based on Panini grammar and using state-of-the-art NLP tools. It can be used online and downloaded from http://anusaaraka.iiit.ac.in
  • Apertium, a free and open source machine translation platform (WinXLator gives this a Windows GUI, but it is likely to be in violation of the Apertium GPL license)
  • AppTek, which released a hybrid MT system in 2009.
  • Arabic machine translation in multilingual framework.
  • Asia Online http://www.asiaonline.net provides a custom machine translation engine building capability that they claim gives near-human quality compared to the "gist" based quality of free online engines. Asia Online also provides tools to edit and create custom machine translation engines with their Language Studio suite of products.
  • Bing Translator http://www.microsofttranslator.com/ a free online translator from Microsoft.
  • Cunei http://www.cunei.org/ An open-source platform for data-driven machine translation released under the MIT license. Platform-independent Java code with both a command-line and graphical interface.
  • DocTranslator A web service which uses the Google Translate API to automatically translate and return Office document files (Word, Excel, PowerPoint, PDF) while preserving the original document layouts.
  • English to Punjabi Translation http://h2p.learnpunjabi.org/eng2pun.aspx Web based English to Punjabi Machine Translation System.
  • Google Translate, a free online translator from Google.
  • Google Translator Toolkit, a web service designed to allow translators to edit the translations that Google Translate automatically generates. With the Google Translator Toolkit, translators can organize their work and use shared translations, glossaries and translation memories.
  • Hindi to Punjabi Machine Translation System http://h2p.learnpunjabi.org, provides machine translation using a direct approach. It translates Hindi into Punjabi. It also features writing e-mail in the Hindi language and sending the same in Punjabi to the recipient.
  • Punjabi to Hindi Machine Translation System http://www.jgmatrix.com, provides machine translation using a direct approach. It translates Punjabi to Hindi. It also features converting any website in Punjabi to Hindi on the fly. The Punjabi website must be in Unicode.
  • IdiomaX, which powers online translation services at idiomax.com
  • Localization tools, such as Alchemy Catalyst and Multilizer.
  • Jibbigo http://www.Jibbigo.com sells a bidirectional, offline, speech-to-speech translation app for Apple's App Store and the Android Market.
  • Kilgray memoQ A translation environment provided for human translators. http://kilgray.com/products/memoq
  • LetsMT! http://www.letsmt.com Cloud-based platform for generation of custom MT engines from user-provided data. Powered by Moses.
  • LinguaSys http://www.linguasys.net provides highly customized hybrid machine translation that can go from any language to any language.
  • Lucy Software http://www.lucysoftware.com Translates in several European languages.
  • Moses, a free software statistical machine translation engine released under the LGPL license for Windows and Linux.
  • NeuroTran is a software translator that translates books, web pages, documents, e-mails, faxes, memos, manuals, reports, spreadsheets, correspondence, letters and more to and from many languages. For Windows and Macintosh.
  • Power Translator
  • Promt, which powers online translation services at Voila.fr and Orange.fr
  • SDL ETS and SDL Language Weaver, which power FreeTranslation.com (website)
  • SDL Passolo
  • SiShiTra — A hybrid machine translation engine for Spanish-Catalan translation.
  • SYSTRAN, which powers Yahoo! Babel Fish
  • Ta with you http://www.tauyou.com is specialized in customized machine translation solutions in any language. Their web-based user interface makes it easy for any Language Service Provider to generate any combination of domain and language pair to achieve the best quality. Their solution works with almost human quality for a wide variety of language pairs.
  • Tilde Translator http://translate.tilde.com Free online translator for the Latvian language. Also provides free apps for Android and iOS.
  • Toggletext uses a transfer-based system (known as Kataku) to translate between English and Indonesian.
  • Translate and Back http://trans121.com A free online round-trip machine translation tool, which enables checking correctness by back translation. Contains virtual keyboards and human voice. Suitable for right to left languages, as well.
  • Yandex http://translate.yandex.ru/ translates between English and Russian and Ukrainian.
  • Worldlingo provides machine translation using both statistical and rule-based translation engines (TEs). Most recognizable as the MT partner in Microsoft Windows and Microsoft Mac Office.
  • Yahoo! Babel Fish, powered by SYSTRAN


A number of translation software programs are available free of charge, e.g. ForeignDesk, the multiplatform Okapi Framework, GTS Website Translator and OmegaT+.

While no system provides the holy grail of fully automatic high-quality machine translation of unrestricted text, many fully automated systems produce reasonable output. The quality of machine translation is substantially improved if the domain is restricted and controlled.

Despite their inherent limitations, MT programs are used around the world. Probably the largest institutional user is the European Commission. The MOLTO project, for example, coordinated by the University of Gothenburg, received more than 2.375 million euros in project support from the EU to create a reliable translation tool that covers a majority of the EU languages. http://www.molto-project.eu/

Google has claimed that promising results were obtained using a proprietary statistical machine translation engine. The statistical translation engine used in the Google language tools for Arabic <-> English and Chinese <-> English achieved an overall BLEU-4 score of 0.4281, ahead of the runner-up IBM's score of 0.3954 (Summer 2006), in tests conducted by the National Institute of Standards and Technology.

With the recent focus on terrorism, military sources in the United States have been investing significant amounts of money in natural language engineering. In-Q-Tel (a venture capital fund, largely funded by the US Intelligence Community, to stimulate new technologies through private sector entrepreneurs) fostered companies like Language Weaver. Currently the military community is interested in translation and processing of languages like Arabic, Pashto, and Dari. The Information Processing Technology Office in DARPA hosts programs like TIDES and Babylon Translator. The US Air Force has awarded a $1 million contract to develop a language translation technology.

The notable rise of social networking on the web in recent years has created yet another niche for the application of machine translation software (in utilities such as Facebook, or instant messaging clients such as Skype, GoogleTalk, MSN Messenger, etc.), allowing users speaking different languages to communicate with each other. Machine translation applications have also been released for most mobile devices, including mobile telephones, pocket PCs, PDAs, etc. Due to their portability, such instruments have come to be designated as mobile translation tools, enabling mobile business networking between partners speaking different languages, or facilitating both foreign language learning and unaccompanied travel to foreign countries without the intermediation of a human translator.

Evaluation



Machine translation systems and output can be evaluated along numerous dimensions. The intended use of the translation, characteristics of the MT software, the nature of the translation process, etc., all affect how one evaluates MT systems and their output. The FEMTI taxonomy of dimensions, with associated evaluation metrics, appears at http://www.issco.unige.ch:8080/cocoon/femti/st-home.html .

There are various means for evaluating the output quality of machine translation systems. The oldest is the use of human judges to assess a translation's quality. Even though human evaluation is time-consuming, it is still the most reliable way to compare different systems such as rule-based and statistical systems. Automated means of evaluation include BLEU, NIST and METEOR.
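To give a feel for how an automated metric such as BLEU compares machine output with a human reference, here is a simplified sketch. It assumes a single reference, whitespace tokenization, and no smoothing, so it is not the official scoring procedure used in evaluations such as NIST's; the function names `ngrams` and `bleu` are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped overlap: a candidate n-gram is credited at most as many
        # times as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


print(bleu("the cat sat on mat", "the cat sat on the mat"))
```

A score of 1.0 means the candidate matches the reference exactly; scores closer to 0 mean less n-gram overlap. Because the metric only measures overlap with a reference, it complements rather than replaces human judgment.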

Relying exclusively on unedited machine translation ignores the fact that communication in human language is context-embedded and that it takes a person to comprehend the context of the original text with a reasonable degree of probability. It is certainly true that even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be useful to a human being and that publishable-quality translation is achieved, such translations must be reviewed and edited by a human. The late Claude Piron wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved. Such research is a necessary prelude to the pre-editing necessary in order to provide input for machine-translation software such that the output will not be meaningless.

In certain applications, however, e.g., product descriptions written in a controlled language, a dictionary-based machine-translation system has produced satisfactory translations that require no human intervention save for quality inspection.
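The word-by-word idea behind a dictionary-based system can be sketched as follows. The tiny English-to-Spanish dictionary and the function name `translate_word_by_word` are hypothetical; the sketch does no morphological analysis or reordering, which is why it is only plausible for tightly controlled input.

```python
def translate_word_by_word(sentence, dictionary):
    """Translate each word by dictionary lookup, keeping the word order.

    Words missing from the dictionary are passed through unchanged and
    reported, so that a human inspector can review them.
    """
    translated, unknown = [], []
    for word in sentence.lower().split():
        if word in dictionary:
            translated.append(dictionary[word])
        else:
            translated.append(word)
            unknown.append(word)
    return " ".join(translated), unknown


# Hypothetical controlled-language dictionary (an assumption for illustration).
en_es = {"cable": "cable", "length": "longitud", "color": "color", "red": "rojo"}

text, flagged = translate_word_by_word("Cable length 2 m", en_es)
print(text)
print(flagged)
```

Reporting the unmatched words separately reflects the quality-inspection step mentioned above: the output is used as is, and a person only checks what the dictionary could not cover.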

See also

  • Comparison of machine translation applications
  • Statistical machine translation
  • Artificial Intelligence
  • Cache language model
  • Computational linguistics
  • Universal Networking Language
  • Computer-assisted translation and Translation memory
  • Controlled natural language
  • Fuzzy matching
  • Postediting
  • History of machine translation
  • Human Language Technology
  • Language barrier
  • List of emerging technologies
  • List of research laboratories for machine translation
  • Pseudo-translation
  • Translation
  • Translation memory
  • Universal translator
  • Phraselator
  • Mobile translation


External links