Machine translation, sometimes referred to by the abbreviation MT (not to be confused with computer-aided translation, machine-aided human translation (MAHT) and interactive translation), is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another.
On a basic level, MT performs simple substitution of words in one natural language for words in another, but that alone usually cannot produce a good translation of a text, because recognition of whole phrases and their closest counterparts in the target language is needed. Solving this problem with corpus and statistical techniques is a rapidly growing field that is leading to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies.
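The gap between word substitution and phrase recognition can be illustrated with a small sketch. The lexicon, the phrase table and both function names below are hypothetical, invented for illustration; real systems use far larger resources and more careful matching.

```python
# Toy English-to-French resources (hypothetical entries, for illustration only).
WORD_TABLE = {
    "it": "il", "is": "est", "raining": "pleut",
    "cats": "chats", "and": "et", "dogs": "chiens",
}
# A phrase entry maps a whole source phrase to its closest target counterpart.
PHRASE_TABLE = {
    ("is", "raining", "cats", "and", "dogs"): ["pleut", "des", "cordes"],
}


def translate_word_by_word(tokens):
    """Substitute each word independently; unknown words pass through."""
    return [WORD_TABLE.get(token, token) for token in tokens]


def translate_with_phrases(tokens, max_len=5):
    """Prefer the longest known phrase at each position, else fall back to words."""
    output = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            chunk = tuple(tokens[i:i + n])
            if chunk in PHRASE_TABLE:
                output.extend(PHRASE_TABLE[chunk])
                i += n
                break
        else:
            # No phrase starts here, so translate the single word.
            output.append(WORD_TABLE.get(tokens[i], tokens[i]))
            i += 1
    return output


sentence = "it is raining cats and dogs".split()
print(" ".join(translate_word_by_word(sentence)))
print(" ".join(translate_with_phrases(sentence)))
```

The word-by-word output keeps the English idiom's literal animals, while the phrase-aware version produces the French idiom, which is the kind of difference the paragraph above describes.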
Current machine translation software often allows for customisation by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text.
Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators and, in a very limited number of cases, can even produce output that can be used as is (e.g., weather reports).
The progress and potential of machine translation have been much debated throughout its history. Since the 1950s, a number of scholars have questioned the possibility of achieving fully automatic machine translation of high quality. Some critics claim that there are in-principle obstacles to automating the translation process.
History
The idea of machine translation may be traced back to the 17th century. In 1629, René Descartes proposed a universal language, with equivalent ideas in different tongues sharing one symbol. The Georgetown experiment (1954) involved fully automatic translation of over sixty Russian sentences into English. The experiment was a great success and ushered in an era of substantial funding for machine-translation research. The authors claimed that within three to five years, machine translation would be a solved problem.
Real progress was much slower, however, and after the ALPAC report (1966), which found that ten years of research had failed to fulfill expectations, funding was greatly reduced. Beginning in the late 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for machine translation.
The idea of using digital computers for translation of natural languages was proposed as early as 1946 by A. D. Booth and possibly others. Warren Weaver wrote an important memorandum "Translation" in 1949. The Georgetown experiment was by no means the first such application, and a demonstration was made in 1954 on the APEXC machine at Birkbeck College (University of London) of a rudimentary translation of English into French. Several papers on the topic were published at the time, and even articles in popular journals (see for example Wireless World, Sept. 1955, Cleave and Zacharov). A similar application, also pioneered at Birkbeck College at the time, was reading and composing Braille texts by computer.
Translation process
The human translation process may be described as:
- Decoding the meaning of the source text; and
- Re-encoding this meaning in the target language.
Behind this ostensibly simple procedure lies a complex cognitive operation. To decode the meaning of the source text in its entirety, the translator must interpret and analyse all the features of the text, a process that requires in-depth knowledge of the grammar, semantics, syntax, idioms, etc., of the source language, as well as the culture of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the target language.
Therein lies the challenge in machine translation: how to program a computer that will "understand" a text as a person does, and that will "create" a new text in the target language that "sounds" as if it has been written by a person.
This problem may be approached in a number of ways.
Approaches
Machine translation can use a method based on linguistic rules, which means that words will be translated in a linguistic way: the most suitable (orally speaking) words of the target language will replace the ones in the source language.
It is often argued that the success of machine translation requires the problem of natural language understanding to be solved first.
Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. According to the nature of the intermediary representation, an approach is described as interlingual machine translation or transfer-based machine translation. These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.
Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what was written by a native speaker of another. The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods. But then, the grammar-based methods need a skilled linguist to carefully design the grammar that they use.
To translate between closely related languages, a technique referred to as shallow-transfer machine translation may be used.
Rule-based
The rule-based machine translation paradigm includes transfer-based machine translation, interlingual machine translation and dictionary-based machine translation paradigms.
Transfer-based machine translation
Interlingual
Interlingual machine translation is one instance of rule-based machine-translation approaches. In this approach, the source language, i.e. the text to be translated, is transformed into an interlingua, i.e. a representation that is independent of both the source and the target language. The target language is then generated out of the interlingua.
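As a rough sketch of this two-step pipeline, the example below analyses a very restricted kind of English sentence into a language-independent meaning record and then generates French from that record alone. The concept labels, both lexicons and the function names are hypothetical, and the sentence pattern is deliberately tiny; a real interlingua covers far more than agent, predicate and patient.

```python
# Hypothetical lexicons mapping words to and from language-neutral concept labels.
EN_LEXICON = {"dog": "DOG", "cat": "CAT", "sees": "SEE", "chases": "CHASE"}
FR_LEXICON = {"DOG": "chien", "CAT": "chat", "SEE": "voit", "CHASE": "poursuit"}


def analyse_english(sentence):
    """Turn 'the X <verb> the Y' into a language-independent meaning record."""
    words = [w for w in sentence.lower().split() if w != "the"]
    if len(words) != 3:
        raise ValueError("this sketch only handles 'the X <verb> the Y'")
    agent, predicate, patient = (EN_LEXICON[w] for w in words)
    return {"predicate": predicate, "agent": agent, "patient": patient}


def generate_french(meaning):
    """Generate French from the meaning record, without seeing the English text."""
    # Both nouns in this toy lexicon are masculine, so 'le' is always correct here.
    return "le {} {} le {}".format(
        FR_LEXICON[meaning["agent"]],
        FR_LEXICON[meaning["predicate"]],
        FR_LEXICON[meaning["patient"]],
    )


meaning = analyse_english("The dog sees the cat")
print(meaning)
print(generate_french(meaning))
```

Because the generator only reads the meaning record, a generator for a further target language could be added without touching the English analysis step, which is the usual argument made for the interlingual design.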
Dictionary-based
Machine translation can use a method based on dictionary entries, which means that the words will be translated as they are by a dictionary.
Statistical
Statistical machine translation tries to generate translations using statistical methods based on bilingual text corpora, such as the Canadian Hansard corpus, the English-French record of the Canadian parliament, and EUROPARL, the record of the European Parliament. Where such corpora are available, impressive results can be achieved translating texts of a similar kind, but such corpora are still very rare. The first statistical machine translation software was CANDIDE from IBM. Google used SYSTRAN for several years, but switched to a statistical translation method in October 2007. Recently, they improved their translation capabilities by inputting approximately 200 billion words from United Nations materials to train their system. Accuracy of the translation has improved.
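To make "statistical methods based on bilingual text corpora" more concrete, here is a minimal sketch of one classic way to estimate word-translation probabilities from sentence pairs: IBM Model 1 trained with expectation-maximisation. It is named here only as an illustration and is not claimed to be the method used by any system mentioned above. The three-sentence German-English corpus and the names `train_ibm_model1`, `best_translation` and `CORPUS` are assumptions made for the example.

```python
from collections import defaultdict

# Toy sentence-aligned corpus: (source tokens, target tokens). Illustrative only.
CORPUS = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(target | source) by EM over sentence pairs (no NULL word)."""
    src_vocab = {s for src, _ in pairs for s in src}
    tgt_vocab = {t for _, tgt in pairs for t in tgt}
    # Start from a uniform distribution over target words for every source word.
    prob = {(t, s): 1.0 / len(tgt_vocab) for t in tgt_vocab for s in src_vocab}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: spread each target word's unit count over the source words
        # of the same sentence, in proportion to the current probabilities.
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    fraction = prob[(t, s)] / norm
                    count[(t, s)] += fraction
                    total[s] += fraction
        # M-step: renormalise the expected counts per source word.
        prob = {(t, s): count[(t, s)] / total[s] for (t, s) in prob}
    return prob


def best_translation(prob, source_word):
    """Return the target word with the highest estimated probability."""
    candidates = {t: p for (t, s), p in prob.items() if s == source_word}
    return max(candidates, key=candidates.get)


model = train_ibm_model1(CORPUS)
for word in ["das", "Haus", "Buch", "ein"]:
    print(word, "->", best_translation(model, word))
```

Nothing in the corpus says which word translates which; the pairing emerges because "Buch" co-occurs with "book" in two sentences while "das" and "ein" each co-occur with it only once. With so little data the estimates are fragile, which is one reason large corpora matter for this approach.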
Example-based
The example-based machine translation (EBMT) approach was proposed by Makoto Nagao in 1984. It is often characterised by its use of a bilingual corpus as its main knowledge base, at run-time. It is essentially a translation by analogy and can be viewed as an implementation of the case-based reasoning approach of machine learning.
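The retrieval step of translation by analogy can be sketched as a nearest-example lookup. The example base below is hypothetical, and the similarity measure (the standard library's `difflib.SequenceMatcher` applied to token lists) is simply a convenient stand-in; a full EBMT system would also adapt and recombine the retrieved fragments, which this sketch does not attempt.

```python
from difflib import SequenceMatcher

# Hypothetical bilingual example base (English -> French), consulted at run-time.
EXAMPLES = {
    "where is the station": "où est la gare",
    "where is the library": "où est la bibliothèque",
    "how much is the ticket": "combien coûte le billet",
}


def most_similar_example(query):
    """Return (source, target, score) for the stored example closest to the query."""
    query_tokens = query.lower().split()
    best = None
    for source, target in EXAMPLES.items():
        # ratio() is 2 * matches / total tokens, computed on word sequences.
        score = SequenceMatcher(None, query_tokens, source.split()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    return best


source, target, score = most_similar_example("where is the train station")
print(source, "->", target, round(score, 3))
```

The retrieved pair gives the system a translated case to reason from; the remaining work, here handling the unmatched word "train", is where the adaptation step of case-based reasoning would come in.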
Hybrid MT
Hybrid machine translation (HMT) leverages the strengths of statistical and rule-based translation methodologies. Several MT companies (Asia Online, LinguaSys, Systran, PangeaMT, UPV) are claiming to have a hybrid approach using both rules and statistics. The approaches differ in a number of ways:
- Rules post-processed by statistics: Translations are performed using a rule-based engine. Statistics are then used in an attempt to adjust or correct the output from the rules engine.
- Statistics guided by rules: Rules are used to pre-process data in an attempt to better guide the statistical engine. Rules are also used to post-process the statistical output to perform functions such as normalization. This approach has a lot more power, flexibility and control when translating.
Disambiguation
Word-sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by Yehoshua Bar-Hillel. He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word. Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches.
Shallow approaches assume no knowledge of the text. They simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.
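A shallow approach of this kind can be sketched in a few lines: look only at the words within a small window around the ambiguous word and pick the sense whose cue words occur there most often. The cue lists, the window size and the function name `disambiguate` are assumptions for illustration; in practice the cues would be weights learned from a corpus, not a hand-written set.

```python
# Hypothetical cue words per French translation of the English word "bank".
SENSE_CUES = {
    "bank": {
        "rive": {"river", "water", "fishing", "shore"},       # edge of a river
        "banque": {"money", "loan", "account", "deposit"},    # financial institution
    },
}


def disambiguate(word, tokens, window=3):
    """Pick the sense whose cue words overlap most with the nearby context."""
    index = tokens.index(word)
    context = set(tokens[max(0, index - window):index])
    context |= set(tokens[index + 1:index + 1 + window])
    scores = {
        sense: len(cues & context)
        for sense, cues in SENSE_CUES[word].items()
    }
    # On a tie (including no evidence at all) the first listed sense wins,
    # which is exactly the kind of blind guess a shallow method can make.
    return max(scores, key=scores.get)


print(disambiguate("bank", "he sat on the bank of the river".split()))
print(disambiguate("bank", "she opened an account at the bank yesterday".split()))
```

The function never models what the sentence means; it only counts neighbours, so it fails whenever the decisive clue lies outside the window or outside the text altogether.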
The late Claude Piron, a long-time translator for the United Nations and the World Health Organization, wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved:
- Why does a translator need a whole workday to translate five pages, and not an hour or two? ..... About 90% of an average text corresponds to these simple conditions. But unfortunately, there's the other 10%. It's that part that requires six [more] hours of work. There are ambiguities one has to resolve. For instance, the author of the source text, an Australian physician, cited the example of an epidemic which was declared during World War II in a "Japanese prisoner of war camp". Was he talking about an American camp with Japanese prisoners or a Japanese camp with American prisoners? The English has two senses. It's necessary therefore to do research, maybe to the extent of a phone call to Australia.
The ideal deep approach would require the translation software to do all the research necessary for this kind of disambiguation on its own; but this would require a higher degree of AI than has yet been attained. A shallow approach which simply guessed at the sense of the ambiguous English phrase that Piron mentions (based, perhaps, on which kind of prisoner-of-war camp is more often mentioned in a given corpus) would have a reasonable chance of guessing wrong fairly often. A shallow approach that involves "ask the user about each ambiguity" would, by Piron's estimate, only automate about 25% of a professional translator's job, leaving the harder 75% still to be done by a human.
Named entities
Related to named entity recognition in information extraction.
Applications
There are now many software programs for translating natural language, several of them online, such as:
- Anusaaraka, a free, open-source machine translation system from English to Hindi, based on Panini grammar and using state-of-the-art NLP tools. It can be used online and downloaded from http://anusaaraka.iiit.ac.in
- Apertium, a free and open source machine translation platform (WinXLator gives this a Windows GUI, but it is likely to be in violation of the Apertium GPL license)
- AppTek, which released a hybrid MT system in 2009.
- Arabic machine translation in multilingual framework.
- Asia Online (http://www.asiaonline.net) provides a custom machine translation engine building capability that they claim gives near-human quality compared to the "gist" based quality of free online engines. Asia Online also provides tools to edit and create custom machine translation engines with their Language Studio suite of products.
- Bing Translator (http://www.microsofttranslator.com/), a free online translator from Microsoft.
- Cunei (http://www.cunei.org/), an open-source platform for data-driven machine translation released under the MIT license. Platform-independent Java code with both a command-line and graphical interface.
- DocTranslator A web service which uses the Google Translate API to automatically translate and return Office document files (Word, Excel, PowerPoint, PDF) while preserving the original document layouts.
- English to Punjabi Translationhttp://h2p.learnpunjabi.org/eng2pun.aspx Web based English to Punjabi Machine Translation System.
- Google Translate A free online translator from Google.
- Google Translator Toolkit A web service designed to allow translators to edit the translations that Google Translate automatically generates. With the Google Translator Toolkit, translators can organize their work and use shared translations, glossaries and translation memories.
- Hindi to Punjabi Machine Translation System http://h2p.learnpunjabi.org provides machine translation using a direct approach. It translates Hindi into Punjabi. It also lets users write e-mail in Hindi and send it to the recipient in Punjabi.
- Punjabi to Hindi Machine Translation System http://www.jgmatrix.com provides machine translation using a direct approach. It translates Punjabi to Hindi. It can also convert any Punjabi website to Hindi on the fly. The Punjabi website must be in Unicode.
- IdiomaX, which powers online translation services at idiomax.com
- localization tools, such as Alchemy Catalyst and Multilizer.
- Jibbigo http://www.Jibbigo.com sells a bidirectional, offline, speech-to-speech translation app through Apple's App Store and the Android Market.
- Kilgray memoQ A translation environment provided for human translators. http://kilgray.com/products/memoq
- LetsMT! http://www.letsmt.com Cloud-based platform for generation of custom MT engines from user-provided data. Powered by Moses.
- LinguaSys http://www.linguasys.net provides highly customized hybrid machine translation that can go from any language to any language.
- Lucy Software http://www.lucysoftware.com Translates in several European languages.
- Moses A free software statistical machine translation engine released under the LGPL license for Windows and Linux.
- NeuroTran is a software translator that translates books, web pages, documents, e-mails, faxes, memos, manuals, reports, spreadsheets, correspondence, letters and more to and from many languages. For Windows and Macintosh.
- Power Translator
- Promt, which powers online translation services at Voila.fr and Orange.fr
- SDL ETS and SDL Language Weaver, which power FreeTranslation.com
- SDL Passolo
- SiShiTra — A hybrid machine translation engine for Spanish-Catalan translation.
- SYSTRAN, which powers Yahoo! Babel Fish
- Ta with you http://www.tauyou.com is specialized in customized machine translation solutions in any language. Their web-based user interface makes it easy for any Language Service Provider to generate any combination of domain and language pair to achieve the best quality. Their solution works with almost human quality for a wide variety of language pairs.
- Tilde Translator http://translate.tilde.com Free online translator for the Latvian language. Also provides free apps for Android and iOS.
- Toggletext uses a transfer-based system (known as Kataku) to translate between English and Indonesian.
- Translate and Back http://trans121.com A free online round-trip machine translation tool, which enables checking correctness by back translation. Contains virtual keyboards and human voice. Suitable for right to left languages, as well.
- Yandex http://translate.yandex.ru/ translates between English, Russian and Ukrainian.
- Worldlingo provides machine translation using both statistical and rule-based translation engines (TEs). Most recognizable as the MT partner in Microsoft Windows and Microsoft Mac Office.
- Yahoo! Babel Fish, powered by SYSTRAN
A number of translation software programs are available free of charge, e.g. ForeignDesk, the multiplatform Okapi Framework, GTS Website Translator and OmegaT+.
While no system provides the holy grail of fully automatic high-quality machine translation of unrestricted text, many fully automated systems produce reasonable output. The quality of machine translation is substantially improved if the domain is restricted and controlled.
Despite their inherent limitations, MT programs are used around the world. Probably the largest institutional user is the European Commission. The MOLTO project, for example, coordinated by the University of Gothenburg, received more than 2.375 million euros in project support from the EU to create a reliable translation tool that covers a majority of the EU languages. http://www.molto-project.eu/
Google has claimed that promising results were obtained using a proprietary statistical machine translation engine. The statistical translation engine used in the Google language tools for Arabic <-> English and Chinese <-> English had an overall BLEU-4 score of 0.4281, ahead of runner-up IBM's score of 0.3954 (Summer 2006), in tests conducted by the National Institute of Standards and Technology.
With the recent focus on terrorism, military sources in the United States have been investing significant amounts of money in natural language engineering. In-Q-Tel (a venture capital fund, largely funded by the US Intelligence Community, to stimulate new technologies through private sector entrepreneurs) brought up companies like Language Weaver. Currently the military community is interested in translation and processing of languages like Arabic, Pashto, and Dari. The Information Processing Technology Office in DARPA hosts programs like TIDES and Babylon Translator. The US Air Force has awarded a $1 million contract to develop a language translation technology.
The notable rise of social networking on the web in recent years has created yet another niche for the application of machine translation software, in utilities such as Facebook or instant messaging clients such as Skype, GoogleTalk, MSN Messenger, etc., allowing users speaking different languages to communicate with each other. Machine translation applications have also been released for most mobile devices, including mobile telephones, pocket PCs, PDAs, etc. Due to their portability, such instruments have come to be designated as mobile translation tools, enabling mobile business networking between partners speaking different languages, or facilitating both foreign language learning and unaccompanied travel to foreign countries without the intermediation of a human translator.
Evaluation
Machine translation systems and output can be evaluated along numerous dimensions. The intended use of the translation, characteristics of the MT software, the nature of the translation process, etc., all affect how one evaluates MT systems and their output. The FEMTI taxonomy of dimensions, with associated evaluation metrics, appears at http://www.issco.unige.ch:8080/cocoon/femti/st-home.html .
There are various means for evaluating the output quality of machine translation systems. The oldest is the use of human judges to assess a translation's quality. Even though human evaluation is time-consuming, it is still the most reliable way to compare different systems such as rule-based and statistical systems. Automated means of evaluation include BLEU, NIST and METEOR.
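To give a rough idea of how an automated metric of this kind compares machine output with a human reference, the following is a minimal sketch of a simplified, sentence-level BLEU-style score in Python. It is an illustration only, not the official BLEU implementation: it assumes whitespace tokenization, a single reference translation and no smoothing, and the function names `ngrams` and `simple_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified BLEU-style score of one candidate against one reference.

    Assumptions (not part of any official tool): whitespace tokenization,
    a single reference, uniform n-gram weights and no smoothing, so any
    n-gram order with zero matches gives a score of 0.0.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped counts: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on a mat"
    print(simple_bleu("the cat sat on a mat", reference))    # identical text
    print(simple_bleu("the cat sat on the mat", reference))  # one word differs
    print(simple_bleu("dogs bark loudly", reference))        # no overlap
```

An identical candidate scores 1.0, a candidate with no overlapping words scores 0.0, and partial matches fall in between. Scores such as these are normally reported over a whole test set rather than a single sentence, which is one reason human judgement remains the reference point.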
Relying exclusively on unedited machine translation ignores the fact that communication in human language is context-embedded and that it takes a person to comprehend the context of the original text with a reasonable degree of probability. It is certainly true that even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be useful to a human being and that publishable-quality translation is achieved, such translations must be reviewed and edited by a human. The late Claude Piron wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved. Such research is a necessary prelude to the pre-editing required to give machine-translation software input from which it can produce meaningful output.
In certain applications, however, e.g., product descriptions written in a controlled language, a dictionary-based machine-translation system has produced satisfactory translations that require no human intervention save for quality inspection.
See also
- Comparison of machine translation applications
- Statistical machine translation
- Artificial Intelligence
- Cache language model
- Computational linguistics
- Universal Networking Language
- Computer-assisted translation and Translation memory
- Controlled natural language
- Fuzzy matching
- Postediting
- History of machine translation
- Human Language Technology
- Language barrier
- List of emerging technologies
- List of research laboratories for machine translation
- Pseudo-translation
- Translation
- Translation memory
- Universal translator
- Phraselator
- Mobile translation
External links