Machine translation



 
 








Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.

Current machine translation software often allows for customisation by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows then that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text.
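
The effect of limiting allowable substitutions can be illustrated with a minimal Python sketch. The lexicons, the English-to-French entries, and the function names below are assumptions made up for illustration, not part of any real MT system: a general lexicon lists several candidate translations per word, while a weather-domain lexicon narrows each word to the candidates that make sense in that domain.

```python
# Toy English-to-French lexicons. Every entry is an illustrative assumption.
GENERAL = {
    "cold": ["rhume", "froid"],   # illness sense listed first, then temperature
    "front": ["devant", "front"],
}

# A domain lexicon restricts each word to the candidates allowed in that domain.
WEATHER = {
    "cold": ["froid"],
    "front": ["front"],
}


def candidates(word, domain=None):
    """Return the allowable translations, preferring the domain lexicon."""
    word = word.lower()
    if domain is not None and word in domain:
        return domain[word]
    # Unknown words are passed through unchanged.
    return GENERAL.get(word, [word])


def substitute(sentence, domain=None):
    """Word-for-word substitution using the first allowable candidate."""
    return " ".join(candidates(w, domain)[0] for w in sentence.split())


print(substitute("cold front"))           # general lexicon picks the wrong senses
print(substitute("cold front", WEATHER))  # domain lexicon picks the weather senses
```

Even with the domain lexicon the output keeps English word order, which shows why plain substitution is only the most basic level of MT.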

Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators, and in some cases can even produce output that can be used "as is".

History

The idea of machine translation may be traced back to the 17th century. In 1629, René Descartes proposed a universal language, with equivalent ideas in different tongues sharing one symbol. In the 1950s, the Georgetown experiment (1954) involved fully automatic translation of over sixty Russian sentences into English. The experiment was a great success and ushered in an era of substantial funding for machine-translation research. The authors claimed that within three to five years, machine translation would be a solved problem.

Real progress was much slower, however, and after the ALPAC report (1966), which found that the ten-year-long research had failed to fulfill expectations, funding was greatly reduced. Beginning in the late 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for machine translation.

The idea of using digital computers for translation of natural languages was proposed as early as 1946 by A. D. Booth and possibly others. The Georgetown experiment was by no means the first such application: a demonstration of a rudimentary translation of English into French was made in 1954 on the APEXC machine at Birkbeck College (University of London). Several papers on the topic were published at the time, as were articles in popular journals (see for example Wireless World, Sept. 1955, Cleave and Zacharov). A similar application, also pioneered at Birkbeck College at the time, was reading and composing Braille texts by computer.

Translation process

The translation process may be stated as:
  1. Decoding the meaning of the source text; and
  2. Re-encoding this meaning in the target language.


Behind this ostensibly simple procedure lies a complex cognitive operation. To decode the meaning of the source text in its entirety, the translator must interpret and analyse all the features of the text, a process that requires in-depth knowledge of the grammar, semantics, syntax, idioms, etc., of the source language, as well as the culture of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the target language.

Therein lies the challenge in machine translation: how to program a computer that will "understand" a text as a person does, and that will "create" a new text in the target language that "sounds" as if it has been written by a person.

This problem may be approached in a number of ways.

Approaches

Figure: Direct translation and transfer translation pyramid.

Machine translation can use a method based on linguistic rules, which means that words will be translated in a linguistic way: the most suitable (orally speaking) words of the target language will replace the ones in the source language.

It is often argued that the success of machine translation requires the problem of natural language understanding to be solved first.

Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. According to the nature of the intermediary representation, an approach is described as interlingual machine translation or transfer-based machine translation. These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.

Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what has been written by a native speaker of another. The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods. The grammar-based methods, in turn, need a skilled linguist to carefully design the grammar that they use.

To translate between closely related languages, a technique referred to as shallow-transfer machine translation may be used.

Rule-based

The rule-based machine translation paradigm includes transfer-based machine translation, interlingual machine translation and dictionary-based machine translation paradigms.

Transfer-based machine translation

Interlingual

Interlingual machine translation is one instance of rule-based machine-translation approaches. In this approach, the source language, i.e. the text to be translated, is transformed into an interlingua, i.e. a representation that is independent of both the source and the target language. The target language is then generated out of the interlingua.
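
The two stages of the interlingual approach, analysis into a language-independent representation and generation from it, can be sketched in Python as follows. The frame format, the single example sentence, and the tiny generation tables are all hypothetical; a real system would use a parser and a grammar rather than lookup tables.

```python
# Hypothetical analysis table: source sentence -> language-neutral frame.
ANALYSIS = {
    "en": {"the cat sleeps": {"predicate": "SLEEP", "agent": "CAT"}},
}

# Hypothetical generation tables: concept -> surface form, one per target language.
GENERATION = {
    "fr": {"SLEEP": "dort", "CAT": "le chat"},
    "es": {"SLEEP": "duerme", "CAT": "el gato"},
}


def analyse(sentence, source_lang):
    """Map a source sentence to its interlingua frame (toy lookup)."""
    return ANALYSIS[source_lang][sentence.lower()]


def generate(frame, target_lang):
    """Produce a target sentence from the frame, agent first, then predicate."""
    words = GENERATION[target_lang]
    return f"{words[frame['agent']]} {words[frame['predicate']]}"


def translate(sentence, source_lang, target_lang):
    # The target text is generated from the frame only, never from the source words.
    return generate(analyse(sentence, source_lang), target_lang)


print(translate("The cat sleeps", "en", "fr"))
print(translate("The cat sleeps", "en", "es"))
```

In this sketch the same frame feeds both generators, so adding a target language means adding one generation table and leaving the analysis step untouched.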

Dictionary-based

Machine translation can use a method based on dictionary entries, which means that the words will be translated as they are by a dictionary.

Statistical


Statistical machine translation tries to generate translations using statistical methods based on bilingual text corpora, such as the Canadian Hansard corpus, the English-French record of the Canadian parliament, and EUROPARL, the record of the European Parliament. Where such corpora are available, impressive results can be achieved translating texts of a similar kind, but such corpora are still very rare. The first statistical machine translation software was CANDIDE from IBM. Google used SYSTRAN for several years, but switched to a statistical translation method in October 2007. Google has also trained its system on approximately 200 billion words from United Nations materials, which improved the accuracy of its translations.
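
As a rough illustration of how translation knowledge can be derived from a bilingual corpus, the Python sketch below estimates word-translation probabilities with expectation-maximisation, in the style of the well-known IBM Model 1 word-alignment model. It is not a description of CANDIDE or of Google's system; the three-sentence English-German corpus and the function names are assumptions chosen to keep the example small.

```python
from collections import defaultdict

# Toy sentence-aligned corpus (illustrative assumption): (English, German) tokens.
CORPUS = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]


def train_model1(pairs, iterations=10):
    """Estimate t(f|e), the probability that source word e translates to f."""
    src_vocab = {e for es, _ in pairs for e in es}
    tgt_vocab = {f for _, fs in pairs for f in fs}
    # Start from a uniform distribution over target words.
    t = {(f, e): 1.0 / len(tgt_vocab) for e in src_vocab for f in tgt_vocab}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: share each target word among the source words of its sentence.
        for es, fs in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts into probabilities.
        t = {(f, e): count[(f, e)] / total[e] for (f, e) in t}
    return t


def best_translation(t, src_word):
    """Return the target word with the highest probability for src_word."""
    options = [f for (f, e) in t if e == src_word]
    return max(options, key=lambda f: t[(f, src_word)])


table = train_model1(CORPUS)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(table, word))
```

No dictionary is supplied: the pairing of "book" with "buch" emerges only because the two words keep co-occurring across sentence pairs, which is why the amount and kind of parallel text matters so much for this approach.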

Example-based


The example-based machine translation (EBMT) approach is often characterised by its use of a bilingual corpus as its main knowledge base at run-time. It is essentially translation by analogy and can be viewed as an implementation of the case-based reasoning approach of machine learning.
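
The retrieval step of translation by analogy can be sketched in Python: given a new sentence, find the most similar source sentence in a store of translated examples. The example store and the function name are hypothetical, and word-level similarity from the standard library's `difflib.SequenceMatcher` stands in for whatever matching a real EBMT system would use.

```python
from difflib import SequenceMatcher

# Hypothetical example base of (English, French) sentence pairs.
EXAMPLES = [
    ("good morning", "bonjour"),
    ("thank you very much", "merci beaucoup"),
    ("where is the station", "où est la gare"),
]


def closest_example(sentence):
    """Return (source, target, score) for the stored example most like sentence."""
    words = sentence.lower().split()

    def score(pair):
        # Compare word sequences rather than characters.
        return SequenceMatcher(None, words, pair[0].split()).ratio()

    source, target = max(EXAMPLES, key=score)
    return source, target, score((source, target))


source, target, similarity = closest_example("Where is the hotel")
print(source, "|", target, "|", similarity)
```

Retrieval is only the first half of the analogy: a fuller system would go on to adapt the retrieved translation, here by replacing the part that corresponds to "station" with a translation of "hotel".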

Major issues


Disambiguation

Word-sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by Yehoshua Bar-Hillel. He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word. Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches.

Shallow approaches assume no knowledge of the text. They simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.
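
A shallow approach of this kind can be sketched in a few lines of Python. The sketch assumes a tiny hand-labelled training set in which each sentence containing the ambiguous English word "bank" is tagged with the French translation of the intended sense; the data, the function names, and the raw-count scoring are illustrative assumptions, and a practical system would use far more data and a proper statistical model.

```python
from collections import Counter, defaultdict

# Hypothetical labelled examples: (French translation of the sense, sentence).
TRAINING = [
    ("banque", "she opened an account at the bank"),
    ("banque", "the bank approved the loan"),
    ("rive", "we sat on the bank of the river"),
    ("rive", "the river bank was muddy"),
]


def train(examples, word):
    """Count how often each context word appears alongside each sense."""
    counts = defaultdict(Counter)
    for sense, sentence in examples:
        counts[sense].update(w for w in sentence.split() if w != word)
    return counts


def disambiguate(counts, sentence, word):
    """Pick the sense whose training contexts share the most words with sentence."""
    context = [w for w in sentence.lower().split() if w != word]
    scores = {sense: sum(seen[w] for w in context) for sense, seen in counts.items()}
    return max(scores, key=scores.get)


model = train(TRAINING, "bank")
print(disambiguate(model, "fishing from the bank of a river", "bank"))
print(disambiguate(model, "the bank refused my loan", "bank"))
```

The program has no notion of what a river or a loan is; it only counts neighbouring words, which is what makes the approach shallow.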

The late Claude Piron, a long-time translator for the United Nations and the World Health Organization, wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved:

Why does a translator need a whole workday to translate five pages, and not an hour or two? ..... About 90% of an average text corresponds to these simple conditions. But unfortunately, there's the other 10%. It's that part that requires six [more] hours of work. There are the ambiguities one has to resolve. For instance, the author of the source text, an Australian physician, cited the example of an epidemic which was declared during World War II in a "Japanese prisoner of war camp". Was he talking about an American camp with Japanese prisoners or a Japanese camp with American prisoners? The English has two senses. It's necessary therefore to do research, maybe to the extent of a phone call to Australia.


The ideal deep approach would require the translation software to do all the research necessary for this kind of disambiguation on its own; but this would require a higher degree of AI than has yet been attained. A shallow approach which simply guessed at the sense of the ambiguous English phrase that Piron mentions (based, perhaps, on which kind of prisoner-of-war camp is more often mentioned in a given corpus) would have a reasonable chance of guessing wrong fairly often. A shallow approach that involves "ask the user about each ambiguity" would, by Piron's estimate, only automate about 25% of a professional translator's job, leaving the harder 75% still to be done by a human.

Named entities

Related to named entity recognition in information extraction.

Applications

There are now many software programs for translating natural language, several of them online, such as:
  • SYSTRAN, which powers Yahoo's Babel Fish
  • Promt, which powers online translation services at Voila.fr and Orange.fr

Although no system provides the holy grail of fully automatic high-quality machine translation, many systems produce reasonable output.

Despite their inherent limitations, MT programs are used around the world. Probably the largest institutional user is the European Commission.

Toggletext uses a transfer-based system (known as Kataku) to translate between English and Indonesian.

Google has claimed that promising results were obtained using a proprietary statistical machine translation engine. The statistical translation engine used in the Google language tools for Arabic <-> English and Chinese <-> English achieved an overall BLEU-4 score of 0.4281, ahead of runner-up IBM's score of 0.3954 (Summer 2006), in tests conducted by the National Institute for Standards and Technology.

With the recent focus on terrorism, military sources in the United States have been investing significant amounts of money in natural language engineering. In-Q-Tel (a venture capital fund, largely funded by the US Intelligence Community, to stimulate new technologies through private sector entrepreneurs) has fostered companies such as Language Weaver. Currently the military community is interested in the translation and processing of languages like Arabic, Pashto, and Dari. The Information Processing Technology Office in DARPA hosts programs like TIDES and Babylon Translator. The US Air Force has awarded a $1 million contract to develop a language translation technology.

Evaluation

There are various means for evaluating the performance of machine-translation systems. The oldest is the use of human judges to assess a translation's quality. Even though human evaluation is time-consuming, it is still the most reliable way to compare different systems such as rule-based and statistical systems. Automated means of evaluation include BLEU, NIST and METEOR.
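
Automated metrics such as BLEU score a system's output by its n-gram overlap with a human reference translation. The following is a minimal sketch of that idea in Python, not the official scoring tool: it assumes a single reference, whitespace-tokenized input, and no smoothing, and the function names (ngrams, modified_precision, bleu) are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most
    as many times as it occurs in the reference."""
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU with one reference and no smoothing."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # the geometric mean is zero if any precision is zero
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # The brevity penalty lowers the score of candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c >= r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical example sentences, tokenized on whitespace.
reference = "the cat is on the mat".split()
candidate = "the cat is on mat".split()
print(bleu(candidate, reference))
```

Scores range from 0 to 1, and a candidate identical to the reference scores 1. Published evaluations typically compute the score over a whole test corpus and often against several references, so this per-sentence sketch will not reproduce reported figures.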

Relying exclusively on unedited machine translation ignores the fact that communication in human language is context-embedded, and that it takes a human to adequately comprehend the context of the original text. Even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be of publishable quality and useful to a human, it must be reviewed and edited by a human.

It has, however, been asserted that in certain applications, e.g. product descriptions written in a controlled language, a dictionary-based machine-translation system has produced satisfactory translations that require no human intervention.

See also

  • Comparison of machine translation applications
  • Artificial intelligence
  • Computational linguistics
  • Universal Networking Language
  • Computer-assisted translation
  • Controlled natural language
  • History of machine translation
  • Human Language Technology
  • List of emerging technologies
  • List of research laboratories for machine translation
  • Pseudo-translation
  • Translation
  • Universal translator
  • Wiktionary:Translations
  • Phraselator


External links

  • , an introductory guide to MT by D. J. Arnold et al. (1994)
  • by John Hutchins. An electronic repository (and bibliography) of articles, books and papers in the field of machine translation and computer-based translation technology
  • — Publications by John Hutchins (includes PDFs of several books on machine translation)


Software