Transfer-based machine translation - AbsoluteAstronomy.com

Transfer-based machine translation is a type of machine translation

Machine translation

Machine translation, sometimes referred to by the abbreviation MT is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another.On a basic...

. It is based on the idea of interlingua

Interlingua

Interlingua is an international auxiliary language , developed between 1937 and 1951 by the International Auxiliary Language Association...

and is currently one of the most widely used methods of machine translation

Overview

Both transfer-based and interlingua-based machine translation have the same idea: to make a translation it is necessary to have an intermediate representation that captures the "meaning" of the original sentence in order to generate the correct translation. In interlingua-based MT this intermediate representation must be independent of the languages in question, whereas in transfer-based MT, it has some dependence on the language pair involved.

The way in which transfer-based machine translation systems work varies substantially, but in general they follow the same pattern: they apply sets of linguistic rules which are defined as correspondences between the structure of the source language and that of the target language. The first stage involves analysing the input text for morphology

Morphology (linguistics)

In linguistics, morphology is the identification, analysis and description, in a language, of the structure of morphemes and other linguistic units, such as words, affixes, parts of speech, intonation/stress, or implied context...

and syntax

Syntax

In linguistics, syntax is the study of the principles and rules for constructing phrases and sentences in natural languages....

(and sometimes semantics

Semantics

Semantics is the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata....

) to create an internal representation. The translation is generated from this representation using both bilingual dictionaries and grammatical rules.

It is possible with this translation strategy to obtain fairly high quality translations, with accuracy in the region of 90% (although this is highly dependent on the language pair in question — for example the distance between the two).

How it works

In a rule-based machine translation system the original text is first analysed morphologically and syntactically in order to obtain a syntactic representation. This representation can then be refined to a more abstract level putting emphasis on the parts relevant for translation and ignoring other types of information. The transfer process then converts this final representation (still in the original language) to a representation of the same level of abstraction in the target language. These two representations are referred to as "intermediate" representations. From the target language representation, the stages are then applied in reverse.

Analysis and transformation

Various methods of analysis and transformation can be used before obtaining the final result. Along with these statistical approaches may be augmented generating hybrid systems. The methods which are chosen and the emphasis depends largely on the design of the system, however, most systems include at least the following stages:

Morphological analysis. Surface forms of the input text are classified as to part-of-speech (e.g. noun, verb, etc.) and sub-category (number, gender, tense, etc.) All of the possible "analyses" for each surface form are typically outputted at this stage, along with the lemma of the word.
Lexical categorisation. In any given text some of the words may have more than one meaning
Meaning (linguistics)
In linguistics, meaning is what is expressed by the writer or speaker, and what is conveyed to the reader or listener, provided that they talk about the same thing . In other words if the object and the name of the object and the concepts in their head are the same...

, causing ambiguity
Ambiguity
Ambiguity of words or phrases is the ability to express more than one interpretation. It is distinct from vagueness, which is a statement about the lack of precision contained or available in the information.Context may play a role in resolving ambiguity...

in analysis. Lexical categorisation looks at the context of a word to try to determine the correct meaning in the context of the input. This can involve part-of-speech tagging
Part-of-speech tagging
In corpus linguistics, part-of-speech tagging , also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text as corresponding to a particular part of speech, based on both its definition, as well as its context—i.e...

and word sense disambiguation
Word sense disambiguation
In computational linguistics, word-sense disambiguation is an open problem of natural language processing, which governs the process of identifying which sense of a word is used in a sentence, when the word has multiple meanings...

.
Lexical transfer. This is basically dictionary translation; the source language lemma (perhaps with sense information) is looked up in a bilingual dictionary and the translation is chosen.
Structural transfer. While the previous stages deal with words, this stage deals with larger constituents, for example phrase
Phrase
In everyday speech, a phrase may refer to any group of words. In linguistics, a phrase is a group of words which form a constituent and so function as a single unit in the syntax of a sentence. A phrase is lower on the grammatical hierarchy than a clause....

s and chunks. Typical features of this stage include concordance of gender and number, and re-ordering of words or phrases.
Morphological generation. From the output of the structural transfer stage, the target language surface forms are generated.

Transfer types

One of the main features of transfer based machine translation systems is a phase that "transfers" an intermediate representation of the text in the original language to an intermediate representation of text in the target language. This can work at one of two levels of linguistic analysis, or somewhere in between. The levels are:

Superficial transfer (or syntactic). This level is characterised by transferring "syntactic structures" between the source and target languages. It is suitable for languages in the same family or of the same type, for example in the Romance languages
Romance languages
The Romance languages are a branch of the Indo-European language family, more precisely of the Italic languages subfamily, comprising all the languages that descend from Vulgar Latin, the language of ancient Rome...

between Spanish, Catalan, French, Italian, etc.
Deep transfer (or semantic). This level constructs a semantic representation that is dependent on the source language. This representation can consist of a series of structures which represent the meaning. In these transfer systems predicates are typically produced. The translation also typically requires structural transfer. This level is used to translate between more distantly related languages (e.g. Spanish-English or Spanish-Basque, etc.)

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.