Morphological parsing
Morphological parsing, in natural language processing, is the process of determining the morphemes from which a given word is constructed. It must be able to distinguish between orthographic rules and morphological rules. Orthographic rules are general rules applied when breaking a word into its stem and modifiers: for example, English nouns ending in a consonant followed by -y, such as 'pony', end in -ies when pluralized. Morphological rules cover exceptions to those general rules: while English normally forms a plural by adding the suffix 's', the word 'fish' does not change when pluralized. The word 'foxes', for instance, can be decomposed into 'fox' (the stem) and 'es' (a suffix indicating plurality).
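
As a rough illustration of that distinction, the Python sketch below treats the -ies and -es spellings as general orthographic rules and handles irregular plurals through an exception table that is checked first. The lexicon, the exception table, the `parse_plural` function and the `+N+PL` tag notation are all hypothetical choices made for this example; the sketch assumes its input is a plural noun and does not model any particular parser.

```python
from typing import Optional

# Hypothetical stem lexicon used to confirm candidate stems.
LEXICON = {"fox", "cat", "pony", "church", "house", "fish", "mouse"}

# Hypothetical exception table (morphological rules): irregular plurals
# that the general spelling rules would analyze incorrectly.
EXCEPTIONS = {"fish": "fish", "mice": "mouse"}

# Stem endings after which English spells the plural suffix as -es.
SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")


def parse_plural(word: str) -> Optional[str]:
    """Analyze a plural noun as 'stem+N+PL', or return None if no rule fits."""
    # Morphological rules: listed exceptions override the general rules.
    if word in EXCEPTIONS:
        return EXCEPTIONS[word] + "+N+PL"

    # Orthographic rules: undo each spelling change to propose candidate stems.
    candidates = []
    if word.endswith("ies"):
        candidates.append(word[:-3] + "y")  # ponies -> pony
    if word.endswith("es") and word[:-2].endswith(SIBILANT_ENDINGS):
        candidates.append(word[:-2])  # foxes -> fox
    if word.endswith("s") and not word[:-1].endswith(SIBILANT_ENDINGS):
        candidates.append(word[:-1])  # cats -> cat, houses -> house

    # Accept the first candidate that is a known stem.
    for stem in candidates:
        if stem in LEXICON:
            return stem + "+N+PL"
    return None


print(parse_plural("foxes"))   # fox+N+PL
print(parse_plural("ponies"))  # pony+N+PL
print(parse_plural("mice"))    # mouse+N+PL
```

Because every candidate stem is checked against the lexicon, a misspelling such as 'foxs' is rejected: the plain -s rule is not allowed to apply after a sibilant, and the -es rule does not match.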

The generally accepted approach to morphological parsing is through the use of a finite state transducer (FST), a finite state machine with two tapes (an input tape and an output tape, in contrast to an ordinary finite state automaton, which has a single tape). The FST reads a word and outputs its stem and modifiers. It is initially created by algorithmically parsing a word source, such as a dictionary complete with modifier markup.
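
A minimal FST along these lines can be sketched in Python. Here the word source is assumed to be a plain list of noun stems, and the plural spellings (-s, or -es after a sibilant) are hard-coded as arcs rather than read from dictionary markup; the class and function names and the `+N+SG`/`+N+PL` output tags are hypothetical. Each arc reads one input character and writes an output string, and because the machine is nondeterministic, `transduce` tracks every reachable (state, output) pair.

```python
from collections import defaultdict


class FST:
    """A minimal nondeterministic finite state transducer (illustrative sketch)."""

    def __init__(self):
        # (state, input char) -> list of (next state, output string)
        self.transitions = defaultdict(list)
        # accepting state -> output appended when the input ends there
        self.finals = {}
        self.num_states = 1  # state 0 is the start state

    def new_state(self):
        self.num_states += 1
        return self.num_states - 1

    def add_arc(self, src, char, dst, output):
        self.transitions[(src, char)].append((dst, output))

    def transduce(self, word):
        """Return every output the FST produces for the input word."""
        # Each configuration pairs a current state with the output written so far.
        configs = [(0, "")]
        for char in word:
            configs = [
                (dst, out + emitted)
                for state, out in configs
                for dst, emitted in self.transitions.get((state, char), [])
            ]
        return [out + self.finals[state] for state, out in configs if state in self.finals]


SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")


def build_noun_fst(stems):
    """Compile a stem list into an FST that analyzes singular and plural nouns."""
    fst = FST()
    plural_final = fst.new_state()
    fst.finals[plural_final] = "+N+PL"
    for stem in stems:
        # Copy the stem from the input tape to the output tape.
        state = 0
        for char in stem:
            nxt = fst.new_state()
            fst.add_arc(state, char, nxt, char)
            state = nxt
        fst.finals[state] = "+N+SG"  # the bare stem is a singular noun
        if stem.endswith(SIBILANT_ENDINGS):
            # Orthographic rule: an 'e' is inserted before the plural -s.
            e_state = fst.new_state()
            fst.add_arc(state, "e", e_state, "")
            fst.add_arc(e_state, "s", plural_final, "")
        else:
            fst.add_arc(state, "s", plural_final, "")
    return fst


fst = build_noun_fst(["fox", "cat", "church"])
print(fst.transduce("foxes"))  # ['fox+N+PL']
print(fst.transduce("cat"))    # ['cat+N+SG']
print(fst.transduce("foxs"))   # []
```

Building one separate path per stem keeps the example short; a larger lexicon would normally be compiled into a more compact machine that shares common prefixes.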

Another approach is an indexed lookup method that uses a constructed radix tree, a space-optimized trie in which each node with only one child is merged with its child. This route is rarely taken because it breaks down for morphologically complex languages.
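
An indexed lookup can be sketched as a radix tree that maps each listed surface form directly to its analysis. The Python below is a hypothetical minimal implementation supporting only insertion and exact-match lookup; edge labels are strings, and an edge is split whenever a new key diverges partway through it, so a node keeps a single child only when a listed word ends at that node. The stored analyses are illustrative.

```python
class RadixNode:
    def __init__(self):
        # first character of an edge label -> (edge label, child node)
        self.children = {}
        # analysis stored at this node if a listed word ends here
        self.value = None


class RadixTree:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, key, value):
        node = self.root
        while True:
            if not key:
                node.value = value
                return
            entry = node.children.get(key[0])
            if entry is None:
                # No edge shares the first character: add the rest of the key as one edge.
                leaf = RadixNode()
                leaf.value = value
                node.children[key[0]] = (key, leaf)
                return
            label, child = entry
            # Length of the common prefix of the edge label and the remaining key.
            common = 0
            while common < min(len(label), len(key)) and label[common] == key[common]:
                common += 1
            if common == len(label):
                # The whole edge matches: descend and keep inserting the remainder.
                node, key = child, key[common:]
                continue
            # The key diverges inside the edge: split it at the shared prefix.
            middle = RadixNode()
            middle.children[label[common]] = (label[common:], child)
            node.children[key[0]] = (label[:common], middle)
            node, key = middle, key[common:]

    def lookup(self, key):
        """Return the value stored for an exact key, or None."""
        node = self.root
        while key:
            entry = node.children.get(key[0])
            if entry is None:
                return None
            label, child = entry
            if not key.startswith(label):
                return None
            node, key = child, key[len(label):]
        return node.value


index = RadixTree()
for surface, analysis in [
    ("fox", "fox+N+SG"),
    ("foxes", "fox+N+PL"),
    ("fog", "fog+N+SG"),
    ("fogs", "fog+N+PL"),
]:
    index.insert(surface, analysis)

print(index.lookup("foxes"))  # fox+N+PL
print(index.lookup("fo"))     # None
```

Because lookup is an exact match, every inflected form must be inserted separately; the index cannot analyze a form it was never given, which illustrates why the approach struggles with languages in which each stem has many inflected forms.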
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.