SEQUITUR algorithm
Encyclopedia
Sequitur is a recursive algorithm developed by Craig Nevill-Manning
Craig Nevill-Manning
Craig Nevill-Manning is a New Zealand computer scientist who founded Google's first remote engineering center, located in midtown Manhattan, where he is an Engineering Director...

 and Ian H. Witten in 1997 that infers a hierarchical structure (context-free grammar
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....

) from a sequence of discrete symbols. The algorithm operates in linear space and time. It can be used in data compression
Data compression
In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

 software applications.

Method summary

The algorithm works by scanning a sequence of terminal symbols, building a list of all the symbol pairs which it has read. Whenever a second occurrence of a pair is discovered, the two occurrences are replaced in the sequence by an invented nonterminal symbol, the list of symbol pairs is adjusted to match the new sequence, and scanning continues. Once the scanning has been completed, the transformed sequence can be interpreted as the top-level rule in a grammar for the original sequence. The rule definitions for the nonterminal symbols which it contains can be found in the list of symbol pairs. Those rule definitions may themselves contain additional nonterminal symbols whose rule definitions can also be read from elsewhere in the list of symbol pairs.

See also

  • Context-free grammar
    Context-free grammar
    In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....

  • Straight-line grammar
  • Lossless data compression
  • Data compression
    Data compression
    In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK