Context-free grammar generation algorithms
Encyclopedia
Straight-line grammars (SLG) are formal grammars that do not branch (every non-terminal has only one associated production rule) nor loop (if non-terminal A appears in a derivation of B, B does not appear in a derivation of A). Such grammars generate only one sequence, and this property makes them of interest in fields like Kolmogorov complexity
Kolmogorov complexity
In algorithmic information theory , the Kolmogorov complexity of an object, such as a piece of text, is a measure of the computational resources needed to specify the object...

, Lossless data compression
Lossless data compression
Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. The term lossless is in contrast to lossy data compression, which only allows an approximation of the original data to be reconstructed, in exchange...

, Structure discovery and Compressed data structure
Compressed data structure
The term compressed data structure arises in the computer science subfields of algorithms, data structures, and theoretical computer science. It refers to a data structure whose operations are roughly as fast as those of a conventional data structure for the problem, but whose size can be...

.

The problem of finding a SLG of minimal size is called the Smallest grammar problem
Smallest grammar problem
The smallest grammar problem is the problem of finding the smallest formal grammar which encodes for a unique string of characters. The size of a grammar is defined by the number of symbols on the right side of the production rules.-External links:*...

.
Formal Definition=
A Grammar G is a SLG iif:

1. for every non-terminal , there is at most one production rule that has as its left-hand side

2. take the graph
Graph (mathematics)
In mathematics, a graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. The interconnected objects are represented by mathematical abstractions called vertices, and the links that connect some pairs of vertices are called edges...

 , with the set of non-terminals and if appears at the right-hand side of a production rule for . must be acyclic
Acyclic
Acyclic can refer to:* In chemistry, a compound which is not cyclic, e.g. alkanes and acyclic aliphatic compounds* In mathematics:** A graph without a cycle, especially*** A directed acyclic graph...

.

A SLG in Chomsky normal form
Chomsky normal form
In computer science, a context-free grammar is said to be in Chomsky normal form if all of its production rules are of the form:Every grammar in Chomsky normal form is context-free, and conversely, every context-free grammar can be transformed into an equivalent one which is in Chomsky normal form....

 is equivalent to a straight-line program. In general, the only type of grammar that are considered are context-free grammar
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....

.
A list of algorithms =
  • Sequitur
    SEQUITUR algorithm
    Sequitur is a recursive algorithm developed by Craig Nevill-Manning and Ian H. Witten in 1997 that infers a hierarchical structure from a sequence of discrete symbols. The algorithm operates in linear space and time...

  • Lempel-Ziv-Welch algorithm
    LZW
    Lempel–Ziv–Welch is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978...

     creates a context-free grammar in a such deterministic way that it is necessary to store only the start rule of the generated grammar.
  • Byte pair encoding
    Byte pair encoding
    Byte pair encoding or digram coding is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. A table of the replacements is required to rebuild the original data...

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK