Grammar Induction - AbsoluteAstronomy.com

Grammatical induction, also known as grammatical inference or syntactic pattern recognition

Syntactic pattern recognition

Syntactic pattern recognition or structural pattern recognition is a form of pattern recognition, in which each object can be represented by a variable-cardinality set of symbolic, nominal features...

, refers to the process in machine learning

Machine learning

Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases...

of learning a formal grammar

Formal grammar

A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...

(usually in the form of re-write rules or productions) from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. Grammatical inference is distinguished from traditional decision rules

Decision rules

A set of decision rules is the verbal equivalent of a graphical decision tree, which specifies class membership based on a hierarchical sequence of decisions...

and other such methods principally by the nature of the resulting model, which in the case of grammatical inference relies heavily on hierarchical substitutions. Whereas a traditional decision rule set is geared toward assessing object classification, a grammatical rule set is geared toward the generation of examples. In this sense, the grammatical induction problem can be said to seek a generative model, while the decision rule problem seeks a descriptive model.

Methodologies

There are a wide variety of methods for grammatical inference. Two of the classic sources are and . also devote a brief section to the problem, and cite a number of references. The basic trial-and-error method they present is discussed below.

Grammatical inference by trial-and-error

The method proposed in Section 8.7 of suggests successively guessing grammar rules (productions) and testing them against positive and negative observations. The rule set is expanded so as to be able to generate each positive example, but if a given rule set also generates a negative example, it must be discarded. This particular approach can be characterized as "hypothesis testing" and bears some similarity to Mitchel's version space

Version space

A version space in concept learning or induction is the subset of all hypotheses that are consistent with the observed training examples. This set contains all hypotheses that have not been eliminated as a result of being in conflict with observed data....

algorithm. The text provide a simple example which nicely illustrates the process, but the feasibility of such an unguided trial-and-error approach for more substantial problems is dubious.

Grammatical inference by genetic algorithms

Grammatical Induction using evolutionary algorithm

Evolutionary algorithm

In artificial intelligence, an evolutionary algorithm is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA uses some mechanisms inspired by biological evolution: reproduction, mutation, recombination, and selection...

s is the process of evolving a representation of the grammar of a target language through some evolutionary process. Formal grammar

Formal grammar

A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...

s can easily be represented as a tree structure of production rules that can be subjected to evolutionary operators. Algorithm

Algorithm

In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...

s of this sort stem from the genetic programming

Genetic programming

In artificial intelligence, genetic programming is an evolutionary algorithm-based methodology inspired by biological evolution to find computer programs that perform a user-defined task. It is a specialization of genetic algorithms where each individual is a computer program...

paradigm pioneered by John Koza

John Koza

John R. Koza is a computer scientist and a former consulting professor at Stanford University, most notable for his work in pioneering the use of genetic programming for the optimization of complex problems. He was a cofounder of Scientific Games Corporation, a company which built computer systems...

. Other early work on simple formal languages used the binary string representation of genetic algorithms, but the inherently hierarchical structure of grammars couched in the EBNF

Extended Backus–Naur form

In computer science, Extended Backus–Naur Form is a family of metasyntax notations used for expressing context-free grammars: that is, a formal way to describe computer programming languages and other formal languages. They are extensions of the basic Backus–Naur Form metasyntax notation.The...

language made trees

Tree (data structure)

In computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.Mathematically, it is an ordered directed tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more children nodes and at...

a more flexible approach.

Koza represented Lisp programs as trees

Tree (data structure)

. He was able to find analogues to the genetic operators within the standard set of tree operators. For example, swapping sub-trees is equivalent to the corresponding process of genetic crossover, where sub-strings of a genetic code are transplanted into an individual of the next generation. Fitness is measured by scoring the output from the functions

Grammatical function

In linguistics, grammatical functions refer to functional relationships between participants in a proposition...

of the lisp code. Similar analogues between the tree structured lisp representation and the representation of grammars as trees, made the application of genetic programming techniques possible for grammar induction.

In the case of Grammar Induction, the transplantation of sub-trees corresponds to the swapping of production rules that enable the parsing of phrases from some language. The fitness operator for the grammar is based upon some measure of how well it performed in parsing some group of sentences from the target language. In a tree representation of a grammar, a terminal symbol of a production rule corresponds to a leaf node of the tree. Its parent nodes corresponds to a non-terminal symbol (e.g. a noun phrase

Noun phrase

In grammar, a noun phrase, nominal phrase, or nominal group is a phrase based on a noun, pronoun, or other noun-like word optionally accompanied by modifiers such as adjectives....

or a verb phrase

Verb phrase

In linguistics, a verb phrase or VP is a syntactic unit composed of at least one verb and the dependents of that verb. One can distinguish between two types of VPs, finite VPs and non-finite VPs . While phrase structure grammars acknowledge both, dependency grammars reject the existence of a...

) in the rule set. Ultimately, the root node might correspond to a sentence non-terminal.

Grammatical inference by greedy algorithms

Like all greedy algorithm

Greedy algorithm

A greedy algorithm is any algorithm that follows the problem solving heuristic of making the locally optimal choice at each stagewith the hope of finding the global optimum....

s, greedy grammar inference algorithms make, in iterative manner, decisions that seem to be the best at that stage.
These made decisions deal usually with things like the making of a new or the removing of the existing rules, the choosing of the applied rule or the merging of some existing rules.
Because there are several ways to define 'the stage' and 'the best', there are also several greedy grammar inference algorithms.

These context-free grammar

Context-free grammar

In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....

generating algorithms make the decision after every read symbol:

Lempel-Ziv-Welch algorithm
LZW
Lempel–Ziv–Welch is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978...

creates a context-free grammar in a deterministic way such that it is necessary to store only the start rule of the generated grammar.
Sequitur
SEQUITUR algorithm
Sequitur is a recursive algorithm developed by Craig Nevill-Manning and Ian H. Witten in 1997 that infers a hierarchical structure from a sequence of discrete symbols. The algorithm operates in linear space and time...

and its modifications.

These context-free grammar generating algorithms first read the whole given symbol-sequence and then start to make decisions:

Byte pair encoding
Byte pair encoding
Byte pair encoding or digram coding is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. A table of the replacements is required to rebuild the original data...

and its optimizations.

Applications

The principle of grammar induction has been applied to other aspects of natural language processing

Natural language processing

Natural language processing is a field of computer science and linguistics concerned with the interactions between computers and human languages; it began as a branch of artificial intelligence....

, and have been applied (among many other problems) to morpheme

Morpheme

In linguistics, a morpheme is the smallest semantically meaningful unit in a language. The field of study dedicated to morphemes is called morphology. A morpheme is not identical to a word, and the principal difference between the two is that a morpheme may or may not stand alone, whereas a word,...

analysis, and even place name derivations. Grammar induction has also been used for lossless data compression

Lossless data compression

Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. The term lossless is in contrast to lossy data compression, which only allows an approximation of the original data to be reconstructed, in exchange...

and statistical inference

Statistical inference

In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation...

via
MML

Minimum message length

Minimum message length is a formal information theory restatement of Occam's Razor: even when models are not equal in goodness of fit accuracy to the observed data, the one generating the shortest overall message is more likely to be correct...

and MDL

Minimum description length

The minimum description length principle is a formalization of Occam's Razor in which the best hypothesis for a given set of data is the one that leads to the best compression of the data. MDL was introduced by Jorma Rissanen in 1978...

principles.

Methodologies

Grammatical inference by trial-and-error

Grammatical inference by genetic algorithms

Grammatical inference by greedy algorithms

Applications

See also