All Topics  
Tree-adjoining grammar

 

   Email Print
   Bookmark   Link






 

Tree-adjoining grammar



 
 
Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi
Aravind Joshi

Aravind Krishana Joshi is the Henry Salvatori Professor of Computer and Cognitive Science in the computer science department of the University of Pennsylvania....
. Tree-adjoining grammars are somewhat similar to context-free grammar
Context-free grammar

In formal language theory, a context-free grammar is a formal grammar in which every Production rule is of the formwhere V is a single nonterminal symbol, and w is a string of Terminal and nonterminal symbolss and/or nonterminals ....
s, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees (see tree (graph theory)
Tree (graph theory)

In mathematics, more specifically graph theory, a tree is a graph in which any two Vertex are connected by exactly one path . Alternatively, any connectedness graph with no Cycle is a tree....
 and tree data structure).

originated in investigations by Joshi and his students into the family of adjunction grammars (AG), the "string grammar" of Zellig Harris
Zellig Harris

Zellig Sabbetai Harris was a renowned American linguistics, mathematical syntactician, and methodologist of science. Originally a Semitic languages, he is best known for his work in Structuralism#Structuralism in linguistics and discourse analysis and for the discovery of transformational structure in language, all achieved in the first 10 y...
.






Discussion
Ask a question about 'Tree-adjoining grammar'
Start a new discussion about 'Tree-adjoining grammar'
Answer questions from other users
Full Discussion Forum



Encyclopedia


Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi
Aravind Joshi

Aravind Krishana Joshi is the Henry Salvatori Professor of Computer and Cognitive Science in the computer science department of the University of Pennsylvania....
. Tree-adjoining grammars are somewhat similar to context-free grammar
Context-free grammar

In formal language theory, a context-free grammar is a formal grammar in which every Production rule is of the formwhere V is a single nonterminal symbol, and w is a string of Terminal and nonterminal symbolss and/or nonterminals ....
s, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees (see tree (graph theory)
Tree (graph theory)

In mathematics, more specifically graph theory, a tree is a graph in which any two Vertex are connected by exactly one path . Alternatively, any connectedness graph with no Cycle is a tree....
 and tree data structure).

History

TAG originated in investigations by Joshi and his students into the family of adjunction grammars (AG), the "string grammar" of Zellig Harris
Zellig Harris

Zellig Sabbetai Harris was a renowned American linguistics, mathematical syntactician, and methodologist of science. Originally a Semitic languages, he is best known for his work in Structuralism#Structuralism in linguistics and discourse analysis and for the discovery of transformational structure in language, all achieved in the first 10 y...
. AGs handle endocentric
Endocentric

In linguistics, an endocentric construction is a grammatical construction thatfulfills the same linguistic function as one of its constituent ....
 properties of language in a natural and effective way, but do not have a good characterization of exocentric
Exocentric

Exocentric has a number of meanings.In linguistics, it refers to phrases and compound Word s which are not the same part of speech as their constituents....
 constructions; the converse is true of rewrite grammars
Rewrite rule

In mathematics, linguistics and computer science, a rewrite rule in generative grammar is a rule of the form A ? X where A is a syntactic category label, such as noun phrase or sentence, and X is a sequence of such labels and/or morphemes, expressing the fact that A can be replaced by X in generating the constituent structure of a sentence....
, or phrase-structure grammar (PSG). In 1969, Joshi introduced a family of grammars that exploits this complementarity by mixing the two types of rules. A few very simple rewrite rules suffice to generate the vocabulary of strings for adjunction rules. This family is distinct from the Chomsky hierarchy
Chomsky hierarchy

Within the field of computer science, specifically in the area of formal languages, the Chomsky hierarchy is a containment hierarchy of classes of formal grammars....
 but intersects it in interesting and linguistically relevant ways.

Description

The rules in a TAG are trees with a special leaf node known as the foot node, which is anchored to a word. There are two types of basic trees in TAG: initial trees (often represented as ) and auxiliary trees (). Initial trees represent basic valency relations, while auxiliary tree allow for recursion. Auxiliary trees have the root (top) node and foot node labeled with the same symbol. A derivation starts with an initial tree, combining via either substitution or adjunction. Substitution replaces a frontier node with another tree whose top node has the same label. Adjunction inserts an auxiliary tree into the center of another tree.. The root/foot label of the auxiliary tree must match the label of the node at which it adjoins.

Other variants of TAG allow multi-component trees, trees with multiple foot nodes, and other extensions.

Complexity and application


Tree-adjoining grammars are often described as mildly context-sensitive, meaning that they possess certain properties that make them more powerful (in terms of weak generative capacity
Weak generative capacity

Weak generative capacity refers to the set of strings that can be generated by a generative grammar. All grammars of a given formal complexity can generate the same strings....
) than context-free grammars, but less powerful than indexed
Indexed grammar

An indexed grammar is a formal grammar which describes indexed languages. They have three disjoint sets of symbols: the usual terminals and nonterminals and the indices, which appear only in intermediate derivation steps....
 or context-sensitive grammar
Context-sensitive grammar

A context-sensitive grammar is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal symbol and nonterminal symbols....
s. Mildly context-sensitive grammars are conjectured to be powerful enough to model natural language
Natural language

In the philosophy of language, a natural language is a language that is spoken, Sign language, or writing by humans for general-purpose communication, as distinguished from formal languages and from constructed languages....
s while remaining efficiently parseable in the general case.

Due to their formal weakness, TAG is often used in computational linguistics
Computational linguistics

Computational linguistics is an interdisciplinary field dealing with the Statistics and/or rule-based modeling of natural language from a computational perspective....
 and natural language processing
Natural language processing

Natural language processing is a field of computer science concerned with the interactions between computers and human languages. Natural language generation systems convert information from computer databases into readable human language....
.

A TAG can describe the language of squares (in which some arbitrary string is repeated), and the language . This type of processing can be represented by an embedded pushdown automaton
Embedded pushdown automaton

An embedded pushdown automaton or EPDA is a computational model that parse languages in the tree-adjoining grammar . It is similar to the context-free grammar-parsing pushdown automaton, except that instead of using a stack to store symbols, it has a stack of iterated stacks that store symbols, giving TAGs a complexity between context-...
.

Languages with cubes (ie triplicated strings) or with more than four distinct character strings of equal length cannot be generated by tree-adjoining grammars.

For these reasons, languages generated by tree-adjoining grammars are referred to as mildly context-sensitive language
Mildly context-sensitive language

In formal grammar theory, mildly context-sensitive languages are a class of formal languages which can be efficiently parsed, but still possess enough context sensitive language to allow the parsing of natural languages....
s.

External links

  • , which uses a TAG for natural language processing.
  • with focus on comparison with Lexical Functional Grammar
    Lexical functional grammar

    Lexical functional grammar is a grammar framework in theoretical linguistics, a variety of generative grammar. The development of the theory was initiated by Joan Bresnan and Ronald Kaplan in the 1970s, in reaction to the direction research in the area of transformational grammar had begun to take....
     and grammars extraction from Treebank
    Treebank

    A treebank or parsed corpus is a text corpus in which each sentence has been parsed, i.e. annotated with syntactic structure. Syntactic structure is commonly represented as a tree structure, hence the name Treebank....
  • A quick survey on Syntax and Semantic Interface problematic within the TAG framework.
  • The Tübingen Linguistic Parsing Architecture (TuLiPA) is a multi-formalism syntactic (and semantic) parsing environment, designed mainly for Multi-Component Tree Adjoining Grammars with Tree Tuples
  • which provides several tools to edit and compile MetaGrammars into TAGs. It also include a wide coverage French Metagrammars.
  • A Lexicalized Tree Adjoining Grammar parser which provides an easy to use graphical environment (page in french)