A
concrete syntax tree or
parse tree is an (ordered, rooted)
treeIn computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.Mathematically, it is not a tree, but an arborescence: an acyclic connected graph where each node has zero or more children nodes and at most one parent node...
that represents the
syntacticIn linguistics, syntax is the study of the principles and rules for constructing sentences in natural languages...
structure of a
stringIn mathematics, a string is an sequence of symbols that are chosen from a set or alphabet.In computer programming, a string is, essentially, a sequence of characters...
according to some
formal grammarA formal grammar is a set of rules for forming strings in a formal language. These rules that make up the grammar describe how to form strings from the language's alphabet that are valid according to the language's syntax...
. In a parse tree, the interior nodes are labeled by non-terminals of the grammar, while the
leaf nodeIn computer science, a leaf node or external node is a node of a tree data structure that has zero child nodes. Often, leaf nodes are the nodes farthest from the root node. In the graph theory tree, a leaf node is a vertex of degree 1 other than the root...
s are labeled by terminals of the grammar. A program that produces such trees is called a
parserIn computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens , to determine its grammatical structure with respect to a given formal grammar.Parsing is also an earlier term for the diagramming of sentences of...
. Parse trees may be generated for
sentenceIn linguistics, a sentence is an expression in natural language—a grammatical and lexical unit consisting of one or more words, representing distinct and differentiated concepts, and combined to form a meaningful statement, question, request and command....
s in
natural languageIn the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...
s (
see natural language processingNatural language processing is a field of computer science and linguistics concerned with the interactions between computers and human languages. Natural language generation systems convert information from computer databases into readable human language...
), as well as during
processingA compiler is a computer program that transforms source code written in a computer language into another computer language...
of computer languages, such as
programming languageA programming language is an artificial language designed to express computations that can be performed by a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine, to express algorithms precisely, or as a mode of human...
s.
A
concrete syntax tree or
parse tree is an (ordered, rooted)
treeIn computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.Mathematically, it is not a tree, but an arborescence: an acyclic connected graph where each node has zero or more children nodes and at most one parent node...
that represents the
syntacticIn linguistics, syntax is the study of the principles and rules for constructing sentences in natural languages...
structure of a
stringIn mathematics, a string is an sequence of symbols that are chosen from a set or alphabet.In computer programming, a string is, essentially, a sequence of characters...
according to some
formal grammarA formal grammar is a set of rules for forming strings in a formal language. These rules that make up the grammar describe how to form strings from the language's alphabet that are valid according to the language's syntax...
. In a parse tree, the interior nodes are labeled by non-terminals of the grammar, while the
leaf nodeIn computer science, a leaf node or external node is a node of a tree data structure that has zero child nodes. Often, leaf nodes are the nodes farthest from the root node. In the graph theory tree, a leaf node is a vertex of degree 1 other than the root...
s are labeled by terminals of the grammar. A program that produces such trees is called a
parserIn computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens , to determine its grammatical structure with respect to a given formal grammar.Parsing is also an earlier term for the diagramming of sentences of...
. Parse trees may be generated for
sentenceIn linguistics, a sentence is an expression in natural language—a grammatical and lexical unit consisting of one or more words, representing distinct and differentiated concepts, and combined to form a meaningful statement, question, request and command....
s in
natural languageIn the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...
s (
see natural language processingNatural language processing is a field of computer science and linguistics concerned with the interactions between computers and human languages. Natural language generation systems convert information from computer databases into readable human language...
), as well as during
processingA compiler is a computer program that transforms source code written in a computer language into another computer language...
of computer languages, such as
programming languageA programming language is an artificial language designed to express computations that can be performed by a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine, to express algorithms precisely, or as a mode of human...
s. Parse trees are distinct from
abstract syntax treeIn computer science, an abstract syntax tree , or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a certain programming language. Each node of the tree denotes a construct occurring in the source code...
s (also known simply as syntax trees), in that their structure and elements more concretely reflect the syntax of the input language.
Basic description
A parse tree is made up of nodes and branches. The image below represents a linguistic parse tree, here representing the
EnglishEnglish is a West Germanic language that developed in England during the Anglo-Saxon era. As a result of the military, economic, scientific, political, and cultural influence of the British Empire during the 18th, 19th, and early 20th centuries, and of the United States since the mid 20th century,...
sentence "John hit the ball". (The parse tree here is a greatly simplified one; for more information, see
X-bar theoryX-bar theory is a component of linguistic theory which attempts to identify syntactic features common to all languages. It claims that among their phrasal categories, all languages share certain structural similarities, including one known as the "X-bar", which does not appear in traditional phrase...
.) The parse tree is the entire structure, starting from S and ending in each of the leaf nodes (John, hit, the, ball). We use the following abbreviations in the example:
- S for sentence
In linguistics, a sentence is an expression in natural language—a grammatical and lexical unit consisting of one or more words, representing distinct and differentiated concepts, and combined to form a meaningful statement, question, request and command....
, the top-level structure in this example.
- NP for noun phrase
In grammar, a noun phrase is a phrase whose head is a noun or a pronoun, optionally accompanied by a set of modifiers.Noun phrases are very common cross-linguistically, but some languages like Tuscarora and Cayuga have been argued to lack this category.- Form :Noun phrases normally consist of a...
. The first (leftmost) NP, a single noun "John", serves as the subjectThe subject is one of the two main constituents of a clause, according to a tradition that can be tracked back to Aristotle. The other constituent is the predicate...
of the sentence. The second one is the objectAn object in grammar is a sentence element and is often part of the sentence predicate. It denotes somebody or something involved in the subject's "performance" of the verb...
of the sentence.
- VP for verb phrase
In linguistics, a verb phrase or VP is a syntactic structure composed of the predicative elements of a sentence and functions in providing information about the subject of the sentence.-VPs in the generative grammar framework:...
which serves as the predicateIn traditional grammar, a predicate is one of the two main parts of a sentence . For the simple sentence "John [is yellow]," John acts as the subject, and is yellow acts as the predicate, a subsequent description of the subject headed with a verb.In current linguistic semantics, a predicate is an...
.
- V for verb
kalleah hit meIn syntax, a verb is a word that usually denotes an action , an occurrence , or a state of being . Depending on the language, a verb may vary in form according to many factors, possibly including its tense, aspect, mood and voice...
. In this case, it's a transitive verbIn syntax, a transitive verb is a verb that requires both a direct subject and one or more objects.-Examples:Some examples of sentences with transitive verbs:*Harry sees Adam....
"hit".
- Det for determiner
A determiner is a noun modifier that expresses the reference of a noun or noun phrase in the context, including quantity, rather than attributes expressed by adjectives...
, in this instance the definite articleAn article is a word that combines with a noun to indicate the type of reference being made by the noun, and may also specify the volume or numerical scope of that reference. there are only three articles, a, an and and. The articles in the English language are the and a...
"the".
- and N for noun
In linguistics, a noun is a member of a large, open lexical category whose members can occur as the main word in the subject of a clause, the object of a verb, or the object of a preposition....
.

In a parse tree, each node is either a
root node, a
branch node, or a
leaf node. In the example to the right, S is a root node, NP and VP are branch nodes, while John, hit, the, and ball are all leaf nodes.
A node can also be referred to as parent node or a child node. A
parent node is one that has at least one other node linked by a branch under it. In the example, S is a parent of both NP and VP. A
child node is one that has at least one node directly above it to which it is linked by a branch of the tree. Again from our example, hit is a child node of V. The terms
mother and
daughter are also sometimes used for this relationship.
External links