A
recursive descent parser is a
top-downTop-down parsing is a strategy of analyzing unknown data relationships by hypothesizing general parse tree structures and then considering whether the known fundamental structures are compatible with the hypothesis...
parserIn computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens , to determine its grammatical structure with respect to a given formal grammar.Parsing is also an earlier term for the diagramming of sentences of...
built from a set of
mutually-recursiveMutual recursion is a form of recursion where two mathematical or computational functions are defined in terms of each other.For instance, consider two functions even? and odd? defined as follows:...
procedures (or a non-recursive equivalent) where each such
procedureProcedure may refer to:* Procedure * Algorithm, in mathematics and computing, a set of operations or calculations that accomplish some goal* Instructions or recipes, a set of commands that show how to prepare or make something...
usually implements one of the production rules of the
grammarA formal grammar is a set of rules for forming strings in a formal language. These rules that make up the grammar describe how to form strings from the language's alphabet that are valid according to the language's syntax...
. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.
A
predictive parser is a recursive descent parser that does not require backtracking. Predictive parsing is possible only for the class of
LL(k)An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...
grammars, which are the context-free grammars for which there exists some positive integer k that allows a recursive descent parser to decide which production to use by examining only the next k tokens of input. (The
LL(k)An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...
grammars therefore exclude all ambiguous grammars, as well as all grammars that contain
left recursionIn computer science, left recursion is a special case of recursion.In terms of context-free grammar, a non-terminal r is left-recursive if the left-most symbol in any of r’s ‘alternatives’ either immediately or through some other non-terminal definitions rewrites to...
. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an
LL(k)An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...
grammar.) A predictive parser runs in
linear timeIn computational complexity theory, an algorithm is said to take linear time, or O time, if the asymptotic upper bound for the time it requires is proportional to the size of the input, which is usually denoted n....
.
Recursive descent with backup is a technique that determines which production to use by trying each production in turn. Recursive descent with backup is not limited to
LL(k)An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...
grammars, but is not guaranteed to terminate unless the grammar is
LL(k)An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...
. Even when they terminate, parsers that use recursive descent with backup may require
exponential timeIn complexity theory, exponential time is the computation time of a problem where the time to complete the computation, m, is bounded by an exponential function of the problem size, n...
.
Although predictive parsers are widely used, programmers often prefer to create
LRIn computer science, an LR parser is a parser for source code written in a computer language that reads input from Left to right and produces a Rightmost derivation. The term LR parser is also used; where the k refers to the number of unconsumed "look ahead" input symbols that are used in making...
or
LALRAn LALR parser is a type of parser defined in computer science as a Look-Ahead LR parser.LR parsing was invented by Donald Knuth in 1965 in a paper, "On the Translation of Languages from Left to Right". LR parsers have the potential of providing the fastest parsing speed of all parsing algorithms...
parsers via parser generators without transforming the grammar into
LL(k)An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...
form.
Some authors define recursive descent parsers as the predictive parsers. Other authors use the term more broadly, to include backed-up recursive descent.
Example parser
The following EBNF-like
grammarIn linguistics, grammar is the set of logical and structural rules that govern the composition of sentences, phrases, and words in any given natural language. The term refers also to the study of such rules, and this field includes morphology and syntax, often complemented by phonetics, phonology,...
(for
Niklaus WirthNiklaus Emil Wirth is a Swiss computer scientist, best known for designing several programming languages, including Pascal, and for pioneering several classic topics in software engineering. In 1984 he won the Turing Award for developing a sequence of innovative computer languages.-Biography:Wirth...
's
PL/0There are at least two programming languages known as PL/0. One is a subset of IBM's general purpose programming language PL/I.The other PL/0, covered here, is a simplified version of the general-purpose programming language Pascal, intended as an educational programming language. It serves as an...
programming language, from
Algorithms + Data Structures = ProgramsAlgorithms + Data Structures = Programsis a book written by Niklaus Wirth covering some of the fundamental topics of computer programming, particularly that algorithms and data structures are inherently related...
) is in
LL(1)An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...
form:
program = block "." .
block =
["const" ident "=" number {"," ident "=" number} ";"]
["var" ident {"," ident} ";"]
{"procedure" ident ";" block ";"} statement .
statement =
[ident ":=" expression
| "call" ident
| "begin" statement {";" statement} "end"
| "if" condition "then" statement
| "while" condition "do" statement
] .
condition =
"odd" expression
| expression ("="|"#"|"<"|"<="|">"|">=") expression .
expression = ["+"|"-"] term {("+"|"-") term} .
term = factor {("*"|"/") factor} .
factor =
ident
| number
| "(" expression ")" .
Terminals are expressed in quotes. Each nonterminal is defined by a rule in the grammar, except for
ident and
number, which are assumed to be implicitly defined.
C implementation
What follows is an implementation of a recursive descent parser for the above language in C. The parser reads in source code, and exits with an error message if the code fails to parse, exiting silently if the code parses correctly.
Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner, until the final nonterminal has been processed. The program fragment depends on a global variable,
sym, which contains the next symbol from the input, and the function
getsym, which updates
sym when called.
The implementations of the functions
getsym and
error are omitted for simplicity.
typedef enum {ident, number, lparen, rparen, times, slash, plus,
minus, eql, neq, lss, leq, gtr, geq, callsym, beginsym, semicolon,
endsym, ifsym, whilesym, becomes, thensym, dosym, constsym, comma,
varsym, procsym, period, oddsym} Symbol;
Symbol sym;
void getsym(void);
void error(const char msg[]);
void expression(void);
int accept(Symbol s) {
if (sym s) {
getsym;
return 1;
}
return 0;
}
int expect(Symbol s) {
if (accept(s))
return 1;
error("expect: unexpected symbol");
return 0;
}
void factor(void) {
if (accept(ident)) {
;
} else if (accept(number)) {
;
} else if (accept(lparen)) {
expression;
expect(rparen);
} else {
error("factor: syntax error");
getsym;
}
}
void term(void) {
factor;
while (sym times || sym slash) {
getsym;
factor;
}
}
void expression(void) {
if (sym
plus || sym minus)
getsym;
term;
while (sym
plus || sym minus) {
getsym;
term;
}
}
void condition(void) {
if (accept(oddsym)) {
expression;
} else {
expression;
if (sym
eql || sym neq || sym
lss || sym leq || sym
gtr || sym geq) {
getsym;
expression;
} else {
error("condition: invalid operator");
getsym;
}
}
}
void statement(void) {
if (accept(ident)) {
expect(becomes);
expression;
} else if (accept(callsym)) {
expect(ident);
} else if (accept(beginsym)) {
do {
statement;
} while (accept(semicolon));
expect(endsym);
} else if (accept(ifsym)) {
condition;
expect(thensym);
statement;
} else if (accept(whilesym)) {
condition;
expect(dosym);
statement;
}
}
void block(void) {
if (accept(constsym)) {
do {
expect(ident);
expect(eql);
expect(number);
} while (accept(comma));
expect(semicolon);
}
if (accept(varsym)) {
do {
expect(ident);
} while (accept(comma));
expect(semicolon);
}
while (accept(procsym)) {
expect(ident);
expect(semicolon);
block;
expect(semicolon);
}
statement;
}
void program(void) {
getsym;
block;
expect(period);
}
Implementation in functional languages
Recursive descent parsers are particularly easy to implement in functional languages such as
HaskellHaskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry.- History :...
, Lisp or
MLML is a general-purpose functional programming language developed by Robin Milner and others in the late 1970s at the University of Edinburgh, whose syntax is inspired by ISWIM...
.
See
Functional Pearls: Monadic Parsing in Haskell
See also
- JavaCC
JavaCC is an open source parser generator for the Java programming language. JavaCC is similar to Yacc in that it generates a parser for a formal grammar provided in EBNF notation, except the output is Java source code...
- a recursive descent parser generator
- Coco/R
Coco/R is a compiler generator that takes an L-attributed EBNF grammar of a source language and generates a scanner and a parser for that language....
- a recursive descent parser generator
- ANTLR
In computer based language recognition, ANother Tool for Language Recognition is the name of a parser generator that uses LL parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set , first developed in 1989, and is under active development...
- a recursive descent parser generator
- Parsing expression grammar
A parsing expression grammar, or PEG, is a type of analytic formal grammar that describes a formal language in terms of a set of rules for recognizing strings in the language...
- another form representing recursive descent grammar
- Spirit Parser Framework
The Spirit Parser Framework is an object oriented recursive descent parser generator framework implemented using template metaprogramming techniques. Expression templates allow users to approximate the syntax of Extended Backus Naur Form completely in C++...
- a C++ recursive descent parser generator framework requiring no pre-compile step
- Tail recursive parser
Tail recursive parsers are derived from the more common Recursive descent parsers. Tail recursive parsers are commonly used to parse left recursive grammars. They use a smaller amount of stack space than regular recursive descent parsers. They are also easy to write. Typical recursive descent...
- a variant of the recursive descent parser
External links