A
recursive descent parser is a
top-downTop-down parsing is a type of parsing strategy where in one first looks at the highest level of the parse tree and works down the parse tree by using the rewriting rules of a formal grammar...
parserIn computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens , to determine its grammatical structure with respect to a given formal grammar...
built from a set of
mutually-recursiveMutual recursion is a form of recursion where two mathematical or computational functions are defined in terms of each other.For instance, consider two functions even? and odd? defined as follows: function even?...
procedures (or a non-recursive equivalent) where each such procedure usually implements one of the production rules of the
grammarA formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...
. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.
A
predictive parser is a recursive descent parser that does not require
backtrackingBacktracking is a general algorithm for finding all solutions to some computational problem, that incrementally builds candidates to the solutions, and abandons each partial candidate c as soon as it determines that c cannot possibly be completed to a valid solution.The classic textbook example...
. Predictive parsing is possible only for the class of
LL(k)An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...
grammars, which are the
context-free grammarIn formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....
s for which there exists some positive integer
k that allows a recursive descent parser to decide which production to use by examining only the next
k tokens of input. (The LL(
k) grammars therefore exclude all ambiguous grammars, as well as all grammars that contain
left recursionIn computer science, left recursion is a special case of recursion.In terms of context-free grammar, a non-terminal r is left-recursive if the left-most symbol in any of r’s ‘alternatives’ either immediately or through some other non-terminal definitions rewrites to r again.- Definition :"A...
. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an LL(
k) grammar.) A predictive parser runs in linear time.
Recursive descent with backup is a technique that determines which production to use by trying each production in turn. Recursive descent with backup is not limited to LL(k) grammars, but is not guaranteed to terminate unless the grammar is LL(k). Even when they terminate, parsers that use recursive descent with backup may require exponential time.
Although predictive parsers are widely used, programmers often prefer to create
LRIn computer science, an LR parser is a parser that reads input from Left to right and produces a Rightmost derivation. The term LR parser is also used; where the k refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions...
or
LALRIn computer science, an LALR parser is a type of LR parser based on a finite-state-automata concept. The data structure used by an LALR parser is a pushdown automaton...
parsers via parser generators without transforming the grammar into
LL(k)An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...
form.
Example parser
The following EBNF-like
grammarA formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...
(for
Niklaus WirthNiklaus Emil Wirth is a Swiss computer scientist, best known for designing several programming languages, including Pascal, and for pioneering several classic topics in software engineering. In 1984 he won the Turing Award for developing a sequence of innovative computer languages.-Biography:Wirth...
's
PL/0At least two programming languages are known as PL/0. One is a subset of IBM's general-purpose programming language PL/I.The other PL/0, covered here, is similar to but much simpler than the general-purpose programming language Pascal, intended as an educational programming language. It serves as...
programming language, from
Algorithms + Data Structures = ProgramsAlgorithms + Data Structures = Programsis a 1976 book written by Niklaus Wirth covering some of the fundamental topics of computer programming, particularly that algorithms and data structures are inherently related...
) is in
LL(1)An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...
form:
program = block "." .
block =
["const" ident "=" number {"," ident "=" number} ";"]
["var" ident {"," ident} ";"]
{"procedure" ident ";" block ";"} statement .
statement =
[ident ":=" expression
| "call" ident
| "begin" statement {";" statement} "end"
| "if" condition "then" statement
| "while" condition "do" statement
] .
condition =
"odd" expression
| expression ("="|"#"|"<"|"<="|">"|">=") expression .
expression = ["+"|"-"] term {("+"|"-") term} .
term = factor {("*"|"/") factor} .
factor =
ident
| number
| "(" expression ")" .
Terminals are expressed in quotes. Each nonterminal is defined by a rule in the grammar, except for
ident and
number, which are assumed to be implicitly defined.
C implementation
What follows is an implementation of a recursive descent parser for the above language in
CC is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
. The parser reads in source code, and exits with an error message if the code fails to parse, exiting silently if the code parses correctly.
Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner, until the final nonterminal has been processed. The program fragment depends on a global variable,
sym, which contains the next symbol from the input, and the function
getsym, which updates
sym when called.
The implementations of the functions
getsym and
error are omitted for simplicity.
typedef enum {ident, number, lparen, rparen, times, slash, plus,
minus, eql, neq, lss, leq, gtr, geq, callsym, beginsym, semicolon,
endsym, ifsym, whilesym, becomes, thensym, dosym, constsym, comma,
varsym, procsym, period, oddsym} Symbol;
Symbol sym;
void getsym(void);
void error(const char msg[]);
void expression(void);
int accept(Symbol s) {
if (sym s) {
getsym;
return 1;
}
return 0;
}
int expect(Symbol s) {
if (accept(s))
return 1;
error("expect: unexpected symbol");
return 0;
}
void factor(void) {
if (accept(ident)) {
;
} else if (accept(number)) {
;
} else if (accept(lparen)) {
expression;
expect(rparen);
} else {
error("factor: syntax error");
getsym;
}
}
void term(void) {
factor;
while (sym times || sym slash) {
getsym;
factor;
}
}
void expression(void) {
if (sym
plus || sym minus)
getsym;
term;
while (sym
plus || sym minus) {
getsym;
term;
}
}
void condition(void) {
if (accept(oddsym)) {
expression;
} else {
expression;
if (sym
eql || sym neq || sym
lss || sym leq || sym
gtr || sym geq) {
getsym;
expression;
} else {
error("condition: invalid operator");
getsym;
}
}
}
void statement(void) {
if (accept(ident)) {
expect(becomes);
expression;
} else if (accept(callsym)) {
expect(ident);
} else if (accept(beginsym)) {
do {
statement;
} while (accept(semicolon));
expect(endsym);
} else if (accept(ifsym)) {
condition;
expect(thensym);
statement;
} else if (accept(whilesym)) {
condition;
expect(dosym);
statement;
}
}
void block(void) {
if (accept(constsym)) {
do {
expect(ident);
expect(eql);
expect(number);
} while (accept(comma));
expect(semicolon);
}
if (accept(varsym)) {
do {
expect(ident);
} while (accept(comma));
expect(semicolon);
}
while (accept(procsym)) {
expect(ident);
expect(semicolon);
block;
expect(semicolon);
}
statement;
}
void program(void) {
getsym;
block;
expect(period);
}
Implementation in functional languages
Recursive descent parsers are particularly easy to implement in functional languages such as
HaskellHaskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry. In Haskell, "a function is a first-class citizen" of the programming language. As a functional programming language, the...
, Lisp or
MLML is a general-purpose functional programming language developed by Robin Milner and others in the early 1970s at the University of Edinburgh, whose syntax is inspired by ISWIM...
as they tend to be better suited for recursion in general.
See
Functional Pearls: Monadic Parsing in Haskell
See also
- JavaCC
JavaCC is an open source parser generator and lexical analyzer generator for the Java programming language. JavaCC is similar to yacc in that it generates a parser from a formal grammar written in EBNF notation, except the output is Java source code...
- a recursive descent parser generator
- Coco/R
Coco/R is a compiler generator that takes an L-attributed Extended Backus–Naur Form grammar of a source language and generates a scanner and a parser for that language....
- a recursive descent parser generator
- ANTLR
In computer-based language recognition, ANTLR , or ANother Tool for Language Recognition, is a parser generator that uses LL parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set , first developed in 1989, and is under active development...
- a recursive descent parser generator
- Parsing expression grammar
A parsing expression grammar, or PEG, is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language...
- another form representing recursive descent grammar
- Spirit Parser Framework
The Spirit Parser Framework is an object oriented recursive descent parser generator framework implemented using template metaprogramming techniques. Expression templates allow users to approximate the syntax of Extended Backus Naur Form completely in C++...
- a C++ recursive descent parser generator framework requiring no pre-compile step
- Tail recursive parser
Tail recursive parsers are derived from the more common Recursive descent parsers. Tail recursive parsers are commonly used to parse left recursive grammars. They use a smaller amount of stack space than regular recursive descent parsers. They are also easy to write. Typical recursive descent...
- a variant of the recursive descent parser
- parboiled
parboiled is an open-source Java library released under an Apache License. It provides support for defining PEG parsers directly in Java source code....
- a recursive descent PEG parsing library for Java
- Recursive ascent parser
In computing, recursive ascent parsing is a technique for implementing an LALR parser which uses mutually-recursive functions rather than tables. Thus, the parser is directly encoded in the host language similar to recursive descent...
External links