Home      Discussion      Topics      Dictionary      Almanac
Signup       Login
Recursive descent parser

Recursive descent parser

Discussion
Ask a question about 'Recursive descent parser'
Start a new discussion about 'Recursive descent parser'
Answer questions from other users
Full Discussion Forum
 
Encyclopedia
A recursive descent parser is a top-down
Top-down parsing
Top-down parsing is a strategy of analyzing unknown data relationships by hypothesizing general parse tree structures and then considering whether the known fundamental structures are compatible with the hypothesis...

 parser
Parsing
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens , to determine its grammatical structure with respect to a given formal grammar.Parsing is also an earlier term for the diagramming of sentences of...

 built from a set of mutually-recursive
Mutual recursion
Mutual recursion is a form of recursion where two mathematical or computational functions are defined in terms of each other.For instance, consider two functions even? and odd? defined as follows:...

 procedures (or a non-recursive equivalent) where each such procedure
Procedure
Procedure may refer to:* Procedure * Algorithm, in mathematics and computing, a set of operations or calculations that accomplish some goal* Instructions or recipes, a set of commands that show how to prepare or make something...

 usually implements one of the production rules of the grammar
Formal grammar
A formal grammar is a set of rules for forming strings in a formal language. These rules that make up the grammar describe how to form strings from the language's alphabet that are valid according to the language's syntax...

. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.

A predictive parser is a recursive descent parser that does not require backtracking. Predictive parsing is possible only for the class of LL(k)
LL parser
An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...

 grammars, which are the context-free grammars for which there exists some positive integer k that allows a recursive descent parser to decide which production to use by examining only the next k tokens of input. (The LL(k)
LL parser
An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...

 grammars therefore exclude all ambiguous grammars, as well as all grammars that contain left recursion
Left recursion
In computer science, left recursion is a special case of recursion.In terms of context-free grammar, a non-terminal r is left-recursive if the left-most symbol in any of r’s ‘alternatives’ either immediately or through some other non-terminal definitions rewrites to...

. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an LL(k)
LL parser
An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...

 grammar.) A predictive parser runs in linear time
Linear time
In computational complexity theory, an algorithm is said to take linear time, or O time, if the asymptotic upper bound for the time it requires is proportional to the size of the input, which is usually denoted n....

.

Recursive descent with backup is a technique that determines which production to use by trying each production in turn. Recursive descent with backup is not limited to LL(k)
LL parser
An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...

 grammars, but is not guaranteed to terminate unless the grammar is LL(k)
LL parser
An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...

. Even when they terminate, parsers that use recursive descent with backup may require exponential time
Exponential time
In complexity theory, exponential time is the computation time of a problem where the time to complete the computation, m, is bounded by an exponential function of the problem size, n...

.

Although predictive parsers are widely used, programmers often prefer to create LR
LR parser
In computer science, an LR parser is a parser for source code written in a computer language that reads input from Left to right and produces a Rightmost derivation. The term LR parser is also used; where the k refers to the number of unconsumed "look ahead" input symbols that are used in making...

 or LALR
LALR parser
An LALR parser is a type of parser defined in computer science as a Look-Ahead LR parser.LR parsing was invented by Donald Knuth in 1965 in a paper, "On the Translation of Languages from Left to Right". LR parsers have the potential of providing the fastest parsing speed of all parsing algorithms...

 parsers via parser generators without transforming the grammar into LL(k)
LL parser
An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...

 form.

Some authors define recursive descent parsers as the predictive parsers. Other authors use the term more broadly, to include backed-up recursive descent.

Example parser


The following EBNF-like grammar
Grammar
In linguistics, grammar is the set of logical and structural rules that govern the composition of sentences, phrases, and words in any given natural language. The term refers also to the study of such rules, and this field includes morphology and syntax, often complemented by phonetics, phonology,...

 (for Niklaus Wirth
Niklaus Wirth
Niklaus Emil Wirth is a Swiss computer scientist, best known for designing several programming languages, including Pascal, and for pioneering several classic topics in software engineering. In 1984 he won the Turing Award for developing a sequence of innovative computer languages.-Biography:Wirth...

's PL/0
PL/0
There are at least two programming languages known as PL/0. One is a subset of IBM's general purpose programming language PL/I.The other PL/0, covered here, is a simplified version of the general-purpose programming language Pascal, intended as an educational programming language. It serves as an...

 programming language, from Algorithms + Data Structures = Programs
Algorithms + Data Structures = Programs
Algorithms + Data Structures = Programsis a book written by Niklaus Wirth covering some of the fundamental topics of computer programming, particularly that algorithms and data structures are inherently related...

) is in LL(1)
LL parser
An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...

 form:


program = block "." .

block =
["const" ident "=" number {"," ident "=" number} ";"]
["var" ident {"," ident} ";"]
{"procedure" ident ";" block ";"} statement .

statement =
[ident ":=" expression
| "call" ident
| "begin" statement {";" statement} "end"
| "if" condition "then" statement
| "while" condition "do" statement
] .

condition =
"odd" expression
| expression ("="|"#"|"<"|"<="|">"|">=") expression .

expression = ["+"|"-"] term {("+"|"-") term} .

term = factor {("*"|"/") factor} .

factor =
ident
| number
| "(" expression ")" .


Terminals are expressed in quotes. Each nonterminal is defined by a rule in the grammar, except for ident and number, which are assumed to be implicitly defined.

C implementation


What follows is an implementation of a recursive descent parser for the above language in C. The parser reads in source code, and exits with an error message if the code fails to parse, exiting silently if the code parses correctly.

Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner, until the final nonterminal has been processed. The program fragment depends on a global variable, sym, which contains the next symbol from the input, and the function getsym, which updates sym when called.

The implementations of the functions getsym and error are omitted for simplicity.

typedef enum {ident, number, lparen, rparen, times, slash, plus,
minus, eql, neq, lss, leq, gtr, geq, callsym, beginsym, semicolon,
endsym, ifsym, whilesym, becomes, thensym, dosym, constsym, comma,
varsym, procsym, period, oddsym} Symbol;

Symbol sym;
void getsym(void);
void error(const char msg[]);
void expression(void);

int accept(Symbol s) {
if (sym s) {
getsym;
return 1;
}
return 0;
}

int expect(Symbol s) {
if (accept(s))
return 1;
error("expect: unexpected symbol");
return 0;
}

void factor(void) {
if (accept(ident)) {
;
} else if (accept(number)) {
;
} else if (accept(lparen)) {
expression;
expect(rparen);
} else {
error("factor: syntax error");
getsym;
}
}

void term(void) {
factor;
while (sym times || sym

slash) {
getsym;
factor;
}
}

void expression(void) {
if (sym

plus || sym

minus)
getsym;
term;
while (sym

plus || sym

minus) {
getsym;
term;
}
}

void condition(void) {
if (accept(oddsym)) {
expression;
} else {
expression;
if (sym

eql || sym

neq || sym

lss || sym

leq || sym

gtr || sym geq) {
getsym;
expression;
} else {
error("condition: invalid operator");
getsym;
}
}
}

void statement(void) {
if (accept(ident)) {
expect(becomes);
expression;
} else if (accept(callsym)) {
expect(ident);
} else if (accept(beginsym)) {
do {
statement;
} while (accept(semicolon));
expect(endsym);
} else if (accept(ifsym)) {
condition;
expect(thensym);
statement;
} else if (accept(whilesym)) {
condition;
expect(dosym);
statement;
}
}

void block(void) {
if (accept(constsym)) {
do {
expect(ident);
expect(eql);
expect(number);
} while (accept(comma));
expect(semicolon);
}
if (accept(varsym)) {
do {
expect(ident);
} while (accept(comma));
expect(semicolon);
}
while (accept(procsym)) {
expect(ident);
expect(semicolon);
block;
expect(semicolon);
}
statement;
}

void program(void) {
getsym;
block;
expect(period);
}

Implementation in functional languages
Recursive descent parsers are particularly easy to implement in functional languages such as Haskell
Haskell (programming language)
Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry.- History :...

, Lisp or ML
ML programming language
ML is a general-purpose functional programming language developed by Robin Milner and others in the late 1970s at the University of Edinburgh, whose syntax is inspired by ISWIM...

.

See Functional Pearls: Monadic Parsing in Haskell
See also
  • JavaCC
    JavaCC
    JavaCC is an open source parser generator for the Java programming language. JavaCC is similar to Yacc in that it generates a parser for a formal grammar provided in EBNF notation, except the output is Java source code...

     - a recursive descent parser generator
  • Coco/R
    Coco/R
    Coco/R is a compiler generator that takes an L-attributed EBNF grammar of a source language and generates a scanner and a parser for that language....

     - a recursive descent parser generator
  • ANTLR
    ANTLR
    In computer based language recognition, ANother Tool for Language Recognition is the name of a parser generator that uses LL parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set , first developed in 1989, and is under active development...

     - a recursive descent parser generator
  • Parsing expression grammar
    Parsing expression grammar
    A parsing expression grammar, or PEG, is a type of analytic formal grammar that describes a formal language in terms of a set of rules for recognizing strings in the language...

     - another form representing recursive descent grammar
  • Spirit Parser Framework
    Spirit Parser Framework
    The Spirit Parser Framework is an object oriented recursive descent parser generator framework implemented using template metaprogramming techniques. Expression templates allow users to approximate the syntax of Extended Backus Naur Form completely in C++...

     - a C++ recursive descent parser generator framework requiring no pre-compile step
  • Tail recursive parser
    Tail Recursive Parser
    Tail recursive parsers are derived from the more common Recursive descent parsers. Tail recursive parsers are commonly used to parse left recursive grammars. They use a smaller amount of stack space than regular recursive descent parsers. They are also easy to write. Typical recursive descent...

     - a variant of the recursive descent parser

External links