Recursive descent parser
Encyclopedia
A recursive descent parser is a top-down
Top-down parsing
Top-down parsing is a type of parsing strategy where in one first looks at the highest level of the parse tree and works down the parse tree by using the rewriting rules of a formal grammar...

 parser
Parsing
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens , to determine its grammatical structure with respect to a given formal grammar...

 built from a set of mutually-recursive
Mutual recursion
Mutual recursion is a form of recursion where two mathematical or computational functions are defined in terms of each other.For instance, consider two functions even? and odd? defined as follows: function even?...

 procedures (or a non-recursive equivalent) where each such procedure usually implements one of the production rules of the grammar
Formal grammar
A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...

. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.

A predictive parser is a recursive descent parser that does not require backtracking
Backtracking
Backtracking is a general algorithm for finding all solutions to some computational problem, that incrementally builds candidates to the solutions, and abandons each partial candidate c as soon as it determines that c cannot possibly be completed to a valid solution.The classic textbook example...

. Predictive parsing is possible only for the class of LL(k)
LL parser
An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...

 grammars, which are the context-free grammar
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....

s for which there exists some positive integer k that allows a recursive descent parser to decide which production to use by examining only the next k tokens of input. (The LL(k) grammars therefore exclude all ambiguous grammars, as well as all grammars that contain left recursion
Left recursion
In computer science, left recursion is a special case of recursion.In terms of context-free grammar, a non-terminal r is left-recursive if the left-most symbol in any of r’s ‘alternatives’ either immediately or through some other non-terminal definitions rewrites to r again.- Definition :"A...

. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an LL(k) grammar.) A predictive parser runs in linear time.

Recursive descent with backup is a technique that determines which production to use by trying each production in turn. Recursive descent with backup is not limited to LL(k) grammars, but is not guaranteed to terminate unless the grammar is LL(k). Even when they terminate, parsers that use recursive descent with backup may require exponential time.

Although predictive parsers are widely used, programmers often prefer to create LR
LR parser
In computer science, an LR parser is a parser that reads input from Left to right and produces a Rightmost derivation. The term LR parser is also used; where the k refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions...

 or LALR
LALR parser
In computer science, an LALR parser is a type of LR parser based on a finite-state-automata concept. The data structure used by an LALR parser is a pushdown automaton...

 parsers via parser generators without transforming the grammar into LL(k)
LL parser
An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...

 form.

Example parser

The following EBNF-like grammar
Formal grammar
A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...

 (for Niklaus Wirth
Niklaus Wirth
Niklaus Emil Wirth is a Swiss computer scientist, best known for designing several programming languages, including Pascal, and for pioneering several classic topics in software engineering. In 1984 he won the Turing Award for developing a sequence of innovative computer languages.-Biography:Wirth...

's PL/0
PL/0
At least two programming languages are known as PL/0. One is a subset of IBM's general-purpose programming language PL/I.The other PL/0, covered here, is similar to but much simpler than the general-purpose programming language Pascal, intended as an educational programming language. It serves as...

 programming language, from Algorithms + Data Structures = Programs
Algorithms + Data Structures = Programs
Algorithms + Data Structures = Programsis a 1976 book written by Niklaus Wirth covering some of the fundamental topics of computer programming, particularly that algorithms and data structures are inherently related...

) is in LL(1)
LL parser
An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...

 form:


program = block "." .

block =
["const" ident "=" number {"," ident "=" number} ";"]
["var" ident {"," ident} ";"]
{"procedure" ident ";" block ";"} statement .

statement =
[ident ":=" expression
| "call" ident
| "begin" statement {";" statement} "end"
| "if" condition "then" statement
| "while" condition "do" statement
] .

condition =
"odd" expression
| expression ("="|"#"|"<"|"<="|">"|">=") expression .

expression = ["+"|"-"] term {("+"|"-") term} .

term = factor {("*"|"/") factor} .

factor =
ident
| number
| "(" expression ")" .


Terminals are expressed in quotes. Each nonterminal is defined by a rule in the grammar, except for ident and number, which are assumed to be implicitly defined.

C implementation

What follows is an implementation of a recursive descent parser for the above language in C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

. The parser reads in source code, and exits with an error message if the code fails to parse, exiting silently if the code parses correctly.

Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner, until the final nonterminal has been processed. The program fragment depends on a global variable, sym, which contains the next symbol from the input, and the function getsym, which updates sym when called.

The implementations of the functions getsym and error are omitted for simplicity.


typedef enum {ident, number, lparen, rparen, times, slash, plus,
minus, eql, neq, lss, leq, gtr, geq, callsym, beginsym, semicolon,
endsym, ifsym, whilesym, becomes, thensym, dosym, constsym, comma,
varsym, procsym, period, oddsym} Symbol;

Symbol sym;
void getsym(void);
void error(const char msg[]);
void expression(void);

int accept(Symbol s) {
if (sym s) {
getsym;
return 1;
}
return 0;
}

int expect(Symbol s) {
if (accept(s))
return 1;
error("expect: unexpected symbol");
return 0;
}

void factor(void) {
if (accept(ident)) {
;
} else if (accept(number)) {
;
} else if (accept(lparen)) {
expression;
expect(rparen);
} else {
error("factor: syntax error");
getsym;
}
}

void term(void) {
factor;
while (sym times || sym

slash) {
getsym;
factor;
}
}

void expression(void) {
if (sym

plus || sym

minus)
getsym;
term;
while (sym

plus || sym

minus) {
getsym;
term;
}
}

void condition(void) {
if (accept(oddsym)) {
expression;
} else {
expression;
if (sym

eql || sym

neq || sym

lss || sym

leq || sym

gtr || sym geq) {
getsym;
expression;
} else {
error("condition: invalid operator");
getsym;
}
}
}

void statement(void) {
if (accept(ident)) {
expect(becomes);
expression;
} else if (accept(callsym)) {
expect(ident);
} else if (accept(beginsym)) {
do {
statement;
} while (accept(semicolon));
expect(endsym);
} else if (accept(ifsym)) {
condition;
expect(thensym);
statement;
} else if (accept(whilesym)) {
condition;
expect(dosym);
statement;
}
}

void block(void) {
if (accept(constsym)) {
do {
expect(ident);
expect(eql);
expect(number);
} while (accept(comma));
expect(semicolon);
}
if (accept(varsym)) {
do {
expect(ident);
} while (accept(comma));
expect(semicolon);
}
while (accept(procsym)) {
expect(ident);
expect(semicolon);
block;
expect(semicolon);
}
statement;
}

void program(void) {
getsym;
block;
expect(period);
}

Implementation in functional languages
Recursive descent parsers are particularly easy to implement in functional languages such as Haskell
Haskell (programming language)
Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry. In Haskell, "a function is a first-class citizen" of the programming language. As a functional programming language, the...

, Lisp or ML
ML programming language
ML is a general-purpose functional programming language developed by Robin Milner and others in the early 1970s at the University of Edinburgh, whose syntax is inspired by ISWIM...

 as they tend to be better suited for recursion in general.

See Functional Pearls: Monadic Parsing in Haskell
See also
  • JavaCC
    JavaCC
    JavaCC is an open source parser generator and lexical analyzer generator for the Java programming language. JavaCC is similar to yacc in that it generates a parser from a formal grammar written in EBNF notation, except the output is Java source code...

     - a recursive descent parser generator
  • Coco/R
    Coco/R
    Coco/R is a compiler generator that takes an L-attributed Extended Backus–Naur Form grammar of a source language and generates a scanner and a parser for that language....

     - a recursive descent parser generator
  • ANTLR
    ANTLR
    In computer-based language recognition, ANTLR , or ANother Tool for Language Recognition, is a parser generator that uses LL parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set , first developed in 1989, and is under active development...

     - a recursive descent parser generator
  • Parsing expression grammar
    Parsing expression grammar
    A parsing expression grammar, or PEG, is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language...

     - another form representing recursive descent grammar
  • Spirit Parser Framework
    Spirit Parser Framework
    The Spirit Parser Framework is an object oriented recursive descent parser generator framework implemented using template metaprogramming techniques. Expression templates allow users to approximate the syntax of Extended Backus Naur Form completely in C++...

     - a C++ recursive descent parser generator framework requiring no pre-compile step
  • Tail recursive parser
    Tail Recursive Parser
    Tail recursive parsers are derived from the more common Recursive descent parsers. Tail recursive parsers are commonly used to parse left recursive grammars. They use a smaller amount of stack space than regular recursive descent parsers. They are also easy to write. Typical recursive descent...

     - a variant of the recursive descent parser
  • parboiled
    Parboiled (Java)
    parboiled is an open-source Java library released under an Apache License. It provides support for defining PEG parsers directly in Java source code....

     - a recursive descent PEG parsing library for Java
  • Recursive ascent parser
    Recursive ascent parser
    In computing, recursive ascent parsing is a technique for implementing an LALR parser which uses mutually-recursive functions rather than tables. Thus, the parser is directly encoded in the host language similar to recursive descent...


External links
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK