Compiler-compiler
Encyclopedia
A compiler-compiler or compiler generator is a tool that creates a parser
Parsing
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens , to determine its grammatical structure with respect to a given formal grammar...

, interpreter, or compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...

 from some form of formal description of a language and machine
Machine
A machine manages power to accomplish a task, examples include, a mechanical system, a computing system, an electronic system, and a molecular machine. In common usage, the meaning is that of a device having parts that perform or assist in performing any type of work...

. The earliest and still most common form of compiler-compiler is a parser generator, whose input is a grammar
Formal grammar
A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...

 (usually in BNF) of a programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

, and whose generated output is the source code
Source code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...

 of a parser often used as a component of a compiler. Similarly, code generator-generators such as (JBurg) exist, but such tools have not yet reached maturity.

The ideal compiler-compiler takes a description of a programming language and a target instruction set
Instruction set
An instruction set, or instruction set architecture , is the part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O...

 architecture, and automatically generates a usable compiler from them. In practice, the state of the art has yet to reach this degree of sophistication and most compiler generators are not capable of handling semantic or target architecture information.

Variants

A typical parser generator associates executable code with each of the rules of the grammar that should be executed when these rules are applied by the parser. These pieces of code are sometimes referred to as semantic action routines since they define the semantics of the syntactic structure that is analyzed by the parser. Depending upon the type of parser that should be generated, these routines may construct a parse tree
Parse tree
A concrete syntax tree or parse tree or parsing treeis an ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar. In a parse tree, the interior nodes are labeled by non-terminals of the grammar, while the leaf nodes are labeled by terminals of the...

 (or abstract syntax tree
Abstract syntax tree
In computer science, an abstract syntax tree , or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code. The syntax is 'abstract' in the sense that it...

), or generate executable code directly.

One of the earliest (1964), surprisingly powerful, versions of compiler-compilers is MetaII
META II
META II is a compiler writing language first released in 1962 by D. V. Schorre. It consists of syntax equations resembling Backus normal form and into which instructions to output assembly language commands are inserted. Compilers have been written in this language for VALGOL I and VALGOL II...

, which accepted grammars and code generation rules, and is able to compile itself and other languages.

Some experimental compiler-compilers take as input a formal description of programming language semantics, typically using denotational semantics
Denotational semantics
In computer science, denotational semantics is an approach to formalizing the meanings of programming languages by constructing mathematical objects which describe the meanings of expressions from the languages...

. This approach is often called 'semantics-based compiling', and was pioneered by Peter Mosses
Peter Mosses
Peter D. Mosses is a British computer scientist.Peter Mosses studied mathematics as an undergraduate at Trinity College, Oxford, and went on to undertake a DPhil supervised by Christopher Strachey in the Programming Research Group while at Wolfson College, Oxford in the early 1970s...

' Semantic Implementation System (SIS) in 1978. However, both the generated compiler and the code it produced were inefficient in time and space. No production compilers are currently built in this way, but research continues.

The Production Quality Compiler-Compiler
PQCC
The Production Quality Compiler-Compiler Project was a long-term project led by William Wulf at Carnegie Mellon University to produce an industrial-strength compiler-compiler. PQCC would produce full, optimizing programming language compilers from descriptions of the programming language and the...

 project at Carnegie-Mellon University does not formalize semantics, but does have a semi-formal framework for machine description.

Compiler-compilers exist in many flavors, including bottom-up rewrite machine generators (see JBurg) used to tile syntax trees according to a rewrite grammar for code generation, and attribute grammar
Attribute grammar
An attribute grammar is a formal way to define attributes for the productions of a formal grammar, associating these attributes to values. The evaluation occurs in the nodes of the abstract syntax tree, when the language is processed by some parser or compiler....

 parser generators (e.g. ANTLR
ANTLR
In computer-based language recognition, ANTLR , or ANother Tool for Language Recognition, is a parser generator that uses LL parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set , first developed in 1989, and is under active development...

 can be used for simultaneous type checking, constant propagation, and more during the parsing stage).

History

The first compiler-compiler to use that name was written by Tony Brooker
Tony Brooker
Tony Brooker graduated in Mathematics from Imperial College in 1945 and returned there in 1947 as Assistant Lecturer. His first computer project was the construction of a fast multiplier unit from electro-mechanical relays. This was taken over by Professor K D Tocher and incorporated into ICCE, the...

 in 1960 and was used to create compilers for the Atlas
Atlas Computer (Manchester)
The Atlas Computer was a joint development between the University of Manchester, Ferranti, and Plessey. The first Atlas, installed at Manchester University and officially commissioned in 1962, was one of the world's first supercomputers, considered to be the most powerful computer in the world at...

 computer at the University of Manchester
University of Manchester
The University of Manchester is a public research university located in Manchester, United Kingdom. It is a "red brick" university and a member of the Russell Group of research-intensive British universities and the N8 Group...

, including the Atlas Autocode
Atlas Autocode
Atlas Autocode was a programming language developed around 1965 at Manchester University for the Atlas Computer. It was developed by Tony Brooker and Derrick Morris as an improvement on the ALGOL programming languages, removing some of Algol's poorer features such as "passing parameters by name"...

 compiler. However it was rather different from modern compiler-compilers, and today would probably be described as being somewhere between a highly customisable generic compiler and an extensible-syntax language
Extensible programming
Extensible programming is a term used in computer science to describe a style of computer programming that focuses on mechanisms to extend the programming language, compiler and runtime environment. Extensible programming languages, supporting this style of programming, were an active area of work...

. The name 'compiler-compiler' was far more appropriate for Brooker's system than it is for most modern compiler-compilers, which are more accurately described as parser generators. It is almost certain that the "Compiler Compiler" name has entered common use due to Yacc
Yacc
The computer program yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler." It generates a parser based on an analytic grammar written in a notation similar to BNF.Yacc used to be available as...

 rather than Brooker's work being remembered.

Other examples of parser generators in the yacc vein are ANTLR
ANTLR
In computer-based language recognition, ANTLR , or ANother Tool for Language Recognition, is a parser generator that uses LL parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set , first developed in 1989, and is under active development...

, Coco/R
Coco/R
Coco/R is a compiler generator that takes an L-attributed Extended Backus–Naur Form grammar of a source language and generates a scanner and a parser for that language....

, CUP, GNU bison
GNU bison
GNU bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser which reads sequences of tokens and decides whether the sequence conforms to the syntax...

, Eli, FSL, SableCC
SableCC
SableCC is an open source compiler generator in Java. Stable version is licensed under the GNU Lesser General Public License...

 and JavaCC
JavaCC
JavaCC is an open source parser generator and lexical analyzer generator for the Java programming language. JavaCC is similar to yacc in that it generates a parser from a formal grammar written in EBNF notation, except the output is Java source code...

.

Several compiler-compilers

  • ANTLR
    ANTLR
    In computer-based language recognition, ANTLR , or ANother Tool for Language Recognition, is a parser generator that uses LL parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set , first developed in 1989, and is under active development...

  • Bison
    GNU bison
    GNU bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser which reads sequences of tokens and decides whether the sequence conforms to the syntax...

  • Coco/R
    Coco/R
    Coco/R is a compiler generator that takes an L-attributed Extended Backus–Naur Form grammar of a source language and generates a scanner and a parser for that language....

  • ELI
    Eli
    Eli or ELI may refer to:*El , or Eli, a variant on the name of God as spoken in Arabic, Hebrew, and Aramaic*Eli , a common first name in Hebrew...

    , an integrated toolset for compiler construction.
  • Lemon
  • parboiled
    Parboiled (Java)
    parboiled is an open-source Java library released under an Apache License. It provides support for defining PEG parsers directly in Java source code....

    , a Java library for building parsers.
  • Packrat parser
  • PQCC
    PQCC
    The Production Quality Compiler-Compiler Project was a long-term project led by William Wulf at Carnegie Mellon University to produce an industrial-strength compiler-compiler. PQCC would produce full, optimizing programming language compilers from descriptions of the programming language and the...

    , a compiler-compiler that is more than a parser generator.
  • VisualLangLab
    VisualLangLab
    VisualLangLab is an open source parser generator for JVM languages. VisualLangLab allows users to create and edit graphical grammar-trees that represent grammar rules....

    , a visual parser generator for JVM languages.
  • Yacc
    Yacc
    The computer program yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler." It generates a parser based on an analytic grammar written in a notation similar to BNF.Yacc used to be available as...


Other related topics

  • Parsing expression grammar
    Parsing expression grammar
    A parsing expression grammar, or PEG, is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language...

  • LL parser
    LL parser
    An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...

  • LR parser
    LR parser
    In computer science, an LR parser is a parser that reads input from Left to right and produces a Rightmost derivation. The term LR parser is also used; where the k refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions...

  • Simple LR parser
    Simple LR parser
    An SLR parser will typically have more conflict states than an LALR parser. For real-world computer languages, an SLR parser is usually not adequate, but for student projects in a compiler class it is a good learning tool....

  • LALR parser
    LALR parser
    In computer science, an LALR parser is a type of LR parser based on a finite-state-automata concept. The data structure used by an LALR parser is a pushdown automaton...

  • GLR parser
    GLR parser
    A GLR parser is an extension of an LR parser algorithm to handle nondeterministic and ambiguous grammars. First described in a 1984 paper by Masaru Tomita, it has also been referred to as a "parallel parser"...


Historical

  • Brooker, R. A., et al., The Compiler-Compiler, Annual Review in Automatic Programming, Vol. 3, p. 229. (1963).
  • Brooker, R. A., Morris, D. and Rohl, J. S., Experience with the Compiler Compiler, Computer Journal, Vol. 9, p. 350. (February 1967).
  • Johnson, Stephen C.
    Stephen C. Johnson
    Stephen Curtis Johnson spent nearly 20 years at Bell Labs and AT&T where he wrote yacc, lint, spell and the Portable C Compiler machine .Johnson earned his PhD in mathematics but has spent his entire career in computer science...

    , Yacc—yet another compiler-compiler, Computer Science Technical Report 32, Bell Laboratories, Murray Hill, NJ, July 1975
  • McKeeman, William Marshall; Horning, James J.
    Jim Horning
    James J. "Jim" Horning is an American computer scientist and ACM Fellow.Jim Horning received a PhD in computer science from Stanford University in 1969 for a thesis entitled A Study of Grammatical Inference. He was a founding member, and later Chairman, of the Computer Systems Research Group at the...

    ; Wortman, David B., CS.toronto.edu, A Compiler Generator, Englewood Cliffs, N.J.: Prentice-Hall, 1970. ISBN 0-13-155077-2

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK