Bootstrapping (compilers)
Encyclopedia
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, bootstrapping is the process of writing a compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...

 (or assembler) in the target programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

 which it is intended to compile. Applying this technique leads to a self-hosting
Self-hosting
The term self-hosting was coined to refer to the use of a computer program as part of the toolchain or operating system that produces new versions of that same program—for example, a compiler that can compile its own source code. Self-hosting software is commonplace on personal computers and larger...

 compiler.

A large proportion of programming languages are bootstrapped, including BASIC
BASIC
BASIC is a family of general-purpose, high-level programming languages whose design philosophy emphasizes ease of use - the name is an acronym from Beginner's All-purpose Symbolic Instruction Code....

, C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

, Pascal, Factor
Factor programming language
Factor is a stack-oriented programming language created by Slava Pestov. Factor is dynamically typed and has automatic memory management, as well as powerful metaprogramming features. The language has a single implementation featuring a self-hosted optimizing compiler and an interactive development...

, Haskell
Haskell (programming language)
Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry. In Haskell, "a function is a first-class citizen" of the programming language. As a functional programming language, the...

, Modula-2
Modula-2
Modula-2 is a computer programming language designed and developed between 1977 and 1980 by Niklaus Wirth at ETH Zurich as a revision of Pascal to serve as the sole programming language for the operating system and application software for the personal workstation Lilith...

, Oberon, OCaml, Common Lisp
Common Lisp
Common Lisp, commonly abbreviated CL, is a dialect of the Lisp programming language, published in ANSI standard document ANSI INCITS 226-1994 , . From the ANSI Common Lisp standard the Common Lisp HyperSpec has been derived for use with web browsers...

, Scheme, Python
Python (programming language)
Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...

 and more.

Advantages

Bootstrapping a compiler has the following advantages:

  • it is a non-trivial test of the language being compiled;
  • compiler developers only need to know the language being compiled;
  • compiler development can be done in the higher level language being compiled;
  • improvements to the compiler's back-end improve not only general purpose programs but also the compiler itself; and
  • it is a comprehensive consistency check as it should be able to reproduce its own object code.

The chicken and egg problem

If one needs a compiler for language X to obtain a compiler for language X (which is written in language X), how did the first compiler get written? Possible methods to solving this chicken or the egg problem include:
  • Implementing an interpreter
    Interpreter (computing)
    In computer science, an interpreter normally means a computer program that executes, i.e. performs, instructions written in a programming language...

     or compiler
    Compiler
    A compiler is a computer program that transforms source code written in a programming language into another computer language...

     for language X in language Y. Niklaus Wirth
    Niklaus Wirth
    Niklaus Emil Wirth is a Swiss computer scientist, best known for designing several programming languages, including Pascal, and for pioneering several classic topics in software engineering. In 1984 he won the Turing Award for developing a sequence of innovative computer languages.-Biography:Wirth...

     reported that he wrote the first Pascal compiler in Fortran
    Fortran
    Fortran is a general-purpose, procedural, imperative programming language that is especially suited to numeric computation and scientific computing...

    .
  • Another interpreter or compiler for X has already been written in another language Y; this is how Scheme is often bootstrapped.
  • Earlier versions of the compiler were written in a subset of X for which there existed some other compiler; this is how some supersets of Java
    Java (programming language)
    Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

    , Haskell
    Haskell (programming language)
    Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry. In Haskell, "a function is a first-class citizen" of the programming language. As a functional programming language, the...

    , and the initial Free Pascal
    Free Pascal
    Free Pascal Compiler is a free Pascal and Object Pascal compiler.In addition to its own Object Pascal dialect, Free Pascal supports, to varying degrees, the dialects of several other compilers, including those of Turbo Pascal, Delphi, and some historical Macintosh compilers...

     compiler are bootstrapped.
  • The compiler for X is cross compiled from another architecture where there exists a compiler for X; this is how compilers for C
    C (programming language)
    C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

     are usually ported to other platforms. Also this is the method used for Free Pascal
    Free Pascal
    Free Pascal Compiler is a free Pascal and Object Pascal compiler.In addition to its own Object Pascal dialect, Free Pascal supports, to varying degrees, the dialects of several other compilers, including those of Turbo Pascal, Delphi, and some historical Macintosh compilers...

     after the initial bootstrap.
  • Writing the compiler in X; then hand-compiling it from source (most likely in a non-optimized way) and running that on the code to get an optimized compiler. Donald Knuth
    Donald Knuth
    Donald Ervin Knuth is a computer scientist and Professor Emeritus at Stanford University.He is the author of the seminal multi-volume work The Art of Computer Programming. Knuth has been called the "father" of the analysis of algorithms...

     used this for his WEB
    WEB
    WEB is a computer programming system created by Donald E. Knuth as the first implementation of what he called "literate programming": the idea that one could create software as works of literature, by embedding source code inside descriptive text, rather than the reverse , in an order that is...

     literate programming
    Literate programming
    Literate programming is an approach to programming introduced by Donald Knuth as an alternative to the structured programming paradigm of the 1970s....

     system.


Methods for distributing compilers in source code include providing a portable bytecode
Bytecode
Bytecode, also known as p-code , is a term which has been used to denote various forms of instruction sets designed for efficient execution by a software interpreter as well as being suitable for further compilation into machine code...

 version of the compiler, so as to bootstrap the process of compiling the compiler with itself.

Such methods are also one way of detecting or avoiding (or both) the potential problem pointed out in Reflections on Trusting Trust.

The T-diagram is a notation
Notation
-Written communication:* Phonographic writing systems, by definition, use symbols to represent components of auditory language, i.e. speech, which in turn refers to things or ideas. The two main kinds of phonographic notational system are the alphabet and syllabary...

 used to explain these compiler bootstrap techniques.

In some cases, the most convenient way to get a complicated compiler running on a system that has little or no software on it involves a series of ever more sophisticated assemblers and compilers.

History

The first language to provide such a bootstrap was NELIAC
NELIAC
The Navy Electronics Laboratory International ALGOL Compiler or NELIAC is a dialect and compiler implementation of the ALGOL 58 programming language developed by the Naval Electronics Laboratory in 1958....

. The first commercial language to do so was PL/I
PL/I
PL/I is a procedural, imperative computer programming language designed for scientific, engineering, business and systems programming applications...

.

The first self-hosting
Self-hosting
The term self-hosting was coined to refer to the use of a computer program as part of the toolchain or operating system that produces new versions of that same program—for example, a compiler that can compile its own source code. Self-hosting software is commonplace on personal computers and larger...

 compiler (excluding assemblers) was written for Lisp
Lisp programming language
Lisp is a family of computer programming languages with a long history and a distinctive, fully parenthesized syntax. Originally specified in 1958, Lisp is the second-oldest high-level programming language in widespread use today; only Fortran is older...

 by Hart and Levin at MIT in 1962. They wrote a Lisp compiler in Lisp, testing it inside an existing Lisp interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting.
The compiler as it exists on the standard compiler tape is a machine language program that was obtained by having the S-expression
S-expression
S-expressions or sexps are list-based data structures that represent semi-structured data. An S-expression may be a nested list of smaller S-expressions. S-expressions are probably best known for their use in the Lisp family of programming languages...

 definition of the compiler work on itself through the interpreter.
(AI Memo 39)

This technique is only possible when an interpreter already exists for the very same language that is to be compiled. It borrows directly from the notion of running a program on itself as input, which is also used in various proofs in theoretical computer science
Theoretical computer science
Theoretical computer science is a division or subset of general computer science and mathematics which focuses on more abstract or mathematical aspects of computing....

, such as the proof that the halting problem
Halting problem
In computability theory, the halting problem can be stated as follows: Given a description of a computer program, decide whether the program finishes running or continues to run forever...

is undecidable.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK