All Topics  
Compiler

 

   Email Print
   Bookmark   Link






 

Compiler



 
 
A compiler is a computer program
Computer program

Computer programs are Instruction for a computer. A computer requires programs to function. Moreover, a computer program does not run unless its instructions are executed by a Central processing unit; however, a program may communicate an Algorithm#Formalization of algorithms to people without running....
 (or set of programs) that transforms source code
Source code

In computer science, source code is any collection of statements or declarations written in some human-readable computer programming language....
 written in a computer language
Programming language

A programming language is a machine-readable artificial language designed to express computations that can be performed by a machine, particularly a computer....
 (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable
Executable

In computing, an executable causes a computer "to perform indicated tasks according to encoded instruction ," as opposed to a file that only contains data ....
 program.

The name "compiler" is primarily used for programs that translate source code from a high-level programming language
High-level programming language

In computing, a high-level programming language is a programming language with strong Abstraction from the details of the computer. In comparison to low-level programming languages, it may use natural language elements, be easier to use, or more Porting across platforms....
 to a lower level language (e.g., assembly language
Assembly language

An assembly language is a low-level language for programming computers. It implements a symbolic representation of the numeric machine codes and other constants needed to program a particular CPU architecture....
 or machine language).






Discussion
Ask a question about 'Compiler'
Start a new discussion about 'Compiler'
Answer questions from other users
Full Discussion Forum



Encyclopedia


A compiler is a computer program
Computer program

Computer programs are Instruction for a computer. A computer requires programs to function. Moreover, a computer program does not run unless its instructions are executed by a Central processing unit; however, a program may communicate an Algorithm#Formalization of algorithms to people without running....
 (or set of programs) that transforms source code
Source code

In computer science, source code is any collection of statements or declarations written in some human-readable computer programming language....
 written in a computer language
Programming language

A programming language is a machine-readable artificial language designed to express computations that can be performed by a machine, particularly a computer....
 (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable
Executable

In computing, an executable causes a computer "to perform indicated tasks according to encoded instruction ," as opposed to a file that only contains data ....
 program.

The name "compiler" is primarily used for programs that translate source code from a high-level programming language
High-level programming language

In computing, a high-level programming language is a programming language with strong Abstraction from the details of the computer. In comparison to low-level programming languages, it may use natural language elements, be easier to use, or more Porting across platforms....
 to a lower level language (e.g., assembly language
Assembly language

An assembly language is a low-level language for programming computers. It implements a symbolic representation of the numeric machine codes and other constants needed to program a particular CPU architecture....
 or machine language). A program that translates from a low level language to a higher level one is a decompiler
Decompiler

A decompiler is the name given to a computer program that performs the reverse operation to that of a compiler. That is, it translates a file containing information at a relatively low level of abstraction into a form having a higher level of abstraction ....
. A program that translates between high-level languages is usually called a language translator, source to source translator, or language converter. A language rewriter
Rewriting

In mathematics, computer science and logic, rewriting covers a wide range of potentially Deterministic computation methods of replacing subterms of a formula with other terms....
 is usually a program that translates the form of expressions without a change of language.

A compiler is likely to perform many or all of the following operations: lexical analysis
Lexical analysis

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. Programs performing lexical analysis are called lexical analyzers or lexers....
, preprocessing, parsing
Parsing

In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a sequence of lexical analysis#Token to determine their grammatical structure with respect to a given formal grammar....
, semantic analysis, code generation, and code optimization.

The term compiler-compiler
Compiler-compiler

A compiler-compiler or compiler generator is a tool that creates a parsing, interpreter , or compiler from some form of formal description....
 is sometimes used to refer to a parser generator, a tool often used to help create a compiler.

History

Software for early computers was primarily written in assembly language for many years. Higher level programming languages were not invented until the benefits of being able to reuse software on different kinds of CPUs started to become significantly greater than the cost of writing a compiler. The very limited memory
Computer storage

Computer data storage, often called storage or memory, refers to computer components, devices, and recording medium that retain digital data used for computing for some interval of time....
 capacity of early computers also created many technical problems when implementing a compiler.

Towards the end of the 1950s, machine-independent programming languages were first proposed. Subsequently, several experimental compilers were developed. The first compiler was written by Grace Hopper
Grace Hopper

Rear admiral Grace Murray Hopper was an American computer scientist and United States Navy officer. A pioneer in the field, she was one of the first programmers of the Harvard Mark I calculator, and she developed the first compiler for a computer programming language....
, in 1952, for the A-0 programming language
A-0 programming language

The A-0 system , written by Grace Hopper in 1951 and 1952 for the UNIVAC I, was the first compiler ever developed for an electronic computer. The A-0 functioned more as a Loader or linker than the modern notion of a compiler....
. The FORTRAN
Fortran

Fortran is a general-purpose programming language, procedural programming language, imperative programming language programming language that is especially suited to numerical analysis and scientific computing....
 team led by John Backus
John Backus

For the physicist, see John Backus John Warner Backus was an American computer scientist. He led the team that invented the first widely used High-level programming language programming language and was the inventor of the Backus-Naur form , the almost universally used notation to define formal language syntax....
 at IBM
IBM

International Business Machines Corporation, abbreviated IBM and nicknamed "Big Blue" , is a multinational corporation computer technology and consulting corporation headquartered in Armonk, New York, New York, United States....
 is generally credited as having introduced the first complete compiler, in 1957. COBOL
COBOL

COBOL is one of the oldest programming languages still in active use. Its name is an acronym for COmmon Business-Oriented Language, defining its primary domain in business, finance, and administrative systems for companies and governments....
 was an early language to be compiled on multiple architectures, in 1960.

In many application domains the idea of using a higher level language quickly caught on. Because of the expanding functionality supported by newer programming language
Programming language

A programming language is a machine-readable artificial language designed to express computations that can be performed by a machine, particularly a computer....
s and the increasing complexity of computer architectures, compilers have become more and more complex.

Early compilers were written in assembly language. The first self-hosting
Self-hosting

The term self-hosting was coined to refer to the use of a computer program as part of the toolchain or operating system that produces new versions of that same program?for example, a compiler that can compile its own source code....
 compiler — capable of compiling its own source code in a high-level language — was created for Lisp
Lisp programming language

Lisp is a family of computer programming languages with a long history and a distinctive, fully parenthesized syntax. Originally specified in 1958, Lisp is the second-oldest high-level programming language in widespread use today; only Fortran is older....
 by Tim Hart and Mike Levin at MIT
Massachusetts Institute of Technology

The Massachusetts Institute of Technology is a private university research university located in Cambridge, Massachusetts, Massachusetts, United States....
 in 1962. Since the 1970s it has become common practice to implement a compiler in the language it compiles, although both Pascal
Pascal (programming language)

Pascal is an influential imperative programming and Procedural programming programming language, designed in 1968/9 and published in 1970 by Niklaus Wirth as a small and efficient language intended to encourage good programming practices using structured programming and data structure....
 and C
C (programming language)

C is a general-purpose computer programming language originally developed in 1972 by Dennis Ritchie at the Bell Telephone Laboratories to implement the Unix operating system....
 have been popular choices for implementation language. Building a self-hosting compiler is a bootstrapping
Bootstrapping (compilers)

Bootstrapping is a term used in computer science to describe the techniques involved in writing a compiler in the target programming language which it is intended to compile....
 problem -- the first such compiler for a language must be compiled either by a compiler written in a different language, or (as in Hart and Levin's Lisp compiler) compiled by running the compiler in an interpreter
Interpreter (computing)

In computer science, an interpreter normally means a computer program that execution , i.e. performs, instructions written in a programming language....
.

Compilers in education

Compiler construction and compiler optimization
Compiler optimization

Compiler optimization is the process of tuning the output of a compiler to minimize or maximize some attribute of an executable computer program....
 are taught at universities as part of the computer science
Computer science

Computer science is the study of the theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems....
 curriculum. Such courses are usually supplemented with the implementation of a compiler for an educational programming language
Educational programming language

An educational programming language is a programming language that is designed primarily as a learning instrument and not so much as a tool for writing real-world application programs....
. A well-documented example is Niklaus Wirth
Niklaus Wirth

Niklaus Emil Wirth is a Switzerland computer science, best known for designing several programming languages, including Pascal , and for pioneering several classic topics in software engineering....
's PL/0
PL/0

There are at least two programming languages known as PL/0. One is a subset of IBM's general-purpose programming language programming language PL/I....
 compiler, which Wirth used to teach compiler construction in the 1970s. In spite of its simplicity, the PL/0 compiler introduced several influential concepts to the field:

  1. Program development by stepwise refinement (also the title of a 1971 paper by Wirth)
  2. The use of a recursive descent parser
    Recursive descent parser

    A recursive descent parser is a top-down parsing built from a set of Mutual recursion procedures where each such procedure usually implements one of the production rules of the formal grammar....
  3. The use of EBNF to specify the syntax of a language
  4. A code generator producing portable P-code
    P-code

    P-code can refer to:* p-code machine * Code used in the UCSD p-System * P-code is also part of the Global Positioning System signal* In the automotive industry, P-codes are engine diagnostic codes displayed by an OBD-II scanner....
  5. The use of T-diagrams in the formal description of the bootstrapping
    Bootstrapping (compilers)

    Bootstrapping is a term used in computer science to describe the techniques involved in writing a compiler in the target programming language which it is intended to compile....
     problem


Compiler output


One classification of compilers is by the platform
Platform (computing)

In computing, a platform describes some sort of hardware architecture or software framework , that allows Computer software to run. Typical platforms include a computer's Computer architecture, operating system, programming languages and related runtime libraries or graphical user interface....
 on which their generated code executes. This is known as the target platform.

A native or hosted compiler is one whose output is intended to directly run on the same type of computer and operating system that the compiler itself runs on. The output of a cross compiler
Cross compiler

A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is run. Cross compiler programming tools are used to generate executables for embedded system or multiple platforms....
 is designed to run on a different platform. Cross compilers are often used when developing software for embedded system
Embedded system

An embedded system is a special-purpose computer system designed to perform one or a few dedicated functions, often with real-time computing constraints....
s that are not intended to support a software development environment.

The output of a compiler that produces code for a virtual machine
Virtual machine

In computer science, a virtual machine is a software implementation of a machine that executes programs like a real machine.Definitions...
 (VM) may or may not be executed on the same platform as the compiler that produced it. For this reason such compilers are not usually classified as native or cross compilers.

Compiled versus interpreted languages


Higher-level programming languages are generally divided for convenience into compiled language
Compiled language

A compiled language is a programming language whose programming language implementations are typically compilers , and not interpreter s .The term is somewhat vague; in principle any language can be implemented with a compiler or with an interpreter....
s and interpreted language
Interpreted language

In computer programming an interpreted language is a programming language whose implementation often takes the form of an interpreter . Theoretically, any language may be compiler or interpreted, so this designation is applied purely because of common implementation practice and not some underlying property of a language....
s. However, in practice there is rarely anything about a language that requires it to be exclusively compiled, or exclusively interpreted; although it is possible to design languages that may be inherently interpretive. The categorization usually reflects the most popular or widespread implementations of a language — for instance, BASIC is sometimes called an interpreted language, and C a compiled one, despite the existence of BASIC compilers and C interpreters.

Modern trends toward just-in-time compilation
Just-in-time compilation

In computing, just-in-time compilation , also known as dynamic translation, is a technique for improving the runtime performance of a computer program....
 and bytecode interpretation
Bytecode

Bytecode is a term which has been used to denote various forms of instruction sets designed for efficient execution by a software Interpreter as well as being suitable for further compilation into machine language....
 at times blur the traditional categorizations of compilers and interpreters.

Some language specifications spell out that implementations must include a compilation facility; for example, Common Lisp
Common Lisp

Common Lisp, commonly abbreviated CL, is a dialect of the Lisp programming language, published in American National Standards Institute standard document Information Technology - Programming Language - Common Lisp, formerly X3.226-1994 ....
. However, there is nothing inherent in the definition of Common Lisp that stops it from being interpreted. Other languages have features that are very easy to implement in an interpreter, but make writing a compiler much harder; for example, APL, SNOBOL4, and many scripting languages allow programs to construct arbitrary source code at runtime with regular string operations, and then execute that code by passing it to a special evaluation function. To implement these features in a compiled language, programs must usually be shipped with a runtime library
Runtime library

In computer programming, a runtime library is a special program library used by a compiler, to implement functions built into a programming language, during the runtime of a computer program....
 that includes a version of the compiler itself.

Hardware compilation


The output of some compilers may target hardware
Hardware

Hardware is a general term that refers to the physical cultural artifacts of a technology. It may also mean the physical components of a computer system, in the form of computer hardware....
 at a very low level, for example a Field Programmable Gate Array (FPGA) or structured Application-specific integrated circuit
Application-specific integrated circuit

An application-specific integrated circuit is an integrated circuit customized for a particular use, rather than intended for general-purpose use....
 (ASIC). Such compilers are said to be hardware compilers or synthesis tools because the programs they compile effectively control the final configuration of the hardware and how it operates; the output of the compilation are not instructions that are executed in sequence - only an interconnection of transistors or lookup tables. For example, XST is the Xilinx Synthesis Tool used for configuring FPGAs. Similar tools are available from Altera, Synplicity, Synopsys and other vendors.

Compiler design


In the early days, the approach taken to compiler design used to be directly affected by the complexity of the processing, the experience of the person(s) designing it, and the resources available.

A compiler for a relatively simple language written by one person might be a single, monolithic piece of software. When the source language is large and complex, and high quality output is required the design may be split into a number of relatively independent phases. Having separate phases means development can be parceled up into small parts and given to different people. It also becomes much easier to replace a single phase by an improved one, or to insert new phases later (eg, additional optimizations).

The division of the compilation processes into phases was championed by the Production Quality Compiler-Compiler Project (PQCC) at Carnegie Mellon University. This project introduced the terms front end, middle end, and back end.

All but the smallest of compilers have more than two phases. However, these phases are usually regarded as being part of the front end or the back end. The point at where these two ends meet is always open to debate. The front end is generally considered to be where syntactic and semantic processing takes place, along with translation to a lower level of representation (than source code).

The middle end is usually designed to perform optimizations on a form other than the source code or machine code. This source code/machine code independence is intended to enable generic optimizations to be shared between versions of the compiler supporting different languages and target processors.

The back end takes the output from the middle. It may perform more analysis, transformations and optimizations that are for a particular computer. Then, it generates code for a particular processor and OS.

This front-end/middle/back-end approach makes it possible to combine front ends for different languages
Programming language

A programming language is a machine-readable artificial language designed to express computations that can be performed by a machine, particularly a computer....
 with back ends for different CPUs. Practical examples of this approach are the GNU Compiler Collection
GNU Compiler Collection

The GNU Compiler Collection is a compiler system produced by the GNU Project supporting various programming languages. GCC is a key component of the GNU toolchain....
, LLVM, and the Amsterdam Compiler Kit
Amsterdam Compiler Kit

The Amsterdam Compiler Kit is a fast, lightweight and Retargeting suite and toolchain written by Andrew Tanenbaum and Ceriel Jacobs, and is Minix's native toolchain....
, which have multiple front-ends, shared analysis and multiple back-ends.

One-pass versus multi-pass compilers

Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it) performing some of the required analysis and translations.

The ability to compile in a single pass
One-pass compiler

In computer programming, a one-pass compiler is a compiler that passes through the source code of each compilation unit only once. In other words, a one-pass compiler does not "look back" at code it previously processed....
 is often seen as a benefit because it simplifies the job of writing a compiler and one pass compilers generally compile faster than multi-pass compiler
Multi-pass compiler

A multi-pass compiler is a type of compiler that processes the source code or abstract syntax tree of a program several times. This is in contrast to a one-pass compiler, which traverses the program only once....
s. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal
Pascal (programming language)

Pascal is an influential imperative programming and Procedural programming programming language, designed in 1968/9 and published in 1970 by Niklaus Wirth as a small and efficient language intended to encourage good programming practices using structured programming and data structure....
).

In some cases the design of a language feature may require a compiler to perform more than one pass over the source. For instance, consider a declaration appearing on line 20 of the source which affects the translation of a statement appearing on line 10. In this case, the first pass needs to gather information about declarations appearing after statements that they affect, with the actual translation happening during a subsequent pass.

The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated optimizations
Compiler optimization

Compiler optimization is the process of tuning the output of a compiler to minimize or maximize some attribute of an executable computer program....
 needed to generate high quality code. It can be difficult to count exactly how many passes an optimizing compiler makes. For instance, different phases of optimization may analyse one expression many times but only analyse another expression once.

Splitting a compiler up into small programs is a technique used by researchers interested in producing provably correct compilers. Proving the correctness of a set of small programs often requires less effort than proving the correctness of a larger, single, equivalent program.

While the typical multi-pass compiler outputs machine code from its final pass, there are several other types:

  • A "source-to-source compiler
    Source-to-source compiler

    A "source-to-source compiler" is a type of compiler that takes a high level language as its input and outputs a high level language. For example, an automatic Parallel compiler will frequently take in a high level language program as an input and then transform the code and annotate it with parallel code annotations or language constructs ....
    " is a type of compiler that takes a high level language as its input and outputs a high level language. For example, an automatic parallelizing compiler will frequently take in a high level language program as an input and then transform the code and annotate it with parallel code annotations (e.g. OpenMP
    OpenMP

    The OpenMP is an application programming interface that supports multi-platform shared memory multiprocessing programming in C , C++ and Fortran on many architectures, including Unix and Microsoft Windows platforms....
    ) or language constructs (e.g. Fortran's DOALL statements).
  • Stage compiler that compiles to assembly language of a theoretical machine, like some Prolog
    Prolog

    Prolog is a logic programming language. It is a general purpose language often associated with artificial intelligence and computational linguistics....
     implementations
    • This Prolog machine is also known as the Warren Abstract Machine
      Warren abstract machine

      In 1983, David H. D. Warren designed an abstract machine for the execution of Prolog consisting of a computer storage architecture and an instruction set [War83]....
       (or WAM). Bytecode compilers for Java, Python, and many more are also a subtype of this.
  • Just-in-time compiler
    Just-in-time compilation

    In computing, just-in-time compilation , also known as dynamic translation, is a technique for improving the runtime performance of a computer program....
    , used by Smalltalk and Java systems, and also by Microsoft .Net's Common Intermediate Language
    Common Intermediate Language

    Common Intermediate Language is the lowest-level human-readable programming language in the Common Language Infrastructure and in the .NET Framework....
     (CIL)
    • Applications are delivered in bytecode
      Bytecode

      Bytecode is a term which has been used to denote various forms of instruction sets designed for efficient execution by a software Interpreter as well as being suitable for further compilation into machine language....
      , which is compiled to native machine code just prior to execution.


Front end


The front end analyzes the source code to build an internal representation of the program, called the intermediate representation
Intermediate representation

In computing, an intermediate representation is a data structure that is constructed from input data to a computer program, and from which part or all of the output data of the program is constructed in turn....
 or IR. It also manages the symbol table
Symbol table

In computer science, a symbol table is a data structure used by a language translator such as a compiler or interpreter , where each identifier in a program's source code is associated with information relating to its declaration or appearance in the source, such as its data type, scope level and sometimes its location....
, a data structure mapping each symbol in the source code to associated information such as location, type and scope. This is done over several phases, which includes some of the following:

  1. Line reconstruction. Languages which strop
    Stropping

    When applied to computer languages, stropping refers to the method used to mark letter sequences as having a special property . For instance, some implementations of Algol 68 treat letter sequences prefixed by a single quote, ', as being keywords whereas Algol 60 commonly used only the convention of quotes around the word ....
     their keywords or allow arbitrary spaces within identifiers require a phase before parsing, which converts the input character sequence to a canonical form ready for the parser. The top-down
    Top-down parsing

    Top-down parsing is a strategy of analyzing unknown data relationships by hypothesizing general parse tree structures and then considering whether the known fundamental structures are compatible with the hypothesis....
    , recursive-descent
    Recursive descent parser

    A recursive descent parser is a top-down parsing built from a set of Mutual recursion procedures where each such procedure usually implements one of the production rules of the formal grammar....
    , table-driven parsers used in the 1960s typically read the source one character at a time and did not require a separate tokenizing phase. Atlas Autocode
    Atlas Autocode

    Atlas Autocode was a programming language developed around 1965 at Manchester University for the Atlas Computer. It was developed by Tony Brooker and Derrick Morris as an improvement on the ALGOL programming languages, removing some of Algol's poorer features such as "passing parameters by name" ....
    , and Imp
    Edinburgh IMP

    Edinburgh IMP is a development of ATLAS Autocode, initially developed around 1966-1969 at Edinburgh University, Scotland. IMP was a general-purpose programming language which was used heavily for systems programming....
     (and some implementations of Algol and Coral66) are examples of stropped languages whose compilers would have a Line Reconstruction phase.
  2. Lexical analysis
    Lexical analysis

    In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. Programs performing lexical analysis are called lexical analyzers or lexers....
     breaks the source code text into small pieces called tokens. Each token is a single atomic unit of the language, for instance a keyword, identifier
    Identifier

    In computer science, Identifiers are Lexical Token s that name entity. The concept is analogy to that of a "name". Identifiers are used extensively in virtually all information processing systems....
     or symbol name
    Symbol

    A symbol is something such as an entity, picture, written word, sound, or particular mark that represents something else by association, resemblance, or convention....
    . The token syntax is typically a regular language
    Regular language

    In theoretical computer science, a regular language is a formal language that satisfies the following equivalent properties:* it can be accepted by a deterministic finite state machine...
    , so a finite state automaton constructed from a regular expression
    Regular expression

    In computing, regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters....
     can be used to recognize it. This phase is also called lexing or scanning, and the software doing lexical analysis is called a lexical analyzer or scanner.
  3. Preprocessing
    Preprocessor

    In computer science, a preprocessor is a Computer program that processes its input data to produce output that is used as input to another program....
    . Some languages, e.g., C
    C (programming language)

    C is a general-purpose computer programming language originally developed in 1972 by Dennis Ritchie at the Bell Telephone Laboratories to implement the Unix operating system....
    , require a preprocessing phase which supports macro substitution and conditional compilation. Typically the preprocessing phase occurs before syntactic or semantic analysis; e.g. in the case of C, the preprocessor manipulates lexical tokens rather than syntactic forms. However, some languages such as Scheme support macro substitutions based on syntactic forms.
  4. Syntax analysis involves parsing
    Parsing

    In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a sequence of lexical analysis#Token to determine their grammatical structure with respect to a given formal grammar....
     the token sequence to identify the syntactic structure of the program. This phase typically builds a parse tree
    Parse tree

    A parse tree or concrete syntax tree is an tree that represents the syntax structure of a string according to some formal grammar. In a parse tree, the interior nodes are labeled by nonterminals of the grammar, while the leaf nodes are labeled by terminal symbol of the grammar....
    , which replaces the linear sequence of tokens with a tree structure built according to the rules of a formal grammar
    Formal grammar

    In formal language theory, grammars, also called formal grammars or generative grammars, are a formalism used to describe formal languages – i.e....
     which define the language's syntax. The parse tree is often analyzed, augmented, and transformed by later phases in the compiler.
  5. Semantic analysis is the phase in which the compiler adds semantic information to the parse tree
    Parse tree

    A parse tree or concrete syntax tree is an tree that represents the syntax structure of a string according to some formal grammar. In a parse tree, the interior nodes are labeled by nonterminals of the grammar, while the leaf nodes are labeled by terminal symbol of the grammar....
     and builds the symbol table. This phase performs semantic checks such as type checking (checking for type errors), or object binding
    Object binding

    Several object binding times exist in object oriented systems. Java , for example, has late Name binding leading to more loose coupling systems ....
     (associating variable and function references with their definitions), or definite assignment (requiring all local variables to be initialized before use), rejecting incorrect programs or issuing warnings. Semantic analysis usually requires a complete parse tree, meaning that this phase logically follows the parsing
    Parsing

    In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a sequence of lexical analysis#Token to determine their grammatical structure with respect to a given formal grammar....
     phase, and logically precedes the code generation phase, though it is often possible to fold multiple phases into one pass over the code in a compiler implementation.


Back end


The term back end is sometimes confused with code generator because of the overlapped functionality of generating assembly code. Some literature uses middle end to distinguish the generic analysis and optimization phases in the back end from the machine-dependent code generators.

The main phases of the back end include the following:

  1. Analysis: This is the gathering of program information from the intermediate representation derived from the input. Typical analyses are data flow analysis to build use-define chain
    Use-define chain

    A Use-Definition Chain is a data structure that consists of a use, U, a variable, and all the definitions, D, of that variable that can reach that use without any other intervening definitions....
    s, dependence analysis
    Dependence analysis

    In compiler theory, dependence analysis produces execution-order constraints between statements/instructions. Broadly speaking, a statement S2 depends on S1 if S1 must be executed before S2....
    , alias analysis
    Alias analysis

    Alias analysis is a technique in compiler theory, used to determine if a storage location may be accessed in more than one way. Two pointers are said to be Aliasing if they point to the same location....
    , pointer analysis
    Pointer analysis

    In computer science pointer analysis, or points-to analysis, is a static code analysis technique that establishes which pointers, or heap references, can point to which variables or storage locations....
    , escape analysis
    Escape analysis

    In programming language compiler optimization theory, escape analysis is a method for determining the dynamic scope of pointers. It is related to pointer analysis and shape analysis ....
     etc. Accurate analysis is the basis for any compiler optimization. The call graph
    Call graph

    A call graph is a directed Graph that represents calling relationships between subroutines in a computer program. Specifically, each node represents a procedure and each edge indicates that procedure f calls procedure g....
     and control flow graph
    Control flow graph

    File:Simplified Control Flowgraphs.jpgA control flow graph in computer science is a Group representation, using graph notation, of all paths that might be traversed through a computer program during its execution ....
     are usually also built during the analysis phase.
  2. Optimization
    Compiler optimization

    Compiler optimization is the process of tuning the output of a compiler to minimize or maximize some attribute of an executable computer program....
    : the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms. Popular optimizations are inline expansion
    Inline expansion

    In computing, inline expansion, or inlining, is a compiler optimization that replaces a function call site with the body of the callee. This optimization may improve time and space usage at runtime, at the possible cost of increasing the size of the final program....
    , dead code elimination
    Dead code elimination

    In compiler theory, dead code elimination is a compiler optimization that removes code that does not affect the program. Removing such code has two benefits....
    , constant propagation, loop transformation, register allocation
    Register allocation

    In compiler optimization, register allocation is the process of multiplexing a large number of target program variables onto a small number of Central processing unit processor register....
     or even automatic parallelization
    Automatic parallelization

    Automatic parallelization, also auto parallelization, autoparallelization, parallelization, or //ization , the last two of which imply automation when used in context, refers to converting sequential source code into multi-threaded or vectorized code in order to utilize multiple processors simultaneously in a shared-m...
    .
  3. Code generation: the transformed intermediate language is translated into the output language, usually the native machine language of the system. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory and the selection and scheduling of appropriate machine instructions along with their associated addressing modes (see also Sethi-Ullman algorithm
    Sethi-Ullman algorithm

    When Code generation for arithmetic expressions, the compiler has to decide which is the best way to translate the expression in terms of number of instructions used as well as number of registers needed to evaluate a certain subtree ....
    ).


Compiler analysis is the prerequisite for any compiler optimization, and they tightly work together. For example, dependence analysis
Dependence analysis

In compiler theory, dependence analysis produces execution-order constraints between statements/instructions. Broadly speaking, a statement S2 depends on S1 if S1 must be executed before S2....
 is crucial for loop transformation.

In addition, the scope of compiler analysis and optimizations vary greatly, from as small as a basic block
Basic block

In computing, a basic block is code that has one entry point , one exit point and no jump instructions contained within it. The start of a basic block may be jumped to from more than one location....
 to the procedure/function level, or even over the whole program (interprocedural optimization
Interprocedural optimization

Interprocedural optimization is a collection of compiler techniques used in computer programming to improve performance in programs containing many frequently used Function of small or medium length....
). Obviously, a compiler can potentially do a better job using a broader view. But that broad view is not free: large scope analysis and optimizations are very costly in terms of compilation time and memory space; this is especially true for interprocedural analysis and optimizations.

Interprocedural analysis and optimizations are common in modern commercial compilers from HP
Hewlett-Packard

The Hewlett-Packard Company , commonly referred to as HP, is a technology corporation headquartered in Palo Alto, California, United States....
, IBM
IBM

International Business Machines Corporation, abbreviated IBM and nicknamed "Big Blue" , is a multinational corporation computer technology and consulting corporation headquartered in Armonk, New York, New York, United States....
, SGI
Silicon Graphics

Silicon Graphics, Inc. is a company manufacturer high-performance computing solutions, including computer hardware and computer software. SGI was founded by James H....
, Intel, Microsoft
Microsoft

Microsoft Corporation is a multinational corporation computer technology corporation that develops, manufactures, licenses, and supports a wide range of computer software products for computing devices....
, and Sun Microsystems
Sun Microsystems

Sun Microsystems, Inc. is a multinational corporation vendor of computers, computer components, computer software, and information technology services, founded on February 24, 1982....
. The open source GCC
GNU Compiler Collection

The GNU Compiler Collection is a compiler system produced by the GNU Project supporting various programming languages. GCC is a key component of the GNU toolchain....
 was criticized for a long time for lacking powerful interprocedural optimizations, but it is changing in this respect. Another open source compiler with full analysis and optimization infrastructure is Open64
Open64

Open64 is an open source, optimizing compiler for the Intel IA-64 , AMD Opteron and Intel IA-32e architecture. It derives from the Silicon Graphics compilers for the MIPS R10000 processor, called MIPSPro....
, which is used by many organizations for research and commercial purposes.

Due to the extra time and space needed for compiler analysis and optimizations, some compilers skip them by default. Users have to use compilation options to explicitly tell the compiler which optimizations should be enabled.

Related techniques


Assembly language
Assembly language

An assembly language is a low-level language for programming computers. It implements a symbolic representation of the numeric machine codes and other constants needed to program a particular CPU architecture....
 is not a high-level language and a program that compiles it is more commonly known as an assembler, with the inverse program known as a disassembler
Disassembler

A disassembler is a computer program that translates machine language into assembly language?the inverse operation to that of an Assembly language#Assembler....
.

A program that translates from a low level language to a higher level one is a decompiler
Decompiler

A decompiler is the name given to a computer program that performs the reverse operation to that of a compiler. That is, it translates a file containing information at a relatively low level of abstraction into a form having a higher level of abstraction ....
.

A program that translates between high-level languages is usually called a language translator, source to source translator, language converter, or language rewriter
Rewriting

In mathematics, computer science and logic, rewriting covers a wide range of potentially Deterministic computation methods of replacing subterms of a formula with other terms....
. The last term is usually applied to translations that do not involve a change of language.

International conferences and organizations

Every year, the European Joint Conferences on Theory and Practice of Software (ETAPS) sponsors a conference on Compiler Construction, with papers from both the academic and industrial sectors.

See also


  • Abstract interpretation
    Abstract interpretation

    In computer science, abstract interpretation is a theory of sound approximation of the semantics of computer programs, based on monotonic functions over ordered sets, especially lattice s....
  • Attribute grammar
    Attribute grammar

    An Attribute grammar is a formal way to define Attribute for the productions of a formal grammar, associating these attributes to values. The evaluation occurs in the nodes of the abstract syntax tree, when the language is processed by some parser or compiler....
  • Bottom-up parsing
    Bottom-up parsing

    Bottom-up parsing is a strategy for analyzing unknown data relationships that attempts to identify the most fundamental units first, and then to infer higher-order structures from them....
  • Compiler-compiler
    Compiler-compiler

    A compiler-compiler or compiler generator is a tool that creates a parsing, interpreter , or compiler from some form of formal description....
  • Error avalanche
  • History of compiler writing
    History of compiler writing

    In computing, a compiler is a computer program that transforms source code written in a programming language into another computer language . The most common reason for wanting to transform source code is to create an executable program....
  • Just-in-time compilation
    Just-in-time compilation

    In computing, just-in-time compilation , also known as dynamic translation, is a technique for improving the runtime performance of a computer program....
  • Linker
    Linker

    In computer science, a linker or link editor is a computer program that takes one ormore object file generated by a compiler and combines them into a single executable program....
  • List of compilers
    List of compilers

    This page is intended to list all current compilers, compiler generators, interpreters, translators, etc....
  • List of important publications in computer science#Compilers
  • Metacompilation
  • Semantics encoding
    Semantics encoding

    A semantics encoding is a "translation" between formal languages.For programmers, the most familiar form of encoding is the compilation of a programming language into machine code or byte-code....


External links