Intermediate language
In computer science, an intermediate language is the language of an abstract machine designed to aid in the analysis of computer programs. The term comes from its use in compilers, where a compiler first translates the source code of a program into a form more suitable for code-improving transformations, as an intermediate step before generating object or machine code for a target machine. The design of an intermediate language typically differs from that of a practical machine language in three fundamental ways:
  • Each instruction represents exactly one fundamental operation; e.g. "shift-add" addressing modes common in microprocessors are not present.
  • Control flow information may not be included in the instruction set.
  • The number of registers available may be large, even limitless.


A popular format for intermediate languages is three-address code.
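As a rough illustration of three-address code, the sketch below lowers an arithmetic expression into instructions that each perform exactly one operation and write to a fresh temporary, drawing on an unlimited supply of virtual registers. It uses Python's standard `ast` module as a stand-in parser; the function name `to_three_address` and the `t1`, `t2`, ... naming scheme are assumptions made for this example, not part of any particular compiler.

```python
import ast
import itertools

# Map Python AST operator nodes to the symbols used in the emitted code.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_three_address(expr):
    """Lower an arithmetic expression to a list of three-address instructions."""
    code = []
    temps = itertools.count(1)  # unlimited supply of virtual registers t1, t2, ...

    def lower(node):
        # Variables and constants are already valid operands.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        # A binary operation becomes one instruction with a fresh destination.
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = lower(node.left)
            right = lower(node.right)
            dest = f"t{next(temps)}"
            code.append(f"{dest} = {left} {OPS[type(node.op)]} {right}")
            return dest
        raise ValueError("unsupported expression")

    result = lower(ast.parse(expr, mode="eval").body)
    return code, result

code, result = to_three_address("a + b * c - 4")
for line in code:
    print(line)
# t1 = b * c
# t2 = a + t1
# t3 = t2 - 4
```

Each emitted line has at most two operands and one destination, which is what makes later code-improving transformations straightforward to express.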

The term is also used for a language that serves as the output of a compiler for some high-level programming language. Such a compiler does not produce object or machine code itself; it emits only the intermediate language, which is then passed to a compiler for that language to produce the finished object or machine code. This is usually done to gain optimization, much as described above, or portability, by using an intermediate language such as C that has compilers for many processors and operating systems. Languages used for this purpose fall in complexity between high-level languages and low-level languages such as assembly languages.
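The arrangement described above can be sketched with a toy front end that emits C source text and leaves machine-code generation to whatever C compiler is available on the target. The function `emit_c`, its parameters, and the choice of `double` for every value are hypothetical choices for this example; Python's `ast` module stands in for a real parser.

```python
import ast

def emit_c(name, params, expr):
    """Emit a C function that computes `expr` over double-typed parameters."""
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def gen(node):
        if isinstance(node, ast.Name):
            if node.id not in params:
                raise NameError(f"unknown variable {node.id}")
            return node.id
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            # Emit every literal as a C double literal.
            return repr(float(node.value))
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            # Parenthesize fully so C precedence cannot change the meaning.
            return f"({gen(node.left)} {ops[type(node.op)]} {gen(node.right)})"
        raise ValueError("unsupported expression")

    body = gen(ast.parse(expr, mode="eval").body)
    args = ", ".join(f"double {p}" for p in params)
    return f"double {name}({args}) {{\n    return {body};\n}}\n"

print(emit_c("area", ["w", "h"], "w * h / 2"))
```

The generated text can be handed to any C compiler, so the front end needs no knowledge of the target processor or operating system.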

Intermediate representation

An intermediate representation (IR) is a data structure that is constructed from input data to a program, and from which part or all of the output data of the program is constructed in turn. Use of the term usually implies that most of the information present in the input is retained by the intermediate representation, with further annotations or rapid lookup features.

A canonical example is found in most modern compilers, where the linear human-readable text representing a program is transformed into an intermediate graph data structure that allows flow analysis and rearrangement before the compiler starts to create the list of actual CPU instructions that will do the work. Use of an intermediate representation allows compiler systems like LLVM to be targeted by many different source languages, and to support code generation for many different target architectures.
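One common graph-shaped representation is a control-flow graph of basic blocks. The sketch below builds one from a linear instruction list, under an assumed toy syntax (labels end in `:`, jumps are written `goto L` or `if ... goto L`); the function name `build_cfg` and the instruction format are illustrative only and do not correspond to LLVM or any other real compiler.

```python
def build_cfg(instrs):
    """Split a linear instruction list into basic blocks and connect them."""
    # A block starts at the first instruction, at every label,
    # and at every instruction that follows a jump.
    leaders = {0}
    for i, ins in enumerate(instrs):
        if ins.endswith(":"):
            leaders.add(i)
        if ins.startswith(("goto ", "if ")) and i + 1 < len(instrs):
            leaders.add(i + 1)
    starts = sorted(leaders)
    blocks = [instrs[s:e] for s, e in zip(starts, starts[1:] + [len(instrs)])]

    # Map each label to the index of the block it begins.
    label_of = {b[0][:-1]: n for n, b in enumerate(blocks) if b[0].endswith(":")}

    edges = {n: [] for n in range(len(blocks))}
    for n, block in enumerate(blocks):
        last = block[-1]
        if last.startswith("return"):
            continue  # no successors
        if last.startswith("goto "):
            edges[n].append(label_of[last.split()[-1]])
            continue  # unconditional jump: no fall-through
        if last.startswith("if "):
            edges[n].append(label_of[last.split()[-1]])  # branch taken
        if n + 1 < len(blocks):
            edges[n].append(n + 1)  # fall through to the next block
    return blocks, edges

program = [
    "i = 0",
    "loop:",
    "if i >= n goto done",
    "i = i + 1",
    "goto loop",
    "done:",
    "return i",
]
blocks, edges = build_cfg(program)
print(edges)  # {0: [1], 1: [3, 2], 2: [1], 3: []}
```

The back edge from block 2 to block 1 exposes the loop, which is the kind of structure flow analysis works on before any CPU instructions are chosen.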

Languages

Although C was not explicitly designed as an intermediate language, its nature as an abstraction of assembly and its ubiquity as the de facto system language in Unix-like and other operating systems have made it a popular one: Eiffel, Sather, Esterel, some dialects of Lisp (Lush, Gambit), Haskell (Glasgow Haskell Compiler), Squeak's C-subset Slang, Cython, Vala, and others make use of C as an intermediate language. Variants of C have been designed to provide C's features as a portable assembly language, including one of the two languages called C--, the C Intermediate Language and the Low Level Virtual Machine.

Sun Microsystems' Java bytecode is the intermediate language used by all compilers targeting the Java Virtual Machine. The JVM can then perform just-in-time compilation to produce executable machine code and improve performance. Similarly, Microsoft's Common Intermediate Language is an intermediate language designed to be shared by all compilers for the .NET Framework, before static or dynamic compilation to machine code.

The GNU Compiler Collection (GCC) uses several intermediate languages internally to simplify portability and cross-compilation. Among these languages are:
  • the historical Register Transfer Language (RTL)
  • the tree language GENERIC
  • the SSA-based GIMPLE.
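Static single assignment (SSA) form, on which GIMPLE is based, requires that each variable be assigned exactly once. A minimal sketch of the renaming step for straight-line code follows; it is not GCC's algorithm, it omits the phi functions needed where control-flow paths merge, and the function name `to_ssa` and the `x_1`, `x_2` suffix scheme are assumptions made for this example.

```python
import re

def to_ssa(lines):
    """Rename variables in straight-line code so each is assigned exactly once."""
    version = {}  # latest version number of each assigned variable
    out = []
    for line in lines:
        dest, expr = (s.strip() for s in line.split("=", 1))

        def rename(match):
            name = match.group()
            # Variables never assigned here (inputs) keep their original name.
            return f"{name}_{version[name]}" if name in version else name

        # Uses refer to the most recent version of each variable.
        expr = re.sub(r"[A-Za-z_]\w*", rename, expr)
        # The definition then creates a new version of the destination.
        version[dest] = version.get(dest, 0) + 1
        out.append(f"{dest}_{version[dest]} = {expr}")
    return out

for line in to_ssa(["x = a + b", "x = x * 2", "y = x + a"]):
    print(line)
# x_1 = a + b
# x_2 = x_1 * 2
# y_1 = x_2 + a
```

Because every name now has a single definition, an analysis can find the instruction that produced a value by looking at the name alone.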


While most intermediate languages are designed to support statically typed languages, the Parrot intermediate representation is designed to support dynamically typed languages, initially Perl and Python.

The ILOC intermediate language is used in classes on compiler design as a simple target language.

See also

  • Pivot language
  • Abstract syntax tree
  • Bytecode (Intermediate code)
  • Symbol table


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 