Low Level Virtual Machine
Encyclopedia
The Low Level Virtual Machine (LLVM) is a compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...

 infrastructure written in C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

 that is designed for compile-time, link-time, run-time, and "idle-time" optimization of programs written in arbitrary programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

s. Originally implemented for C/C++, the language-agnostic design (and the success) of LLVM has since spawned a wide variety of front ends, including Objective-C
Objective-C
Objective-C is a reflective, object-oriented programming language that adds Smalltalk-style messaging to the C programming language.Today, it is used primarily on Apple's Mac OS X and iOS: two environments derived from the OpenStep standard, though not compliant with it...

, Fortran
Fortran
Fortran is a general-purpose, procedural, imperative programming language that is especially suited to numeric computation and scientific computing...

, Ada
Ada (programming language)
Ada is a structured, statically typed, imperative, wide-spectrum, and object-oriented high-level computer programming language, extended from Pascal and other languages...

, Haskell
Haskell (programming language)
Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry. In Haskell, "a function is a first-class citizen" of the programming language. As a functional programming language, the...

, Java bytecode
Java bytecode
Java bytecode is the form of instructions that the Java virtual machine executes. Each bytecode opcode is one byte in length, although some require parameters, resulting in some multi-byte instructions. Not all of the possible 256 opcodes are used. 51 are reserved for future use...

, Python
Python (programming language)
Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...

, Ruby
Ruby (programming language)
Ruby is a dynamic, reflective, general-purpose object-oriented programming language that combines syntax inspired by Perl with Smalltalk-like features. Ruby originated in Japan during the mid-1990s and was first developed and designed by Yukihiro "Matz" Matsumoto...

, ActionScript
ActionScript
ActionScript is an object-oriented language originally developed by Macromedia Inc. . It is a dialect of ECMAScript , and is used primarily for the development of websites and software targeting the Adobe Flash Player platform, used on Web pages in the form of...

, GLSL
GLSL
OpenGL Shading Language , is a high-level shading language based on the syntax of the C programming language...

, Clang
Clang
Clang is a compiler front end for the C, C++, Objective-C, and Objective-C++ programming languages. It uses the Low Level Virtual Machine as its back end, and Clang has been part of LLVM releases since LLVM 2.6....

, and others.

The LLVM project started in 2000 at the University of Illinois at Urbana–Champaign, under the direction of Vikram Adve and Chris Lattner
Chris Lattner
Chris Lattner is an American software developer, best known as the primary author of the Low Level Virtual Machine project and related projects, such as the clang compiler. He is currently the chief architect of the Compiler Group at Apple Inc.- Background :...

. LLVM was originally developed as a research infrastructure to investigate dynamic compilation
Dynamic compilation
Dynamic compilation is a process used by some programming language implementations to gain performance during program execution. Although the technique originated in the Self programming language, the best-known language that uses this technique is Java...

 techniques for static and dynamic programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

s. LLVM was released under the University of Illinois Open Source License, a BSD-style license
BSD licenses
BSD licenses are a family of permissive free software licenses. The original license was used for the Berkeley Software Distribution , a Unix-like operating system after which it is named....

. In 2005, Apple Inc. hired Lattner and formed a team to work on the LLVM system for various uses within Apple's development systems. LLVM is an integral part of Apple's latest development tools for Mac OS X
Mac OS X
Mac OS X is a series of Unix-based operating systems and graphical user interfaces developed, marketed, and sold by Apple Inc. Since 2002, has been included with all new Macintosh computer systems...

 and iOS.

Description

LLVM can provide the middle layers of a complete compiler system, taking intermediate form (IF) code from a compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...

 and emitting an optimized IF. This new IF can then be converted and linked into machine-dependent assembly code
Assembly language
An assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices. It implements a symbolic representation of the machine codes and other constants needed to program a given CPU architecture...

 for a target platform. LLVM can accept the IF from the GCC
GNU Compiler Collection
The GNU Compiler Collection is a compiler system produced by the GNU Project supporting various programming languages. GCC is a key component of the GNU toolchain...

 toolchain, allowing it to be used with a wide array of extant compilers written for that project.

LLVM can also generate relocatable machine code at compile-time or link-time or even binary machine code at run-time.

LLVM supports a language-independent instruction set
Instruction set
An instruction set, or instruction set architecture , is the part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O...

 and type system
Type system
A type system associates a type with each computed value. By examining the flow of these values, a type system attempts to ensure or prove that no type errors can occur...

. Each instruction is in static single assignment form
Static single assignment form
In compiler design, static single assignment form is a property of an intermediate representation , which says that each variable is assigned exactly once...

 (SSA), meaning that each variable
Variable (programming)
In computer programming, a variable is a symbolic name given to some known or unknown quantity or information, for the purpose of allowing the name to be used independently of the information it represents...

 (called a typed register) is assigned once and is frozen. This helps simplify the analysis of dependencies among variables. LLVM allows code to be compiled statically, as it is under the traditional GCC system, or left for late-compiling from the IF to machine code in a just-in-time compiler (JIT) in a fashion similar to Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

. The type system consists of basic types such as integers
Integer (computer science)
In computer science, an integer is a datum of integral data type, a data type which represents some finite subset of the mathematical integers. Integral data types may be of different sizes and may or may not be allowed to contain negative values....

 or floats
Floating point
In computing, floating point describes a method of representing real numbers in a way that can support a wide range of values. Numbers are, in general, represented approximately to a fixed number of significant digits and scaled using an exponent. The base for the scaling is normally 2, 10 or 16...

 and five derived types: pointers, arrays, vectors
Array data type
In computer science, an array type is a data type that is meant to describe a collection of elements , each selected by one or more indices that can be computed at run time by the program. Such a collection is usually called an array variable, array value, or simply array...

, structures
Record (computer science)
In computer science, a record is an instance of a product of primitive data types called a tuple. In C it is the compound data in a struct. Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed...

, and functions. A type construct in a concrete language can be represented by combining these basic types in LLVM. For example, a class in C++ can be represented by a combination of structures, functions and arrays of function pointer
Function pointer
A function pointer is a type of pointer in C, C++, D, and other C-like programming languages, and Fortran 2003. When dereferenced, a function pointer can be used to invoke a function and pass it arguments just like a normal function...

s.

The LLVM JIT compiler can optimize unneeded static branches out of a program at runtime, and thus is useful for partial evaluation
Partial evaluation
In computing, partial evaluation is a technique for several different types of program optimization by specialization. The most straightforward application is to produce new programs which run faster than the originals while being guaranteed to behave in the same way...

 in cases where a program has many options, most of which can easily be determined unneeded in a specific environment. This feature is used in the OpenGL
OpenGL
OpenGL is a standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics. The interface consists of over 250 different function calls which can be used to draw complex three-dimensional scenes from simple primitives. OpenGL...

 pipeline of Mac OS X Leopard (v10.5) to provide support for missing hardware features.
Graphics code within the OpenGL stack was left in intermediate form, and then compiled when run on the target machine. On systems with high-end GPUs, the resulting code was quite thin, passing the instructions onto the GPU with minimal changes. On systems with low-end GPUs, LLVM would compile optional procedures that run on the local central processing unit
Central processing unit
The central processing unit is the portion of a computer system that carries out the instructions of a computer program, to perform the basic arithmetical, logical, and input/output operations of the system. The CPU plays a role somewhat analogous to the brain in the computer. The term has been in...

 (CPU) that emulate instructions that the GPU cannot run internally. LLVM improved performance on low-end machines using Intel GMA
Intel GMA
The Intel Graphics Media Accelerator, or GMA, is a series of Intel integrated graphics processors built into various motherboard chipsets....

 chipsets. A similar system was developed under the Gallium3D LLVMpipe, and incorporated into the GNOME
GNOME
GNOME is a desktop environment and graphical user interface that runs on top of a computer operating system. It is composed entirely of free and open source software...

 shell to allow it to run without a GPU.

In contrast, in the cases where raw performance is benchmarked, LLVM 2.9 trails GCC in code quality (speed of the compiled programs) by about 10% on average, while compiling 20-30% faster.

Front ends

LLVM was originally written to be a replacement for the existing code generator in the GCC stack,
and many of the GCC front ends have been modified to work with it. LLVM currently supports compiling of Ada
Ada (programming language)
Ada is a structured, statically typed, imperative, wide-spectrum, and object-oriented high-level computer programming language, extended from Pascal and other languages...

, C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

, C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

, D
D (programming language)
The D programming language is an object-oriented, imperative, multi-paradigm, system programming language created by Walter Bright of Digital Mars. It originated as a re-engineering of C++, but even though it is mainly influenced by that language, it is not a variant of C++...

, Fortran
Fortran
Fortran is a general-purpose, procedural, imperative programming language that is especially suited to numeric computation and scientific computing...

, and Objective-C
Objective-C
Objective-C is a reflective, object-oriented programming language that adds Smalltalk-style messaging to the C programming language.Today, it is used primarily on Apple's Mac OS X and iOS: two environments derived from the OpenStep standard, though not compliant with it...

, using various front ends, some derived from version 4.0.1 and 4.2 of the GNU Compiler Collection
GNU Compiler Collection
The GNU Compiler Collection is a compiler system produced by the GNU Project supporting various programming languages. GCC is a key component of the GNU toolchain...

 (GCC).

Widespread interest in LLVM has led to a number of efforts to develop entirely new front ends for a variety of languages. The one that has received the most attention is Clang
Clang
Clang is a compiler front end for the C, C++, Objective-C, and Objective-C++ programming languages. It uses the Low Level Virtual Machine as its back end, and Clang has been part of LLVM releases since LLVM 2.6....

, a new compiler supporting C, Objective-C and C++. Primarily supported by Apple, Clang is aimed at replacing the C/Objective-C compiler in the GCC system with a modern system that is more easily integrated with integrated development environment
Integrated development environment
An integrated development environment is a software application that provides comprehensive facilities to computer programmers for software development...

s (IDEs), and has wider support for multithreading
Thread (computer science)
In computer science, a thread of execution is the smallest unit of processing that can be scheduled by an operating system. The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process...

. Objective-C development under GCC was stagnant and Apple's changes to the language were supported in a separately maintained branch. Creating their own compiler let them address many of the same problems LLVM addressed for IDE integration and other modern features, while also making the main development branch the main Objective-C implementation.

The Utrecht Haskell
Haskell (programming language)
Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry. In Haskell, "a function is a first-class citizen" of the programming language. As a functional programming language, the...

 compiler can generate code for LLVM which, though the generator is in the early stages of development, has been shown in many cases to be more efficient than the C code generator.
The Glasgow Haskell Compiler
Glasgow Haskell Compiler
The Glorious Glasgow Haskell Compilation System, more commonly known as the Glasgow Haskell Compiler or GHC, is an open source native code compiler for the functional programming language Haskell. The lead developers are Simon Peyton Jones and Simon Marlow...

 (GHC) has a working backend for LLVM that achieves a 30% speed-up of the compiled code when compared to native code compiling via GHC or C code generation followed by compilation, missing only one of the many optimization techniques implemented by the GHC.

There are many other components in various stages of development; including, but not limited to, a Java bytecode
Java bytecode
Java bytecode is the form of instructions that the Java virtual machine executes. Each bytecode opcode is one byte in length, although some require parameters, resulting in some multi-byte instructions. Not all of the possible 256 opcodes are used. 51 are reserved for future use...

 front end, a Common Intermediate Language
Common Intermediate Language
Common Intermediate Language is the lowest-level human-readable programming language defined by the Common Language Infrastructure specification and is used by the .NET Framework and Mono...

 (CIL) front end, a CPython
CPython
CPython is the default, most-widely used implementation of the Python programming language. It is written in C. In addition to CPython, there are two other production-quality Python implementations: Jython, written in Java, and IronPython, which is written for the Common Language Runtime. There...

 front end,

the MacRuby
MacRuby
MacRuby is an implementation of the Ruby language that runs on the Objective-C runtime and CoreFoundation framework under development by Apple Inc. which "is supposed to replace RubyCocoa". It is based on Ruby 1.9 and uses the high performance Low Level Virtual Machine compiler infrastructure...

 implementation of Ruby 1.9, various front ends for Standard ML
Standard ML
Standard ML is a general-purpose, modular, functional programming language with compile-time type checking and type inference. It is popular among compiler writers and programming language researchers, as well as in the development of theorem provers.SML is a modern descendant of the ML...

, and a new graph coloring
Graph coloring
In graph theory, graph coloring is a special case of graph labeling; it is an assignment of labels traditionally called "colors" to elements of a graph subject to certain constraints. In its simplest form, it is a way of coloring the vertices of a graph such that no two adjacent vertices share the...

 register allocator.

See also

  • Clang
    Clang
    Clang is a compiler front end for the C, C++, Objective-C, and Objective-C++ programming languages. It uses the Low Level Virtual Machine as its back end, and Clang has been part of LLVM releases since LLVM 2.6....

     C/C++ compiler
  • LLDB (debugger)
    LLDB (debugger)
    The LLDB Debugger is a high-performance debugger. It is built as a set of reusable components which highly leverage existing libraries in the larger LLVM Project, such as the Clang expression parser and LLVM disassembler.-Current State:...

  • GNU lightning
    GNU lightning
    GNU lightning is a free software library that generates assembly language code at run-time. Supported backends are SPARC , x86 and PowerPC . An ARM port is under way.- Advantages Over Other Libraries :...

  • GNU Compiler Collection
    GNU Compiler Collection
    The GNU Compiler Collection is a compiler system produced by the GNU Project supporting various programming languages. GCC is a key component of the GNU toolchain...

     (GCC)
  • DotGNU
    DotGNU
    DotGNU is a part of the GNU Project that aims to provide a free software replacement for Microsoft's .NET Framework by Free Software Foundation...

  • Pure (programming language)
    Pure (programming language)
    Pure is a dynamically typed, functional programming language based on term rewriting. It has facilities for user-defined operator syntax, macros, multiple-precision numbers, and compilation to native code through the LLVM...

  • OpenCL
    OpenCL
    OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. OpenCL includes a language for writing kernels , plus APIs that are used to define and then control the platforms...

  • Comparison of application virtual machines
    Comparison of Application Virtual Machines
    This article lists some software virtual machines that are typically used for allowing application bytecode to be portably run on many different computer architectures and operating systems. The application is usually run on the computer using an interpreter or just-in-time compilation...


External links

, LLVM Compiler Infrastructure
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK