Compiler optimization
Compiler optimization is the process of tuning the output of a compiler to minimize or maximize some attributes of an executable computer program. The most common requirement is to minimize the time taken to execute a program; a less common one is to minimize the amount of memory occupied. The growth of portable computers has created a market for minimizing the power consumed by a program. Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources.

It has been shown that some code optimization problems are NP-complete, or even undecidable. In practice, factors such as the programmer's willingness to wait for the compiler to complete its task place upper limits on the optimizations that a compiler implementor might provide. (Optimization is generally a very CPU- and memory-intensive process.) In the past, computer memory limitations were also a major factor in limiting which optimizations could be performed. Because of all these factors, optimization rarely produces "optimal" output in any sense, and in fact an "optimization" may impede performance in some cases; rather, optimizations are heuristic methods for improving resource usage in typical programs.

Types of optimizations

Techniques used in optimization can be broken up among various scopes which can affect anything from a single statement to the entire program. Generally speaking, locally scoped techniques are easier to implement than global ones but result in smaller gains. Some examples of scopes include:

Peephole optimizations: Usually performed late in the compilation process after machine code has been generated. This form of optimization examines a few adjacent instructions (like "looking through a peephole" at the code) to see whether they can be replaced by a single instruction or a shorter sequence of instructions. For instance, a multiplication of a value by 2 might be more efficiently executed by left-shifting the value or by adding the value to itself. (This example is also an instance of strength reduction.)
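
As an illustration, here is the kind of rewrite a peephole pass performs, shown at the C source level for readability; a real pass operates on the generated instructions, and the function names here are purely illustrative:

    /* Before: a multiplication by 2. */
    unsigned times_two(unsigned x) {
        return x * 2;
    }

    /* After: the same computation as a shift (x + x would also work),
     * which a peephole pass may substitute when it is cheaper on the
     * target machine. */
    unsigned times_two_shift(unsigned x) {
        return x << 1;
    }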

Local optimizations: These only consider information local to a function definition. This reduces the amount of analysis that needs to be performed (saving time and reducing storage requirements) but means that worst case assumptions have to be made when function calls occur or global variables are accessed (because little information about them is available).

Loop optimizations: These act on the statements which make up a loop, such as a for loop (e.g., loop-invariant code motion). Loop optimizations can have a significant impact because many programs spend a large percentage of their time inside loops.

Interprocedural or whole-program optimization: These analyze all of a program's source code. The greater quantity of information extracted means that optimizations can be more effective compared to when they only have access to local information (i.e., within a single function). This kind of optimization can also allow new techniques to be performed. One such technique is function inlining, where a call to a function is replaced by a copy of the function body.

Machine code optimization: These analyze the executable task image of the program after all of the executable machine code has been linked. Some of the techniques that can be applied in a more limited scope, such as macro compression (which saves space by collapsing common sequences of instructions), are more effective when the entire executable task image is available for analysis.

In addition to scoped optimizations there are two further general categories of optimization:

Programming language-independent vs language-dependent: Most high-level languages share common programming constructs and abstractions: decision (if, switch, case), looping (for, while, repeat..until, do..while), and encapsulation (structures, objects). Thus similar optimization techniques can be used across languages. However, certain language features make some kinds of optimizations difficult. For instance, the existence of pointers in C and C++ makes it difficult to optimize array accesses (see alias analysis). However, languages such as PL/I (which also supports pointers) nevertheless have sophisticated optimizing compilers that achieve better performance in various other ways. Conversely, some language features make certain optimizations easier. For example, in some languages functions are not permitted to have side effects. Therefore, if a program makes several calls to the same function with the same arguments, the compiler can immediately infer that the function's result need be computed only once.
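
A small C sketch of the aliasing problem (function and variable names are illustrative):

    /* Because p and q may point to the same int, the compiler cannot
     * keep *p in a register across the store to *q: it must reload *p
     * after the store. A language without unrestricted pointers would
     * let the second load be eliminated. */
    int aliasing_example(int *p, int *q) {
        int a = *p;
        *q = 42;        /* may modify *p if p == q */
        int b = *p;     /* must be re-read from memory */
        return a + b;
    }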

Machine-independent vs machine-dependent: Many optimizations that operate on abstract programming concepts (loops, objects, structures) are independent of the machine targeted by the compiler, but many of the most effective optimizations are those that best exploit special features of the target platform.

The following is an instance of a local machine-dependent optimization. To set a register to 0, the obvious way is to use the constant '0' in an instruction that sets a register value to a constant. A less obvious way is to XOR a register with itself. It is up to the compiler to know which instruction variant to use. On many RISC machines, both instructions would be equally appropriate, since they would both be the same length and take the same time. On many other microprocessors such as the Intel x86 family, it turns out that the XOR variant is shorter and probably faster, as there will be no need to decode an immediate operand, nor use the internal "immediate operand register". (A potential problem with this is that XOR may introduce a data dependency on the previous value of the register, causing a pipeline stall. However, processors often treat XOR of a register with itself as a special case that does not cause stalls.)
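
As a sketch, the trivial C function below returns zero; the x86 encodings in the comment reflect typical code generation, though exact byte counts depend on the instruction form chosen:

    /* Compilers targeting x86 typically zero a register with
     *     xor eax, eax    ; 2 bytes, no immediate operand to decode
     * rather than
     *     mov eax, 0      ; 5 bytes, carrying a 32-bit immediate
     * Modern x86 cores recognize xor-zeroing as a dependency-breaking
     * idiom, so it does not introduce a pipeline stall. */
    int zero(void) {
        return 0;
    }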

Factors affecting optimization

The machine itself: Many of the choices about which optimizations can and should be done depend on the characteristics of the target machine. It is sometimes possible to parameterize some of these machine-dependent factors, so that a single piece of compiler code can be used to optimize different machines just by altering the machine description parameters. GCC is a compiler which exemplifies this approach.

The architecture of the target CPU:
  • Number of CPU registers: To a certain extent, the more registers, the easier it is to optimize for performance. Local variables can be allocated in the registers and not on the stack. Temporary/intermediate results can be left in registers without writing to and reading back from memory.
  • RISC vs CISC: CISC instruction sets often have variable instruction lengths, often have a larger number of possible instructions that can be used, and each instruction could take differing amounts of time. RISC instruction sets attempt to limit the variability in each of these: instructions are usually of constant length, with few exceptions, there are usually fewer combinations of registers and memory operations, and the instruction issue rate (the number of instructions completed per time period, usually an integer multiple of the clock cycle) is usually constant in cases where memory latency is not a factor. There may be several ways of carrying out a certain task, with CISC usually offering more alternatives than RISC. Compilers have to know the relative costs among the various instructions and choose the best instruction sequence (see instruction selection).
  • Pipelines: A pipeline is essentially a CPU broken up into an assembly line. It allows use of parts of the CPU for different instructions by breaking up the execution of instructions into various stages: instruction decode, address decode, memory fetch, register fetch, compute, register store, etc. One instruction could be in the register store stage, while another could be in the register fetch stage. Pipeline conflicts occur when an instruction in one stage of the pipeline depends on the result of another instruction ahead of it in the pipeline but not yet completed. Pipeline conflicts can lead to pipeline stalls, where the CPU wastes cycles waiting for a conflict to resolve. Compilers can schedule, or reorder, instructions so that pipeline stalls occur less frequently.
  • Number of functional units: Some CPUs have several ALUs and FPUs, which allows them to execute multiple instructions simultaneously. There may be restrictions on which instructions can pair with which other instructions ("pairing" is the simultaneous execution of two or more instructions), and which functional unit can execute which instruction. Such CPUs also have issues similar to pipeline conflicts. Here again, instructions have to be scheduled so that the various functional units are fully fed with instructions to execute.


The architecture of the machine:
  • Cache size (256 KiB–12 MiB) and type (direct-mapped, 2-/4-/8-/16-way associative, fully associative): Techniques such as inline expansion and loop unrolling may increase the size of the generated code and reduce code locality. The program may slow down drastically if a highly utilized section of code (like inner loops in various algorithms) suddenly cannot fit in the cache. Also, caches which are not fully associative have higher chances of cache collisions even in an unfilled cache.
  • Cache/Memory transfer rates: These give the compiler an indication of the penalty for cache misses. This is used mainly in specialized applications.


Intended use of the generated code:
Debugging: While a programmer is writing an application, they will recompile and test often, and so compilation must be fast. This is one reason most optimizations are deliberately avoided during the test/debugging phase. Also, program code is usually "stepped through" (see program animation) using a symbolic debugger, and optimizing transformations, particularly those that reorder code, can make it difficult to relate the output code with the line numbers in the original source code. This can confuse both the debugging tools and the programmers using them.
General purpose use: Prepackaged software is very often expected to be executed on a variety of machines and CPUs that may share the same instruction set, but have different timing, cache or memory characteristics. So, the code may not be tuned to any particular CPU, or may be tuned to work best on the most popular CPU and yet still work acceptably well on other CPUs.
Special-purpose use: If the software is compiled to be used on one or a few very similar machines, with known characteristics, then the compiler can heavily tune the generated code to those specific machines (if such options are available). Important special cases include code designed for parallel and vector processors, for which special parallelizing compilers are employed.
Embedded systems: These are a common case of special-purpose use. Embedded software can be tightly tuned to an exact CPU and memory size. Also, system cost or reliability may be more important than the code's speed. So, for example, compilers for embedded software usually offer options that reduce code size at the expense of speed, because memory is the main cost of an embedded computer. The code's timing may need to be predictable, rather than as fast as possible, so code caching might be disabled, along with compiler optimizations that require it.

Common themes

To a large extent, compiler optimization techniques have the following themes, which sometimes conflict.

Optimize the common case: The common case may have unique properties that allow a fast path at the expense of a slow path. If the fast path is taken most often, the result is better overall performance.
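
A minimal C sketch of the idea; the routine, its name, and the threshold are illustrative rather than drawn from any particular library:

    #include <stddef.h>
    #include <string.h>

    /* Most calls are assumed to copy only a few bytes, so a simple byte
     * loop serves as the fast path and avoids the set-up cost of the
     * general routine. */
    void copy_bytes(void *dst, const void *src, size_t n) {
        if (n <= 16) {                      /* fast path: common small copy */
            unsigned char *d = dst;
            const unsigned char *s = src;
            while (n--)
                *d++ = *s++;
        } else {
            memcpy(dst, src, n);            /* slow path: general machinery */
        }
    }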

Avoid redundancy: Reuse results that are already computed and store them for later use, instead of recomputing them.

Less code: Remove unnecessary computations and intermediate values. Less work for the CPU, cache, and memory usually results in faster execution. Alternatively, in embedded systems, less code brings a lower product cost.

Fewer jumps by using straight-line code, also called branch-free code: Less complicated code. Jumps (conditional or unconditional branches) interfere with the prefetching of instructions, thus slowing down code. Using inlining or loop unrolling can reduce branching, at the cost of increasing binary file size by the length of the repeated code. This tends to merge several basic blocks into one.

Locality: Code and data that are accessed closely together in time should be placed close together in memory to increase spatial locality of reference.

Exploit the memory hierarchy: Accesses to memory are increasingly more expensive for each level of the memory hierarchy, so place the most commonly used items in registers first, then caches, then main memory, before going to disk.

Parallelize: Reorder operations to allow multiple computations to happen in parallel, either at the instruction, memory, or thread level.

More precise information is better: The more precise the information the compiler has, the better it can employ any or all of these optimization techniques.

Runtime metrics can help: Information gathered during a test run can be used in profile-guided optimization. Information gathered at runtime (ideally with minimal overhead) can be used by a JIT compiler to dynamically improve optimization.

Strength reduction: Replace complex or difficult or expensive operations with simpler ones. For example, replacing division by a constant with multiplication by its reciprocal, or using induction variable analysis to replace multiplication by a loop index with addition.
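
For instance, a division by a power of two can be replaced by an exact reciprocal multiplication, as in this C sketch (non-power-of-two divisors require more involved fixed-point sequences):

    /* Before: a floating-point division, usually the costlier operation. */
    double eighth(double x) {
        return x / 8.0;
    }

    /* After strength reduction: 0.125 is exactly representable in binary
     * floating point, so the multiply yields a bit-identical result. */
    double eighth_fast(double x) {
        return x * 0.125;
    }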

Loop optimizations

Some optimization techniques primarily designed to operate on loops include:

Induction variable analysis: Roughly, if a variable in a loop is a simple function of the index variable, such as j := 4*i + 1, it can be updated appropriately each time the loop variable is changed. This is a strength reduction, and also may allow the index variable's definitions to become dead code. This information is also useful for bounds-checking elimination and dependence analysis, among other things.
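
A C sketch of the transformation (the before/after function names are illustrative):

    /* Before: the array index is recomputed with a multiply on every
     * iteration. */
    void fill_before(int *a, int n) {
        for (int i = 0; i < n; i++) {
            int j = 4 * i + 1;
            a[j] = 0;
        }
    }

    /* After: j becomes an induction variable updated by addition; the
     * multiplication disappears, and if j were i's only use, i itself
     * could become dead code. */
    void fill_after(int *a, int n) {
        for (int i = 0, j = 1; i < n; i++, j += 4)
            a[j] = 0;
    }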

Loop fission or loop distribution: Loop fission attempts to break a loop into multiple loops over the same index range but each taking only a part of the loop's body. This can improve locality of reference, both of the data being accessed in the loop and the code in the loop's body.
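
A C sketch (illustrative function names; whether the split pays off depends on the target's cache behavior):

    /* Before: one loop interleaves accesses to both arrays. */
    void init_before(int *a, int *b, int n) {
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
    }

    /* After fission: each loop streams through a single array. */
    void init_after(int *a, int *b, int n) {
        for (int i = 0; i < n; i++)
            a[i] = i;
        for (int i = 0; i < n; i++)
            b[i] = 2 * i;
    }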

Loop fusion or loop combining: Another technique which attempts to reduce loop overhead. When two adjacent loops would iterate the same number of times (whether or not that number is known at compile time), their bodies can be combined as long as they make no reference to each other's data.
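
A C sketch, the inverse of the fission example above (function names illustrative):

    /* Before: two adjacent loops over the same index range. */
    void bump_before(int *a, int *b, int n) {
        for (int i = 0; i < n; i++)
            a[i] += 1;
        for (int i = 0; i < n; i++)
            b[i] += 1;
    }

    /* After fusion: the loop test and branch are paid once per iteration
     * instead of twice; legal because the bodies do not reference each
     * other's data. */
    void bump_after(int *a, int *b, int n) {
        for (int i = 0; i < n; i++) {
            a[i] += 1;
            b[i] += 1;
        }
    }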

Loop inversion: This technique changes a standard while loop into a do/while (also known as repeat/until) loop wrapped in an if conditional, reducing the number of jumps by two for cases when the loop is executed. Doing so duplicates the condition check (increasing the size of the code) but is more efficient because jumps usually cause a pipeline stall. Additionally, if the initial condition is known at compile time and is known to be side-effect-free, the if guard can be skipped.
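
A C sketch of the transformation (function names illustrative):

    /* Before: each iteration jumps back to the top to re-test, and the
     * test branches out of the loop. */
    void zero_before(int *a, int n) {
        int i = 0;
        while (i < n) {
            a[i] = 0;
            i++;
        }
    }

    /* After inversion: the bottom test both continues and exits the
     * loop; the if guard preserves the zero-iteration case. */
    void zero_after(int *a, int n) {
        int i = 0;
        if (i < n) {
            do {
                a[i] = 0;
                i++;
            } while (i < n);
        }
    }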

Loop interchange: These optimizations exchange inner loops with outer loops. When the loop variables index into an array, such a transformation can improve locality of reference, depending on the array's layout.
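
A C sketch; C stores arrays in row-major order, so the interchanged version walks memory sequentially instead of striding by a whole row:

    enum { N = 512 };

    /* Before: column-by-column traversal, strided accesses. */
    void clear_before(double a[N][N]) {
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                a[i][j] = 0.0;
    }

    /* After interchange: row-by-row traversal, sequential accesses. */
    void clear_after(double a[N][N]) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = 0.0;
    }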

Loop-invariant code motion: If a quantity is computed inside a loop during every iteration, and its value is the same for each iteration, it can vastly improve efficiency to hoist it outside the loop and compute its value just once before the loop begins. This is particularly important with the address-calculation expressions generated by loops over arrays. For correct implementation, this technique must be used with loop inversion, because not all code is safe to be hoisted outside the loop.
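
A C sketch (function names illustrative):

    /* Before: x * y is loop-invariant but recomputed every iteration. */
    void add_before(int *a, int n, int x, int y) {
        for (int i = 0; i < n; i++)
            a[i] = x * y + i;
    }

    /* After hoisting: the invariant product is computed once. Hoisting
     * is safe here because x * y cannot fault or have side effects;
     * code that could needs the loop-entry guard described above. */
    void add_after(int *a, int n, int x, int y) {
        int t = x * y;
        for (int i = 0; i < n; i++)
            a[i] = t + i;
    }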

Loop nest optimization: Some pervasive algorithms such as matrix multiplication have very poor cache behavior and excessive memory accesses. Loop nest optimization increases the number of cache hits by performing the operation over small blocks and by using a loop interchange.

Loop reversal: Loop reversal reverses the order in which values are assigned to the index variable. This is a subtle optimization which can help eliminate dependencies and thus enable other optimizations.

Loop unrolling: Unrolling duplicates the body of the loop multiple times, in order to decrease the number of times the loop condition is tested and the number of jumps, which hurt performance by impairing the instruction pipeline. A "fewer jumps" optimization. Completely unrolling a loop eliminates all overhead, but requires that the number of iterations be known at compile time.
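
A C sketch of unrolling by a factor of four (function names illustrative):

    /* Before: the loop test and branch execute once per element. */
    int sum(const int *a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* After: the test and branch execute once per four elements; the
     * second loop handles a remainder when n is not a multiple of four. */
    int sum_unrolled(const int *a, int n) {
        int s = 0;
        int i = 0;
        for (; i + 4 <= n; i += 4)
            s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        for (; i < n; i++)
            s += a[i];
        return s;
    }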

Loop splitting: Loop splitting attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different contiguous portions of the index range. A useful special case is loop peeling, which can simplify a loop with a problematic first iteration by performing that iteration separately before entering the loop.

Loop unswitching: Unswitching moves a conditional from inside a loop to outside the loop by duplicating the loop's body inside each of the if and else clauses of the conditional.
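
A C sketch (function names illustrative):

    /* Before: a loop-invariant condition is tested on every iteration. */
    void update_before(int *a, const int *b, int n, int use_b) {
        for (int i = 0; i < n; i++) {
            if (use_b)
                a[i] += b[i];
            else
                a[i] += 1;
        }
    }

    /* After unswitching: the invariant test runs once, and each of the
     * duplicated loop bodies is branch-free inside. */
    void update_after(int *a, const int *b, int n, int use_b) {
        if (use_b) {
            for (int i = 0; i < n; i++)
                a[i] += b[i];
        } else {
            for (int i = 0; i < n; i++)
                a[i] += 1;
        }
    }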

Software pipelining: The loop is restructured in such a way that work done in an iteration is split into several parts and done over several iterations. In a tight loop this technique hides the latency between loading and using values.

Automatic parallelization: A loop is converted into multi-threaded or vectorized (or even both) code in order to utilize multiple processors simultaneously in a shared-memory multiprocessor (SMP) machine, including multi-core machines.

Data-flow optimizations

Data-flow optimizations, based on data-flow analysis, primarily depend on how certain properties of data are propagated by control edges in the control flow graph. Some of these include:

Common subexpression elimination: In the expression "(a + b) - (a + b)/4", "common subexpression" refers to the duplicated "(a + b)". Compilers implementing this technique realize that "(a + b)" won't change, and so calculate its value only once.
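
A C sketch using the classic form of the example (variable names illustrative):

    /* Before: b * c is computed twice. */
    int cse_before(int b, int c, int g, int e) {
        int a = b * c + g;
        int d = b * c * e;
        return a + d;
    }

    /* After elimination: the shared subexpression is computed once into
     * a temporary and reused. */
    int cse_after(int b, int c, int g, int e) {
        int tmp = b * c;
        int a = tmp + g;
        int d = tmp * e;
        return a + d;
    }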

Constant folding and propagation: replacing expressions consisting of constants (e.g., "3 + 5") with their final value ("8") at compile time, rather than doing the calculation at run time. Used in most modern languages.

Induction variable recognition and elimination: see discussion above about induction variable analysis.

Alias classification and pointer analysis: in the presence of pointers, it is difficult to make any optimizations at all, since potentially any variable can have been changed when a memory location is assigned to. By specifying which pointers can alias which variables, unrelated pointers can be ignored.

Dead store elimination: removal of assignments to variables that are not subsequently read, either because the lifetime of the variable ends or because of a subsequent assignment that will overwrite the first value.
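
A C sketch (function names illustrative):

    /* Before: the first assignment to y is a dead store, overwritten
     * before any read. */
    int ds_before(int x) {
        int y = x * 2;
        y = x + 1;
        return y;
    }

    /* After elimination: only the live assignment remains. */
    int ds_after(int x) {
        return x + 1;
    }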

SSA-based optimizations

These optimizations are intended to be done after transforming the program into a special form called static single assignment (see SSA form), in which every variable is assigned in only one place. Although some of them function without SSA, they are most effective with it. Many optimizations listed in other sections also benefit with no special changes, such as register allocation.

Global value numbering: GVN eliminates redundancy by constructing a value graph of the program, and then determining which values are computed by equivalent expressions. GVN is able to identify some redundancy that common subexpression elimination cannot, and vice versa.

Sparse conditional constant propagation: Effectively equivalent to iteratively performing constant propagation, constant folding, and dead code elimination until there is no change, but is much more efficient. This optimization symbolically executes the program, simultaneously propagating constant values and eliminating portions of the control flow graph that this makes unreachable.

Code generator optimizations

Register allocation: The most frequently used variables should be kept in processor registers for fastest access. To find which variables to put in registers, an interference graph is created. Each variable is a vertex, and when two variables are used at the same time (have intersecting live ranges) there is an edge between them. This graph is colored, using for example Chaitin's algorithm, with the same number of colors as there are registers. If the coloring fails, one variable is "spilled" to memory and the coloring is retried.

Instruction selection: Most architectures, particularly CISC architectures and those with many addressing modes, offer several different ways of performing a particular operation, using entirely different sequences of instructions. The job of the instruction selector is to choose, across the program as a whole, good instruction sequences with which to implement each operator in the low-level intermediate representation. For example, on many processors in the 68000 family and on the x86 architecture, complex addressing modes can be used in statements like "lea 25(a1,d5*4), a0", allowing a single instruction to perform a significant amount of arithmetic with less storage.

Instruction scheduling: Instruction scheduling is an important optimization for modern pipelined processors, which avoids stalls or bubbles in the pipeline by clustering instructions with no dependencies together, while being careful to preserve the original semantics.

Rematerialization: Rematerialization recalculates a value instead of loading it from memory, preventing a memory access. This is performed in tandem with register allocation to avoid spills.

Code factoring: If several sequences of code are identical, or can be parameterized or reordered to be identical, they can be replaced with calls to a shared subroutine. This can often share code for subroutine set-up and sometimes tail recursion.

Trampolines: Many CPUs have smaller subroutine call instructions to access low memory. A compiler can save space by using these small calls in the main body of code. Jump instructions in low memory can access the routines at any address. This multiplies space savings from code factoring.

Reordering computations: Based on integer linear programming, restructuring compilers enhance data locality and expose more parallelism by reordering computations. Space-optimizing compilers may reorder code to lengthen sequences that can be factored into subroutines.

Functional language optimizations

Although many of these also apply to non-functional languages, they either originate in, are most easily implemented in, or are particularly critical in functional languages such as Lisp
Lisp programming language
Lisp is a family of computer programming languages with a long history and a distinctive, fully parenthesized syntax. Originally specified in 1958, Lisp is the second-oldest high-level programming language in widespread use today; only Fortran is older...

 and ML
ML programming language
ML is a general-purpose functional programming language developed by Robin Milner and others in the early 1970s at the University of Edinburgh, whose syntax is inspired by ISWIM...

.

Removing recursion: Recursion is often expensive, as a function call consumes stack space and involves some overhead related to parameter passing and flushing the instruction cache. Tail-recursive algorithms can be converted to iteration, which does not have call overhead and uses a constant amount of stack space, through a process called tail-recursion elimination or tail-call optimization. Some functional languages (e.g., Scheme and Erlang) mandate that tail calls be optimized by a conforming implementation, due to their prevalence in these languages.
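
A C sketch of the transformation (C compilers may perform it as an optimization but, unlike Scheme or Erlang implementations, are not required to):

    /* Tail-recursive factorial: the recursive call is the last action,
     * e.g. fact_tail(5, 1) == 120. */
    unsigned long fact_tail(unsigned long n, unsigned long acc) {
        if (n <= 1)
            return acc;
        return fact_tail(n - 1, acc * n);   /* tail call */
    }

    /* What tail-call elimination produces: the call becomes a jump back
     * to the top, i.e., a loop running in constant stack space. */
    unsigned long fact_iter(unsigned long n, unsigned long acc) {
        while (n > 1) {
            acc *= n;
            n -= 1;
        }
        return acc;
    }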

Deforestation (data structure fusion): Because of the high-level nature by which data structures are specified in functional languages such as Haskell, it is possible to combine several recursive functions which produce and consume some temporary data structure so that the data is passed directly without wasting time constructing the data structure.

Other optimizations

Bounds-checking elimination: Many languages, for example Java, enforce bounds-checking of all array accesses. This is a severe performance bottleneck on certain applications such as scientific code. Bounds-checking elimination allows the compiler to safely remove bounds-checking in many situations where it can determine that the index must fall within valid bounds, for example if it is a simple loop variable.

Branch offset optimization (machine independent): Choose the shortest branch displacement that reaches the target.

Code-block reordering: Code-block reordering alters the order of the basic blocks in a program in order to reduce conditional branches and improve locality of reference.

Dead code elimination: Removes instructions that will not affect the behaviour of the program, for example definitions which have no uses, called dead code. This reduces code size and eliminates unnecessary computation.

Factoring out of invariants: If an expression is carried out both when a condition is met and is not met, it can be written just once outside of the conditional statement. Similarly, if certain types of expressions (e.g., the assignment of a constant into a variable) appear inside a loop, they can be moved out of it because their effect will be the same no matter if they're executed many times or just once. Also known as total redundancy elimination. A more powerful optimization is partial redundancy elimination (PRE).

Inline expansion or macro expansion: When some code invokes a procedure, it is possible to directly insert the body of the procedure inside the calling code rather than transferring control to it. This saves the overhead related to procedure calls, as well as providing great opportunity for many different parameter-specific optimizations, but comes at the cost of space; the procedure body is duplicated each time the procedure is called inline. Generally, inlining is useful in performance-critical code that makes a large number of calls to small procedures. A "fewer jumps" optimization.
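
A C sketch (function names illustrative):

    /* Before: each call to square pays call-and-return overhead. */
    static int square(int x) { return x * x; }

    int hypot2_before(int a, int b) {
        return square(a) + square(b);
    }

    /* After inlining: the body replaces each call site, which also
     * exposes further optimizations such as constant folding when an
     * argument is known at compile time. */
    int hypot2_after(int a, int b) {
        return (a * a) + (b * b);
    }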

Jump threading: In this pass, consecutive conditional jumps predicated entirely or partially on the same condition are merged. For example, when the second condition is a subset of the first, "if (a) { if (a) f(); }" reduces to "if (a) f();"; when it is the inverse, "if (a) f(); if (!a) g();" can be threaded into "if (a) f(); else g();".

Macro compression: A space optimization that recognizes common sequences of code, creates subprograms ("code macros") that contain the common code, and replaces the occurrences of the common code sequences with calls to the corresponding subprograms. This is most effectively done as a machine code optimization, when all the code is present. The technique was first used to conserve space in an interpretive byte stream used in an implementation of Macro Spitbol on microcomputers. The problem of determining an optimal set of macros that minimizes the space required by a given code segment is known to be NP-complete, but efficient heuristics attain near-optimal results.

Reduction of cache collisions: for example, by disrupting alignment within a page.

Stack height reduction: Rearrange the expression tree to minimize the resources needed for expression evaluation.

Test reordering: If we have two tests that are the condition for something, we can first deal with the simpler tests (e.g. comparing a variable to something) and only then with the complex tests (e.g., those that require a function call). This technique complements lazy evaluation
Lazy evaluation
In programming language theory, lazy evaluation or call-by-need is an evaluation strategy which delays the evaluation of an expression until the value of this is actually required and which also avoids repeated evaluations...

, but can be used only when the tests are not dependent on one another. Short-circuiting
Minimal evaluation
Short-circuit evaluation, minimal evaluation, or McCarthy evaluation denotes the semantics of some Boolean operators in some programming languages in which the second argument is only executed or evaluated if the first argument does not suffice to determine the value of the expression: when the...

 semantics can make this difficult.
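
A sketch in C (is_expensive_check is a hypothetical costly predicate):

    int is_expensive_check(int x);  /* e.g. involves a function call */

    /* Before: the expensive test always runs first. */
    int accept_slow(int flag, int x) { return is_expensive_check(x) && flag; }

    /* After reordering, the cheap comparison runs first and usually
     * short-circuits. This is legal only because the two tests are
     * independent and the expensive test has no side effects the
     * program relies on. */
    int accept_fast(int flag, int x) { return flag && is_expensive_check(x); }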

Interprocedural optimizations

Interprocedural optimization
Interprocedural optimization
Interprocedural optimization is a collection of compiler techniques used in computer programming to improve performance in programs containing many frequently used functions of small or medium length...

 works on the entire program, across procedure and file boundaries. It operates in tandem with its intraprocedural counterparts, with a local part and a global part cooperating. Typical interprocedural optimizations are procedure inlining, interprocedural dead code elimination
Dead code elimination
In compiler theory, dead code elimination is a compiler optimization to remove code which does not affect the program results. Removing such code has two benefits: it shrinks program size, an important...

, interprocedural constant propagation, and procedure reordering. As usual, the compiler needs to perform interprocedural analysis before its actual optimizations. Interprocedural analyses include alias analysis
Alias analysis
Alias analysis is a technique in compiler theory, used to determine if a storage location may be accessed in more than one way. Two pointers are said to be aliased if they point to the same location....

, array access analysis
Array access analysis
In computer science, array access analysis is a compiler analysis used to decide the read and write access patterns to elements or portions of arrays....

, and the construction of a call graph
Call graph
A call graph is a directed graph that represents calling relationships between subroutines in a computer program. Specifically, each node represents a procedure and each edge indicates that procedure f calls procedure g...

.
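
A sketch in C of interprocedural constant propagation, one of the optimizations just listed (names illustrative):

    /* Whole-program analysis can prove that scale() is only ever
     * called with factor == 2. */
    static int scale(int x, int factor) { return x * factor; }

    int caller(int x) { return scale(x, 2); }

    /* After propagating the constant into scale() and reducing the
     * multiplication in strength, the compiler may effectively emit: */
    int caller_opt(int x) { return x + x; }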

Interprocedural optimization is common in modern commercial compilers from SGI
Silicon Graphics
Silicon Graphics, Inc. was a manufacturer of high-performance computing solutions, including computer hardware and software, founded in 1981 by Jim Clark...

, Intel, Microsoft
Microsoft
Microsoft Corporation is an American public multinational corporation headquartered in Redmond, Washington, USA that develops, manufactures, licenses, and supports a wide range of products and services predominantly related to computing through its various product divisions...

, and Sun Microsystems
Sun Microsystems
Sun Microsystems, Inc. was a company that sold :computers, computer components, :computer software, and :information technology services. Sun was founded on February 24, 1982...

. For a long time the open source GCC
GNU Compiler Collection
The GNU Compiler Collection is a compiler system produced by the GNU Project supporting various programming languages. GCC is a key component of the GNU toolchain...

 was criticized for a lack of powerful interprocedural analysis and optimizations, though this is now improving. Another good open source compiler with full analysis and optimization infrastructure is Open64
Open64
Open64 is an open source, optimizing compiler for the Itanium and x86-64 microprocessor architectures. It derives from the SGI compilers for the MIPS R10000 processor, called MIPSPro. It was initially released in 2000 as GNU GPL software under the name Pro64. The following year, University of...

, which is used by many organizations for research and for commercial purposes.

Due to the extra time and space required by interprocedural analysis, most compilers do not perform it by default. Users must use compiler options explicitly to tell the compiler to enable interprocedural analysis and other expensive optimizations.
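With GCC, for example, whole-program (link-time) optimization must be requested explicitly with the -flto option.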

Problems with optimization

Early in the history of compilers, compiler-optimized code was usually no better than code optimized by hand. As compiler technology has improved, good compilers can often generate better code than human programmers, and good post-pass optimizers can improve highly hand-optimized code even further. For RISC CPU architectures, and even more so for VLIW hardware, compiler optimization is the key to obtaining efficient code, because RISC instruction sets are so compact that it is hard for a human to manually schedule or combine small instructions to get efficient results. Indeed, these architectures were designed to rely on compiler writers for adequate performance.

However, optimizing compilers are by no means perfect. There is no way that a compiler can guarantee that, for all program source code, the fastest (or smallest) possible equivalent compiled program is output; such a compiler is fundamentally impossible because it would solve the halting problem
Halting problem
In computability theory, the halting problem can be stated as follows: Given a description of a computer program, decide whether the program finishes running or continues to run forever...

.

This may be proven by considering a call to a function foo that returns nothing and has no side effects (no I/O, no modification of global variables or "live" data structures, etc.). The fastest possible equivalent program would simply eliminate the function call. However, if foo in fact does not return, the program with the call to foo would behave differently from the program without it; to decide which is the case, the optimizing compiler would have to solve the halting problem.
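
The dilemma, sketched in C:

    void foo(void);   /* returns nothing, has no side effects */

    int main(void)
    {
        foo();   /* If foo() always returns, this call is dead and can be
                  * removed. If foo() can loop forever, removing the call
                  * changes the program's behaviour. Deciding which holds
                  * for arbitrary code is the halting problem. */
        return 0;
    }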

Additionally, there are a number of other more practical issues with optimizing compiler technology:
  • Optimizing compilers focus on relatively shallow constant-factor performance improvements and do not typically improve the algorithmic complexity of a solution. For example, a compiler will not change an implementation of bubble sort
    Bubble sort
    Bubble sort, also known as sinking sort, is a simple sorting algorithm that works by repeatedly stepping through the list to be sorted, comparing each pair of adjacent items and swapping them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which...

     to use mergesort instead.
  • Compilers usually have to support a variety of conflicting objectives, such as cost of implementation, compilation speed and quality of generated code.
  • A compiler typically only deals with a part of a program at a time, often the code contained within a single file or module; the result is that it is unable to consider contextual information that can only be obtained by processing the other files.
  • The overhead
    Computational overhead
    In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal...

 of compiler optimization: Any extra work takes time; whole-program optimization is time-consuming for large programs.
  • The often complex interaction between optimization phases makes it difficult to find an optimal sequence in which to execute them.


Work to improve optimization technology continues. One approach is the use of so-called post-pass optimizers (some commercial versions of which date back to mainframe software of the late 1970s). These tools take the executable output by an "optimizing" compiler and optimize it even further. Post-pass optimizers usually work on the assembly language
Assembly language
An assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices. It implements a symbolic representation of the machine codes and other constants needed to program a given CPU architecture...

 or machine code
Machine code
Machine code or machine language is a system of impartible instructions executed directly by a computer's central processing unit. Each instruction performs a very specific task, typically either an operation on a unit of data Machine code or machine language is a system of impartible instructions...

 level (in contrast with compilers that optimize intermediate representations of programs). The performance of post-pass optimizers is limited by the fact that much of the information available in the original source code is no longer available to them.

As processor performance continues to improve at a rapid pace, while memory bandwidth improves more slowly, optimizations that reduce memory bandwidth (even at the cost of making the processor execute relatively more instructions) will become more useful. Examples of this, already mentioned above, include loop nest optimization
Loop nest optimization
Loop nest optimization applies a set of loop transformations for the purpose of locality optimization or parallelization or other loop overhead reduction of the loop nests...

 and rematerialization
Rematerialization
Rematerialization or remat is a compiler optimization which saves time by recomputing a value instead of loading it from memory. It is typically tightly integrated with register allocation, where it is used as an alternative to spilling registers to memory. It was conceived by Preston Briggs, Keith D...

.
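
A sketch in C of one such transformation, loop interchange, which improves locality by walking a row-major array in memory order (sizes illustrative):

    #define N 1024

    /* Before: the inner loop strides down a column of the row-major
     * array a, touching a new cache line on almost every iteration. */
    void column_sums_slow(double a[N][N], double out[N])
    {
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                out[j] += a[i][j];
    }

    /* After interchange: the inner loop walks along a row, so
     * consecutive iterations reuse the same cache lines. The results
     * are identical; only the traversal order changes. */
    void column_sums_fast(double a[N][N], double out[N])
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                out[j] += a[i][j];
    }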

List of compiler optimizations

  • Automatic programming
    Automatic programming
    In computer science, the term automatic programming identifies a type of computer programming in which some mechanism generates a computer program to allow human programmers to write the code at a higher abstraction level....

  • automatic parallelization
    Automatic parallelization
    Automatic parallelization, also auto parallelization, autoparallelization, or parallelization, the last one of which implies automation when used in context, refers to converting sequential code into multi-threaded or vectorized code in order to utilize multiple processors simultaneously in a...

  • Constant folding
    Constant folding
    Constant folding and constant propagation are related compiler optimizations used by many modern compilers. An advanced form of constant propagation known as sparse conditional constant propagation can more accurately propagate constants and simultaneously remove dead code.- Constant folding...

  • Algebraic simplifications
  • Value numbering
  • Copy propagation
    Copy propagation
    In compiler theory, copy propagation is the process of replacing the occurrences of targets of direct assignments with their values. A direct assignment is an instruction of the form x = y, which simply assigns the value of y to x.From the following code:...

  • Constant propagation
  • Sparse conditional constant propagation
    Sparse conditional constant propagation
    In computer science, sparse conditional constant propagation is an optimization frequently applied in compilers after conversion to static single assignment form . It simultaneously removes some kinds of dead code and propagates constants throughout a program...

  • Common subexpression elimination
    Common subexpression elimination
    In computer science, common subexpression elimination is a compiler optimization that searches for instances of identical expressions , and analyses whether it is worthwhile replacing them with a single variable holding the computed value.- Example :In the following code: a = b * c + g; d = b * c...

     (CSE)
  • Partial redundancy elimination
    Partial redundancy elimination
    In compiler theory, partial redundancy elimination is a compiler optimization that eliminates expressions that are redundant on some but not necessarily all paths through a program...

  • Dead code elimination
    Dead code elimination
    In compiler theory, dead code elimination is a compiler optimization to remove code which does not affect the program results. Removing such code has two benefits: it shrinks program size, an important...

  • Induction variable elimination, strength reduction
    Strength reduction
    Strength reduction is a compiler optimization where expensive operations are replaced with equivalent but less expensive operations. The classic example of strength reduction converts "strong" multiplications inside a loop into "weaker" additions – something that frequently occurs in array...

  • Loop optimization
    Loop optimization
    In compiler theory, loop optimization plays an important role in improving cache performance, making effective use of parallel processing capabilities, and reducing overheads associated with executing loops. Most execution time of a scientific program is spent on loops...

    s
    • Loop invariant
      Loop invariant
      In computer science, a loop invariant is an invariant used to prove properties of loops. Informally, a loop invariant is a statement of the conditions that should be true on entry into a loop and that are guaranteed to remain true on every iteration of the loop...

       code motion
    • Loop unrolling
      Loop unwinding
      Loop unwinding, also known as loop unrolling, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size...

  • Software pipelining
    Software pipelining
    In computer science, software pipelining is a technique used to optimize loops, in a manner that parallels hardware pipelining. Software pipelining is a type of out-of-order execution, except that the reordering is done by a compiler instead of the processor...

  • Inlining
  • Code generator
    • Register allocation
      Register allocation
      In compiler optimization, register allocation is the process of assigning a large number of target program variables onto a small number of CPU registers...

      : local and global
    • Instruction scheduling
      Instruction scheduling
      In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines...

    • Branch predication
      Branch predication
      Branch predication is a strategy in computer architecture design for mitigating the costs usually associated with conditional branches, particularly branches to short sections of code...

    • Tail merging and cross jumping
    • Machine idioms and instruction combining
  • Vectorization
  • Phase ordering
  • Profile-guided optimization
    Profile-guided optimization
    Profile-guided optimization is a compiler optimization technique in computer programming to improve program runtime performance. In contrast to traditional optimization techniques that solely use the source code, PGO uses the results of test runs of the instrumented program to optimize the final...

  • Macro compression

List of static code analyses

  • Alias analysis
    Alias analysis
    Alias analysis is a technique in compiler theory, used to determine if a storage location may be accessed in more than one way. Two pointers are said to be aliased if they point to the same location....

  • Pointer analysis
    Pointer analysis
    In computer science pointer analysis, or points-to analysis, is a static code analysis technique that establishes which pointers, or heap references, can point to which variables or storage locations. It is often a component of more complex analyses such as escape analysis...

  • Shape analysis
    Shape analysis (software)
    In program analysis, a shape analysis is a static code analysis technique that discovers and verifies properties of linked, dynamically allocated data structures in computer programs. It is typically used at compile time to find software bugs or to verify high-level correctness properties of...

  • Escape analysis
    Escape analysis
    In programming language compiler optimization theory, escape analysis is a method for determining the dynamic scope of pointers. It is related to pointer analysis and shape analysis....

  • Array access analysis
    Array access analysis
    In computer science, array access analysis is a compiler analysis used to decide the read and write access patterns to elements or portions of arrays....

  • Dependence analysis
    Dependence analysis
    In compiler theory, dependence analysis produces execution-order constraints between statements/instructions. Broadly speaking, a statement S2 depends on S1 if S1 must be executed before S2...

  • Control flow analysis
    Control flow analysis
    Control flow analysis is a static code analysis technique for determining the control flow of a program. The control flow is expressed as a control flow graph .For many languages, the control flow of a program is explicit in a program's source code...

  • Data flow analysis
    • Use-define chain
      Use-define chain
      A Use-Definition Chain is a data structure that consists of a use, U, of a variable, and all the definitions, D, of that variable that can reach that use without any other intervening definitions...

       analysis
    • Live variable analysis
      Live variable analysis
      In compiler theory, live variable analysis is a classic data flow analysis performed by compilers to calculate for each program point the variables that may be potentially read before their next write, that is, the variables that are live at the exit from each program point.Stated simply: a...

    • Available expression
      Available expression
      In the field of compiler optimizations, available expressions is an analysis algorithm that determines for each point in the program the set of expressions that need not be recomputed. Those expressions are said to be available at such a point...

       analysis

See also

  • Algorithmic efficiency
    Algorithmic efficiency
    In computer science, efficiency is used to describe properties of an algorithm relating to how much of various types of resources it consumes. Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or continuous process, where the goal is to reduce...

  • Full employment theorem
    Full employment theorem
    In computer science and mathematics, the term full employment theorem has been used to refer to a theorem showing that no algorithm can optimally perform a particular task done by some class of professionals. The name arises because such a theorem ensures that there is endless scope to keep...

  • Just-in-time compilation
    Just-in-time compilation
    In computing, just-in-time compilation , also known as dynamic translation, is a method to improve the runtime performance of computer programs. Historically, computer programs had two modes of runtime operation, either interpreted or static compilation...

    (JIT)

External links

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 