Branch predication - AbsoluteAstronomy.com

Branch predication is a strategy in computer architecture

Computer architecture

In computer science and engineering, computer architecture is the practical art of selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals and the formal modelling of those systems....

design for mitigating the costs usually associated with conditional branch

Branch (computer science)

A branch is sequence of code in a computer program which is conditionally executed depending on whether the flow of control is altered or not . The term can be used when referring to programs in high level languages as well as program written in machine code or assembly language...

es, particularly branches to short sections of code. It does this by allowing each instruction to conditionally either perform an operation or do nothing.

Overview

Most computer program

Computer program

A computer program is a sequence of instructions written to perform a specified task with a computer. A computer requires programs to function, typically executing the program's instructions in a central processor. The program has an executable form that the computer can use directly to execute...

s contain code which will be executed only under specific conditions depending on factors which can not be determined before-hand, for example user input. As the majority of processors

Central processing unit

The central processing unit is the portion of a computer system that carries out the instructions of a computer program, to perform the basic arithmetical, logical, and input/output operations of the system. The CPU plays a role somewhat analogous to the brain in the computer. The term has been in...

simply execute the next instruction in a sequence, the traditional solution is to insert branch instructions that allow a program to conditionally branch to a different section of code, thus changing the next step in the sequence. This was sufficient until designers began improving performance by implementing instruction pipelining, a method which is slowed down by branches. For a more thorough description of the problems which arose, and a popular solution, see branch prediction.

Luckily, one of the more common patterns of code that normally relies on branching has a more elegant solution. Consider the following pseudocode

Pseudocode

In computer science and numerical computation, pseudocode is a compact and informal high-level description of the operating principle of a computer program or other algorithm. It uses the structural conventions of a programming language, but is intended for human reading rather than machine reading...

:

if condition
do this
else
do that

On a system that uses conditional branching, this might translate to machine instructions looking similar to:

branch if condition to label 1
do that
branch to label 2
label 1:
do this
label 2:
...

With branch predication, all possible branch paths are executed, the correct path is kept and all others are thrown away. The basic idea is that each instruction is associated with a predicate (the word here used similarly to its usage in predicate logic

Predicate logic

In mathematical logic, predicate logic is the generic term for symbolic formal systems like first-order logic, second-order logic, many-sorted logic or infinitary logic. This formal system is distinguished from other systems in that its formulae contain variables which can be quantified...

) and that the instruction will only be executed if the predicate is true. The machine code for the above example using branch predication might look something like this:

(condition) do this
(not condition) do that

Note that besides eliminating branches, less code is needed in total, provided the architecture provides predicated instructions. While this does not guarantee faster execution in general, it will if the do this and do that blocks of code are short enough.

Typically, in order to claim a system has branch predication, most or all of the instructions must have this ability to execute conditionally based on a predicate.

Advantages

The main purpose of predication is to avoid jumps over very small sections of program code, increasing the effectiveness of pipelined execution and avoiding problems with the cache

CPU cache

A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations...

. It also has a number of more subtle benefits:

Functions that are traditionally computed using simple arithmetic and bitwise operation
Bitwise operation
A bitwise operation operates on one or more bit patterns or binary numerals at the level of their individual bits. This is used directly at the digital hardware level as well as in microcode, machine code and certain kinds of high level languages...

s may be quicker to compute using predicated instructions.
Predicated instructions with different predicates can be mixed with each other and with unconditional code, allowing better instruction scheduling
Instruction scheduling
In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines...

and so even better performance.
Elimination of unnecessary branch instructions can make the execution of necessary branches, such as those that make up loops, faster by lessening the load on branch prediction mechanisms.

Disadvantages

Predication's primary drawback is in increased encoding space. In typical implementations, every instruction reserves a bitfield for the predicate specifying whether that instruction should have an effect. When available memory is limited, as on embedded devices, this space cost can be prohibitive. However, some architectures such as Thumb-2 are able to avoid this issue (see below). Other detriments are the following:

Predication complicates the hardware by adding levels of logic
Control unit
A control unit in general is a central part of the machinery that controls its operation, provided that a piece of machinery is complex and organized enough to contain any such unit. One domain in which the term is specifically used is the area of computer design...

to critical paths
Datapath
A datapath is a collection of functional units, such as arithmetic logic units or multipliers, that perform data processing operations. Most central processing units consist of a datapath and a control unit, with a large part of the control unit dedicated to regulating the interaction between the...

and potentially degrades clock speed.
A predicated block includes cycles for all operations, so shorter paths
Control flow graph
A control flow graph in computer science is a representation, using graph notation, of all paths that might be traversed through a program during its execution.- Overview :...

may take longer and be penalized.

Predication is most effective when paths are balanced or when the longest path is the most frequently executed, but determining such a path is very difficult at compile time, even at the presence of profiling information.

Examples

Predicated instructions were popular in European computer designs of the 1950s, including the Mailüfterl

Mailüfterl

Mailüfterl is an Austrian nickname for the first transistorized computer on the European mainland. The first computers of this kind were TRADIC, Harwell CADET and TX-0.It was built in 1955 at the Vienna University of Technology by Heinz Zemanek...

(1955), the Zuse Z22 (1955), the ZEBRA

ZEBRA (computer)

The ZEBRA was one of the first computers to be designed in the Netherlands, and one of the first Dutch computers to be commercially available...

(1958), and the Electrologica X1

Electrologica X1

The Electrologica X1 was a digital computer designed and manufactured in the Netherlands from 1958 to 1965. About thirty were produced and sold in the Netherlands and abroad....

(1958). The IBM ACS-1

ACS-1

The ACS-1 and ACS-360 are two related supercomputers designed by IBM as part of the IBM Advanced Computing Systems project from 1961 to 1969...

design of 1967 allocated a "skip" bit in its instruction formats, and the CDC Flexible Processor in 1976 allocated three conditional execution bits in its microinstruction formats.

In Intel's IA-64 architecture, almost every instruction in the IA-64 instruction set

Instruction set

An instruction set, or instruction set architecture , is the part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O...

is predicated. The predicates themselves are stored in special purpose registers

Processor register

In computer architecture, a processor register is a small amount of storage available as part of a CPU or other digital processor. Such registers are addressed by mechanisms other than main memory and can be accessed more quickly...

; one of the predicate registers is always true so that unpredicated instructions are simply instructions predicated with the value true. The use of predication is essential in the IA-64 implementation of software pipelining

Software pipelining

In computer science, software pipelining is a technique used to optimize loops, in a manner that parallels hardware pipelining. Software pipelining is a type of out-of-order execution, except that the reordering is done by a compiler instead of the processor...

because it avoids the need for writing separated code for prologs and epilogs.

On the ARM architecture

ARM architecture

ARM is a 32-bit reduced instruction set computer instruction set architecture developed by ARM Holdings. It was named the Advanced RISC Machine, and before that, the Acorn RISC Machine. The ARM architecture is the most widely used 32-bit ISA in numbers produced...

, almost all instructions can be conditionally executed. Thirteen different predicates are available, each depending on the four flags Carry, Overflow, Zero, and Negative in some way. The ARM's 16-bit Thumb instruction set has no branch predication, in order to save encoding space, but its successor Thumb-2 overcomes this problem using a special instruction which has no effect other than to supply predicates for the next four instructions.

For a concrete example of code exploiting branch predication, see the ARM assembly example in binary GCD algorithm

Binary GCD algorithm

The binary GCD algorithm, also known as Stein's algorithm, is an algorithm that computes the greatest common divisor of two nonnegative integers. It gains a measure of efficiency over the ancient Euclidean algorithm by replacing divisions and multiplications with shifts, which are cheaper when...

Overview

Advantages

Disadvantages

Examples

See also