Multiply-accumulate - AbsoluteAstronomy.com

Computing

Computing is usually defined as the activity of using and improving computer hardware and software. It is the computer-specific part of information technology...

, especially digital signal processing

Digital signal processing

Digital signal processing is concerned with the representation of discrete time signals by a sequence of numbers or symbols and the processing of these signals. Digital signal processing and analog signal processing are subfields of signal processing...

, the multiply–accumulate operation is a common step that computes the product of two numbers and adds that product to an accumulator

Accumulator (computing)

In a computer's central processing unit , an accumulator is a register in which intermediate arithmetic and logic results are stored. Without a register like an accumulator, it would be necessary to write the result of each calculation to main memory, perhaps only to be read right back again for...

. The hardware unit that performs the operation is known as a multiplier–accumulator (MAC, or MAC unit); the operation itself is also often called a MAC or a MAC operation. The MAC operation modifies an accumulator a:

When done with floating point

Floating point

In computing, floating point describes a method of representing real numbers in a way that can support a wide range of values. Numbers are, in general, represented approximately to a fixed number of significant digits and scaled using an exponent. The base for the scaling is normally 2, 10 or 16...

numbers, it might be performed with two rounding

Rounding

Rounding a numerical value means replacing it by another value that is approximately equal but has a shorter, simpler, or more explicit representation; for example, replacing $23.4476 with $23.45, or the fraction 312/937 with 1/3, or the expression √2 with 1.414.Rounding is often done on purpose to...

s (typical in many DSPs), or with a single rounding. When performed with a single rounding, it is called a fused multiply–add (FMA) or fused multiply–accumulate (FMAC).

Modern computers may contain a dedicated MAC, consisting of a multiplier implemented in combinational logic

Combinational logic

In digital circuit theory, combinational logic is a type of digital logic which is implemented by boolean circuits, where the output is a pure function of the present input only. This is in contrast to sequential logic, in which the output depends not only on the present input but also on the...

followed by an adder

Adder (electronics)

In electronics, an adder or summer is a digital circuit that performs addition of numbers.In many computers and other kinds of processors, adders are used not only in the arithmetic logic unit, but also in other parts of the processor, where they are used to calculate addresses, table indices, and...

and an accumulator register that stores the result. The output of the register is fed back to one input of the adder, so that on each clock cycle, the output of the multiplier is added to the register. Combinational multipliers require a large amount of logic, but can compute a product much more quickly than the method of shifting and adding typical of earlier computers. The first processors to be equipped with MAC units were digital signal processor

Digital signal processor

A digital signal processor is a specialized microprocessor with an architecture optimized for the fast operational needs of digital signal processing.-Typical characteristics:...

s, but the technique is now also common in general-purpose processors.

In floating-point arithmetic

When done with integer

Integer

The integers are formed by the natural numbers together with the negatives of the non-zero natural numbers .They are known as Positive and Negative Integers respectively...

s, the operation is typically exact (computed modulo

Modular arithmetic

In mathematics, modular arithmetic is a system of arithmetic for integers, where numbers "wrap around" after they reach a certain value—the modulus....

some power of 2). However, floating-point numbers have only a certain amount of mathematical precision. That is, digital floating-point arithmetic is generally not associative

Associativity

In mathematics, associativity is a property of some binary operations. It means that, within an expression containing two or more occurrences in a row of the same associative operator, the order in which the operations are performed does not matter as long as the sequence of the operands is not...

or distributive

Distributivity

In mathematics, and in particular in abstract algebra, distributivity is a property of binary operations that generalizes the distributive law from elementary algebra.For example:...

. (See Floating point#Accuracy problems.)
Therefore, it makes a difference to the result whether the multiply–add is performed with two roundings, or in one operation with a single rounding (a fused multiply–add).

Fused multiply–add

A fused multiply–add is a floating-point multiply–add operation performed in one step, with a single rounding. That is, where an unfused multiply–add would compute the product b×c, round it to N significant bits, add the result to a, and round back to N significant bits, a fused multiply–add would compute the entire sum a+b×c to its full precision before rounding the final result down to N significant bits.

A fast FMA can speed up and improve the accuracy of many computations that involve the accumulation of products:

Dot product
Dot product
In mathematics, the dot product or scalar product is an algebraic operation that takes two equal-length sequences of numbers and returns a single number obtained by multiplying corresponding entries and then summing those products...
Matrix multiplication
Matrix multiplication
In mathematics, matrix multiplication is a binary operation that takes a pair of matrices, and produces another matrix. If A is an n-by-m matrix and B is an m-by-p matrix, the result AB of their multiplication is an n-by-p matrix defined only if the number of columns m of the left matrix A is the...
Polynomial
Polynomial
In mathematics, a polynomial is an expression of finite length constructed from variables and constants, using only the operations of addition, subtraction, multiplication, and non-negative integer exponents...

evaluation (e.g., with Horner's rule)
Newton's method
Newton's method
In numerical analysis, Newton's method , named after Isaac Newton and Joseph Raphson, is a method for finding successively better approximations to the roots of a real-valued function. The algorithm is first in the class of Householder's methods, succeeded by Halley's method...

for evaluating functions.

Fused multiply–add can usually be relied on to give more accurate results. However, Kahan has pointed out that it can give problems if used unthinkingly. If is evaluated as using fused multiply–add, then the result may be negative even when due to the first multiplication discarding low significance bits. This could then lead to an error if for instance the square root of the result is then evaluated.

When implemented inside a microprocessor

Microprocessor

A microprocessor incorporates the functions of a computer's central processing unit on a single integrated circuit, or at most a few integrated circuits. It is a multipurpose, programmable device that accepts digital data as input, processes it according to instructions stored in its memory, and...

, an FMA can actually be faster than a multiply operation followed by an add, even though standard industrial implementations based on the original IBM RS/6000 design require a 2N-bit adder to compute the sum properly.

A useful benefit of including this instruction is that it allows an efficient software implementation of division

Division (digital)

Several algorithms exist to perform division in digital designs. These algorithms fall into two main categories: slow division and fast division. Slow division algorithms produce one digit of the final quotient per iteration. Examples of slow division include restoring, non-performing restoring,...

and square root

Square root

In mathematics, a square root of a number x is a number r such that r2 = x, or, in other words, a number r whose square is x...

operations, thus eliminating the need for dedicated hardware for those operations.

The FMA operation is included in IEEE 754-2008.

The DEC

Digital Equipment Corporation

Digital Equipment Corporation was a major American company in the computer industry and a leading vendor of computer systems, software and peripherals from the 1960s to the 1990s...

VAX

VAX was an instruction set architecture developed by Digital Equipment Corporation in the mid-1970s. A 32-bit complex instruction set computer ISA, it was designed to extend or replace DEC's various Programmed Data Processor ISAs...

's POLY instruction is used for evaluating polynomials with Horner's rule using a succession of fused multiply–add steps. This instruction has been a part of the VAX instruction set since its original 11/780 implementation in 1977.

The 1999 standard

C99

C99 is a modern dialect of the C programming language. It extends the previous version with new linguistic and library features, and helps implementations make better use of available computer hardware and compiler technology.-History:...

of the C programming language

C (programming language)

C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

supports the FMA operation through the fma standard math library function.

The fused multiply–add operation was introduced as multiply–add fused in the IBM POWER1

POWER1

The POWER1 is a multi-chip CPU developed and fabricated by IBM that implemented the POWER instruction set architecture . It was originally known as the “RISC System/6000 CPU” or when an abbreviated form, the “RS/6000 CPU” before introduction of successors required the original name to be replaced...

processor, but has been added to numerous other processors since then:

Fujitsu
Fujitsu
is a Japanese multinational information technology equipment and services company headquartered in Tokyo, Japan. It is the world's third-largest IT services provider measured by revenues....

SPARC64 VI
SPARC64 VI
The SPARC64 VI, code-named Olympus-C, is a microprocessor, developed by Fujitsu. It implements the SPARC V9 instruction set architecture and is compliant with the Joint Programming Specification developed by Fujitsu and Sun. It is used by Fujitsu and Sun Microsystems in their SPARC Enterprise...

(2007) and above
HP
Hewlett-Packard
Hewlett-Packard Company or HP is an American multinational information technology corporation headquartered in Palo Alto, California, USA that provides products, technologies, softwares, solutions and services to consumers, small- and medium-sized businesses and large enterprises, including...

PA-8000
PA-8000
The PA-8000 , code-named Onyx, is a microprocessor developed and fabricated by Hewlett-Packard that implemented the PA-RISC 2.0 instruction set architecture . It was a completely new design with no circuitry derived from previous PA-RISC microprocessors...

(1996) and above
SCE
Sony Computer Entertainment
Sony Computer Entertainment, Inc. is a major video game company specializing in a variety of areas in the video game industry, and is a wholly owned subsidiary and part of the Consumer Products & Services Group of Sony...

-Toshiba
Toshiba
is a multinational electronics and electrical equipment corporation headquartered in Tokyo, Japan. It is a diversified manufacturer and marketer of electrical products, spanning information & communications equipment and systems, Internet-based solutions and services, electronic components and...

Emotion Engine
Emotion Engine
The Emotion Engine is a CPU developed and manufactured by Sony Computer Entertainment and Toshiba for use in the Sony PlayStation 2 video game console, as well as early PlayStation 3 models sold in Japan and North America...

(1999)
Intel Itanium
Itanium
Itanium is a family of 64-bit Intel microprocessors that implement the Intel Itanium architecture . Intel markets the processors for enterprise servers and high-performance computing systems...

(2001)
STI Cell
Cell (microprocessor)
Cell is a microprocessor architecture jointly developed by Sony, Sony Computer Entertainment, Toshiba, and IBM, an alliance known as "STI". The architectural design and first implementation were carried out at the STI Design Center in Austin, Texas over a four-year period beginning March 2001 on a...

(2006)
(MIPS
MIPS architecture
MIPS is a reduced instruction set computer instruction set architecture developed by MIPS Technologies . The early MIPS architectures were 32-bit, and later versions were 64-bit...

-compatible) Loongson-2F (2008).
ARM with VFPv4 (which is optional)

An FMA instruction will be implemented in the newer AMD CPUs like 'Bulldozer

Bulldozer (processor)

Bulldozer is the codename Advanced Micro Devices has given to one of the next-generation CPU cores after the K10 microarchitecture for the company's M-SPACE design methodology, with the core specifically aimed at 10-watt to 125-watt TDP computing products. Bulldozer is a completely new design...

' with FMA4

FMA instruction set

The FMA instruction set is the name of a future extension to the 128-bit SIMD instructions in the X86 microprocessor instruction set to perform fused multiply–add operations...

support. Intel plans to implement FMA3

FMA instruction set

The FMA instruction set is the name of a future extension to the 128-bit SIMD instructions in the X86 microprocessor instruction set to perform fused multiply–add operations...

in processors using its Haswell microarchitecture, due in late 2012.

FMA capability is also present in the NVIDIA

NVIDIA

Nvidia is an American global technology company based in Santa Clara, California. Nvidia is best known for its graphics processors . Nvidia and chief rival AMD Graphics Techonologies have dominated the high performance GPU market, pushing other manufacturers to smaller, niche roles...

GeForce 200 Series

The GeForce 200 Series is the 10th generation of Nvidia's GeForce graphics processing units. The series also represents the continuation of the company's unified shader architecture introduced with the GeForce 8 Series and the GeForce 9 Series. Its primary competition came from ATI's Radeon HD 4000...

(GTX 200) GPUs, GeForce 400 Series

GeForce 400 Series

The GeForce 400 Series is the 11th generation of Nvidia's GeForce graphics processing units. The series was originally slated for production in November 2009, but, after a number of delays, launched on March 26, 2010 with availability following in April 2010....

, GeForce 500 Series

GeForce 500 Series

The GeForce 500 Series is a family of graphics processing units developed by Nvidia, based on the refreshed Fermi architecture. Nvidia officially announced the GeForce 500 series on 9 November 2010 with the launch of the GeForce GTX 580.- Overview :...

GPUs and the NVIDIA Tesla

Nvidia Tesla

The Tesla graphics processing unit is nVidia's third brand of GPUs. It is based on high-end GPUs from the G80 , as well as the Quadro lineup. Tesla is nVidia's first dedicated General Purpose GPU...

C1060
Computing Processor & C2050 / C2070
GPU Computing Processor GPGPU

GPGPU

General-purpose computing on graphics processing units is the technique of using a GPU, which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the CPU...

s. FMA has been added to the AMD Radeon

Radeon

Radeon is a brand of graphics processing units and random access memory produced by Advanced Micro Devices , first launched in 2000 by ATI Technologies, which was acquired by AMD in 2006. Radeon is the successor to the Rage line. There are four different groups, which can be differentiated by...

line with the HD 5000 series.

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.