Multiply-accumulate



 
 


In computing, especially digital signal processing, multiply-accumulate is a common operation that computes the product of two numbers and adds that product to an accumulator.

When done with floating-point numbers, it might be performed with two roundings (typical in many DSPs) or with a single rounding. When performed with a single rounding, it is called a fused multiply-add (FMA) or fused multiply-accumulate (FMAC).

Modern computers may contain a dedicated multiply-accumulate unit, or "MAC-unit", consisting of a multiplier implemented in combinational logic followed by an adder and an accumulator register which stores the result when clocked. The output of the register is fed back to one input of the adder, so that on each clock the output of the multiplier is added to the register. Combinational multipliers require a large amount of logic, but can compute a product much more quickly than the method of shifting and adding typical of earlier computers. The first processors to be equipped with MAC-units were digital signal processors, but the technique is now common in general-purpose processors too.
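The data path described above can be sketched in software. This is only a toy model, not a hardware description: the class name `MacUnit` and its `clock` method are hypothetical names chosen for illustration, and Python integers stand in for fixed-width registers.

```python
class MacUnit:
    """Toy model of a MAC unit: multiplier -> adder -> accumulator register."""

    def __init__(self):
        self.acc = 0  # accumulator register, cleared at start

    def clock(self, b, c):
        """One clock tick: multiply the inputs and add the product to the register."""
        product = b * c                # combinational multiplier
        self.acc = self.acc + product  # adder, with the register output fed back
        return self.acc


def dot(xs, ys):
    """Accumulate a sum of products by clocking the unit once per pair."""
    unit = MacUnit()
    for x, y in zip(xs, ys):
        unit.clock(x, y)
    return unit.acc


result = dot([1, 2, 3], [4, 5, 6])  # 1*4 + 2*5 + 3*6
```

Each call to `clock` corresponds to one clock edge in the description: the register's previous value goes back into the adder together with the new product.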

In floating-point arithmetic

When done with integers, the operation is typically exact (computed modulo some power of 2). However, floating-point numbers have only finite precision, so each operation may round its result. As a consequence, digital floating-point arithmetic is generally not associative or distributive. (See Floating point, Accuracy problems.)
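A short sketch of both behaviours follows. The 32-bit width and the helper name `mac_u32` are assumptions made for the example; Python integers are unbounded, so the wrap-around is applied explicitly with a mask.

```python
MASK32 = (1 << 32) - 1  # assumed 32-bit accumulator width


def mac_u32(acc, b, c):
    """Integer multiply-accumulate, exact modulo 2**32."""
    return (acc + b * c) & MASK32


# The integer result wraps around but is otherwise exact.
wrapped = mac_u32(0xFFFFFFFF, 2, 3)  # (2**32 - 1 + 6) mod 2**32

# Floating-point addition rounds after every step, so grouping matters.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
grouping_matters = left != right
```

With IEEE 754 double precision, `left` and `right` differ in their last bit, which is the failure of associativity the text refers to.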

Therefore, it makes a difference to the result whether the multiply-add is performed with two roundings, or in one operation with a single rounding. When performed with a single rounding, the operation is termed a fused multiply-add.

Fused multiply-add

A fused multiply-add is a floating-point multiply-add operation performed in one step, with a single rounding. That is, where an unfused multiply-add computing a + b × c would compute the product b × c, round it to N significant bits, add the result to a, and round back to N significant bits, a fused multiply-add would compute the entire sum a + b × c to its full precision before rounding the final result down to N significant bits.
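The difference between the two can be made visible with a small emulation. This sketch assumes IEEE 754 double precision and finite inputs; the function names are hypothetical, and the fused version is emulated with exact rational arithmetic rather than a hardware instruction.

```python
from fractions import Fraction


def unfused_multiply_add(a, b, c):
    """a + b*c with two roundings: one after the multiply, one after the add."""
    product = b * c     # first rounding
    return a + product  # second rounding


def fused_multiply_add(a, b, c):
    """Emulated a + b*c with a single rounding (finite inputs only).

    The sum is formed exactly as a rational number and rounded once
    when it is converted back to a float.
    """
    exact = Fraction(a) + Fraction(b) * Fraction(c)
    return float(exact)


# b*c = 1 + 2**-29 + 2**-60 exactly; the last term is lost if the
# product is rounded to 53 bits before a is added.
a = -1.0
b = c = 1.0 + 2.0**-30

two_roundings = unfused_multiply_add(a, b, c)  # 2**-29
one_rounding = fused_multiply_add(a, b, c)     # 2**-29 + 2**-60
```

Here the unfused result has lost the low-order term of the product, while the fused result keeps it because the cancellation against `a` happens before any rounding.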

When implemented in a microprocessor, this is typically faster than a multiply operation followed by an add. With an FMA instruction available, a hardware divide or square root unit is not strictly needed, since both operations can be implemented efficiently in software using the FMA.
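As an illustration of software division built on FMA, the following sketch uses Newton-Raphson iteration on the reciprocal. It is a swapped-in textbook technique, not necessarily what any particular processor's library does. The helper `fma` is an emulation (exact rational arithmetic, finite inputs only), and the names `reciprocal` and `divide`, the linear initial guess, and the iteration count are all assumptions for the example; it handles only finite, positive divisors and does not guarantee correct rounding in every case.

```python
import math
from fractions import Fraction


def fma(a, b, c):
    """Emulated a + b*c with one rounding (finite inputs only)."""
    return float(Fraction(a) + Fraction(b) * Fraction(c))


def reciprocal(d, iterations=5):
    """Approximate 1/d for finite d > 0 using only multiplies and FMAs."""
    m, e = math.frexp(d)                       # d = m * 2**e with 0.5 <= m < 1
    r = math.ldexp(48 / 17 - 32 / 17 * m, -e)  # linear initial guess for 1/d
    for _ in range(iterations):
        err = fma(1.0, -d, r)                  # err = 1 - d*r, rounded once
        r = fma(r, r, err)                     # r = r + r*err
    return r


def divide(n, d):
    """Approximate n/d: multiply by the reciprocal, then apply one correction."""
    r = reciprocal(d)
    q = n * r
    rem = fma(n, -d, q)    # residual n - d*q, rounded once
    return fma(q, rem, r)  # q + rem*r


third = divide(1.0, 3.0)
```

The FMA matters in the residual steps: `1 - d*r` and `n - d*q` involve heavy cancellation, and computing them with a single rounding keeps the small remainder accurate.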

A fast FMA can speed up and improve the accuracy of many computations which involve the accumulation of products:
  • Dot product
  • Matrix multiplication
  • Polynomial evaluation (e.g., with Horner's rule)
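The three computations listed above share the same inner step, one multiply-add per term. A minimal sketch, assuming an emulated `fma` helper (exact rational arithmetic, finite inputs only) and hypothetical function names:

```python
from fractions import Fraction


def fma(a, b, c):
    """Emulated a + b*c with one rounding (finite inputs only)."""
    return float(Fraction(a) + Fraction(b) * Fraction(c))


def dot(xs, ys):
    """Dot product: one FMA per pair of elements."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = fma(acc, x, y)  # acc + x*y
    return acc


def matmul(A, B):
    """Matrix product of lists of rows: each entry is a dot product."""
    return [[dot(row, col) for col in zip(*B)] for row in A]


def horner(coeffs, x):
    """Evaluate a polynomial by Horner's rule, coefficients highest degree first."""
    acc = 0.0
    for coef in coeffs:
        acc = fma(coef, acc, x)  # coef + acc*x
    return acc


value = horner([1.0, 2.0, 3.0], 2.0)  # x**2 + 2*x + 3 at x = 2
```

In each loop the intermediate product is never rounded on its own, so an n-term accumulation incurs n roundings instead of 2n.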


The FMA operation is included in IEEE 754-2008.

The 1999 standard of the C programming language supports the FMA operation through the fma standard math library function.

Fused multiply-add capability is implemented in microprocessors such as the IBM POWER1 (1990) and above, the HAL/Fujitsu SPARC64 (1995) and above, the HP PA-8000 (1996) and above, the Intel Itanium (2001), and the Cell. It will be implemented in AMD processors with SSE5 instruction set support. Intel plans to implement FMA in processors using its Haswell microarchitecture, due sometime in 2012.

FMA capability is also present in the NVIDIA GeForce 200 Series (GTX 200) GPUs and the NVIDIA Tesla T10 GPGPUs.
