Digital signal processor

A digital signal processor (DSP) is a specialized microprocessor with an architecture optimized for the fast operational needs of digital signal processing.

Typical characteristics


Digital signal processing algorithms typically require a large number of mathematical operations to be performed quickly and repetitively on a set of data. Signals (perhaps from audio or video sensors) are constantly converted from analog to digital, manipulated digitally, and then converted again to analog form. Many DSP applications have constraints on latency; that is, for the system to work, the DSP operation must be completed within some fixed time, and deferred (or batch) processing is not viable.

Most general-purpose microprocessors and operating systems can execute DSP algorithms successfully, but are not suitable for use in portable devices such as mobile phones and PDAs because of power supply and space constraints. A specialized digital signal processor, however, will tend to provide a lower-cost solution, with better performance, lower latency, and no requirements for specialized cooling or large batteries.

The architecture of a digital signal processor is optimized specifically for digital signal processing. Most also support some of the features of an applications processor or microcontroller, since signal processing is rarely the only task of a system. Some useful features for optimizing DSP algorithms are outlined below.

Architecture


By the standards of general-purpose processors, DSP instruction sets are often highly irregular. One implication for software architecture is that hand-optimized assembly-code routines are commonly packaged into libraries for re-use, instead of relying on advanced compiler technologies to handle essential algorithms.

Hardware features visible through DSP instruction sets commonly include:
  • Hardware modulo addressing, allowing circular buffers to be implemented without having to constantly test for wrapping.
  • A memory architecture designed for streaming data, using DMA extensively and expecting code to be written with knowledge of cache hierarchies and the associated delays.
  • Memory architectures that support several accesses per instruction cycle, which may be required to drive multiple arithmetic units
  • Separate program and data memories (Harvard architecture), and sometimes concurrent access on multiple data buses
  • Special SIMD (single instruction, multiple data) operations
  • Some processors use VLIW techniques so each instruction drives multiple arithmetic units in parallel
  • Special arithmetic operations, such as fast multiply–accumulates (MACs). Many fundamental DSP algorithms, such as FIR filters or the fast Fourier transform (FFT), depend heavily on multiply–accumulate performance.
  • Bit-reversed addressing, a special addressing mode useful for calculating FFTs
  • Special loop controls, such as architectural support for executing a few instruction words in a very tight loop without overhead for instruction fetches or exit testing
  • Deliberate exclusion of a memory management unit. DSPs frequently use multi-tasking operating systems, but have no support for virtual memory or memory protection. Operating systems that use virtual memory require more time for context switching among processes, which increases latency.
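
To make the circular-buffer and multiply–accumulate items above concrete, the following is a minimal sketch in portable C of an FIR filter whose delay line is a circular buffer. The names `fir_state`, `fir_step` and `FIR_TAPS` are hypothetical and chosen only for illustration. On a DSP, the explicit `%` wrap would typically be handled by the modulo-addressing hardware and each loop iteration would map onto a MAC instruction; the C below only spells those steps out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIR_TAPS 4  /* hypothetical filter length, chosen for illustration */

/* Delay line of an FIR filter, stored as a circular buffer. */
typedef struct {
    int32_t history[FIR_TAPS]; /* the most recent FIR_TAPS input samples */
    size_t  head;              /* index of the newest sample */
} fir_state;

/* Push one input sample and return one output sample.
 * The index wrap and the multiply-accumulate are written out explicitly;
 * DSP hardware would perform both without extra instructions. */
static int64_t fir_step(fir_state *s, const int32_t coeff[FIR_TAPS], int32_t x)
{
    assert(s != NULL && coeff != NULL);

    /* Advance the write position, wrapping at the end of the buffer. */
    s->head = (s->head + 1) % FIR_TAPS;
    s->history[s->head] = x;

    int64_t acc = 0;       /* wide accumulator, so partial sums cannot overflow */
    size_t idx = s->head;  /* start at the newest sample */
    for (size_t k = 0; k < FIR_TAPS; ++k) {
        acc += (int64_t)coeff[k] * s->history[idx]; /* multiply-accumulate */
        idx = (idx + FIR_TAPS - 1) % FIR_TAPS;      /* step back one sample, with wrap */
    }
    return acc;
}
```

With a zero-initialized state, feeding a unit impulse followed by zeros returns the coefficients in order, which is a quick way to check that the wrap logic is correct.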

Program flow

  • Floating-point unit integrated directly into the datapath
  • Pipelined architecture
  • Highly parallel multiplier–accumulators (MAC units)
  • Hardware-controlled looping, to reduce or eliminate the overhead required for looping operations

Memory architecture

  • DSPs often use special memory architectures that are able to fetch multiple data and/or instructions at the same time:
    • Super Harvard architecture
    • Harvard architecture
    • Modified von Neumann architecture
  • Use of direct memory access
  • Memory-address calculation unit

Data operations

  • Saturation arithmetic, in which operations that produce overflows will accumulate at the maximum (or minimum) values that the register can hold rather than wrapping around (maximum+1 doesn't overflow to minimum as in many general-purpose CPUs; instead it stays at maximum). Sometimes various sticky-bit operation modes are available.
  • Fixed-point arithmetic is often used to speed up arithmetic processing
  • Single-cycle operations to increase the benefits of pipelining
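
The saturation behaviour described in the list above can be sketched in C as follows. The function name `sat_add16` is hypothetical; a DSP would normally provide this as a single instruction or an operating mode, whereas the sketch emulates it by widening and clamping.

```c
#include <stdint.h>

/* Add two 16-bit samples with saturation: results outside the int16_t
 * range clamp to INT16_MAX or INT16_MIN instead of wrapping around. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b; /* widen so the sum itself cannot overflow */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

For audio or video samples, clamping is usually the less harmful failure mode: a clipped peak is a small distortion, while a wrapped value flips from the loudest positive level to the most negative one.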

Instruction sets

  • Multiply–accumulate (MAC, including fused multiply–add, FMA) operations, which are used extensively in all kinds of matrix operations, such as convolution for filtering, dot product, or even polynomial evaluation (see Horner scheme)
  • Instructions to increase parallelism: SIMD, VLIW, superscalar architecture
  • Specialized instructions for modulo addressing in ring buffers and bit-reversed addressing mode for FFT cross-referencing
  • Digital signal processors sometimes use time-stationary encoding to simplify hardware and increase coding efficiency.
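
As an illustration of what the bit-reversed addressing mode computes, here is a small software sketch in C. The names `bit_reverse` and `bit_reverse_permute` are hypothetical. A DSP with this addressing mode generates the reversed index in its address unit, so no loop of this kind is executed there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the lowest `bits` bits of `index`.
 * For a radix-2 FFT of length 2^bits, element i is exchanged with
 * element bit_reverse(i, bits) to reorder the data. */
static uint32_t bit_reverse(uint32_t index, unsigned bits)
{
    assert(bits <= 32);
    uint32_t reversed = 0;
    for (unsigned b = 0; b < bits; ++b) {
        reversed = (reversed << 1) | (index & 1u); /* append the lowest remaining bit */
        index >>= 1;
    }
    return reversed;
}

/* Permute an array of length 2^bits into bit-reversed order, in place. */
static void bit_reverse_permute(int32_t *data, unsigned bits)
{
    assert(data != NULL && bits < 32);
    uint32_t n = 1u << bits;
    for (uint32_t i = 0; i < n; ++i) {
        uint32_t j = bit_reverse(i, bits);
        if (i < j) {                 /* swap each pair only once */
            int32_t tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

For an 8-point transform (3 index bits), index 1 (binary 001) maps to 4 (binary 100) and index 3 (011) maps to 6 (110), while 0, 2, 5 and 7 map to themselves.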

History


Prior to the advent of stand-alone DSP chips discussed below, most DSP applications were implemented using bit-slice processors. The AMD 2901 bit-slice chip with its family of components was a very popular choice. There were reference designs from AMD, but very often the specifics of a particular design were application specific. These bit-slice architectures would sometimes include a peripheral multiplier chip. Examples of these multipliers were a series from TRW including the TDC1008 and TDC1010, some of which included an accumulator, providing the requisite multiply–accumulate (MAC) function.

In 1978, Intel released the 2920 as an "analog signal processor". It had an on-chip ADC/DAC with an internal signal processor, but it didn't have a hardware multiplier and was not successful in the market. In 1979, AMI released the S2811. It was designed as a microprocessor peripheral, and it had to be initialized by the host. The S2811 was likewise not successful in the market.

In 1980 the first stand-alone, complete DSPs (the NEC µPD7720 and AT&T DSP1) were presented at the International Solid-State Circuits Conference '80. Both processors were inspired by the research in PSTN telecommunications.

The Altamira DX-1 was another early DSP, utilizing quad integer pipelines with delayed branches and branch prediction.

The first DSP produced by Texas Instruments (TI), the TMS32010 presented in 1983, proved to be an even bigger success. It was based on the Harvard architecture, and so had separate instruction and data memory. It already had a special instruction set, with instructions like load-and-accumulate or multiply-and-accumulate. It could work on 16-bit numbers and needed 390 ns for a multiply–add operation. TI is now the market leader in general-purpose DSPs. Another successful design was the Motorola 56000.

About five years later, the second generation of DSPs began to spread. They had 3 memories for storing two operands simultaneously and included hardware to accelerate tight loops; they also had an addressing unit capable of loop-addressing. Some of them operated on 24-bit variables, and a typical model required only about 21 ns for a MAC. Members of this generation included, for example, the AT&T DSP16A and the Motorola DSP56001.

The main improvement in the third generation was the appearance of application-specific units and instructions in the data path, or sometimes as coprocessors. These units allowed direct hardware acceleration of very specific but complex mathematical problems, like the Fourier transform or matrix operations. Some chips, like the Motorola MC68356, even included more than one processor core to work in parallel. Other DSPs from 1995 are the TI TMS320C541 and the TMS320C80.

The fourth generation is best characterized by the changes in the instruction set and the instruction encoding/decoding. SIMD extensions were added, and VLIW and superscalar architectures appeared. As always, clock speeds increased; a 3 ns MAC became possible.

Modern DSPs


Modern signal processors yield greater performance, due in part to both technological and architectural advancements such as smaller design rules, fast-access two-level caches, (E)DMA circuitry, and a wider bus system. Not all DSPs provide the same speed, and many kinds of signal processors exist, each better suited to a specific task, ranging in price from about US$1.50 to US$300.

Texas Instruments produces the C6000 series DSPs, which have clock speeds of 1.2 GHz and implement separate instruction and data caches. They also have an 8 MiB second-level cache and 64 EDMA channels. The top models are capable of as many as 8000 MIPS (million instructions per second), use VLIW (very long instruction word), perform eight operations per clock cycle, and are compatible with a broad range of external peripherals and various buses (PCI/serial/etc.). TMS320C6474 chips each have three such DSPs, and the newest generation of C6000 chips supports floating-point as well as fixed-point processing.

Freescale produces a multi-core DSP family, the MSC81xx. The MSC81xx is based on StarCore Architecture processors and the latest MSC8144 DSP combines four programmable SC3400 StarCore DSP cores. Each SC3400 StarCore DSP core has a clock speed of 1 GHz.

XMOS produces a multi-core, multi-threaded line of processors well suited to DSP operations. They come in various speeds ranging from 400 to 1600 MIPS. The processors have a multi-threaded architecture that allows up to 8 real-time threads per core, meaning that a 4-core device would support up to 32 real-time threads. Threads communicate with each other over buffered channels that are capable of up to 80 Mb/s. The devices are easily programmable in C and aim at bridging the gap between conventional microcontrollers and FPGAs.

CEVA, Inc. produces and licenses three distinct families of DSPs. Perhaps the best known and most widely deployed is the CEVA-TeakLite DSP family, a classic memory-based architecture, with 16-bit or 32-bit word-widths and single or dual MACs. The CEVA-X DSP family offers a combination of VLIW and SIMD architectures, with different members of the family offering dual or quad 16-bit MACs. The CEVA-XC DSP family targets software-defined radio (SDR) modem designs and leverages a unique combination of VLIW and vector architectures with 32 16-bit MACs.

Analog Devices produces the SHARC-based DSPs, which range in performance from 66 MHz/198 MFLOPS (million floating-point operations per second) to 400 MHz/2400 MFLOPS. Some models support multiple multipliers and ALUs, SIMD instructions, and audio processing-specific components and peripherals. The Blackfin family of embedded digital signal processors combines the features of a DSP with those of a general-use processor. As a result, these processors can run simple operating systems like μCLinux, velOSity and Nucleus RTOS while operating on real-time data.

NXP Semiconductors produces DSPs based on TriMedia VLIW technology, optimized for audio and video processing. In some products the DSP core is hidden as a fixed-function block inside an SoC, but NXP also provides a range of flexible single-core media processors. The TriMedia media processors support both fixed-point and floating-point arithmetic, and have specific instructions to deal with complex filters and entropy coding.

Most DSPs use fixed-point arithmetic, because in real-world signal processing the additional range provided by floating point is not needed, and there is a large speed benefit and cost benefit due to reduced hardware complexity. Floating-point DSPs may be invaluable in applications where a wide dynamic range is required. Product developers might also use floating-point DSPs to reduce the cost and complexity of software development in exchange for more expensive hardware, since it is generally easier to implement algorithms in floating point.
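
The software-effort trade-off described above can be seen in a single multiplication. The sketch below assumes the common Q15 convention (a 16-bit integer v stands for v / 32768), which the text does not name explicitly. The identifiers `q15_t`, `q15_mul` and `float_mul` are hypothetical and used only for illustration.

```c
#include <stdint.h>

/* Assumed Q15 format: a 16-bit signed integer v represents v / 32768. */
typedef int16_t q15_t;

/* Multiply two Q15 numbers. The 32-bit product is in Q30, so it is
 * rounded and shifted right by 15 bits to return to Q15, then saturated
 * because (-1.0) * (-1.0) = +1.0 is not representable in Q15.
 * Note: right-shifting a negative value is implementation-defined in C;
 * this sketch assumes the usual arithmetic shift. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t product = (int32_t)a * (int32_t)b; /* Q30 intermediate */
    product += 1 << 14;                        /* round to nearest */
    product >>= 15;                            /* rescale to Q15 */
    if (product > INT16_MAX) product = INT16_MAX;
    return (q15_t)product;
}

/* The same operation in floating point needs no manual scaling. */
static float float_mul(float a, float b)
{
    return a * b;
}
```

The fixed-point version has to track the scale factor, round, and guard against overflow by hand, whereas the floating-point version leaves all of that to the hardware. That bookkeeping is the development cost the paragraph above refers to.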

Generally, DSPs are dedicated integrated circuits; however, DSP functionality can also be produced by using field-programmable gate array (FPGA) chips.

Embedded general-purpose RISC processors are becoming increasingly DSP-like in functionality. For example, the OMAP3 processors include an ARM Cortex-A8 and a C6000 DSP.

External links