Direct memory access



 
 


Direct memory access (DMA) is a feature of modern computers and microprocessors that allows certain hardware subsystems within the computer to access system memory for reading and/or writing independently of the central processing unit (CPU). Many hardware systems use DMA, including disk drive controllers, graphics cards, network cards, sound cards and GPUs. DMA is also used for intra-chip data transfer in multi-core processors, especially in multiprocessor systems-on-chip, where each processing element is equipped with a local memory (often called scratchpad memory) and DMA is used for transferring data between the local memory and the main memory. Computers that have DMA channels can transfer data to and from devices with much less CPU overhead than computers without a DMA channel. Similarly, a processing element inside a multi-core processor can transfer data to and from its local memory without occupying its processor time, overlapping computation and data transfer.

Without DMA, using programmed input/output (PIO) mode for communication with peripheral devices, or load/store instructions in the case of multicore chips, the CPU is typically fully occupied for the entire duration of the read or write operation, and is thus unavailable to perform other work. With DMA, the CPU initiates the transfer, does other operations while the transfer is in progress, and receives an interrupt from the DMA controller once the operation is complete. This is especially useful in real-time computing applications, where not stalling behind concurrent operations is critical. Another, related application area is stream processing in its various forms, where data processing and transfer must proceed in parallel in order to achieve sufficient throughput.
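
The initiate, continue, and interrupt sequence described above can be illustrated with a minimal Python sketch. It is only a software model: a thread stands in for the DMA controller, and the names `start_dma` and `on_complete` are hypothetical and not taken from any real driver interface.

```python
import threading

def start_dma(src, dst, on_complete):
    """Model a DMA transfer: a worker copies src into dst, then signals completion."""
    def engine():
        dst[:len(src)] = src      # the "controller", not the CPU, moves the data
        on_complete()             # stands in for the completion interrupt
    worker = threading.Thread(target=engine)
    worker.start()
    return worker

source = bytearray(range(256)) * 16   # 4 KiB of sample data
dest = bytearray(len(source))
done = threading.Event()

worker = start_dma(source, dest, done.set)     # the CPU initiates the transfer...
checksum = sum(i * i for i in range(10_000))   # ...and does unrelated work meanwhile
done.wait()                                    # wait for the "interrupt"
worker.join()
```

With programmed I/O, the copy and the unrelated computation would have to run one after the other on the CPU; here the copy is handed off and the CPU only waits at the point where it actually needs the data.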

Principle

DMA is an essential feature of all modern computers, as it allows devices to transfer data without subjecting the CPU to a heavy overhead. Otherwise, the CPU would have to copy each piece of data from the source to the destination, making it unavailable for other tasks. This situation is aggravated because access to I/O devices over a peripheral bus is generally slower than access to normal system RAM. With DMA, the CPU is freed from this overhead and can do useful tasks during the data transfer (though the CPU bus is partly blocked by the DMA traffic). In the same way, a DMA engine in an embedded processor allows its processing element to issue a data transfer and carry on with its own task while the transfer is being performed.

A DMA transfer copies a block of memory from one device to another. While the CPU initiates the transfer by issuing a DMA command, it does not execute it. For so-called "third party" DMA, as is normally used with the ISA bus, the transfer is performed by a DMA controller which is typically part of the motherboard chipset. More advanced bus designs such as PCI typically use bus mastering DMA, where the device takes control of the bus and performs the transfer itself. In an embedded processor or multiprocessor system-on-chip, it is a DMA engine connected to the on-chip bus that actually administers the transfer of the data, in coordination with the flow control mechanisms of the on-chip bus.

A typical use of DMA is copying a block of memory between system RAM and a buffer on the device. Such an operation does not stall the processor, which as a result can be scheduled to perform other tasks. DMA is essential to high-performance embedded systems. It is also essential in providing so-called zero-copy implementations of peripheral device drivers, as well as functionalities such as network packet routing, audio playback and streaming video. Multicore embedded processors (in the form of multiprocessor systems-on-chip) often use one or more DMA engines in combination with scratchpad memories for both increased efficiency and lower power consumption. In computer clusters for high-performance computing, DMA among multiple computing nodes is often used under the name of remote DMA.

Cache coherency problem

DMA can lead to cache coherency problems. Imagine a CPU equipped with a cache and an external memory that can be accessed directly by devices using DMA. When the CPU accesses location X in the memory, the current value will be stored in the cache. Subsequent operations on X will update the cached copy of X, but not the external memory version of X. If the cache is not flushed to the memory before the next time a device tries to access X, the device will receive a stale value of X.

Similarly, if the cached copy of X is not invalidated when a device writes a new value to the memory, then the CPU will operate on a stale value of X.

This issue can be addressed in one of two ways in system design. Cache-coherent systems implement a method in hardware whereby external accesses are signaled to the cache controller, which then invalidates the affected cache lines when a device writes to memory, or flushes them when a device reads from memory. Non-coherent systems leave this to software: the OS must ensure that the cache lines are flushed before an outgoing DMA transfer is started and invalidated before a memory range affected by an incoming DMA transfer is accessed. The OS must also make sure that the memory range is not accessed by any running threads in the meantime. The latter approach introduces some overhead to the DMA operation, as most hardware requires a loop to invalidate each cache line individually.

Hybrids also exist, where the secondary L2 cache is coherent while the L1 cache (typically on-CPU) is managed by software.
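
The two stale-value cases and the software remedy used on non-coherent systems can be reproduced with a toy model. The sketch below is illustrative only: `ToySystem` and its methods are hypothetical names, the cache is modeled as a write-back cache with one entry per address, and no real hardware interface is implied.

```python
class ToySystem:
    """Write-back CPU cache in front of a memory that a DMA device accesses directly."""

    def __init__(self):
        self.memory = {}   # main memory, visible to DMA devices
        self.cache = {}    # CPU cache: address -> (value, dirty flag)

    def cpu_read(self, addr):
        if addr not in self.cache:
            self.cache[addr] = (self.memory.get(addr, 0), False)
        return self.cache[addr][0]

    def cpu_write(self, addr, value):
        self.cache[addr] = (value, True)   # write-back: memory is not updated yet

    def flush(self, addr):
        """Write a dirty cached value back to memory."""
        if addr in self.cache and self.cache[addr][1]:
            value = self.cache[addr][0]
            self.memory[addr] = value
            self.cache[addr] = (value, False)

    def invalidate(self, addr):
        """Drop the cached copy so the next CPU read goes to memory."""
        self.cache.pop(addr, None)

    def device_read(self, addr):           # outgoing DMA: device reads memory
        return self.memory.get(addr, 0)

    def device_write(self, addr, value):   # incoming DMA: device writes memory
        self.memory[addr] = value

X = 0x1000
system = ToySystem()

system.cpu_write(X, 42)
stale_for_device = system.device_read(X)   # device misses the cached 42
system.flush(X)
fresh_for_device = system.device_read(X)   # correct after the flush

system.device_write(X, 99)
stale_for_cpu = system.cpu_read(X)         # CPU still sees the cached 42
system.invalidate(X)
fresh_for_cpu = system.cpu_read(X)         # correct after invalidation
```

The order of operations mirrors the rule stated above: flush before an outgoing transfer, invalidate before reading a range written by an incoming transfer.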

DMA engine

In addition to hardware interaction, DMA can also be used to offload expensive memory operations, such as large copies or scatter-gather operations, from the CPU to a dedicated DMA engine.

Examples


ISA

For example, a PC's ISA DMA controller is based on the Intel 8237 Multimode DMA controller; that is, it is a software-hardware combination which either consists of or emulates this part. In the first edition of ISA, also known as the PC/XT, there was only one DMA controller, capable of providing four DMA channels (numbered 0-3). These DMA channels performed 8-bit transfers and could only address the first megabyte of RAM. With the Intel 80286, a second DMA controller was added (channels 5-7; channel 4 is unusable), and the DMA subsystem was rewired to be able to address the full 286 memory address space of 16 MB. This second controller performed 16-bit transfers.

Due to their lagging performance (2.5 Mbit/s), these devices have been largely obsolete since the advent of the 80386 processor and its capacity for 32-bit transfers. They are still supported to the extent they are required to support built-in legacy PC hardware on modern machines.

Each DMA channel has a 16-bit address register and a 16-bit count register associated with it. To initiate a data transfer the device driver sets up the DMA channel's address and count registers together with the direction of the data transfer, read or write. It then instructs the DMA hardware to begin the transfer. When the transfer is complete, the device interrupts the CPU.
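
The register setup a driver performs can be sketched as a small calculation. The following Python model makes several assumptions that go beyond the text above and should be checked against the controller documentation: that the upper address bits are held in a separate page register, that the count register is loaded with the transfer length minus one, and that a transfer on an 8-bit channel may not cross a 64 KiB boundary. The function name `isa_dma_registers` is hypothetical, and the function returns values instead of touching hardware.

```python
def isa_dma_registers(phys_addr, length, write_to_memory):
    """Compute register values for one 8-bit ISA DMA channel (channels 0-3).

    Returns a dict of values to program; this is a model, not a driver.
    """
    if not 0 < length <= 0x10000:
        raise ValueError("a 16-bit count register covers at most 64 KiB")
    if phys_addr < 0 or phys_addr + length > 1 << 24:
        raise ValueError("buffer must lie within the 16 MB ISA address space")
    page, offset = phys_addr >> 16, phys_addr & 0xFFFF
    if offset + length > 0x10000:
        raise ValueError("buffer crosses a 64 KiB boundary")
    return {
        "page": page,            # upper address bits (assumed page register)
        "address": offset,       # 16-bit address register
        "count": length - 1,     # assumed convention: length minus one
        "direction": "write" if write_to_memory else "read",
    }

# A 512-byte transfer from a device into memory at physical address 0x12F000.
regs = isa_dma_registers(0x12F000, 512, write_to_memory=True)
```

A real driver would write these values to the controller's I/O ports and then unmask the channel; those steps are omitted here because they depend on the platform.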

Scatter-gather DMA allows the transfer of data to and from multiple memory areas in a single DMA transaction. It is equivalent to the chaining together of multiple simple DMA requests. The motivation is to off-load multiple input/output interrupt and data copy tasks from the CPU.
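
The chaining of simple requests can be pictured as a linked list of descriptors that the engine walks without CPU involvement. The sketch below is a hypothetical model in Python: the `Descriptor` layout, `build_chain`, and `gather` are illustrative names, and real controllers define their own descriptor formats.

```python
class Descriptor:
    """One element of a hypothetical scatter-gather list."""

    def __init__(self, address, length, next_descriptor=None):
        self.address = address        # start offset of this memory area
        self.length = length          # number of bytes in this area
        self.next = next_descriptor   # following descriptor, or None at the end

def build_chain(areas):
    """Link (address, length) pairs into a descriptor chain and return its head."""
    head = None
    for address, length in reversed(areas):
        head = Descriptor(address, length, head)
    return head

def gather(memory, head):
    """Model the engine walking the chain and emitting one contiguous stream."""
    stream = bytearray()
    node = head
    while node is not None:
        stream += memory[node.address:node.address + node.length]
        node = node.next
    return bytes(stream)

# Three separate memory areas are sent as a single transaction.
memory = bytearray(b"....HEADER......payload-part-1....payload-part-2")
chain = build_chain([(4, 6), (16, 14), (34, 14)])
packet = gather(memory, chain)
```

The CPU builds the chain once and receives a single completion notification, instead of programming and servicing three separate transfers.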

DRQ stands for DMA request; DACK for DMA acknowledge. These symbols, seen on hardware schematics of computer systems with DMA functionality, represent electronic signaling lines between a device and the DMA controller. Each DMA channel has one Request and one Acknowledge line. A properly configured device that uses DMA must be jumpered (or software-configured) to use both lines of the assigned DMA channel.

Standard ISA DMA assignments:
0 DRAM refresh (obsolete),
1 User hardware,
2 Floppy disk controller,
3 Hard disk (obsoleted by PIO modes, and replaced by UDMA modes),
4 Cascade from XT DMA controller,
5 Hard disk (PS/2 only), user hardware for all others,
6 User hardware,
7 User hardware.


PCI

As mentioned above, a PCI architecture has no central DMA controller, unlike ISA. Instead, any PCI component can request control of the bus ("become the bus master") and request to read from and write to the system memory. More precisely, a PCI component requests bus ownership from the PCI bus controller (usually the southbridge in a modern PC design), which will arbitrate if several devices request bus ownership simultaneously, since there can only be one bus master at one time. When the component is granted ownership, it will issue normal read and write commands on the PCI bus, which will be claimed by the bus controller and forwarded to the memory controller using a scheme which is specific to every chipset.
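
The arbitration step can be illustrated with a small model. The text does not say which policy a given bus controller uses, so the round-robin policy below is purely an assumption chosen for simplicity, and `RoundRobinArbiter` is a hypothetical name.

```python
class RoundRobinArbiter:
    """Grant the bus to one requester at a time, rotating priority after each grant."""

    def __init__(self, num_devices):
        self.num_devices = num_devices
        self.last_granted = num_devices - 1   # so device 0 has priority first

    def grant(self, requests):
        """requests: set of device indices asking for the bus. Returns the winner or None."""
        for step in range(1, self.num_devices + 1):
            candidate = (self.last_granted + step) % self.num_devices
            if candidate in requests:
                self.last_granted = candidate
                return candidate
        return None   # nobody is requesting the bus

arbiter = RoundRobinArbiter(4)
# Devices 1 and 3 request the bus three times in a row, then nobody does.
grants = [arbiter.grant({1, 3}) for _ in range(3)] + [arbiter.grant(set())]
```

Whatever the policy, the invariant is the one stated above: each call yields at most one bus master.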

As an example, on a modern AMD Socket AM2-based PC, the southbridge will forward the transactions to the northbridge (which is integrated on the CPU die) using HyperTransport, which will in turn convert them to DDR2 operations and send them out on the DDR2 memory bus. As can be seen, there are quite a number of steps involved in a PCI DMA transfer; however, that poses little problem, since the PCI device or the PCI bus itself is an order of magnitude slower than the rest of the components (see list of device bandwidths).

A modern x86 CPU may use more than 4 GB of memory, utilizing PAE, a 36-bit addressing mode. In such a case, a device using DMA with a 32-bit address bus is unable to address memory above the 4 GB line. The Dual Address Cycle (DAC) mechanism, if implemented on both the PCI bus and the device itself, enables 64-bit DMA addressing. Otherwise, the operating system needs to work around the problem by using costly double buffers (Windows nomenclature), also known as bounce buffers (Linux).
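
The decision an operating system has to make can be reduced to a range check. The following Python sketch is a simplified model with hypothetical names (`needs_bounce_buffer`, `prepare_dma`); real kernels express the same idea through per-device addressing limits.

```python
DMA_32BIT_LIMIT = 1 << 32   # first address a 32-bit device cannot reach

def needs_bounce_buffer(phys_addr, length, supports_dac):
    """Return True if any byte of the buffer lies above what the device can address."""
    if supports_dac:
        return False   # 64-bit addressing reaches the whole buffer
    return phys_addr + length > DMA_32BIT_LIMIT

def prepare_dma(buffer_addr, length, supports_dac, bounce_addr):
    """Pick the address handed to the device and report whether a CPU copy is needed.

    bounce_addr is assumed to be a pre-allocated buffer below the 4 GB line.
    """
    if needs_bounce_buffer(buffer_addr, length, supports_dac):
        return bounce_addr, True    # the CPU must copy through the low buffer
    return buffer_addr, False       # the device can access the buffer directly

# A buffer at 5 GB, used with a 32-bit-only device and with a DAC-capable device.
high_buffer = 5 * (1 << 30)
plan_32bit = prepare_dma(high_buffer, 4096, supports_dac=False, bounce_addr=0x0100_0000)
plan_dac = prepare_dma(high_buffer, 4096, supports_dac=True, bounce_addr=0x0100_0000)
```

The extra CPU copy in the bounce-buffer case is what makes this workaround costly: it reintroduces part of the work that DMA was meant to remove.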

I/O Acceleration Technology in Xeon

As an example of a DMA engine incorporated in a general-purpose CPU, newer Intel Xeon chipsets include a DMA engine technology called I/O Acceleration Technology (I/OAT), meant to improve network performance on high-throughput network interfaces, in particular gigabit Ethernet and faster. However, various benchmarks of this approach by Andrew Grover, a Linux kernel developer at Intel, indicate no more than a 10% improvement in CPU utilization with receiving workloads, and no improvement when transmitting data.

AHB

In systems-on-a-chip and embedded systems, the typical system bus infrastructure is a complex on-chip bus such as AMBA AHB. AMBA defines two kinds of AHB components: master and slave. A slave interface is similar to programmed I/O: through it, the software (running on an embedded CPU, e.g. ARM) can write and read I/O registers or (less commonly) local memory blocks inside the device. A master interface can be used by the device to perform DMA transactions to and from system memory without heavily loading the CPU.

Therefore, high-bandwidth devices such as network controllers that need to transfer huge amounts of data to and from system memory will have two interface adapters to the AHB bus: a master and a slave interface. This is because on-chip buses like AHB do not support tri-stating the bus or alternating the direction of any line on the bus. As with PCI, no central DMA controller is required since the DMA is bus-mastering, but an arbiter is required when multiple masters are present on the system.

Internally, a multichannel DMA engine is usually present in the device to perform multiple concurrent scatter-gather operations as programmed by the software.

Cell

As an example of DMA usage in a multiprocessor system-on-chip, IBM/Sony/Toshiba's Cell processor incorporates a DMA engine for each of its 9 processing elements, including one Power processor element (PPE) and eight synergistic processor elements (SPEs). Since an SPE's load/store instructions can read and write only its own local memory, an SPE depends entirely on DMA to transfer data to and from the main memory and the local memories of other SPEs. Thus DMA acts as the primary means of data transfer among cores inside this CPU (in contrast to cache-coherent CMP architectures such as Intel's planned general-purpose GPU, Larrabee).

DMA in Cell is fully cache coherent (note, however, that the local stores of SPEs operated upon by DMA do not act as a globally coherent cache in the standard sense). In both read ("get") and write ("put"), a DMA command can transfer either a single block area of size up to 16 KB, or a list of 2 to 2048 such blocks. The DMA command is issued by specifying a pair of a local address and a remote address: for example, when an SPE program issues a put DMA command, it specifies an address of its own local memory as the source and a virtual memory address (pointing to either the main memory or the local memory of another SPE) as the target, together with a block size. According to a recent experiment, an effective peak performance of DMA in Cell (3 GHz, under uniform traffic) reaches 200 GB per second.
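
The size limits quoted above imply that a larger transfer has to be broken up before it is issued. The Python sketch below models only that bookkeeping; `split_transfer` and `plan_commands` are hypothetical helpers, not part of any Cell SDK, and alignment requirements of the real hardware are ignored.

```python
MAX_BLOCK = 16 * 1024   # largest single block, per the text
MAX_LIST = 2048         # most blocks in one list command, per the text

def split_transfer(local_addr, remote_addr, size):
    """Split a transfer into (local, remote, length) blocks of at most 16 KB."""
    blocks = []
    offset = 0
    while offset < size:
        length = min(MAX_BLOCK, size - offset)
        blocks.append((local_addr + offset, remote_addr + offset, length))
        offset += length
    return blocks

def plan_commands(blocks):
    """Group blocks into commands: a single block, or a list of up to 2048 blocks."""
    return [blocks[i:i + MAX_LIST] for i in range(0, len(blocks), MAX_LIST)]

# A 40 KB "put" from local address 0 to a remote address needs three blocks,
# which fit into one list command.
blocks = split_transfer(0x0, 0x1000_0000, 40 * 1024)
commands = plan_commands(blocks)
```

Each tuple carries the local and remote address pair described above, so a block can be issued on its own or as one entry of a list command.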

See also

  • Remote Direct Memory Access
  • Blitter
  • AT Attachment