Memory bandwidth
Memory bandwidth is the rate at which data can be read from or stored into a semiconductor memory by a processor. Memory bandwidth is usually expressed in units of bytes/second, though this can vary for systems with natural data sizes that are not a multiple of the commonly used 8-bit bytes.

Memory bandwidth that is advertised for a given memory or system is usually the maximum theoretical bandwidth. In practice the observed memory bandwidth will be less than (and is guaranteed not to exceed) the advertised bandwidth. A variety of computer benchmarks exist to measure sustained memory bandwidth using a variety of access patterns. These are intended to provide insight into the memory bandwidth that a system should sustain on various classes of real applications.

Conventions

Perhaps surprisingly, there are at least three different conventions for counting the quantity of data transferred in the numerator of bytes/second, as discussed in more detail below.
  1. bcopy convention: counts the amount of data copied from one location in memory to another location per unit time. For example, copying 1 million bytes from one location in memory to another location in memory in one second would be counted as 1 million bytes per second.
  2. STREAM convention: sums the amount of data that the application code explicitly reads plus the amount of data that the application code explicitly writes. Using the previous 1 million byte copy example, the STREAM bandwidth would be counted as 1 million bytes read plus 1 million bytes written in one second, for a total of 2 million bytes per second.
  3. hardware convention: counts the actual amount of data read or written by the hardware, whether the data motion was explicitly requested by the user code or not. Using the same 1 million byte copy example, the hardware bandwidth on computer systems with a write-allocate cache policy would include an additional 1 million bytes of traffic, because the hardware reads the target array from memory into cache before performing the stores. This gives a total of 3 million bytes per second actually transferred by the hardware.


The bcopy convention is self-consistent, but is not easily extended to cover cases with more complex access patterns, for example three reads and one write.
The STREAM convention is most directly tied to the user code, but may not count all the data traffic that the hardware is actually required to perform.
The hardware convention is most directly tied to the hardware, but may not represent the minimum amount of data traffic required to implement the user's code. For example, some computer systems have the ability to avoid write allocate traffic using special instructions, leading to the possibility of misleading comparisons of bandwidth based on different amounts of data traffic performed.
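The three counting conventions can be illustrated with a short sketch in Python. The function names are illustrative, and the write-allocate behaviour is an assumption about the cache policy, as described above:

```python
# Byte counts reported for copying n bytes, under each convention.

def bcopy_bytes(n):
    # bcopy convention: only the amount of data copied is counted.
    return n

def stream_bytes(n):
    # STREAM convention: explicit reads plus explicit writes.
    return n + n

def hardware_bytes(n, write_allocate=True):
    # Hardware convention: a write-allocate cache also reads the
    # destination array into cache before the stores are performed.
    return n + n + (n if write_allocate else 0)

n = 1_000_000
print(bcopy_bytes(n), stream_bytes(n), hardware_bytes(n))
# 1000000 2000000 3000000
```

Note how the hardware count drops back to the STREAM count when write-allocate traffic is avoided, which is exactly the comparison hazard mentioned above.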

Computation

Theoretical maximum memory bandwidth is typically computed by multiplying the width of the interface by the frequency at which it transfers data. This is also referred to as the burst rate of the interface, in recognition of the possibility that this rate may not be sustainable over long periods (i.e., the sustained throughput may be less than the theoretical maximum memory bandwidth).

The nomenclature standards often differ across memory technologies, but for commodity DDR SDRAM, DDR2 SDRAM, and DDR3 SDRAM memory the computation is:
  • Memory bus (I/O) clock frequency in MHz (millions of bus clock cycles per second).
  • Memory interface (or bus) width. Each standard DDR, DDR2, or DDR3 memory interface is 64 bits (8 bytes) wide. (The width is sometimes referred to in lines or lanes, rather than bits, though these are synonymous here.)
  • Number of interfaces. Current computers typically use two memory interfaces in dual-channel mode for an effective 128-bit width.
  • Number of bits per clock cycle per line. This is 2 for the double data rate DDR, DDR2, and DDR3 technologies.


So a recent computer system with a dual-channel configuration and two DDR2-800 modules, each with a 400 MHz bus clock (half of the nominal 800 million transfers per second, and twice the DRAM core clock of 200 MHz), would have a theoretical maximum memory bandwidth of:
  • (400 million hertz * (2 interfaces) * (64 lines/interface) * (2 bits/line-cycle)) = 102,400 Mbit/s, or 12,800 MB/s, or 12.8 GB/s.
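The computation above can be expressed as a small helper function. This is a sketch of the peak (burst) rate only; the parameter names are illustrative:

```python
def peak_bandwidth_bytes_per_s(bus_mhz, channels=2, bus_width_bits=64,
                               transfers_per_cycle=2):
    # Peak bandwidth = bus clock * channels * interface width * data rate.
    bits_per_s = (bus_mhz * 1_000_000 * channels
                  * bus_width_bits * transfers_per_cycle)
    return bits_per_s // 8  # convert bits/s to bytes/s

# Dual-channel DDR2-800: 400 MHz bus, 2 channels, 64-bit, double data rate.
print(peak_bandwidth_bytes_per_s(400) / 1e9)  # 12.8 (GB/s)
```

Dropping to a single channel halves the result, matching the role of the "number of interfaces" factor.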


The naming conventions of DDR, DDR2 and DDR3 modules typically cite a nominal rating (e.g., DDR2-1066) which is not the bus speed or memory speed, but the number of millions of transfers possible per second, and an additional nominal rating of the maximum throughput of the module (e.g., DDR2-800 is also called PC2-6400) which reflects the theoretical maximum bandwidth in megabytes per second. So with this in mind, the above computation can be simplified as having two PC2-6400 modules in a dual-channel 128-bit configuration, or 2 × 6,400 MB/s.
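The conversion between the two ratings is a single multiplication, sketched here (the function name is illustrative; only the commodity DDR generations discussed above, with their 8-byte interfaces, are covered):

```python
def pc_rating_mb_per_s(transfers_per_s_millions, bytes_per_transfer=8):
    # DDR2-800 -> 800 MT/s * 8 bytes per transfer = 6400 MB/s (PC2-6400).
    return transfers_per_s_millions * bytes_per_transfer

print(pc_rating_mb_per_s(800))  # 6400
```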

The choice of two memory interfaces in the above example is a common configuration, but single-channel configurations are common in low-end and low-power devices, and more than two channels are used in some high-performance systems. Advanced personal computers and graphics cards combine even more buses than dual-channel, using four (e.g., Mac Pro), five (e.g., nVidia 8800GTS), six (e.g., nVidia 8800GTX), or more sets of 64-bit memory modules and buses to reach 256-bit, 320-bit, 384-bit, or greater total memory bus width. In this sort of multi-channel configuration, memory must be broken out so that there is at least one 64-bit-wide chip or module for each channel. So for a 256-bit-wide 4 GiB configuration with DDR2 modules, one must have 4×1 GiB modules (or 8×512 MiB, 16×256 MiB, etc.), since each of these standard modules provides only a 64-bit interface.
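The module-count constraint described above can be sketched as a pair of helper functions (the names are illustrative):

```python
def min_modules(total_bus_bits, module_bits=64):
    # Each standard DDR/DDR2/DDR3 module supplies one 64-bit interface,
    # so a wider bus needs at least one module per 64-bit channel.
    assert total_bus_bits % module_bits == 0
    return total_bus_bits // module_bits

def module_size_gib(total_gib, total_bus_bits, module_bits=64):
    # Total capacity must be spread evenly across the channels.
    return total_gib / min_modules(total_bus_bits, module_bits)

# 256-bit bus with 4 GiB total: four channels, 1 GiB per module.
print(min_modules(256), module_size_gib(4, 256))  # 4 modules of 1.0 GiB
```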

Note that in systems with error-correcting memory, the additional width of the interfaces (typically 72 bits rather than 64 bits) is not counted in the bandwidth computations, as neither the extra memory nor the extra bandwidth is available for user data.

See also



Major factors in the real-world performance of random-access memory systems:
  • CAS latency
  • SDRAM latency
  • Memory timings


Further reading on semiconductor memory:
  • Dynamic random-access memory
  • Random-access memory

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 