Non-Uniform Memory Access

 

Non-Uniform Memory Access or Non-Uniform Memory Architecture (NUMA) is a computer memory design used in multiprocessors, where the memory access time depends on the memory location relative to a processor. Under NUMA, a processor can access its own local memory faster than non-local memory, that is, memory local to another processor or memory shared between processors.
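
The idea that access time depends on where memory sits relative to the processor can be expressed as a lookup from (CPU node, memory node) to a cost. The sketch below assumes a hypothetical two-node machine in which each node owns a contiguous 1 GiB range of physical addresses. The latency figures in `LATENCY_NS` are illustrative assumptions, not measurements of any real system.

```python
# Hypothetical two-node NUMA machine. Each node owns a contiguous range of
# physical addresses. The latencies are illustrative assumptions only.
NODE_MEMORY_SIZE = 1 << 30  # assume 1 GiB of memory per node

# LATENCY_NS[cpu_node][memory_node]: assumed cost of one access from a CPU
# on cpu_node to memory whose home is memory_node.
LATENCY_NS = [
    [100, 160],  # CPU on node 0: local, remote
    [160, 100],  # CPU on node 1: remote, local
]


def home_node(address: int) -> int:
    """Return the node whose local memory holds this physical address."""
    return address // NODE_MEMORY_SIZE


def access_latency(cpu_node: int, address: int) -> int:
    """Access time depends on where the memory lives relative to the CPU."""
    return LATENCY_NS[cpu_node][home_node(address)]


addr = 0x1000  # falls in node 0's range
print(access_latency(0, addr))  # local access
print(access_latency(1, addr))  # remote access
```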

NUMA architectures are a logical next step in scaling beyond symmetric multiprocessing (SMP) architectures. Their commercial development came in work by Burroughs (later Unisys), Convex Computer (later Hewlett-Packard), Silicon Graphics, Sequent Computer Systems and Data General during the 1990s. Techniques developed by these companies later featured in a variety of Unix-like operating systems, and to some extent in Windows NT.

Basic concept

Modern CPUs operate considerably faster than the main memory to which they are attached. In the early days of high-speed computing and supercomputers the CPU generally ran slower than its memory, until the performance lines crossed in the 1970s. Since then, CPUs, increasingly starved for data, have had to stall while they wait for memory accesses to complete. Many supercomputer designs of the 1980s and 90s focused on providing high-speed memory access as opposed to faster processors, allowing them to work on large data sets at speeds other systems could not approach.

Limiting the number of memory accesses is the key to extracting high performance from a modern computer. For commodity processors, this means installing an ever-increasing amount of high-speed cache memory and using increasingly sophisticated algorithms to avoid "cache misses". But the dramatic increase in the size of operating systems and of the applications run on them has generally overwhelmed these cache-processing improvements. Multiprocessor systems make the problem considerably worse: a system can now starve several processors at the same time, notably because only one processor can access the shared memory at a time.
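
Why cache misses matter so much can be seen from the standard average memory access time (AMAT) formula for a single cache level: hit time plus miss rate times miss penalty. The sketch below uses assumed figures of a 1 ns cache hit and a 100 ns trip to main memory. It shows how even a small miss rate lets main-memory latency dominate the average.

```python
def average_access_time(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time (AMAT) for a single cache level."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


# Illustrative figures only: 1 ns cache hit, 100 ns penalty for a miss.
for miss_rate in (0.01, 0.05, 0.20):
    amat = average_access_time(1.0, miss_rate, 100.0)
    print(f"miss rate {miss_rate:.0%}: {amat:.1f} ns")
```

With these assumed numbers, a 5% miss rate already makes the average access six times slower than a cache hit.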

NUMA attempts to address this problem by providing separate memory for each processor, avoiding the performance hit when several processors attempt to address the same memory. For problems whose data can be spread across processors (common for servers and similar applications), NUMA can improve performance over a single shared memory by a factor of roughly the number of processors (or separate memory banks).
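
The "factor of roughly the number of processors" follows from a deliberately idealized model. It assumes that a single shared memory serves one request at a time, that with NUMA every processor touches only its own bank, that banks operate fully in parallel, and that all accesses cost the same. The function names and figures below are hypothetical.

```python
def shared_memory_time_ns(processors: int, accesses_each: int, access_ns: float) -> float:
    """A single shared memory serves one request at a time, so every
    processor's requests queue behind everyone else's."""
    return processors * accesses_each * access_ns


def numa_local_time_ns(accesses_each: int, access_ns: float) -> float:
    """Each processor uses only its own bank and the banks run in parallel,
    so the run finishes when one processor's accesses are done."""
    return accesses_each * access_ns


for n in (2, 4, 8, 16):
    speedup = shared_memory_time_ns(n, 1_000_000, 100.0) / numa_local_time_ns(1_000_000, 100.0)
    print(f"{n:2d} processors: {speedup:.0f}x")
```

Real systems fall short of this bound because some data must be shared.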

Of course, not all data ends up confined to a single task, which means that more than one processor may require the same data. To handle these cases, NUMA systems include additional hardware or software to move data between banks. This operation has the effect of slowing down the processors attached to those banks, so the overall speed increase due to NUMA will depend heavily on the exact nature of the tasks run on the system at any given time.
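
How strongly the gain depends on the workload can be sketched by adding a fraction of accesses that must go to another processor's bank. This model is optimistic. It assumes a serialized shared-memory baseline at the local latency, and it lets remote accesses pay a higher latency without ever queueing behind other processors' traffic, so it ignores the slowdown of the remote bank's owner described above. The latencies passed in are assumed values.

```python
def numa_speedup(processors: int, remote_fraction: float,
                 local_ns: float, remote_ns: float) -> float:
    """Idealized speedup of NUMA over a serialized shared memory.

    Assumes remote accesses cost remote_ns but never contend with other
    processors' traffic, so the result is an upper bound."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    # Shared memory: each access effectively waits for every processor's access.
    shared_cost = processors * local_ns
    # NUMA: average cost of one access, mixing local and remote banks.
    numa_cost = (1 - remote_fraction) * local_ns + remote_fraction * remote_ns
    return shared_cost / numa_cost


for f in (0.0, 0.25, 0.5, 1.0):
    print(f"remote fraction {f:.2f}: speedup {numa_speedup(8, f, 100.0, 300.0):.2f}x")
```

Under these assumptions, an 8-processor system falls from an 8x gain with purely local data to 4x when half of all accesses are remote.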

Cache coherent NUMA (ccNUMA)

Nearly all CPU architectures use a small amount of very fast non-shared memory known as cache to exploit locality of reference in memory accesses. With NUMA, maintaining cache coherence across shared memory has a significant overhead.

Although non-cache-coherent NUMA systems are simpler to design and build, they become prohibitively complex to program in the standard von Neumann architecture programming model. As a result, all NUMA computers sold to the market use special-purpose hardware to maintain cache coherence, and are therefore classed as "cache-coherent NUMA", or ccNUMA.
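
One common way for such hardware to maintain coherence is a directory that records which caches hold each memory line and invalidates stale copies when one processor writes. The sketch below is a toy, hypothetical directory-based invalidation protocol. The class names, the single directory, and the rule that every interconnect step counts as one message are simplifying assumptions, and real ccNUMA protocols differ in detail. The demo contrasts two processors repeatedly writing the same line with each processor writing its own line.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DirectoryEntry:
    sharers: Set[int] = field(default_factory=set)  # caches with a read-only copy
    owner: Optional[int] = None                     # cache with the only (modified) copy


class Directory:
    """Toy directory-based invalidation protocol (hypothetical, simplified).

    Each step that would cross the interconnect counts as one message."""

    def __init__(self) -> None:
        self.entries: Dict[int, DirectoryEntry] = {}
        self.messages = 0

    def _entry(self, line: int) -> DirectoryEntry:
        return self.entries.setdefault(line, DirectoryEntry())

    def read(self, cpu: int, line: int) -> None:
        entry = self._entry(line)
        if entry.owner == cpu or cpu in entry.sharers:
            return  # cache hit: no coherence traffic
        self.messages += 1  # read request to the directory
        if entry.owner is not None:
            self.messages += 1  # fetch the modified data from its owner
            entry.sharers.add(entry.owner)  # old owner keeps a read-only copy
            entry.owner = None
        entry.sharers.add(cpu)
        self.messages += 1  # data reply to the requester

    def write(self, cpu: int, line: int) -> None:
        entry = self._entry(line)
        if entry.owner == cpu:
            return  # already held exclusively: no coherence traffic
        self.messages += 1  # request for exclusive ownership
        if entry.owner is not None:
            self.messages += 1  # recall the line from the current owner
        self.messages += len(entry.sharers - {cpu})  # invalidate other copies
        entry.sharers.clear()
        entry.owner = cpu
        self.messages += 1  # ownership grant / data reply


def ping_pong(writes: int) -> int:
    """Two CPUs alternately write the same cache line."""
    d = Directory()
    for i in range(writes):
        d.write(i % 2, line=0)
    return d.messages


def private_lines(writes: int) -> int:
    """Two CPUs alternately write, but each to its own cache line."""
    d = Directory()
    for i in range(writes):
        d.write(i % 2, line=i % 2)
    return d.messages


print(ping_pong(1000))      # 2999 messages
print(private_lines(1000))  # 4 messages
```

The same number of writes generates roughly three messages each when the line bounces between processors. The same writes generate almost no coherence traffic when each processor keeps to its own line.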

Typically, this takes place by using inter-processor communication between cache controllers to keep a consistent memory image when more than one cache stores the same memory location. For this reason, ccNUMA performs poorly when multiple processors attempt to access the same memory area in rapid succession. Operating-system support for NUMA attempts to reduce the frequency of this kind of access by allocating processors and memory in NUMA-friendly ways and by avoiding scheduling and locking algorithms that make NUMA-unfriendly accesses necessary.
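
One example of a NUMA-friendly allocation policy is "first touch". Under this policy, a page is backed by memory on the node of the processor that first accesses it. The sketch below models that policy on a hypothetical machine with four CPUs on two nodes. `CPU_TO_NODE`, the page size, and the partitioning scheme are all assumptions. The sketch compares initializing the data in parallel, so that each CPU touches its own pages, with initializing it serially from CPU 0.

```python
from typing import Dict, List, Tuple

PAGE_SIZE = 4096                        # assumed page size in bytes
CPU_TO_NODE = {0: 0, 1: 0, 2: 1, 3: 1}  # hypothetical: 4 CPUs on 2 nodes
PAGES_PER_CPU = 16


class FirstTouchAllocator:
    """Toy first-touch placement: a page is backed by memory on the node of
    the CPU that touches it first, and stays there afterwards."""

    def __init__(self, cpu_to_node: Dict[int, int]) -> None:
        self.cpu_to_node = cpu_to_node
        self.page_node: Dict[int, int] = {}  # page number -> home node

    def access(self, cpu: int, address: int) -> bool:
        """Record an access and return True if it was served from local memory."""
        node = self.cpu_to_node[cpu]
        home = self.page_node.setdefault(address // PAGE_SIZE, node)
        return home == node


def partition(cpu: int) -> List[int]:
    """Addresses of the pages a given CPU works on (one address per page)."""
    first = cpu * PAGES_PER_CPU
    return [(first + i) * PAGE_SIZE for i in range(PAGES_PER_CPU)]


def compute_phase_locality(parallel_init: bool) -> float:
    """Fraction of local accesses in the compute phase, after initializing
    the data in parallel (each CPU its own pages) or serially (CPU 0 only)."""
    alloc = FirstTouchAllocator(CPU_TO_NODE)
    for cpu in CPU_TO_NODE:
        init_cpu = cpu if parallel_init else 0
        for address in partition(cpu):
            alloc.access(init_cpu, address)
    work: List[Tuple[int, int]] = [(cpu, a) for cpu in CPU_TO_NODE for a in partition(cpu)]
    local = sum(alloc.access(cpu, a) for cpu, a in work)
    return local / len(work)


print(compute_phase_locality(parallel_init=True))   # 1.0
print(compute_phase_locality(parallel_init=False))  # 0.5
```

With serial initialization, every page lands on node 0, so the CPUs on node 1 make only remote accesses during the compute phase.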

Current ccNUMA systems are multiprocessor systems based on the AMD Opteron, which can be implemented without external logic, and the Intel Itanium, which requires the chipset to support NUMA. Examples of ccNUMA-enabled chipsets are the SGI Shub (Super hub), the Intel E8870, the HP sx2000 (used in the Integrity and Superdome servers), and those found in recent NEC Itanium-based systems. Earlier ccNUMA systems, such as those from Silicon Graphics, were based on MIPS processors and the DEC Alpha 21364 (EV7) processor.

Intel announced the introduction of NUMA to its x86 and Itanium servers in late 2007 with the Nehalem and Tukwila CPUs. Both CPU families will share a common socket; the interconnect is called Intel QuickPath Interconnect (QPI).

NUMA vs. cluster computing


One can view NUMA as a very tightly coupled form of cluster computing. The addition of virtual memory paging to a cluster architecture can allow the implementation of NUMA entirely in software where no NUMA hardware exists. However, the inter-node latency of software-based NUMA remains several orders of magnitude greater than that of hardware-based NUMA.
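
Software NUMA built on virtual memory paging can be sketched as a page-based distributed shared memory. In this model, touching a page that is not resident on the local node traps like a page fault, and the handler migrates the whole page across the network. The sketch assumes single-copy page migration and round-robin initial placement, and its cost constants are hypothetical. Real systems may instead replicate read-only pages or use other protocols. The large gap between `LOCAL_ACCESS_NS` and `REMOTE_PAGE_FETCH_NS` is meant to reflect the latency penalty described above, not a measured value.

```python
PAGE_SIZE = 4096               # assumed page size in bytes
LOCAL_ACCESS_NS = 100          # assumed cost of a local memory access
REMOTE_PAGE_FETCH_NS = 50_000  # assumed cost of a fault plus network page transfer


class SoftwareDSM:
    """Toy page-based distributed shared memory over a cluster.

    Each page lives on exactly one node. Touching a page that is not resident
    locally traps (like a virtual-memory page fault) and the handler migrates
    the whole page over the network before the access proceeds."""

    def __init__(self, nodes: int, pages: int) -> None:
        # Hypothetical initial placement: pages spread round-robin over nodes.
        self.location = {page: page % nodes for page in range(pages)}
        self.faults = 0
        self.time_ns = 0

    def access(self, node: int, address: int) -> None:
        page = address // PAGE_SIZE
        if self.location[page] != node:
            self.faults += 1
            self.time_ns += REMOTE_PAGE_FETCH_NS
            self.location[page] = node  # page is now resident on this node
        self.time_ns += LOCAL_ACCESS_NS


dsm = SoftwareDSM(nodes=2, pages=8)
for _ in range(2):  # node 0 scans all eight pages twice
    for page in range(8):
        dsm.access(0, page * PAGE_SIZE)
print(dsm.faults, dsm.time_ns)  # 4 201600
```

In this run, the four page faults account for almost all of the simulated time, even though they are only 4 of the 16 accesses.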

See also

  • Uniform Memory Access (UMA)
  • Cluster computing
  • Symmetric multiprocessing (SMP)
  • Cache-only memory architecture (COMA)
  • Supercomputer
  • Silicon Graphics (SGI)
  • HiperDispatch


External links

  • A technical white paper from Novell