Memory barrier



 
 


A memory barrier, also known as a membar or memory fence, is a class of instructions which cause a central processing unit (CPU) to enforce an ordering constraint on memory operations issued before and after the barrier instruction.

CPUs employ performance optimizations that can result in out-of-order execution of instructions, including memory load and store operations. Memory operation reordering normally goes unnoticed within a single thread of execution, but causes unpredictable behaviour in concurrent programs and device drivers unless carefully controlled. The exact nature of an ordering constraint is hardware dependent, and defined by the architecture's memory model. Some architectures provide multiple barriers for enforcing different ordering constraints.

Memory barriers are typically used when implementing low-level machine code that operates on memory shared by multiple devices. Such code includes synchronization primitives and lock-free data structures on multiprocessor systems, and device drivers that communicate with computer hardware.

An illustrative example


When a program runs on a single CPU, the hardware performs the necessary book-keeping to ensure that programs execute as if all memory operations were performed in program order, hence memory barriers are not necessary. However, when the memory is shared with multiple devices, such as other CPUs in a multiprocessor system, or memory-mapped peripherals, out-of-order access may affect program behavior. For example, a second CPU may see memory changes made by the first CPU in a sequence which differs from program order.

The following two-processor program gives a concrete example of how such out-of-order execution can affect program behavior:

Initially, memory locations x and f both hold the value 0. The program running on processor #1 loops until the value of f is non-zero, then it prints the value of x. The program running on processor #2 stores the value 42 into x and then stores the value 1 into f. Pseudocode for the two program fragments is shown below. The steps of the program correspond to individual processor instructions.

Processor #1:
 loop:
  load the value in location f, if it is 0 goto loop
 print the value in location x

Processor #2:
 store the value 42 into location x
 store the value 1 into location f


One might expect the print statement to always print the number "42"; however, if processor #2's store operations are executed out of order, f may be updated before x, and the print statement might then print "0". For most programs this situation is not acceptable. A memory barrier can be inserted before processor #2's assignment to f to ensure that the new value of x is visible to other processors at or prior to the change in the value of f. On architectures that may also reorder loads, processor #1 similarly needs a barrier between its load of f and its load of x, so that x is not read before the change to f has been observed.
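The two fragments above can be sketched in C++ using the standard fence function std::atomic_thread_fence. This is only an illustration under stated assumptions: the names x, f, writer, reader and run_example are hypothetical, the language-level fences stand in for whatever barrier instruction a given architecture provides, and the flag is accessed with relaxed ordering so that the fences alone carry the ordering.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names: x and f mirror the memory locations in the text.
int x = 0;               // plain data, published through f
std::atomic<int> f{0};   // flag, deliberately accessed with relaxed ordering

// Processor #2: store 42 into x, then store 1 into f.
void writer() {
    x = 42;
    // Barrier: the store to x is ordered before the store to f.
    std::atomic_thread_fence(std::memory_order_release);
    f.store(1, std::memory_order_relaxed);
}

// Processor #1: loop until f is non-zero, then read x.
int reader() {
    while (f.load(std::memory_order_relaxed) == 0) {
        // busy-wait
    }
    // Barrier: the load of x is ordered after the load of f.
    std::atomic_thread_fence(std::memory_order_acquire);
    return x;
}

// Runs both fragments once on two threads and returns what the reader saw.
int run_example() {
    x = 0;
    f.store(0, std::memory_order_relaxed);
    int seen = -1;
    std::thread t1([&seen] { seen = reader(); });
    std::thread t2(writer);
    t1.join();
    t2.join();
    return seen;
}
```

With both fences in place, run_example() returns 42. Removing either fence makes the outcome depend on the compiler and the hardware.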

Low-level architecture-specific primitives


Memory barriers are low-level primitives which are part of the definition of an architecture's memory model. Like instruction sets, memory models vary considerably between architectures, so it is not appropriate to generalize about memory barrier behavior. The conventional wisdom is that using memory barriers correctly requires careful study of the architecture manuals for the hardware one is programming. That said, the following paragraph offers a glimpse of some memory barriers which exist in the wild.

Some architectures provide only a single memory barrier instruction, sometimes called a "full fence". A full fence ensures that all load and store operations prior to the fence will have been committed prior to any loads and stores issued following the fence. Other architectures provide separate "acquire" and "release" memory barriers which address the visibility of read-after-write operations from the point of view of a reader (sink) or writer (source) respectively. Some architectures provide separate memory barriers to control ordering between different combinations of system memory and I/O memory. When more than one memory barrier instruction is available, it is important to consider that the cost of different instructions may vary considerably.
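The difference in strength between barrier kinds can be illustrated with a small test in which each thread stores to one location and then loads the other. The sketch below uses the C++ sequentially consistent fence as a stand-in for a full fence; that mapping, and the names a, b, store_then_load_once and both_zero_observed, are assumptions made for illustration rather than a description of any particular architecture.

```cpp
#include <atomic>
#include <thread>
#include <utility>

std::atomic<int> a{0}, b{0};  // hypothetical shared locations

// One run of the test; returns the value each thread loaded.
std::pair<int, int> store_then_load_once() {
    a.store(0, std::memory_order_relaxed);
    b.store(0, std::memory_order_relaxed);
    int r1 = -1, r2 = -1;
    std::thread t1([&r1] {
        a.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence
        r1 = b.load(std::memory_order_relaxed);
    });
    std::thread t2([&r2] {
        b.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence
        r2 = a.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return {r1, r2};
}

// Repeats the test; returns true if both threads ever loaded 0 in one run.
bool both_zero_observed(int runs) {
    for (int i = 0; i < runs; ++i) {
        std::pair<int, int> r = store_then_load_once();
        if (r.first == 0 && r.second == 0) return true;
    }
    return false;
}
```

With full fences, at least one thread must see the other's store, so both_zero_observed never returns true. Weakening the fences to acquire or release would no longer rule that outcome out, because neither kind orders a store before a later load; this is one reason the stronger barrier tends to cost more.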

Multithreaded programming and memory visibility


Multithreaded programs usually use synchronisation primitives provided by a high-level programming environment, such as Java, or an API such as POSIX pthreads or Win32. Primitives such as mutexes and semaphores are provided to synchronize access to resources from parallel threads of execution. These primitives are usually implemented with the memory barriers required to provide the expected memory visibility semantics. In such environments explicit use of memory barriers is not generally necessary.
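As a minimal sketch of this point, the following C++ fragment protects a plain counter with std::mutex and contains no explicit barrier; locking and unlocking provide the required visibility. C++ is used here only as a stand-in for the environments named above, and the names m, counter, add_many and run_counter are hypothetical.

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex m;       // hypothetical names
long counter = 0;   // plain (non-atomic) shared data, guarded by m

// Increments the shared counter n times, taking the mutex for each update.
void add_many(int n) {
    for (int i = 0; i < n; ++i) {
        std::lock_guard<std::mutex> guard(m);  // lock and unlock supply the ordering
        ++counter;
    }
}

// Starts the given number of threads and returns the final counter value.
long run_counter(int threads, int per_thread) {
    counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back(add_many, per_thread);
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

A call such as run_counter(4, 10000) returns 40000, because every update made inside the critical section is visible to the next thread that acquires the mutex.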

Each API or programming environment in principle has its own high-level memory model that defines its memory visibility semantics. Although programmers do not usually need to use memory barriers in such high-level environments, it is important to understand their memory visibility semantics, to the extent possible. Such understanding is not necessarily easy to achieve because memory visibility semantics are not always consistently specified or documented.

Just as programming language semantics are defined at a different level of abstraction to machine language opcodes, a programming environment's memory model is defined at a different level of abstraction to that of a hardware memory model. It is important to understand this distinction and realize that there is not always a simple mapping between low-level hardware memory barrier semantics and the high-level memory visibility semantics of a particular programming environment. As a result, a particular platform's implementation of (say) pthreads may employ stronger barriers than required by the specification. Programs which take advantage of memory visibility as-implemented rather than as-specified may not be portable.

Out-of-order execution versus compiler reordering optimizations


Memory barrier instructions only address reordering effects at the hardware level. Compilers may also reorder instructions as part of the program optimization process. Although the effects on parallel program behavior can be similar in both cases, in general it is necessary to take separate measures to inhibit compiler reordering optimizations for data that may be shared by multiple threads of execution. Note that such measures are usually only necessary for data which is not protected by synchronization primitives such as those discussed in the previous section.
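One example of a compiler-level measure, offered here as an assumption about what such a measure can look like, is the C++ function std::atomic_signal_fence. It constrains reordering by the compiler but is only specified to order a thread against a signal handler running on that same thread, so it is not a substitute for a hardware barrier between processors. The names payload, ready, seen, on_signal and run_signal_example are hypothetical.

```cpp
#include <atomic>
#include <csignal>

int payload = 0;             // plain data (hypothetical name)
std::atomic<int> ready{0};   // flag checked by the handler
std::atomic<int> seen{-1};   // value the handler observed

extern "C" void on_signal(int) {
    if (ready.load(std::memory_order_relaxed) != 0) {
        // Compiler-only barrier: the read of payload stays after the read of ready.
        std::atomic_signal_fence(std::memory_order_acquire);
        seen.store(payload, std::memory_order_relaxed);
    }
}

// Publishes payload, raises a signal on this thread, and returns what the
// handler saw.
int run_signal_example() {
    auto previous = std::signal(SIGINT, on_signal);
    payload = 42;
    // Compiler-only barrier: the write to payload stays before the write to ready.
    std::atomic_signal_fence(std::memory_order_release);
    ready.store(1, std::memory_order_relaxed);
    std::raise(SIGINT);             // the handler runs before raise returns
    std::signal(SIGINT, previous);  // restore the earlier handler
    return seen.load(std::memory_order_relaxed);
}
```

If the handler could instead run on another processor, std::atomic_thread_fence or an atomic operation with suitable ordering would be needed, since those constrain the hardware as well as the compiler.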

In C and C++, the volatile keyword was intended to allow C and C++ programs to directly access memory-mapped I/O. Memory-mapped I/O generally requires that the reads and writes specified in source code happen in the exact order specified in source code with no omissions. Omissions or reorderings of reads and writes by the compiler would break the communication between the program and the device accessed by memory-mapped I/O. A C or C++ compiler may not reorder reads and writes to volatile memory locations, nor may it omit a read or write to a volatile memory location, allowing a pointer to volatile memory to be used for memory-mapped I/O.
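A minimal sketch of this use of volatile follows. The register layout DeviceRegs, its fields and the "start" bit are hypothetical; on real hardware the pointer would refer to a fixed device address taken from the hardware documentation, whereas here an ordinary object can stand in for the device.

```cpp
#include <cstdint>

// Hypothetical register block of an imaginary device.
struct DeviceRegs {
    std::uint32_t data;
    std::uint32_t control;
};

// Writes one byte and pulses a hypothetical "start" bit. Because the accesses
// go through a pointer to volatile, the compiler must perform all three
// writes, in this order, without merging or dropping any of them.
void send_byte(volatile DeviceRegs* dev, std::uint8_t byte) {
    dev->data = byte;    // write 1
    dev->control = 1u;   // write 2: set the start bit
    dev->control = 0u;   // write 3: clear it again
}
```

Without volatile, a compiler would be free to drop the second write, since the third overwrites it. Note that volatile constrains only the compiler; where the hardware can reorder device accesses, a driver still needs the appropriate I/O barrier.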

Before C11 and C++11, the C and C++ standards did not address multiple threads (or multiple processors), and as such, the usefulness of volatile depends on the compiler and hardware. Although volatile guarantees that volatile reads and writes will happen in the exact order specified in the source code, the compiler may generate code which reorders a volatile read or write with non-volatile reads or writes, thus limiting its usefulness as an inter-thread flag or mutex. Moreover, volatile reads and writes are not guaranteed to be seen in the same order by other processors, because volatile by itself does not constrain hardware reordering, so volatile variables may not work as inter-thread flags or mutexes at all.

Some languages and compilers may provide sufficient facilities to implement functions which address both the compiler reordering and machine reordering issues. Since Java version 1.5 (also known as version 5), the volatile keyword is guaranteed to prevent certain hardware and compiler re-orderings, as part of the revised Java Memory Model. The C++ memory model, proposed under the working name C++0x and standardized in C++11, does not use volatile for this purpose; instead it provides special atomic types and operations with semantics similar to those of volatile in the Java Memory Model.
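As a sketch of the atomic types mentioned above, the fragment below uses std::atomic<bool> as an inter-thread flag, much as a volatile boolean would be used in Java. The default ordering of load and store on std::atomic is sequentially consistent, which constrains both the compiler and the hardware. The names message, published and exchange_message are hypothetical.

```cpp
#include <atomic>
#include <string>
#include <thread>

std::string message;                 // plain data (hypothetical name)
std::atomic<bool> published{false};  // plays the role of a Java volatile flag

// Passes a string from a producer thread to a consumer thread and returns
// what the consumer received.
std::string exchange_message() {
    message.clear();
    published.store(false);
    std::string received;
    std::thread consumer([&received] {
        while (!published.load()) {
            // busy-wait; load() defaults to sequentially consistent ordering
        }
        received = message;  // safe: ordered after the flag was seen as true
    });
    std::thread producer([] {
        message = "hello";
        published.store(true);  // store() also defaults to sequential consistency
    });
    producer.join();
    consumer.join();
    return received;
}
```

Declaring the flag as volatile bool instead would not give this guarantee in C++, for the reasons described in the preceding paragraphs.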

See also

  • Lock-free and wait-free algorithms

