Concurrent data structure
In computer science, a concurrent data structure is a particular way of storing and organizing data for access by multiple computing threads (or processes) on a computer.

Historically, such data structures were used on uniprocessor machines with operating systems that supported multiple computing threads (or processes). The term concurrency captured the multiplexing/interleaving of the threads' operations on the data by the operating system, even though the processors never issued two operations that accessed the data simultaneously.

Today, as multiprocessor computer architectures that provide parallelism become the dominant computing platform (through the proliferation of multi-core processors), the term has come to stand mainly for data structures that can be accessed by multiple threads which may actually access the data simultaneously because they run on different processors that communicate with one another. The concurrent data structure (sometimes also called a shared data structure) is usually considered to reside in an abstract storage environment called shared memory, though this memory may be physically implemented as either a "tightly coupled" or a distributed collection of storage modules.

Basic principles

Concurrent data structures, intended for use in parallel or distributed computing environments, differ from "sequential" data structures, intended for use on a uniprocessor machine, in several ways. Most notably, in a sequential environment one specifies the data structure's properties and checks that they are implemented correctly, by providing safety properties. In a concurrent environment, the specification must also describe liveness properties which an implementation must provide. Safety properties usually state that something bad never happens, while liveness properties state that something good keeps happening. These properties can be expressed, for example, using linear temporal logic (LTL).
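
For example (the atomic predicates here are hypothetical, chosen only for illustration), a safety property and a liveness property of a request-serving structure might be written in LTL as follows, where \Box reads "always" and \Diamond reads "eventually":

    % Safety: nothing bad ever happens -- an invariant I holds in every state.
    \Box\, I

    % Liveness: something good keeps happening -- every issued request
    % is eventually served.
    \Box\,(\mathit{requested} \rightarrow \Diamond\, \mathit{served})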

The type of liveness requirements tends to define the data structure. The method calls can be blocking or non-blocking. Data structures are not restricted to one type or the other, and can allow combinations where some method calls are blocking and others are non-blocking (examples can be found in the Java concurrency software library).
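
As a concrete illustration, the standard Java library exposes both kinds of calls, and a single structure may offer both. The sketch below (class and variable names are illustrative) contrasts the blocking take() of a bounded queue with the non-blocking poll() of a lock-free queue:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    public class BlockingVsNonBlocking {
        public static void main(String[] args) throws InterruptedException {
            // take() is a blocking call: the caller waits until an element
            // becomes available (put() blocks only when the queue is full).
            BlockingQueue<String> blockingQueue = new ArrayBlockingQueue<>(16);
            blockingQueue.put("task");
            String fromBlocking = blockingQueue.take();

            // poll() on a lock-free queue is non-blocking: it returns
            // immediately, yielding null when the queue is empty.
            ConcurrentLinkedQueue<String> lockFreeQueue = new ConcurrentLinkedQueue<>();
            lockFreeQueue.offer("task");
            String fromLockFree = lockFreeQueue.poll();

            // Combinations also exist: blockingQueue.poll() is a
            // non-blocking alternative to take() on the very same queue.
            System.out.println(fromBlocking + " " + fromLockFree);
        }
    }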

The safety properties of concurrent data structures must capture their behavior given the many possible interleavings of methods called by different threads. It is quite intuitive to specify how abstract data structures behave in a sequential setting in which there are no interleavings. Therefore, many mainstream approaches for arguing the safety properties of a concurrent data structure (such as serializability, linearizability, sequential consistency, and quiescent consistency) specify the structure's properties sequentially, and map its concurrent executions to a collection of sequential ones.
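
As a minimal (hypothetical) example of why interleavings matter, the counter below is perfectly correct in a sequential setting, yet two concurrent increments can interleave their read and write steps so that one update is lost; the resulting final value corresponds to no sequential execution of the same calls:

    // Correct sequentially, broken concurrently: the read-modify-write
    // in increment() is not atomic, so concurrent calls can lose updates.
    public class UnsafeCounter {
        private int count = 0;

        public void increment() {
            count = count + 1;  // read, add, write back: three separate steps
        }

        public int get() {
            return count;
        }

        public static void main(String[] args) throws InterruptedException {
            UnsafeCounter c = new UnsafeCounter();
            Runnable work = () -> {
                for (int i = 0; i < 100_000; i++) {
                    c.increment();
                }
            };
            Thread a = new Thread(work);
            Thread b = new Thread(work);
            a.start(); b.start();
            a.join(); b.join();
            // Frequently prints less than 200000 because of lost updates.
            System.out.println(c.get());
        }
    }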

In order to guarantee the safety and liveness properties, concurrent data structures must typically (though not always) allow threads to reach consensus as to the results of their simultaneous data access and modification requests. To support such agreement, concurrent data structures are implemented using special primitive synchronization operations (see synchronization primitives) available on modern multiprocessor machines that allow multiple threads to reach consensus. This consensus can be achieved in a blocking manner by using locks, or without locks, in which case it is non-blocking. There is a wide body of theory on the design of concurrent data structures (see bibliographical references).
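
The sketch below (illustrative, not drawn from any particular source) shows both styles on a simple counter: a blocking increment serialized by a lock, and a non-blocking increment built on the compare-and-set primitive that modern multiprocessors provide:

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.locks.ReentrantLock;

    public class CounterStyles {
        private final ReentrantLock lock = new ReentrantLock();
        private int lockedCount = 0;
        private final AtomicInteger casCount = new AtomicInteger();

        // Blocking: while one thread holds the lock, all others must wait.
        public void incrementWithLock() {
            lock.lock();
            try {
                lockedCount++;
            } finally {
                lock.unlock();
            }
        }

        // Non-blocking: compareAndSet lets threads reach consensus on the
        // next value; a thread that loses the race simply retries and is
        // never forced to wait on another thread's progress.
        public void incrementWithCas() {
            int current;
            do {
                current = casCount.get();
            } while (!casCount.compareAndSet(current, current + 1));
        }
    }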

Design and implementation

Concurrent data structures are significantly more difficult to design
and to verify as being correct than their sequential counterparts.

The primary source of this additional difficulty is concurrency, exacerbated by the fact that
threads must be thought of as being completely asynchronous:
they are subject to operating system preemption, page faults,
interrupts, and so on.

On today's machines, the layout of processors and
memory, the layout of data in memory, and the communication load on the
various elements of the multiprocessor architecture all influence performance.
Furthermore, there is a tension between correctness and performance: algorithmic enhancements that seek to improve performance often make it more difficult to design and verify a correct
data structure implementation.

A key measure for performance is scalability, captured by the speedup of the implementation. Speedup is a measure of how effectively the application is utilizing the machine it is running on. On a machine with P processors, the speedup is the ratio of the structure's execution time on a single processor to its execution time on P processors. Ideally, we want linear speedup: we would like to achieve a speedup of P when using P processors. Data structures whose speedup grows with P are called scalable. The extent to which one can scale the performance of a concurrent data structure is captured by a formula known as Amdahl's law and more refined versions of it such as Gustafson's law.
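
In symbols, writing T_1 for the execution time on one processor, T_P for the execution time on P processors, and assuming (for illustration) that a fraction f of the work is inherently sequential:

    % Speedup, and the ideal linear case:
    S(P) = \frac{T_1}{T_P}, \qquad \text{linear speedup: } S(P) = P

    % Amdahl's law bounds the achievable speedup, no matter how large P grows:
    S(P) \le \frac{1}{f + (1 - f)/P}, \qquad \lim_{P \to \infty} S(P) \le \frac{1}{f}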

A key issue with the performance of concurrent data structures is the level of memory contention: the overhead in traffic to and from memory as a result of multiple threads concurrently attempting to access the same locations in memory. This issue is most acute with blocking implementations in which locks control access to memory. In order to acquire a lock, a thread must repeatedly attempt to modify that location. On a cache-coherent multiprocessor (one in which processors have local caches that are updated by hardware in order to keep them consistent with the latest values stored) this results in long waiting times for each attempt to modify the location, and is exacerbated by the additional memory traffic associated with unsuccessful attempts to acquire the lock.
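
A classic technique for reducing this traffic, treated at length in Herlihy and Shavit's book listed below, is the test-and-test-and-set lock: waiting threads spin on an ordinary read of their locally cached copy of the lock word, and attempt the expensive atomic write only when the lock appears free. A minimal Java sketch (class name illustrative):

    import java.util.concurrent.atomic.AtomicBoolean;

    // Test-and-test-and-set spinlock: spinners mostly read a locally
    // cached value, generating coherence traffic only when the lock is
    // released, rather than issuing a failing atomic write per iteration.
    public class TTASLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        public void lock() {
            while (true) {
                while (held.get()) {
                    // spin on the cached value; reads cause no invalidations
                }
                if (!held.getAndSet(true)) {
                    return;  // we flipped it from false to true: acquired
                }
                // another thread won the race; go back to local spinning
            }
        }

        public void unlock() {
            held.set(false);
        }
    }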

Further reading

  • Nancy Lynch, "Distributed Computing"
  • Hagit Attiya and Jennifer Welch, "Distributed Computing: Fundamentals, Simulations, and Advanced Topics, 2nd Ed."
  • Doug Lea, "Concurrent Programming in Java: Design Principles and Patterns"
  • Maurice Herlihy and Nir Shavit, "The Art of Multiprocessor Programming"
  • Mattson, Sanders, and Massingill, "Patterns for Parallel Programming"
