A
distributed algorithm is an
algorithmIn mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...
designed to run on
computer hardwarePersonal computer hardware are component devices which are typically installed into or peripheral to a computer case to create a personal computer upon which system software is installed including a firmware interface such as a BIOS and an operating system which supports application software that...
constructed from interconnected
processorsThe central processing unit is the portion of a computer system that carries out the instructions of a computer program, to perform the basic arithmetical, logical, and input/output operations of the system. The CPU plays a role somewhat analogous to the brain in the computer. The term has been in...
. Distributed algorithms are used in many varied application areas of
distributed computingDistributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal...
, such as telecommunications, scientific computing, distributed
information processingInformation processing is the change of information in any manner detectable by an observer. As such, it is a process which describes everything which happens in the universe, from the falling of a rock to the printing of a text file from a digital computer system...
, and real-time
process controlProcess control is a statistics and engineering discipline that deals with architectures, mechanisms and algorithms for maintaining the output of a specific process within a desired range...
. Standard problems solved by distributed algorithms include
leader electionIn distributed computing, leader election is the process of designating a single process as the organizer of some task distributed among several computers . Before the task is begun, all network nodes are unaware which node will serve as the "leader," or coordinator, of the task...
,
consensusConsensus is a problem in distributed computing that encapsulates the task of group agreement in the presence of faults.In particular, any process in the group may fail at any time. Consensus is fundamental to core techniques in fault tolerance, such as state machine replication.- Problem...
, distributed
searchIn computer science, a search algorithm is an algorithm for finding an item with specified properties among a collection of items. The items may be stored individually as records in a database; or may be elements of a search space defined by a mathematical formula or procedure, such as the roots...
,
spanning treeIn the mathematical field of graph theory, a spanning tree T of a connected, undirected graph G is a tree composed of all the vertices and some of the edges of G. Informally, a spanning tree of G is a selection of edges of G that form a tree spanning every vertex...
generation,
mutual exclusionMutual exclusion algorithms are used in concurrent programming to avoid the simultaneous use of a common resource, such as a global variable, by pieces of computer code called critical sections. A critical section is a piece of code in which a process or thread accesses a common resource...
, and
resource allocationResource allocation is used to assign the available resources in an economic way. It is part of resource management. In project management, resource allocation is the scheduling of activities and the resources required by those activities while taking into consideration both the resource...
.
Distributed algorithms are typically executed
concurrentlyIn computer science, concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other...
, with separate parts of the algorithm being run simultaneously on independent processors, and having limited information about what the other parts of the algorithm are doing. One of the major challenges in developing and implementing distributed algorithms is successfully coordinating the behavior of the independent parts of the algorithm in the face of processor failures and unreliable communications links. The choice of an appropriate distributed algorithm to solve a given problem depends on both the characteristics of the problem, and characteristics of the system the algorithm will run on such as the type and probability of processor or link failures, the kind of inter-process communication that can be performed, and the level of timing synchronization between separate processes.
Standard problems
Atomic commitAn atomic commit is an operation in which a set of distinct changes is applied as a single operation. If the changes are applied then the atomic commit is said to have succeeded. If there is a failure before the atomic commit can be completed then all of the changes completed in the atomic commit...
- An atomic commit is an operation where a set of distinct changes is applied as a single operation. If the atomic commit succeeds, it means that all the changes have been applied. If there is a failure before the atomic commit can be completed, the "commit" is aborted and no changes will be applied.
- Algorithms for solving the atomic commit protocol include the two-phase commit protocol
In transaction processing, databases, and computer networking, the two-phase commit protocol is a type of atomic commitment protocol . It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort the transaction...
and the three-phase commit protocolIn computer networking and databases, the three-phase commit protocol is a distributed algorithm which lets all nodes in a distributed system agree to commit a transaction. Unlike the two-phase commit protocol however, 3PC is non-blocking. Specifically, 3PC places an upper bound on the amount...
.
ConsensusConsensus is a problem in distributed computing that encapsulates the task of group agreement in the presence of faults.In particular, any process in the group may fail at any time. Consensus is fundamental to core techniques in fault tolerance, such as state machine replication.- Problem...
- Consensus algorithms try to solve the problem of a number of processes agreeing on a common decision.
- More precisely, a Consensus protocol must satisfy the four formal properties below.
- Termination: every correct process decides some value.
- Validity: if all processes propose the same value
, then every correct process decides
.
- Integrity: every correct process decides at most one value, and if it decides some value
, then
must have been proposed by some process.
- Agreement: if a correct process decides
, then every correct process decides
.
- A typical algorithm for solving consensus is the paxos algorithm
Paxos is a family of protocols for solving consensus in a network of unreliable processors.Consensus is the process of agreeing on one result among a group of participants...
.
Distributed search
Leader electionIn distributed computing, leader election is the process of designating a single process as the organizer of some task distributed among several computers . Before the task is begun, all network nodes are unaware which node will serve as the "leader," or coordinator, of the task...
- Leader election is the process of designating a single process as the organizer of some task distributed among several computers (nodes). Before the task is begun, all network nodes are unaware which node will serve as the "leader," or coordinator, of the task. After a leader election algorithm has been run, however, each node throughout the network recognizes a particular, unique node as the task leader.
Mutual exclusionMutual exclusion algorithms are used in concurrent programming to avoid the simultaneous use of a common resource, such as a global variable, by pieces of computer code called critical sections. A critical section is a piece of code in which a process or thread accesses a common resource...
Reliable Broadcast
- Reliable broadcast is a communication primitive in distributed systems. A reliable broadcast is defined by the following properties:
- Validity - if a correct process sends a message, then some correct process will eventually deliver that message
- Agreement - if a correct process delivers a message, then all correct processes eventually deliver that message
- Integrity - every correct process delivers the same message at most once and only if that message has been sent by a process
- A reliable broadcast can have sequential, causal or total ordering.
ReplicationReplication is the process of sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility. It could be data replication if the same data is stored on multiple storage devices, or...
Resource allocationResource allocation is used to assign the available resources in an economic way. It is part of resource management. In project management, resource allocation is the scheduling of activities and the resources required by those activities while taking into consideration both the resource...
Spanning treeSpanning tree can refer to:* Spanning tree , a tree which contains every vertex of a more general graph* Spanning tree protocol, a protocol for finding spanning trees in bridged networks...
generation
Symmetry breaking, e.g. vertex coloring
Further reading
- C. Rodríguez, M. Villagra and B. Barán, , Bionetics2007, pp. 66–69, 2007.
External links