Home      Discussion      Topics      Dictionary      Almanac
Signup       Login
Consensus (computer science)

Consensus (computer science)

Overview
Consensus is a problem in distributed computing
Distributed computing
Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal...

 that encapsulates the task of group agreement in the presence of faults.

In particular, any process
Process (computing)
In computing, a process is an instance of a computer program, consisting of one or more threads, that is being sequentially executed by a computer system that has the ability to run several computer programs concurrently....

 in the group may crash at any time. Consensus is fundamental to core techniques in fault tolerance, such as state machine replication
State machine replication
-State Machine Definition:For the subsequent discussion a State Machine will be defined as the following tuple of values :* A set of States* A set of Inputs* A set of Outputs...

.

A process is called "correct" if it does not fail at any point during its execution. Unlike Terminating Reliable Broadcast
Terminating Reliable Broadcast
Terminating Reliable Broadcast is a problem in distributed computing that encapsulates the task of broadcasting a message to a set of receiving processes in the presence of faults...

, the typical Consensus problem does not label any single process as a "sender".
Discussion
Ask a question about 'Consensus (computer science)'
Start a new discussion about 'Consensus (computer science)'
Answer questions from other users
Full Discussion Forum
 
Encyclopedia
Consensus is a problem in distributed computing
Distributed computing
Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal...

 that encapsulates the task of group agreement in the presence of faults.

In particular, any process
Process (computing)
In computing, a process is an instance of a computer program, consisting of one or more threads, that is being sequentially executed by a computer system that has the ability to run several computer programs concurrently....

 in the group may crash at any time. Consensus is fundamental to core techniques in fault tolerance, such as state machine replication
State machine replication
-State Machine Definition:For the subsequent discussion a State Machine will be defined as the following tuple of values :* A set of States* A set of Inputs* A set of Outputs...

.

Problem Description


A process is called "correct" if it does not fail at any point during its execution. Unlike Terminating Reliable Broadcast
Terminating Reliable Broadcast
Terminating Reliable Broadcast is a problem in distributed computing that encapsulates the task of broadcasting a message to a set of receiving processes in the presence of faults...

, the typical Consensus problem does not label any single process as a "sender". Every process "proposes" a value; the goal of the protocol is for all correct processes to choose a single value from among those proposed. A process may perform many I/O
Input/output
In computing, input/output, or I/O, refers to the communication between an information processing system , and the outside world – possibly a human, or another information processing system. Inputs are the signals or data received by the system, and outputs are the signals or data sent from it...

 operations during protocol execution, but must eventually "decide" a value by passing it to the application on that process that invoked the Consensus protocol.

Valid consensus protocols must provide important guarantees to all processes involved. All correct processes must eventually decide the same value, for example, and that value must be one of those proposed. A correct process is therefore guaranteed that the value it decides was also decided by all other correct processes, and can act on that value accordingly.

More precisely, a Consensus protocol must satisfy the four formal properties below.
  • Termination: every correct process decides some value.
  • Validity: if all processes propose the same value , then every correct process decides .
  • Integrity: every correct process decides at most one value, and if it decides some value , then must have been proposed by some process.
  • Agreement: if a correct process decides , then every correct process decides .


The possibility of faults in the system makes these properties more difficult to satisfy. A simple but invalid Consensus protocol might have every process broadcast its proposal to all others, and have a process decide on the smallest value received. Such a protocol, as described, does not satisfy Agreement if faults can occur: if a process crashes after sending its proposal to some processes, but before sending it to others, then the two sets of processes may decide different values.

Impossibility


Consensus has been shown to be impossible to solve in several models
Model of computation
In computability theory and computational complexity theory, a model of computation is the definition of the set of allowable operations used in computation and their respective costs...

 of distributed computing.

In an asynchronous system, where processes have no common clock and run at arbitrarily varying speeds, the problem is impossible to solve if one process may crash
Crash (computing)
A crash or in computing is a condition where a program stops performing its expected function and also stops responding to other parts of the system. Often the offending program may simply appear to freeze...

 and processes communicate by sending messages to one another .
The technique used to prove this result is sometimes called an FLP impossibility proof, named after its creators, Michael J. Fischer
Michael J. Fischer
Michael John Fischer is a computer scientist who works in the fields of distributed computing, parallel computing, cryptography, algorithms and data structures, and computational complexity.-Career:...

, Nancy A. Lynch
Nancy Lynch
Nancy Ann Lynch is a professor at the Massachusetts Institute of Technology. She is the NEC Professor of Software Science and Engineering in the EECS department and heads the Theory of Distributed Systems research group at MIT's Computer Science and Artificial Intelligence Laboratory.She is the...

 and Michael S. Paterson, who won the Dijkstra Prize
Dijkstra Prize
The Edsger W. Dijkstra Prize in Distributed Computing is given for outstanding papers on the principles of distributed computing, whose significance and impact on the theory and/or practice of distributed computing has been evident for at least a decade...

 for this result. The technique has been widely used to prove other impossibility results. For example, a similar proof can be used to show that consensus is also impossible in asynchronous systems where processes communicate by reading and writing shared variables if one process may crash .

The FLP result does not state that consensus can never be reached: merely that under the model's assumptions, no algorithm can always reach consensus in bounded time. There exist algorithms, even under the asynchronous model, that can reach consensus with probability one. The FLP proof hinges on demonstrating the existence of an order of message receipts that causes the system to never reach consensus. This "bad" input however may be vanishingly unlikely in practice.

In a synchronous system, where all processes run at the same speed, consensus is impossible if processes communicate by sending messages to one another and one third of the processes can experience Byzantine failures
Byzantine fault tolerance
Byzantine fault tolerance is a sub-field of error tolerance research inspired by the Byzantine Generals' Problem, which is a generalized version of the Two Generals' Problem....

.

Important Consensus Protocols


Google has implemented a distributed lock service library called Chubby . Chubby maintains locks information in small files which are stored in a replicated database to achieve high availability in the face of failures. The database is implemented on top of a fault-tolerant log layer which is based on the Paxos consensus algorithm
Paxos algorithm
Paxos is a family of protocols for solving consensus in a network of unreliable processors.Consensus is the process of agreeing on one result among a group of participants...

. In this scheme, Chubby clients communicate with the Paxos master in order to access/update the replicated log, i.e., read/write to the files .