Lockstep
Encyclopedia
Lockstep systems are redundant
Redundancy (engineering)
In engineering, redundancy is the duplication of critical components or functions of a system with the intention of increasing reliability of the system, usually in the case of a backup or fail-safe....

 computing systems that run the same set of operations at the same time in parallel
Parallel computing
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently . There are several different forms of parallel computing: bit-level,...

. The output from lockstep operations can be compared to determine if there has been a fault.
Sometimes a timeshift (delay) is set between the 2 systems, which increases the detection probability of errors induced by external influences (e.g. Voltage spike
Voltage spike
In electrical engineering, spikes are fast, short duration electrical transients in voltage , current , or transferred energy in an electrical circuit....

s, Ionizing radiation
Ionizing radiation
Ionizing radiation is radiation composed of particles that individually have sufficient energy to remove an electron from an atom or molecule. This ionization produces free radicals, which are atoms or molecules containing unpaired electrons...

, or In situ
In situ
In situ is a Latin phrase which translated literally as 'In position'. It is used in many different contexts.-Aerospace:In the aerospace industry, equipment on board aircraft must be tested in situ, or in place, to confirm everything functions properly as a system. Individually, each piece may...

 Reverse engineering
Reverse engineering
Reverse engineering is the process of discovering the technological principles of a device, object, or system through analysis of its structure, function, and operation...

).

To run in lockstep, each system is set up to progress from one well-defined state to the next well-defined state. When a new set of inputs reaches the system, it processes them, generates new outputs and updates its state. This set of changes (new inputs, new outputs, new state) is considered to define that step, and must be treated as an atomic transaction; in other words, either all of it happens, or none of it happens, but not something in between.

The term "lockstep
Lockstep
Lockstep systems are redundant computing systems that run the same set of operations at the same time in parallel. The output from lockstep operations can be compared to determine if there has been a fault....

" originates in the prison usage, where it refers to the synchronized walking, in which the marchers walk as closely together as physically practical.

Dual Modular Redundancy

Where the computing systems are duplicated, but both actively process each step, it is difficult to arbitrate between them if their outputs differ at the end of a step. For this reason, it is common practice to run DMR systems as "master/slave" configurations with the slave as a "hot-standby" to the master, rather than in lockstep. Since there is no advantage in having the slave unit actively process each step, a common method of working is for the master to copy its state at the end of each step's processing to the slave. Should the master fail at some point, the slave is ready to continue from the previous known good step.

While either the lockstep or the DMR approach (when combined with some means of detecting errors in the master) can provide redundancy against hardware failure in the master, they do not protect against software failure. If the master fails because of a software error, it is highly likely that the slave - in attempting to repeat the execution of the step which failed - will simply repeat the same error and fail in the same way, an example of a common mode failure.

Triple Modular Redundancy

Where the computing systems are triplicated, it becomes possible to treat them as "voting" systems. If one unit's output disagrees with the other two, it is detected as having failed. The matched output from the other two is treated as correct.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK