All Topics  
Race condition

 

   Email Print
   Bookmark   Link






 

Race condition



 
 
A race condition or race hazard
Hazard (logic)

Hazards in any system are obviously an un-desirable effect caused by either a deficency in the system or external influences. In digital logic hazards are usually referred to in one of three ways: Static Hazards, Dynamic Hazards and Function Hazards, These logic hazards are all subsets of the same problem which changes in the input variables do not...
 is a flaw in a system
System

System is a set of interacting or interdependent entities, real or abstract, forming an integrated whole.The concept of an "integrated whole" can also be stated in terms of a system embodying a set of relationships which are differentiated from relationships of the set to other elements, and from relationships between an element of the se...
 or process whereby the output and/or result of the process is unexpectedly and critically dependent on the sequence or timing
Timing

Timing is the spacing of events in time. Some typical uses are:* The act of measuring the elapsed time of something or someone, often at athletic events such as swimming or running, where participants are timed with a device such as a stopwatch....
 of other events. The term originates with the idea of two signals racing each other to influence the output
Output

Output is the term denote either an exit or changes which exit a system and which activate/modify a process. It is an abstract concept, used in the model ing, system design and system exploitation....
 first.

Race conditions can occur in electronics
Electronics

Electronics refers to the flow of charge through nonmetal electrical conductor , whereas electrical refers to the flow of charge through metal electrical conductor....
 systems, especially logic circuits
Logic gate

A logic gate performs a logical operation on one or more logic inputs and produces a single logic output. The logic normally performed is Boolean logic and is most commonly found in digital circuits....
, and in computer
Computer

A computer is a machine that manipulates Data according to a list of Code .The first devices that resemble modern computers date to the mid-20th century , although the computer concept and various machines similar to computers existed earlier....
 software, especially multithreaded or distributed programs.

pical example of a race condition may occur in a system of logic gate
Logic gate

A logic gate performs a logical operation on one or more logic inputs and produces a single logic output. The logic normally performed is Boolean logic and is most commonly found in digital circuits....
s, where inputs vary.






Discussion
Ask a question about 'Race condition'
Start a new discussion about 'Race condition'
Answer questions from other users
Full Discussion Forum



Encyclopedia


A race condition or race hazard
Hazard (logic)

Hazards in any system are obviously an un-desirable effect caused by either a deficency in the system or external influences. In digital logic hazards are usually referred to in one of three ways: Static Hazards, Dynamic Hazards and Function Hazards, These logic hazards are all subsets of the same problem which changes in the input variables do not...
 is a flaw in a system
System

System is a set of interacting or interdependent entities, real or abstract, forming an integrated whole.The concept of an "integrated whole" can also be stated in terms of a system embodying a set of relationships which are differentiated from relationships of the set to other elements, and from relationships between an element of the se...
 or process whereby the output and/or result of the process is unexpectedly and critically dependent on the sequence or timing
Timing

Timing is the spacing of events in time. Some typical uses are:* The act of measuring the elapsed time of something or someone, often at athletic events such as swimming or running, where participants are timed with a device such as a stopwatch....
 of other events. The term originates with the idea of two signals racing each other to influence the output
Output

Output is the term denote either an exit or changes which exit a system and which activate/modify a process. It is an abstract concept, used in the model ing, system design and system exploitation....
 first.

Race conditions can occur in electronics
Electronics

Electronics refers to the flow of charge through nonmetal electrical conductor , whereas electrical refers to the flow of charge through metal electrical conductor....
 systems, especially logic circuits
Logic gate

A logic gate performs a logical operation on one or more logic inputs and produces a single logic output. The logic normally performed is Boolean logic and is most commonly found in digital circuits....
, and in computer
Computer

A computer is a machine that manipulates Data according to a list of Code .The first devices that resemble modern computers date to the mid-20th century , although the computer concept and various machines similar to computers existed earlier....
 software, especially multithreaded or distributed programs.

Electronics

A typical example of a race condition may occur in a system of logic gate
Logic gate

A logic gate performs a logical operation on one or more logic inputs and produces a single logic output. The logic normally performed is Boolean logic and is most commonly found in digital circuits....
s, where inputs vary. If a particular output depends on the state of the inputs, it may only be defined for steady-state signals. As the inputs change state, a small delay will occur before the output changes, due to the physical nature of the electronic system. For a brief period, the output may change to an unwanted state before settling back to the designed state. Certain systems can tolerate such glitch
Glitch

A glitch is a short-lived fault in a system. The term is particularly common in the computing and electronics industries, and in circuit bending, as well as among players of video games, although it is applied to all types of systems including human organizations and nature....
es
, but if for example this output signal functions as a clock for further systems that contain memory, the system can rapidly depart from its designed behaviour (in effect, the temporary glitch becomes permanent).

For example, consider a two input AND gate fed with a logic signal A on one input and its negation, NOT A, on another input. In theory, the output (A AND NOT A) should never be high. However, if changes in the value of A take longer to propagate to the second input than the first when A changes from false to true, a brief period will ensue during which both inputs are true, and so the gate's output will also be true.

Proper design techniques (e.g. Karnaugh map
Karnaugh map

The Karnaugh map, also known as a Veitch diagram , is a tool to facilitate the simplification of Boolean algebra integrated circuit expressions....
s) encourage designers to recognise and eliminate race conditions before they cause problems.

As well as these problems, some logic elements can enter metastable state
Metastability in electronics

Metastability in electronics is the ability of a non-equilibrium electronic state to persist for a long period of time . Note this definition does not guarantee all of the properties that are sometimes demanded for a metastable state in statistical mechanics....
s, which create further problems for circuit designers.

Critical and non-critical race conditions

A critical race occurs when the order in which internal variables are changed determines the eventual state that the state machine
Finite state machine

A finite state machine or finite state automaton or simply a state machine, is a model of behavior composed of a finite number of state s, transitions between those states, and actions....
 will end up in.

A non-critical race occurs when the order in which internal variables are changed does not alter the eventual state. In other words, a non-critical race occurs when moving to a desired state means that more than one internal state variable must be changed at once, but no matter in what order these internal state variables change, the resultant state will be the same anyway.

Static, dynamic, and essential race conditions

Static race conditions : These are caused when a signal and its complement are combined together. Dynamic race conditions : These result in multiple transitions when only one is intended. They are due to interaction between gates (Dynamic race conditions can be eliminated by using not more than two levels of gating). Essential race conditions : These are caused when an input has two transitions in less than the total feedback propagation time. Sometimes they are cured using inductive delay-line elements to effectively increase the time duration of an input signal.

Simplified explanation


In many work environments, it is often difficult for software developers or engineers to explain a race condition to managers, who may not be familiar with technical details. In this case, a horse race metaphor example can be provided:

Example: A horse race represents the computer application. Each horse represents a specific element in the application, that are all running in parallel, such as user interface graphics versus network communications running concurrently. The software may require that a specific horse must win the race in order for the application to run properly. The application may work flawlessly if horse #5 finishes first, while the application crashes if the wrong horse finishes the race. One possible solution to this problem, is to add synchronization
Synchronization

Synchronization or synchronisation is timekeeping which requires the coordination of events to operate a system in unison. The familiar Conducting of an orchestra serves to keep the orchestra in time....
 to guarantee that horse #5 finishes before #2. Metaphorically, this is similar to jockey #2 and #5 teaming up to let #5 run ahead of #2.

Applications can run at different speeds on different computer systems or different situations (i.e. variations in Internet performance or processor loading), which may cause the application to crash randomly on some systems and never on others. The unpredictable nature of an intermittent bug caused by a race condition, is a frequent source of frustration within the profession of software development
Software development

Software development is the set of activities that results in software products. Software development may include research, new development, modification, reuse, re-engineering, maintenance, or any other activities that result in software products....
.

Computing


Race conditions arise in software when separate processes or threads
Thread (computer science)

In computer science, a thread of execution is a Fork of a computer program into two or more Concurrency running task s. The implementation of threads and process es differs from one operating system to another, but in most cases, a thread is contained inside a process....
 of execution depend on some shared state.

Here is a simple example:

Let us assume that two threads T1 and T2 each want to increment the value of a global integer by one. Ideally, the following sequence of operations would take place:

  1. Integer i = 0; (memory)
  2. T1 reads the value of i from memory into register1 : 0
  3. T1 increments the value of i in register1: (register1 contents) + 1 = 1
  4. T1 stores the value of register1 in memory : 1
  5. T2 reads the value of i from memory into register2 : 1
  6. T2 increments the value of i in register2: (register2 contents) + 1 = 2
  7. T2 stores the value of register2 in memory : 2
  8. Integer i = 2; (memory)


In the case shown above, the final value of i is 2, as expected. However, if the two threads run simultaneously without locking or synchronization, the outcome of the operation could be wrong. The alternative sequence of operations below demonstrates this scenario:

  1. Integer i = 0; (memory)
  2. T1 reads the value of i from memory into register1 : 0
  3. T2 reads the value of i from memory into register2 : 0
  4. T1 increments the value of i in register1: (register1 contents) + 1 = 1
  5. T2 increments the value of i in register2: (register2 contents) + 1 = 1
  6. T1 stores the value of register1 in memory : 1
  7. T2 stores the value of register2 in memory : 1
  8. Integer i = 1; (memory)


The final value of i is 1 instead of the expected result of 2. This occurs because the increment operations of the second case are non-atomic. Atomic operations are those that cannot be interrupted while accessing some resource such as a memory location. In the first case, T1 was not interrupted while accessing the variable i, so its operation was atomic.

For another example, consider the following two tasks, in pseudocode
Pseudocode

Pseudocode is a compact and informal high-level description of a computer programming algorithm that uses the structural conventions of some programming language, but is intended for human reading rather than machine reading....
: global int A = 0

// increments the value of A and print "RX" // activated whenever an interrupt is received from the serial controller task Received A = A + 1 print "RX" end task

// prints out only the even numbers // is activated every second task Timeout if (A is divisible by 2) print A end if end task

Output would look something like: 0 0 0 RX RX 2 RX RX 4 4

Now consider this chain of events, which might occur next:

  1. timeout occurs, activating task Timeout
  2. task Timeout evaluates A and finds it is divisible by 2, so elects to execute the "print A" next.
  3. data is received on the serial port, causing an interrupt and a switch to task Received
  4. task Received runs to completion, incrementing A and printing "RX"
  5. control returns to task Timeout
  6. task timeout executes print A, using the current value of A, which is 5.


Mutex
Mutual exclusion

Mutual exclusion algorithms are used in concurrent programming to avoid the simultaneous use of a common resource, such as a global variable, by pieces of computer code called critical sections....
es are used to address this problem in concurrent programming.

Real-world examples


File systems
In file system
File system

In computing, a file system is a method for store and organize computer files and the data they contain to make it easy to find and access them....
s, two or more programs may "collide" in their attempts to modify or access a file, which could result in data corruption. File locking
File locking

File locking is a mechanism that enforces access to a computer file by only one user or computer process at any specific time. The purpose of locking is to prevent the classic interceding update scenario....
 provides a commonly-used solution. A more cumbersome remedy involves organizing the system in such a way that one unique process (running a daemon
Daemon (computer software)

In Unix and other computer computer multitasking operating systems, a daemon is a computer program that runs in the background , rather than under the direct control of a user; they are usually initiated as background Computer processes....
 or the like) has exclusive access to the file, and all other processes that need to access the data in that file do so only via interprocess communication with that one process (which of course requires synchronization at the process level).

A different form of race condition exists in file systems where unrelated programs may affect each other by suddenly using up available resources such as disk space (or memory, or processor cycles). Software not carefully designed to anticipate and handle this rare situation may then become quite fragile and unpredictable. Such a risk may be overlooked for a long time in a system that seems very reliable. But eventually enough data may accumulate or enough other software may be added to critically destabilize many parts of a system. Probably the best known example of this occurred with the near-loss of the Mars Rover "Spirit"
Spirit rover

MER-A , known as Spirit, is the first of the two rover s of NASA's Mars Exploration Rover Mission. It landed successfully on Mars on 04:35 Ground UTC on January 4, 2004, three weeks before its twin Opportunity rover landed on the other side of the planet....
 not long after landing, but this is a commonly overlooked hazard in many computer systems. A solution is for software to request and reserve all the resources it will need before beginning a task; if this request fails then the task is postponed, avoiding the many points where failure could have occurred. (Alternately, each of those points can be equipped with error handling, or the success of the entire task can be verified afterwards, before continuing on.) A more common but incorrect approach is to simply verify that enough disk space (for example) is available before starting a task; this is not adequate because in complex systems the actions of other running programs can be unpredictable.

Networking
In networking, consider a distributed chat network like IRC, where a user acquires channel-operator privileges in any channel he starts. If two users on different servers, on different ends of the same network, try to start the same-named channel at the same time, each user's respective server will grant channel-operator privileges to each user, since neither server will yet have received the other server's signal that it has allocated that channel. (Note that this problem has been largely solved
Internet Relay Chat

Internet Relay Chat is a form of real-time Internet text messaging or synchronous conferencing. It is mainly designed for Many-to-many in discussion forums, called #Channels, but also allows One-to-one via instant messaging, as well as chat and data transfers via Direct Client-to-Client....
 by various IRC server implementations.)

In this case of a race condition, the concept of the "shared resource
Resource (computer science)

A resource, or system resource, is any physical or virtual component of limited availability within a computer system. Every device connected to a computer system is a resource....
" covers the state of the network (what channels exist, as well as what users started them and therefore have what prk operation). Where users find such a solution unacceptable, a pragmatic solution can have the system 1) recognize when a race condition has occurred; and 2) repair the ill effects.

Life-critical systems
Software flaws in life-critical system
Life-critical system

A life-critical system or safety-critical system is a system whose failure ormalfunction may result in:* death or serious injury to people, or...
s can be disastrous. Race conditions were among the flaws in the Therac-25
Therac-25

The Therac-25 was a radiation therapy machine produced by Atomic Energy of Canada Limited and CGR of France after the Therac-6 and Therac-20 units....
 radiation therapy
Radiation therapy

Radiation therapy is the medicine use of ionizing radiation as part of cancer oncology to control malignant cell s . Radiotherapy may be used for curative or Adjuvant chemotherapy cancer treatment....
 machine, which led to the death of three patients and injuries to several more. Another example is the Energy Management System provided by GE Energy and used by Ohio-based FirstEnergy Corp. (among other power facilities). A race condition existed in the alarm subsystem; when three sagging power lines were tripped simultaneously, the condition prevented alerts from being raised to the monitoring technicians, delaying their awareness of the problem. This software flaw eventually led to the North American Blackout of 2003. GE Energy later developed a software patch to correct the previously undiscovered error.

Computer security

A specific kind of race condition involves checking for a predicate (e.g. for authentication
Authentication

Authentication is the act of establishing or confirming something as authentic, that is, that claims made by or about the subject are true....
), then acting on the predicate, while the state can change between the time of check and the time of use. When this kind of bug exists in security
Computer security

Computer security is a branch of technology known as information security as applied to computers. The objective of computer security can include protection of information from theft or corruption, or the preservation of availability, as defined in the security policy....
-conscious code, a security vulnerability called a time-of-check-to-time-of-use
Time-of-check-to-time-of-use

A time-of-check-to-time-of-use bug is a software bug caused by changes in a system between the checking of a condition and the use of the results of that check....
 (TOCTTOU) bug is created.

Asynchronous finite state machines

Even after ensuring that single bit transitions occur between states, the asynchronous machine
Finite state machine

A finite state machine or finite state automaton or simply a state machine, is a model of behavior composed of a finite number of state s, transitions between those states, and actions....
 will fail if multiple inputs change at the same time. The solution to this is to design a machine so that each state is sensitive to only one input change.

See also

  • Concurrency control
    Concurrency control

    In computer science, especially in the fields of computer programming , operating systems , multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible....
  • Deadlock
    Deadlock

    A deadlock is a situation wherein two or more competing actions are waiting for the other to finish, and thus neither ever does. It is often seen in a paradox like 'the chicken or the egg'....
  • Synchronization
    Synchronization

    Synchronization or synchronisation is timekeeping which requires the coordination of events to operate a system in unison. The familiar Conducting of an orchestra serves to keep the orchestra in time....
  • Therac-25
    Therac-25

    The Therac-25 was a radiation therapy machine produced by Atomic Energy of Canada Limited and CGR of France after the Therac-6 and Therac-20 units....
  • Linearizability
    Linearizability

    In concurrent programming, an operation is atomic, or linearizable, if it appears to take effect instantaneously. Implementation details may be ignored by the user, except insofar as they affect performance....


External links

  • Paper "" by Robert M. Fuhrer, Bill Lin and Steven M. Nowick
  • Paper "" by Luciano Lavagno, Cho W. Moon, Robert K. Brayton and Alberto Sangiovanni-Vincentelli
    Alberto Sangiovanni-Vincentelli

    Alberto Sangiovanni-Vincentelli is an academic researcher, teacher, entrepreneur, technical advisor and business man. He is a co-founder of the two largest Electronic design automation companies: Cadence Design Systems and Synopsys, Inc., a recipient of Phil Kaufman Award, 2001....
  • Article "" by David A. Wheeler
    David A. Wheeler

    David A. Wheeler is a computer scientist . He is best known for his work on Open source software/Free software and Computer security....
  • Chapter "" (Secure Programming for Linux and Unix HOWTO)
  • , with sample source code and comparison to C code, by Chiral Software