ACID
In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. For example, a transfer of funds from one bank account to another, even though it might involve multiple changes (such as debiting one account and crediting another), is a single transaction.

Jim Gray defined these properties of a reliable transaction system in
the late 1970s and developed technologies to automatically achieve them.
In 1983, Andreas Reuter and Theo Härder coined the acronym ACID to describe them.

Atomicity

Atomicity requires that database modifications must follow an "all or nothing" rule. Each transaction is said to be atomic. If one part of the transaction fails, the entire transaction fails and the database state is left unchanged.

To be compliant with the 'A', a system must guarantee atomicity in every situation, including power failures, errors, and crashes.

This guarantees that an incomplete transaction can never be observed: to the outside world, a transaction appears either to have happened in full or not at all.

Consistency

The consistency property ensures that any transaction the database performs will take it from one consistent state to another.

Consistency states that only consistent data (valid according to all the rules defined) will be written to the database.

Put simply, every row affected by the transaction must remain consistent with every rule that applies to it, including constraints, cascades, and triggers.

This requirement applies to everything changed by the transaction, without limit: triggers may fire other triggers or launch cascades that fire further triggers, and all of the resulting changes must still satisfy the rules.

Isolation

Isolation refers to the requirement that no transaction should be able to interfere with another transaction at all.

In other words, two concurrent transactions should not be able to affect the same rows in a way that makes the outcome unpredictable, since that would make the system unreliable.

This property of ACID is often relaxed (i.e. only partly respected) because of the large performance cost that strict concurrency management implies.

In effect, the only strict way to respect the isolation property is to use a serial model, in which no two transactions can operate on the same data at the same time and the result is predictable (i.e. transaction B happens after transaction A in every possible case).

In practice, many alternatives are used for the sake of speed, but none of them guarantees the same reliability.

Durability

Durability means that once a transaction has been committed, it will remain so.

In other words, every committed transaction is protected against power loss, crashes, and errors; it cannot be lost by the system and is therefore guaranteed to persist.

In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently. If the database crashes right after a group of SQL statements execute, it should be possible to restore the database state to the point after the last transaction committed.

Examples

The following examples are used to further explain the ACID properties. In these examples, the database has two fields, A and B, held in a single record. An integrity constraint requires that the value in A and the value in B must sum to 100. The following SQL code creates a table as described above:

CREATE TABLE acidtest (A INTEGER, B INTEGER CHECK (A + B = 100));
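
For illustration, the table could be seeded with a single record whose values already satisfy the constraint; the particular values 40 and 60 are arbitrary and are assumed by the sketches below:

INSERT INTO acidtest (A, B) VALUES (40, 60);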

Atomicity failure

The transaction subtracts 10 from A and adds 10 to B. If it succeeds, it would be valid, because the data continues to satisfy the constraint. However, assume that after removing 10 from A, the transaction is unable to modify B. If the database retains A's new value, atomicity and the constraint would both be violated. Atomicity requires that both parts of this transaction complete or neither.
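
A minimal sketch of the transfer in SQL, assuming the acidtest table above. Because the CHECK constraint is typically verified after each statement, the sketch changes both columns in one UPDATE; the point is that the work between BEGIN and COMMIT either becomes permanent as a whole or, on failure or ROLLBACK, leaves the database unchanged:

BEGIN;
-- Transfer 10 from A to B; both changes belong to one atomic unit of work.
UPDATE acidtest SET A = A - 10, B = B + 10;
COMMIT;
-- Had the transaction failed (or issued ROLLBACK) before COMMIT,
-- neither column would have changed.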

Consistency failure

Consistency is a very general term that demands that the data meet all validation rules. In the previous example, the validation is a requirement that A + B = 100. Also, it may be implied that both A and B must be integers. A valid range for A and B may also be implied. All validation rules must be checked to ensure consistency.

Assume that a transaction attempts to subtract 10 from A without altering B. Because consistency is checked after each transaction, it is known that A + B = 100 before the transaction begins. If the transaction removes 10 from A successfully, atomicity will be achieved. However, a validation check will show that A + B = 90. That is not consistent according to the rules of the database. The entire transaction must be cancelled and the affected rows rolled back to their pre-transaction state.
In practice, this goes much further than just A and B: it covers every cascade and trigger chain set off by the transaction, and therefore every check on every value that is indirectly affected as well.
If there had been other constraints, triggers, or cascades, every individual change would have been checked in the same way before the transaction was committed.
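
In SQL terms, attempting the inconsistent change directly is rejected by the constraint. This is a sketch assuming a system that checks the constraint as the statement runs:

BEGIN;
UPDATE acidtest SET A = A - 10;   -- would leave A + B = 90
-- The statement violates CHECK (A + B = 100), so it fails and the
-- transaction cannot commit in this state; it must be rolled back.
ROLLBACK;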

Isolation failure

To demonstrate isolation, we assume two transactions execute at the same time, each attempting to modify the same data. One of the two must wait until the other completes in order to maintain isolation.

Consider two transactions. T1 transfers 10 from A to B. T2 transfers 10 from B to A. Combined, there are four actions:
  • T1 subtracts 10 from A
  • T1 adds 10 to B
  • T2 subtracts 10 from B
  • T2 adds 10 to A


If these operations are performed in order, isolation is maintained, although T2 must wait. Consider what happens if T1 fails half-way through: the database eliminates T1's effects, and T2 sees only valid data.

By interleaving the transactions, the actual order of actions might instead be: T1 subtracts 10 from A, T2 adds 10 to A, T2 subtracts 10 from B, T1 adds 10 to B. Again consider what happens if T1 fails, this time while modifying B. T1 has already subtracted 10 from A, and T2 has added 10 to A, restoring it to its initial value. Now T1 fails. What should A's value be? T2 has already changed it. Also, T1 never changed B, and T2 has subtracted 10 from it. If T2 is allowed to complete, B's value will be 10 too low and A's value will be unchanged, leaving an invalid database. This is known as a write-write failure, because two transactions attempted to write to the same data field.
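
One way to obtain the serial behaviour described above is to run both transactions at the serializable isolation level. The exact syntax varies by system; a PostgreSQL-style sketch for T1 looks like this (T2 would be written the same way with A and B swapped):

BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE acidtest SET A = A - 10, B = B + 10;   -- T1: transfer 10 from A to B
COMMIT;
-- Under serializable isolation, the system either makes concurrent
-- transactions behave as if they ran one after the other, or aborts one of them.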

Durability failure

Assume that a transaction transfers 10 from A to B. It removes 10 from A. It then adds 10 to B. At this point, a "success" message is sent to the user. However, the changes are still queued in the disk buffer, waiting to be committed to the disk. Power fails and the changes are lost. The user assumes that the changes have been made.

Implementation

Processing a transaction often requires a sequence of operations that is subject to failure for a number of reasons. For instance, the system may have no room left on its disk drives, or it may have used up its allocated CPU time.

There are two popular families of techniques: write-ahead logging and shadow paging. In both cases, locks must be acquired on all information that is updated and, depending on the level of isolation, possibly on all data that is read as well. In write-ahead logging, atomicity is guaranteed by copying the original (unchanged) data to a log before changing the database. That allows the database to return to a consistent state in the event of a crash.

In shadowing, updates are applied to a partial copy of the database, and the new copy is activated when the transaction commits.

Locking vs multiversioning

Many databases rely upon locking to provide ACID capabilities. Locking means that the transaction marks the data that it accesses so that the DBMS knows not to allow other transactions to modify it until the first transaction succeeds or fails. The lock must always be acquired before processing data, including data that are read but not modified. Non-trivial transactions typically require a large number of locks, resulting in substantial overhead as well as blocking other transactions. For example, if user A is running a transaction that has to read a row of data that user B wants to modify, user B must wait until user A's transaction completes. Two-phase locking is often applied to guarantee full isolation.
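
As a sketch of lock-based access (SELECT ... FOR UPDATE is widely, though not universally, supported), a transaction can explicitly lock the row it is about to change so that concurrent writers must wait:

BEGIN;
SELECT A, B FROM acidtest FOR UPDATE;         -- acquire a write lock on the row
UPDATE acidtest SET A = A - 10, B = B + 10;   -- modify it while holding the lock
COMMIT;                                       -- releases the lock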

An alternative to locking is multiversion concurrency control (MVCC), in which the database provides each reading transaction with the prior, unmodified version of data that is being modified by another active transaction. This allows readers to operate without acquiring locks: writing transactions do not block reading transactions, and readers do not block writers. Going back to the example, when user A's transaction requests data that user B is modifying, the database provides A with the version of that data that existed when user B started his transaction. User A gets a consistent view of the database even if other users are changing data. One implementation, namely snapshot isolation, relaxes the isolation property.
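
A sketch of the reader's view under snapshot-style isolation (PostgreSQL's REPEATABLE READ level behaves this way): user A's reads keep returning the same data even if user B commits changes in the meantime.

BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT A, B FROM acidtest;   -- both reads in this transaction see the same snapshot
-- ... another session commits an update to the same row here ...
SELECT A, B FROM acidtest;   -- still returns the original values, without blocking
COMMIT;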

Distributed transactions

Guaranteeing ACID properties in a distributed transaction across a distributed database, where no single node is responsible for all data affecting a transaction, presents additional complications. Network connections might fail, or one node might successfully complete its part of the transaction and then be required to roll back its changes because of a failure on another node. The two-phase commit protocol (not to be confused with two-phase locking) provides atomicity for distributed transactions, ensuring that each participant in the transaction agrees on whether the transaction should be committed or not. Briefly, in the first phase, one node (the coordinator) interrogates the other nodes (the participants), and only when all reply that they are prepared does the coordinator, in the second phase, formalize the transaction.
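
At the SQL level, a participant's side of the protocol can be sketched with PostgreSQL-style prepared transactions; the transaction identifier 'transfer_42' is an arbitrary example. The PREPARE step corresponds to the participant's "prepared" vote, and the coordinator later tells every participant to commit or roll back:

BEGIN;
UPDATE acidtest SET A = A - 10, B = B + 10;
PREPARE TRANSACTION 'transfer_42';   -- phase one: work is made durable, vote is "prepared"
-- Phase two, issued once the coordinator has heard "prepared" from everyone:
COMMIT PREPARED 'transfer_42';
-- Had any participant failed to prepare, the coordinator would instead issue:
-- ROLLBACK PREPARED 'transfer_42';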

See also

  • Open Systems Interconnection
  • Transactional NTFS
  • Basically Available, Soft state, Eventual consistency (BASE)
  • Concurrency control
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 