Snapshot isolation
In databases and transaction processing (transaction management), snapshot isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database (in practice it reads the last committed values that existed at the time it started), and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot.

Snapshot isolation has been adopted by several major database management systems, such as SQL Anywhere, InterBase, Firebird, Oracle, PostgreSQL and Microsoft SQL Server (2005 and later). The main reason for its adoption is that it allows better performance than serializability, yet still avoids most of the concurrency anomalies that serializability avoids (though not all of them). In practice, snapshot isolation is implemented within multiversion concurrency control (MVCC), which maintains generational values (versions) of each data item. MVCC is a common way to increase concurrency and performance: it generates a new version of a database object each time the object is written, and lets each transaction's reads use whichever of the recent versions of each object is relevant to it. Snapshot isolation has also been used to critique the ANSI SQL-92 standard's definition of isolation levels, as it exhibits none of the "anomalies" that the SQL standard prohibited, yet is not serializable (the anomaly-free isolation level defined by ANSI).
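To make the version-keeping idea concrete, here is a minimal in-memory sketch of a multiversion store. It assumes a single process and integer logical commit timestamps; the class `VersionedStore` and its `write`/`read` methods are hypothetical illustrations, not the API of any real DBMS. Each write appends a new version instead of overwriting, and a read at snapshot time `snapshot_ts` returns the newest version committed at or before that time.

```python
import bisect


class VersionedStore:
    """Hypothetical multiversion key-value store (illustration only)."""

    def __init__(self):
        # key -> parallel lists of commit timestamps (ascending) and values
        self._timestamps = {}
        self._values = {}

    def write(self, key, value, commit_ts):
        # Every write creates a new version; older versions are kept.
        # Assumes commit timestamps for a key arrive in increasing order.
        self._timestamps.setdefault(key, []).append(commit_ts)
        self._values.setdefault(key, []).append(value)

    def read(self, key, snapshot_ts):
        # Newest version whose commit timestamp is <= snapshot_ts.
        stamps = self._timestamps.get(key, [])
        i = bisect.bisect_right(stamps, snapshot_ts)
        if i == 0:
            raise KeyError(f"{key!r} has no version visible at {snapshot_ts}")
        return self._values[key][i - 1]


store = VersionedStore()
store.write("x", "v1", commit_ts=1)
store.write("x", "v2", commit_ts=5)
print(store.read("x", 3))  # a snapshot taken at time 3 still sees "v1"
print(store.read("x", 5))  # a snapshot taken at time 5 sees "v2"
```

Because old versions are never overwritten, a reader working from an older snapshot does not have to wait for, or be blocked by, a writer producing a newer version. Real systems also need garbage collection of versions no snapshot can still see, which this sketch omits.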

Snapshot isolation is called "serializable" mode in Oracle, and in PostgreSQL versions prior to 9.1, which may cause confusion with the "real serializability" mode. There are arguments both for and against this decision; what is clear is that users must be aware of the distinction to avoid possible undesired anomalous behavior in their database system logic.

Definition

A transaction executing under snapshot isolation appears to operate on a personal snapshot of the database, taken at the start of the transaction. When the transaction concludes, it will successfully commit only if the values updated by the transaction have not been changed externally since the snapshot was taken. Such a write-write conflict will cause the transaction to abort.
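The commit rule in this definition, often called "first-committer-wins", can be sketched as a small in-memory model. This is an illustration under simplifying assumptions (one process, no real parallelism, integer logical timestamps); the classes `SnapshotStore` and `Transaction` are hypothetical and do not correspond to any real database API. Reads come from the snapshot fixed at `begin()`, writes are buffered privately, and `commit()` aborts if any written key received a newer committed version after the snapshot was taken.

```python
class SnapshotStore:
    """Hypothetical in-memory store giving snapshot isolation (sketch)."""

    def __init__(self, initial):
        # key -> list of (commit_ts, value), oldest first
        self.versions = {k: [(0, v)] for k, v in initial.items()}
        self.clock = 0  # logical timestamp of the latest commit

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.start_ts = store.clock  # snapshot: everything committed so far
        self.writes = {}  # buffered, private updates

    def read(self, key):
        if key in self.writes:  # a transaction sees its own writes
            return self.writes[key]
        for ts, value in reversed(self.store.versions[key]):
            if ts <= self.start_ts:
                return value
        raise KeyError(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        """Return True on commit, False if a write-write conflict forces an abort."""
        for key in self.writes:
            history = self.store.versions.get(key, [])
            # Another transaction committed a newer version after our snapshot.
            if history and history[-1][0] > self.start_ts:
                return False
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))
        return True


store = SnapshotStore({"x": 10})
t1, t2 = store.begin(), store.begin()  # both snapshots see x == 10
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 5)
ok1 = t1.commit()  # first committer wins
ok2 = t2.commit()  # x changed since t2's snapshot, so t2 aborts
print(ok1, ok2)  # True False
```

Aborting the later committer, rather than letting it overwrite, prevents the lost update that would otherwise turn x into 15 and silently discard t1's increment.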

In a write skew anomaly, two transactions (T1 and T2) concurrently read an overlapping data set (e.g. values V1 and V2), concurrently make disjoint updates (e.g. T1 updates V1, T2 updates V2), and finally concurrently commit, neither having seen the update performed by the other. Were the system serializable, such an anomaly would be impossible, as either T1 or T2 would have to occur "first", and be visible to the other. In contrast, snapshot isolation permits write skew anomalies.

As a concrete example, imagine V1 and V2 are two balances held by a single person, Phil. The bank will allow either V1 or V2 to run a deficit, provided the total held in both is never negative (i.e. V1 + V2 ≥ 0). Both balances are currently $100. Phil initiates two transactions concurrently, T1 withdrawing $200 from V1, and T2 withdrawing $200 from V2.

If the database guaranteed serializable transactions, the simplest way of coding T1 is to deduct $200 from V1, and then verify that V1 + V2 ≥ 0 still holds, aborting if not. T2 similarly deducts $200 from V2 and then verifies V1 + V2 ≥ 0. Since the transactions must serialize, either T1 happens first, leaving V1 = -$100, V2 = $100, and preventing T2 from succeeding (since V1 + (V2 - $200) is now -$200), or T2 happens first and similarly prevents T1 from committing.
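A serial execution can be simulated by running each transaction as a function over the committed state, one after the other. The helper `withdraw` is a hypothetical illustration: it deducts the amount, checks V1 + V2 ≥ 0, and returns either the new state or `None` to signal an abort.

```python
def withdraw(balances, account, other, amount):
    """Deduct, then check V1 + V2 >= 0; return the new state or None (abort)."""
    new = dict(balances)  # work on a copy so an abort leaves no trace
    new[account] -= amount
    if new[account] + new[other] < 0:
        return None  # constraint violated: abort
    return new


state = {"V1": 100, "V2": 100}
after_t1 = withdraw(state, "V1", "V2", 200)     # T1 runs first
after_t2 = withdraw(after_t1, "V2", "V1", 200)  # T2 sees T1's update
print(after_t1)  # {'V1': -100, 'V2': 100}
print(after_t2)  # None: T2 aborts because the total would be -200
```

Running T2 first instead gives the symmetric result: T2 succeeds and T1 aborts.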

Under snapshot isolation, however, T1 and T2 operate on private snapshots of the database: each deducts $200 from an account and then verifies that the new total is non-negative, using the value the other account held when the snapshot was taken. Each check therefore sees a total of exactly $0 and passes. Since neither update conflicts with the other, both commit successfully, leaving V1 = V2 = -$100, and V1 + V2 = -$200.
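The same two withdrawals can be replayed against a tiny snapshot-isolation model to show the write skew. The class `SIDatabase` is a hypothetical sketch (one process, logical timestamps, first-committer-wins on write-write conflicts only), not any real system's implementation. Both transactions take their snapshots before either commits, and because their write sets are disjoint, commit-time validation finds nothing to reject.

```python
class SIDatabase:
    """Tiny snapshot-isolation model (hypothetical, for illustration)."""

    def __init__(self, data):
        self.data = dict(data)  # latest committed values
        self.last_write = {k: 0 for k in data}  # key -> commit ts of latest write
        self.clock = 0

    def snapshot(self):
        # A transaction's private view: a copy of committed state plus its start time.
        return dict(self.data), self.clock

    def commit(self, start_ts, writes):
        # First-committer-wins: only write-write conflicts are checked.
        if any(self.last_write.get(k, 0) > start_ts for k in writes):
            return False
        self.clock += 1
        for k, v in writes.items():
            self.data[k] = v
            self.last_write[k] = self.clock
        return True


def withdraw(view, account, other, amount):
    # Deduct, then check the constraint against the (possibly stale) snapshot.
    new_balance = view[account] - amount
    if new_balance + view[other] < 0:
        return None  # abort
    return {account: new_balance}


db = SIDatabase({"V1": 100, "V2": 100})
view1, ts1 = db.snapshot()  # T1 starts
view2, ts2 = db.snapshot()  # T2 starts concurrently
w1 = withdraw(view1, "V1", "V2", 200)  # sees V1 + V2 == 0, passes
w2 = withdraw(view2, "V2", "V1", 200)  # sees V1 + V2 == 0, passes
ok1 = w1 is not None and db.commit(ts1, w1)
ok2 = w2 is not None and db.commit(ts2, w2)  # disjoint write sets: no conflict
print(ok1, ok2, db.data)  # True True {'V1': -100, 'V2': -100}
```

The constraint V1 + V2 ≥ 0 is now violated even though each transaction checked it, which is exactly the anomaly a serializable system would have prevented.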

If built on multiversion concurrency control, snapshot isolation allows transactions to proceed without worrying about concurrent operations, and, more importantly, without needing to re-verify all read operations when the transaction finally commits. The only information that must be stored during the transaction is a list of the updates made, which can be scanned for conflicts fairly easily before committing.
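The commit-time check described here can be reduced to a single function over the write set. In this sketch, `validate_write_set` and the `last_commit_ts` map (key to the logical time of its most recent committed write) are hypothetical names; how a real system stores that information varies. The point is that the read set is never consulted, so a read-only transaction always commits.

```python
def validate_write_set(write_set, start_ts, last_commit_ts):
    """Commit-time SI check that inspects only the committing transaction's writes.

    write_set:      keys the transaction updated
    start_ts:       logical time at which its snapshot was taken
    last_commit_ts: key -> logical time of the most recent committed write
    """
    # Reads came from a fixed snapshot, so they need no re-verification here.
    return all(last_commit_ts.get(key, 0) <= start_ts for key in write_set)


last_commit_ts = {"a": 3, "b": 7}  # "b" was rewritten at time 7
print(validate_write_set({"a"}, 5, last_commit_ts))  # True
print(validate_write_set({"b"}, 5, last_commit_ts))  # False: conflict on "b"
print(validate_write_set(set(), 5, last_commit_ts))  # True: read-only always commits
```

The cost of validation grows with the number of keys written, not with the number read, which is one source of the performance advantage mentioned above, and also why write skew (a conflict visible only through reads) goes undetected.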

Making Snapshot Isolation Serializable

Fekete et al. (2005) have shown that potential inconsistency problems arising from write skew anomalies can be fixed by adding (otherwise unnecessary) updates to the transactions in order to enforce the serializability property. Two such techniques are:
  • Materialize the conflict: Add a special conflict table, which both transactions update in order to create a direct write-write conflict.
  • Promotion: Have one transaction "update" a read-only location (replacing a value with the same value) in order to create a direct write-write conflict (or use an equivalent promotion, e.g. Oracle's SELECT FOR UPDATE).


In the example above, we can materialize the conflict by adding a new table that makes the hidden constraint explicit, mapping each person to their total balance. Phil would start off with a total balance of $200, and each transaction would attempt to subtract $200 from this, creating a write-write conflict that would prevent the two from succeeding concurrently. This approach violates database normal form, because the stored total duplicates information already derivable from V1 and V2.
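Materializing the conflict can be illustrated with the same kind of toy model as before. The key `"Total"` stands in for Phil's row in the hypothetical new table, and `SIDatabase` is again a hypothetical first-committer-wins sketch, not a real API. Each withdrawal now checks and writes the total, so both transactions write a common key and the second committer aborts.

```python
class SIDatabase:
    """Tiny first-committer-wins snapshot-isolation model (hypothetical)."""

    def __init__(self, data):
        self.data = dict(data)
        self.last_write = {k: 0 for k in data}  # key -> commit ts of latest write
        self.clock = 0

    def snapshot(self):
        return dict(self.data), self.clock

    def commit(self, start_ts, writes):
        if any(self.last_write.get(k, 0) > start_ts for k in writes):
            return False  # write-write conflict: abort
        self.clock += 1
        for k, v in writes.items():
            self.data[k] = v
            self.last_write[k] = self.clock
        return True


def withdraw_with_total(view, account, amount):
    # The constraint is checked against the materialized total, and the total
    # is always written, so any two concurrent withdrawals conflict on it.
    new_total = view["Total"] - amount
    if new_total < 0:
        return None  # abort
    return {account: view[account] - amount, "Total": new_total}


db = SIDatabase({"V1": 100, "V2": 100, "Total": 200})
view1, ts1 = db.snapshot()
view2, ts2 = db.snapshot()
w1 = withdraw_with_total(view1, "V1", 200)
w2 = withdraw_with_total(view2, "V2", 200)
ok1 = w1 is not None and db.commit(ts1, w1)
ok2 = w2 is not None and db.commit(ts2, w2)  # "Total" changed since ts2
print(ok1, ok2, db.data)  # True False {'V1': -100, 'V2': 100, 'Total': 0}
```

If T2 retried after the abort, its new snapshot would show a total of $0, and the check would correctly reject the second withdrawal.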

Alternatively, we can promote one of the transactions' reads to a write. For instance, T2 could set V1 = V1, creating an artificial write-write conflict with T1 and, again, preventing the two from succeeding concurrently. This solution may not always be possible.
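Promotion can be sketched in the same hypothetical toy model: T2 writes back the value of V1 it read, unchanged, so its write set overlaps T1's. `SIDatabase` and the `promote_other` flag are illustrative names only. Because T1 writes V1 and T2 now also writes V1, whichever commits second aborts, so promoting the read in just one of the two transactions is enough.

```python
class SIDatabase:
    """Tiny first-committer-wins snapshot-isolation model (hypothetical)."""

    def __init__(self, data):
        self.data = dict(data)
        self.last_write = {k: 0 for k in data}  # key -> commit ts of latest write
        self.clock = 0

    def snapshot(self):
        return dict(self.data), self.clock

    def commit(self, start_ts, writes):
        if any(self.last_write.get(k, 0) > start_ts for k in writes):
            return False  # write-write conflict: abort
        self.clock += 1
        for k, v in writes.items():
            self.data[k] = v
            self.last_write[k] = self.clock
        return True


def withdraw(view, account, other, amount, promote_other=False):
    new_balance = view[account] - amount
    if new_balance + view[other] < 0:
        return None  # abort
    writes = {account: new_balance}
    if promote_other:
        # Promotion: rewrite the value we only read, unchanged, so that a
        # concurrent writer of `other` now produces a write-write conflict.
        writes[other] = view[other]
    return writes


db = SIDatabase({"V1": 100, "V2": 100})
view1, ts1 = db.snapshot()
view2, ts2 = db.snapshot()
w1 = withdraw(view1, "V1", "V2", 200)
w2 = withdraw(view2, "V2", "V1", 200, promote_other=True)  # T2 sets V1 = V1
ok1 = w1 is not None and db.commit(ts1, w1)
ok2 = w2 is not None and db.commit(ts2, w2)  # conflicts with T1 on V1
print(ok1, ok2, db.data)  # True False {'V1': -100, 'V2': 100}
```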

In general, therefore, snapshot isolation puts some of the problem of maintaining non-trivial constraints onto the user, who may not appreciate either the potential pitfalls or the possible solutions. The upside to this transfer is better performance.

However, the unusual situation in which a database system's users are responsible for guaranteeing serializability through careful programming is changing. Cahill et al. (2008) proposed serializable snapshot isolation (SerializableSI), an integrated snapshot isolation (SI) solution for serializability that is transparent to the user. It is a low-overhead modification of the SI technique that ensures serializability. Performance results reported there, obtained with a database system slightly modified to use the technique, show that it performs close to SI, achieving serializability with only a small penalty for the transaction loads tested. The first practical implementation was included in version 9.1 of PostgreSQL.

It is also possible that a similar solution has existed unnoticed since 1993. Fekete et al. (2005) use a theory developed earlier in Raz (1993) for multi-version commitment ordering (see Multi-version CO (MVCO) in Commitment ordering), without being aware of MVCO, using it, or referencing it. MVCO is, however, referenced in later articles on the subject, e.g. Cahill et al. (2008), and explicitly outlined in a presentation by Fekete (2009), which summarizes several years of research in this area. These articles use the theory to analyze conflicts in SI without using MVCO itself. Combining SI with MVCO (COSI) also makes SI serializable, with relatively low overhead. Furthermore, because the resulting combination is MVCO compliant, COSI-compliant database systems can transparently participate in a commitment ordering (CO) solution for distributed/global serializability. No performance comparison of SI with COSI (the combination of MVCO and SI) is available yet, and it is also unclear how COSI compares with SerializableSI, the method of Cahill et al. (2008). However, SI and MVCO correspond closely in the sense that every serializable SI schedule can be made MVCO compliant by COSI, possibly at the cost of delaying some commits but without aborting any transactions. SerializableSI, by contrast, is known to unnecessarily abort and restart a certain percentage of transactions even in serializable SI schedules.
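The core check behind SerializableSI, as commonly summarized, builds on a result of Fekete et al. (2005): every non-serializable SI execution contains a "pivot" transaction with both an incoming and an outgoing read-write antidependency (one transaction reads a key that a concurrent transaction writes). The sketch below is a heavily simplified, offline version of that test, assuming every listed transaction is concurrent with every other and ignoring commit order; `rw_antidependencies` and `pivots` are hypothetical helpers, not PostgreSQL's implementation, which tracks reads at run time and aborts conservatively.

```python
from itertools import permutations


def rw_antidependencies(txns):
    """Edges (a, b) where a read a key that concurrent transaction b wrote.

    txns: name -> (read_set, write_set); all transactions are assumed to be
    mutually concurrent, which is a simplification.
    """
    return {
        (a, b)
        for a, b in permutations(txns, 2)
        if txns[a][0] & txns[b][1]
    }


def pivots(txns):
    """Transactions with both an incoming and an outgoing rw-antidependency."""
    edges = rw_antidependencies(txns)
    has_in = {b for _, b in edges}
    has_out = {a for a, _ in edges}
    return has_in & has_out


# The write-skew example: each reads V1 and V2, each writes one of them.
write_skew = {
    "T1": ({"V1", "V2"}, {"V1"}),
    "T2": ({"V1", "V2"}, {"V2"}),
}
print(sorted(pivots(write_skew)))  # ['T1', 'T2']: aborting one breaks the cycle
```

Transactions that touch disjoint data produce no edges and no pivots. Because the test flags a pattern that is necessary but not sufficient for a cycle, it can abort transactions that would have been serializable, which is the source of the unnecessary aborts mentioned above.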

History

Snapshot isolation arose from work on multiversion concurrency control databases, where multiple versions of the database are maintained concurrently to allow readers to execute without colliding with writers. Such a system allows a natural definition and implementation of this isolation level. InterBase, later owned by Borland, provided SI as far back as 1984.

Unfortunately, the ANSI SQL-92 standard was written with a lock-based database in mind, and hence is rather vague when applied to MVCC systems. Berenson et al. wrote a paper in 1995 critiquing the SQL standard, and cited snapshot isolation as an example of an isolation level that did not exhibit the standard anomalies described in the ANSI SQL-92 standard, yet still had anomalous behaviour when compared with serializable transactions.

Further reading

  • Gerhard Weikum, Gottfried Vossen, Transactional information systems: theory, algorithms, and the practice of concurrency control and recovery, Morgan Kaufmann, 2002, ISBN 1558605088
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.