Optimistic replication - AbsoluteAstronomy.com

Optimistic replication is a strategy for replication in which replicas are allowed to diverge.

Traditional pessimistic replication systems try to guarantee from the beginning that all of the replicas are identical to each other, as if there was only a single copy of the data all along. Optimistic replication does away with this in favor of eventual consistency

Eventual consistency

Eventual consistency is one of the consistency models used in the domain of parallel programming, for example in distributed shared memory, distributed transactions, and optimistic replication, it means that given a sufficiently long period of time over which no changes are sent, all updates can be...

, meaning that replicas are guaranteed to converge only when the system has been quiesce

Quiesce

Quiesce is used to describe pausing or altering the state of running processes on a computer, particularly those that might modify information stored on disk during a backup, in order to guarantee a consistent and usable backup...

d for a period of time. As a result there is no longer a need to wait for all of the copies to be synchronized when updating data, which helps concurrency

Concurrency

Concurrency, concurrent, or concurrence may refer to:* Concurrence, a legal term referring to the need to prove both actus reus and mens rea...

and parallelism

Parallelism

Parallelism may refer to:* Angle of parallelism, the angle at one vertex of a right hyperbolic triangle that has two hyperparallel sides* Conscious parallelism, price-fixing between competitors in an oligopoly that occurs without an actual spoken agreement between the parties* Parallel computing,...

. The trade-off is that different replicas may require explicit reconciliation later on, which might then prove either difficult or even insoluble.

Algorithms

An optimistic replication algorithm consists of five elements:

Operation submission: Users submit operations at independent sites.
Propagation: Each site shares the operations it knows about with the rest of the system.
Scheduling: Each site decides on an order for the operations it knows about.
Conflict resolution: If there are any conflicts among the operations a site has scheduled, it must modify them in some way.
Commitment: The sites agree on a final schedule and conflict resolution result, and the operations are made permanent.

There are two strategies for propagation: state transfer, where sites propagate a representation of the current state, and operation transfer, where sites propagate the operations that were performed (essentially, a list of instructions on how to reach the new state).

Scheduling and conflict resolution can either be syntactic or semantic. Syntactic systems rely on general information, such as when or where an operation was submitted. Semantic systems are able to make use of application-specific information to make smarter decisions. Note that state transfer systems generally have no information about the semantics of the data being transferred, and so they have to use syntactic scheduling and conflict resolution.

Examples

One well-known example of a system based on optimistic replication is the CVS

Concurrent Versions System

The Concurrent Versions System , also known as the Concurrent Versioning System, is a client-server free software revision control system in the field of software development. Version control system software keeps track of all work and all changes in a set of files, and allows several developers ...

version control system

Revision control

Revision control, also known as version control and source control , is the management of changes to documents, programs, and other information stored as computer files. It is most commonly used in software development, where a team of people may change the same files...

, or any other version control system which uses the copy-modify-merge paradigm. CVS covers each of the five elements:

Operation submission: Users edit local versions of files.
Propagation: Users manually pull updates from a central server, or push changes out once the user feels they are ready.
Scheduling: Operations are scheduled in the order that they are received by the central server.
Conflict resolution: When a user pushes to or pulls from the central repository, any conflicts will be flagged for that user to fix manually.
Commitment: Once the central server accepts the changes which a user pushes, they are permanently committed.

A special case of replication is synchronization

Data synchronization

Data synchronization is the process of establishing consistency among data from a source to a target data storage and vice versa and the continuous harmonization of the data over time. It is fundamental to a wide variety of applications, including file synchronization and mobile device...

, where there are only two replicas. For example, personal digital assistants (PDAs)

Personal digital assistant

A personal digital assistant , also known as a palmtop computer, or personal data assistant, is a mobile device that functions as a personal information manager. Current PDAs often have the ability to connect to the Internet...

allow users to edit data either on the PDA or a computer, and then to merge these two datasets together. Note, however, that replication is a broader problem than synchronization, since there may be more than two replicas.

Other examples include:

Usenet
Usenet
Usenet is a worldwide distributed Internet discussion system. It developed from the general purpose UUCP architecture of the same name.Duke University graduate students Tom Truscott and Jim Ellis conceived the idea in 1979 and it was established in 1980...

, and other systems which use the Thomas Write Rule (See Rfc677)
Multi-master database replication
Multi-master replication
Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group, and...
The Coda distributed filesystem
Operational Transformation
Operational transformation
Operational transformation is a technology for supporting a range of collaboration functionalities in advanced groupware systems. OT was originally invented for consistency maintenance and concurrency control in collaborative editing of plain text documents...

, a theoretical framework for group editing
Peer-to-peer wikis
The Bayou distributed database
IceCube

Implications

Applications built on top of optimistic replicated databases need to be careful about ensuring that the delayed updates observed do not impair the correctness of the application.

As a simple example, if an application contains a way of viewing some part of the database state, and a way of editing it, then users may edit that state but then not see it changing in the viewer. Alarmed that their edit "didn't work", they may try it again, potentially more than once. If the updates are not idempotent (e.g., they increment a value), this can lead to disaster. Even if they are idempotent, the spurious updates place a burden on the database system - and the situation in which replication delays become particularly noticeable is when the database system is at a high level of load anyway; this can become a vicious circle.

Testing of applications is often done on a testing environment, smaller in size (perhaps only a single server) and less loaded than the "live" environment. The replication behaviour of such an installation may differ from a live environment in ways that mean that replication lag is unlikely to be observed in testing - masking replication-sensitive bugs. Application developers must be very careful about the assumptions they make about the effect of a database update, and must be sure to simulate lag in their testing environments.

Optimistically replicated databases have to be very careful about offering features such as validity constraints on data. If any given update may or may not be accepted based on the current state of the record, then two updates (A and B) may be individually legal against the starting state of the system, but one or more of the updates may not be legal against the state of the system after the other update (e.g., A and B are both legal, but AB or BA are illegal). If A and B are both initiated at roughly the same time within the database, then A may be successfully applied on some nodes and B on others, but as soon as A and B "meet" and one is attempted on a node which has already applied the other, a conflict will be found. The system must, in this case, decide which update finally "wins", and arrange for any nodes that have already applied the losing update to revert it. However, some nodes may temporarily expose the state with the reverted update, and there may be no way to inform the user who initiated the update of its failure, without requiring them to wait (potentially forever) for confirmation of acceptance at every node.

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.