Operational transformation
Operational transformation (OT) is a technology for supporting a range of collaboration functionalities in advanced groupware systems. OT was originally invented for consistency maintenance and concurrency control in collaborative editing of plain text documents. Two decades of research have extended its capabilities and expanded its applications to include group undo, locking, conflict resolution, operation notification and compression, group-awareness, HTML/XML and tree-structured document editing, collaborative office productivity tools, application-sharing, and collaborative computer-aided media design tools (see OTFAQ). In 2009, OT was adopted as a core technique behind the collaboration features in Google Wave and Google Docs, which are taking OT to a new range of web-based applications.

History

Operational Transformation was pioneered by C. Ellis and S. Gibbs in the GROVE (GRoup Outline Viewing Edit) system in 1989. Several years later, some correctness issues were identified and several approaches were independently proposed to solve them, followed by another decade of continuous effort by a community of dedicated researchers to extend and improve OT. In 1998, a Special Interest Group of Collaborative Editing (SIGCE) was set up to promote communication and collaboration among CE and OT researchers. Since then, SIGCE has held annual CE workshops in conjunction with major CSCW (Computer Supported Cooperative Work) conferences, such as ACM CSCW, GROUP and ECSCW.

System architecture

Collaborative systems using OT typically adopt a replicated architecture for the storage of shared documents to ensure good responsiveness in high-latency environments, such as the Internet. The shared documents are replicated in the local storage of each collaborating site, so editing operations can be performed at local sites immediately and then propagated to remote sites. Remote editing operations arriving at a local site are typically transformed and then executed. The transformation ensures that application-dependent consistency criteria are achieved across all sites. The lock-free, nonblocking property of OT makes the local response time insensitive to network latency. As a result, OT is particularly suitable for implementing collaboration features such as group editing in the Web/Internet context.

Basics

The basic idea of OT can be illustrated with a simple text editing scenario. Consider a text document containing the string "abc", replicated at two collaborating sites, and two concurrent operations:
  1. O1 = Insert[0, "x"] (to insert character "x" at position "0")
  2. O2 = Delete[2, "c"] (to delete the character "c" at position "2")


generated by two users at collaborating sites 1 and 2, respectively. Suppose the two operations are executed in the order O1, O2 (at site 1). After executing O1, the document becomes "xabc". To execute O2 after O1, O2 must be transformed against O1 to become O2' = Delete[3, "c"], whose positional parameter is incremented by one because O1 inserted one character "x". Executing O2' on "xabc" deletes the correct character "c", and the document becomes "xab". If O2 were executed without transformation, it would incorrectly delete character "b" rather than "c". The basic idea of OT is to transform (or adjust) the parameters of an editing operation according to the effects of previously executed concurrent operations, so that the transformed operation achieves the correct effect and maintains document consistency.
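
The scenario above can be traced with a short Python sketch. The class names `Insert` and `Delete` and the helper functions are illustrative assumptions, not part of any particular OT system, and only the single delete-against-insert case needed for this example is handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    char: str


@dataclass(frozen=True)
class Delete:
    pos: int
    char: str


def apply(doc, op):
    """Apply a single character-wise operation to a string."""
    if isinstance(op, Insert):
        return doc[:op.pos] + op.char + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform_delete_against_insert(d, i):
    """Shift the delete one position right if the concurrent insert
    landed at or before the position being deleted."""
    if i.pos <= d.pos:
        return Delete(d.pos + 1, d.char)
    return d


o1 = Insert(0, "x")
o2 = Delete(2, "c")

doc = apply("abc", o1)                            # "xabc"
o2_t = transform_delete_against_insert(o2, o1)    # Delete(3, "c")
result = apply(doc, o2_t)                         # "xab"
naive = apply(doc, o2)                            # "xac": "b" is deleted by mistake
```

The `naive` value shows what happens when O2 is executed in its original form after O1.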

Consistency models

One functionality of OT is to support consistency maintenance in collaborative editing systems. A number of consistency models have been proposed in the research community, some generally for collaborative editing systems, and some specifically for OT algorithms.

The CC model

The CC model requires two consistency properties of collaborative editing systems:
  • Precedence (Causality) property: ensures that causally dependent operations are executed in their natural cause-effect order during the process of collaboration. The causal relationship between two operations is defined formally by Lamport's "happened-before" relation. When two operations are not causally dependent, they are concurrent. Two concurrent operations can be executed in different orders on two different document copies.
  • Convergence: ensures that the replicated copies of the shared document are identical at all sites at quiescence (i.e., when all generated operations have been executed at all sites).


Since concurrent operations may be executed in different orders and editing operations are not commutative in general, copies of the document at different sites may diverge (become inconsistent). The first OT algorithm was proposed to achieve convergence in a group text editor; the state vector (or vector clock, in classic distributed computing) was used to preserve the precedence property.
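
As a rough illustration of how vector timestamps separate causally ordered operations from concurrent ones, the following Python sketch uses the classic vector clock comparison. It assumes each operation carries a tuple with one counter per site; the function names are hypothetical, and the exact state-vector scheme of any specific OT system may differ.

```python
def happened_before(a, b):
    """True if vector timestamp a causally precedes b:
    every component of a is <= the matching component of b, and a != b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a, b):
    """True if neither timestamp causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Two sites; component i counts the operations generated at site i.
op_a = (1, 0)   # first operation from site 0
op_b = (0, 1)   # first operation from site 1, generated without seeing op_a
op_c = (1, 1)   # operation from site 1 generated after seeing op_a

a_before_c = happened_before(op_a, op_c)   # True: must execute in causal order
a_conc_b = concurrent(op_a, op_b)          # True: candidates for transformation
```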

The CCI model

The CCI model was proposed as a general framework for consistency management in collaborative editing systems. Under the CCI model, three consistency properties are grouped together:
  • Causality Preservation: the same as the precedence property in the CC Model.
  • Convergence: the same as the convergence property in the CC Model.
  • Intention Preservation: ensures that the effect of executing an operation on any document state be the same as the intention of the operation. The intention of an operation O is defined as the execution effect which can be achieved by applying O on the document state from which O was generated.


The CCI model extends the CC model with a new criterion: Intention Preservation. The essential difference between convergence and intention preservation is that the former can always be achieved by a serialization protocol, but the latter may not be achieved by any serialization protocol if operations were always executed in their original forms. Achieving the nonserialisable intention preservation property has been a major technical challenge. OT has been found particularly suitable for achieving convergence and intention preservation in collaborative editing systems.

The CCI model is independent of document types or data models, operation types, or supporting techniques (OT, multi-versioning, serialization, undo/redo). It was not intended for verifying the correctness of techniques (e.g. OT) that are designed for specific data and operation models and for specific applications. The notion of intention preservation has been defined and refined at three levels: first, as a generic consistency requirement for collaborative editing systems; second, as operation context-based pre- and post-transformation conditions for generic OT functions; third, as specific operation verification criteria to guide the design of OT functions for two primitive operations, string-wise insert and delete, in collaborative plain text editors.

The CSM model

The condition of intention preservation was not formally specified in the CCI model for the purposes of formal proofs. The SDT and LBT approaches try to formalize alternative conditions that can be proved. The consistency model proposed in these two approaches consists of the following formal conditions:
  • Causality: the same definition as in the CC model
  • Single-operation effects: executing any operation in any execution state achieves the same effect as in its generation state
  • Multi-operation effects: the effects relation of any two operations is maintained after they are both executed in any states

The CA model

The above CSM model requires that a total order of all objects in the system be specified. Effectively, the specification is reduced to new objects introduced by insert operations. However, specifying the total order entails application-specific policies such as those to break insertion ties (i.e., new objects inserted by two concurrent operations at the same position). Consequently, the total order becomes application-specific. Moreover, in the algorithm, the total order must be maintained in the transformation functions and control procedure, which increases the time/space complexity of the algorithm.

Alternatively, the CA model is based on the Admissibility Theory. The CA model includes two aspects:
  • Causality: the same definition as in CC Model
  • Admissibility: The invocation of every operation is admissible in its execution state, i.e., every invocation must not violate any effects relation (object ordering) that has been established by earlier invocations.


These two conditions imply convergence. All cooperating sites converge to a state in which the same set of objects appears in the same order. Moreover, the ordering is effectively determined by the effects of the operations when they are generated. Since the two conditions also impose additional constraints on object ordering, they are actually stronger than convergence. The CA model and the design/prove approach are elaborated in a 2005 paper. The model no longer requires that a total order of objects be specified in the consistency model and maintained in the algorithm, which reduces the time/space complexity of the algorithm.

OT system structure

OT is a system of multiple components. One established strategy of designing OT systems is to separate the high-level Transformation Control (or Integration) Algorithms from the low-level Transformation Functions.
The transformation control algorithm is concerned with determining:
  1. Which operation should be transformed against a causally-ready new operation
  2. The order of the transformations

The control algorithm invokes a corresponding set of transformation functions, which determine how to transform one operation against another according to the operation types, positions, and other parameters. The correctness responsibilities of these two layers are formally specified by a set of transformation properties and conditions. Different OT systems with different control algorithms, functions, and communication topologies require maintaining different sets of transformation properties. The separation of an OT system into these two layers allows for the design of generic control algorithms that are applicable to different kinds of application with different data and operation models.
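
The separation of the two layers can be sketched in Python as follows. This is a minimal illustration under strong assumptions: operations are insert-only tuples `(position, character, site_id)`, the list of concurrent operations is already known, and those operations were executed in sequence starting from the state on which the remote operation was generated. Real control algorithms determine concurrency from timestamps and handle more general contexts. The names `integrate` and `it_insert` are hypothetical.

```python
from functools import reduce


def it_insert(op, against):
    """Low-level transformation function: insert against insert.
    Ties at the same position are broken by site identifier."""
    p1, c1, s1 = op
    p2, _, s2 = against
    if p1 < p2 or (p1 == p2 and s1 < s2):
        return op
    return (p1 + 1, c1, s1)


def integrate(remote_op, concurrent_ops, transform):
    """High-level control step: transform the remote operation against
    each concurrent operation, in the order they were executed locally."""
    return reduce(transform, concurrent_ops, remote_op)


def apply_insert(doc, op):
    pos, char, _ = op
    return doc[:pos] + char + doc[pos:]


# Site 1 starts from "abc" and executes two local inserts.
local_ops = [(0, "x", 1), (4, "y", 1)]
doc = "abc"
for op in local_ops:
    doc = apply_insert(doc, op)          # "xabc", then "xabcy"

# Site 2 concurrently inserted "z" between "a" and "b" of the original "abc".
remote = (1, "z", 2)
remote_t = integrate(remote, local_ops, it_insert)   # (2, "z", 2)
doc = apply_insert(doc, remote_t)                    # "xazbcy"
```

Because `integrate` receives the transformation function as a parameter, the same control step could be reused with transformation functions for a different operation model.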

An alternative approach has also been proposed. In this approach, an OT algorithm is correct if it satisfies two formalized correctness criteria:
  1. Causality preservation
  2. Admissibility preservation

As long as these two criteria are satisfied, the data replicas converge (with additional constraints) after all operations are executed at all sites. There is no need to enforce a total order of execution for the sake of achieving convergence. Their approach is generally to first identify and prove sufficient conditions for a few transformation functions, and then design a control procedure to ensure those sufficient conditions. This way the control procedure and transformation functions work synergistically to achieve correctness, i.e., causality and admissibility preservation. In their approach, there is no need to satisfy transformation properties such as TP2 because it does not require that the (inclusive) transformation functions work in all possible cases.

OT data and operation models

There exist two underlying models in each OT system: the data model that defines the way data objects in a document are addressed by operations, and the operation model that defines the set of operations that can be directly transformed by OT functions. Different OT systems may have different data and operation models. For example, the data model of the first OT system is a single linear address space, and its operation model consists of two primitive operations: character-wise insert and delete. The basic operation model has been extended to include a third primitive operation, update, to support collaborative Word document processing and 3D model editing. The basic OT data model has been extended into a hierarchy of multiple linear addressing domains, which is capable of modeling a broad range of documents. A data adaptation process is often required to map application-specific data models to an OT-compliant data model.

There exist two approaches to supporting application level operations in an OT system:
  1. Generic operation model approach: which is to devise transformation functions for three primitive operations: insert, delete, and update. This approach needs an operation adaptation process to map application operations to these primitive operations. In this approach, the OT operation model is generic, so transformation functions can be reused for different applications.
  2. Application-specific operation model approach: which is to devise transformation functions for each pair of application operations. For an application with m different operations, m x m transformation functions are needed for supporting this application. In this approach, transformation functions are application-specific and cannot be reused in different applications.

OT functions

Various OT functions have been designed for OT systems with different capabilities and used for different applications.
OT functions used in different OT systems may be named differently, but they can be classified into two categories:
  • one is Inclusion Transformation (or Forward Transformation): IT(Oa, Ob) or $T(O_a, O_b)$, which transforms operation Oa against another operation Ob in such a way that the impact of Ob is effectively included; and
  • the other is Exclusion Transformation (or Backward Transformation): ET(Oa, Ob) or $T^{-1}(O_a, O_b)$, which transforms operation Oa against another operation Ob in such a way that the impact of Ob is effectively excluded.


For example, suppose a type String with an operation ins(p, c, sid), where p is the position of insertion, c the character to insert, and sid the identifier of the site that has generated the operation. We can write the following transformation functions:

T(ins(p1, c1, sid1), ins(p2, c2, sid2)) :-
    if (p1 < p2) return ins(p1, c1, sid1)
    else if (p1 = p2 and sid1 < sid2) return ins(p1, c1, sid1)
    else return ins(p1 + 1, c1, sid1)

T^-1(ins(p1, c1, sid1), ins(p2, c2, sid2)) :-
    if (p1 < p2) return ins(p1, c1, sid1)
    else if (p1 = p2 and sid1 < sid2) return ins(p1, c1, sid1)
    else return ins(p1 - 1, c1, sid1)
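
A Python rendering of an insert-against-insert transformation pair with site-identifier tie-breaking might look as follows. The `Ins` type and the function names `it` and `et` are assumptions made for illustration; the sketch covers only the insert operation, and the exclusion function is only exercised here as the inverse of the inclusion function.

```python
from typing import NamedTuple


class Ins(NamedTuple):
    p: int     # position of insertion
    c: str     # character to insert
    sid: int   # identifier of the site that generated the operation


def it(op1: Ins, op2: Ins) -> Ins:
    """Inclusion transformation: adjust op1 so the effect of op2 is included."""
    if op1.p < op2.p:
        return op1
    if op1.p == op2.p and op1.sid < op2.sid:
        return op1
    return op1._replace(p=op1.p + 1)


def et(op1: Ins, op2: Ins) -> Ins:
    """Exclusion transformation: adjust op1 so the effect of op2 is excluded."""
    if op1.p < op2.p:
        return op1
    if op1.p == op2.p and op1.sid < op2.sid:
        return op1
    return op1._replace(p=op1.p - 1)


a = Ins(2, "x", sid=2)
b = Ins(2, "y", sid=1)

a_included = it(a, b)            # Ins(p=3, c='x', sid=2): b wins the tie, a shifts right
a_restored = et(a_included, b)   # Ins(p=2, c='x', sid=2): the shift is undone
```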

Some OT systems use both IT and ET functions, and some use only IT functions. The complexity of OT function design is determined by various factors:
  • the functionality of the OT system: whether the OT system supports do (consistency maintenance), undo, locking, awareness, application sharing, etc.;
  • the correctness responsibility in the OT system: what transformation properties (CP1/TP1, CP2/TP2, IP2, IP3, RP) to meet; whether ET is used;
  • the operation model of the OT system: whether the OT operation model is generic (e.g. primitive insert, delete, update), or application-specific (all operations of the target application); and
  • the data model of the OT system: whether the data in each operation is character-wise (an individual object), string-wise (a sequence of objects), hierarchical, or other structures.

Transformation properties

Various transformation properties for ensuring OT system correctness have been identified. These properties can be maintained by either the transformation control algorithm or by the transformation functions. Different OT system designs have different division of responsibilities among these components. The specifications of these properties and preconditions of requiring them are given below.

Convergence properties

The following two properties are related to achieving convergence.
  • CP1/TP1: For every pair of concurrent operations $O_1$ and $O_2$ defined on the same state, the transformation function T satisfies the CP1/TP1 property if and only if $O_1 \circ T(O_2, O_1) \equiv O_2 \circ T(O_1, O_2)$, where $O_i \circ O_j$ denotes the sequence of operations containing $O_i$ followed by $O_j$, and $\equiv$ denotes equivalence of the two sequences of operations. CP1/TP1 Precondition: CP1/TP1 is required only if the OT system allows any two operations to be executed in different orders.
  • CP2/TP2: For every three concurrent operations $O_1$, $O_2$ and $O_3$ defined on the same document state, the transformation function T satisfies the CP2/TP2 property if and only if $T(O_3, O_1 \circ T(O_2, O_1)) = T(O_3, O_2 \circ T(O_1, O_2))$. CP2/TP2 stipulates equality between two operations transformed with regard to two equivalent sequences of operations: the transformation of $O_3$ against the sequence of $O_2$ followed by $T(O_1, O_2)$ must give the same operation as the transformation of $O_3$ against the sequence formed by $O_1$ and $T(O_2, O_1)$. CP2/TP2 Precondition: CP2/TP2 is required only if the OT system allows two operations $O_1$ and $O_2$ to be IT-transformed in two different document states (or contexts).
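
The CP1/TP1 condition can be checked by brute force on small cases. The Python sketch below assumes insert-only operations represented as `(position, character, site_id)` tuples and an inclusion transformation that breaks ties by site identifier. It tests only CP1/TP1 on one short document and says nothing about CP2/TP2.

```python
from itertools import product


def it(op1, op2):
    """Inclusion transformation for two concurrent character inserts."""
    p1, c1, s1 = op1
    p2, _, s2 = op2
    if p1 < p2 or (p1 == p2 and s1 < s2):
        return op1
    return (p1 + 1, c1, s1)


def apply(doc, op):
    pos, char, _ = op
    return doc[:pos] + char + doc[pos:]


def satisfies_tp1(doc, op1, op2):
    """Both execution orders must yield the same document."""
    left = apply(apply(doc, op1), it(op2, op1))    # op1, then op2 transformed
    right = apply(apply(doc, op2), it(op1, op2))   # op2, then op1 transformed
    return left == right


doc = "abc"
positions = range(len(doc) + 1)
all_ok = all(
    satisfies_tp1(doc, (p1, "x", 1), (p2, "y", 2))
    for p1, p2 in product(positions, repeat=2)
)
```

For this insert-only function, `all_ok` is true, including the tie case where both sites insert at the same position.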

Inverse properties

The following three properties are related to achieving the desired group undo effect. They are:
  • IP1: Given any document state S and the sequence $O \circ \overline{O}$, we have $S \circ O \circ \overline{O} = S$, which means the sequence $O \circ \overline{O}$ is equivalent to a single identity operation I with respect to the effect on the document state. This property is required in an OT system for achieving the correct undo effect, but is not related to IT functions.
  • IP2: The property IP2 expresses that the sequence $O \circ \overline{O}$ has no effect on the transformation of other operations. The transformation functions satisfy IP2 if and only if $T(O_x, O \circ \overline{O}) = O_x$, which means that the outcome of transforming $O_x$ against the sequence $O \circ \overline{O}$ is equivalent to the outcome of transforming $O_x$ against the identity operation I. IP2-Precondition: IP2 is required only if the OT system allows an operation $O_x$ to be transformed against a pair of do and undo operations $(O, \overline{O})$, one by one.
  • IP3: Given two concurrent operations $O_1$ and $O_2$ defined on the same document state (or context), let $O_1' = T(O_1, O_2)$ and $O_2' = T(O_2, O_1)$. The transformation functions satisfy the property IP3 if and only if $\overline{O_1}' = \overline{O_1'}$, where $\overline{O_1}' = T(\overline{O_1}, O_2')$, which means that the transformed inverse operation $\overline{O_1}'$ is equal to the inverse of the transformed operation $\overline{O_1'}$. IP3-Precondition: IP3 is required only if the OT system allows an inverse operation $\overline{O_1}$ to be transformed against an operation $O_2$ that is concurrent and defined on the same document state as (or context-equivalent to) $O_1$.
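
IP1 is the simplest of these properties to illustrate. The Python sketch below assumes character-wise operations stored as `(kind, position, character)` tuples, where the inverse of an insert is a delete of the same character at the same position and vice versa. The representation and function names are hypothetical.

```python
def apply(doc, op):
    kind, pos, char = op
    if kind == "ins":
        return doc[:pos] + char + doc[pos:]
    return doc[:pos] + doc[pos + 1:]   # "del"


def inverse(op):
    """Build the undo operation: insert and delete swap roles."""
    kind, pos, char = op
    return ("del" if kind == "ins" else "ins", pos, char)


state = "abc"
do_op = ("ins", 1, "x")

after_do = apply(state, do_op)                  # "axbc"
after_undo = apply(after_do, inverse(do_op))    # "abc", the original state
```

Storing the deleted character in the delete operation is what makes the inverse computable without consulting the document.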

OT control (integration) algorithms

Various OT control algorithms have been designed for OT systems with different capabilities and for different applications. The complexity of OT control algorithm design is determined by multiple factors. A key differentiating factor is whether an algorithm is capable of supporting concurrency control (do) and/or group undo. In addition, different OT control algorithm designs make different tradeoffs in:
  • assigning correctness responsibilities among the control algorithm and transformation functions, and
  • time-space complexity of the OT system.


Most existing OT control algorithms for concurrency control adopt the theory of causality/concurrency as their theoretical basis: causally related operations must be executed in their causal order, and concurrent operations must be transformed before their execution. However, it is well known that the concurrency condition alone cannot capture all OT transformation conditions. In more recent work, the theory of operation context has been proposed to explicitly represent the notion of a document state, which can be used to formally express OT transformation conditions for supporting the design and verification of OT control algorithms.

The following table gives an overview of some existing OT control/integration algorithms:

| OT Control/Integration Algorithms (Systems) | Required Transformation Function Types | Support OT-based Do? | Support OT-based Undo? | Transformation Properties Supported By Control Algorithm | Transformation Properties Supported By Transformation Functions | Transformation Ordering and Propagation Constraints | Timestamp |
|---|---|---|---|---|---|---|---|
| dOPT (GROVE) | T (IT) | Yes | No | None | CP1/TP1, CP2/TP2 | Causal order | State vector |
| selective-undo (DistEdit) | Transpose (IT and ET) | No | Selective Undo | NA | CP1/TP1, CP2/TP2, RP, IP1, IP2, IP3 | Causal order | ?? |
| adOPTed (JOINT EMACS) | LTransformation (IT) | Yes | Chronological Undo | IP2, IP3 | CP1/TP1, CP2/TP2, IP1 | Causal order | State vector |
| Jupiter | xform (IT) | Yes | No | CP2/TP2 | CP1/TP1 | Causal order + Central transformation server | Scalar |
| Google Wave OT | transform and composition (IT) | Yes | ?? | CP2/TP2 | CP1/TP1 | Causal order + Central transformation server + stop'n'wait propagation protocol | Scalar |
| GOT (REDUCE) | IT and ET | Yes | No | CP1/TP1, CP2/TP2 | None | Causal order + Discontinuous total order | State vector |
| GOTO (REDUCE, CoWord, CoPPT, CoMaya) | IT and ET | Yes | No | None | CP1/TP1, CP2/TP2 | Causal order | State vector |
| AnyUndo (REDUCE, CoWord, CoPPT, CoMaya) | IT and ET | No | Undo any operation | IP2, IP3, RP | IP1, CP1/TP1, CP2/TP2 | Causal order | State vector |
| SCOP (NICE) | IT | Yes | No | CP2/TP2 | CP1/TP1 | Causal order + Central transformation server | Scalar |
| COT (REDUCE, CoWord, CoPPT, CoMaya) | IT | Yes | Undo any operation | CP2/TP2, IP2, IP3 | CP1/TP1 (no ET, therefore no IP1 necessary) | Causal order + Discontinuous total order | Context vector |
| TIBOT | IT | Yes | No | CP2/TP2 | CP1/TP1 | Causal order | Scalar |
| SOCT4 | Forward transformation (IT) | Yes | No | CP2/TP2 | CP1/TP1 | Causal order + Continuous total order | Scalar |
| SOCT2 | Forward transformation (IT) and Backward transformation (ET) | Yes | No | None | CP1/TP1, CP2/TP2, RP | Causal order | State vector |
| MOT2 | Forward transformation (IT) | Yes | No | ?? | CP1/TP1 | ?? | Scalar |


A continuous total order is a strict total order in which it is possible to detect a missing element; e.g., 1,2,3,4,... is a continuous total order, whereas 1,2,3,5,... is not.
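
The practical value of a continuous total order is that a receiver can tell when an operation is missing. A minimal Python sketch, assuming timestamps are integers that arrive sorted in ascending order and that the list is non-empty (the function name is hypothetical):

```python
def first_missing(timestamps):
    """Return the first absent sequence number, or None if the
    timestamps form an unbroken run starting at the first element."""
    expected = timestamps[0]
    for t in timestamps:
        if t != expected:
            return expected
        expected += 1
    return None


gap_free = first_missing([1, 2, 3, 4])   # None: nothing is missing
with_gap = first_missing([1, 2, 3, 5])   # 4: the operation stamped 4 has not arrived
```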


The transformation-based algorithms built on the alternative consistency models "CSM" and "CA" described above differ from those listed in the table. They use vector timestamps for causality preservation. Their other correctness conditions are "single-"/"multi-" operation effects relation preservation or "admissibility" preservation. Those conditions are ensured by the control procedure and transformation functions synergistically. There is no need to discuss TP1/TP2 in that work, so these algorithms are not listed in the above table.

There exist some other optimistic consistency control algorithms that seek alternative ways to design transformation algorithms, but do not fit well with the above taxonomy and characterization. For example, Mark and Retrace

The correctness problems of OT led to the introduction of transformationless post-OT schemes, such as WOOT, Logoot and Causal Trees (CT). "Post-OT" schemes decompose the document into atomic operations, but they work around the need to transform operations by employing a combination of unique symbol identifiers, vector timestamps and/or tombstones.

Critique of OT

While the classic OT approach of defining operations through their offsets in the text seems simple and natural, real-world distributed systems raise serious issues: operations propagate with finite speed and the states of participants often differ, so the resulting combinations of states and operations are extremely hard to foresee and understand. As Li and Li put it,
"Due to the need to consider complicated case coverage, formal proofs are very complicated and error-prone, even for OT algorithms that only treat two characterwise primitives (insert and delete)."


OT software

  • Collaborative plain text editors (One dimensional documents)
    • SubEthaEdit (commercial)
    • ACE (free, open-source)
    • Gobby (free, open-source)
    • MoonEdit (free for non-commercial use)
    • ICT is a research prototype that supports any (text) editor and any editing command. Its consistency control is based on a combination of diffing and operational transformation.
  • Collaborative productivity applications (Two dimensional documents)
    • CoWord is a collaborative real-time word processor based on Microsoft Word
    • CoPowerPoint is a collaborative real-time presentation editor based on Microsoft PowerPoint
  • Collaborative computer-aided media design tools (Three-dimensional documents)
    • CoMaya is a real-time collaborative 3D design tool based on Autodesk Maya.
  • Web-based applications
    • Google Docs & Google Wave
    • EtherPad is a free open-source web-based multi-party editor which was purchased by Google in support of their collaborative computing projects.
    • Mockingbird is an online wireframing and mockup tool that allows for real-time collaboration using OT
  • Version control systems
    • So6 is a free open-source version control system integrated in the LibreSource platform.
  • Operational Transformation Engines
    • beWeeVee, a .NET-based SDK which provides OT capabilities.
    • CodoxEngine, a complete OT SDK which contains technologies used to build CodoxWord; supports Visual C++ .NET, Visual C# .NET, and Java
  • Web Application Development Frameworks
    • Open Cooperative Web Framework, a Dojo Foundation project, uses Operational Transformation algorithms to enable Cooperative web concepts.

See also

  • Optimistic Replication
  • Data synchronization
  • Collaborative Editing
  • Consistency models

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 