Replication (computer science)
Replication is the process of sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility. It could be data replication if the same data is stored on multiple storage devices, or computation replication if the same computing task is executed many times. A computational task is typically replicated in space, i.e. executed on separate devices, or it could be replicated in time, if it is executed repeatedly on a single device.

Access to a replicated entity is typically uniform with access to a single, non-replicated entity. The replication itself should be transparent to an external user. Also, in a failure scenario, a failover of replicas should be hidden as much as possible.

It is common to talk about active and passive replication in systems that replicate data or services. Active replication is performed by processing the same request at every replica. In passive replication, each request is processed on a single replica and its state is then transferred to the other replicas. If at any time one master replica is designated to process all the requests, this is the primary-backup scheme (master-slave scheme), predominant in high-availability clusters. On the other hand, if any replica may process a request and then distribute a new state, this is a multi-primary scheme (called multi-master in the database field). In the multi-primary scheme, some form of distributed concurrency control must be used, such as a distributed lock manager.
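
The distinction can be made concrete with a small sketch. The following Python fragment is a minimal illustration only; the Replica class, the request format, and the direct method calls stand in for real processes and message passing, and all names are assumptions of this example.

    # Minimal sketch contrasting active and passive replication.
    # All names here (Replica, apply, etc.) are illustrative assumptions.

    class Replica:
        def __init__(self):
            self.state = {}

        def apply(self, request):
            # A deterministic update: request is a (key, value) pair.
            key, value = request
            self.state[key] = value

    def active_replication(replicas, request):
        # Every replica processes the same request itself.
        for r in replicas:
            r.apply(request)

    def passive_replication(primary, backups, request):
        # Only the primary processes the request...
        primary.apply(request)
        # ...and its resulting state is transferred to the backups.
        for b in backups:
            b.state = dict(primary.state)

    replicas = [Replica() for _ in range(3)]
    active_replication(replicas, ("x", 1))
    passive_replication(replicas[0], replicas[1:], ("y", 2))
    assert all(r.state == {"x": 1, "y": 2} for r in replicas)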

Load balancing is different from task replication, since it distributes a load of different (not the same) computations across machines, and allows a single computation to be dropped in case of failure. Load balancing, however, sometimes uses data replication (especially multi-master replication) internally, to distribute its data among machines.

Backup is different from replication, since it saves a copy of data unchanged for a long period of time. Replicas, on the other hand, are frequently updated and quickly lose any historical state.

Replication in distributed systems

Replication is one of the oldest and most important topics in the overall area of distributed systems.

Whether one replicates data or computation, the objective is to have some group of processes that handle incoming events. If we replicate data, these processes are passive and operate only to maintain the stored data, reply to read requests, and apply updates. When we replicate computation, the usual goal is to provide fault-tolerance. For example, a replicated service might be used to control a telephone switch, with the objective of ensuring that even if the primary controller fails, the backup can take over its functions. But the underlying needs are the same in both cases: by ensuring that the replicas see the same events in equivalent orders, they stay in consistent states and hence any replica can respond to queries.

Replication models in distributed systems

A number of widely cited models exist for data replication, each having its own properties and performance:
  1. Transactional replication. This is the model for replicating transactional data, for example a database or some other form of transactional storage structure. The one-copy serializability model is employed in this case, which defines legal outcomes of a transaction on replicated data in accordance with the overall ACID properties that transactional systems seek to guarantee.
  2. State machine replication. This model assumes that the replicated process is a deterministic finite state machine and that atomic broadcast of every event is possible. It is based on a distributed computing problem called distributed consensus and has a great deal in common with the transactional replication model. It is sometimes mistakenly used as a synonym of active replication. State machine replication is usually implemented by a replicated log consisting of multiple subsequent rounds of the Paxos algorithm. This was popularized by Google's Chubby system, and it is the core behind the open-source Keyspace data store. (A minimal sketch of the idea appears after this list.)
  3. Virtual synchrony. This computational model is used when a group of processes cooperate to replicate in-memory data or to coordinate actions. The model defines a new distributed entity called a process group. A process can join a group, which is much like opening a file: the process is added to the group, but is also provided with a checkpoint containing the current state of the data replicated by group members. Processes can then send events (multicasts) to the group and will see incoming events in the identical order, even if events are sent concurrently. Membership changes are handled as a special kind of platform-generated event that delivers a new membership view to the processes in the group.
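
A minimal sketch of state machine replication, referenced from item 2 above: it assumes the totally ordered log has already been agreed upon (reaching that agreement is the job of a consensus protocol such as Paxos, which is not shown), and the command format is invented for the example.

    # Minimal sketch of state machine replication: each replica applies
    # the same totally ordered log of commands to a deterministic state
    # machine. The agreed-upon log is assumed given here; producing it
    # is the job of a consensus protocol such as Paxos.

    class StateMachine:
        def __init__(self):
            self.state = {}

        def apply(self, command):
            # Commands must be deterministic: same state + same command
            # always yields the same next state.
            op, key, value = command
            if op == "set":
                self.state[key] = value
            elif op == "delete":
                self.state.pop(key, None)

    # The replicated log: one agreed total order of commands.
    log = [("set", "a", 1), ("set", "b", 2), ("delete", "a", None)]

    replicas = [StateMachine() for _ in range(3)]
    for command in log:           # every replica applies every entry...
        for replica in replicas:  # ...in the same order
            replica.apply(command)

    # Determinism + identical order => identical states on all replicas.
    assert all(r.state == {"b": 2} for r in replicas)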


Levels of performance vary widely depending on the model selected. Transactional replication is slowest, at least when one-copy serializability guarantees are desired (better performance can be obtained when a database uses log-based replication, but at the cost of possible inconsistencies if a failure causes part of the log to be lost). Virtual synchrony is the fastest of the three models, but the handling of failures is less rigorous than in the transactional model. State machine replication lies somewhere in between; the model is faster than transactions, but much slower than virtual synchrony.

The virtual synchrony model is popular in part because it allows the developer to use either active or passive replication. In contrast, state machine replication and transactional replication are highly constraining and are often embedded into products at layers where end-users would not be able to access them.

Database replication

Database replication can be used on many database management systems, usually with a master/slave relationship between the original and the copies. The master logs the updates, which then ripple through to the slaves. The slave outputs a message stating that it has received the update successfully, thus allowing the sending (and potentially re-sending until successfully applied) of subsequent updates.
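
The following sketch illustrates this master/slave flow under heavy simplification: the network is reduced to a direct method call, the acknowledgement to a return value, and one transient failure is simulated to show re-sending. All class and method names are assumptions of the example.

    # Minimal sketch of master/slave database replication. The master
    # logs each update and re-sends it until the slave acknowledges
    # successful application. Names are illustrative only.

    class Slave:
        def __init__(self, fail_first=False):
            self.data = {}
            self._fail_next = fail_first

        def receive(self, update):
            if self._fail_next:          # simulate one transient failure
                self._fail_next = False
                return False             # no acknowledgement sent
            key, value = update
            self.data[key] = value
            return True                  # acknowledgement

    class Master:
        def __init__(self, slaves):
            self.data = {}
            self.log = []
            self.slaves = slaves

        def update(self, key, value):
            self.data[key] = value
            self.log.append((key, value))
            for slave in self.slaves:
                # Re-send until the slave acknowledges the update; only
                # then may subsequent updates be shipped to it.
                while not slave.receive((key, value)):
                    pass

    master = Master([Slave(fail_first=True), Slave()])
    master.update("x", 42)
    assert all(s.data == {"x": 42} for s in master.slaves)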

Multi-master replication, where updates can be submitted to any database node and then ripple through to other servers, is often desired, but introduces substantially increased costs and complexity which may make it impractical in some situations. The most common challenge in multi-master replication is transactional conflict prevention or resolution. Most synchronous (eager) replication solutions perform conflict prevention, while asynchronous (lazy) solutions have to perform conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A lazy replication system would allow both transactions to commit and run a conflict resolution during resynchronization. The resolution of such a conflict may be based on a timestamp of the transaction, on the hierarchy of the origin nodes, or on much more complex logic which decides consistently on all nodes.
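
As an illustration of timestamp-based resolution, the sketch below implements a last-writer-wins rule, one of the simplest policies that decides consistently on all nodes. The record format and function names are assumptions of the example, and clock skew between nodes is ignored.

    # Minimal sketch of timestamp-based conflict resolution
    # ("last writer wins"). Each replica records, per key, the value
    # together with the timestamp of the transaction that wrote it.
    # Clock skew between nodes is deliberately ignored in this sketch.

    def write(replica, key, value, timestamp):
        replica[key] = (value, timestamp)

    def resolve(a, b):
        # Merge two diverged replicas deterministically: for each key,
        # the write with the later timestamp wins on every node, so all
        # nodes converge to the same state regardless of merge order.
        merged = dict(a)
        for key, (value, ts) in b.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
        return merged

    node1, node2 = {}, {}
    write(node1, "price", 10, timestamp=100)   # concurrent updates to
    write(node2, "price", 12, timestamp=105)   # the same record

    merged = resolve(node1, node2)
    assert merged["price"] == (12, 105)        # later write wins
    assert resolve(node2, node1) == merged     # order-independent

A real policy would also need to break timestamp ties deterministically, for example by comparing node identifiers.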

Database replication becomes more difficult as it scales up. The scaling has two dimensions, horizontal and vertical: horizontal scaling adds more data replicas, while vertical scaling places data replicas at greater geographical distances. Problems raised by horizontal scaling can be alleviated by a multi-layer, multi-view access protocol. Vertical scaling runs into less trouble as internet reliability and performance improve.

Disk storage replication

Active (real-time) storage replication is usually implemented by distributing updates of a block device to several physical hard disks. This way, any file system supported by the operating system can be replicated without modification, as the file system code works on a level above the block device driver layer. It is implemented either in hardware (in a disk array controller) or in software (in a device driver).

The most basic method is disk mirroring, typical for locally-connected disks. The storage industry narrows the definitions, so mirroring is a local (short-distance) operation, whereas replication is extendable across a computer network, so that the disks can be located in physically distant locations; the master-slave database replication model is usually applied. The purpose of replication is to prevent damage from failures or disasters that may occur in one location, or, in case such events do occur, to improve the ability to recover. For replication, latency is the key factor because it determines either how far apart the sites can be or the type of replication that can be employed.

The main characteristic of such cross-site replication is how write operations are handled:
  • Synchronous replication - guarantees "zero data loss" by means of an atomic write operation, i.e. the write either completes on both sides or not at all. A write is not considered complete until acknowledged by both the local and the remote storage. Most applications wait for a write transaction to complete before proceeding with further work, hence overall performance decreases considerably. Inherently, performance drops proportionally to distance, as the minimum latency is dictated by the speed of light: for a 10 km distance, the fastest possible roundtrip takes 67 μs, whereas a whole local cached write nowadays completes in about 10-20 μs. (A sketch contrasting the synchronous and asynchronous modes follows this list.)
    • An often-overlooked aspect of synchronous replication is that failure of the remote replica, or even just of the interconnection, by definition stops any and all writes (freezing the local storage system). This is the behaviour that guarantees zero data loss. However, many commercial systems at such a potentially dangerous point do not freeze, but just proceed with local writes, losing the desired zero recovery point objective.
    • The main difference between synchronous and asynchronous volume replication is that synchronous replication needs to wait for the destination server in any write operation.
  • Asynchronous replication - the write is considered complete as soon as the local storage acknowledges it. The remote storage is updated, but probably with a small lag. Performance is greatly increased, but in case of losing the local storage, the remote storage is not guaranteed to have the current copy of the data, and the most recent data may be lost.
  • Semi-synchronous replication - this usually means that a write is considered complete as soon as local storage acknowledges it and a remote server acknowledges that it has received the write either into memory or to a dedicated log file. The actual remote write is not performed immediately but is performed asynchronously, resulting in better performance than synchronous replication but with increased risk of the remote write failing.
  • Point-in-time replication - introduces periodic snapshots that are replicated instead of the primary storage. If the replicated snapshots are pointer-based, then during replication only the changed data is moved, not the entire volume. Using this method, replication can occur over smaller, less expensive bandwidth links such as iSCSI or T1 instead of fiber optic lines.
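
A minimal sketch of the difference between the two basic modes, together with the light-speed bound mentioned above. The dictionaries and function names are assumptions of the example; real replication of this kind is implemented at the block-device layer, not in application code.

    # Minimal sketch of synchronous vs asynchronous cross-site writes,
    # plus the physical lower bound on round-trip latency. All names
    # are illustrative; real replication runs at the block-device layer.

    SPEED_OF_LIGHT_KM_S = 299_792.458

    def min_roundtrip_us(distance_km):
        # Lower bound: the signal must travel to the remote site and back.
        return 2 * distance_km / SPEED_OF_LIGHT_KM_S * 1e6

    print(f"{min_roundtrip_us(10):.0f} us")  # ~67 us for 10 km, as in the text

    pending = []  # updates not yet applied remotely (async mode only)

    def synchronous_write(local, remote, key, value):
        # Write completes only after BOTH sides have applied it: zero
        # data loss, but every write pays the full round trip.
        local[key] = value
        remote[key] = value            # stands in for the remote acknowledgement

    def asynchronous_write(local, remote, key, value):
        # Write completes after the LOCAL apply; the remote copy lags.
        local[key] = value
        pending.append((key, value))   # shipped to the remote site later

    local, remote = {}, {}
    synchronous_write(local, remote, "a", 1)   # remote has "a" already
    asynchronous_write(local, remote, "b", 2)  # remote lags behind
    assert "b" in local and "b" not in remote  # lost if local site dies now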


To address the limits imposed by latency, techniques of WAN optimization can be applied to the link.

Notable implementations:
  • DRBD module for Linux
  • NetApp SnapMirror
  • EMC SRDF
  • IBM PPRC and Global Mirror (known together as IBM Copy Services)
  • Hitachi TrueCopy
  • Hewlett-Packard Continuous Access (HP CA)
  • Symantec Veritas Volume Replicator (VVR)
  • DataCore SANsymphony & SANmelody
  • FalconStor Replication & Mirroring (sub-block heterogeneous point-in-time, async, sync)
  • Compellent Remote Instant Replay
  • EMC RecoverPoint


File-based replication

File-based replication replicates files at a logical level, rather than at the storage block level. There are many different ways of performing this, and unlike storage-level replication, the solutions are almost exclusively software-based.

Capture with a kernel driver

With the use of a kernel driver (specifically a filter driver) that intercepts calls to the filesystem functions, any activity is captured immediately as it occurs. This utilises the same type of technology that real-time active virus checkers employ. At this level, logical file operations are captured, such as file open, write, and delete. The kernel driver transmits these commands to another process, generally over a network to a different machine, which mimics the operations of the source machine. Like block-level storage replication, file-level replication allows both synchronous and asynchronous modes. In synchronous mode, write operations on the source machine are held and not allowed to occur until the destination machine has acknowledged the successful replication. Synchronous mode is less common with file replication products, although a few solutions exist.

File-level replication solutions yield a few benefits. First, because data is captured at a file level, the solution can make an informed decision on whether to replicate based on the location of the file and the type of file. Hence, unlike block-level storage replication, where a whole volume needs to be replicated, file replication products have the ability to exclude temporary files or parts of a filesystem that hold no business value. This can substantially reduce the amount of data sent from the source machine as well as decrease the storage burden on the destination machine. A further benefit to decreasing bandwidth is that the data transmitted can be more granular than with block-level replication: if an application writes 100 bytes, only the 100 bytes are transmitted, not a complete disk block, which is generally 4096 bytes.
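
The sketch below combines both ideas: captured logical operations are filtered by path and forwarded at byte granularity. It is a user-space illustration of the principle only; in real products the capture happens in a kernel filter driver, and every name here is an assumption of the example.

    # Minimal user-space sketch of file-level replication: logical write
    # operations are filtered by path and forwarded at byte granularity.
    # In a real product the capture happens in a kernel filter driver;
    # all names here are illustrative assumptions.

    EXCLUDED_SUFFIXES = (".tmp", ".swp")   # files with no business value

    def should_replicate(path):
        return not path.endswith(EXCLUDED_SUFFIXES)

    replica_files = {}   # stands in for the destination machine

    def remote_apply(path, offset, data):
        # Mimic the captured operation on the destination.
        buf = bytearray(replica_files.get(path, b""))
        buf[offset:offset + len(data)] = data
        replica_files[path] = bytes(buf)

    def captured_write(path, offset, data):
        # Only the bytes actually written are shipped, not the whole
        # 4096-byte disk block the write happened to touch.
        if should_replicate(path):
            remote_apply(path, offset, data)

    captured_write("/data/orders.db", 0, b"order#1")   # replicated
    captured_write("/data/cache.tmp", 0, b"scratch")   # filtered out
    assert "/data/orders.db" in replica_files
    assert "/data/cache.tmp" not in replica_files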

On the negative side, as this is a software-only solution, it requires implementation and maintenance at the operating system level, and it uses some of the machine's processing power (CPU).

Notable implementations:
  • Cofio Software AIMstor Replication
  • Double-Take Software Availability

Filesystem journal replication

Many filesystems have the ability to journal their activity, in many ways working like a database journal. The journal can be sent to another machine, either periodically or in real time, and used there to play back the events.
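
A minimal sketch of the idea, under the simplifying assumption that the journal is a list of logical operations (a real filesystem journal records block and metadata changes):

    # Minimal sketch of journal shipping: the source appends each
    # operation to a journal, the journal is shipped, and the remote
    # machine replays it to reproduce the activity. Simplified to
    # logical operations for illustration only.

    journal = []

    def journaled_op(op, path, data=None):
        journal.append((op, path, data))   # record the logical operation

    def replay(journal, fs):
        # Playing back the journal in order reproduces the source state.
        for op, path, data in journal:
            if op == "write":
                fs[path] = data
            elif op == "delete":
                fs.pop(path, None)
        return fs

    journaled_op("write", "/a", b"hello")
    journaled_op("write", "/b", b"world")
    journaled_op("delete", "/b")

    remote_fs = replay(journal, {})        # shipped periodically or live
    assert remote_fs == {"/a": b"hello"}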

Notable implementations:
  • Microsoft System Center Data Protection Manager (DPM) (periodic updates, not in real time)

Batch replication

This is the process of comparing the source and destination filesystems and ensuring that the destination matches the source. The key benefit is that such solutions are generally free or inexpensive. The downside is that the process of synchronizing them is quite system-intensive, and consequently this process generally runs infrequently.
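
A minimal sketch of the comparison step, assuming content checksums are used to detect differences; real tools such as rsync go further and use delta encoding to transfer only the changed parts of each file.

    # Minimal sketch of batch replication: walk the source tree, compare
    # each file's checksum with the destination's, and copy only files
    # that differ or are missing. This full comparison is what makes the
    # process so system-intensive.

    import hashlib
    import os
    import shutil

    def checksum(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                h.update(chunk)
        return h.hexdigest()

    def batch_replicate(src_root, dst_root):
        for dirpath, _dirnames, filenames in os.walk(src_root):
            for name in filenames:
                src = os.path.join(dirpath, name)
                dst = os.path.join(dst_root, os.path.relpath(src, src_root))
                # Copy only when the destination is missing or differs.
                if not os.path.exists(dst) or checksum(src) != checksum(dst):
                    os.makedirs(os.path.dirname(dst), exist_ok=True)
                    shutil.copy2(src, dst)

    # Usage: batch_replicate("/data/source", "/data/mirror")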

Notable implementations:
  • rsync

Distributed shared memory replication

Another example of using replication appears in distributed shared memory systems, where many nodes of the system may share the same page of memory. This usually means that each node has a separate copy (replica) of that page.

Primary-backup and multi-primary replication

Many classical approaches to replication are based on a primary/backup model where one device or process has unilateral control over one or more other processes or devices. For example, the primary might perform some computation, streaming a log of updates to a backup (standby) process, which can then take over if the primary fails. This approach is the most common one for replicating databases, despite the risk that if a portion of the log is lost during a failure, the backup might not be in a state identical to the one the primary was in, and transactions could then be lost.

A weakness of primary/backup schemes is that in settings where both processes could have been active, only one is actually performing operations: fault-tolerance is gained, but at roughly twice the hardware cost. For this reason, starting in the period around 1985, the distributed systems research community began to explore alternative methods of replicating data. An outgrowth of this work was the emergence of schemes in which a group of replicas could cooperate, with each process backing up the others, and each handling some share of the workload.

Jim Gray, a towering figure within the database community, analyzed multi-primary replication schemes under the transactional model and ultimately published a widely cited paper skeptical of the approach ("The Dangers of Replication and a Solution"). In a nutshell, he argued that unless data splits in some natural way so that the database can be treated as n disjoint sub-databases, concurrency control conflicts will result in seriously degraded performance, and the group of replicas will probably slow down as a function of n. Indeed, he suggests that the most common approaches are likely to result in degradation that scales as O(n³). His solution, which is to partition the data, is only viable in situations where the data actually has a natural partitioning key.

The situation is not always so bleak. For example, in the 1985-1987 period, the virtual synchrony model was proposed and emerged as a widely adopted standard (it was used in the Isis Toolkit, Horus, Transis, Ensemble, Totem, Spread, C-Ensemble, Phoenix and Quicksilver systems, and is the basis for the CORBA fault-tolerant computing standard; the model is also used in IBM WebSphere to replicate business logic and in Microsoft's Windows Server 2008 enterprise clustering technology). Virtual synchrony permits a multi-primary approach in which a group of processes cooperate to parallelize some aspects of request processing. The scheme can only be used for some forms of in-memory data, but, when feasible, provides linear speedups in the size of the group.

A number of modern products support similar schemes. For example, the Spread Toolkit supports this same virtual synchrony model and can be used to implement a multi-primary replication scheme; it would also be possible to use C-Ensemble or Quicksilver in this manner. WANdisco permits active replication where every node on a network is an exact copy or replica, and hence every node on the network is active at one time; this scheme is optimized for use in a wide area network.

See also

  • Change data capture
  • Cloud computing
  • Cluster (computing)
  • Cluster manager
  • Failover
  • Fault tolerant system
  • Log shipping
  • Optimistic replication
  • Process group
  • Software transactional memory
  • Transparency (computing)
  • Virtual synchrony
