
Rsync


 
 

rsync is a software application for Unix systems which synchronizes files and directories from one location to another while minimizing data transfer using delta encoding when appropriate. An important feature of rsync not found in most similar programs/protocols is that the mirroring takes place with only one transmission in each direction. rsync can copy or display directory contents and copy files, optionally using compression and recursion.

In daemon mode, rsync listens on TCP port 873 by default, serving files in the native rsync protocol. rsync can also be used to synchronize local directories, or to synchronize via a remote shell such as RSH or SSH. In the latter case, the rsync client executable must be installed on both the local and the remote host.

Released under the GNU General Public License, rsync is free software.

Algorithm

The rsync utility uses an algorithm (invented by Australian computer programmer Andrew Tridgell) for efficiently transmitting a structure (such as a file) across a communications link when the receiving computer already has a different version of the same structure.

The recipient splits its copy of the file into fixed-size non-overlapping chunks, of size S, and computes two checksums for each chunk: the MD4 hash, and a weaker 'rolling checksum'. It sends these checksums to the sender.

The sender computes the rolling checksum for every chunk of size S in its own version of the file, even overlapping chunks. This can be calculated efficiently because of a special property of the rolling checksum: if the rolling checksum of bytes n through n+S-1 is R, the rolling checksum of bytes n+1 through n+S can be computed from R, byte n, and byte n+S without having to examine the intervening bytes. Thus, if one had already calculated the rolling checksum of bytes 1–25, one could calculate the rolling checksum of bytes 2–26 solely from the previous checksum, and from bytes 1 and 26.
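
The rolling property can be illustrated with a short Python sketch. It assumes the two-part checksum usually described for rsync (a plain byte sum a and a position-weighted sum b, both modulo 2^16); the function names are hypothetical and the code is not taken from the rsync source.

```python
M = 1 << 16  # both halves of the checksum are kept modulo 2**16


def weak_checksum(block):
    """Compute the pair (a, b) for a block from scratch."""
    size = len(block)
    a = sum(block) % M
    # The first byte is weighted by the block size, the last byte by 1.
    b = sum((size - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a, b, size, byte_out, byte_in):
    """Slide the window one byte to the right without rereading the block."""
    a_new = (a - byte_out + byte_in) % M
    b_new = (b - size * byte_out + a_new) % M
    return a_new, b_new


def combine(a, b):
    """Pack both halves into a single 32-bit value."""
    return a | (b << 16)


# Mirror the example in the text: bytes 1-25, then bytes 2-26.
data = bytes(range(1, 27))
a, b = weak_checksum(data[0:25])
rolled = roll(a, b, 25, data[0], data[25])
assert rolled == weak_checksum(data[1:26])
```

Only the outgoing byte, the incoming byte, and the previous pair (a, b) are needed for each step, which is why the sender can afford to evaluate the checksum at every byte offset.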

The rolling checksum used in rsync is based on Mark Adler's Adler-32 checksum, which is used in zlib, and which itself is based on Fletcher's checksum.

The sender then compares its rolling checksums with the set sent by the recipient to determine if any matches exist. If they do, it verifies the match by computing the MD4 checksum for the matching block and by comparing it with the MD4 checksum sent by the recipient.

The sender then sends the recipient those parts of its file that didn't match any of the recipient's blocks, along with assembly instructions on how to merge these parts into the recipient's version. In practice, this creates a file identical to the sender's copy. However, it is in principle possible that the recipient's copy differs at this point from the sender's: this can happen when the two files have different chunks that nonetheless possess the same MD4 hash and rolling checksum. The chances of this happening are in practice extremely remote.

If the sender's and recipient's versions of the file have many sections in common, the utility needs to transfer relatively little data to synchronize the files.
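
The exchange described above can be sketched end to end in Python. This is a simplified illustration under stated assumptions, not rsync's implementation: MD5 stands in for MD4 (MD4 is often unavailable in current hashlib builds), only full-size blocks are indexed, and the function names and the ("copy", index) / ("data", bytes) instruction format are invented for the example.

```python
import hashlib

M = 1 << 16


def weak(block):
    """Two-part rolling checksum (a, b) of a block, computed from scratch."""
    size = len(block)
    a = sum(block) % M
    b = sum((size - i) * x for i, x in enumerate(block)) % M
    return a, b


def strong(block):
    """Strong hash used to confirm a weak match (MD5 as a stand-in for MD4)."""
    return hashlib.md5(block).digest()


def make_signature(old, size):
    """Recipient: map each weak checksum to [(strong hash, block index), ...]."""
    table = {}
    for index in range(len(old) // size):  # a trailing short block is skipped
        block = old[index * size:(index + 1) * size]
        table.setdefault(weak(block), []).append((strong(block), index))
    return table


def make_delta(new, size, table):
    """Sender: emit copy instructions for matched blocks, literal data otherwise."""
    delta, literal, pos, window = [], bytearray(), 0, None
    while pos + size <= len(new):
        if window is None:
            window = weak(new[pos:pos + size])
        match = None
        candidates = table.get(window)
        if candidates:
            digest = strong(new[pos:pos + size])
            for candidate_digest, index in candidates:
                if candidate_digest == digest:
                    match = index
                    break
        if match is not None:
            if literal:
                delta.append(("data", bytes(literal)))
                literal.clear()
            delta.append(("copy", match))
            pos += size
            window = None  # recompute at the new position
        else:
            literal.append(new[pos])
            if pos + size < len(new):
                # Roll the window forward by one byte.
                a, b = window
                a_new = (a - new[pos] + new[pos + size]) % M
                b_new = (b - size * new[pos] + a_new) % M
                window = (a_new, b_new)
            pos += 1
    literal.extend(new[pos:])
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old, size, delta):
    """Recipient: rebuild the sender's file from its own blocks plus literals."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            out += old[value * size:(value + 1) * size]
        else:
            out += value
    return bytes(out)


old = b"the quick brown fox jumps over the lazy dog"
new = b"the quick red fox jumps over the very lazy dog"
delta = make_delta(new, 4, make_signature(old, 4))
assert apply_delta(old, 4, delta) == new
```

In this sketch, blocks of the new file that already exist in the old file become short copy instructions, so only the changed regions travel as literal bytes.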

While the rsync algorithm forms the heart of the rsync application, which essentially optimizes transfers between two computers over TCP/IP, the application supports other key features that aid significantly in data transfers or backup. They include compression and decompression of data block by block using zlib at the sending and receiving ends, respectively, and support for protocols such as ssh that enable encrypted transmission of the compressed differential data produced by the rsync algorithm. Instead of ssh, stunnel can also be used to create an encrypted tunnel to secure the data transmitted.
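
Block-by-block compression of this kind can be sketched with Python's zlib module. The sketch only illustrates the idea of one shared deflate stream that is flushed after every block; it does not reproduce rsync's wire format, and the function names are hypothetical.

```python
import zlib


def compress_blocks(blocks):
    """Sender: yield one compressed chunk per block from a single stream."""
    compressor = zlib.compressobj()
    for block in blocks:
        # Z_SYNC_FLUSH pushes out everything buffered so far, so the
        # receiver can decode this block as soon as the chunk arrives.
        yield compressor.compress(block) + compressor.flush(zlib.Z_SYNC_FLUSH)


def decompress_blocks(chunks):
    """Receiver: yield the original block for each compressed chunk."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        yield decompressor.decompress(chunk)


blocks = [b"hello " * 50, b"world " * 50, b"hello " * 50]
assert list(decompress_blocks(compress_blocks(blocks))) == blocks
```

Keeping a single compressor across blocks lets later blocks reuse the history of earlier ones, while the per-block flush keeps each chunk decodable on arrival.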

Uses

rsync was written as a replacement for rcp and scp. One of the earliest applications of rsync was to implement mirroring or backup for multiple Unix clients onto a central Unix server using rsync/ssh and standard Unix accounts. With a scheduling utility such as cron, one can even schedule automated encrypted rsync-based mirroring between multiple host computers and a central server.

Variations

A utility called rdiff uses the rsync algorithm to generate delta files with the difference from file A to file B (like the utility diff, but in a different delta format). The delta file can then be applied to file A, turning it into file B (similar to the patch utility).

Unlike diff, the process of creating a delta file has two steps: first a signature file is created from file A, and then this (relatively small) signature and file B are used to create the delta file. Also unlike diff, rdiff works well with binary files.

Using rdiff, a utility called rdiff-backup has been created, capable of maintaining a backup mirror of a file or directory over the network, on another server. rdiff-backup stores incremental rdiff deltas with the backup, from which it is possible to recreate any backup point.

duplicity is a variation on rdiff-backup that allows for backups without cooperation from the storage server, as with simple storage services like Amazon S3. It works by generating the hashes for each block in advance, encrypting them, and storing them on the server, then retrieving them when doing an incremental backup. The rest of the data is also stored encrypted for security purposes.

History

rsync was first announced on 19 June 1996. The original authors were Andrew Tridgell and Paul Mackerras.

Rsync 3.0 was released on 1 March 2008.

See also

  • CVSup
  • Unison (file synchronizer)
  • PowerFolder
  • Jigdo
  • Grsync


External links