Snapshot (computer storage)
In computer systems, a snapshot is the state of a system at a particular point in time. The term was coined by analogy with the snapshot in photography. It can refer to an actual copy of the state of a system or to a capability provided by certain systems.

Rationale

A full backup of a large data set may take a long time to complete. On multi-tasking or multi-user systems, there may be writes to that data while it is being backed up. This prevents the backup from being atomic and introduces a version skew that may result in data corruption. For example, if a user moves a file from a directory that has not yet been backed up into one that has already been backed up, that file will be completely missing from the backup media, since the backup of the destination directory took place before the file was added. Version skew may also corrupt files whose size or contents change while they are being read.

One approach to safely backing up live data is to temporarily disable write access to the data during the backup, either by stopping the accessing applications or by using the locking API provided by the operating system to enforce exclusive read access. This is tolerable for low-availability systems (desktop computers and small workgroup servers, on which regular downtime is acceptable). High-availability 24/7 systems, however, cannot bear service stoppages.

To avoid downtime, high-availability systems may instead perform the backup on a snapshot (a read-only copy of the data set frozen at a point in time) and allow applications to continue writing to their data. Most snapshot implementations are efficient and can create snapshots in O(1) time. In other words, the time and I/O needed to create the snapshot do not increase with the size of the data set, whereas for a direct backup they are proportional to the size of the data set. In some systems, once the initial snapshot of a data set is taken, subsequent snapshots copy only the changed data and use a system of pointers to reference the initial snapshot. Such pointer-based snapshots consume less disk capacity than repeatedly cloning the data set would.
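The constant-time claim can be illustrated with a minimal Python sketch. It is a toy model, not the design of any particular storage system, and the names Layer and Volume are hypothetical. Taking a snapshot only freezes the current layer of changed blocks and starts a new empty one that points back to it, so the work done does not depend on how much data the volume holds.

```python
class Layer:
    """A set of changed blocks plus a pointer to the older layer beneath it."""

    def __init__(self, parent=None):
        self.blocks = {}      # block number -> data written while this layer was live
        self.parent = parent  # older (frozen) layer, or None for the base layer

    def read(self, block_no):
        # Walk from the newest layer towards the base until the block is found.
        layer = self
        while layer is not None:
            if block_no in layer.blocks:
                return layer.blocks[block_no]
            layer = layer.parent
        return None  # block was never written


class Volume:
    """Hypothetical writable volume supporting pointer-based snapshots."""

    def __init__(self):
        self.head = Layer()

    def write(self, block_no, data):
        # Writes only ever touch the newest layer; frozen layers stay unchanged.
        self.head.blocks[block_no] = data

    def read(self, block_no):
        return self.head.read(block_no)

    def snapshot(self):
        # O(1): freeze the current layer and start an empty one on top of it.
        frozen = self.head
        self.head = Layer(parent=frozen)
        return frozen
```

In this model a snapshot is simply a reference to a frozen layer, and each later layer stores only the blocks changed since the previous snapshot. The trade-off is that reads may have to walk through several layers.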

Read-write snapshots are sometimes called branching snapshots, because they implicitly create diverging versions of their data. Aside from backups and data recovery, read-write snapshots are frequently used in virtualization, sandboxing and virtual hosting setups because of their usefulness in managing changes to large sets of files.

Volume managers

Some Unix systems have snapshot-capable logical volume managers. These implement copy-on-write on entire block devices by copying changed blocks, just before they are to be overwritten, to other storage, thus preserving a self-consistent past image of the block device. File systems on this image can later be mounted as if they were on read-only media. Block-level snapshotting is almost always less space-efficient than direct file system support for snapshots.
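The copy-before-overwrite behaviour described above can be sketched in Python as follows. This is an illustrative in-memory model under simplifying assumptions, not the code of any real volume manager; BlockDevice, Snapshot and the saved table are hypothetical names.

```python
class BlockDevice:
    """Toy block device whose writes preserve old data for active snapshots."""

    def __init__(self, num_blocks, block_size=4):
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.snapshots = []

    def create_snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, block_no, data):
        # Just before overwriting, copy the old block to each snapshot's
        # private storage, but only the first time the block changes.
        for snap in self.snapshots:
            if block_no not in snap.saved:
                snap.saved[block_no] = self.blocks[block_no]
        self.blocks[block_no] = data

    def read(self, block_no):
        return self.blocks[block_no]


class Snapshot:
    """Read-only view of the device as it was when the snapshot was taken."""

    def __init__(self, origin):
        self.origin = origin
        self.saved = {}  # block number -> contents at snapshot time

    def read(self, block_no):
        # Changed blocks come from the saved copies; unchanged blocks are
        # still shared with the live device.
        if block_no in self.saved:
            return self.saved[block_no]
        return self.origin.read(block_no)
```

The snapshot consumes extra space only for blocks that have been overwritten since it was created, at the cost of one additional copy on the first write to each such block.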

File systems

Some file systems, such as WAFL (strictly speaking a file layout rather than a file system: it provides mechanisms on which a variety of file systems and technologies that access disk blocks can be built), fossil for Plan 9 from Bell Labs, or ODS-5, internally track old versions of files and make snapshots available through a special namespace. Others, like UFS2, provide an operating system API for accessing file histories. In NTFS, access to snapshots is provided by the Volume Shadow Copy Service (VSS) in Windows XP and Windows Server 2003 and by Shadow Copy in Windows Vista. Snapshots have also been available in the NSS (Novell Storage Services) file system on NetWare since version 4.11, and more recently on Linux platforms in the Open Enterprise Server product.

On Linux, the Btrfs and OCFS2 file systems support creating snapshots (cloning) of individual files. Btrfs also supports the creation of snapshots of subvolumes.

Sun Microsystems' ZFS has a hybrid implementation which tracks read-write snapshots at the block level, but makes branched file sets nameable to user applications as "clones".

Time Machine, included in Apple's Mac OS X v10.5 operating system, is not a snapshotting scheme but a system-level incremental backup service: it merely watches mounted volumes for changes and periodically copies changed files to a specially designated volume using hard links.
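The general idea of a hard-link-based incremental backup can be sketched in Python as follows. This is not Time Machine's actual implementation: the function name incremental_backup is hypothetical, and the sketch detects changes by comparing file contents against the previous backup rather than by watching the volume. It also assumes that both backup directories are on the same file system and that the file system supports hard links.

```python
import filecmp
import os
import shutil


def incremental_backup(source_dir, previous_backup, new_backup):
    """Create new_backup from source_dir, hard-linking files that are
    unchanged since previous_backup (which may be None for a first run)."""
    for root, _dirs, files in os.walk(source_dir):
        rel = os.path.relpath(root, source_dir)
        os.makedirs(os.path.join(new_backup, rel), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(new_backup, rel, name)
            old = os.path.join(previous_backup, rel, name) if previous_backup else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                # Unchanged: share the data with the previous backup.
                os.link(old, dst)
            else:
                # New or changed: store a fresh copy.
                shutil.copy2(src, dst)
```

Each backup directory then looks like a complete copy of the source, while unchanged files occupy disk space only once. Unlike a true snapshot, the source can still change while the backup is running.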

In databases

The SQL specification mandates four levels of transaction isolation. In the highest, SERIALIZABLE, a snapshot is implicitly created at the start of every transaction. The backup utilities for many popular SQL databases use this feature to generate self-consistent dumps of table data.
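One way a database can give a transaction such a snapshot is to keep multiple versions of each row. The Python sketch below is a simplified illustration of that idea, not the mechanism of any particular database and not something the SQL specification prescribes; VersionedStore and ReadTransaction are hypothetical names, and write conflicts are ignored.

```python
class VersionedStore:
    """Toy key-value store that keeps every committed version of a value."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # timestamp of the most recent commit

    def commit(self, changes):
        # Each commit gets the next timestamp and appends new versions.
        self.clock += 1
        for key, value in changes.items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def begin(self):
        # The transaction's snapshot is identified by the current timestamp.
        return ReadTransaction(self, self.clock)


class ReadTransaction:
    """Reads data as it was when the transaction began."""

    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts

    def get(self, key):
        # Return the newest version committed at or before the snapshot.
        result = None
        for commit_ts, value in self.store.versions.get(key, []):
            if commit_ts > self.snapshot_ts:
                break
            result = value
        return result
```

A dump utility that reads every table through one such transaction sees a single consistent state, even if other transactions commit while the dump is in progress.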

In virtualization

System emulators host a guest operating system in a virtual machine; some (including VMware, VirtualBox, Parallels Desktop, QEMU and Virtual PC) can perform whole-system snapshots by dumping the entire machine state to a backing file and redirecting future guest writes to a second file, which then acts as a copy-on-write table.

Other applications

Software transactional memory is a scheme which applies the same concepts to data structures held only in memory.

See also

  • System image
  • LVM snapshots (Linux)
  • R1Soft Hot Copy (Linux)
  • Microsoft Volume Shadow Copy
  • Storage hypervisor

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.