
Backup



 
 









Quotations


Backup: The duplicate copy of crucial data that no one bothered to make, used only in the abstract.

Only wimps use tape backup: _real_ men just upload their important stuff on

BANANA - Backups Are Not Archives, NOT ARCHIVES.

Anderson McCammont, quoting someone else on the veritas-bu mailing list (3/27/2007).





Encyclopedia


In information technology, backup refers to making copies of data so that these additional copies may be used to restore the original after a data loss event. These additional copies are typically called "backups." Backups are useful primarily for two purposes. The first is to restore a state following a disaster (called disaster recovery). The second is to restore small numbers of files after they have been accidentally deleted or corrupted. Data loss is also very common. 66% of internet users have suffered from serious data loss.

Since a backup system contains at least one copy of all data worth saving, the data storage requirements are considerable. Organizing this storage space and managing the backup process is a complicated undertaking. A data repository model can be used to provide structure to the storage. In the modern era of computing there are many different types of data storage devices that are useful for making backups. There are also many different ways in which these devices can be arranged to provide geographic redundancy, data security, and portability.

Before data is sent to its storage location, it is selected, extracted, and manipulated. Many different techniques have been developed to optimize the backup procedure. These include optimizations for dealing with open files and live data sources as well as compression, encryption, and de-duplication, among others. Many organizations and individuals want confidence that the process is working as expected, and so define measurements and validation techniques. It is also important to recognize the limitations and human factors involved in any backup scheme.

Storage, the base of a backup system


Data repository models

Any backup strategy starts with a concept of a data repository. The backup data needs to be stored somehow and probably should be organized to a degree. It can be as simple as a sheet of paper with a list of all backup tapes and the dates they were written or a more sophisticated setup with a computerized index, catalog, or relational database. Different repository models have different advantages. This is closely related to choosing a backup rotation scheme.

Unstructured : An unstructured repository may simply be a stack of floppy disks or CD-R/DVD-R media with minimal information about what was backed up and when. This is the easiest to implement, but probably the least likely to achieve a high level of recoverability.

Full + Incrementals : A Full + Incremental repository aims to make storing several copies of the source data more feasible. At first, a full backup (of all files) is taken. After that, any number of incremental backups can be taken. There are many different types of incremental backups, but they all attempt to back up only a small amount of data relative to the full backup. Restoring a whole system to a certain point in time would require locating the full backup taken previous to that time and the incremental backups that cover the period of time between the full backup and the particular point in time to which the system is supposed to be restored. The scope of an incremental backup is typically defined as a range of time relative to other full or incremental backups. Different implementations of backup systems frequently use specialized or conflicting definitions of these terms.

Continuous data protection : This model takes it a step further and instead of scheduling periodic backups, the system immediately logs every change on the host system. This is generally done by saving byte or block-level differences rather than file-level differences. It differs from simple disk mirroring in that it enables a roll-back of the log and thus the restore of an older image of the data.
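The restore procedure described for a Full + Incremental repository can be sketched in a few lines. This is a minimal illustration only: the Backup record, its "full"/"incremental" labels, and the restore_chain function are hypothetical names, and the sketch assumes each incremental holds the changes made since the previous backup of any kind, which, as noted above, is only one of several definitions in use.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    kind: str  # "full" or "incremental" (hypothetical labels)


def restore_chain(catalog, target):
    """Return the backups needed to restore the system as of `target`."""
    # Only backups taken at or before the target time are relevant.
    candidates = sorted(
        (b for b in catalog if b.taken_at <= target),
        key=lambda b: b.taken_at,
    )
    # Locate the most recent full backup among them.
    full_index = None
    for i, b in enumerate(candidates):
        if b.kind == "full":
            full_index = i
    if full_index is None:
        raise ValueError("no full backup exists at or before the target time")
    # The full backup plus every incremental between it and the target.
    return candidates[full_index:]
```

Applying the returned backups in order (the full backup first, then each incremental) would reproduce the state at the target time under the stated assumption.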

Storage media

Regardless of the repository model that is used, the data has to be stored on some data storage medium somewhere.

Magnetic tape : Magnetic tape has long been the most commonly used medium for bulk data storage, backup, archiving, and interchange. Tape has typically had an order of magnitude better capacity/price ratio than hard disk, but recently the ratios for tape and hard disk have become much closer. There are myriad formats, many of which are proprietary or specific to certain markets like mainframes or a particular brand of personal computer. Tape is a sequential access medium, so even though access times may be poor, the rate of continuously writing or reading data can be very fast. Some new tape drives are even faster than modern hard disks.

Hard disk : The capacity/price ratio of hard disk has been rapidly improving for many years. This is making it more competitive with magnetic tape as a bulk storage medium. The main advantages of hard disk storage are low access times, availability, capacity and ease of use. External disks can be connected via local interfaces like SCSI, USB, FireWire, or eSATA, or via longer-distance technologies like Ethernet, iSCSI, or Fibre Channel. Some disk-based backup systems, such as Virtual Tape Libraries, support data de-duplication, which can dramatically reduce the amount of disk storage capacity consumed by daily and weekly backup data.

Optical disc : A recordable CD can be used as a backup device. One advantage of CDs is that they can be restored on any machine with a CD-ROM drive. In addition, recordable CDs are relatively cheap. Another common format is recordable DVD. Many optical disc formats are WORM type, which makes them useful for archival purposes since the data can't be changed. Rewritable formats such as CD-RW or DVD-RAM can also be used. The newer HD-DVDs and Blu-ray Discs dramatically increase the amount of data possible on a single optical disc, though, as yet, the hardware may be cost prohibitive for many people. Additionally, the physical lifetime of optical discs has become a concern, as some optical discs can degrade and lose data within a couple of years.

Floppy disk : During the 1980s and early 1990s, many personal/home computer users associated backup mostly with copying floppy disks. The low data capacity of a floppy disk makes it an unpopular and obsolete choice today.

Solid state storage : Also known as flash memory, thumb drives, USB flash drives, CompactFlash, SmartMedia, Memory Stick, Secure Digital cards, etc., these devices are relatively costly for their low capacity, but offer excellent portability and ease-of-use.

Remote backup service : As broadband internet access becomes more widespread, remote backup services are gaining in popularity. Backing up via the internet to a remote location can protect against some worst-case scenarios such as fires, floods, or earthquakes which would destroy any backups in the immediate vicinity along with everything else. There are, however, a number of drawbacks to remote backup services. First, internet connections (particularly domestic broadband connections) are generally substantially slower than local data storage devices, which can be a problem for people who generate or modify large amounts of data. Second, users need to trust a third-party service provider with both the privacy and the integrity of backed up data. The risk associated with putting control of personal or sensitive data in the hands of a third party can be managed by encrypting sensitive data so that its contents cannot be viewed without access to the secret key.
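The speed drawback of remote backup is easy to quantify with a little arithmetic. The helper below is only a sketch: the function name transfer_hours is hypothetical, and any rates passed to it are illustrative assumptions rather than measurements of a particular connection or device.

```python
def transfer_hours(size_gb, rate_mbit_per_s):
    """Hours needed to move `size_gb` gigabytes at `rate_mbit_per_s` megabits/s."""
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB = 8000 megabits
    return size_megabits / rate_mbit_per_s / 3600  # seconds -> hours
```

For instance, comparing transfer_hours(100, 10) with transfer_hours(100, 800) contrasts an assumed 10 Mbit/s uplink with an assumed 800 Mbit/s local device for the same 100 GB of data; the first figure is 80 times the second.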

Managing the data repository

Regardless of the data repository model or data storage media used for backups, a balance needs to be struck between accessibility, security and cost. These media management methods are not mutually exclusive and are frequently combined to meet the needs of the situation. Using on-line disks for staging data before it is sent to a near-line tape library is a common example.

On-line : On-line backup storage is typically the most accessible type of data storage, and can begin a restore within milliseconds. A good example would be an internal hard disk or a disk array (possibly connected to a SAN). This type of storage is very convenient and speedy, but is relatively expensive. On-line storage is vulnerable to being deleted or overwritten, either by accident or by the payload of a data-deleting virus.

Near-line : Near-line storage is typically less accessible and less expensive than on-line storage, but still useful for backup data storage. A good example would be a tape library with restore times ranging from seconds to a few minutes. A mechanical device is usually involved in moving media units from storage into a drive where the data can be read or written.

Off-line : Off-line storage is similar to near-line, except it requires human interaction to make storage media available. This can be as simple as storing backup tapes in a file cabinet. Media access time can be anywhere from a few seconds to more than an hour.

Off-site vault : To protect against a disaster or other site-specific problem, many people choose to send backup media to an off-site vault. The vault can be as simple as the System Administrator's home office or as sophisticated as a disaster hardened, temperature controlled, high security bunker that has facilities for backup media storage.

Backup site, Disaster Recovery Center or DR Center : In the event of a disaster, the data on backup media will not be sufficient to recover. Computer systems onto which the data can be restored and properly configured networks are necessary too. Some organizations have their own data recovery centers that are equipped for this scenario. Other organizations contract this out to a third-party recovery center. Note that because a DR site is itself a huge investment, backup is very rarely considered the preferred method of moving data to a DR site. A more typical approach is remote disk mirroring, which keeps the DR data as up-to-date as possible.

Selection, extraction and manipulation of data


Selection and extraction of file data

Deciding what to back up at any given time is a harder process than it seems. By backing up too much redundant data, the data repository will fill up too quickly. Backing up an insufficient amount of data can eventually lead to the loss of critical information.

Copying files : Making copies of files is the simplest and most common way to perform a backup. A means to perform this basic function is included in all backup software and all operating systems.

Partial file copying : Instead of copying whole files, one can limit the backup to only the blocks or bytes within a file that have changed in a given period of time. This technique can use substantially less storage space on the backup medium, but requires a high level of sophistication to reconstruct files in a restore situation. Some implementations require integration with the source filesystem.

Filesystem dump : Instead of copying files within a filesystem, a copy of the whole filesystem itself can be made. This is also known as a raw partition backup and is related to disk imaging. The process usually involves unmounting the filesystem and running a program like dump. This type of backup has the possibility of running faster than a backup that simply copies files. A feature of some dump software is the ability to restore specific files from the dump image.

Identification of changes : Some filesystems have an archive bit for each file that says it was recently changed. Some backup software looks at the date of the file and compares it with the last backup, to determine whether the file was changed.

Versioning file system : A versioning filesystem keeps track of all changes to a file and makes those changes accessible to the user. Generally this gives access to any previous version, all the way back to the file's creation time. An example of this is the Wayback versioning filesystem for Linux.
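As an illustration of the date-comparison approach to identifying changes listed above, the following sketch walks a directory tree and reports files modified after the previous backup. The function name changed_since and its parameters are hypothetical, and the sketch relies only on modification times; it does not consult an archive bit, which not every filesystem provides.

```python
import os


def changed_since(root, last_backup_time):
    """Yield paths under `root` modified after `last_backup_time` (epoch seconds)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime > last_backup_time:
                    yield path
            except OSError:
                # The file vanished or is unreadable; a real tool would log this.
                continue
```

A timestamp comparison like this is cheap, but it can miss files whose modification time was reset, which is one reason some tools compare file contents or rely on filesystem support instead.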

Selection and extraction of live data

If a computer system is in use while it is being backed up, the possibility of files being open for reading or writing is real. If a file is open, the contents on disk may not correctly represent what the owner of the file intends. This is especially true for database files of all kinds. The term fuzzy backup can be used to describe a backup of live data that looks like it ran correctly, but does not represent the state of the data at any single point in time. This is because the data being backed up changed in the period of time between when the backup started and when it finished. For databases in particular, fuzzy backups are worthless.

Snapshot backup : A snapshot is an instantaneous function of some storage systems that presents a copy of the filesystem as if it were frozen at a specific point in time, often by a copy-on-write mechanism. An effective way to back up live data is to temporarily quiesce it (e.g. close all files), take a snapshot, and then resume live operations. At this point the snapshot can be backed up through normal methods. While a snapshot is very handy for viewing a filesystem as it was at a different point in time, it is hardly an effective backup mechanism by itself.

Open file backup : Many backup software packages feature the ability to handle open files in backup operations. Some simply check for openness and try again later. File locking is useful for regulating access to open files.
When attempting to understand the logistics of backing up open files, one must consider that the backup process could take several minutes to back up a large file such as a database. In order to back up a file that is in use, it is vital that the entire backup represent a single-moment snapshot of the file, rather than a simple copy of a read-through. This represents a challenge when backing up a file that is constantly changing. Either the database file must be locked to prevent changes, or a method must be implemented to ensure that the original snapshot is preserved long enough to be copied, all while changes are being preserved. If the first part of the backup reflects the data before a change and later parts reflect the data after it, the result is a corrupted, unusable file, because most large files contain internal references between their various parts that must remain consistent throughout the file.


Cold database backup : During a cold backup, the database is closed or locked and not available to users. The datafiles do not change during the backup process so the database is in a consistent state when it is returned to normal operation.

Hot database backup : Some database management systems offer a means to generate a backup image of the database while it is online and usable ("hot"). This usually includes an inconsistent image of the data files plus a log of changes made while the procedure is running. Upon a restore, the changes in the log files are reapplied to bring the database in sync.
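The hot backup idea above, an inconsistent image plus a change log that is replayed on restore, can be shown with a toy model. Everything here is an assumption made for illustration: the database is a plain dict of pages, exactly one write is simulated between page copies, and hot_backup and restore are hypothetical names, not the interface of any real database system.

```python
def hot_backup(pages, concurrent_writes):
    """Copy `pages` one at a time while writes arrive; return (image, log)."""
    image, log = {}, []
    writes = iter(concurrent_writes)
    for key in sorted(pages):
        image[key] = pages[key]      # copy one page into the backup image
        write = next(writes, None)   # simulate a write landing mid-backup
        if write is not None:
            k, v = write
            pages[k] = v             # the live database changes
            log.append((k, v))       # and the change is recorded in the log
    return image, log


def restore(image, log):
    """Replay the logged changes over the inconsistent image."""
    db = dict(image)
    for k, v in log:
        db[k] = v
    return db
```

In this model the image alone is a fuzzy backup, since it can mix pages copied before and after a write; it is only after the log has been replayed that the restored copy matches the live database as it stood when the backup finished.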

Selection and extraction of metadata

Not all information stored on the computer is stored in files. Accurately recovering a complete system from scratch requires keeping track of this non-file data too.

System description : System specifications are needed to procure an exact replacement after a disaster.

Boot sector : The boot sector can sometimes be recreated more easily than saving it. Still, it usually isn't a normal file and the system won't boot without it.

Partition layout : The layout of the original disk, as well as partition tables and filesystem settings, is needed to properly recreate the original system.

File metadata : Each file's permissions, owner, group, ACLs, and any other metadata need to be backed up for a restore to properly recreate the original environment.

System metadata : Different operating systems have different ways of storing configuration information. Windows keeps a registry of system information that is more difficult to restore than a typical file.
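To make the file metadata item above concrete, here is a small sketch that records some of the attributes a plain content copy would not carry. The function name capture_metadata and the choice of fields are assumptions for illustration; ACLs and extended attributes are not covered, since reading them is platform-specific.

```python
import os
import stat


def capture_metadata(path):
    """Return a dict of metadata that a plain content copy would not preserve."""
    st = os.lstat(path)  # lstat, so a symlink itself is described, not its target
    return {
        "path": path,
        "mode": stat.filemode(st.st_mode),  # permission string in ls -l style
        "uid": st.st_uid,                   # owner id
        "gid": st.st_gid,                   # group id
        "mtime": st.st_mtime,               # last modification time
        "size": st.st_size,
    }
```

A backup tool could store such a record alongside each file's contents and reapply it with calls such as os.chmod, os.chown and os.utime during a restore.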

Manipulation of data and dataset optimisation

It is frequently useful or required to manipulate the data being backed up to optimize the backup process. These manipulations can improve backup speed, restore speed, data security and media usage, and can reduce bandwidth requirements.

  • Compression : Various schemes can be employed to shrink the size of the source data so that it uses less storage space. Compression is frequently a built-in feature of tape drive hardware.
  • De-duplication : When multiple similar systems are backed up to the same destination storage device, there exists the potential for much redundancy within the backed up data. For example, if 20 Windows workstations were backed up to the same data repository, they might share a common set of system files. The data repository only needs to store one copy of those files to be able to restore any one of those workstations. This technique can be applied at the file level or even on raw blocks of data, potentially resulting in a massive reduction in required storage space. Deduplication can occur on a server before any data moves to backup media, sometimes referred to as source/client side deduplication. This approach also reduces the bandwidth required to send backup data to its target media. The process can also occur at the target storage device, sometimes referred to as inline or back-end deduplication.
  • Duplication : Sometimes backup jobs are duplicated to a second set of storage media. This can be done to rearrange the backup images to optimize restore speed, or to have a second copy at a different location or on a different storage medium.
  • Encryption : High capacity removable storage media such as backup tapes present a data security risk if they are lost or stolen. Encrypting the data on these media can mitigate this problem, but presents new problems. First, encryption is a CPU intensive process that can slow down backup speeds. Second, once data has been encrypted, it cannot be effectively compressed, so the data compression function of many tape drives becomes ineffective. For this reason, and since redundant data makes cryptanalytic attacks easier, many encryption implementations compress the data before encrypting it. Third, the security of the encrypted backups is only as effective as the security of the key management policy.
  • Multiplexing : When there are many more computers to be backed up than there are destination storage devices, the ability to use a single storage device with several simultaneous backups can be useful.
  • Refactoring : The process of rearranging the backup sets in a data repository is known as refactoring. For example, if a backup system uses a single tape each day to store the incremental backups for all the protected computers, restoring one of the computers could potentially require many tapes. Refactoring could be used to consolidate all the backups for a single computer onto a single tape. This is especially useful for backup systems that do incremental-forever style backups.
  • Staging : Sometimes backup jobs are copied to a staging disk before being copied to tape. This process is sometimes referred to as D2D2T, an acronym for Disk to Disk to Tape. This can be useful if there is a problem matching the speed of the final destination device with the source device, as is frequently faced in network-based backup systems. It can also serve as a centralized location for applying other data manipulation techniques.
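
As an illustration of block-level de-duplication as described above, the following is a minimal sketch rather than the method of any particular backup product. It assumes fixed-size blocks identified by their SHA-256 digest; the class name DedupStore, the 4096-byte block size and the simulated workstation data are all hypothetical choices made for the example.

```python
import hashlib


class DedupStore:
    """Toy content-addressed repository: each unique block is stored once."""

    def __init__(self, block_size=4096):
        self.block_size = block_size  # assumed fixed block size
        self.blocks = {}              # SHA-256 hex digest -> block bytes
        self.manifests = {}           # backup name -> ordered list of digests

    def backup(self, name, data):
        """Split data into blocks and store only blocks not seen before."""
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.manifests[name] = digests

    def restore(self, name):
        """Rebuild a backup by concatenating its blocks in order."""
        return b"".join(self.blocks[d] for d in self.manifests[name])

    def stored_bytes(self):
        """Physical bytes held by the repository after de-duplication."""
        return sum(len(block) for block in self.blocks.values())


# Twenty simulated workstations: identical "system files" plus one
# block of machine-specific data each (all data here is made up).
system_files = bytes(range(256)) * 64
store = DedupStore()
originals = {}
for i in range(20):
    name = f"ws{i:02d}"
    user_data = f"workstation-{i:02d}".encode().ljust(4096, b".")
    originals[name] = system_files + user_data
    store.backup(name, originals[name])

logical_bytes = sum(len(data) for data in originals.values())
print("logical bytes backed up:", logical_bytes)
print("physical bytes stored:  ", store.stored_bytes())
```

Every workstation can still be restored in full from its manifest, while the shared blocks occupy space only once. Real systems typically add variable-size chunking, persistence and garbage collection, which this sketch omits.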

Managing the backup process

It is important to understand that backup is a process. As long as new data is being created and changes are being made, backups will need to be updated. Individuals and organizations with anything from one computer to thousands (or even millions) of computer systems all have requirements for protecting data. While the scale is different, the objectives and limitations are essentially the same. Likewise, those who perform backups need to know to what extent they were successful, regardless of scale.

Objectives

  • Recovery Point Objective (RPO) : The point in time that the restarted infrastructure will reflect. Essentially, this is the roll-back that will be experienced as a result of the recovery. The most desirable RPO would be the point just prior to the data loss event. Making a more recent recovery point achievable requires increasing the frequency of synchronization between the source data and the backup repository.
  • Recovery Time Objective (RTO) : The amount of time elapsed between disaster and restoration of business functions.
  • Data security : In addition to preserving access to data for its owners, data must be restricted from unauthorized access. Backups must be performed in a manner that does not compromise the original owner's undertaking. This can be achieved with data encryption and proper media handling policies.
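
The relationship between backup frequency and the recovery point can be made concrete with a small calculation. The sketch below is illustrative only: the function names are hypothetical, and the dates reuse the TeliaSonera incident listed under Events (last serviceable backup on 15 December 2007, failure on 3 January 2008), with two earlier backup dates invented to make the list longer.

```python
from datetime import datetime, timedelta


def achieved_recovery_point(backup_times, failure_time):
    """Return the most recent backup taken at or before the failure, or None."""
    usable = [t for t in backup_times if t <= failure_time]
    return max(usable) if usable else None


def meets_rpo(backup_times, failure_time, rpo):
    """True if the roll-back (failure minus last usable backup) is within the RPO."""
    last = achieved_recovery_point(backup_times, failure_time)
    return last is not None and failure_time - last <= rpo


# The first two dates are invented; the last backup and the failure date
# are taken from the TeliaSonera example in the Events section.
backups = [datetime(2007, 12, 1), datetime(2007, 12, 8), datetime(2007, 12, 15)]
failure = datetime(2008, 1, 3)

last_good = achieved_recovery_point(backups, failure)
rollback = failure - last_good
print("roll-back experienced:", rollback.days, "days")
print("meets a one-day RPO:", meets_rpo(backups, failure, timedelta(days=1)))
```

A target RPO of one day would have required at least daily usable backups, so the roll-back in this example misses such a target by a wide margin.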

Limitations

An effective backup scheme will take into consideration the limitations of the situation.

  • Backup window : The period of time when backups are permitted to run on a system is called the backup window. This is typically the time when the system sees the least usage and the backup process will interfere least with normal operations. The backup window is usually planned with users' convenience in mind. If a backup extends past the defined backup window, a decision is made whether it is more beneficial to abort the backup or to lengthen the backup window.
  • Performance impact : All backup schemes have some performance impact on the system being backed up. For example, for the period of time that a computer system is being backed up, the hard drive is busy reading files for the purposes of the backup, and its full bandwidth is no longer available for other tasks. Such impacts should be analyzed.
  • Costs of hardware, software, labor : All types of storage media have a finite capacity with a real cost. Matching the correct amount of storage capacity (over time) with the backup needs is an important part of the design of a backup scheme. Any backup scheme has some labor requirement, but complicated schemes have considerably higher labor requirements. The cost of commercial backup software can also be considerable.
  • Network bandwidth : Distributed backup systems can be impacted by limited network bandwidth.
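
Whether a backup fits its window is, to a first approximation, a matter of dividing the amount of data by the sustained throughput of the slowest link. The following sketch shows that arithmetic under stated assumptions: the function names and the sample figures (2000 GB, 100 MB/s, an 8-hour window) are hypothetical, 1 GB is taken as 1024 MB, and overheads such as cataloguing or media changes are ignored.

```python
def backup_duration_hours(data_gb, throughput_mb_per_s):
    """Estimated duration of a full copy, assuming constant throughput."""
    seconds = (data_gb * 1024) / throughput_mb_per_s  # assumes 1 GB = 1024 MB
    return seconds / 3600


def fits_window(data_gb, throughput_mb_per_s, window_hours):
    """True if the estimated duration does not exceed the backup window."""
    return backup_duration_hours(data_gb, throughput_mb_per_s) <= window_hours


# Hypothetical figures: 2000 GB to protect and an 8-hour nightly window.
for rate in (100, 50):
    hours = backup_duration_hours(2000, rate)
    verdict = "fits" if fits_window(2000, rate, 8) else "exceeds the window"
    print(f"{rate} MB/s: {hours:.1f} h, {verdict}")
```

Halving the available throughput, for instance because of limited network bandwidth, doubles the estimated duration, which is one reason the limitations listed above interact.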

Implementation

Meeting the defined objectives in the face of the above limitations can be a difficult task. The tools and concepts below can make that task more achievable.

  • Scheduling : Using a job scheduler can greatly improve the reliability and consistency of backups by removing part of the human element. Many backup software packages include this functionality.
  • Authentication : Over the course of regular operations, the user accounts and/or system agents that perform the backups need to be authenticated at some level. The power to copy all data off of or onto a system requires unrestricted access. Using an authentication mechanism is a good way to prevent the backup scheme from being used for unauthorized activity.
  • Chain of trust : Removable storage media are physical items and must only be handled by trusted individuals. Establishing a chain of trusted individuals (and vendors) is critical to defining the security of the data.

Measuring the process

To ensure that the backup scheme is working as expected, the process needs to include monitoring key factors and maintaining historical data.

  • Backup validation : (also known as "Backup Success Validation") The process by which owners of data can get information regarding how their data was backed up. This same process is also used to prove compliance to regulatory bodies outside of the organization. For example, an insurance company might be required under HIPAA to show "proof" that its patient data meet records retention requirements. Disaster, data complexity, data value and increasing dependence upon ever-growing volumes of data all contribute to the anxiety around and dependence upon successful backups to ensure business continuity. For that reason, many organizations rely on third-party or "independent" solutions to test, validate, and optimize their backup operations (backup reporting).
  • Reporting : In larger configurations, reports are useful for monitoring media usage, device status, errors, vault coordination and other information about the backup process.
  • Logging : In addition to the history of computer generated reports, activity and change logs are useful for monitoring backup system events.
  • Validation : Many backup programs make use of checksums or hashes to validate that the data was accurately copied. These offer several advantages. First, they allow data integrity to be verified without reference to the original file: if the file as stored on the backup medium has the same checksum as the saved value, then it is very probably correct. Second, some backup programs can use checksums to avoid making redundant copies of files, to improve backup speed. This is particularly useful for the de-duplication process.
  • Monitored backup : Backup processes are monitored by a third party monitoring center. This center alerts users to any errors that occur during automated backups. Monitored backup requires software capable of pinging the monitoring center's servers in the case of errors.
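
Checksum-based validation as described above can be sketched in a few lines. This is a minimal example, not the procedure of any specific backup program: it assumes SHA-256 as the hash, the helper names sha256_of_file and verify_backup are hypothetical, and a temporary file stands in for the backup medium.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(path, saved_checksum):
    """Compare the stored copy against the checksum saved at backup time."""
    return sha256_of_file(path) == saved_checksum


with tempfile.TemporaryDirectory() as tmp:
    backup_path = os.path.join(tmp, "backup.bin")  # stand-in for backup media
    with open(backup_path, "wb") as f:
        f.write(b"important data" * 1000)

    saved = sha256_of_file(backup_path)            # recorded when the backup is made
    intact = verify_backup(backup_path, saved)     # checked later, no original needed

    # Simulate corruption of a single byte on the backup medium.
    with open(backup_path, "r+b") as f:
        f.seek(10)
        f.write(b"X")
    still_valid = verify_backup(backup_path, saved)

print("intact copy verifies:", intact)
print("corrupted copy verifies:", still_valid)
```

The check needs only the stored copy and the saved checksum, which is what allows integrity to be verified without reference to the original file.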

Lore


Confusion

Due to a considerable overlap in technology, backups and backup systems are frequently confused with archives and fault-tolerant systems. Backups differ from archives in the sense that archives are the primary copy of data, usually put away for future use, while backups are a secondary copy of data, kept on hand to replace the original item. Backup systems differ from fault-tolerant systems in the sense that backup systems assume that a fault will cause a data loss event and fault-tolerant systems assume a fault will not.

Advice

  • The more important the data stored on a computer, the greater the need to back it up.
  • A backup is only as useful as its associated restore strategy.
  • Storing the copy near the original is unwise, since many disasters such as fire, flood and electrical surges are likely to cause damage to the backup at the same time.
  • Automated backup and scheduling should be considered, as manual backups can be affected by human error.
  • Backups will fail for a wide variety of reasons. A verification or monitoring strategy is an important part of a successful backup plan.
  • It is good to store backed up archives in open/standard formats. This helps with recovery in the future when the software used to make the backup is obsolete. It also allows different software to be used.


Events

  • In 1996, during a fire at the headquarters of Crédit Lyonnais, a major bank in Paris, system administrators ran into the burning building to rescue backup tapes because they didn't have offsite copies. Crucial bank archives and computer data were lost.
  • Privacy Rights Clearinghouse has documented 16 instances of stolen or lost backup tapes (among major organizations) in 2005 & 2006. Affected organizations included Bank of America, Ameritrade, Citigroup, and Time Warner.
  • On 3 January 2008, an email server crashed at TeliaSonera, a major Nordic telecom company and internet service provider. It was subsequently discovered that the last serviceable backup set was from 15 December 2007. Three hundred thousand customer email accounts were affected.


See also

  • Glossary of backup terms
  • Backup software
  • Backup rotation scheme
  • Incremental backup
  • Computer data storage
  • Data proliferation
  • File synchronization
  • Information repository
  • Disaster recovery and business continuity auditing
  • Digital preservation
  • Reversible computing