Dd (Unix)
In computing, dd is a common Unix program whose primary purpose is the low-level copying and conversion of raw data. According to the manual page for Version 7 Unix, it will "convert and copy a file". It is used to copy a specified number of bytes or blocks, performing on-the-fly byte order conversions, as well as more esoteric EBCDIC to ASCII conversions. It can also be used to copy regions of raw device files, for example backing up the boot sector of a hard disk, or to read fixed amounts of data from special files like /dev/zero or /dev/random.
The name dd may stand for "data" or "disk duplication". It is jokingly said to stand for "disk destroyer", "data destroyer", "death and destruction", or "delete data", since when used for low-level operations on hard disks, a small mistake, such as reversing the if and of (input and output) parameters, could result in the loss of some or all data on a disk.
The syntax of dd was likely inspired by the DD statement found in IBM JCL, where "DD" stands for Data Definition. The Jargon File states that dd is rumored to have been based on IBM's JCL, and that the syntax may have been a joke.
Usage
The command line syntax of dd is significantly different from most other Unix programs, and because of its ubiquity it is resistant to recent attempts to enforce a common syntax for all command line tools. Generally, dd uses an option=value format, whereas most Unix programs use either -option value or --option=value format. Also, the input is specified using the "if" (input file) option, while most programs simply take the name by itself.
Usage varies across different operating systems. Also, certain features of dd depend on the capabilities of the computer system, such as dd's ability to implement an option for direct memory access. Sending a SIGINFO signal (or a USR1 signal on Linux) to a running dd process makes it print I/O statistics to standard error and then continue copying. dd can read standard input from the keyboard. When EOF (end of file) is read, dd will exit. Signals and EOF are determined by the software. For example, Unix tools ported to Windows vary as to the EOF: Cygwin uses Ctrl-D (the usual Unix EOF) and MKS Toolkit uses Ctrl-Z (the usual Windows EOF).
In compliance with the Unix philosophy, dd does one thing well. Unlike a sophisticated and highly abstracted utility, dd has no algorithm other than the low-level decisions of the user concerning how to vary the run options. Often the options are changed for each run of dd in a multi-step process to solve a computer problem.
Output messages
The documentation for the GNU variant of dd, as supplied with Linux, does not describe the format of the messages displayed on standard error on completion; however, these are described by other implementations, e.g. that of BSD.
Each of the "Records in" and "Records out" lines shows the number of complete blocks transferred + the number of partial blocks, e.g. because the physical medium ended before a complete block was read.
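The "complete + partial" counts can be reproduced on a small scale. The following is a minimal sketch, assuming a dd that prints English "records in" and "records out" lines; the scratch directory and file names are made up for illustration.

```shell
# Sketch: show how "full+partial" record counts arise.
# The scratch directory and file names are illustrative assumptions.
tmp=$(mktemp -d)

# Make a 1300-byte input, which is not a multiple of the 512-byte block size.
dd if=/dev/zero of="$tmp/input.bin" bs=1300 count=1 2>/dev/null

# Copy it in 512-byte blocks. dd writes its statistics to standard error,
# so standard error is redirected in order to capture them.
stats=$(LC_ALL=C dd if="$tmp/input.bin" of="$tmp/copy.bin" bs=512 2>&1)

# Two complete 512-byte blocks plus one partial block of 276 bytes.
records_in=$(printf '%s\n' "$stats" | grep 'records in')
records_out=$(printf '%s\n' "$stats" | grep 'records out')
echo "$records_in"
echo "$records_out"

rm -rf "$tmp"
```

Here 1300 = 2 × 512 + 276, so the number before the "+" is 2 and the number after it is 1.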
Block size
Block size is a crucial operating factor. Each run of dd uses one set of block sizes: one for input and one for output. Block sizes can adapt dd to the realm of its application, and to the phase of an operation involving many runs of dd. The input block size is set by ibs and the output block size by obs; bs sets both and overrides ibs and obs. cbs sets the conversion block size used by the record-oriented conversions, and conv=sync pads each input block to the input block size.
For example, when recovering data from an area of a hard drive with errors, a small block size recovers the most bytes; for the greatest speed, a large block size is chosen, up to the point of diminishing returns on the system dd runs on. If the transfer uses a network, a suitable block size can be chosen depending on congestion levels.
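The effect of separate input and output block sizes shows up in the record counts. The following is a minimal sketch, assuming a dd that supports ibs and obs and prints English statistics; the file names are hypothetical.

```shell
# Sketch: different input and output block sizes, then bs overriding both.
# The scratch directory and file names are illustrative assumptions.
tmp=$(mktemp -d)

# A 1000-byte input file.
dd if=/dev/zero of="$tmp/input.bin" bs=1000 count=1 2>/dev/null

# Read in 100-byte blocks, write in 250-byte blocks.
stats=$(LC_ALL=C dd if="$tmp/input.bin" of="$tmp/copy1.bin" ibs=100 obs=250 2>&1)
split_in=$(printf '%s\n' "$stats" | grep 'records in')
split_out=$(printf '%s\n' "$stats" | grep 'records out')

# Adding bs=500 supersedes both ibs and obs.
stats=$(LC_ALL=C dd if="$tmp/input.bin" of="$tmp/copy2.bin" ibs=100 obs=250 bs=500 2>&1)
bs_in=$(printf '%s\n' "$stats" | grep 'records in')
bs_out=$(printf '%s\n' "$stats" | grep 'records out')

echo "ibs=100 obs=250: $split_in, $split_out"
echo "with bs=500:     $bs_in, $bs_out"

rm -rf "$tmp"
```

The first run reads ten 100-byte blocks and writes four 250-byte blocks; the second reads and writes two 500-byte blocks.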
Some implementations understand the letter x as a multiplication operator in the block size and count parameters:
dd bs=2x80x18b if=/dev/fd0 of=floppy.image
where the "b" suffix indicates that the units are 512-byte blocks. Unix block devices use this as their allocation unit by default.
For the value of the bs field, the decimal number can be followed by one of these suffixes:
- w means 2
- b means 512
- k means 1024
- M specifies multiplication by 1024²
- G specifies multiplication by 1024³
Hence bs=2x80x18b means 2 × 80 × 18 × 512 = 1474560, which is the exact size of a 1440 KiB floppy disk.
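The arithmetic behind the floppy example can be checked with shell arithmetic. This is only a sketch of the calculation; the variable names are made up for illustration.

```shell
# "x" means multiplication and the "b" suffix means 512 bytes,
# so bs=2x80x18b works out to 2 * 80 * 18 * 512 bytes.
floppy_bytes=$((2 * 80 * 18 * 512))

# Express the same size in KiB (1 KiB = 1024 bytes).
floppy_kib=$((floppy_bytes / 1024))

echo "$floppy_bytes bytes = $floppy_kib KiB"
```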
Progress Information
dd is a silent tool, which makes it very useful for scripting. However, to see the progress of a running dd on a GNU/Linux machine, send it a USR1 signal as follows.
In a different terminal, obtain the PID of the dd process by running
ps -a
You may get output like
18255 pts/5 00:00:00 ssh
24084 pts/2 00:00:04 dd
24334 pts/4 00:00:00 ps
To send a USR1 signal to dd, continue with the following:
sudo kill -USR1 24084
In the terminal where dd is running you will see its output, something like:
349389+0 records in
349389+0 records out
1431097344 bytes (1.4 GB) copied, 935.624 s, 1.5 MB/s
This can be repeated as many times as required to follow the progress.
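When dd is started from a script, the PID is available from the shell without running ps. The following is a minimal sketch, assuming a Linux system with GNU dd (on systems where USR1 is not handled by dd, the signal would terminate it instead); the log file and variable names are made up for illustration.

```shell
# Sketch: start dd in the background, ask it for statistics, then stop it.
# Assumes GNU dd on Linux, where USR1 prints I/O statistics.
log=$(mktemp)

# A copy that runs until it is killed; statistics go to the log file.
LC_ALL=C dd if=/dev/zero of=/dev/null bs=1024 2>"$log" &
dd_pid=$!            # PID of the background dd, no ps needed

sleep 1              # give dd time to start and install its signal handler
kill -USR1 "$dd_pid" # request a progress report
sleep 1              # give dd time to write the report

kill "$dd_pid"       # stop the copy
wait "$dd_pid" 2>/dev/null || true

# Count the progress reports that were written.
progress=$(grep -c 'records in' "$log" || true)
cat "$log"
rm -f "$log"
```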
Data transfer
dd can duplicate data across files, devices, partitions and volumes. The data may be input or output to and from any of these; but there are important differences concerning the output when going to a partition. Also, during the transfer, the data can be modified using the conv options to suit the medium.
An attempt to copy the entire disk using cp may omit the final block if it is an unexpected length, whereas dd may succeed. The source and destination disks should have the same size.
To create an ISO disk image from a CD-ROM:
dd if=/dev/sr0 of=myCD.iso bs=2048 conv=noerror,sync
To clone one partition to another:
dd if=/dev/sda2 of=/dev/sdb2 bs=4096 conv=noerror
To clone a hard disk "ad0" to "ad1":
dd if=/dev/ad0 of=/dev/ad1 bs=1M conv=noerror
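The noerror option means to keep going if there is an error. The sync option means to pad every input block with null bytes up to the input block size. Used together with noerror, this keeps the data that follows an unreadable block at its original offset in the copy.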
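The padding performed by sync can be observed on an ordinary file whose length is not a multiple of the block size. The following is a minimal sketch; the scratch directory and file names are assumptions made for illustration.

```shell
# Sketch: conv=sync pads a short final block up to the block size.
# The scratch directory and file names are illustrative assumptions.
tmp=$(mktemp -d)

# A 1300-byte input: two full 512-byte blocks and one 276-byte block.
dd if=/dev/zero of="$tmp/input.bin" bs=1300 count=1 2>/dev/null

# Copy with padding: the short block is filled with null bytes to 512 bytes.
dd if="$tmp/input.bin" of="$tmp/padded.bin" bs=512 conv=sync 2>/dev/null

input_size=$(( $(wc -c < "$tmp/input.bin") ))
padded_size=$(( $(wc -c < "$tmp/padded.bin") ))
echo "input: $input_size bytes, padded copy: $padded_size bytes"

rm -rf "$tmp"
```

The copy is 3 × 512 = 1536 bytes long, so a padded image can be slightly larger than its source.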
Master boot record
It is possible to repair a master boot record. It can be transferred to and from a repair file.
To duplicate the first two sectors of a floppy drive:
dd if=/dev/fd0 of=MBRboot.img bs=512 count=2
To create an image of the entire master boot record (including the partition table):
dd if=/dev/sda of=MBR.img bs=512 count=1
To create an image of only the boot code of the master boot record (without the partition table):
dd if=/dev/sda of=MBR_boot.img bs=446 count=1
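The 512-byte master boot record consists of 446 bytes of boot code, a 64-byte partition table and a 2-byte signature, which is why bs=446 stops short of the partition table. The following is a minimal sketch that practises the split and the restore on a scratch image file rather than a real disk; disk.img and the other file names are hypothetical.

```shell
# Sketch: split and restore MBR regions on a scratch image, not a real disk.
# All file names are illustrative assumptions.
tmp=$(mktemp -d)

# Stand-in for a disk: a 4-sector image file.
dd if=/dev/zero of="$tmp/disk.img" bs=512 count=4 2>/dev/null

# Boot code only: the first 446 bytes.
dd if="$tmp/disk.img" of="$tmp/boot_code.img" bs=446 count=1 2>/dev/null
# Partition table only: 64 bytes starting at offset 446.
dd if="$tmp/disk.img" of="$tmp/ptable.img" bs=1 skip=446 count=64 2>/dev/null
# Boot signature: the last 2 bytes of the sector.
dd if="$tmp/disk.img" of="$tmp/signature.img" bs=1 skip=510 count=2 2>/dev/null

# Restore the boot code without touching the partition table or the
# rest of the image (conv=notrunc keeps the output from being truncated).
dd if="$tmp/boot_code.img" of="$tmp/disk.img" bs=446 count=1 conv=notrunc 2>/dev/null

boot_size=$(( $(wc -c < "$tmp/boot_code.img") ))
ptable_size=$(( $(wc -c < "$tmp/ptable.img") ))
sig_size=$(( $(wc -c < "$tmp/signature.img") ))
disk_size=$(( $(wc -c < "$tmp/disk.img") ))
echo "$boot_size + $ptable_size + $sig_size = $((boot_size + ptable_size + sig_size))"

rm -rf "$tmp"
```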
Data modification
dd can modify data in place.
To overwrite the first 512 bytes of a file with null bytes:
dd if=/dev/zero of=path/to/file bs=512 count=1 conv=notrunc
The notrunc conversion option means do not truncate the output file — that is, if the output file already exists, just replace the specified bytes and leave the rest of the output file alone. Without this option, dd would create an output file 512 bytes long.
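The difference can be seen by running the same overwrite with and without notrunc on two copies of a file. The following is a minimal sketch; the scratch directory and file names are assumptions made for illustration.

```shell
# Sketch: the same 512-byte overwrite with and without conv=notrunc.
# The scratch directory and file names are illustrative assumptions.
tmp=$(mktemp -d)

# Two identical 2048-byte files.
dd if=/dev/zero of="$tmp/kept.bin" bs=2048 count=1 2>/dev/null
cp "$tmp/kept.bin" "$tmp/truncated.bin"

# With notrunc: only the first 512 bytes are rewritten.
dd if=/dev/zero of="$tmp/kept.bin" bs=512 count=1 conv=notrunc 2>/dev/null
# Without notrunc: the output file is truncated before writing.
dd if=/dev/zero of="$tmp/truncated.bin" bs=512 count=1 2>/dev/null

kept_size=$(( $(wc -c < "$tmp/kept.bin") ))
truncated_size=$(( $(wc -c < "$tmp/truncated.bin") ))
echo "with notrunc: $kept_size bytes, without: $truncated_size bytes"

rm -rf "$tmp"
```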
To duplicate a disk partition as a disk image file on a different partition:
dd if=/dev/sdb2 of=partition.image bs=4096 conv=noerror
Disk wipe
For security reasons, a discarded device should be wiped.
To check whether a drive has data on it, send the output to standard output:
dd if=/dev/sda
To wipe a disk, first, consider the operation that would create a 1 GiB file containing only zeros (bs specifies block size, count the number of blocks):
dd if=/dev/zero of=file1G.tmp bs=1M count=1024
count is the number of blocks dd reads. Multiplying 1 MiB by 1024 gives 1 GiB.
Now here are ways to use dd to wipe a disk:
The output may be piped to various other Unix utilities in order to facilitate the report.
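As a hedged illustration of zero-filling and of piping dd's output into other utilities for a report, the sketch below wipes a scratch file that stands in for a disk and then counts the non-null bytes that remain. The file name scratch.img is hypothetical, and a real device would need its own size and device path.

```shell
# Sketch: zero-fill a scratch file standing in for a disk, then verify it.
# scratch.img is a hypothetical stand-in; no real device is touched.
tmp=$(mktemp -d)
target="$tmp/scratch.img"

# Fill the stand-in with 64 KiB of random data.
dd if=/dev/urandom of="$target" bs=1024 count=64 2>/dev/null

# Report before wiping: stream the contents, drop null bytes, count the rest.
before=$(dd if="$target" bs=1024 2>/dev/null | LC_ALL=C tr -d '\000' | wc -c)
before=$((before))

# Overwrite all 64 blocks with zeros, keeping the file length unchanged.
dd if=/dev/zero of="$target" bs=1024 count=64 conv=notrunc 2>/dev/null

# Report after wiping: no non-null bytes should remain.
leftover=$(dd if="$target" bs=1024 2>/dev/null | LC_ALL=C tr -d '\000' | wc -c)
leftover=$((leftover))
wiped_size=$(( $(wc -c < "$target") ))

echo "non-null bytes before: $before, after: $leftover"
rm -rf "$tmp"
```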
Data recovery
The history of open-source software (OSS) for data recovery and restoration of files, drives, and partitions started with GNU dd in 1984, with one block size per dd process, and no recovery algorithm other than the user's interactive session running one form of dd after another. Then, in October 1999, a C program called dd_rescue was written. It has two block sizes in its algorithm. But the author of the 2003 shell script dd_rhelp, which enhances dd_rescue's data recovery algorithm, now recommends GNU ddrescue, a C++ program that was published in 2004 and is now in most Linux distributions. GNU ddrescue has the most sophisticated block-size-changing algorithm available in OSS. (The names ddrescue and dd_rescue are similar, yet they are very different programs. Still, the Debian Linux distribution packages dd_rescue as "ddrescue", and packages GNU ddrescue as "gdrescue" or as "gddrescue".)
GNU ddrescue is stable and safe. Here is an untested rescue using 3 of ddrescue's 24 options:
Another open source program called savehd7 uses a sophisticated algorithm, but it also requires the installation of its own programming-language interpreter.
Miscellaneous uses
To run a drive benchmark and analyze the sequential read and write performance for 1024-byte blocks:
dd if=/dev/zero bs=1024 count=1000000 of=file_1GB
dd if=file_1GB of=/dev/null bs=64k
To make a file of 100 random bytes:
dd if=/dev/urandom of=myrandom bs=100 count=1
To convert a file to uppercase:
dd if=filename of=filename1 conv=ucase
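Because dd reads standard input and writes standard output when if and of are omitted, the case conversions can also be tried in a pipeline. The following is a minimal sketch, assuming a dd that supports conv=ucase and conv=lcase; the sample text is made up.

```shell
# Sketch: case conversion on a pipeline instead of named files.
# Statistics on standard error are discarded to keep only the data.
upper=$(printf 'hello, dd\n' | dd conv=ucase 2>/dev/null)
lower=$(printf 'HELLO, DD\n' | dd conv=lcase 2>/dev/null)

echo "$upper"
echo "$lower"
```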
To create a 1 GiB sparse file, or resize an existing file to 1 GiB without overwriting:
dd if=/dev/zero of=mytestfile.out bs=1 count=0 seek=1G
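The same technique can be tried at a smaller scale, using a plain byte count for seek rather than a size suffix. The following is a minimal sketch; the file name is hypothetical, and the space actually allocated depends on the file system, so it is only printed.

```shell
# Sketch: create a 1 MiB sparse file by seeking without writing any data.
# The scratch directory and file name are illustrative assumptions.
tmp=$(mktemp -d)

# bs=1 with seek=1048576 places the end of the file 1 MiB from the start;
# count=0 means no blocks are copied.
dd if=/dev/zero of="$tmp/sparse.bin" bs=1 count=0 seek=1048576 2>/dev/null

# Apparent size in bytes, and the space allocated on disk in KiB.
apparent=$(( $(wc -c < "$tmp/sparse.bin") ))
allocated_kib=$(du -k "$tmp/sparse.bin" | cut -f1)
echo "apparent size: $apparent bytes, allocated: $allocated_kib KiB"

rm -rf "$tmp"
```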
Limitations
Seagate documentation warns, "Certain disc utilities, such as DD, which depend on low-level disc access may not support 48-bit LBAs until they are updated." Using ATA hard drives over 128 GiB requires 48-bit LBA. However, in Linux, dd uses the kernel to read or write to raw device files. Support for 48-bit LBA has been present since version 2.4.23 of the kernel.
See also
- List of Unix programs
- Backup
- Disk cloning
- Disk image
- RaWrite
- Disk Copy
- Forensics (DD) Dcfldd
External links
- dd: manual page from GNU coreutils.
- dd for Windows.
- savehd7 - Save a potentially damaged harddisk partition
- GNU ddrescue.
- Manual for GNU ddrescue.
- dd_rescue
- dd_rhelp
- Softpanorama dd page.
- DD at Linux Questions Wiki.
- How to use ddrescue to image a damaged harddisk partition and mount it in Windows.