dd is a common Unix program whose primary purpose is the low-level copying and conversion of raw data.
dd is an abbreviation for "data definition" in IBM JCL, and the command's syntax is meant to be reminiscent of this.
dd is used to copy a specified number of bytes or blocks, performing on-the-fly byte order conversions, as well as more esoteric EBCDIC to ASCII conversions.
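As a small illustration of these conversions, the sketch below pipes short strings through dd. It assumes a dd that supports the POSIX conv=swab, conv=ebcdic and conv=ascii operands (GNU dd does); the function names and sample strings are made up for the example.

```shell
# Swap each pair of adjacent bytes (a simple byte-order conversion).
swap_pairs() {
    printf '%s' "$1" | dd conv=swab 2>/dev/null
}

# Convert ASCII text to EBCDIC and back again; the text should
# survive the round trip unchanged.
ebcdic_round_trip() {
    printf '%s' "$1" | dd conv=ebcdic 2>/dev/null | dd conv=ascii 2>/dev/null
}

swap_pairs 'abcd'; echo            # prints: badc
ebcdic_round_trip 'hello'; echo    # prints: hello
```

The 2>/dev/null redirections only hide the transfer statistics that dd writes to standard error.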
dd can also be used to copy regions of raw device files, e.g. backing up the boot sector of a hard disk, or to read fixed amounts of data from special files like /dev/zero or /dev/random.
It can also be used in computer forensics when the contents of an entire disk need to be preserved as a byte-exact copy. Using cp would not be possible because data from deleted files still physically present on a disk are not visible through the file system interface.
It is jokingly said to stand for "data destroyer" or "delete data": because it is used for low-level operations on hard disks, a small mistake, such as reversing the if and of parameters, can result in the loss of some or all of the data on a disk.
Usage
The command line syntax of dd is significantly different from most other Unix programs, and because of its ubiquity it is resistant to recent attempts to enforce a common syntax for all command line tools. Generally, dd uses an option=value format, whereas most Unix programs use either -option value or --option=value format. Also, dd's input is specified using the "if" (input file) option, while most programs simply take the name by itself. It is rumored to have been based on IBM's JCL, and though the syntax may have been a joke, there seems never to have been any effort to write a more Unix-like replacement.
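A minimal sketch of the option=value style, using a throwaway string instead of a real device. It relies on dd reading standard input and writing standard output when if= and of= are omitted; the function name extract_word is made up for the example.

```shell
# Every operand is written as name=value. Here bs=1 makes each block
# one byte long, skip=6 discards the first six input blocks, and
# count=5 copies the next five.
extract_word() {
    printf 'hello world' | dd bs=1 skip=6 count=5 2>/dev/null
}

extract_word; echo    # prints: world
```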
Example use of the dd command to create an ISO disk image from a CD-ROM:
dd if=/dev/cdrom of=/home/sam/myCD.iso bs=2048 conv=sync,notrunc
Note that an attempt to copy the entire disk image using cp may omit the final block if it is an unexpected length; dd will always complete the copy if possible.
Using dd to wipe an entire disk with random data:
dd if=/dev/urandom of=/dev/hda
Alternatively, to overwrite the disk seven times in a row:
for n in {1..7}; do dd if=/dev/urandom of=/dev/sda bs=8b conv=notrunc; done
Using dd to duplicate one hard disk partition to another hard disk:
dd if=/dev/sda2 of=/dev/sdb2 bs=4096 conv=notrunc,noerror
Note that notrunc means do not truncate the output file, and noerror means keep going if there is an error (though a better tool for this would be ddrescue).
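The effect of notrunc is easiest to see on an ordinary file. The following sketch uses a scratch file created with mktemp (not a real partition) and overwrites its first two bytes, once with conv=notrunc and once without; the variable names are illustrative.

```shell
tmp=$(mktemp)

# With conv=notrunc the rest of the existing file is preserved.
printf 'abcdef' > "$tmp"
printf 'XY' | dd of="$tmp" conv=notrunc 2>/dev/null
kept=$(cat "$tmp")

# Without it, dd truncates the output file before writing.
printf 'abcdef' > "$tmp"
printf 'XY' | dd of="$tmp" 2>/dev/null
truncated=$(cat "$tmp")

echo "$kept"         # prints: XYcdef
echo "$truncated"    # prints: XY
rm -f "$tmp"
```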
To duplicate a disk partition as a disk image file on a different partition:
dd if=/dev/sdb2 of=/home/sam/partition.image bs=4096 conv=notrunc,noerror
To duplicate a disk partition as a disk image file on a remote machine over a secure ssh connection:
dd if=/dev/sdb2 | ssh user@host "dd of=/home/user/partition.image"
Create a 1 GB file containing only zeros (bs=blocksize, count=number of blocks):
dd if=/dev/zero of=file1G.tmp bs=1G count=1
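The resulting file size is simply bs multiplied by count. A scaled-down sketch of the same idea, writing 1 MiB to a scratch file and checking the size (the variable names are illustrative):

```shell
tmp=$(mktemp)

# 1024 blocks of 1024 bytes each = 1048576 bytes (1 MiB) of zeros.
dd if=/dev/zero of="$tmp" bs=1024 count=1024 2>/dev/null

size=$(wc -c < "$tmp")
size=$((size))    # strip any padding that wc may add
echo "$size"      # prints: 1048576
rm -f "$tmp"
```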
To verify that a drive is really zeroed out:
dd if=/dev/sda | hexdump -C | head
The output of this command will resemble the following if the drive is blank:
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
201f78000
16841664+0 records in
16841664+0 records out
8622931968 bytes (8.6 GB) copied, 1247.05 s, 6.9 MB/s
If the drive is blank, one line of blank bytes will be printed, followed by a '*' signifying repeated blank lines, followed by a line indicating the address of the line which ends the repetition, followed by the statistics which are printed after the output. The numbers in the statistics above are illustrative. If the drive is not entirely blank, there will be more than one line of data output.
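For scripted use, reading hexdump output by eye is inconvenient. One possible alternative, sketched here under the assumption that tr can delete NUL bytes (true of common implementations), is to strip all zero bytes and count what remains. The function name is_zeroed is hypothetical, and the demonstration runs on scratch files rather than a drive.

```shell
# Succeeds (exit status 0) if the file or device named in $1 is
# readable and contains nothing but zero bytes.
is_zeroed() {
    [ -r "$1" ] || return 2
    nonzero=$(dd if="$1" bs=64k 2>/dev/null | tr -d '\000' | wc -c)
    [ "$((nonzero))" -eq 0 ]
}

blank=$(mktemp)
dirty=$(mktemp)
dd if=/dev/zero of="$blank" bs=512 count=8 2>/dev/null
cp "$blank" "$dirty"
# Plant a single non-zero byte at offset 100 of the second file.
printf 'x' | dd of="$dirty" bs=1 seek=100 conv=notrunc 2>/dev/null

is_zeroed "$blank" && echo "blank: all zeros"
is_zeroed "$dirty" || echo "dirty: contains data"
rm -f "$blank" "$dirty"
```

Unlike the head-based pipeline, this always reads the whole input, so on a large drive it takes as long as a full read.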
To duplicate the first two sectors of a floppy disk:
dd if=/dev/fd0 of=/home/sam/MBRboot.image bs=512 count=2
To duplicate only the bootstrap code of the master boot record (its first 446 bytes, which exclude the partition table):
dd if=/dev/sda of=/home/sam/MBR.image bs=446 count=1
To benchmark a drive and analyze its sequential read and write performance:
dd if=/dev/zero bs=1024 count=1000000 of=/home/sam/1Gb.file
dd if=/home/sam/1Gb.file bs=64k | dd of=/dev/null
To make a file of 100 random bytes:
dd if=/dev/urandom of=/home/sam/myrandom bs=100 count=1
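To convert a file to uppercase (writing to a separate output file, because dd truncates the output file on opening it, which would destroy the input if both names were the same):
dd if=filename of=filename.upper conv=ucase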
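The conversion can be tried safely on a pipe, where no file is at risk. A minimal sketch (to_upper is a made-up helper name):

```shell
# Upper-case a string by passing it through dd's conv=ucase conversion.
to_upper() {
    printf '%s' "$1" | dd conv=ucase 2>/dev/null
}

to_upper 'hello, dd'; echo    # prints: HELLO, DD
```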
To search the system memory:
dd if=/dev/mem | hexdump -C | grep 'some-string-of-words-in-the-file-you-forgot-to-save-before-you-hit-the-close-button'
Image a disk to another machine over the network:
On source machine:
dd if=/dev/hda bs=16065b | netcat targethost-IP 1234
On target machine:
netcat -l -p 1234 | dd of=/dev/hdc bs=16065b
Copy speed can be improved in two ways. The first, obvious fix is to raise the block size from the default 512 bytes. The second addresses the problem that a single dd process is, at any moment, either reading or writing.
If you pipe the first dd into a second one, reads and writes can overlap, letting the copy run at the maximum speed of the slowest device.
dd if=/dev/ad2 conv=noerror,sync bs=64k | dd of=/dev/ad3 bs=64k
Sending a USR1 signal to a running GNU dd process makes it print I/O statistics to standard error and then resume copying.
$ dd if=/dev/zero of=/dev/null& pid=$!
$ kill -USR1 $pid
18335302+0 records in
18335302+0 records out
9387674624 bytes (9.4 GB) copied, 34.6279 seconds, 271 MB/s
Create a 1 GB sparse file or resize an existing file to 1 GB without overwriting:
dd if=/dev/zero of=mytestfile.out bs=1 count=0 seek=1G
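A scaled-down sketch of the same technique on a scratch file, assuming GNU dd behaviour (the output file is extended to the seek offset even though nothing is copied) and a file system that supports sparse files. The offset is written out in bytes rather than with a suffix.

```shell
tmp=$(mktemp)

# count=0 copies no data; seek=1048576 with bs=1 positions the output
# 1 MiB into the file, so the file is extended to that length.
dd if=/dev/zero of="$tmp" bs=1 count=0 seek=1048576 2>/dev/null

apparent=$(wc -c < "$tmp")
apparent=$((apparent))
echo "apparent size: $apparent bytes"    # 1048576 with GNU dd

# Space actually allocated, in KiB; on a file system with sparse-file
# support this is usually far below 1024.
du -k "$tmp"
rm -f "$tmp"
```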
Some implementations understand x as a multiplication operator in the block size and count parameters:
dd bs=2x80x18b if=/dev/fd0 of=/tmp/floppy.image
where the "b" suffix indicates that the units are 512-byte blocks. Unix block devices use this as their allocation unit by default.
For the value of the bs field, the following suffixes can be appended to a decimal number:
w means multiplication by 2
b means multiplication by 512
k means multiplication by 1024
M means multiplication by 1024*1024
G means multiplication by 1024*1024*1024
Hence bs=2x80x18b means 2*80*18*512 = 1474560 bytes, which is the exact size of a 1440 KiB floppy disk.
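The arithmetic can be checked with ordinary shell arithmetic; the variable name floppy_bytes is illustrative.

```shell
# 2 x 80 x 18 blocks, each "b" block being 512 bytes.
floppy_bytes=$((2 * 80 * 18 * 512))

echo "$floppy_bytes"               # prints: 1474560
echo "$((floppy_bytes / 1024))"    # prints: 1440 (size in KiB)
```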
To mount that image
mount -o loop floppy.image /mntpoint
Speed
Small block sizes (bs=) take much longer because of the fixed overhead of each transfer request. Above roughly 32k, 64k or 128k (depending on the machine) there is little further to be gained, since larger blocks have to be split up anyway. The "sweet spot" is often around 64k, which is much larger than the default bs=512. In other words, bs=64k is a reasonable choice for large files when you don't need to count blocks. The final partial block, smaller than 64k, is still copied.
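Since the best block size depends on the machine, it may be worth measuring. The sketch below reads the same 16 MiB scratch file with several block sizes and prints the summary line that dd itself reports. The helper name bench_read and the chosen sizes are assumptions for the example, and a file this small is likely served from cache, so the figures are only indicative.

```shell
# Print dd's own summary line for reading file $1 with block size $2.
bench_read() {
    dd if="$1" of=/dev/null bs="$2" 2>&1 | tail -n 1
}

src=$(mktemp)
dd if=/dev/zero of="$src" bs=64k count=256 2>/dev/null    # 16 MiB scratch file

for bs in 512 4k 64k 1024k; do
    printf 'bs=%-6s ' "$bs"
    bench_read "$src" "$bs"
done
rm -f "$src"
```

For a realistic comparison, the same loop could be pointed at the actual source device, with a count= limit so that each run reads the same amount of data.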
Output messages
The documentation for the GNU variant of dd, as supplied with Linux, does not describe the format of the messages displayed on standard error on completion; however, these are described by other implementations, e.g. the one supplied with BSD.
Each of the "Records in" and "Records out" lines shows the number of complete blocks transferred, a plus sign, and the number of partial blocks. A partial block occurs, for example, when the physical medium ends before a complete block has been read.
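A small sketch that produces a partial block on purpose: a 10-byte scratch file read with bs=4 yields two complete blocks and one partial block of 2 bytes. LC_ALL=C is set so that the messages are not translated; the variable names are illustrative.

```shell
tmp=$(mktemp)
printf 'abcdefghij' > "$tmp"    # 10 bytes

# Capture the statistics that dd writes to standard error.
stats=$(LC_ALL=C dd if="$tmp" of=/dev/null bs=4 2>&1)
echo "$stats"

# Extract the "complete+partial" counter from the "records in" line.
records_in=$(echo "$stats" | sed -n 's/ records in$//p')
echo "$records_in"              # prints: 2+1
rm -f "$tmp"
```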
ATA Disks over 128 GB
Seagate documentation warns, "Certain disc utilities, such as DD, which depend on low-level disc access may not support 48-bit LBAs until they are updated." 48-bit LBA is required for ATA hard drives over 128 GB in size.
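The 128 GB figure can be reproduced with a short calculation. It assumes, as background not stated above, that the older ATA addressing scheme used 28-bit LBAs with 512-byte sectors; the variable names are illustrative.

```shell
sector_size=512

# Largest capacity addressable with 28-bit sector numbers.
lba28_bytes=$(( (1 << 28) * sector_size ))

echo "$lba28_bytes"                     # prints: 137438953472
echo "$(( lba28_bytes >> 30 )) GiB"     # prints: 128 GiB
```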
Recovery-oriented variants of dd
Open source Unix-based programs for rescue include dd_rescue and dd_rhelp, which work together, savehd7, and GNU ddrescue.
Antonio Diaz Diaz (the developer of GNU ddrescue) compares the variants of dd for the task of rescuing:
The standard utility dd does a linear read of the drive, so it can take a long time or even fry the drive without rescuing anything if the errors are at the beginning of the drive.
Kurt Garloff's dd_rescue does basically the same thing as dd, only more efficiently.
LAB Valentin's dd_rhelp is a complex shell script that runs Garloff's dd_rescue many times, trying to be strategic about copying the drive, but it is very inefficient.
- dd_rhelp first extracts all the readable data, and saves it to a file, inserting zeros where bytes cannot be read. Then it tries to re-read the invalid data and update this file.
- GNU ddrescue can be used to copy data directly to a new disk if needed, just like Linux dd.
dd_rhelp or GNU ddrescue will yield a complete disk image, faster but possibly with some errors. GNU ddrescue is generally much faster, as it is written entirely in C++, whereas dd_rhelp is a shell script wrapper around dd_rescue. Both dd_rhelp and GNU ddrescue aim to copy data fast where there are no errors, then copy in smaller blocks and with retries where there are errors. GNU ddrescue is easy to use with default options, can easily be downloaded and compiled on Linux-based Live CDs such as Knoppix, and can be used with SystemRescueCD.
GNU ddrescue example
- first, grab most of the error-free areas in a hurry:
ddrescue -n /dev/old_disk /dev/new_disk rescued.log
- then try to recover as much of the dicey areas as possible:
ddrescue -r 1 /dev/old_disk /dev/new_disk rescued.log
There is a big difference in how kernels process disk errors. FreeBSD, NetBSD, OpenBSD, Solaris, and different Linux kernels (e.g. hda vs. sda (<2.6.20)) behave differently. Also, Linux lacks "raw" disk devices such as those the BSDs provide, which makes it less desirable for low-level data recovery. Non-raw devices read larger blocks than requested, obscuring the actual location where the error occurred. You may wish to use "dmesg -n8" to see the error messages on the console.
External links