Gzip
Gzip is any of several software applications used for file compression and decompression. The term usually refers to the GNU Project's implementation, "gzip" standing for GNU zip. It is based on the DEFLATE algorithm, which is a combination of Lempel-Ziv (LZ77) and Huffman coding. The program was created by Jean-Loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and intended for use by the GNU Project. Version 0.1 was first publicly released on October 30, 1992, and version 1.0 followed in February 1993.

OpenBSD's version of gzip is actually the compress program, to which support for the gzip format was added in OpenBSD 3.4. The "g" in this specific version stands for gratis.

FreeBSD, DragonFlyBSD, and NetBSD use a BSD-licensed implementation instead of the GNU version; it is actually a command-line interface for zlib intended to be compatible with the GNU implementation's options. This implementation originally comes from NetBSD and supports decompression of the bzip2 and Unix pack(1) formats.

Other uses

The "Content-Encoding"/"Accept-Encoding" and "Transfer-Encoding"/"TE" headers in HTTP/1.1 allow clients to optionally receive compressed HTTP responses and (less commonly) to send compressed requests. The specification for HTTP/1.1 (RFC 2616) specifies three compression methods: "gzip" (RFC 1952; the content wrapped in a gzip stream), "deflate" (RFC 1950; the content wrapped in a zlib-formatted stream), and "compress" (explained in RFC 2616 section 3.5 as 'The encoding format produced by the common UNIX file compression program "compress". This format is an adaptive Lempel-Ziv-Welch coding (LZW).'). Many client libraries, browsers, and server platforms (including Apache and Microsoft IIS) support gzip. Many agents also support deflate, although several important players incorrectly implement deflate support using the raw format specified by RFC 1951 instead of the correct format specified by RFC 1950 (which encapsulates RFC 1951). Notably, Internet Explorer versions 6, 7, and 8 report deflate support but do not actually accept the RFC 1950 format, making actual use of deflate highly unusual. Many clients accept both RFC 1951 and RFC 1950-formatted data for the "deflate" compression method, but a server has no way to detect whether a client will correctly handle the RFC 1950 format.
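The difference between the three containers can be illustrated with a minimal Python sketch built on the standard zlib module. The function names encode and decode are hypothetical helpers, not part of any HTTP library, and the fallback to raw DEFLATE is an assumption about how a tolerant client might behave rather than something the specification requires.

```python
import zlib


def encode(body: bytes, coding: str) -> bytes:
    """Compress body for an HTTP content-coding name (illustrative helper)."""
    # wbits selects the wrapper around the DEFLATE data:
    #   31 -> gzip stream (RFC 1952), 15 -> zlib stream (RFC 1950)
    wbits = {"gzip": 31, "deflate": 15}[coding]
    compressor = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return compressor.compress(body) + compressor.flush()


def decode(payload: bytes, coding: str) -> bytes:
    """Decompress a payload, tolerating raw DEFLATE sent as "deflate"."""
    if coding == "gzip":
        return zlib.decompress(payload, 31)
    if coding == "deflate":
        try:
            # Correct form: zlib-wrapped data (RFC 1950).
            return zlib.decompress(payload, 15)
        except zlib.error:
            # Assumed fallback: raw DEFLATE (RFC 1951) with no wrapper.
            return zlib.decompress(payload, -15)
    raise ValueError("unsupported content-coding: " + coding)
```

A server has no equivalent of the try/except above, which is why sending "gzip" is usually the safer choice.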

Since the late 1990s, bzip2, a file compression utility based on a block-sorting algorithm, has gained some popularity as a gzip replacement. It produces considerably smaller files (especially for source code and other structured text), but at the cost of memory and processing time (up to a factor of 4). bzip2-compressed archive files are conventionally named either .tar.bz2 or simply .tbz.

AdvanceCOMP and 7-Zip can produce gzip-compatible files, using an internal DEFLATE implementation with better compression ratios than gzip itself, at the cost of more processor time compared to the reference implementation.

File format

Gzip is based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding. DEFLATE was intended as a replacement for LZW and other patent-encumbered data compression algorithms, which, at the time, limited the usability of compress and other popular archivers.

"Gzip" is often also used to refer to the gzip file format, which is:
  • a 10-byte header, containing a magic number (1f 8b), the compression method (08 for DEFLATE), header flags, a time stamp, extra compression flags, and an operating system identifier
  • optional extra headers, such as the original file name
  • a body, containing a DEFLATE-compressed payload
  • an 8-byte footer, containing a CRC-32 checksum and the length of the original uncompressed data, modulo 2^32
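The layout listed above can be read back with a short Python sketch. The function name inspect_gzip and the dictionary keys are illustrative choices, and the sketch assumes a stream containing a single gzip member, so that the last 8 bytes are that member's footer.

```python
import gzip
import struct
import zlib


def inspect_gzip(blob: bytes) -> dict:
    """Decode the fixed 10-byte header and 8-byte footer of one gzip member."""
    # Header, little-endian: magic (2 bytes), method, flags, mtime (4 bytes),
    # extra flags, operating system identifier.
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", blob[:10])
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    # Footer: CRC-32 of the uncompressed data, then its length modulo 2**32.
    crc32, isize = struct.unpack("<II", blob[-8:])
    return {
        "method": method,                # 8 means DEFLATE
        "has_name": bool(flags & 0x08),  # FNAME flag: original file name follows
        "mtime": mtime,
        "os": os_id,
        "crc32": crc32,
        "isize": isize,
    }


original = b"snowball" * 100
info = inspect_gzip(gzip.compress(original))
# The footer values should match what is computed from the original data.
footer_matches = (info["crc32"] == zlib.crc32(original)
                  and info["isize"] == len(original) % 2**32)
```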


Although its file format also allows for multiple such streams to be concatenated (gzipped files are simply decompressed and concatenated as if they were originally one file), gzip is normally used to compress just single files. Compressed archives are typically created by assembling collections of files into a single tar archive, and then compressing that archive with gzip. The final .tar.gz or .tgz file is usually called a "tarball".
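The concatenation behaviour can be checked with a few lines of Python, using the standard gzip module. The variable names are illustrative only.

```python
import gzip

# Two independently compressed gzip streams.
part_one = gzip.compress(b"snow")
part_two = gzip.compress(b"ball")

# Joining the compressed bytes yields a valid multi-member gzip stream.
combined = part_one + part_two

# Decompression returns the members' contents joined end to end.
restored = gzip.decompress(combined)
```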

Gzip is not to be confused with the ZIP archive format, which also uses DEFLATE. The ZIP format can hold collections of files without an external archiver, but is less compact than compressed tarballs holding the same data, because it compresses files individually and cannot take advantage of redundancy between files (solid compression).
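The effect of compressing files individually can be seen in a small Python experiment. This is a sketch under stated assumptions: the sample files are synthetic and deliberately near-identical, the helper names are hypothetical, and real-world size differences depend heavily on the data.

```python
import io
import tarfile
import zipfile


def make_files(count: int = 50) -> dict:
    """Build many small files that share most of their content."""
    template = "This line is shared by every file in the collection.\n" * 20
    return {f"doc{i:02d}.txt": (f"file {i}\n" + template).encode()
            for i in range(count)}


def zip_size(files: dict) -> int:
    """Size of a ZIP archive in which each file is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def tar_gz_size(files: dict) -> int:
    """Size of a tar archive compressed as a single gzip stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


files = make_files()
sizes = {"zip": zip_size(files), "tar.gz": tar_gz_size(files)}
```

For input like this, where the files repeat one another, the tarball should come out smaller because the single gzip stream can refer back to earlier files.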

Zlib is an abstraction of the DEFLATE algorithm in library form which includes support both for the gzip file format and a lightweight stream format in its API. The zlib stream format, DEFLATE, and the gzip file format were standardized respectively as RFC 1950, RFC 1951, and RFC 1952.

The corresponding program for uncompressing gzipped files is gunzip. Both commands call the same binary; gunzip has the same effect as gzip -d.

gunzip and zcat

The gzip utility on UNIX systems has some alternative names.

When gzip is invoked as gunzip, it decompresses the data (a file or stdin). gunzip is equivalent to gzip -d.

When gzip is invoked as zcat, it also decompresses the data, but behaves similarly to cat. It decompresses individual files and concatenates them to standard output.

zcat is equivalent to gzip -d -c.
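The behaviour described for zcat can be approximated in Python as follows. This is a rough sketch of the idea, not how the gzip program is implemented; the function name zcat is reused for illustration and the demonstration file names are hypothetical.

```python
import gzip
import io
import os
import shutil
import sys
import tempfile


def zcat(paths, out=None):
    """Decompress each gzip file in turn and append it to one output stream."""
    out = out if out is not None else sys.stdout.buffer
    for path in paths:
        with gzip.open(path, "rb") as src:
            shutil.copyfileobj(src, out)


# Demonstration with two throwaway files written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    names = []
    for label, text in (("a.gz", b"snow"), ("b.gz", b"ball")):
        path = os.path.join(tmp, label)
        with gzip.open(path, "wb") as fh:
            fh.write(text)
        names.append(path)
    captured = io.BytesIO()
    zcat(names, captured)  # captured now holds both files' contents in order
```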

Examples

  • To compress a file with gzip, pass the filename as an argument:


gzip file.txt

The command then replaces the original file with a new, usually smaller file called file.txt.gz. To keep the original file file.txt, use the -c option and redirect the output to a new file (for example, gzip -c file.txt > file.txt.gz).
  • To uncompress, use gunzip:


gunzip file.txt.gz
  • Generally, multiple files can be compressed by combining tar with gzip:


tar czf files.tar.gz *.txt

See also

  • List of archive formats
  • List of file archivers
  • Comparison of file archivers
  • List of Unix programs
  • Free file format
  • Info-ZIP, whose funzip utility can decompress gzip-compressed data

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 