ZIP (file format)
Encyclopedia
Zip is a file format
File format
A file format is a particular way that information is encoded for storage in a computer file.Since a disk drive, or indeed any computer storage, can store only bits, the computer must have some way of converting information to 0s and 1s and vice-versa. There are different kinds of formats for...

 used for data compression
Data compression
In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

 and archiving
File archiver
A file archiver is a computer program that combines a number of files together into one archive file, or a series of archive files, for easier transportation or storage...

. A zip file contains one or more files that have been compressed, to reduce file size, or stored as is. The zip file format permits a number of compression algorithms.

The format was originally created in 1989 by Phil Katz
Phil Katz
Phillip Walter Katz was a computer programmer best known as the co-creator of the zip file format for data compression, and the author of PKZIP, a program for creating zip files which ran under DOS.- Career :...

, and was first implemented in PKWARE's PKZIP
PKZIP
PKZIP is an archiving tool originally written by Phil Katz and marketed by his company PKWARE, Inc. The common "PK" prefix used in both PKZIP and PKWARE stands for "Phil Katz".-History:...

 utility, as a replacement for the previous ARC
ARC (file format)
ARC is a lossless data compression and archival format by System Enhancement Associates . It was very popular during the early days of networked dial-up BBS. The file format and the program were both called ARC...

 compression format by Thom Henderson.

The zip format is now supported by many software utilities other than PKZIP. Microsoft has included built-in zip support (under the name "compressed folders") in versions of Microsoft Windows
Microsoft Windows
Microsoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...

 since 1998. Apple has included built-in zip support in Mac OS X
Mac OS X
Mac OS X is a series of Unix-based operating systems and graphical user interfaces developed, marketed, and sold by Apple Inc. Since 2002, has been included with all new Macintosh computer systems...

 10.3 (via BOMArchiveHelper, now Archive Utility
Archive utility
* For archive utility applications in general see file archiver.* For the MAC OS Archive Utility service application see Archive Utility....

) and later, along with other compression formats.

Zip files generally use the file extensions ".zip" or ".ZIP" and the MIME
MIME
Multipurpose Internet Mail Extensions is an Internet standard that extends the format of email to support:* Text in character sets other than ASCII* Non-text attachments* Message bodies with multiple parts...

 media type application/zip. Zip is used as a base file format by many programs, usually under a different name.

History

The zip file format was created by Phil Katz of PKWARE
PKWARE, Inc
PKWARE, Inc. was founded in 1986 by Phil Katz, co-inventor of the ZIP standard. PKWARE provides data-centric security solutions across all major computing platforms, and is known for its data compression and file management solutions...

. He created the format after his company had lawsuits filed against him by Systems Enhancement Associates (SEA) claiming that his archiving products were derivatives of SEA's ARC
ARC (file format)
ARC is a lossless data compression and archival format by System Enhancement Associates . It was very popular during the early days of networked dial-up BBS. The file format and the program were both called ARC...

 archiving system.

The name "zip" (meaning "speed") was suggested by Katz's friend, Robert Mahoney. They wanted to imply that their product would be faster than ARC
ARC (file format)
ARC is a lossless data compression and archival format by System Enhancement Associates . It was very popular during the early days of networked dial-up BBS. The file format and the program were both called ARC...

 and other compression formats of the time.

The earliest known version of .ZIP File Format Specification was first published as part of PKZIP
PKZIP
PKZIP is an archiving tool originally written by Phil Katz and marketed by his company PKWARE, Inc. The common "PK" prefix used in both PKZIP and PKWARE stands for "Phil Katz".-History:...

 0.9 package under the file APPNOTE.TXT.

The zip file format was released into the public domain
Public domain
Works are in the public domain if the intellectual property rights have expired, if the intellectual property rights are forfeited, or if they are not covered by intellectual property rights at all...

.

Version history

The .ZIP File Format Specification has its own version number, which does not necessarily correspond to the version numbers for the PKZIP tool, especially with PKZIP 6 or later. At various times, PKWARE has added preliminary features that allow PKZIP products to extract archives using advanced features, but PKZIP products that create such archives are not made available until the next major release. Other companies or organizations support the PKWARE specifications at their own pace.

A summary of key advances in various versions of the PKWARE specification:
  • 2.0: File entries can be compressed with DEFLATE.
  • 4.5: Documented 64-bit zip format.
  • 5.0: DES, Triple DES, RC2, RC4 supported for encryption
  • 5.2: RC2-64 supported for Encryption.
  • 6.1: Documented certificate storage.
  • 6.2.0: Documented Central Directory Encryption.
  • 6.3.0: Documented Unicode (UTF-8) filename storage. Expanded list of supported hash, compression, encryption algorithms.
  • 6.3.1: Corrected standard hash values for SHA-256/384/512.
  • 6.3.2: Documented compression method 97 (WavPack
    WavPack
    WavPack is a free, open source lossless audio compression format developed by David Bryant.-Features:WavPack compression can compress 8-, 16-, 24-, and 32-bit fixed-point, and 32-bit floating point audio files in the .WAV file format. It also supports surround sound streams and high frequency...

    ).


WinZip
WinZip
WinZip is a proprietary file archiver and compressor for Microsoft Windows and Mac OS X, developed by WinZip Computing...

, starting with version 12.1, uses the extension .zipx for zip files that use compression methods newer than DEFLATE; specifically, methods BZip, LZMA, PPMd, Jpeg and Wavpack. The last 2 are applied to appropriate file types when "Best method" compression is selected.

Standardization

In April 2010 ISO/IEC JTC 1 initiated a ballot to determine whether a project should be initiated to create an ISO/IEC International Standard format compatible with zip. The proposed project, entitled Document Packaging envisages a zip-compatible 'minimal compressed archive format' suitable for use with a number of existing standards including OpenDocument
OpenDocument
The Open Document Format for Office Applications is an XML-based file format for representing electronic documents such as spreadsheets, charts, presentations and word processing documents....

, Office Open XML and EPUB
EPUB
EPUB is a free and open e-book standard by the International Digital Publishing Forum...

.

Design

Zip is a simple archive format that stores multiple files. Zip allows contained files to be compressed using many different methods, as well as simply storing a file without compressing it. Each file is stored separately, allowing different files in the same archive to be compressed using different methods.

A directory is placed at the end of a zip file. This identifies what files are in the zip and identifies where in the zip that file is located. This allows zip readers to load the list of files without reading the entire zip archive. Zip archives can also include extra data that is not related to the zip archive. This allows for zip archives to be made into self-extracting archives, applications that decompress their contained data, by including the program code in a zip archive and marking the file as executable (i.e., with the .exe extension). On the other hand, it also allows for an innocuous file, such as a GIF image file, to hide malicious code by making the file a zip archive.

The zip format uses a 32-bit CRC algorithm and includes two copies of the directory structure of the archive to provide greater protection against data loss.

Structure

A zip file is identified by the presence of a central directory which is located at the end of the structure in order to allow the appending of new files. The central directory stores a list of the names of the entries (files or directories) stored in the zip file, along with other metadata about the entry, and an offset into the zip file, pointing to the actual entry data. This allows a file listing of the archive to be performed relatively quickly, as the entire archive does not have to be read to see the list of files. The entries in the zip file also include this information for redundancy.

The order of the file entries in the directory need not coincide with the order of file entries in the archive.

Each entry is introduced by a local header with information about the file such as the comment, file size and file name, followed by optional "Extra" data fields, and then the possibly compressed, possibly encrypted file data. The "Extra" data fields are the key to the extensibility of the zip format. "Extra" fields are exploited to support the ZIP64 format, WinZip-compatible AES encryption, file attributes, and higher-resolution NTFS or Unix file timestamps. Other extensions are possible via the "Extra" field. Zip tools are required by the specification to ignore Extra fields they do not recognize.

The zip format uses specific 4-byte "signatures" to denote the various structures in the file. Each file entry is marked by a specific signature. The beginning of the central directory is indicated with a different signature, and each entry in the central directory is marked with yet another particular 4-byte signature.

There is no BOF or EOF marker in the zip specification. Often the first thing in a zip file is a zip entry, which can be identified easily by its signature. But it is not necessarily the case that a zip file begins with a zip entry, and is not required by the zip specification.

Tools that correctly read zip archives must scan for the signatures of the various fields, the zip central directory. They must not scan for entries because only the directory specifies where a file chunk starts. Scanning could lead to false positives, as the format allows for other data to be between chunks.

The zip specification also supports spreading archives across multiple filesystem files. Originally intended for storage of large zip files across multiple 1.44 MB floppy disk
Floppy disk
A floppy disk is a disk storage medium composed of a disk of thin and flexible magnetic storage medium, sealed in a rectangular plastic carrier lined with fabric that removes dust particles...

s, this feature is now used for sending zip archives in parts over email, or over other transports or removable media.

The FAT filesystem
File Allocation Table
File Allocation Table is a computer file system architecture now widely used on many computer systems and most memory cards, such as those used with digital cameras. FAT file systems are commonly found on floppy disks, flash memory cards, digital cameras, and many other portable devices because of...

 of DOS has a timestamp resolution of only two seconds; zip file records mimic this. As a result, the built-in timestamp resolution of files in a zip archive is only two seconds, though extra fields can be used to store more accurate timestamps.

In September 2007, PKZIP released a revision of the zip specification that contains a provision to store file names using UTF-8
UTF-8
UTF-8 is a multibyte character encoding for Unicode. Like UTF-16 and UTF-32, UTF-8 can represent every character in the Unicode character set. Unlike them, it is backward-compatible with ASCII and avoids the complications of endianness and byte order marks...

, finally adding Unicode compatibility to zip.

File headers

All multi-byte values in the header are stored in little-endian byte order. All length fields count the length in bytes.
Local file header
Offset Bytes Description
 0 4 Local file header signature = 0x04034b50 (read as a little-endian number)
 4 2 Version needed to extract (minimum)
 6 2 General purpose bit flag
 8 2 Compression method
10 2 File last modification time
12 2 File last modification date
14 4 CRC-32
18 4 Compressed size
22 4 Uncompressed size
26 2 File name length (n)
28 2 Extra field length (m)
30 n File name
30+n m Extra field


The extra field contains a variety of optional data such as OS-specific attributes. It is divided into chunks, each with a 16-bit ID code and a 16-bit length.

This is immediately followed by the compressed data.

If bit 3 (0x08) of the general-purpose flags field is set, then the CRC-32 and file sizes are not known when the header is written. The fields in the local header are filled with zero, and the CRC-32 and size are appended in a 12-byte structure immediately after the compressed data:
Data descriptor
Offset Bytes Description
 0 4 Local file header signature = 0x08074b50
 4 4 CRC-32
 8 4 Compressed size
 12 4 Uncompressed size


The central directory entry is an expanded form of the local header:
Central directory file header
Offset Bytes Description
 0 4 Central directory file header signature = 0x02014b50
 4 2 Version made by
 6 2 Version needed to extract (minimum)
 8 2 General purpose bit flag
10 2 Compression method
12 2 File last modification time
14 2 File last modification date
16 4 CRC-32
20 4 Compressed size
24 4 Uncompressed size
28 2 File name length (n)
30 2 Extra field length (m)
32 2 File comment length (k)
34 2 Disk number where file starts
36 2 Internal file attributes
38 4 External file attributes
42 4 Relative offset of local file header. This is the number of bytes between the start of the first disk on which the file occurs, and the start of the local file header. This allows software reading the central directory to locate the position of the file inside the ZIP file.
46 n File name
46+n m Extra field
46+n+m k File comment


After all the central directory entries comes the end of central directory record, which marks the end of the ZIP file:
End of central directory record
Offset Bytes Description
 0 4 End of central directory signature = 0x06054b50
 4 2 Number of this disk
 6 2 Disk where central directory starts
 8 2 Number of central directory records on this disk
10 2 Total number of central directory records
12 4 Size of central directory (bytes)
16 4 Offset of start of central directory, relative to start of archive
20 2 Comment length (n)
22 n Comment


This ordering allows a zip file to be created in one pass, but it is usually decompressed by first reading the central directory at the end.

Compression methods

The .ZIP File Format Specification documents the following compression methods: stored (no compression), Shrunk, Reduced (methods 1-4), Imploded, Tokenizing, Deflated, Deflate64, bzip2
Bzip2
bzip2 is a free and open source implementation of the Burrows–Wheeler algorithm. It is developed and maintained by Julian Seward. Seward made the first public release of bzip2, version 0.15, in July 1996.-Compression efficiency:...

, LZMA
LZMA
The Lempel–Ziv–Markov chain algorithm is an algorithm used to perform data compression. It has been under development since 1998 and was first used in the 7z format of the 7-Zip archiver...

 (EFS), WavPack
WavPack
WavPack is a free, open source lossless audio compression format developed by David Bryant.-Features:WavPack compression can compress 8-, 16-, 24-, and 32-bit fixed-point, and 32-bit floating point audio files in the .WAV file format. It also supports surround sound streams and high frequency...

, PPMd. The most commonly used compression method is DEFLATE
DEFLATE
Deflate is a lossless data compression algorithm that uses a combination of the LZ77 algorithm and Huffman coding. It was originally defined by Phil Katz for version 2 of his PKZIP archiving tool and was later specified in RFC 1951....

, which is described in IETF RFC 1951.

Compression methods mentioned, but not documented in detail in the specification include: PKWARE Data Compression Library (DCL) Imploding (old IBM TERSE), IBM TERSE (new), IBM LZ77 z Architecture (PFS).

Encryption

Zip supports a simple password
Password
A password is a secret word or string of characters that is used for authentication, to prove identity or gain access to a resource . The password should be kept secret from those not allowed access....

-based symmetric encryption
Symmetric-key algorithm
Symmetric-key algorithms are a class of algorithms for cryptography that use trivially related, often identical, cryptographic keys for both encryption of plaintext and decryption of ciphertext. The encryption key is trivially related to the decryption key, in that they may be identical or there is...

 system which is documented in the zip specification, and known to be seriously flawed. In particular it is vulnerable to known-plaintext attack
Known-plaintext attack
The known-plaintext attack is an attack model for cryptanalysis where the attacker has samples of both the plaintext , and its encrypted version . These can be used to reveal further secret information such as secret keys and code books...

s which are in some cases made worse by poor implementations of random number generators.

New features including new compression
Data compression
In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

 and encryption
Encryption
In cryptography, encryption is the process of transforming information using an algorithm to make it unreadable to anyone except those possessing special knowledge, usually referred to as a key. The result of the process is encrypted information...

 (e.g. AES
Advanced Encryption Standard
Advanced Encryption Standard is a specification for the encryption of electronic data. It has been adopted by the U.S. government and is now used worldwide. It supersedes DES...

) methods have been documented in the .ZIP File Format Specification since version 5.2. A WinZip-developed AES-based standard is used also by 7-Zip
7-Zip
7-Zip is an open source file archiver. 7-Zip operates with the 7z archive format, but can read and write several other archive formats. The program can be used from a command line interface, graphical user interface, or with Microsoft Windows shell integration. 7-Zip began in 1999 and is actively...

, XCeed, and DotNetZip, but some vendors use other formats. PKWARE SecureZIP also supports RC2, RC4, DES, Triple DES encryption methods, Digital Certificate-based encryption and authentication (X.509
X.509
In cryptography, X.509 is an ITU-T standard for a public key infrastructure and Privilege Management Infrastructure . X.509 specifies, amongst other things, standard formats for public key certificates, certificate revocation lists, attribute certificates, and a certification path validation...

), and archive header encryption.

ZIP64

The original zip format had a 4 GiB limit on various things (uncompressed size of a file, compressed size of a file and total size of the archive), as well as a limit of 65535 entries in a zip archive. In version 4.5 of the specification (which is not the same as v4.5 of any particular tool), PKWARE introduced the "ZIP64" format extensions to get around these limitations increasing the limitation to 16 EiB
EIB
EIB may stand for:*Epson Ivy Bowl, an American college football bowl game versus an all-star team from Japan*European Investment Bank, the European Union's financing institution*the Export-Import Bank of the United States*Even in Blackouts, an American band...

 (264 bytes).

The File Explorer in Windows XP does not support ZIP64, but the Explorer in Windows Vista does. Likewise, some libraries, such as DotNetZip and IO::Compress::Zip in Perl, support ZIP64. Java's built-in java.util.zip does support ZIP64 from version Java 7.

Combination with other file formats

The zip file format allows for a comment containing any data to occur at the end of the file after the central directory. Also, because the central directory specifies the offset of each file in the archive with respect to the start, it is possible in practice for the first file entry to start at an offset other than zero.

This allows arbitrary data to occur in the file both before and after the zip archive data, and for the archive to still be read by a zip application. A side-effect of this is that it is possible to author a file that is both a working zip archive and another format, provided that the other format tolerates arbitrary data at its end, beginning, or middle. Self-extracting archives (SFX), of the form supported by WinZip and DotNetZip, take advantage of this—they are .exe files that conform to the PKZIP AppNote.txt specification and can be read by compliant zip tools or libraries.

This property of the zip format, and of the JAR format which is a variant of zip, can be exploited to hide harmful Java classes inside a seemingly harmless file, such as a GIF image uploaded to the web. This so-called GIFAR exploit has been demonstrated as an effective attack against web applications such as Facebook.

Limits

The minimum size of a zip file is 22 bytes.

The maximum size for both the archive file and the individual files inside it is 4,294,967,295 bytes (232−1 bytes, or 4 GiB) for standard ZIP, and 18,446,744,073,709,551,615 bytes (264−1 bytes, or 16 EiB) for ZIP64.

Proprietary extensions

When WinZip
WinZip
WinZip is a proprietary file archiver and compressor for Microsoft Windows and Mac OS X, developed by WinZip Computing...

 9.0 public beta was released in 2003, WinZip introduced its own AES-256 encryption, using a different file format, along with the documentation for the new specification. The encryption standards themselves were not proprietary
Proprietary software
Proprietary software is computer software licensed under exclusive legal right of the copyright holder. The licensee is given the right to use the software under certain conditions, while restricted from other uses, such as modification, further distribution, or reverse engineering.Complementary...

, but PKWARE had not updated APPNOTE.TXT to include Strong Encryption Specification (SES) since 2001, which had been used by PKZIP versions 5.0 and 6.0. WinZip technical consultant Kevin Kearney and StuffIt
StuffIt
StuffIt is a family of computer software utilities for archiving and compressing files on the Macintosh and Microsoft Windows platforms: it was originally produced for the Macintosh. An old version for Linux and Sun Solaris 2.7 or later is also available...

 product manager Mathew Covington accused PKWARE of withholding SES, but PKZIP chief technology officer Jim Peterson claimed that Certificate-based encryption was still incomplete.

To overcome this shortcoming, contemporary products such as PentaZip implemented strong zip encryption by encrypting zip archives into a different file format.

In another controversial move, PKWare applied for a patent on 2003-07-16 describing a method for combining zip and strong encryption to create a secure file.

In the end, PKWARE and WinZip agreed to support each other's products. On 2004-01-21, PKWARE announced the support of WinZip-based AES compression format. In a later version of WinZip beta, it was able to support SES-based zip files. PKWARE eventually released version 5.2 of the .ZIP File Format Specification to the public, which documented SES. The Free Software
Free software
Free software, software libre or libre software is software that can be used, studied, and modified without restriction, and which can be copied and redistributed in modified or unmodified form either without restriction, or with restrictions that only ensure that further recipients can also do...

 project 7-Zip
7-Zip
7-Zip is an open source file archiver. 7-Zip operates with the 7z archive format, but can read and write several other archive formats. The program can be used from a command line interface, graphical user interface, or with Microsoft Windows shell integration. 7-Zip began in 1999 and is actively...

 also supports AES in zip files (as does its POSIX
POSIX
POSIX , an acronym for "Portable Operating System Interface", is a family of standards specified by the IEEE for maintaining compatibility between operating systems...

 port
Porting
In computer science, porting is the process of adapting software so that an executable program can be created for a computing environment that is different from the one for which it was originally designed...

 p7zip
P7zip
p7zip is a port of the command line version of the 7-Zip file archiver to POSIX-conforming operating systems, such as Unix, Linux, FreeBSD, Windows NT and Mac OS X. It is free software, available under the GNU Lesser General Public License....

).

Advantages and disadvantages

Compressing files separately, as is done in zip files, allows for random access
Random access
In computer science, random access is the ability to access an element at an arbitrary position in a sequence in equal time, independent of sequence size. The position is arbitrary in the sense that it is unpredictable, thus the use of the term "random" in "random access"...

: individual files can be retrieved without reading through other data. It may allow better overall compression by using different algorithms for different files. Even when confining the possibility to DEFLATE compression, the use of different compression dictionaries for each file may result in a smaller archive overall.

This approach is less well-suited, in general, to archival of a large number of small files. In the zip archive format, the metadata for each entry—the information about each individual entry—is not compressed. This limits the maximum achievable compression ratio, especially as the size of the individual entries diminishes and approaches the size of the metadata for the entry.

An alternate approach is used in a compressed tar archive (.tar.gz, or .tgz)
Tar (file format)
In computing, tar is both a file format and the name of a program used to handle such files...

, in which the file data and metadata are compressed as a unit using gzip
Gzip
Gzip is any of several software applications used for file compression and decompression. The term usually refers to the GNU Project's implementation, "gzip" standing for GNU zip. It is based on the DEFLATE algorithm, which is a combination of Lempel-Ziv and Huffman coding...

. The downside of this approach is the loss of random access. The same approach can be used with zip: creating first a zip archive in which the individual files are uncompressed (STORE method), and then compressing the first zip file into another zip file which contains the first, will emulate solid archives. As in the case of compressed tar archives, random access is not possible.

Implementation

There are numerous zip tools available, and numerous zip libraries for various programming environments; licenses used include commercial and open source
Open source
The term open source describes practices in production and development that promote access to the end product's source materials. Some consider open source a philosophy, others consider it a pragmatic methodology...

. For instance, WinZip is one well-known zip tool running on Windows and WinRAR
WinRAR
WinRAR is a shareware file archiver and data compression utility developed by Eugene Roshal, and first released in autumn of 1993. It is one of the few applications that is able to create RAR archives natively, because the encoding method is held to be proprietary.-Developer:The current developer...

, IZarc
IZArc
IZArc is a proprietary file archiver for Microsoft Windows developed by Bulgarian programmer Ivan Zahariev. The program is freeware and closed source. In addition to the most commonly used archive formats, like zip, rar, gzip, tar.gz, bzip2, and 7z, IZArc handles a large number of less common formats...

, Info-zip, 7-Zip
7-Zip
7-Zip is an open source file archiver. 7-Zip operates with the 7z archive format, but can read and write several other archive formats. The program can be used from a command line interface, graphical user interface, or with Microsoft Windows shell integration. 7-Zip began in 1999 and is actively...

, PeaZip
PeaZip
PeaZip is a file manager and file archiver for Microsoft Windows and GNU/Linux. It supports its native PEA archive format and other mainstream formats, with special focus on handling open formats...

 and DotNetZip are other tools, available on various platforms. Some of those tools have library or programmatic interfaces.

Some development libraries licensed under open source agreement are the GNU
GNU
GNU is a Unix-like computer operating system developed by the GNU project, ultimately aiming to be a "complete Unix-compatible software system"...

 gzip
Gzip
Gzip is any of several software applications used for file compression and decompression. The term usually refers to the GNU Project's implementation, "gzip" standing for GNU zip. It is based on the DEFLATE algorithm, which is a combination of Lempel-Ziv and Huffman coding...

 project and Info-ZIP
Info-ZIP
Info-ZIP is a set of open-source software to handle ZIP archives. It has been in circulation since 1989. It consists of 4 separately-installable packages: the Zip and UnZip command-line utilities; and WiZ and MacZip, which are graphical user interfaces for archiving programs in Microsoft Windows...

. For Java: Java Platform, Standard Edition
Java Platform, Standard Edition
Java Platform, Standard Edition or Java SE is a widely used platform for programming in the Java language. It is the Java Platform used to deploy portable applications for general use...

 contains the package "java.util.zip" to handle standard zip files; the Zip64File library specifically supports large files (larger than 4 GB) and treats zip files using random access; and the Apache Ant
Apache Ant
Apache Ant is a software tool for automating software build processes. It is similar to Make but is implemented using the Java language, requires the Java platform, and is best suited to building Java projects....

 tool contains a more complete implementation released under the Apache Software License.

For .NET applications, there is a no-cost open-source library called DotNetZip available in source and binary form under the Microsoft Public License. It supports many zip features, including passwords for traditional zip encryption or WinZip-compatible AES encryption, Unicode, ZIP64, zip comments, spanned archives, and self-extracting archives. The Microsoft .NET 3.5
.NET Framework
The .NET Framework is a software framework that runs primarily on Microsoft Windows. It includes a large library and supports several programming languages which allows language interoperability...

 runtime library includes a class System.IO.Packaging.Package that supports the zip format. It is primarily designed for document formats using the ISO
International Organization for Standardization
The International Organization for Standardization , widely known as ISO, is an international standard-setting body composed of representatives from various national standards organizations. Founded on February 23, 1947, the organization promulgates worldwide proprietary, industrial and commercial...

/IEC
International Electrotechnical Commission
The International Electrotechnical Commission is a non-profit, non-governmental international standards organization that prepares and publishes International Standards for all electrical, electronic and related technologies – collectively known as "electrotechnology"...

 international standard Open Packaging Conventions.

The Info-ZIP
Info-ZIP
Info-ZIP is a set of open-source software to handle ZIP archives. It has been in circulation since 1989. It consists of 4 separately-installable packages: the Zip and UnZip command-line utilities; and WiZ and MacZip, which are graphical user interfaces for archiving programs in Microsoft Windows...

 implementations of the zip format adds support for Unix filesystem features, such as user and group IDs, file permissions, and support for symbolic links. The Apache Ant
Apache Ant
Apache Ant is a software tool for automating software build processes. It is similar to Make but is implemented using the Java language, requires the Java platform, and is best suited to building Java projects....

 implementation is aware of these to the extent that it can create files with predefined Unix permissions. The Info-ZIP implementations also know how to use the error correction capabilities built into the zip compression format. Some programs (such as IZArc
IZArc
IZArc is a proprietary file archiver for Microsoft Windows developed by Bulgarian programmer Ivan Zahariev. The program is freeware and closed source. In addition to the most commonly used archive formats, like zip, rar, gzip, tar.gz, bzip2, and 7z, IZArc handles a large number of less common formats...

) do not and will choke on a file that has errors.

The Info-ZIP Windows tools also support NTFS
NTFS
NTFS is the standard file system of Windows NT, including its later versions Windows 2000, Windows XP, Windows Server 2003, Windows Server 2008, Windows Vista, and Windows 7....

 filesystem permissions, and will make an attempt to translate from NTFS permissions to Unix permissions or vice-versa when extracting files. This can result in potentially unintended combinations, e.g. .exe files being created on NTFS volumes with executable permission denied.

Versions of Microsoft Windows have included support for zip compression in Explorer since the Plus! pack was released for Windows 98. Microsoft calls this feature "Compressed Folders". Not all zip features are supported by the Windows Compressed Folders capability. For example, AES Encryption, split or spanned archives, and Unicode entry encoding are not known to be readable or writable by the Compressed Folders feature in Windows XP or Windows Vista.

Legacy

There are numerous other standards and formats using "zip" as part of their name. Phil Katz stated that he wanted to allow the "zip" name for any archive type. For example, zip is distinct from gzip
Gzip
Gzip is any of several software applications used for file compression and decompression. The term usually refers to the GNU Project's implementation, "gzip" standing for GNU zip. It is based on the DEFLATE algorithm, which is a combination of Lempel-Ziv and Huffman coding...

, and the latter is defined in an IETF RFC
Request for Comments
In computer network engineering, a Request for Comments is a memorandum published by the Internet Engineering Task Force describing methods, behaviors, research, or innovations applicable to the working of the Internet and Internet-connected systems.Through the Internet Society, engineers and...

 (RFC 1952). Both zip and gzip primarily use the DEFLATE
DEFLATE
Deflate is a lossless data compression algorithm that uses a combination of the LZ77 algorithm and Huffman coding. It was originally defined by Phil Katz for version 2 of his PKZIP archiving tool and was later specified in RFC 1951....

 algorithm for compression. Likewise, the ZLIB
Zlib
zlib is a software library used for data compression. zlib was written by Jean-Loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program. Zlib is also a crucial component of many software platforms including Linux, Mac OS X,...

 format (IETF RFC 1950) also uses the DEFLATE compression algorithm, but specifies different headers for error and consistency checking. Other common, similarly named formats and programs with different native formats include 7-Zip
7-Zip
7-Zip is an open source file archiver. 7-Zip operates with the 7z archive format, but can read and write several other archive formats. The program can be used from a command line interface, graphical user interface, or with Microsoft Windows shell integration. 7-Zip began in 1999 and is actively...

, bzip2
Bzip2
bzip2 is a free and open source implementation of the Burrows–Wheeler algorithm. It is developed and maintained by Julian Seward. Seward made the first public release of bzip2, version 0.15, in July 1996.-Compression efficiency:...

, and rzip
Rzip
The rzip program is huge-scale data compression software designed around initial LZ77-style string matching on a 900 MB dictionary window, followed by bzip2-based Burrows–Wheeler transform and entropy coding on 900 kB output chunks....

.

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK