Data compression



 
 


In computer science and information theory, data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use, through the use of specific encoding schemes.

As with any communication, compressed data communication only works when both the sender and receiver of the information understand the encoding scheme. For example, this text makes sense only if the receiver understands that it is intended to be interpreted as characters representing the English language. Similarly, compressed data can only be understood if the decoding method is known by the receiver.

Compression is useful because it helps reduce the consumption of expensive resources, such as hard disk space or transmission bandwidth. On the downside, compressed data must be decompressed to be used, and this extra processing may be detrimental to some applications. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed (the option of decompressing the video in full before watching it may be inconvenient, and requires storage space for the decompressed video). The design of data compression schemes therefore involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (if using a lossy compression scheme), and the computational resources required to compress and decompress the data.
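The trade-off between degree of compression and computational effort can be observed with Python's standard `zlib` module, which exposes compression levels from 1 (fastest) to 9 (most thorough). The following is only a rough sketch: the helper name `sizes_by_level` and the sample text are made up for illustration, and the sizes obtained depend on the input.

```python
import zlib

def sizes_by_level(data: bytes, levels=(1, 6, 9)) -> dict:
    """Return {level: compressed size in bytes} for each zlib level tried.

    Hypothetical helper for illustration; not part of any library.
    """
    result = {}
    for level in levels:
        packed = zlib.compress(data, level)
        # Lossless: decompressing must give back the exact input.
        assert zlib.decompress(packed) == data
        result[level] = len(packed)
    return result

# Assumed sample input: highly repetitive text, which compresses well.
sample = b"The quick brown fox jumps over the lazy dog. " * 200
sizes = sizes_by_level(sample)
for level, size in sizes.items():
    print(f"level {level}: {len(sample)} -> {size} bytes")
```

Higher levels generally spend more time searching for matches; whether the extra effort pays off depends on the data.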

Lossless versus lossy compression

Lossless compression algorithms usually exploit statistical redundancy in such a way as to represent the sender's data more concisely without error. Lossless compression is possible because most real-world data has statistical redundancy. For example, in English text, the letter 'e' is much more common than the letter 'z', and the probability that the letter 'q' will be followed by the letter 'z' is very small.
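One way to put a number on this kind of redundancy is the Shannon entropy of the symbol frequencies. The sketch below looks at single characters only, so it ignores pairings such as 'q' followed by 'u'; the function name is hypothetical.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(text: str) -> float:
    """Zeroth-order Shannon entropy: average bits needed per symbol
    if each symbol is coded according to its observed frequency."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Four equally likely symbols need 2 bits each; skewed text needs fewer.
print(entropy_bits_per_symbol("abcd"))
print(entropy_bits_per_symbol("eeeeeeez"))
```

The more skewed the frequencies, the further the entropy falls below the size of a fixed-width encoding, and the more room a lossless coder has to work with.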

Another kind of compression, called lossy data compression or perceptual coding, is possible if some loss of fidelity is acceptable. Generally, a lossy data compression will be guided by research on how people perceive the data in question. For example, the human eye is more sensitive to subtle variations in luminance than it is to variations in color. JPEG image compression works in part by "rounding off" some of this less-important information. Lossy data compression provides a way to obtain the best fidelity for a given amount of compression. In some cases, transparent (unnoticeable) compression is desired; in other cases, fidelity is sacrificed to reduce the amount of data as much as possible.

Lossless compression schemes are reversible so that the original data can be reconstructed, while lossy schemes accept some loss of data in order to achieve higher compression.
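The "rounding off" idea behind lossy schemes can be illustrated with plain uniform quantization. This is a simplified sketch, not how JPEG itself is implemented; the step size of 16 and the function names are assumptions chosen for the example.

```python
def quantize(samples, step=16):
    """Map each 8-bit sample to the index of its bucket (lossy step)."""
    return [s // step for s in samples]

def reconstruct(indices, step=16):
    """Approximate each sample by the midpoint of its bucket."""
    return [q * step + step // 2 for q in indices]

samples = list(range(256))            # every possible 8-bit value
indices = quantize(samples)           # only 16 distinct values remain
restored = reconstruct(indices)
max_error = max(abs(a - b) for a, b in zip(samples, restored))
print(len(set(indices)), max_error)
```

Each index now needs 4 bits instead of 8, at the cost of an error of at most half a step per sample; the original values cannot be recovered exactly.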

However, lossless data compression algorithms will always fail to compress some files; indeed, any compression algorithm will necessarily fail to compress any data containing no discernible patterns. Attempts to compress data that has been compressed already will therefore usually result in an expansion, as will attempts to compress all but the most trivially encrypted data.
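This limit is easy to observe. The sketch below feeds `zlib` (a DEFLATE implementation in Python's standard library) two inputs of similar size: pseudo-random bytes, which stand in for pattern-free data, and a repeated word. The seed and inputs are arbitrary choices for the example.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
noise = bytes(rng.getrandbits(8) for _ in range(10_000))  # no usable patterns
text = b"compression " * 1_000                            # highly redundant

packed_noise = zlib.compress(noise, 9)
packed_text = zlib.compress(text, 9)

print("noise:", len(noise), "->", len(packed_noise))
print("text: ", len(text), "->", len(packed_text))
```

The pattern-free input comes out slightly larger than it went in, because the container still has to add its own header and checksum, while the redundant input shrinks dramatically.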

In practice, lossy data compression will also reach a point where compressing again does not work, although an extremely lossy algorithm (for example, one that always removes the last byte of a file) will always shrink a file until it is empty.

An example of lossless vs. lossy compression is the following string:
25.888888888
This string can be compressed as:
25.[9]8

Interpreted as "twenty-five point, followed by nine eights", the original string is perfectly recreated, just written in a smaller form. In a lossy system, using
26

instead, the original data is lost, in exchange for a smaller file size.
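The lossless half of this example can be sketched in a few lines. The function names and the `[count]symbol` rendering rule (used only for runs of three or more) are assumptions made to match the notation above; the rendering is for display and would be ambiguous if the input itself contained brackets.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of identical characters into (char, run_length)."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Expand (char, run_length) pairs back into the original string."""
    return "".join(ch * n for ch, n in pairs)

def to_bracket_notation(pairs, min_run=3):
    """Render pairs in the '[count]symbol' style; short runs stay literal."""
    return "".join(f"[{n}]{ch}" if n >= min_run else ch * n for ch, n in pairs)

original = "25." + "8" * 9
pairs = rle_encode(original)
print(to_bracket_notation(pairs))      # 25.[9]8
print(rle_decode(pairs) == original)   # True
```

Decoding reproduces the input exactly, which is what distinguishes this from replacing the value with 26.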

Applications

The above is a very simple example of run-length encoding, wherein large runs of consecutive identical data values are replaced by a simple code with the data value and length of the run. This is an example of lossless data compression. It is often used to optimize disk space on office computers, or to make better use of the connection bandwidth in a computer network. For symbolic data such as spreadsheets, text, executable programs, etc., losslessness is essential because changing even a single bit cannot be tolerated (except in some limited cases).

For visual and audio data, some loss of quality can be tolerated without losing the essential nature of the data. By taking advantage of the limitations of the human sensory system, a great deal of space can be saved while producing an output which is nearly indistinguishable from the original. These lossy data compression methods typically offer a three-way tradeoff between compression speed, compressed data size and quality loss.

Lossy image compression is used in digital cameras, to increase storage capacities with minimal degradation of picture quality. Similarly, DVDs use the lossy MPEG-2 codec for video compression.

In lossy audio compression, methods of psychoacoustics are used to remove non-audible (or less audible) components of the signal. Compression of human speech is often performed with even more specialized techniques, so that "speech compression" or "voice coding" is sometimes distinguished as a separate discipline from "audio compression". Different audio and speech compression standards are listed under audio codecs. Voice compression is used in Internet telephony, for example, while audio compression is used for CD ripping and is decoded by audio players.

Theory

The theoretical background of compression is provided by information theory (which is closely related to algorithmic information theory) and by rate-distortion theory. These fields of study were essentially created by Claude Shannon, who published fundamental papers on the topic in the late 1940s and early 1950s. Cryptography and coding theory are also closely related. The idea of data compression is deeply connected with statistical inference.

Many lossless data compression systems can be viewed in terms of a four-stage model. Lossy data compression systems typically include even more stages, including, for example, prediction, frequency transformation, and quantization.

The Lempel-Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. DEFLATE is a variation on LZ which is optimized for decompression speed and compression ratio; as a result, compression can be slow. DEFLATE is used in PKZIP, gzip and PNG. LZW (Lempel-Ziv-Welch) is used in GIF images. Also noteworthy are the LZR (LZ-Renau) methods, which serve as the basis of the Zip method. LZ methods utilize a table-based compression model where table entries are substituted for repeated strings of data. For most LZ methods, this table is generated dynamically from earlier data in the input. The table itself is often Huffman encoded (e.g. SHRI, LZX). A current LZ-based coding scheme that performs well is LZX, used in Microsoft's CAB format.
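The table-based idea can be sketched with a bare-bones LZW coder. This is a teaching sketch under simplifying assumptions: codes are kept as a list of Python integers rather than packed into bits, and the table is allowed to grow without limit, unlike the variants used in GIF or the Unix compress utility. The function names are hypothetical.

```python
def lzw_encode(data: bytes) -> list:
    """Replace repeated byte strings with indices into a growing table."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate                  # keep extending the match
        else:
            codes.append(table[current])         # emit longest known string
            table[candidate] = len(table)        # learn the new string
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

def lzw_decode(codes: list) -> bytes:
    """Rebuild the same table on the fly; it is never transmitted."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # The code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)

message = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(message)
print(len(message), "bytes ->", len(codes), "codes")
```

Because the decoder derives each table entry from data it has already decoded, both sides stay in step without the table ever being sent.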

The very best compressors use probabilistic models whose predictions are coupled to an algorithm called arithmetic coding. Arithmetic coding, invented by Jorma Rissanen and turned into a practical method by Witten, Neal, and Cleary, achieves superior compression to the better-known Huffman algorithm, and lends itself especially well to adaptive data compression tasks where the predictions are strongly context-dependent. Arithmetic coding is used in the bilevel image-compression standard JBIG and the document-compression standard DjVu. The text entry system Dasher is an inverse arithmetic coder.
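The core idea of arithmetic coding, narrowing an interval in proportion to each symbol's probability, can be shown with a toy coder. This sketch makes several simplifying assumptions that practical coders do not: it uses exact fractions instead of fixed-precision integers, a fixed rather than adaptive model, and it passes the message length to the decoder separately. All names and the example probabilities are hypothetical.

```python
from fractions import Fraction

def build_intervals(probs):
    """Assign each symbol a slice of [0, 1) as wide as its probability."""
    intervals, low = {}, Fraction(0)
    for sym, p in probs.items():
        intervals[sym] = (low, low + p)
        low += p
    return intervals

def encode(message, probs):
    """Narrow [low, low + width) once per symbol; return a point inside."""
    intervals = build_intervals(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = intervals[sym]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low + width / 2

def decode(code, length, probs):
    """Find which slice holds the code, emit its symbol, then rescale."""
    intervals = build_intervals(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in intervals.items():
            if s_low <= code < s_high:
                out.append(sym)
                code = (code - s_low) / (s_high - s_low)
                break
    return "".join(out)

# Assumed model: 'a' is twice as likely as 'b' or 'c'.
model = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
msg = "abacab"
code = encode(msg, model)
print(code, decode(code, len(msg), model))
```

Likely symbols shrink the interval less, so they cost fewer bits; unlike Huffman coding, the cost per symbol need not be a whole number of bits.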

There is a close connection between machine learning and compression: a system that predicts the posterior probabilities of a sequence given its entire history can be used for optimal data compression (by using arithmetic coding on the output distribution), while an optimal compressor can be used for prediction (by finding the symbol that compresses best, given the previous history). This equivalence has been used as justification for data compression as a benchmark for "general intelligence".
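The link between prediction and compression can be made concrete: if a model assigns probability p to the symbol that actually occurs, an ideal arithmetic coder spends about -log2(p) bits on it. The sketch below uses a deliberately crude predictor (running byte counts with add-one smoothing) as a stand-in for a learned model; the function name and the smoothing rule are assumptions.

```python
import math

def adaptive_code_length_bits(data: bytes, alphabet_size: int = 256) -> float:
    """Total ideal code length when each byte is predicted from the counts
    of the bytes seen so far (add-one smoothing)."""
    counts = [0] * alphabet_size
    seen = 0
    bits = 0.0
    for symbol in data:
        p = (counts[symbol] + 1) / (seen + alphabet_size)  # model's prediction
        bits += -math.log2(p)                              # cost of the truth
        counts[symbol] += 1                                # learn from it
        seen += 1
    return bits

data = b"ab" * 50
print(f"{adaptive_code_length_bits(data):.1f} bits vs {8 * len(data)} raw")
```

A predictor that conditions on more of the history would assign higher probabilities to what comes next and so yield a shorter total, which is the sense in which better prediction means better compression.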

See also


Data compression topics


Compression algorithms


Lossless data compression
  • run-length encoding
  • dictionary coders
    • LZ77 & LZ78
    • LZW
  • Burrows-Wheeler transform
  • prediction by partial matching (also known as PPM)
  • context mixing
  • Dynamic Markov Compression (DMC)
  • entropy encoding
    • Huffman coding (simple entropy coding; commonly used as the final stage of compression)
    • Adaptive Huffman coding
    • arithmetic coding (more advanced)
      • Shannon-Fano coding
      • range encoding (same as arithmetic coding, but looked at in a slightly different way)
    • T-code, a variant of Huffman code
    • Golomb coding (simple entropy coding for infinite input data with a geometric distribution)
    • universal codes (entropy coding for infinite input data with an arbitrary distribution)
      • Elias gamma coding
      • Fibonacci coding
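Of the entropy coders listed above, Huffman coding is compact enough to sketch in full. The version below builds a code table from the symbol counts of a string using a priority queue; the function name is hypothetical, and ties between equal weights are broken by insertion order, which is an arbitrary choice (other valid Huffman codes exist for the same input).

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return {symbol: bit string}, giving frequent symbols shorter codes."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}   # a lone symbol still needs one bit
    # Heap entries: (weight, tie_breaker, {symbol: code built so far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

table = huffman_code("aaaabbc")
encoded = "".join(table[ch] for ch in "aaaabbc")
print(table, len(encoded), "bits")
```

No code word is a prefix of another, so the bit string can be decoded unambiguously without separators.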


Lossy data compression
  • discrete cosine transform
  • fractal compression
    • fractal transform
  • wavelet compression
  • vector quantization
  • linear predictive coding
  • Modulo-N code for correlated data
  • A-law Compander
  • Mu-law Compander


Example implementations
  • DEFLATE (a combination of LZ77 and Huffman coding) – used by ZIP, gzip
    Gzip

    gzip is a software application used for file compression. gzip is short for GNU zip; the program is a free software replacement for the compress program used in early Unix systems, intended for use by the GNU Project....
     and PNG files
  • LZMA
    LZMA

    The Lempel-Ziv-Markov chain-Algorithm is an algorithm used to perform data compression. It has been under development since 1998 and is used in the 7z format of the 7-Zip archiver....
     used by 7-Zip
    7-Zip

    7-Zip is an open source file archiver designed originally for Microsoft Windows. 7-Zip operates primarily with the 7z archive format, as well as being able to read and write to several other archive formats....
  • LZO
    LZO

    Lempel-Ziv-Oberhumer is a lossless data compression algorithm that is focused on decompression speed.A free software tool which implements it is lzop....
     (very fast LZ variation, speed oriented)
  • LZX
    LZX (algorithm)

    LZX is the name of an LZ77 family Data compression algorithm. It is also the name of a archive formats with the same name. Both were invented by Jonathan Forbes and Tomi Poutanen....
     (an LZ77 family compression algorithm)
  • Unix
    Unix

    Unix is a computer operating system originally developed in 1969 by a group of American Telephone & Telegraph employees at Bell Labs, including Ken Thompson , Dennis Ritchie, Douglas McIlroy, and Joe Ossanna....
     compress
    Compress

    compress is a Unix compression program based on the LZC compression method, which is an LZW implementation using variable size pointers as in LZ78....
     utility (the .Z file format), and GIF use LZW
    LZW

    Lempel-Ziv-Welch is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ77 and LZ78 algorithm published by Lempel and Ziv in 1978....
  • Unix pack utility (the .z file format) used Huffman coding
    Huffman coding

    In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for...
  • bzip2
    Bzip2

    bzip2 is a free software and open-source software lossless data compression algorithm and program developed by Julian Seward. Seward made the first public release of bzip2, version 0.15, in July 1996....
     (a combination of the Burrows-Wheeler transform and Huffman coding)
  • PAQ (very high compression based on context mixing, but extremely slow; ranks near the top of several compression-ratio benchmarks)
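
The LZW method named in the list above can be illustrated with a minimal Python sketch. It is not the implementation used by compress or GIF: those formats pack codes into variable-width bit fields and limit or reset the dictionary, which is omitted here. The function names `lzw_encode` and `lzw_decode` are illustrative only.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW dictionary codes (no bit packing)."""
    # The dictionary starts with one entry per possible byte value.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep extending the match while it is still in the dictionary.
            current = candidate
        else:
            # Emit the longest known match and learn the new, longer string.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the original bytes, reconstructing the dictionary on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the entry the encoder had only just created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)
```

With this sketch, the 24-byte input `b"TOBEORNOTTOBEORTOBEORNOT"` encodes to 16 codes, and `lzw_decode` recovers the input without the dictionary ever being transmitted.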


  • JPEG (image compression using a discrete cosine transform, then quantization, then Huffman coding)
  • MPEG (a widely used family of audio and video compression standards; the video codecs use DCT and motion-compensated prediction)
    • MP3 (a part of the MPEG-1 standard for sound and music compression, using subband filtering and MDCT, perceptual modeling, quantization, and Huffman coding)
    • AAC (part of the MPEG-2 and MPEG-4 audio coding specifications, using MDCT, perceptual modeling, quantization, and Huffman coding)
  • Vorbis (MDCT-based audio codec similar to AAC, designed with a focus on avoiding patent encumbrance)
  • JPEG 2000 (image compression using wavelets, then quantization, then entropy coding)
  • TTA (uses linear predictive coding for lossless audio compression)
  • FLAC (linear predictive coding for lossless audio compression)
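
Several entries above (JPEG, MPEG) describe a transform followed by quantization. The following Python sketch illustrates that idea on a one-dimensional block of samples, assuming an orthonormal DCT-II and a single uniform quantization step. Real codecs use two-dimensional blocks, per-frequency quantization tables, and entropy coding, none of which are shown. The function names and the sample values are illustrative only.

```python
import math


def dct(block: list[float]) -> list[float]:
    """Orthonormal DCT-II of a block of samples."""
    n = len(block)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(block))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs: list[float]) -> list[float]:
    """Inverse of dct() above (an orthonormal DCT-III)."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        samples.append(total)
    return samples


def quantize(coeffs: list[float], step: float) -> list[int]:
    """Map each coefficient to the nearest multiple of step (the lossy part)."""
    return [round(c / step) for c in coeffs]


def dequantize(levels: list[int], step: float) -> list[float]:
    """Approximate the original coefficients from the quantized levels."""
    return [q * step for q in levels]


# Hypothetical 8-sample block, for example one row of pixel intensities.
block = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
levels = quantize(dct(block), step=10.0)
approx = idct(dequantize(levels, step=10.0))
```

The transform alone loses nothing; the information is discarded in `quantize`. A larger `step` produces smaller integers that are cheaper to entropy-code, at the cost of a larger difference between `block` and `approx`.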


Corpora

Data collections commonly used for comparing compression algorithms:
  • Canterbury Corpus
  • Calgary Corpus
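
A comparison of this kind can be sketched with the compressors in the Python standard library. The example below assumes the data is already in memory; the corpus files themselves are not bundled with Python, so a small made-up sample stands in for them, and the function name `compression_ratios` is illustrative only.

```python
import bz2
import lzma
import zlib


def compression_ratios(data: bytes) -> dict[str, float]:
    """Return compressed size divided by original size for each compressor."""
    compressors = {
        "zlib": zlib.compress,   # DEFLATE (LZ77 plus Huffman coding)
        "bz2": bz2.compress,     # bzip2 (Burrows-Wheeler transform plus Huffman coding)
        "lzma": lzma.compress,   # LZMA in the .xz container
    }
    return {name: len(compress(data)) / len(data)
            for name, compress in compressors.items()}


# Hypothetical stand-in for a corpus file; replace with real file contents.
sample = b"the quick brown fox jumps over the lazy dog\n" * 200
ratios = compression_ratios(sample)
```

A ratio below 1.0 means the output is smaller than the input. Results on such a repetitive sample are not representative, which is why shared corpora of varied files are used for published comparisons.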


External links

  • by Guy E Blelloch from CMU
  • (Compares speed and efficiency for commonly used compression programs)