Rate–distortion theory
Rate–distortion theory is a major branch of information theory which provides the theoretical foundations for lossy data compression; it addresses the problem of determining the minimal amount of entropy (or information) R that should be communicated over a channel, so that the source (input signal) can be approximately reconstructed at the receiver (output signal) without exceeding a given distortion D.

Introduction

Rate–distortion theory gives theoretical bounds for how much compression can be achieved using lossy compression methods. Many of the existing audio, speech, image, and video compression techniques have transforms, quantization, and bit-rate allocation procedures that capitalize on the general shape of rate–distortion functions.

Rate–distortion theory was created by Claude Shannon in his foundational work on information theory.

In rate–distortion theory, the rate is usually understood as the number of bits per data sample to be stored or transmitted. The notion of distortion is a subject of ongoing discussion. In the simplest case (which is actually used in most cases), the distortion is defined as the expected value of the squared difference between input and output signal (i.e., the mean squared error). However, since most lossy compression techniques operate on data that will be perceived by human consumers (listening to music, watching pictures and video), the distortion measure should preferably be modeled on human perception and perhaps aesthetics: much like the use of probability in lossless compression, distortion measures can ultimately be identified with loss functions as used in Bayesian estimation and decision theory. In audio compression, perceptual models (and therefore perceptual distortion measures) are relatively well developed and routinely used in compression techniques such as MP3 or Vorbis, but they are often not easy to include in rate–distortion theory. In image and video compression, human perception models are less well developed, and their inclusion is mostly limited to the JPEG and MPEG weighting (quantization, normalization) matrices.

Rate–distortion functions

The functions that relate the rate and distortion are found as the solution of the following minimization problem:

R(D) = \inf_{Q_{Y\mid X}(y\mid x)} I_Q(Y;X) \quad \text{subject to} \quad D_Q \le D^*

Here QY | X(y | x), sometimes called a test channel, is the conditional probability density function (PDF) of the communication channel output (compressed signal) Y for a given input (original signal) X, and IQ(Y ; X) is the mutual information between Y and X, defined as

I(Y;X) = H(Y) - H(Y\mid X)

where H(Y) and H(Y | X) are the entropy of the output signal Y and the conditional entropy of the output signal given the input signal, respectively:

H(Y) = -\int_{-\infty}^{\infty} P_Y(y)\,\log_2\!\big(P_Y(y)\big)\,dy

H(Y\mid X) = -\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} Q_{Y\mid X}(y\mid x)\,P_X(x)\,\log_2\!\big(Q_{Y\mid X}(y\mid x)\big)\,dx\,dy

The problem can also be formulated as a distortion–rate function, where we find the infimum over achievable distortions for a given rate constraint. The relevant expression is:

D(R) = \inf_{Q_{Y\mid X}(y\mid x)\,:\; I_Q(Y;X)\le R} D_Q

The two formulations lead to functions which are inverses of each other.

The mutual information can be understood as a measure of the uncertainty the receiver has about the sender's signal prior to reception (H(Y)), diminished by the uncertainty that remains after receiving information about the sender's signal (H(Y | X)). Of course, the decrease in uncertainty is due to the communicated amount of information, which is I(Y; X).

As an example, if there is no communication at all, then H(Y | X) = H(Y) and I(Y; X) = 0. Alternatively, if the communication channel is perfect and the received signal Y is identical to the signal X at the sender, then H(Y | X) = 0 and I(Y; X) = H(Y) = H(X).

In the definition of the rate–distortion function, DQ and D* are the distortion between X and Y for a given QY | X(y | x) and the prescribed maximum distortion, respectively. When we use the mean squared error as the distortion measure, we have (for amplitude-continuous signals):

D_Q = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} P_X(x)\, Q_{Y\mid X}(y\mid x)\,(x-y)^2\,dx\,dy

As the above equations show, calculating a rate–distortion function requires a stochastic description of the input X in terms of the PDF PX(x), and then amounts to finding the conditional PDF QY | X(y | x) that minimizes the rate for a given distortion D*. These definitions can be formulated measure-theoretically to account for discrete and mixed random variables as well.
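
To make these definitions concrete, the following sketch (added here for illustration; the toy source, the test channel, and names such as p_x and q_y_given_x are hypothetical, not from the original article) computes the mutual information IQ(Y; X) and the expected distortion DQ for one particular binary source and test channel. For binary values, squared error coincides with the Hamming distortion used here; minimizing IQ over all test channels satisfying DQ ≤ D* would then yield one point of the rate–distortion function.

import numpy as np

# A toy discrete example: a binary source and one candidate "test channel"
# Q(y|x).  These particular numbers are made up for illustration only.
p_x = np.array([0.5, 0.5])                      # P_X(x), x in {0, 1}
q_y_given_x = np.array([[0.9, 0.1],             # Q(y | x = 0)
                        [0.1, 0.9]])            # Q(y | x = 1)

# For binary values, squared error (x - y)^2 equals the Hamming distortion.
d = 1.0 - np.eye(2)

# Joint distribution P(x, y) = P_X(x) Q(y|x) and output marginal P_Y(y)
p_xy = p_x[:, None] * q_y_given_x
p_y = p_xy.sum(axis=0)

# Mutual information I_Q(Y;X) = sum_{x,y} P(x,y) log2( P(x,y) / (P_X(x) P_Y(y)) )
mask = p_xy > 0
i_q = np.sum(p_xy[mask] * np.log2(p_xy[mask] /
                                  (p_x[:, None] * p_y[None, :])[mask]))

# Expected distortion D_Q = sum_{x,y} P(x,y) d(x, y)
d_q = np.sum(p_xy * d)

print(f"I_Q(Y;X) = {i_q:.4f} bits, D_Q = {d_q:.4f}")
# For a Bernoulli(1/2) source with Hamming distortion the rate-distortion
# function is R(D) = 1 - H_b(D), so this test channel (D_Q = 0.1) is in
# fact optimal: 1 - H_b(0.1) ≈ 0.531 bits.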

An analytical solution to this minimization problem is often difficult to obtain, except in some instances for which we next offer one of the best-known examples. The rate–distortion function of any source is known to obey several fundamental properties, the most important ones being that it is a continuous, monotonically decreasing convex (U) function, and thus the shape of the function in the examples is typical (even measured rate–distortion functions in real life tend to have very similar forms).

Although analytical solutions to this problem are scarce, there are upper and lower bounds to these functions, including the famous Shannon lower bound (SLB), which, in the case of squared error and memoryless sources, states that for arbitrary sources with finite differential entropy,

R(D) \ge h(X) - h(D)

where h(D) is the differential entropy of a Gaussian random variable with variance D. This lower bound is extensible to sources with memory and other distortion measures. One important feature of the SLB is that it is asymptotically tight in the low-distortion regime for a wide class of sources and, on some occasions, actually coincides with the rate–distortion function. Shannon lower bounds can generally be found if the distortion between any two numbers can be expressed as a function of the difference between these two values.
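
As a quick consistency check (a worked example added here for illustration, not part of the original text), substitute the differential entropy of a Gaussian, h(D) = (1/2) log2(2πeD), and take X itself to be Gaussian with variance σ²:

R(D) \ge h(X) - h(D) = \tfrac{1}{2}\log_2\!\left(2\pi e\sigma^2\right) - \tfrac{1}{2}\log_2\!\left(2\pi e D\right) = \tfrac{1}{2}\log_2\!\frac{\sigma^2}{D}

This is exactly the rate–distortion function of the memoryless Gaussian source given below, so for that source the SLB is tight for all D ≤ σ².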

The Blahut–Arimoto algorithm, co-invented by Richard Blahut, is an elegant iterative technique for numerically obtaining the rate–distortion function of arbitrary finite input/output alphabet sources, and much work has been done to extend it to more general problem instances.
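
The sketch below (an illustrative implementation; the function name blahut_arimoto and the slope values used in the demo are hypothetical choices, not from the article) runs the standard fixed-slope Blahut–Arimoto iteration for a finite-alphabet source: it alternates between updating the test channel Q(y | x) for the current output marginal and recomputing the output marginal induced by that channel, then reads off one (R, D) point per value of the slope parameter beta.

import numpy as np

def blahut_arimoto(p_x, dist, beta, n_iter=500, tol=1e-10):
    """Compute one point of the rate-distortion curve by fixed-slope iteration.

    p_x  : source distribution over the input alphabet, shape (nx,)
    dist : distortion matrix d(x, y), shape (nx, ny)
    beta : slope parameter; larger beta trades rate for lower distortion
    Returns (rate_in_bits, distortion).
    """
    nx, ny = dist.shape
    q_y = np.full(ny, 1.0 / ny)                  # output marginal, start uniform
    for _ in range(n_iter):
        # Update the test channel Q(y|x) for the current output marginal.
        w = q_y[None, :] * np.exp(-beta * dist)  # unnormalised Q(y|x)
        q_y_given_x = w / w.sum(axis=1, keepdims=True)
        # Update the output marginal induced by the source and this channel.
        q_y_new = p_x @ q_y_given_x
        if np.max(np.abs(q_y_new - q_y)) < tol:
            q_y = q_y_new
            break
        q_y = q_y_new
    # Rate = mutual information I(X;Y); distortion = E[d(X,Y)].
    joint = p_x[:, None] * q_y_given_x
    mask = joint > 0
    rate = np.sum(joint[mask] * np.log2((q_y_given_x / q_y[None, :])[mask]))
    distortion = np.sum(joint * dist)
    return rate, distortion

if __name__ == "__main__":
    # Demo: Bernoulli(1/2) source with Hamming distortion, R(D) = 1 - H_b(D).
    p_x = np.array([0.5, 0.5])
    dist = 1.0 - np.eye(2)
    for beta in (1.0, 2.0, 4.0, 8.0):
        r, d = blahut_arimoto(p_x, dist, beta)
        print(f"beta={beta:4.1f}  D={d:.4f}  R={r:.4f} bits")

Sweeping beta from small to large values traces the curve from the high-distortion end to the low-distortion end; for the Bernoulli(1/2) source with Hamming distortion used in the demo, the computed points can be checked against the known curve R(D) = 1 − Hb(D).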

Memoryless (independent) Gaussian source

If we assume that PX(x) is Gaussian with variance σ², and if we assume that successive samples of the signal X are stochastically independent (or, if you like, the source is memoryless, or the signal is uncorrelated), we find the following analytical expression for the rate–distortion function:

R(D) = \begin{cases} \dfrac{1}{2}\log_2\!\left(\dfrac{\sigma^2}{D}\right), & 0 \le D \le \sigma^2 \\ 0, & D > \sigma^2 \end{cases}

The following figure shows what this function looks like:
Rate–distortion theory tells us that no compression system exists that performs outside the gray area. The closer a practical compression system is to the red (lower) bound, the better it performs. As a general rule, this bound can only be attained by increasing the coding block length parameter. Nevertheless, even at unit blocklengths one can often find good (scalar) quantizers that operate at distances from the rate–distortion function that are practically relevant.

This rate–distortion function holds only for Gaussian memoryless sources. It is known that the Gaussian source is the most "difficult" source to encode: for a given mean squared error, it requires the greatest number of bits. The performance of a practical compression system working on, say, images may well be below the R(D) lower bound shown.
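
As a numerical illustration of the bound and of the unit-blocklength remark above (a sketch added here; the Monte Carlo setup and variable names are not from the original article), the following code compares the Gaussian bound at a rate of 1 bit per sample, D(R) = σ²·2^(−2R) = σ²/4, with the distortion of the best 1-bit scalar quantizer for a Gaussian source, which places its reconstruction levels at ±σ·sqrt(2/π):

import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0
n = 1_000_000
x = rng.normal(0.0, np.sqrt(sigma2), size=n)   # memoryless Gaussian source

# Shannon bound at R = 1 bit/sample: D(R) = sigma^2 * 2**(-2R)
rate = 1.0
d_bound = sigma2 * 2.0 ** (-2.0 * rate)

# Best 1-bit scalar quantizer for a Gaussian source (Lloyd-Max): threshold
# at 0, reconstruction levels at the conditional means +/- sigma*sqrt(2/pi).
level = np.sqrt(2.0 / np.pi) * np.sqrt(sigma2)
y = np.where(x >= 0, level, -level)
d_scalar = np.mean((x - y) ** 2)               # ~ (1 - 2/pi) sigma^2 ≈ 0.3634

print(f"Shannon bound at 1 bit/sample : D = {d_bound:.4f}")
print(f"1-bit scalar quantizer        : D = {d_scalar:.4f}")

The scalar quantizer lands at roughly 0.36σ² instead of 0.25σ², a gap of a little over a quarter of a bit on the rate axis; closing it requires quantizing longer blocks of samples jointly, in line with the block-length remark above.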

See also

  • Decorrelation
  • Rate–distortion optimization
  • Source coding
  • Whitening

