Redundancy (information theory)
Redundancy in information theory is the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data. Data compression is a way to reduce or eliminate unwanted redundancy, while checksums are a way of adding desired redundancy for purposes of error detection when communicating over a noisy channel of limited capacity.

Quantitative definition

In describing the redundancy of raw data, recall that the rate of a source of information is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the most general case of a stochastic process, it is

$r = \lim_{n \to \infty} \frac{1}{n} H(M_1, M_2, \dots, M_n),$

the limit, as n goes to infinity, of the joint entropy of the first n symbols divided by n. It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a memoryless source is simply $H(M)$, since by definition there is no interdependence of the successive messages of a memoryless source.

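For a memoryless source the rate is just the Shannon entropy of its symbol distribution. The following is a minimal sketch in Python, assuming a hypothetical four-symbol source with made-up probabilities:

```python
from math import log2

def entropy(probs):
    """Shannon entropy, in bits, of a single-symbol distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical memoryless source over the alphabet {a, b, c, d}.
source = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

r = entropy(source.values())        # rate of a memoryless source = H(M)
print(f"rate r = {r} bits/symbol")  # 1.75 bits/symbol
```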
The absolute rate of a language or source is simply

$R = \log |M|,$

the logarithm of the cardinality of the message space, or alphabet. (This formula is sometimes called the Hartley function.) This is the maximum possible rate of information that can be transmitted with that alphabet. (The logarithm should be taken to a base appropriate for the unit of measurement in use.) The absolute rate is equal to the actual rate if the source is memoryless and has a uniform distribution.
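With base-2 logarithms (so the unit is bits), the absolute rate of the same hypothetical four-symbol alphabet works out as follows:

```python
from math import log2

alphabet = ["a", "b", "c", "d"]              # hypothetical message space
R = log2(len(alphabet))                      # absolute rate (Hartley function)
print(f"absolute rate R = {R} bits/symbol")  # 2.0 bits/symbol
```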

The absolute redundancy can then be defined as

$D = R - r,$

the difference between the absolute rate and the rate.

The quantity $D/R$ is called the relative redundancy and gives the maximum possible data compression ratio, when expressed as the percentage by which a file size can be decreased. (When expressed as a ratio of original file size to compressed file size, the quantity $R : r$ gives the maximum compression ratio that can be achieved.) Complementary to the concept of relative redundancy is efficiency, defined as $r/R$, so that $r/R + D/R = 1$. A memoryless source with a uniform distribution has zero redundancy (and thus 100% efficiency), and cannot be compressed.
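Putting these definitions together for the hypothetical source sketched above:

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical four-symbol memoryless source from the earlier examples.
source = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

r = entropy(source.values())   # rate: 1.75 bits/symbol
R = log2(len(source))          # absolute rate: 2.0 bits/symbol

D = R - r                      # absolute redundancy: 0.25 bits/symbol
relative_redundancy = D / R    # 0.125 -> file size can shrink by at most 12.5%
efficiency = r / R             # 0.875; efficiency + relative redundancy = 1
max_ratio = R / r              # ~1.14 : 1, original size to compressed size
print(D, relative_redundancy, efficiency, max_ratio)
```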

Other notions of redundancy

A measure of redundancy between two variables is the mutual information or a normalized variant. A measure of redundancy among many variables is given by the total correlation.
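As a rough sketch, the mutual information of two variables can be computed directly from their joint distribution; the joint probabilities below are invented purely for illustration:

```python
from math import log2

# Hypothetical joint distribution p(x, y) of two dependent binary variables.
joint = {
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.4,
}

# Marginal distributions p(x) and p(y).
px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

# I(X;Y) = sum over x,y of p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
mi = sum(p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)
print(f"I(X;Y) = {mi:.4f} bits")  # positive, since X and Y are dependent
```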

Redundancy of compressed data refers to the difference between the expected compressed data length of $n$ messages, $L(M^n)$ (or expected data rate $L(M^n)/n$), and the entropy $nr$ (or entropy rate $r$). (Here we assume the data is ergodic and stationary, e.g., a memoryless source.) Although the rate difference $L(M^n)/n - r$ can be made arbitrarily small as $n$ is increased, the actual difference $L(M^n) - nr$ cannot, although it can be theoretically upper-bounded by 1 in the case of finite-entropy memoryless sources.
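To make the bound concrete, the sketch below assumes Shannon code lengths $\lceil -\log_2 p \rceil$ for a hypothetical memoryless source; a prefix code with these lengths always exists, and its expected length exceeds the entropy by less than 1 bit per message (an optimal code, such as a Huffman code, can only do better):

```python
from math import ceil, log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical memoryless source; probabilities invented for the example.
probs = [0.45, 0.30, 0.15, 0.10]

# Shannon code lengths satisfy Kraft's inequality, so a prefix code
# with these lengths exists.
lengths = [ceil(-log2(p)) for p in probs]
expected_length = sum(p * l for p, l in zip(probs, lengths))

H = entropy(probs)
print(f"entropy          = {H:.4f} bits")                 # ~1.78 bits
print(f"expected length  = {expected_length:.4f} bits")   # 2.35 bits
print(f"redundancy L - H = {expected_length - H:.4f} bits (< 1)")
```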

See also

  • Data compression
  • Hartley function
  • Negentropy
  • Source coding theorem