All Topics  
Redundancy (information theory)

 

   Email Print
   Bookmark   Link






 

Redundancy (information theory)



 
 
Redundancy in information theory
Information theory

Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Historically, information theory was developed by Claude E....
 is the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data. Data compression
Data compression

In computer science and information theory, data compression or source coding is the process of encoding information using fewer bits than an code representation would use through use of specific encoding schemes....
 is a way to reduce or eliminate unwanted redundancy, while checksum
Checksum

A checksum or hash sum is a fixed-size data computed from an arbitrary block of digital data for the purpose of error detection that may have been introduced during its telecommunications or computer storage....
s are a way of adding desired redundancy for purposes of error detection when communicating over a noisy channel of limited capacity
Channel capacity

In electrical engineering, computer science and information theory, channel capacity is the tightest upper bound on the amount of information that can be reliably transmitted over a channel ....
.

Quantitative definition
In describing the redundancy of raw data, recall that the rate
Entropy rate

The entropy rate or source information rate of a stochastic process is, informally, the time density of the average information in a stochastic process....
 of a source of information is the average entropy
Information entropy

In information theory, entropy is a measure of the uncertainty associated with a random variable. The term by itself in this context usually refers to the Shannon entropy, which quantifies, in the sense of an expected value, the self-information contained in a message, usually in units such as bits....
 per symbol.






Discussion
Ask a question about 'Redundancy (information theory)'
Start a new discussion about 'Redundancy (information theory)'
Answer questions from other users
Full Discussion Forum



Encyclopedia


Redundancy in information theory
Information theory

Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Historically, information theory was developed by Claude E....
 is the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data. Data compression
Data compression

In computer science and information theory, data compression or source coding is the process of encoding information using fewer bits than an code representation would use through use of specific encoding schemes....
 is a way to reduce or eliminate unwanted redundancy, while checksum
Checksum

A checksum or hash sum is a fixed-size data computed from an arbitrary block of digital data for the purpose of error detection that may have been introduced during its telecommunications or computer storage....
s are a way of adding desired redundancy for purposes of error detection when communicating over a noisy channel of limited capacity
Channel capacity

In electrical engineering, computer science and information theory, channel capacity is the tightest upper bound on the amount of information that can be reliably transmitted over a channel ....
.

Quantitative definition


In describing the redundancy of raw data, recall that the rate
Entropy rate

The entropy rate or source information rate of a stochastic process is, informally, the time density of the average information in a stochastic process....
 of a source of information is the average entropy
Information entropy

In information theory, entropy is a measure of the uncertainty associated with a random variable. The term by itself in this context usually refers to the Shannon entropy, which quantifies, in the sense of an expected value, the self-information contained in a message, usually in units such as bits....
 per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the most general case of a stochastic process, it is

the limit, as n goes to infinity, of the joint entropy
Joint entropy

The joint entropy is an information entropy used in information theory. The joint entropy measures how much entropy is contained in a joint system of two random variables....
 of the first n symbols divided by n. It is common in information theory to speak of the "rate" or "entropy
Information entropy

In information theory, entropy is a measure of the uncertainty associated with a random variable. The term by itself in this context usually refers to the Shannon entropy, which quantifies, in the sense of an expected value, the self-information contained in a message, usually in units such as bits....
" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a memoryless source is simply , since by definition there is no interdependence of the successive messages of a memoryless source.

The absolute rate of a language or source is simply

the logarithm
Logarithm

In mathematics, the logarithm of a number to a given base is the Power or exponent to which the base must be raised in order to produce the number....
 of the cardinality
Cardinality

In mathematics, the cardinality of a set is a measure of the "number of Element of the set". For example, the set A = contains 3 elements, and therefore A has a cardinality of 3....
 of the message space, or alphabet. (This formula is sometimes called the Hartley function
Hartley function

The Hartley function is a measure of uncertainty, introduced by Ralph Hartley in 1928. If we pick a sample from a finite set A uniformly at random, the information revealed after we know the outcome is given by the Hartley function...
.) This is the maximum possible rate of information that can be transmitted with that alphabet. (The logarithm should be taken to a base appropriate for the unit of measurement in use.) The absolute rate is equal to the actual rate if the source is memoryless and has a uniform distribution
Uniform distribution (discrete)

In probability theory and statistics, the discrete uniform distribution is a discrete probability distribution that can be characterized by saying that all values of a finite set of possible values are equally probable....
.

The absolute redundancy can then be defined as

the difference between the absolute rate and the rate.

The quantity is called the relative redundancy and gives the maximum possible data compression ratio
Data compression ratio

Data compression ratio, also known as compression power, is a computer-science term used to quantify the reduction in data-representation size produced by a data compression....
, when expressed as the percentage by which a file size can be decreased. (When expressed as a ratio of original file size to compressed file size, the quantity gives the maximum compression ratio that can be achieved.) Complementary to the concept of relative redundancy is efficiency, defined as so that . A memoryless source with a uniform distribution has zero redundancy (and thus 100% efficiency), and cannot be compressed.

Other notions of redundancy


A measure of redundancy between two variables is the mutual information
Mutual information

In probability theory and information theory, the mutual information of two random variables is a quantity that measures the mutual dependence of the two variables....
 or a normalized variant. A measure of redundancy among many variables is given by the total correlation
Total correlation

In probability theory and in particular in information theory, total correlation is one of several generalizations of the mutual information. It is also known as the multivariate constraint or multiinformation ....
.

Redundancy of compressed data refers to the difference between the expected
Expected value

In probability theory and statistics, the expected value of a random variable is the Lebesgue integral of the random variable with respect to its probability measure....
 compressed data length of messages (or expected data rate ) and the entropy (or entropy rate ). (Here we assume the data is ergodic
Ergodic theory

Ergodic theory is a branch of mathematics that studies dynamical systemswith an invariant measure and related problems. Its initial development was motivated by problems of statistical physics....
 and stationary
Stationary

Stationary can mean:* Not moving .* Unchanging .* In statistics and probability: a stationary process.* In mathematics: a stationary point....
, e.g., a memoryless source.) Although the rate difference can be arbitrarily small as increased, the actual difference , cannot, although it can be theoretically upper-bounded by 1 in the case of finite-entropy memoryless sources.

See also

  • Data compression
    Data compression

    In computer science and information theory, data compression or source coding is the process of encoding information using fewer bits than an code representation would use through use of specific encoding schemes....
  • Exformation
    Exformation

    Exformation is a term meaning explicitly discarded information.It was coined by Danish physicist Tor N?rretranders in his book The User Illusion published in English 1998....
  • Negentropy
    Negentropy

    The negentropy, also negative entropy or syntropy, of a living system is the entropy that it exports to keep its own entropy low; it lies at the intersection of entropy and life....
  • Source coding theorem