Gibbs' inequality
In information theory, Gibbs' inequality is a statement about the mathematical entropy of a discrete probability distribution. Several other bounds on the entropy of probability distributions are derived from Gibbs' inequality, including Fano's inequality.
It was first presented by J. Willard Gibbs in the 19th century.

Gibbs' inequality

Suppose that

P = \{ p_1, \ldots, p_n \}

is a probability distribution. Then for any other probability distribution

Q = \{ q_1, \ldots, q_n \}

the following inequality between positive quantities (since the p_i and q_i are positive numbers less than one) holds

- \sum_{i=1}^n p_i \log_2 p_i \le - \sum_{i=1}^n p_i \log_2 q_i

with equality if and only if

p_i = q_i

for all i. Put in words, the information entropy of a distribution P is less than or equal to its cross entropy with any other distribution Q.
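
As a concrete illustration (an addition, not part of the original article), the bound can be checked numerically. The short Python sketch below uses two arbitrarily chosen example distributions and base-2 logarithms:

  import math

  # Two example distributions, chosen arbitrarily for illustration.
  p = [0.5, 0.25, 0.25]
  q = [0.6, 0.3, 0.1]

  # Entropy of P and cross entropy of P with respect to Q, in bits.
  entropy_p = -sum(pi * math.log2(pi) for pi in p)
  cross_entropy_pq = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))

  print(entropy_p)          # 1.5
  print(cross_entropy_pq)   # approximately 1.633
  assert entropy_p <= cross_entropy_pq  # Gibbs' inequality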

The difference between the two quantities is the Kullback–Leibler divergence or relative entropy, so the inequality can also be written:

D_{\mathrm{KL}}(P \| Q) = \sum_{i=1}^n p_i \log_2 \frac{p_i}{q_i} \ge 0.

Note that the use of base-2 logarithms is optional, and allows one to refer to the quantity on each side of the inequality as an "average surprisal" measured in bits.
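
Continuing the illustrative example above (again an addition, not part of the original article), the gap between cross entropy and entropy is exactly the relative entropy, and it is non-negative:

  import math

  p = [0.5, 0.25, 0.25]
  q = [0.6, 0.3, 0.1]

  # Kullback-Leibler divergence D_KL(P || Q), in bits.
  kl = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

  # Cross entropy minus entropy gives the same gap.
  gap = (-sum(pi * math.log2(qi) for pi, qi in zip(p, q))
         + sum(pi * math.log2(pi) for pi in p))

  assert kl >= 0
  assert math.isclose(kl, gap)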

Proof

Since

\log_2 x = \frac{\ln x}{\ln 2},

it is sufficient to prove the statement using the natural logarithm (ln). Note that the natural logarithm satisfies

\ln x \le x - 1

for all x > 0, with equality if and only if x = 1.

Let I denote the set of all i for which p_i is non-zero. Then

\begin{align}
- \sum_{i \in I} p_i \ln \frac{q_i}{p_i} &\ge - \sum_{i \in I} p_i \left( \frac{q_i}{p_i} - 1 \right) \\
&= - \sum_{i \in I} q_i + \sum_{i \in I} p_i \\
&= - \sum_{i \in I} q_i + 1 \\
&\ge 0.
\end{align}

So

- \sum_{i \in I} p_i \ln q_i \ge - \sum_{i \in I} p_i \ln p_i

and then trivially

- \sum_{i=1}^n p_i \ln q_i \ge - \sum_{i=1}^n p_i \ln p_i

since the right hand side does not grow, but the left hand side may grow or may stay the same.

For equality to hold, we require:
  1. \frac{q_i}{p_i} = 1 for all i \in I, so that the approximation \ln \frac{q_i}{p_i} = \frac{q_i}{p_i} - 1 is exact.
  2. \sum_{i \in I} q_i = 1, so that equality continues to hold between the third and fourth lines of the proof.

This can happen if and only if

p_i = q_i

for i = 1, \ldots, n.
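
The proof rests on the elementary bound \ln x \le x - 1 and on the equality case q_i = p_i; both can be spot-checked with a short sketch (an illustration added here, not part of the original proof):

  import math

  # ln x <= x - 1, with equality only at x = 1.
  for x in [0.1, 0.5, 1.0, 2.0, 10.0]:
      assert math.log(x) <= x - 1 + 1e-12

  # Equality case: taking Q = P makes cross entropy coincide with entropy.
  p = [0.5, 0.25, 0.25]
  entropy_p = -sum(pi * math.log(pi) for pi in p)
  cross_pp = -sum(pi * math.log(qi) for pi, qi in zip(p, p))
  assert math.isclose(entropy_p, cross_pp)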

Alternative proofs

The result can alternatively be proved using Jensen's inequality
Jensen's inequality
In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906. Given its generality, the inequality appears in many forms depending on the context,...

 or log sum inequality
Log sum inequality
In mathematics, the log sum inequality is an inequality which is useful for proving several theorems in information theory.-Statement:Let a_1,\ldots,a_n and b_1,\ldots,b_n be nonnegative numbers. Denote the sum of all a_i\;s by a and the sum of all b_i\;s by b...

.
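
For instance, a brief sketch of the Jensen route (an outline added here, not given in the original article): because the natural logarithm is concave, Jensen's inequality with weights p_i gives

\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \le \ln \left( \sum_{i \in I} p_i \, \frac{q_i}{p_i} \right) = \ln \left( \sum_{i \in I} q_i \right) \le \ln 1 = 0,

which rearranges to - \sum_{i \in I} p_i \ln q_i \ge - \sum_{i \in I} p_i \ln p_i, i.e. Gibbs' inequality.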

Corollary

The entropy of P is bounded by:

H(p_1, \ldots, p_n) \le \log_2 n

The proof is trivial: simply set q_i = 1/n for all i.
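
A quick numerical check of the corollary (an added illustration, not part of the original article): any distribution on n outcomes has entropy at most \log_2 n, and the uniform distribution attains the bound.

  import math

  def entropy_bits(p):
      # Shannon entropy in bits; terms with p_i = 0 are omitted (0 log 0 = 0).
      return -sum(pi * math.log2(pi) for pi in p if pi > 0)

  n = 4
  uniform = [1 / n] * n
  skewed = [0.7, 0.1, 0.1, 0.1]

  assert math.isclose(entropy_bits(uniform), math.log2(n))  # bound attained
  assert entropy_bits(skewed) <= math.log2(n)               # bound respected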