Gibbs' inequality
In information theory, Gibbs' inequality is a statement about the mathematical entropy of a discrete probability distribution. Several other bounds on the entropy of probability distributions are derived from Gibbs' inequality, including Fano's inequality.
It was first presented by J. Willard Gibbs in the 19th century.

Gibbs' inequality

Suppose that

P = \{ p_1, \ldots, p_n \}

is a probability distribution. Then for any other probability distribution

Q = \{ q_1, \ldots, q_n \}

the following inequality between positive quantities (since the p_i and q_i are positive numbers less than one) holds

- \sum_{i=1}^n p_i \log_2 p_i \le - \sum_{i=1}^n p_i \log_2 q_i

with equality if and only if

p_i = q_i

for all i. Put in words, the information entropy of a distribution P is less than or equal to its cross entropy with any other distribution Q.
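
As a concrete illustration (an addition, not part of the original article), the bound can be checked numerically. The short Python sketch below uses two arbitrarily chosen example distributions and base-2 logarithms:

  import math

  # Two example distributions, chosen arbitrarily for illustration.
  p = [0.5, 0.25, 0.25]
  q = [0.6, 0.3, 0.1]

  # Entropy of P and cross entropy of P with respect to Q, in bits.
  entropy_p = -sum(pi * math.log2(pi) for pi in p)
  cross_entropy_pq = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))

  print(entropy_p)          # 1.5
  print(cross_entropy_pq)   # approximately 1.633
  assert entropy_p <= cross_entropy_pq  # Gibbs' inequality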

The difference between the two quantities is the Kullback–Leibler divergence or relative entropy, so the inequality can also be written:

D_{\mathrm{KL}}(P \| Q) = \sum_{i=1}^n p_i \log_2 \frac{p_i}{q_i} \ge 0.

Note that the use of base-2 logarithms is optional, and allows one to refer to the quantity on each side of the inequality as an "average surprisal" measured in bits.
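
Continuing the illustrative example above (again an addition, not part of the original article), the gap between cross entropy and entropy is exactly the relative entropy, and it is non-negative:

  import math

  p = [0.5, 0.25, 0.25]
  q = [0.6, 0.3, 0.1]

  # Kullback-Leibler divergence D_KL(P || Q), in bits.
  kl = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

  # Cross entropy minus entropy gives the same gap.
  gap = (-sum(pi * math.log2(qi) for pi, qi in zip(p, q))
         + sum(pi * math.log2(pi) for pi in p))

  assert kl >= 0
  assert math.isclose(kl, gap)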

Proof

Since

\log_2 x = \frac{\ln x}{\ln 2},

it is sufficient to prove the statement using the natural logarithm (ln). Note that the natural logarithm satisfies

\ln x \le x - 1

for all x > 0, with equality if and only if x = 1.

Let I denote the set of all i for which p_i is non-zero. Then

\begin{align}
- \sum_{i \in I} p_i \ln \frac{q_i}{p_i} &\ge - \sum_{i \in I} p_i \left( \frac{q_i}{p_i} - 1 \right) \\
&= - \sum_{i \in I} q_i + \sum_{i \in I} p_i \\
&= - \sum_{i \in I} q_i + 1 \\
&\ge 0.
\end{align}

So

- \sum_{i \in I} p_i \ln q_i \ge - \sum_{i \in I} p_i \ln p_i

and then trivially

- \sum_{i=1}^n p_i \ln q_i \ge - \sum_{i=1}^n p_i \ln p_i

since the right hand side does not grow, but the left hand side may grow or may stay the same.

For equality to hold, we require:
  1. \frac{q_i}{p_i} = 1 for all i \in I, so that the approximation \ln \frac{q_i}{p_i} = \frac{q_i}{p_i} - 1 is exact.
  2. \sum_{i \in I} q_i = 1, so that equality continues to hold between the third and fourth lines of the proof.

This can happen if and only if

p_i = q_i

for i = 1, \ldots, n.
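
The proof rests on the elementary bound \ln x \le x - 1 and on the equality case q_i = p_i; both can be spot-checked with a short sketch (an illustration added here, not part of the original proof):

  import math

  # ln x <= x - 1, with equality only at x = 1.
  for x in [0.1, 0.5, 1.0, 2.0, 10.0]:
      assert math.log(x) <= x - 1 + 1e-12

  # Equality case: taking Q = P makes cross entropy coincide with entropy.
  p = [0.5, 0.25, 0.25]
  entropy_p = -sum(pi * math.log(pi) for pi in p)
  cross_pp = -sum(pi * math.log(qi) for pi, qi in zip(p, p))
  assert math.isclose(entropy_p, cross_pp)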

Alternative proofs

The result can alternatively be proved using Jensen's inequality
Jensen's inequality
In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906. Given its generality, the inequality appears in many forms depending on the context,...

 or log sum inequality
Log sum inequality
In mathematics, the log sum inequality is an inequality which is useful for proving several theorems in information theory.-Statement:Let a_1,\ldots,a_n and b_1,\ldots,b_n be nonnegative numbers. Denote the sum of all a_i\;s by a and the sum of all b_i\;s by b...

.
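
For instance, a brief sketch of the Jensen route (an outline added here, not given in the original article): because the natural logarithm is concave, Jensen's inequality with weights p_i gives

\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \le \ln \left( \sum_{i \in I} p_i \, \frac{q_i}{p_i} \right) = \ln \left( \sum_{i \in I} q_i \right) \le \ln 1 = 0,

which rearranges to - \sum_{i \in I} p_i \ln q_i \ge - \sum_{i \in I} p_i \ln p_i, i.e. Gibbs' inequality.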

Corollary

The entropy of P is bounded by:

H(p_1, \ldots, p_n) \le \log_2 n

The proof is trivial: simply set q_i = 1/n for all i.
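
A quick numerical check of the corollary (an added illustration, not part of the original article): any distribution on n outcomes has entropy at most \log_2 n, and the uniform distribution attains the bound.

  import math

  def entropy_bits(p):
      # Shannon entropy in bits; terms with p_i = 0 are omitted (0 log 0 = 0).
      return -sum(pi * math.log2(pi) for pi in p if pi > 0)

  n = 4
  uniform = [1 / n] * n
  skewed = [0.7, 0.1, 0.1, 0.1]

  assert math.isclose(entropy_bits(uniform), math.log2(n))  # bound attained
  assert entropy_bits(skewed) <= math.log2(n)               # bound respected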