In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906. Given its generality, the inequality appears in several different forms depending on the context, some of which are presented below.
Statements
The inequality can be stated quite generally using measure theory, and can be further generalized to its full strength in a probabilistic setting.
In measure-theoretic notation
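Let $(\Omega, A, \mu)$ be a measure space such that $\mu(\Omega) = 1$. If $g$ is a real-valued function that is $\mu$-integrable, and if $\varphi$ is a measurable convex function on the real axis, then:

$$\varphi\left(\int_\Omega g \, d\mu\right) \le \int_\Omega \varphi \circ g \, d\mu.$$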
In probability-theory notation
The same result can be stated in a probability theory setting. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, $X$ an integrable real-valued random variable and $\varphi$ a measurable convex function. Then:

$$\varphi\left(\mathbb{E}[X]\right) \le \mathbb{E}\left[\varphi(X)\right].$$

In this probability setting, the measure $\mu$ is interpreted as a probability $\mathbb{P}$, the integral with respect to $\mu$ as an expected value $\mathbb{E}$, and the function $g$ as a random variable $X$.
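The probabilistic statement can be checked on any finite sample by letting $X$ be uniformly distributed over the sample values (its empirical distribution). The Python sketch below, whose function name `jensen_gap` and sample values are illustrative assumptions, computes the gap $\mathbb{E}[\varphi(X)] - \varphi(\mathbb{E}[X])$, which should be non-negative for convex $\varphi$ and zero for affine $\varphi$.

```python
import math
from statistics import fmean


def jensen_gap(phi, xs):
    """Return E[phi(X)] - phi(E[X]) when X is uniform over the values in xs.

    For a convex phi this gap is >= 0 (Jensen's inequality); for an affine phi it is 0.
    """
    mean_x = fmean(xs)                     # E[X] under the empirical distribution
    mean_phi = fmean(phi(x) for x in xs)   # E[phi(X)]
    return mean_phi - phi(mean_x)


samples = [-2.0, 0.5, 1.0, 3.0, 4.5]           # hypothetical sample values
print(jensen_gap(lambda x: x * x, samples))    # the (population) variance of the samples
print(jensen_gap(math.exp, samples))           # strictly positive: exp is strictly convex
print(jensen_gap(lambda x: 3 * x + 1, samples))  # affine phi: zero up to rounding
```

With $\varphi(x) = x^2$ the gap is exactly the variance of $X$, which is one familiar way to see why it cannot be negative.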
General inequality in a probabilistic setting
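More generally, let $T$ be a real topological vector space, and $X$ a $T$-valued integrable random variable. In this general setting, integrable means that for any element $z$ in the dual space of $T$, $\mathbb{E}|\langle z, X\rangle| < \infty$, and there exists an element $\mathbb{E}[X]$ in $T$ such that $\langle z, \mathbb{E}[X]\rangle = \mathbb{E}[\langle z, X\rangle]$. Then, for any measurable convex function $\varphi$ and any sub-σ-algebra $\mathcal{G}$ of $\mathcal{F}$:

$$\varphi\left(\mathbb{E}[X \mid \mathcal{G}]\right) \le \mathbb{E}\left[\varphi(X) \mid \mathcal{G}\right] \quad \text{almost surely.}$$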
Here $\mathbb{E}[\,\cdot \mid \mathcal{G}]$ stands for the expectation conditioned on the σ-algebra $\mathcal{G}$. This general statement reduces to the previous one when the topological vector space $T$ is the real axis and $\mathcal{G}$ is the trivial σ-algebra $\{\varnothing, \Omega\}$.
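On a finite probability space, a sub-σ-algebra $\mathcal{G}$ generated by a partition makes conditional expectation concrete: $\mathbb{E}[Z \mid \mathcal{G}]$ is the probability-weighted average of $Z$ over the partition cell containing each outcome. The Python sketch below assumes such a partition (the outcomes, probabilities and cell labels are hypothetical) and checks the conditional inequality outcome by outcome.

```python
import math
from collections import defaultdict


def conditional_expectation(values, probs, blocks):
    """E[Z | G] on a finite space, where G is generated by a partition.

    values[i] and probs[i] are Z(omega_i) and P(omega_i); blocks[i] labels the
    partition cell containing omega_i. Every cell must have positive probability.
    Returns the list of E[Z | G](omega_i), one entry per outcome.
    """
    mass = defaultdict(float)
    total = defaultdict(float)
    for z, p, b in zip(values, probs, blocks):
        mass[b] += p
        total[b] += p * z
    return [total[b] / mass[b] for b in blocks]


# Hypothetical example: six outcomes, G generated by the cells {0, 1, 2} and {3, 4, 5}.
X = [1.0, -2.0, 0.5, 3.0, 2.0, -1.0]
P = [0.1, 0.2, 0.2, 0.15, 0.15, 0.2]
G = ["A", "A", "A", "B", "B", "B"]
phi = math.exp

lhs = [phi(m) for m in conditional_expectation(X, P, G)]   # phi(E[X | G])
rhs = conditional_expectation([phi(x) for x in X], P, G)   # E[phi(X) | G]
print(all(l <= r + 1e-12 for l, r in zip(lhs, rhs)))       # conditional Jensen, per outcome
```

Labelling every outcome with the same cell gives the trivial σ-algebra, and the conditional expectation collapses to the ordinary expectation, matching the reduction described above.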
Finite form
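For a real convex function $\varphi$, numbers $x_1, \dots, x_n$ in its domain, and positive weights $a_i$, Jensen's inequality can be stated as:

$$\varphi\left(\frac{\sum a_i x_i}{\sum a_i}\right) \le \frac{\sum a_i \varphi(x_i)}{\sum a_i},$$

and the inequality is reversed if $\varphi$ is concave.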
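The weighted finite form translates directly into a few lines of arithmetic. The Python sketch below (the helper name `weighted_jensen` is hypothetical) returns both sides so that the direction of the inequality can be inspected for a convex and for a concave $\varphi$; the weights are only assumed positive, not normalized.

```python
import math


def weighted_jensen(phi, weights, xs):
    """Return (lhs, rhs) of phi(sum a_i x_i / sum a_i) <= sum a_i phi(x_i) / sum a_i.

    The weights a_i must be positive; they need not sum to 1.
    """
    if not weights or len(weights) != len(xs) or any(a <= 0 for a in weights):
        raise ValueError("need one positive weight per point")
    total = sum(weights)
    lhs = phi(sum(a * x for a, x in zip(weights, xs)) / total)   # phi of the weighted mean
    rhs = sum(a * phi(x) for a, x in zip(weights, xs)) / total   # weighted mean of phi
    return lhs, rhs


# Convex phi = abs: lhs <= rhs.
print(weighted_jensen(abs, [1.0, 2.0, 3.0], [-4.0, 1.0, 2.0]))
# Concave phi = log: the inequality is reversed, lhs >= rhs.
print(weighted_jensen(math.log, [1.0, 1.0], [1.0, 4.0]))
```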
As a particular case, if the weights $a_i$ are all equal to unity, then

$$\varphi\left(\frac{\sum x_i}{n}\right) \le \frac{\sum \varphi(x_i)}{n}.$$
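For instance, the function $\log(x)$ is concave, so substituting $\varphi(x) = \log(x)$ in the previous formula (with the inequality reversed) gives $\log\left(\frac{x_1 + \cdots + x_n}{n}\right) \ge \frac{\log x_1 + \cdots + \log x_n}{n}$ for positive $x_i$; exponentiating both sides establishes the familiar arithmetic mean-geometric mean inequality:

$$\frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n}.$$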
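The derivation of the arithmetic mean-geometric mean inequality can be followed numerically by computing the geometric mean as $\exp$ of the average logarithm, which is exactly the quantity Jensen's inequality for the concave $\log$ compares against. The helper name `am_gm` below is illustrative.

```python
import math


def am_gm(xs):
    """Arithmetic and geometric means of positive numbers.

    The geometric mean is computed as exp(mean(log x)), i.e. exponentiating the
    right-hand side of Jensen's inequality for the concave log.
    """
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("AM-GM needs a non-empty list of positive numbers")
    am = sum(xs) / len(xs)
    gm = math.exp(sum(math.log(x) for x in xs) / len(xs))
    return am, gm


am, gm = am_gm([1.0, 4.0, 16.0])   # am = 7, gm = 4 (up to rounding)
print(am, gm, am >= gm)
```

Equality requires all $x_i$ to be equal, since $\log$ is strictly concave.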
The variable $x$ may, if required, be a function of another variable $t$, so that $x_i = g(t_i)$. All of this carries directly over to the general continuous case: the weights $a_i$ are replaced by a non-negative integrable function $f(x)$, such as a probability distribution, and the summations are replaced by integrals.
Proofs
A proof of Jensen's inequality can be provided in several ways; three proofs, corresponding to the three statements above, are given below. Before embarking on these derivations, however, it is worth analyzing an intuitive graphical argument based on the probabilistic case where $X$ is a real number. Assuming a hypothetical distribution of $X$ values, one can immediately identify the position of $\mathbb{E}[X]$ and its image $\varphi(\mathbb{E}[X])$ on the graph of $\varphi$. Noticing that for a convex mapping $Y = \varphi(X)$ the corresponding distribution of $Y$ values is increasingly "stretched out" for increasing values of $X$, it is easy to see that the distribution of $Y$ is broader than that of $X$ in the interval corresponding to $X > X_0$ and narrower in $X < X_0$ for any $X_0$; in particular, this is also true for $X_0 = \mathbb{E}[X]$. Consequently, in this picture the expectation of $Y$ always shifts upwards with respect to the position of $\varphi(\mathbb{E}[X])$, and this "proves" the inequality, i.e.

$$\mathbb{E}[Y] = \mathbb{E}[\varphi(X)] \ge \varphi(\mathbb{E}[X]),$$

with equality, for example, when $\varphi$ is affine (a straight line) on the range of $X$, or when $X$ is almost surely constant; if $\varphi$ is strictly convex, equality forces $X$ to be almost surely constant.
The proofs below formalize this intuitive notion.
Proof 1
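If $\lambda_1$ and $\lambda_2$ are two arbitrary non-negative real numbers such that $\lambda_1 + \lambda_2 = 1$, then convexity of $\varphi$ implies

$$\varphi(\lambda_1 x_1 + \lambda_2 x_2) \le \lambda_1 \varphi(x_1) + \lambda_2 \varphi(x_2) \quad \text{for any } x_1, x_2.$$

This can be easily generalized: if $\lambda_1, \dots, \lambda_n$ are $n$ non-negative real numbers such that $\lambda_1 + \cdots + \lambda_n = 1$, then

$$\varphi(\lambda_1 x_1 + \cdots + \lambda_n x_n) \le \lambda_1 \varphi(x_1) + \cdots + \lambda_n \varphi(x_n)$$

for any $x_1, \dots, x_n$. This finite form of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for $n = 2$. Suppose it is true for some $n$; one needs to prove it for $n + 1$. At least one of the $\lambda_i$ is strictly less than 1, say $\lambda_1$ (otherwise the $n + 1 \ge 2$ weights would sum to more than 1); therefore, by the convexity inequality,

$$\varphi\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) = \varphi\left(\lambda_1 x_1 + (1 - \lambda_1) \sum_{i=2}^{n+1} \frac{\lambda_i}{1 - \lambda_1} x_i\right) \le \lambda_1 \varphi(x_1) + (1 - \lambda_1)\, \varphi\left(\sum_{i=2}^{n+1} \frac{\lambda_i}{1 - \lambda_1} x_i\right).$$

Since the $n$ weights $\lambda_i / (1 - \lambda_1)$, $i = 2, \dots, n + 1$, are non-negative and sum to 1, one can apply the induction hypothesis to the last term in the previous formula to obtain the result, namely the finite form of Jensen's inequality.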
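The induction step can be mirrored by a recursion: split off one weight below 1, renormalise the remaining weights by $1 - \lambda_1$, check the two-point convexity inequality, and recurse on the remaining $n$ points. The Python sketch below (the function name is hypothetical, and a small tolerance absorbs rounding) returns `True` only if every two-point step in the chain holds.

```python
def check_finite_jensen_by_induction(phi, lam, xs, tol=1e-12):
    """Follow the induction step of Proof 1 and check each two-point inequality.

    lam: non-negative weights summing to 1 (one per point in xs).
    At each level one weight lam_k < 1 is split off and the others are
    renormalised by 1 - lam_k, as in the displayed convexity step.
    """
    if len(lam) == 1:
        return True  # a single point: phi(x_1) <= phi(x_1) trivially
    # With two or more non-negative weights summing to 1, some weight is < 1.
    k = next(i for i, w in enumerate(lam) if w < 1)
    rest_lam = [w / (1 - lam[k]) for i, w in enumerate(lam) if i != k]
    rest_x = [x for i, x in enumerate(xs) if i != k]
    y = sum(w * x for w, x in zip(rest_lam, rest_x))  # renormalised combination
    # Two-point convexity: phi(lam_k x_k + (1 - lam_k) y) <= lam_k phi(x_k) + (1 - lam_k) phi(y)
    lhs = phi(lam[k] * xs[k] + (1 - lam[k]) * y)
    rhs = lam[k] * phi(xs[k]) + (1 - lam[k]) * phi(y)
    # Induction hypothesis: the same statement for the remaining n points.
    return lhs <= rhs + tol and check_finite_jensen_by_induction(phi, rest_lam, rest_x, tol)


weights = [0.2, 0.3, 0.5]
points = [-1.0, 2.0, 4.0]
print(check_finite_jensen_by_induction(lambda x: x * x, weights, points))  # convex: every step holds
print(check_finite_jensen_by_induction(lambda x: -x * x, weights, points)) # concave: a step fails
```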
In order to obtain the general inequality from this finite form, one needs a density argument. The finite form can be rewritten as

$$\varphi\left(\int x \, d\mu_n(x)\right) \le \int \varphi(x) \, d\mu_n(x),$$

where $\mu_n$ is a measure given by an arbitrary convex combination of Dirac deltas:

$$\mu_n = \sum_{i=1}^{n} \lambda_i \delta_{x_i}.$$

Since convex functions on the real line are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures, the general statement is obtained by a limiting procedure.
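To see the limiting procedure at work, one can approximate a continuous distribution by convex combinations of Dirac deltas. The Python sketch below assumes, purely for illustration, the uniform distribution on $[0, 1]$ approximated by equal-weight atoms at the midpoints $(i + 1/2)/n$, with $\varphi(x) = x^2$; the finite-form quantities then approach $\varphi(\mathbb{E}[U]) = 1/4$ and $\mathbb{E}[U^2] = 1/3$ of the continuous case.

```python
def dirac_approximation_uniform(n):
    """Equal-weight Dirac deltas at the midpoints (i + 1/2)/n, i = 0..n-1.

    These measures converge weakly to the uniform distribution on [0, 1].
    """
    return [1.0 / n] * n, [(i + 0.5) / n for i in range(n)]


def finite_form_sides(phi, weights, points):
    """phi(integral of x d mu_n) and integral of phi d mu_n, for mu_n = sum w_i delta_{x_i}."""
    mean = sum(w * x for w, x in zip(weights, points))
    return phi(mean), sum(w * phi(x) for w, x in zip(weights, points))


for n in (2, 10, 1000):
    lhs, rhs = finite_form_sides(lambda x: x * x, *dirac_approximation_uniform(n))
    print(n, lhs, rhs)  # lhs stays at 1/4, rhs increases towards 1/3
```

For these midpoint atoms the right-hand side equals $1/3 - 1/(12n^2)$, so the finite-form inequality holds at every $n$ and the continuous one is recovered in the limit.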
Proof 2
Let $g$ be a real-valued $\mu$-integrable function on a measure space $\Omega$ with $\mu(\Omega) = 1$, and let $\varphi$ be a convex function on the real numbers. Define the right-handed derivative of $\varphi$ at $x$ as

$$\varphi'(x) := \lim_{t \to 0^+} \frac{\varphi(x + t) - \varphi(x)}{t}.$$

Since $\varphi$ is convex, the quotient on the right-hand side is decreasing as $t$ approaches 0 from the right, and bounded below by any term of the form

$$\frac{\varphi(x + t) - \varphi(x)}{t}$$

where $t < 0$; therefore, the limit always exists.
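Now, let us define the following:

$$x_0 := \int_\Omega g \, d\mu, \qquad a := \varphi'(x_0), \qquad b := \varphi(x_0) - x_0 \varphi'(x_0).$$

Then for all $x$, $ax + b \le \varphi(x)$. To see that, take $x > x_0$, and define $t = x - x_0 > 0$. Then

$$\varphi'(x_0) \le \frac{\varphi(x_0 + t) - \varphi(x_0)}{t}.$$

Therefore,

$$\varphi'(x_0)(x - x_0) + \varphi(x_0) \le \varphi(x),$$

as desired. The case $x < x_0$ is proven similarly, and clearly $a x_0 + b = \varphi(x_0)$.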
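$\varphi(x_0)$ can then be rewritten as

$$a x_0 + b = a\left(\int_\Omega g \, d\mu\right) + b.$$

But since $\mu(\Omega) = 1$, for every real number $k$ we have

$$\int_\Omega k \, d\mu = k.$$

In particular,

$$a\left(\int_\Omega g \, d\mu\right) + b = \int_\Omega (a g + b) \, d\mu \le \int_\Omega \varphi \circ g \, d\mu,$$

and since the left-hand side equals $\varphi(x_0) = \varphi\left(\int_\Omega g \, d\mu\right)$, this is Jensen's inequality.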
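Proof 2 can be traced step by step on a finite probability space, where the integrals become weighted sums. The Python sketch below is an illustration under that assumption (the helper names are hypothetical, and the right derivative is supplied explicitly rather than approximated, since a finite difference would slightly misplace the supporting line). It returns $\varphi(x_0)$, $\int (ag + b)\,d\mu$ and $\int \varphi \circ g\,d\mu$; the first two should agree and be no larger than the third. The non-differentiable $\varphi(x) = |x|$ shows why only a right derivative is needed.

```python
import math


def supporting_line(phi, right_derivative, x0):
    """Return (a, b) with a = phi'_+(x0) and b = phi(x0) - x0 * a, as in Proof 2."""
    a = right_derivative(x0)
    return a, phi(x0) - x0 * a


def jensen_via_support(phi, right_derivative, gs, weights):
    """Follow Proof 2 on a finite space: gs[i] = g(omega_i), weights[i] = mu({omega_i}), sum 1.

    Returns (phi(int g dmu), int (a g + b) dmu, int phi(g) dmu).
    """
    x0 = sum(w * g for w, g in zip(weights, gs))
    a, b = supporting_line(phi, right_derivative, x0)
    # a*x + b <= phi(x) for every x, so integrating gives middle <= right;
    # middle equals a*x0 + b = phi(x0) because mu has total mass 1.
    middle = sum(w * (a * g + b) for w, g in zip(weights, gs))
    right = sum(w * phi(g) for w, g in zip(weights, gs))
    return phi(x0), middle, right


def abs_right_derivative(x):
    """Right derivative of |x|: equal to 1 at the kink x = 0."""
    return 1.0 if x >= 0 else -1.0


print(jensen_via_support(abs, abs_right_derivative, [-1.0, 1.0], [0.5, 0.5]))  # (0.0, 0.0, 1.0)
print(jensen_via_support(math.exp, math.exp, [0.0, 1.0, 3.0], [0.2, 0.5, 0.3]))
```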
Proof 3
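Let $X$ be an integrable random variable that takes values in a real topological vector space $T$. Since $\varphi: T \to \mathbb{R}$ is convex, for any $x, y \in T$, the quantity

$$\frac{\varphi(x + \theta y) - \varphi(x)}{\theta}$$

is decreasing as $\theta$ approaches $0^+$. In particular, the subdifferential of $\varphi$ evaluated at $x$ in the direction $y$ is well defined by

$$(D\varphi)(x) \cdot y := \lim_{\theta \downarrow 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta} = \inf_{\theta > 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta}.$$

The subdifferential is positively homogeneous and subadditive in $y$; the argument below also uses linearity in $y$, which holds for instance when $\varphi$ is Gâteaux differentiable at $x$ (in general linearity is not automatic, and a linear functional dominated by $(D\varphi)(x)$ has to be supplied, e.g. by the Hahn-Banach theorem). Since the infimum on the right-hand side of the previous formula is no larger than the value of the same quotient at $\theta = 1$, one gets

$$\varphi(x + y) \ge \varphi(x) + (D\varphi)(x) \cdot y.$$

In particular, for an arbitrary sub-σ-algebra $\mathcal{G}$ we can evaluate the last inequality at $x = \mathbb{E}[X \mid \mathcal{G}]$, $y = X - \mathbb{E}[X \mid \mathcal{G}]$ to obtain

$$\varphi(X) \ge \varphi(\mathbb{E}[X \mid \mathcal{G}]) + (D\varphi)(\mathbb{E}[X \mid \mathcal{G}]) \cdot (X - \mathbb{E}[X \mid \mathcal{G}]).$$

Now, if we take the expectation conditioned on $\mathcal{G}$ on both sides of the previous expression, we get the result, since

$$\mathbb{E}\left[(D\varphi)(\mathbb{E}[X \mid \mathcal{G}]) \cdot (X - \mathbb{E}[X \mid \mathcal{G}]) \mid \mathcal{G}\right] = (D\varphi)(\mathbb{E}[X \mid \mathcal{G}]) \cdot \mathbb{E}\left[X - \mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{G}\right] = 0,$$

by the linearity of the subdifferential in the $y$ variable and well-known properties of the conditional expectation.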
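A small vector-valued case illustrates Proof 3 with the trivial σ-algebra. The Python sketch below assumes $T = \mathbb{R}^2$ and $\varphi(x) = \lVert x \rVert^2$, for which the directional derivative $(D\varphi)(x) \cdot y = 2\langle x, y\rangle$ is genuinely linear in $y$; the three equally likely values of $X$ are hypothetical. It checks the supporting-hyperplane inequality at $x = \mathbb{E}[X]$ and that the correction term has zero mean.

```python
def sq_norm(v):
    """phi(v) = squared Euclidean norm."""
    return sum(c * c for c in v)


def directional_derivative_sq_norm(x, y):
    """(D phi)(x) . y for phi = squared norm: equals 2 <x, y>, linear in y."""
    return 2.0 * sum(a * b for a, b in zip(x, y))


# Hypothetical R^2-valued random variable with three equally likely values.
X = [(1.0, 2.0), (-1.0, 0.0), (3.0, -2.0)]
P = [1 / 3, 1 / 3, 1 / 3]
EX = tuple(sum(p * v[j] for p, v in zip(P, X)) for j in range(2))

# phi(EX + y) >= phi(EX) + (D phi)(EX) . y with y = X - EX, for every outcome.
for v in X:
    y = tuple(a - b for a, b in zip(v, EX))
    assert sq_norm(v) >= sq_norm(EX) + directional_derivative_sq_norm(EX, y) - 1e-12

# The correction term has zero mean, so taking expectations gives phi(E[X]) <= E[phi(X)].
correction_mean = sum(
    p * directional_derivative_sq_norm(EX, tuple(a - b for a, b in zip(v, EX)))
    for p, v in zip(P, X)
)
phi_of_mean = sq_norm(EX)
mean_of_phi = sum(p * sq_norm(v) for p, v in zip(P, X))
print(correction_mean, phi_of_mean, mean_of_phi)
```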
Applications and special cases
Form involving a probability density function
Suppose $\Omega$ is a measurable subset of the real line and $f(x)$ is a non-negative function such that

$$\int_\Omega f(x) \, dx = 1.$$

In probabilistic language, $f$ is a probability density function. Then Jensen's inequality becomes the following statement about convex integrals:
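If $g$ is any real-valued measurable function and $\varphi$ is convex over the range of $g$, then

$$\varphi\left(\int_\Omega g(x) f(x) \, dx\right) \le \int_\Omega \varphi(g(x)) f(x) \, dx.$$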
If $g(x) = x$, then this form of the inequality reduces to a commonly used special case:

$$\varphi\left(\int_\Omega x f(x) \, dx\right) \le \int_\Omega \varphi(x) f(x) \, dx.$$
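The density form can be checked with a simple quadrature. The Python sketch below assumes, for illustration, $\Omega = [0, 1]$, the density $f(x) = 2x$ and $\varphi = \exp$, and uses a composite midpoint rule (the helper `integrate` is hypothetical). Analytically, $\int_0^1 x f(x)\,dx = 2/3$ and $\int_0^1 e^x f(x)\,dx = 2$, so the inequality reads $e^{2/3} \approx 1.948 \le 2$.

```python
import math


def integrate(func, a, b, n=100_000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def density(x):
    """Hypothetical probability density on Omega = [0, 1]; it integrates to 1."""
    return 2.0 * x


phi = math.exp

mean = integrate(lambda x: x * density(x), 0.0, 1.0)         # about 2/3
lhs = phi(mean)                                               # phi of the mean, about exp(2/3)
rhs = integrate(lambda x: phi(x) * density(x), 0.0, 1.0)      # mean of phi, 2 analytically
print(lhs, rhs, lhs <= rhs)
```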
Alternative finite form
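If $\Omega$ is some finite set $\{x_1, x_2, \dots, x_n\}$, and if $\mu$ is a weighted counting measure on $\Omega$ with $\mu(\{x_i\}) = \lambda_i$, then the general form reduces to a statement about sums:

$$\varphi\left(\sum_{i=1}^{n} g(x_i) \lambda_i\right) \le \sum_{i=1}^{n} \varphi(g(x_i)) \lambda_i,$$

provided that $\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1$ and $\lambda_i \ge 0$.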
There is also an infinite discrete form.
Statistical physics
Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving

$$e^{\langle X \rangle} \le \left\langle e^{X} \right\rangle,$$

where angle brackets denote expected values with respect to some probability distribution in the random variable $X$.
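The proof in this case is very simple. The desired inequality follows directly by writing

$$\left\langle e^{X} \right\rangle = e^{\langle X \rangle} \left\langle e^{X - \langle X \rangle} \right\rangle$$

and then applying the inequality

$$e^{X} \ge 1 + X$$

to the final exponential, which gives $\left\langle e^{X - \langle X \rangle} \right\rangle \ge \left\langle 1 + X - \langle X \rangle \right\rangle = 1$.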
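The factorization used in this proof is easy to verify for a discrete distribution. In the Python sketch below the values and probabilities are hypothetical; it computes $e^{\langle X \rangle}$, $\langle e^X \rangle$ and the factor $\langle e^{X - \langle X \rangle} \rangle$, which should be at least 1 and multiply the first quantity to give the second.

```python
import math

# Hypothetical discrete distribution: values x_i with probabilities p_i (summing to 1).
xs = [-1.0, 0.0, 2.0, 3.5]
ps = [0.1, 0.4, 0.3, 0.2]


def avg(f):
    """Angle-bracket average <f(X)> under the distribution above."""
    return sum(p * f(x) for p, x in zip(ps, xs))


mean_x = avg(lambda x: x)                       # <X>
lhs = math.exp(mean_x)                          # e^{<X>}
rhs = avg(math.exp)                             # <e^X>
factor = avg(lambda x: math.exp(x - mean_x))    # <e^{X - <X>}>, >= 1 by e^u >= 1 + u

# rhs equals lhs * factor (up to rounding), hence lhs <= rhs.
print(lhs, rhs, factor)
```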
Information theory
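If $p(x)$ is the true probability distribution for $x$, and $q(x)$ is another distribution, then applying Jensen's inequality to the random variable $Y(x) = q(x)/p(x)$ and the function $\varphi(y) = -\log(y)$ gives

$$\mathbb{E}[\varphi(Y)] \ge \varphi(\mathbb{E}[Y])$$
$$\Rightarrow \int p(x) \log\frac{p(x)}{q(x)} \, dx \ge -\log \int p(x) \frac{q(x)}{p(x)} \, dx \ge 0$$
$$\Rightarrow -\int p(x) \log q(x) \, dx \ge -\int p(x) \log p(x) \, dx,$$

a result called Gibbs' inequality.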
It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities $p$ rather than any other distribution $q$. The non-negative quantity $\int p(x) \log\frac{p(x)}{q(x)} \, dx$ is called the Kullback-Leibler divergence (or Kullback-Leibler distance) of $q$ from $p$.
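For discrete distributions the integrals become sums, and Gibbs' inequality can be checked directly. The Python sketch below uses natural logarithms (nats) and two hypothetical three-outcome distributions; the helper names are illustrative. It confirms that the cross-entropy of $q$ under $p$ is at least the entropy of $p$, the difference being the Kullback-Leibler divergence.

```python
import math


def kl_divergence(p, q):
    """D(p || q) = sum_i p_i log(p_i / q_i) in nats for discrete distributions.

    Terms with p_i = 0 contribute 0; q_i must be > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p, q):
    """-sum_i p_i log q_i; with q = p this is the entropy of p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.25, 0.25]   # hypothetical "true" distribution
q = [0.25, 0.25, 0.5]   # hypothetical alternative
print(kl_divergence(p, q))                       # 0.25 * log(2), about 0.173
print(cross_entropy(p, q) - cross_entropy(p, p)) # the same value: Gibbs' inequality gap
```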
Rao-Blackwell theorem
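If $L$ is a convex function, then from the conditional form of Jensen's inequality, taking expectations of both sides, we get

$$L\left(\mathbb{E}[\delta(X) \mid T(X)]\right) \le \mathbb{E}\left[L(\delta(X)) \mid T(X)\right] \quad\Rightarrow\quad \mathbb{E}\left[L\left(\mathbb{E}[\delta(X) \mid T(X)]\right)\right] \le \mathbb{E}\left[L(\delta(X))\right].$$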
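So if $\delta(X)$ is some estimator of an unobserved parameter $\theta$ given a vector of observables $X$, and if $T(X)$ is a sufficient statistic for $\theta$, then an improved estimator, in the sense of having a smaller or equal expected loss $L$, can be obtained by calculating

$$\delta_1(X) = \mathbb{E}_\theta\left[\delta(X') \mid T(X') = T(X)\right],$$

the expected value of $\delta$ under the sampling distribution, taken over all possible vectors of observations $X'$ compatible with the same value of $T$ as that observed. Because $T$ is sufficient, this conditional expectation does not depend on $\theta$, so $\delta_1$ is a genuine estimator. This result is known as the Rao-Blackwell theorem.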
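A textbook-style illustration, chosen here as an assumption rather than taken from the text above: for $n$ independent Bernoulli($p$) observations, take the crude estimator $\delta(X) = X_1$, the sufficient statistic $T(X) = \sum_i X_i$, and squared loss $L(d) = (d - p)^2$. The Python sketch below enumerates all $2^n$ outcomes, builds $\delta_1$ by conditioning on each level set of $T$ (it comes out as $T/n$), and computes both expected losses exactly.

```python
from itertools import product


def rao_blackwell_risks(n, p):
    """Exact squared-loss risks of delta(X) = X_1 and delta_1 = E[delta | T] for Bernoulli(p) data."""
    outcomes = list(product((0, 1), repeat=n))
    prob = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in outcomes}

    def delta(x):
        return x[0]                      # crude estimator: the first observation

    # E[delta | T = t]: probability-weighted average of delta over the level set {T = t}.
    num, den = {}, {}
    for x in outcomes:
        t = sum(x)
        num[t] = num.get(t, 0.0) + prob[x] * delta(x)
        den[t] = den.get(t, 0.0) + prob[x]

    def delta_1(x):
        return num[sum(x)] / den[sum(x)]  # equals sum(x) / n; free of p by sufficiency

    def risk(estimator):
        return sum(prob[x] * (estimator(x) - p) ** 2 for x in outcomes)

    return risk(delta), risk(delta_1)


r0, r1 = rao_blackwell_risks(4, 0.3)
print(r0, r1)   # p(1 - p) versus p(1 - p) / n
```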