{{For|Jensen's inequality for analytic functions|Jensen's formula}}
In mathematics, **Jensen's inequality**, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906. Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean after convex transformation; it is a simple corollary that the opposite is true of concave transformations.
Jensen's inequality generalizes the statement that the secant line of a convex function lies *above* the graph of the function, which is Jensen's inequality for two points: the secant line consists of weighted means of the convex function (for $t \in [0,1]$),

$t\,f(x_1) + (1-t)\,f(x_2),$

while the graph of the function is the convex function of the weighted means,

$f(t\,x_1 + (1-t)\,x_2).$
There are also converses of Jensen's inequality, which give upper bounds on the integral of the convex function.
In the context of probability theory, it is generally stated in the following form: if *X* is a random variable and $\varphi$ is a convex function, then

$\varphi\left(\mathbb{E}\left[X\right]\right) \leq \mathbb{E}\left[\varphi(X)\right].$
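As a small numerical illustration, the snippet below evaluates both sides for a discrete random variable and the convex function $\varphi(x) = x^2$. It is a sketch only: the values and probabilities are hypothetical, and the helper names `expectation` and `jensen_sides` are illustrative. For this particular $\varphi$ the gap between the two sides equals the variance of *X*.

```python
def expectation(values, probs):
    """Expected value of a discrete random variable given its values and probabilities."""
    return sum(v * p for v, p in zip(values, probs))


def jensen_sides(phi, values, probs):
    """Return (phi(E[X]), E[phi(X)]) for a discrete random variable X."""
    lhs = phi(expectation(values, probs))
    rhs = expectation([phi(v) for v in values], probs)
    return lhs, rhs


# Hypothetical distribution: X = -1, 0, 2 with probabilities 0.25, 0.25, 0.5.
values = [-1.0, 0.0, 2.0]
probs = [0.25, 0.25, 0.5]

lhs, rhs = jensen_sides(lambda x: x * x, values, probs)
print(lhs, rhs)  # 0.5625 2.25  (the gap, 1.6875, is Var(X))
```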
## Statements

The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using either the language of measure theory or (equivalently) probability. In the probabilistic setting, the inequality can be further generalized to its *full strength*.

### Finite form

For a real convex function {{nowrap|$\varphi$}}, numbers $x_1, x_2, \ldots, x_n$ in its domain, and positive weights $a_i$, Jensen's inequality can be stated as:

$\varphi\left(\frac{\sum a_i x_i}{\sum a_j}\right) \le \frac{\sum a_i \varphi(x_i)}{\sum a_j} \qquad\qquad (1)$
and the inequality is reversed if {{nowrap|$\varphi$}} is concave, which is

$\varphi\left(\frac{\sum a_i x_i}{\sum a_j}\right) \geq \frac{\sum a_i \varphi(x_i)}{\sum a_j}. \qquad\qquad (2)$
As a particular case, if the weights $a_i$ are all equal, then (1) and (2) become

$\varphi\left(\frac{\sum x_i}{n}\right) \le \frac{\sum \varphi(x_i)}{n} \qquad\qquad (3)$

$\varphi\left(\frac{\sum x_i}{n}\right) \geq \frac{\sum \varphi(x_i)}{n} \qquad\qquad (4)$
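The weighted finite form (1) and its equal-weight case (3) can be checked with exact rational arithmetic. This is a minimal sketch assuming positive weights; the function names are hypothetical, the points are illustrative, and `Fraction` is used only so that the comparison carries no rounding error.

```python
from fractions import Fraction


def weighted_mean(xs, weights):
    """Weighted mean sum(a_i * x_i) / sum(a_j) for positive weights a_i."""
    return sum(a * x for a, x in zip(weights, xs)) / sum(weights)


def finite_jensen(phi, xs, weights):
    """Left- and right-hand sides of the weighted finite form (1)."""
    lhs = phi(weighted_mean(xs, weights))
    rhs = weighted_mean([phi(x) for x in xs], weights)
    return lhs, rhs


def square(v):
    """A convex function."""
    return v * v


xs = [Fraction(1), Fraction(2), Fraction(4)]  # illustrative points

print(finite_jensen(square, xs, [1, 1, 2]))  # (Fraction(121, 16), Fraction(37, 4))
print(finite_jensen(square, xs, [1, 1, 1]))  # (Fraction(49, 9), Fraction(7, 1))
```

Passing a concave function such as `lambda v: -v * v` instead reverses the comparison, as in (2) and (4).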
For instance, the function $\log(x)$ is *concave* (conversely, the two-point case of Jensen's inequality with arbitrary weights is exactly the definition of convexity or concavity, so it can be used to prove either property). Substituting $\varphi(x) = \log(x)$ in formula (4) therefore gives the logarithm of the familiar arithmetic mean–geometric mean inequality for positive $x_i$; exponentiating both sides yields

$\frac{x_1 + x_2 + \cdots + x_n}{n} \geq \sqrt[n]{x_1 x_2 \cdots x_n}.$
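A short numerical sketch of this step, assuming positive inputs: it compares the logarithm of the arithmetic mean with the mean of the logarithms, as in (4) with $\varphi = \log$, and recovers the geometric mean by exponentiating. The sample values and the helper name `am_gm_via_log` are illustrative.

```python
import math


def am_gm_via_log(xs):
    """Apply inequality (4) with phi = log to positive numbers xs.

    Returns (log of arithmetic mean, mean of logs, arithmetic mean, geometric mean).
    """
    n = len(xs)
    arithmetic = sum(xs) / n
    mean_of_logs = sum(math.log(x) for x in xs) / n
    geometric = math.exp(mean_of_logs)  # exp(mean of logs) is the geometric mean
    return math.log(arithmetic), mean_of_logs, arithmetic, geometric


log_am, mean_log, am, gm = am_gm_via_log([1.0, 4.0, 16.0])
print(am, gm)  # arithmetic mean 7.0, geometric mean approximately 4
```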
The variable $x$ may, if required, be a function of another variable (or set of variables) $t$, so that $x_i = g(t_i)$. All of this carries directly over to the general continuous case: the weights $a_i$ are replaced by a non-negative integrable function $f(x)$, such as a probability distribution, and the summations are replaced by integrals.

### Measure-theoretic and probabilistic form

Let (Ω, *A*, *μ*) be a measure space such that μ(Ω) = 1. If *g* is a real-valued function that is μ-integrable, and if $\varphi$ is a convex function on the real line, then:

$\varphi\left(\int_\Omega g\, d\mu\right) \le \int_\Omega \varphi \circ g\, d\mu.$
In real analysis, we may require an estimate on

$\varphi\left(\int_a^b f(x)\, dx\right)$

where $a < b$ are real numbers, and $f:[a,b]\to\mathbb{R}$ is a non-negative real-valued function that is Lebesgue-integrable. In this case, the Lebesgue measure of $[a,b]$ need not be unity. However, the measure can be normalized (equivalently, the interval can be rescaled by a change of variables) to the probability measure $dx/(b-a)$ on $[a,b]$. Applying Jensen's inequality to the function $(b-a)f$ with respect to this measure gives

$\varphi\left(\int_a^b f(x)\, dx\right) \le \int_a^b \varphi((b-a)f(x))\,\frac{1}{b-a}\, dx.$
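The rescaled inequality can be sanity-checked numerically. This sketch assumes the illustrative choices $f(x) = x$ on $[0, 2]$ and $\varphi = \exp$, and approximates both integrals with a simple midpoint rule; the helper `midpoint_integral` is hypothetical, not a library routine. With these choices the left side is exactly $e^2$ and the right side is $(e^4 - 1)/4$.

```python
import math


def midpoint_integral(h, a, b, n=10_000):
    """Approximate the integral of h over [a, b] with the composite midpoint rule."""
    width = (b - a) / n
    return sum(h(a + (k + 0.5) * width) for k in range(n)) * width


a, b = 0.0, 2.0


def f(x):
    """Illustrative non-negative integrand on [a, b]."""
    return x


phi = math.exp  # a convex function

lhs = phi(midpoint_integral(f, a, b))
rhs = midpoint_integral(lambda x: phi((b - a) * f(x)) / (b - a), a, b)
print(lhs, rhs)  # close to e**2 and (e**4 - 1) / 4
```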
The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let $(\Omega, \mathfrak{F}, \mathbb{P})$ be a probability space, *X* an integrable real-valued random variable and $\varphi$ a convex function. Then:

$\varphi\left(\mathbb{E}\left[X\right]\right) \leq \mathbb{E}\left[\varphi(X)\right].$
In this probability setting, the measure μ is interpreted as a probability $\mathbb{P}$, the integral with respect to μ as an expected value $\mathbb{E}$, and the function *g* as a random variable *X*.

### General inequality in a probabilistic setting

More generally, let *T* be a real topological vector space, and *X* a *T*-valued integrable random variable. In this general setting, *integrable* means that there exists an element $\mathbb{E}\{X\}$ in *T* such that for any element *z* in the dual space of *T*, $\mathbb{E}|\langle z, X \rangle| < \infty$ and $\langle z, \mathbb{E}\{X\}\rangle = \mathbb{E}\{\langle z, X \rangle\}$. Then, for any measurable convex function φ and any sub-σ-algebra $\mathfrak{G}$ of $\mathfrak{F}$:

$\varphi\left(\mathbb{E}\left[X|\mathfrak{G}\right]\right) \leq \mathbb{E}\left[\varphi(X)|\mathfrak{G}\right].$
Here $\mathbb{E}\{\cdot|\mathfrak{G}\}$ stands for the conditional expectation with respect to the σ-algebra $\mathfrak{G}$. This general statement reduces to the previous ones when the topological vector space *T* is the real axis and $\mathfrak{G}$ is the trivial σ-algebra $\{\varnothing, \Omega\}$.
In case the sub-σ-algebra is generated by a measurable function $Y$, the statement can be given as

$\varphi\left(\mathbb{E}\left[X|Y\right]\circ Y\right) \leq \mathbb{E}\left[\varphi(X)|Y\right]\circ Y$

or

$\varphi\circ\mathbb{E}\left[X|Y\right] \leq \mathbb{E}\left[\varphi\circ X|Y\right],$

where $\mathbb{E}[X|Y]$ denotes the function $y \mapsto \mathbb{E}[X|Y=y]$, so that $\mathbb{E}[X|Y]\circ Y$ is the conditional expectation of *X* given the σ-algebra generated by *Y*. In the second form both sides are functions of $y$, and the inequality holds for almost every $y$ with respect to the distribution of *Y*.
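For a discrete pair $(X, Y)$ the conditional form can be checked value by value: $\mathbb{E}[X|Y]$ becomes a function of $y$, and the second inequality holds at every $y$ with positive probability. In this sketch the joint distribution is hypothetical, $\varphi(v) = v^2$ is an illustrative convex choice, and `conditional_expectation` is an illustrative helper.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint distribution of (X, Y) as {(x, y): probability}.
joint = {
    (0, "a"): Fraction(1, 4),
    (2, "a"): Fraction(1, 4),
    (1, "b"): Fraction(1, 6),
    (4, "b"): Fraction(1, 3),
}


def conditional_expectation(func, joint):
    """Return the function y -> E[func(X) | Y = y] as a dict."""
    mass = defaultdict(Fraction)      # P(Y = y)
    weighted = defaultdict(Fraction)  # E[func(X); Y = y]
    for (x, y), p in joint.items():
        mass[y] += p
        weighted[y] += p * func(x)
    return {y: weighted[y] / mass[y] for y in mass}


def phi(v):
    """A convex function."""
    return v * v


e_x = conditional_expectation(lambda x: x, joint)  # E[X | Y]
e_phi = conditional_expectation(phi, joint)        # E[phi(X) | Y]
for y in e_x:
    # phi(E[X | Y = y]) <= E[phi(X) | Y = y] at every y with positive probability
    assert phi(e_x[y]) <= e_phi[y]
print(e_x, e_phi)
```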
## Proofs

Jensen's inequality can be proved in several ways, and three different proofs corresponding to the different statements above will be offered. Before embarking on these mathematical derivations, however, it is worth analyzing an intuitive graphical argument based on the probabilistic case where *X* is a real number (see figure). Assuming a hypothetical distribution of *X* values, one can immediately identify the position of $\mathbb{E}\{X\}$ and its image $\varphi(\mathbb{E}\{X\})$ in the graph. Noticing that for convex mappings $Y = \varphi(X)$ the corresponding distribution of *Y* values is increasingly "stretched out" for increasing values of *X*, it is easy to see that the distribution of *Y* is broader in the interval corresponding to $X > X_0$ and narrower in $X < X_0$ for any $X_0$; in particular, this is also true for $X_0 = \mathbb{E}\{X\}$. Consequently, in this picture the expectation of *Y* will always shift upwards with respect to the position of $\varphi(\mathbb{E}\{X\})$, and this "proves" the inequality, i.e.

$\mathbb{E}\{Y\} = \mathbb{E}\{\varphi(X)\} \geq \varphi(\mathbb{E}\{X\}),$
with equality when *φ* is affine (for example, a straight line) on an interval containing the values of *X*, or when *X* follows a degenerate distribution (i.e. is a constant).
The proofs below formalize this intuitive notion.

### Proof 1 (finite form)

If $\lambda_1$ and $\lambda_2$ are two arbitrary positive real numbers such that $\lambda_1 + \lambda_2 = 1$, then convexity of $\varphi$ implies

$\varphi(\lambda_1 x_1+\lambda_2 x_2)\leq \lambda_1\,\varphi(x_1)+\lambda_2\,\varphi(x_2)\text{ for any }x_1,\,x_2.$
This can be easily generalized: if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are positive real numbers such that $\lambda_1 + \cdots + \lambda_n = 1$, then

$\varphi(\lambda_1 x_1+\lambda_2 x_2+\cdots+\lambda_n x_n)\leq \lambda_1\,\varphi(x_1)+\lambda_2\,\varphi(x_2)+\cdots+\lambda_n\,\varphi(x_n),$
for any $x_1, \ldots, x_n$. This *finite form* of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for *n* = 2. Suppose it is true for some *n*; one needs to prove it for *n* + 1. Since all the $\lambda_i$ are positive and there are at least two of them, $\lambda_1 < 1$, so $1 - \lambda_1 > 0$; therefore, by the convexity inequality:

$\begin{align} \varphi\left(\sum_{i=1}^{n+1}\lambda_i x_i\right) & = \varphi\left(\lambda_1 x_1+(1-\lambda_1)\sum_{i=2}^{n+1} \frac{\lambda_i}{1-\lambda_1} x_i\right) \\ & \leq \lambda_1\,\varphi(x_1)+(1-\lambda_1) \varphi\left(\sum_{i=2}^{n+1} \frac{\lambda_i}{1-\lambda_1} x_i\right). \end{align}$

Since $\sum_{i=2}^{n+1} \lambda_i/(1-\lambda_1) = 1$, one can apply the induction hypothesis to the last term in the previous formula to obtain the result, namely the finite form of Jensen's inequality.
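The induction can be mirrored by a recursive function. This sketch assumes positive weights summing to 1 and a convex `phi`; at each level it checks the two-point convexity step and the induction hypothesis (up to a small floating-point tolerance) and returns $\sum_i \lambda_i \varphi(x_i)$. The function name and sample data are illustrative. For convenience the recursion bottoms out at a single point with weight 1, where the inequality is trivially an equality, rather than at the case $n = 2$ used in the proof.

```python
import math


def jensen_rhs_by_induction(phi, xs, lams, tol=1e-12):
    """Follow the induction of Proof 1 and return sum(lams[i] * phi(xs[i])).

    Assumes positive weights lams summing to 1 and a convex phi. At each level,
    the two-point step
        phi(l1*x1 + (1 - l1)*y) <= l1*phi(x1) + (1 - l1)*phi(y)
    is checked, where y combines the remaining points with the renormalized
    weights l_i / (1 - l1); phi(y) is then bounded recursively.
    """
    if len(xs) == 1:  # single point with weight 1: equality
        return phi(xs[0])
    l1 = lams[0]
    rest = [l / (1 - l1) for l in lams[1:]]  # these sum to 1
    y = sum(l * x for l, x in zip(rest, xs[1:]))
    combined = l1 * xs[0] + (1 - l1) * y     # equals sum(lams[i] * xs[i])
    assert phi(combined) <= l1 * phi(xs[0]) + (1 - l1) * phi(y) + tol
    tail = jensen_rhs_by_induction(phi, xs[1:], rest, tol)  # induction hypothesis
    assert phi(y) <= tail + tol
    return l1 * phi(xs[0]) + (1 - l1) * tail


xs = [0.0, 1.0, 3.0]    # illustrative points
lams = [0.2, 0.3, 0.5]  # illustrative positive weights summing to 1
bound = jensen_rhs_by_induction(math.exp, xs, lams)
print(math.exp(sum(l * x for l, x in zip(lams, xs))), "<=", bound)
```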
In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be rewritten as:

$\varphi\left(\int x\,d\mu_n(x) \right)\leq \int \varphi(x)\,d\mu_n(x),$

where $\mu_n$ is a measure given by an arbitrary convex combination of Dirac deltas:

$\mu_n=\sum_{i=1}^n \lambda_i \delta_{x_i}.$

Since convex functions are continuous on open intervals, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures (as could be easily verified), the general statement is obtained simply by a limiting procedure.

### Proof 2 (measure-theoretic form)

Let *g* be a real-valued μ-integrable function on a probability space Ω, and let *φ* be a convex function on the real numbers. Since φ is convex, at each real number *x* we have a nonempty set of subderivatives, which may be thought of as lines touching the graph of φ at *x* that lie at or below the graph of φ at all points.
Now, if we define

$x_0 := \int_\Omega g\, d\mu,$

because of the existence of subderivatives for convex functions, we may choose *a* and *b* such that

$ax + b \leq \varphi(x)$

for all real *x* and

$ax_0 + b = \varphi(x_0).$

But then we have that

$\varphi(g(\omega)) \geq a\,g(\omega) + b$

for all ω ∈ Ω. Since we have a probability measure, the integral is monotone with μ(Ω) = 1, so that

$\int_\Omega \varphi\circ g\, d\mu \geq \int_\Omega (ag + b)\, d\mu = a\int_\Omega g\, d\mu + b = ax_0 + b = \varphi(x_0) = \varphi\left(\int_\Omega g\, d\mu\right),$

as desired.

### Proof 3 (general inequality in a probabilistic setting)

Let *X* be an integrable random variable that takes values in a real topological vector space *T*. Since $\varphi: T \to \mathbb{R}$ is convex, for any $x, y \in T$ the quantity

$\frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta}$

is decreasing as θ approaches $0^+$. In particular, the *subdifferential* of *φ* evaluated at *x* in the direction *y* is well-defined by

$(D\varphi)(x)\cdot y:=\lim_{\theta \downarrow 0} \frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta}=\inf_{\theta > 0} \frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta}.$
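The subdifferential defined in this way is positively homogeneous and subadditive in *y*; it is linear in *y* when φ is Gâteaux differentiable at *x*, and the argument below uses this linearity (in general, one replaces it by a linear functional dominated by it, which the Hahn–Banach theorem provides under suitable conditions). Since the infimum taken in the right-hand side of the previous formula is no larger than the value of the same term for *θ* = 1, one gets

$\varphi(x)\leq \varphi(x+y)-(D\varphi)(x)\cdot y.$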
In particular, for an arbitrary sub-σ-algebra $\mathfrak{G}$ we can evaluate the last inequality when $x = \mathbb{E}\{X|\mathfrak{G}\}$ and $y = X - \mathbb{E}\{X|\mathfrak{G}\}$ to obtain

$\varphi(\mathbb{E}\{X|\mathfrak{G}\})\leq \varphi(X)-(D\varphi)(\mathbb{E}\{X|\mathfrak{G}\})\cdot (X-\mathbb{E}\{X|\mathfrak{G}\}).$
Now, if we take the conditional expectation with respect to $\mathfrak{G}$ on both sides of the previous expression, we get the result since:

$\mathbb{E}\{\left[(D\varphi)(\mathbb{E}\{X|\mathfrak{G}\})\cdot (X-\mathbb{E}\{X|\mathfrak{G}\})\right]|\mathfrak{G}\}=(D\varphi)(\mathbb{E}\{X|\mathfrak{G}\})\cdot \mathbb{E}\{ \left( X-\mathbb{E}\{X|\mathfrak{G}\} \right) |\mathfrak{G}\}=0,$

by the linearity of the subdifferential in the *y* variable, and the following well-known property of the conditional expectation:

$\mathbb{E}\{ \left(\mathbb{E}\{X|\mathfrak{G}\} \right) |\mathfrak{G}\}=\mathbb{E}\{ X |\mathfrak{G}\}.$
## Applications and special cases

### Form involving a probability density function

Suppose Ω is a measurable subset of the real line and *f*(*x*) is a non-negative function such that

$\int_{-\infty}^\infty f(x)\,dx = 1.$

In probabilistic language, *f* is a probability density function.
Then Jensen's inequality becomes the following statement about convex integrals:
If *g* is any real-valued measurable function such that $g f$ is integrable, and φ is convex over the range of *g*, then

$\varphi\left(\int_{-\infty}^\infty g(x)f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(g(x))\, f(x)\, dx.$
If *g*(*x*) = *x*, then this form of the inequality reduces to a commonly used special case:

$\varphi\left(\int_{-\infty}^\infty x\, f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(x)\,f(x)\, dx.$
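A numerical sketch of this special case, assuming the illustrative exponential density $f(x) = e^{-x}$ on $[0, \infty)$, truncated at $x = 40$ where the neglected tail is negligible, and $\varphi(v) = v^2$. With these choices the left side is the squared mean, 1, and the right side is the second moment, 2. The midpoint-rule helper `integrate` is hypothetical.

```python
import math


def integrate(h, lo, hi, n=20_000):
    """Composite midpoint-rule approximation of the integral of h over [lo, hi]."""
    width = (hi - lo) / n
    return sum(h(lo + (k + 0.5) * width) for k in range(n)) * width


def f(x):
    """Illustrative probability density: exponential with rate 1 on [0, inf)."""
    return math.exp(-x)


def phi(v):
    """A convex function."""
    return v * v


UPPER = 40.0  # truncation point for the improper integrals
mean = integrate(lambda x: x * f(x), 0.0, UPPER)
lhs = phi(mean)                                       # about 1
rhs = integrate(lambda x: phi(x) * f(x), 0.0, UPPER)  # about 2
print(lhs, rhs)
```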
### Alternative finite form

If $\Omega$ is some finite set $\{x_1, x_2, \ldots, x_n\}$, and if $\mu$ is the measure on $\Omega$ that assigns mass $\lambda_i$ to each point $x_i$ (a weighted counting measure), then the general form reduces to a statement about sums:

$\varphi\left(\sum_{i=1}^{n} g(x_i)\lambda_i \right) \le \sum_{i=1}^{n} \varphi(g(x_i))\lambda_i,$

provided that $\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1$ and $\lambda_i \ge 0$.
There is also an infinite discrete form.

### Statistical physics

Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:

$e^{\langle X \rangle} \leq \left\langle e^X \right\rangle,$

where angle brackets denote expected values with respect to some probability distribution of the random variable *X*.
The proof in this case is very simple (cf. Chandler, Sec. 5.5). The desired inequality follows directly, by writing

$\left\langle e^X \right\rangle = e^{\langle X \rangle} \left\langle e^{X - \langle X \rangle} \right\rangle$

and then applying the inequality

$e^X \geq 1+X$

to the final exponential, which gives $\left\langle e^{X - \langle X \rangle} \right\rangle \geq \left\langle 1 + X - \langle X \rangle \right\rangle = 1$.
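The factorization used in this proof is easy to check numerically. The sketch below, assuming a hypothetical three-point distribution for *X*, computes $e^{\langle X \rangle}$, $\langle e^X \rangle$ and $\langle e^{X - \langle X \rangle} \rangle$; the last factor is at least 1, as the bound $e^X \ge 1 + X$ predicts. The helper `avg` is illustrative.

```python
import math

# Hypothetical discrete distribution for X.
values = [-1.0, 0.5, 2.0]
probs = [0.3, 0.5, 0.2]


def avg(h):
    """Angle-bracket average <h(X)> with respect to the distribution above."""
    return sum(p * h(x) for x, p in zip(values, probs))


mean_x = avg(lambda x: x)                     # <X>
lhs = math.exp(mean_x)                        # e^{<X>}
rhs = avg(math.exp)                           # <e^X>
factor = avg(lambda x: math.exp(x - mean_x))  # <e^{X - <X>}>, at least 1
print(lhs, rhs, factor)
```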

### Information theory

If *p*(*x*) is the true probability distribution for *x*, and *q*(*x*) is another distribution, then applying Jensen's inequality for the random variable *Y*(*x*) = *q*(*x*)/*p*(*x*) and the function $\varphi(y) = -\log(y)$ gives

$\mathbb{E}\{\varphi(Y)\} \ge \varphi(\mathbb{E}\{Y\})$

$\Rightarrow \int p(x) \log \frac{p(x)}{q(x)} \, dx \ge - \log \int p(x) \frac{q(x)}{p(x)} \, dx$

$\Rightarrow \int p(x) \log \frac{p(x)}{q(x)} \, dx \ge 0$

$\Rightarrow - \int p(x) \log q(x) \, dx \ge - \int p(x) \log p(x) \, dx,$

a result called Gibbs' inequality. The second implication uses $\int p(x) \frac{q(x)}{p(x)}\, dx = \int q(x)\, dx = 1$.
It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities *p* rather than any other distribution *q*. The non-negative quantity $\int p(x) \log \frac{p(x)}{q(x)}\, dx$ is called the Kullback–Leibler divergence of *q* from *p*.
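For discrete distributions the integrals become sums. This sketch assumes $q_i > 0$ wherever $p_i > 0$ and uses the natural logarithm; the distributions and helper names are illustrative. It computes the Kullback–Leibler divergence and checks Gibbs' inequality in the form cross-entropy $\ge$ entropy, with the difference equal to the divergence.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence sum_i p_i * log(p_i / q_i) (natural log).

    Assumes q_i > 0 wherever p_i > 0; terms with p_i = 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p, q):
    """-sum_i p_i * log(q_i): ideal average code length (in nats) when coding with q."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def entropy(p):
    """-sum_i p_i * log(p_i)."""
    return cross_entropy(p, p)


p = [0.5, 0.25, 0.25]  # illustrative "true" distribution
q = [0.25, 0.25, 0.5]  # illustrative alternative distribution

kl = kl_divergence(p, q)
print(kl, cross_entropy(p, q), entropy(p))  # kl is 0.25 * log(2), about 0.173
```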

### Rao–Blackwell theorem

{{main|Rao–Blackwell theorem}}
If *L* is a convex function and *T*(*X*) is a statistic, then from the conditional form of Jensen's inequality we get

$L(\mathbb{E}\{\delta(X)\,|\,T(X)\}) \le \mathbb{E}\{L(\delta(X))\,|\,T(X)\} \quad \Rightarrow \quad \mathbb{E}\{L(\mathbb{E}\{\delta(X)\,|\,T(X)\})\} \le \mathbb{E}\{L(\delta(X))\}.$
So if δ(*X*) is some estimator of an unobserved parameter θ given a vector of observables *X*, and if *T*(*X*) is a sufficient statistic for θ, then an improved estimator, in the sense of having an expected loss *L* no larger than that of δ, can be obtained by calculating

$\delta_1(X) = \mathbb{E}_{\theta}\{\delta(X') \,|\, T(X') = T(X)\},$

the expected value of δ with respect to θ, taken over all possible vectors of observations *X*′ compatible with the same value of *T*(*X*) as that observed. Because *T* is sufficient, this conditional expectation does not depend on θ, so $\delta_1$ is itself an estimator.
This result is known as the Rao–Blackwell theorem.
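A toy illustration of the improvement, under assumptions chosen only for illustration: $n = 3$ independent Bernoulli(θ) observations with θ = 3/10, squared-error loss $L(d) = (d - \theta)^2$, the crude estimator $\delta(X) = X_1$, and the sufficient statistic $T(X) = X_1 + X_2 + X_3$. Everything is computed by exact enumeration. The enumeration uses θ, but it cancels in the conditional expectation, which comes out as $T/n$; its expected loss is $\theta(1-\theta)/3$ versus $\theta(1-\theta)$ for δ. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

N = 3                    # illustrative sample size
THETA = Fraction(3, 10)  # illustrative true parameter
SAMPLES = list(product([0, 1], repeat=N))


def prob(xs):
    """Probability of the Bernoulli(THETA) sample xs."""
    k = sum(xs)
    return THETA**k * (1 - THETA) ** (len(xs) - k)


def delta(xs):
    """Crude unbiased estimator: the first observation."""
    return xs[0]


def T(xs):
    """Sufficient statistic: the number of successes."""
    return sum(xs)


def delta_1(xs):
    """Rao-Blackwellized estimator E[delta(X') | T(X') = T(xs)], by enumeration."""
    same = [s for s in SAMPLES if T(s) == T(xs)]
    total = sum(prob(s) for s in same)
    return sum(prob(s) * delta(s) for s in same) / total


def risk(estimator):
    """Expected squared-error loss E[(estimator(X) - THETA)^2]."""
    return sum(prob(s) * (estimator(s) - THETA) ** 2 for s in SAMPLES)


print(risk(delta), risk(delta_1))  # 21/100 7/100
```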

## See also

- Karamata's inequality for a more general inequality
- Law of averages
- The operator Jensen inequality of Hansen and Pedersen.

{{refimprove|date=October 2011}}