Chebyshev's inequality

# Chebyshev's inequality

Discussion

Encyclopedia
In probability theory
Probability theory
Probability theory is the branch of mathematics concerned with analysis of random phenomena. The central objects of probability theory are random variables, stochastic processes, and events: mathematical abstractions of non-deterministic events or measured quantities that may either be single...

, Chebyshev’s inequality (also spelled as Tchebysheff’s inequality) guarantees that in any data sample
Sample (statistics)
In statistics, a sample is a subset of a population. Typically, the population is very large, making a census or a complete enumeration of all the values in the population impractical or impossible. The sample represents a subset of manageable size...

or probability distribution
Probability distribution
In probability theory, a probability mass, probability density, or probability distribution is a function that describes the probability of a random variable taking certain values....

,"nearly all" values are close to the mean
Expected value
In probability theory, the expected value of a random variable is the weighted average of all possible values that this random variable can take on...

— the precise statement being that no more than 1/k2 of the distribution’s values can be more than k standard deviations away from the mean. The inequality has great utility because it can be applied to completely arbitrary distributions (unknown except for mean and variance), for example it can be used to prove the weak law of large numbers.

The theorem is named after Russian mathematician Pafnuty Chebyshev
Pafnuty Chebyshev
Pafnuty Lvovich Chebyshev was a Russian mathematician. His name can be alternatively transliterated as Chebychev, Chebysheff, Chebyshov, Tschebyshev, Tchebycheff, or Tschebyscheff .-Early years:One of nine children, Chebyshev was born in the village of Okatovo in the district of Borovsk,...

, although it was first formulated by his friend and colleague Irénée-Jules Bienaymé
Irénée-Jules Bienaymé
Irénée-Jules Bienaymé , was a French statistician. He built on the legacy of Laplace generalizing his least squares method. He contributed to the fields and probability, and statistics and to their application to finance, demography and social sciences...

. It can be stated quite generally using measure theory; the statement in the language of probability theory
Probability theory
Probability theory is the branch of mathematics concerned with analysis of random phenomena. The central objects of probability theory are random variables, stochastic processes, and events: mathematical abstractions of non-deterministic events or measured quantities that may either be single...

then follows as a particular case, for a space of measure 1.

The term Chebyshev’s inequality may also refer to the Markov's inequality
Markov's inequality
In probability theory, Markov's inequality gives an upper bound for the probability that a non-negative function of a random variable is greater than or equal to some positive constant...

, especially in the context of analysis.

### Measure-theoretic statement

Let (X, Σ, μ) be a measure space, and let f be an extended real
Extended real number line
In mathematics, the affinely extended real number system is obtained from the real number system R by adding two elements: +∞ and −∞ . The projective extended real number system adds a single object, ∞ and makes no distinction between "positive" or "negative" infinity...

-valued measurable function
Measurable function
In mathematics, particularly in measure theory, measurable functions are structure-preserving functions between measurable spaces; as such, they form a natural context for the theory of integration...

defined on X. Then for any real number t > 0,

More generally, if g is an extended real-valued measurable function, nonnegative and nondecreasing on the range of f, then

The previous statement then follows by defining g(t) as

and taking |f| instead of f.

### Probabilistic statement

Let X be a random variable
Random variable
In probability and statistics, a random variable or stochastic variable is, roughly speaking, a variable whose value results from a measurement on some type of random process. Formally, it is a function from a probability space, typically to the real numbers, which is measurable functionmeasurable...

with finite expected value
Expected value
In probability theory, the expected value of a random variable is the weighted average of all possible values that this random variable can take on...

μ and non-zero variance
Variance
In probability theory and statistics, the variance is a measure of how far a set of numbers is spread out. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean . In particular, the variance is one of the moments of a distribution...

σ2. Then for any real number
Real number
In mathematics, a real number is a value that represents a quantity along a continuum, such as -5 , 4/3 , 8.6 , √2 and π...

,

Only the case provides useful information (when the right-hand side is greater than one, so the inequality becomes vacuous, as the probability of any event cannot be greater than one). As an example, using shows that at least half of the values lie in the interval .

Because it can be applied to completely arbitrary distributions (unknown except for mean and variance), the inequality generally gives a poor bound compared to what might be possible if something is known about the distribution involved.

For example, suppose we randomly select a journal article from a source with an average of 1000 words per article, with a standard deviation of 200 words. We can then infer that the probability that it has between 600 and 1400 words (i.e. within k = 2 SDs of the mean) must be more than 75%, because there is less than chance to be outside that range, by Chebyshev’s inequality. But if we additionally know that the distribution is normal, we can say that is a 75% chance the word count is between 770 and 1230 (which is an even tighter bound).

As demonstrated in the example above, the theorem will typically provide rather loose bounds. However, the bounds provided by Chebyshev’s inequality cannot, in general (remaining sound for variables of arbitrary distribution), be improved upon. For example, for any k ≥ 1, the following example meets the bounds exactly.

For this distribution mean μ = 0, standard deviation σ =

Equality holds exactly for any distribution that is a linear transformation of this one. Inequality holds for any distribution that is not a linear transformation of this one.

## Variant: One-sided Chebyshev inequality

A one-tailed variant with k > 0, is

The one-sided version of the Chebyshev inequality is called Cantelli's inequality, and is due to Francesco Paolo Cantelli
Francesco Paolo Cantelli
Francesco Paolo Cantelli was an Italian mathematician. He was the founder of the Istituto Italiano degli Attuari for the applications of mathematics and probability to economics....

.

### An application: distance between the mean and the median

The one-sided variant can be used to prove the proposition that for probability distribution
Probability distribution
In probability theory, a probability mass, probability density, or probability distribution is a function that describes the probability of a random variable taking certain values....

s having an expected value
Expected value
In probability theory, the expected value of a random variable is the weighted average of all possible values that this random variable can take on...

and a median
Median
In probability theory and statistics, a median is described as the numerical value separating the higher half of a sample, a population, or a probability distribution, from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to...

, the mean (i.e., the expected value) and the median can never differ from each other by more than one standard deviation
Standard deviation
Standard deviation is a widely used measure of variability or diversity used in statistics and probability theory. It shows how much variation or "dispersion" there is from the average...

. To express this in mathematical notation, let μ, m, and σ be respectively the mean, the median, and the standard deviation. Then

(There is no need to rely on an assumption that the variance exists, i.e., is finite. Unlike the situation with the expected value, saying the variance exists is equivalent to saying the variance is finite. But this inequality is trivially true if the variance is infinite.)

#### Proof using Chebyshev's inequality

Setting k = 1 in the statement for the one-sided inequality gives:

By changing the sign of X and so of μ, we get

Thus the median is within one standard deviation of the mean.

#### Proof using Jensen's inequality

This proof uses Jensen's inequality
Jensen's inequality
In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906. Given its generality, the inequality appears in many forms depending on the context,...

twice. We have

The first inequality comes from (the convex version of) Jensen's inequality applied to the absolute value function, which is convex. The second comes from the fact that the median minimizes the absolute deviation
Absolute deviation
In statistics, the absolute deviation of an element of a data set is the absolute difference between that element and a given point. Typically the point from which the deviation is measured is a measure of central tendency, most often the median or sometimes the mean of the data set.D_i = |x_i-m|...

function

The third inequality comes from (the concave version of) Jensen's inequality applied to the square root function, which is concave. Q.E.D.
Q.E.D.
Q.E.D. is an initialism of the Latin phrase , which translates as "which was to be demonstrated". The phrase is traditionally placed in its abbreviated form at the end of a mathematical proof or philosophical argument when what was specified in the enunciation — and in the setting-out —...

### Measure-theoretic proof

Fix t and let At be defined as At  := {x ∈ X | ƒ(x) ≥ t}, and let 1At be the indicator function of the set At. Then, it is easy to check that, for any x

since g is nondecreasing on the range of f, and therefore,

The desired inequality follows from dividing the above inequality by g(t).

### Probabilistic proof

Markov's inequality
Markov's inequality
In probability theory, Markov's inequality gives an upper bound for the probability that a non-negative function of a random variable is greater than or equal to some positive constant...

states that for any real-valued random variable Y and any positive number a, we have Pr(|Y| > a) ≤ E(|Y|)/a. One way to prove Chebyshev's inequality is to apply Markov's inequality to the random variable Y = (X − μ)2 with a = (σk)2.

It can also be proved directly. For any event A, let IA be the indicator random variable of A, i.e. IA equals 1 if A occurs and 0 otherwise. Then

The direct proof shows why the bounds are quite loose in typical cases: the number 1 to the left of "≥" is replaced by [(X − μ)/(kσ)]2 to the right of "≥" whenever the latter exceeds 1. In some cases it exceeds 1 by a very wide margin.

## Chebyshev's Inequality (More General)

A more general version of Chebyshev's Inequality states:
This version can be proved from Markov's inequality
Markov's inequality
In probability theory, Markov's inequality gives an upper bound for the probability that a non-negative function of a random variable is greater than or equal to some positive constant...

. Also, this version can be used to derive the more specific statement above.

• Markov's inequality
Markov's inequality
In probability theory, Markov's inequality gives an upper bound for the probability that a non-negative function of a random variable is greater than or equal to some positive constant...

• A stronger result applicable to unimodal probability distributions is the Vysochanskiï–Petunin inequality.
• Proof of the weak law of large numbers where Chebyshev's inequality is used.
• Table of mathematical symbols
Table of mathematical symbols
This is a listing of common symbols found within all branches of mathematics. Each symbol is listed in both HTML, which depends on appropriate fonts being installed, and in , as an image.-Symbols:-Variations:...

• Multidimensional Chebyshev's inequality
Multidimensional Chebyshev's inequality
In probability theory, the multidimensional Chebyshev's inequality is a generalization of Chebyshev's inequality, which puts a bound on the probability of the event that a random variable differs from its expected value by more than a specified amount....