Large deviations theory
In probability theory, the theory of large deviations concerns the asymptotic behaviour of remote tails of sequences of probability distributions. Some basic ideas of the theory can be traced back to Laplace and Cramér, although a clear unified formal definition was introduced in 1966 by Varadhan. Large deviations theory formalizes the heuristic ideas of concentration of measures and widely generalizes the notion of convergence of probability measures.

Roughly speaking, large deviations theory concerns itself with the exponential decay of the probability measures of certain kinds of extreme or tail events, as the number of observations grows arbitrarily large.

An elementary example

Consider a sequence of independent tosses of a fair coin. The possible outcomes could be heads or tails. Let us denote the possible outcome of the i-th trial by X_i, where we encode head as 1 and tail as 0. Now let M_N denote the mean value after N trials, namely

  M_N = (1/N) (X_1 + X_2 + ... + X_N).

Then M_N lies between 0 and 1. From the law of large numbers (and also from our experience) we know that as N grows, M_N converges to 0.5 almost surely.

Moreover, by the central limit theorem, we know that M_N is approximately normally distributed for large N. The central limit theorem can provide more detailed information about the behavior of M_N than the law of large numbers. For example, we can approximately find a tail probability of M_N, P(M_N > x), that M_N is greater than x, for a fixed value of x > 0.5. However, the approximation by the CLT may not be accurate if x is far from 0.5. Also, it does not provide information about the convergence of the tail probabilities as N → ∞. However, large deviations theory can provide answers for such problems.

Let us make this statement more precise. For a given value 0.5 < x < 1, let us compute the tail probability P(M_N > x). Define

  I(x) = x ln x + (1 - x) ln(1 - x) + ln 2.

(Note that the function I(x) is convex and increasing on [0.5, 1].) Then by Chernoff's inequality, it can be shown that P(M_N > x) < exp(-N I(x)). This bound is rather sharp, in the sense that I(x) cannot be replaced with a larger number which would yield a strict inequality for all positive N. (However, the exponential bound can still be reduced by a subexponential factor on the order of 1/sqrt(N).) Hence, we obtain the following result:

  P(M_N > x) ≈ exp(-N I(x)).

The probability P(M_N > x) decays exponentially as N grows to infinity, at a rate depending on x. This formula approximates any tail probability of the sample mean of i.i.d. variables and gives its convergence as the number of samples increases.
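This decay can be checked numerically. The following sketch (not part of the original article; it assumes the head = 1, tail = 0 encoding above) computes the exact tail probability P(M_N > x) from binomial coefficients and compares the empirical decay rate -ln P(M_N > x)/N with I(x):

```python
from math import comb, log

def rate(x):
    """Rate function I(x) for a fair coin (head = 1, tail = 0)."""
    return x * log(x) + (1 - x) * log(1 - x) + log(2)

def tail(N, x):
    """Exact P(M_N > x): strictly more than N*x heads in N fair tosses."""
    threshold = int(round(N * x))
    return sum(comb(N, k) for k in range(threshold + 1, N + 1)) / 2 ** N

x = 0.6
for N in (50, 200, 800):
    print(N, -log(tail(N, x)) / N)
# The printed decay rates approach I(0.6) ≈ 0.0201 from above as N grows;
# the remaining gap reflects the subexponential 1/sqrt(N)-type correction.
print(rate(x))
```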

Large deviations for sums of independent random variables

In the above example of coin-tossing, we explicitly assumed that each toss is an
independent trial. And for each toss, the probability of getting head or tail is always the
same. This makes the random numbers X_1, X_2, ..., X_N independent and identically distributed (i.i.d.). For i.i.d. variables whose common distribution satisfies a certain growth condition, large deviation theory states that the following limit exists:

  lim_{N→∞} (1/N) ln P(M_N > x) = -I(x).

The function I(x) is called the "rate function" or "Cramér function" or sometimes the "entropy function". Roughly speaking, the existence of this limit is what establishes the above mentioned exponential decay and allows us to conclude that for large N, P(M_N > x) takes the approximate form

  P(M_N > x) ≈ exp(-N I(x)),

which is the basic result of large deviations theory in this setting. Note that the inequality given above, as opposed to the asymptotic formula presented here, need not remain valid in more general settings.

In the i.i.d. setting, if we know the probability distribution of X, an explicit expression for the rate function can be obtained. This is given by a Legendre–Fenchel transformation,

  I(x) = sup_θ [θx - λ(θ)],

where the function λ(θ) is called the cumulant generating function (CGF), given by

  λ(θ) = ln E[exp(θX)].

Here E denotes the expectation value with respect to the probability distribution of X, and X is any one of the X_i. If X follows a normal distribution, the rate function becomes a parabola with its apex at the mean of the normal distribution.
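As an illustration (a brute-force sketch, not from the article), the Legendre–Fenchel transform can be evaluated over a grid of θ values for a normal distribution, whose CGF is λ(θ) = μθ + σ²θ²/2; the result should match the parabola (x - μ)²/(2σ²) with apex at μ:

```python
mu, sigma = 1.0, 2.0  # assumed example parameters

def cgf(theta):
    """CGF of a normal distribution: ln E[exp(theta*X)]."""
    return mu * theta + 0.5 * (sigma * theta) ** 2

def rate(x):
    """Legendre-Fenchel transform sup_theta [theta*x - cgf(theta)],
    maximised over a grid on [-5, 5], which is wide enough here since
    the maximiser is theta* = (x - mu) / sigma**2."""
    thetas = (t / 1000.0 for t in range(-5000, 5001))
    return max(t * x - cgf(t) for t in thetas)

for x in (0.0, 1.0, 3.0):
    print(x, rate(x), (x - mu) ** 2 / (2 * sigma ** 2))
```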

If the condition of independent identical distribution is relaxed, particularly if the numbers X_i are not independent but nevertheless satisfy the Markov property, the basic large deviations result stated above can be generalized.

Formal definition

Given a Polish space X, let (P_N) be a sequence of Borel probability measures on X, let (a_N) be a sequence of positive real numbers such that a_N → ∞, and finally let I be a lower semicontinuous functional on X. The sequence (P_N) is said to satisfy a large deviation principle with speed (a_N) and rate I if, and only if, for each Borel measurable set E ⊆ X,

  -inf_{x ∈ E°} I(x) ≤ liminf_{N→∞} (1/a_N) ln P_N(E) ≤ limsup_{N→∞} (1/a_N) ln P_N(E) ≤ -inf_{x ∈ Ē} I(x),

where Ē and E° denote respectively the closure and interior of E.

Brief history

The first rigorous results concerning large deviations are due to the Swedish mathematician Harald Cramér, who applied them to model the insurance business. From the point of view of an insurance company, the earnings come in at a constant rate per month (the monthly premium), but the claims come randomly. For the company to be successful over a certain period of time (preferably many months), the total earnings should exceed the total claims. Thus to estimate the premium, you have to ask the following question: "What should we choose as the premium q such that over N months the total claim ξ_1 + ... + ξ_N should be less than Nq?" This is clearly the same question asked by large deviations theory. Cramér gave a solution to this question for i.i.d. random variables, where the rate function is expressed as a power series.
The results we have quoted above were later obtained by Chernoff, among others. A very incomplete list of mathematicians who have made important advances would include S. R. S. Varadhan (who has won the Abel Prize), D. Ruelle and O. E. Lanford.

Applications

Principles of large deviations may be effectively applied to gather information out of a probabilistic model. Thus, the theory of large deviations finds its applications in information theory and risk management. In physics, the best-known applications of large deviations theory arise in thermodynamics and statistical mechanics (in connection with relating entropy with the rate function).

Large deviations and entropy

The rate function is related to the entropy in statistical mechanics. This can be seen heuristically in the following way. In statistical mechanics the entropy of a particular macro-state is related to the number of micro-states which correspond to this macro-state. In our coin-tossing example the mean value M_N could designate a particular macro-state, and the particular sequence of heads and tails which gives rise to a particular value of M_N constitutes a particular micro-state. Loosely speaking, a macro-state having a higher number of micro-states giving rise to it has higher entropy, and a state with higher entropy has a higher chance of being realised in actual experiments. The macro-state with mean value one half (as many heads as tails) has the highest number of micro-states giving rise to it, and it is indeed the state with the highest entropy. In most practical situations we shall indeed obtain this macro-state for a large number of trials. The "rate function", on the other hand, measures the probability of appearance of a particular macro-state: the smaller the rate function, the higher the chance of a macro-state appearing. In our coin-tossing example the value of the "rate function" for mean value equal to one half is zero. In this way one can see the "rate function" as the negative of the "entropy".
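This heuristic can be made concrete with a short check (an illustration under the fair-coin setting above, not part of the original text): the number of micro-states with mean x is the binomial coefficient C(N, Nx), and its per-trial logarithm, the entropy, should approach ln 2 - I(x) as N grows:

```python
from math import comb, log

N, x = 2000, 0.3
k = int(round(N * x))

entropy_per_trial = log(comb(N, k)) / N             # micro-state count: ln C(N, k) / N
rate = x * log(x) + (1 - x) * log(1 - x) + log(2)   # I(x) for a fair coin

# entropy + rate should approach ln(2) ≈ 0.6931 as N grows, i.e. the
# rate function is the negative entropy shifted by the constant ln(2).
print(entropy_per_trial + rate)
```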

See also

  • Chernoff's inequality
  • Contraction principle (large deviations theory), a result on how large deviations principles "push forward" from one space to another via a continuous function
  • Freidlin–Wentzell theorem, a large deviations principle for Itō diffusions
  • Laplace principle, a large deviations principle in Rd
  • Schilder's theorem, a large deviations principle for Brownian motion
  • Varadhan's lemma
  • Extreme value theory
  • Large deviations of Gaussian random functions
