In probability theory, the normal (or Gaussian) distribution is a continuous probability distribution that has a bell-shaped probability density function, known as the Gaussian function or informally the bell curve:

    f(x; μ, σ^{2}) = (1/(σ√(2π))) e^{−(x − μ)^{2}/(2σ^{2})}

(The designation "bell curve" is ambiguous: there are many other distributions which are "bell"-shaped, such as the Cauchy distribution, Student's t-distribution, the generalized normal distribution, and the logistic distribution.)
where the parameter μ is the mean or expectation (location of the peak) and σ^{2} is the variance, the mean of the squared deviation (a "measure" of the width of the distribution); σ is the standard deviation. The distribution with μ = 0 and σ^{2} = 1 is called the standard normal. A normal distribution is often used as a first approximation to describe real-valued random variables that cluster around a single mean value.
The normal distribution is considered the most prominent probability distribution in statistics. There are several reasons for this. First, the normal distribution is very tractable analytically: a large number of results involving this distribution can be derived in explicit form. Second, the normal distribution arises as the outcome of the central limit theorem, which states that under mild conditions the sum of a large number of random variables is distributed approximately normally. Finally, the "bell" shape of the normal distribution makes it a convenient choice for modelling a large variety of random variables encountered in practice.
For this reason, the normal distribution is commonly encountered in practice, and is used throughout statistics, the natural sciences, and the social sciences as a simple model for complex phenomena. For example, the observational error in an experiment is usually assumed to follow a normal distribution, and the propagation of uncertainty is computed using this assumption. Note that a normally distributed variable has a symmetric distribution about its mean. Quantities that grow exponentially, such as prices, incomes or populations, are often skewed to the right, and hence may be better described by other distributions, such as the log-normal distribution or the Pareto distribution. In addition, the probability of seeing a normally distributed value that is far from the mean (i.e. more than a few standard deviations away) drops off extremely rapidly. As a result, statistical inference using a normal distribution is not robust to the presence of outliers (data that are unexpectedly far from the mean, due to exceptional circumstances, observational error, etc.). When outliers are expected, data may be better described using a heavy-tailed distribution such as Student's t-distribution.
From a technical perspective, alternative characterizations are possible, for example:
 The normal distribution is the only absolutely continuous distribution all of whose cumulants beyond the first two (i.e. other than the mean and variance) are zero.
 For a given mean and variance, the corresponding normal distribution is the continuous distribution with the maximum entropy.
Definition
The simplest case of a normal distribution is known as the standard normal distribution, described by the probability density function

    ϕ(x) = (1/√(2π)) e^{−x^{2}/2}

The factor 1/√(2π) in this expression ensures that the total area under the curve ϕ(x) is equal to one, and the 1/2 in the exponent makes the "width" of the curve (measured as half the distance between the inflection points) also equal to one. It is traditional in statistics to denote this function with the Greek letter ϕ (phi), whereas density functions for all other distributions are usually denoted with the letters f or p. The alternative glyph φ is also used quite often; however, within this article "φ" is reserved to denote characteristic functions.
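As a quick numerical sanity check of the formula above, the following sketch (illustrative only, using the standard library) evaluates ϕ(x) and verifies that the area under the curve is one:

```python
import math

def phi(x):
    """Standard normal density: exp(-x^2/2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Height of the peak at x = 0 is 1/sqrt(2*pi), about 0.3989.
peak = phi(0)

# Riemann sum over [-10, 10]; the mass in the tails beyond is negligible.
step = 0.001
area = step * sum(phi(-10 + i * step) for i in range(20001))

print(peak)
print(area)
```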
More generally, a normal distribution results from exponentiating a quadratic function (just as an exponential distribution results from exponentiating a linear function):

    f(x) = e^{ax^{2} + bx + c}

This yields the classic "bell curve" shape, provided that a < 0, so that the quadratic function is concave everywhere. One can adjust a to control the "width" of the bell, then adjust b to move the central peak of the bell along the x-axis, and finally adjust c to control the "height" of the bell. For f(x) to be a true probability density function over R, one must choose c such that the total area under the curve is equal to one (which is only possible when a < 0).
Rather than using a, b, and c, it is far more common to describe a normal distribution by its mean and variance. Changing to these new parameters allows one to rewrite the probability density function in a convenient standard form,

    f(x; μ, σ^{2}) = (1/(σ√(2π))) e^{−(x − μ)^{2}/(2σ^{2})} = (1/σ) ϕ((x − μ)/σ)

For a standard normal distribution, μ = 0 and σ = 1. The last part of the equation above shows that any other normal distribution can be regarded as a version of the standard normal distribution that has been stretched horizontally by a factor σ and then translated rightward by a distance μ. Thus, μ specifies the position of the bell curve's central peak, and σ specifies the "width" of the bell curve.
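The stretch-and-shift relation can be confirmed numerically; this is a sketch with arbitrary example values of μ and σ:

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), written directly from the general formula."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

# The general density equals the standard one, stretched by sigma and shifted by mu.
x, mu, sigma = 2.5, 1.0, 3.0
direct = normal_pdf(x, mu, sigma)
scaled = phi((x - mu) / sigma) / sigma
print(direct, scaled)
```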
The parameter μ is at the same time the mean, the median and the mode of the normal distribution. The parameter σ^{2} is called the variance; as for any random variable, it describes how concentrated the distribution is around its mean. The square root of σ^{2} is called the standard deviation and is the width of the density function.
The normal distribution is usually denoted by N(μ, σ^{2}). Commonly the letter N is written in calligraphic font (typed as \mathcal{N} in LaTeX). Thus when a random variable X is distributed normally with mean μ and variance σ^{2}, we write

    X ~ N(μ, σ^{2})
Alternative formulations
Some authors advocate using the precision τ instead of the variance, and variously define it as τ = 1/σ^{2} or τ = 1/σ. This parametrization has an advantage in numerical applications where σ^{2} is very close to zero, and is more convenient to work with in analysis, as τ is a natural parameter of the normal distribution. Another advantage of using this parametrization is in the study of conditional distributions in the multivariate normal case.
The question of which normal distribution should be called the "standard" one is also answered differently by various authors. Starting from the works of Gauss, the standard normal was considered to be the one with variance σ^{2} = 1/2:

    f(x) = (1/√π) e^{−x^{2}}

Stigler goes even further and insists that the standard normal be the one with variance σ^{2} = 1/(2π):

    f(x) = e^{−πx^{2}}

According to the author, this formulation is advantageous because of a much simpler and easier-to-remember formula, the fact that the pdf has unit height at zero, and simple approximate formulas for the quantiles of the distribution. In Stigler's formulation the density of a normal with mean μ and precision τ will be equal to

    f(x; μ, τ) = τ e^{−πτ^{2}(x − μ)^{2}}
Characterization
In the previous section the normal distribution was defined by specifying its probability density function. However, there are other ways to characterize a probability distribution. They include: the cumulative distribution function, the moments, the cumulants, the characteristic function, the moment-generating function, etc.
Probability density function
The probability density function (pdf) of a random variable describes the relative frequencies of different values for that random variable. The pdf of the normal distribution is given by the formula explained in detail in the previous section:

    f(x; μ, σ^{2}) = (1/(σ√(2π))) e^{−(x − μ)^{2}/(2σ^{2})}

This is a proper function only when the variance σ^{2} is not equal to zero. In that case it is a continuous smooth function, defined on the entire real line, which is called the "Gaussian function".
Properties:
 Function f(x) is unimodal and symmetric around the point x = μ, which is at the same time the mode, the median and the mean of the distribution.
 The inflection points of the curve occur one standard deviation away from the mean (i.e., at x = μ − σ and x = μ + σ).
 Function f(x) is log-concave.
 The standard normal density ϕ(x) is an eigenfunction of the Fourier transform.
 The function is supersmooth of order 2, implying that it is infinitely differentiable.
 The first derivative of ϕ(x) is ϕ′(x) = −x ϕ(x); the second derivative is ϕ″(x) = (x^{2} − 1) ϕ(x). More generally, the nth derivative is given by ϕ^{(n)}(x) = (−1)^{n} H_{n}(x) ϕ(x), where H_{n} is the Hermite polynomial of order n.
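The derivative identity ϕ′(x) = −x ϕ(x) is easy to check against a finite-difference approximation; a small sketch (the test point x = 0.7 is arbitrary):

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Central finite difference approximates phi'(x); compare with -x * phi(x).
x, h = 0.7, 1e-6
numeric = (phi(x + h) - phi(x - h)) / (2 * h)
analytic = -x * phi(x)
print(numeric, analytic)
```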
When σ^{2} = 0, the density function does not exist as an ordinary function. However, there is a generalized function that defines a measure on the real line, and it can be used to calculate, for example, expected values:

    f(x; μ, 0) = δ(x − μ)

where δ(x) is the Dirac delta function, which is equal to infinity at x = 0 and is zero elsewhere.
Cumulative distribution function
The cumulative distribution function (CDF) describes the probability of a random variable falling in the interval (−∞, x].
The CDF of the standard normal distribution is denoted with the capital Greek letter Φ (phi), and can be computed as an integral of the probability density function:

    Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t^{2}/2} dt = ½ [1 + erf(x/√2)]

This integral cannot be expressed in terms of elementary functions, so it is written in terms of the error function, or erf, a special function. Numerical methods for calculation of the standard normal CDF are discussed below. For a generic normal random variable with mean μ and variance σ^{2} > 0, the CDF will be equal to

    F(x; μ, σ^{2}) = Φ((x − μ)/σ) = ½ [1 + erf((x − μ)/(σ√2))]

The complement of the standard normal CDF, Q(x) = 1 − Φ(x), is referred to as the Q-function, especially in engineering texts. This represents the tail probability of the Gaussian distribution, that is, the probability that a standard normal random variable X is greater than the number x. Other definitions of the Q-function, all of which are simple transformations of Φ, are also used occasionally.
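Since Python's math module exposes erf and erfc, the CDF and Q-function can be written directly from the formulas above; a sketch (the inputs 1.96 and (7.0, 5.0, 2.0) are arbitrary examples):

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def Q(x):
    """Tail probability Q(x) = 1 - Phi(x); erfc avoids cancellation for large x."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), reduced to the standard normal CDF."""
    return Phi((x - mu) / sigma)

print(Phi(0))     # 0.5 by symmetry
print(Q(1.96))    # roughly 0.025
print(normal_cdf(7.0, 5.0, 2.0))
```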
Properties:
 The standard normal CDF is 2-fold rotationally symmetric around the point (0, ½): Φ(−x) = 1 − Φ(x).
 The derivative of Φ(x) is equal to the standard normal pdf ϕ(x): Φ′(x) = ϕ(x).
 The antiderivative of Φ(x) is ∫ Φ(x) dx = x Φ(x) + ϕ(x) + C.
For a normal distribution with zero variance, the CDF is the Heaviside step function (with the convention H(0) = 1, which keeps the CDF right-continuous):

    F(x; μ, 0) = H(x − μ)
Quantile function
The inverse of the standard normal CDF, called the quantile function or probit function, is expressed in terms of the inverse error function:

    Φ^{−1}(p) = √2 erf^{−1}(2p − 1),   p ∈ (0, 1)

Quantiles of the standard normal distribution are commonly denoted as z_{p}. The quantile z_{p} represents such a value that a standard normal random variable X has the probability of exactly p to fall inside the (−∞, z_{p}] interval. The quantiles are used in hypothesis testing, construction of confidence intervals and Q-Q plots. The most "famous" normal quantile is z_{0.975} = 1.96. A standard normal random variable is greater than 1.96 in absolute value in 5% of cases.
For a normal random variable with mean μ and variance σ^{2}, the quantile function is

    F^{−1}(p; μ, σ^{2}) = μ + σ Φ^{−1}(p) = μ + σ√2 erf^{−1}(2p − 1)
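Python's statistics.NormalDist exposes the quantile function as inv_cdf; a sketch reproducing the famous 1.96 and the shift-and-scale formula (the values μ = 10, σ = 2 are arbitrary examples):

```python
from statistics import NormalDist

std = NormalDist(0, 1)

# The "famous" quantile: z_{0.975} is about 1.96.
z = std.inv_cdf(0.975)
print(z)

# For N(mu, sigma^2) the quantile is mu + sigma * Phi^{-1}(p).
mu, sigma, p = 10.0, 2.0, 0.975
print(NormalDist(mu, sigma).inv_cdf(p))
print(mu + sigma * std.inv_cdf(p))   # same value
```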
Characteristic function and moment generating function
The characteristic function φ_{X}(t) of a random variable X is defined as the expected value of e^{itX}, where i is the imaginary unit and t ∈ R is the argument of the characteristic function. Thus the characteristic function is the Fourier transform of the density f(x). For a normally distributed X with mean μ and variance σ^{2}, the characteristic function is

    φ_{X}(t) = e^{iμt − σ^{2}t^{2}/2}

The characteristic function can be analytically extended to the entire complex plane: one defines φ(z) = e^{iμz − σ^{2}z^{2}/2} for all z ∈ C.
The moment generating function is defined as the expected value of e^{tX}. For a normal distribution, the moment generating function exists and is equal to

    M(t) = E[e^{tX}] = e^{μt + σ^{2}t^{2}/2}

The cumulant generating function is the logarithm of the moment generating function:

    g(t) = ln M(t) = μt + ½ σ^{2}t^{2}

Since this is a quadratic polynomial in t, only the first two cumulants are nonzero.
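The closed form of the moment generating function can be checked by simulation; a Monte Carlo sketch with arbitrary parameter values, so the match is only approximate:

```python
import math
import random

random.seed(1)
mu, sigma, t = 0.5, 1.2, 0.3

# Empirical E[e^{tX}] from normal samples vs. exp(mu*t + sigma^2 * t^2 / 2).
samples = (random.gauss(mu, sigma) for _ in range(200000))
empirical = sum(math.exp(t * x) for x in samples) / 200000
theoretical = math.exp(mu * t + sigma ** 2 * t ** 2 / 2)
print(empirical, theoretical)
```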
Moments
The normal distribution has moments of all orders. That is, for a normally distributed X with mean μ and variance σ^{2}, the expectation E[|X|^{p}] exists and is finite for all p such that p > −1. Usually we are interested only in moments of integer orders: p = 1, 2, 3, ….
 Central moments are the moments of X around its mean μ. Thus, a central moment of order p is the expected value of (X − μ)^{p}. Using standardization of normal random variables, this expectation will be equal to σ^{p} E[Z^{p}], where Z is standard normal:

    E[(X − μ)^{p}] = 0 if p is odd;  E[(X − μ)^{p}] = σ^{p} (p − 1)!! if p is even

Here n!! denotes the double factorial, that is, the product of every odd number from n down to 1.
 Central absolute moments are the moments of |X − μ|. They coincide with regular moments for all even orders, but are nonzero for all odd p:

    E[|X − μ|^{p}] = σ^{p} (p − 1)!! √(2/π) if p is odd;  E[|X − μ|^{p}] = σ^{p} (p − 1)!! if p is even
                   = σ^{p} 2^{p/2} Γ((p + 1)/2) / √π

The last formula is true for any non-integer p > −1.
 Raw moments and raw absolute moments are the moments of X and |X| respectively. The formulas for these moments are much more complicated, and are given in terms of the confluent hypergeometric functions _{1}F_{1} and U; for example,

    E[|X|^{p}] = σ^{p} 2^{p/2} Γ((p + 1)/2)/√π · _{1}F_{1}(−p/2; ½; −μ^{2}/(2σ^{2}))

These expressions remain valid even if p is not an integer. See also generalized Hermite polynomials.
 The first two cumulants are equal to μ and σ^{2} respectively, whereas all higher-order cumulants are equal to zero.
Order | Raw moment | Central moment | Cumulant
1 | μ | 0 | μ
2 | μ^{2} + σ^{2} | σ^{2} | σ^{2}
3 | μ^{3} + 3μσ^{2} | 0 | 0
4 | μ^{4} + 6μ^{2}σ^{2} + 3σ^{4} | 3σ^{4} | 0
5 | μ^{5} + 10μ^{3}σ^{2} + 15μσ^{4} | 0 | 0
6 | μ^{6} + 15μ^{4}σ^{2} + 45μ^{2}σ^{4} + 15σ^{6} | 15σ^{6} | 0
7 | μ^{7} + 21μ^{5}σ^{2} + 105μ^{3}σ^{4} + 105μσ^{6} | 0 | 0
8 | μ^{8} + 28μ^{6}σ^{2} + 210μ^{4}σ^{4} + 420μ^{2}σ^{6} + 105σ^{8} | 105σ^{8} | 0
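The even central moments in the table follow the double-factorial pattern σ^{p}(p − 1)!!; a small sketch reproduces the σ = 1 column:

```python
def double_factorial(n):
    """n!! = n * (n - 2) * ... down to 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def central_moment(p, sigma):
    """Central moment of order p for a normal distribution:
    zero for odd p, sigma^p * (p - 1)!! for even p."""
    if p % 2 == 1:
        return 0.0
    return sigma ** p * double_factorial(p - 1)

# With sigma = 1 this reproduces 1, 3, 15, 105 for orders 2, 4, 6, 8.
for p in range(1, 9):
    print(p, central_moment(p, 1.0))
```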
Standardizing normal random variables
As a consequence of property 1, it is possible to relate all normal random variables to the standard normal. For example, if X is normal with mean μ and variance σ^{2}, then

    Z = (X − μ)/σ

has mean zero and unit variance; that is, Z has the standard normal distribution. Conversely, having a standard normal random variable Z, we can always construct another normal random variable with specific mean μ and variance σ^{2}:

    X = μ + σZ

This "standardizing" transformation is convenient, as it allows one to compute the PDF and especially the CDF of a normal distribution having the table of PDF and CDF values for the standard normal. They will be related via

    F_{X}(x) = Φ((x − μ)/σ),   f_{X}(x) = (1/σ) ϕ((x − μ)/σ)
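The relation F_{X}(x) = Φ((x − μ)/σ) is easy to confirm with the standard library; a sketch with arbitrary example parameters:

```python
from statistics import NormalDist

mu, sigma = 5.0, 2.0
X = NormalDist(mu, sigma)
Z = NormalDist(0, 1)

x = 7.3
direct = X.cdf(x)                    # CDF of N(mu, sigma^2) at x
via_z = Z.cdf((x - mu) / sigma)      # same probability via standardization
print(direct, via_z)

# pdf relation: f_X(x) = phi((x - mu)/sigma) / sigma
print(X.pdf(x), Z.pdf((x - mu) / sigma) / sigma)
```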
Standard deviation and confidence intervals
About 68% of values drawn from a normal distribution are within one standard deviation σ of the mean; about 95% of the values lie within two standard deviations; and about 99.7% are within three standard deviations. This fact is known as the 68-95-99.7 rule, the empirical rule, or the 3-sigma rule.
To be more precise, the area under the bell curve between μ − nσ and μ + nσ is given by

F(μ + nσ) − F(μ − nσ) = Φ(n) − Φ(−n) = erf(n/√2)

where erf is the error function. To 12 decimal places, the values for the 1-, 2-, up to 6-sigma points are:
n | area within μ ± nσ | i.e. 1 minus ... | or 1 in ...
1 | 0.682689492137 | 0.317310507863 | 3.15
2 | 0.954499736104 | 0.045500263896 | 21.98
3 | 0.997300203937 | 0.002699796063 | 370.4
4 | 0.999936657516 | 0.000063342484 | 15,787
5 | 0.999999426697 | 0.000000573303 | 1,744,278
6 | 0.999999998027 | 0.000000001973 | 506,797,346
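The coverage probabilities in the table above follow directly from the erf formula; a minimal sketch using the Python standard library:

```python
import math

def coverage(n: float) -> float:
    """P(|X - mu| < n*sigma) for any normal distribution, via the error function."""
    return math.erf(n / math.sqrt(2))
```

For example, `coverage(1)` returns approximately 0.6827 and `coverage(3)` approximately 0.9973.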
The next table gives the reverse relation: the sigma multiples n corresponding to a few often-used values for the area under the bell curve. These values are useful for determining (asymptotic) confidence intervals of the specified levels based on normally distributed (or asymptotically normal) estimators:
p | n          p | n
0.80 | 1.2816    0.999 | 3.2905
0.90 | 1.6449    0.9999 | 3.8906
0.95 | 1.9600    0.99999 | 4.4172
0.98 | 2.3263    0.999999 | 4.8916
0.99 | 2.5758    0.9999999 | 5.3267
0.995 | 2.8070   0.99999999 | 5.7307
0.998 | 3.0902   0.999999999 | 6.1094
where the value on the left of the table is the proportion of values that will fall within a given interval and n is a multiple of the standard deviation that specifies the width of the interval.
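The inverse relation can be computed rather than tabulated; a sketch using the standard library's `NormalDist.inv_cdf` (the function name `sigma_multiple` is our own):

```python
from statistics import NormalDist

def sigma_multiple(p: float) -> float:
    """Half-width n (in standard deviations) of the central interval with coverage p."""
    return NormalDist().inv_cdf((1 + p) / 2)
```

For instance, `sigma_multiple(0.95)` returns the familiar value of about 1.96 used for 95% confidence intervals.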
Central limit theorem
The theorem states that under certain (fairly common) conditions, the sum of a large number of random variables has an approximately normal distribution. For example, if (x_{1}, …, x_{n}) is a sequence of iid random variables, each having mean μ and variance σ^{2}, then the central limit theorem states that

√n ( (x_{1} + … + x_{n})/n − μ ) → N(0, σ^{2}) in distribution as n → ∞

The theorem holds even if the summands x_{i} are not iid, although some constraints on the degree of dependence and the growth rate of moments still have to be imposed.
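The effect is easy to see numerically. In the sketch below (an illustration, with an arbitrary seed and sample sizes), sums of twelve Uniform(0, 1) variables are centered so that, by the CLT, they are approximately standard normal:

```python
import random
import statistics

random.seed(42)
n, trials = 12, 10_000
# Each Uniform(0,1) has mean 1/2 and variance 1/12, so a sum of 12 of them
# has mean 6 and variance 1; the centered sum is approximately N(0, 1).
sums = [sum(random.random() for _ in range(n)) - n / 2 for _ in range(trials)]
m = statistics.fmean(sums)
s = statistics.pstdev(sums)
```

The empirical mean and standard deviation come out close to 0 and 1 respectively.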
The importance of the central limit theorem cannot be overemphasized. A great number of test statistics, scores, and estimators encountered in practice contain sums of random variables in them, and even more estimators can be represented as sums of random variables through the use of influence functions. All of these quantities are governed by the central limit theorem and will have an asymptotically normal distribution as a result.
Another practical consequence of the central limit theorem is that certain other distributions can be approximated by the normal distribution, for example:
 The binomial distribution B(n, p) is approximately normal N(np, np(1 − p)) for large n and for p not too close to zero or one.
 The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
 The chi-squared distribution χ^{2}(k) is approximately normal N(k, 2k) for large k.
 The Student's tdistribution t(ν) is approximately normal N(0, 1) when ν is large.
Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution.
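The quality of such an approximation can be checked directly. The sketch below (example parameter values of our choosing) compares the exact Binomial(100, 0.5) CDF at 55 with the normal approximation, including the usual +0.5 continuity correction:

```python
import math
from statistics import NormalDist

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact Binomial(n, p) CDF via the probability mass function."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

n, p, k = 100, 0.5, 55
exact = binom_cdf(k, n, p)
# Normal approximation N(np, np(1-p)) with a continuity correction of +0.5.
approx = NormalDist(n * p, math.sqrt(n * p * (1 - p))).cdf(k + 0.5)
```

Here the two values agree to within about 10^{−3}; the agreement degrades in the tails, as noted above.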
A general upper bound for the approximation error in the central limit theorem is given by the Berry–Esseen theorem; improvements of the approximation are given by the Edgeworth expansions.
Miscellaneous
 The family of normal distributions is closed under linear transformations. That is, if X is normally distributed with mean μ and variance σ^{2}, then the linear transform aX + b (for some real numbers a and b) is also normally distributed:

aX + b ~ N(aμ + b, a^{2}σ^{2})

Also, if X_{1}, X_{2} are two independent normal random variables, with means μ_{1}, μ_{2} and standard deviations σ_{1}, σ_{2}, then their linear combination will also be normally distributed:

aX_{1} + bX_{2} ~ N(aμ_{1} + bμ_{2}, a^{2}σ_{1}^{2} + b^{2}σ_{2}^{2})
 The converse of (1) is also true: if X_{1} and X_{2} are independent and their sum X_{1} + X_{2} is distributed normally, then both X_{1} and X_{2} must also be normal. This is known as Cramér's decomposition theorem. The interpretation of this property is that a normal distribution is only divisible by other normal distributions. Another application of this property is in connection with the central limit theorem: although the CLT asserts that the distribution of a sum of arbitrary non-normal iid random variables is approximately normal, Cramér's theorem shows that it can never be exactly normal.
 If the characteristic function φ_{X} of some random variable X is of the form φ_{X}(t) = exp(Q(t)), where Q(t) is a polynomial, then the Marcinkiewicz theorem (named after Józef Marcinkiewicz) asserts that Q can be at most a quadratic polynomial, and therefore X is a normal random variable. The consequence of this result is that the normal distribution is the only distribution with a finite number (two) of non-zero cumulants.
 If X and Y are jointly normal and uncorrelated, then they are independent. The requirement that X and Y be jointly normal is essential; without it the property does not hold. For non-normal random variables uncorrelatedness does not imply independence.
 If X and Y are independent normal random variables with the same variance, then X + Y and X − Y are also independent and identically distributed (this follows from the polarization identity). This property uniquely characterizes the normal distribution, as can be seen from Bernstein's theorem: if X and Y are independent and such that X + Y and X − Y are also independent, then both X and Y must necessarily have normal distributions.
More generally, if X_{1}, ..., X_{n} are independent random variables, then two linear combinations ∑a_{k}X_{k} and ∑b_{k}X_{k} will be independent if and only if all X_{k} are normal and ∑a_{k}b_{k}σ_{k}^{2} = 0, where σ_{k}^{2} denotes the variance of X_{k}.
 The normal distribution is infinitely divisible: for a normally distributed X with mean μ and variance σ^{2} we can find n independent random variables {X_{1}, …, X_{n}}, each distributed normally with mean μ/n and variance σ^{2}/n, such that

X_{1} + X_{2} + … + X_{n} has the same distribution as X
 The normal distribution is stable (with exponent α = 2): if X_{1}, X_{2} are two independent N(μ, σ^{2}) random variables and a, b are arbitrary real numbers, then

aX_{1} + bX_{2} ~ cX_{3} + d,   where c^{2} = a^{2} + b^{2} and d = μ(a + b − c)

where X_{3} is also N(μ, σ^{2}). This relationship directly follows from property (1).
 The Kullback–Leibler divergence between two normal distributions P = N(μ_{1}, σ_{1}^{2}) and Q = N(μ_{2}, σ_{2}^{2}) is given by:

D_{KL}(P ‖ Q) = ln(σ_{2}/σ_{1}) + (σ_{1}^{2} + (μ_{1} − μ_{2})^{2}) / (2σ_{2}^{2}) − 1/2

The Hellinger distance between the same distributions is equal to

H^{2}(P, Q) = 1 − √(2σ_{1}σ_{2}/(σ_{1}^{2} + σ_{2}^{2})) · exp(−(μ_{1} − μ_{2})^{2}/(4(σ_{1}^{2} + σ_{2}^{2})))
 The Fisher information matrix for a normal distribution is diagonal and takes the form

I(μ, σ^{2}) = [ 1/σ^{2}  0 ; 0  1/(2σ^{4}) ]
 Normal distributions belong to an exponential family with natural parameters θ_{1} = μ/σ^{2} and θ_{2} = −1/(2σ^{2}), and natural statistics x and x^{2}. The dual, expectation parameters for the normal distribution are η_{1} = μ and η_{2} = μ^{2} + σ^{2}.
 The conjugate prior of the mean of a normal distribution is another normal distribution. Specifically, if x_{1}, …, x_{n} are iid N(μ, σ^{2}) with σ^{2} known, and the prior on μ is N(μ_{0}, σ_{0}^{2}), then the posterior distribution of μ will be

μ | x_{1}, …, x_{n} ~ N( (μ_{0}/σ_{0}^{2} + n·x̄/σ^{2}) / (1/σ_{0}^{2} + n/σ^{2}),  (1/σ_{0}^{2} + n/σ^{2})^{−1} )
 Of all probability distributions over the reals with mean μ and variance σ^{2}, the normal distribution N(μ, σ^{2}) is the one with the maximum entropy.
 The family of normal distributions forms a manifold with constant curvature −1. The same family is flat with respect to the (±1)-connections ∇^{(e)} and ∇^{(m)}.
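Two of the properties above, closure under linear combinations and the Kullback–Leibler divergence between normals, lend themselves to a quick check (Python standard library only; the parameter values are arbitrary examples, and `kl_normal` is our own helper):

```python
import math
from statistics import NormalDist

# Closure under linear combinations: with X1 ~ N(1, 3^2) and X2 ~ N(3, 4^2)
# independent, 2*X1 + X2 should be N(2*1 + 3, (2*3)^2 + 4^2) = N(5, 52).
X1 = NormalDist(mu=1, sigma=3)
X2 = NormalDist(mu=3, sigma=4)
S = 2 * X1 + X2

def kl_normal(mu1: float, s1: float, mu2: float, s2: float) -> float:
    """D_KL( N(mu1, s1^2) || N(mu2, s2^2) )."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5
```

`NormalDist` implements exactly the arithmetic of property (1): adding two instances treats them as independent, and scalar multiplication rescales the standard deviation.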
Operations on a single random variable
If X is distributed normally with mean μ and variance σ^{2}, then
 The exponential of X is distributed log-normally: e^{X} ~ ln N(μ, σ^{2}).
 The absolute value of X has a folded normal distribution: |X| ~ N_{f}(μ, σ^{2}). If μ = 0 this is known as the half-normal distribution.
 The square of X/σ has the noncentral chi-squared distribution with one degree of freedom: X^{2}/σ^{2} ~ χ^{2}_{1}(μ^{2}/σ^{2}). If μ = 0, the distribution is called simply chi-squared.
 The distribution of the variable X restricted to an interval [a, b] is called the truncated normal distribution.
 (X − μ)^{−2} has a Lévy distribution with location 0 and scale σ^{−2}.
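The half-normal case is easy to verify by simulation: for μ = 0 the mean of |X| is σ√(2/π). A sketch with an arbitrary seed and sample size:

```python
import math
import random
import statistics

random.seed(7)
# |X| for X ~ N(0, 1) is half-normal; its mean is sqrt(2/pi) ≈ 0.7979.
samples = [abs(random.gauss(0, 1)) for _ in range(100_000)]
m = statistics.fmean(samples)
expected = math.sqrt(2 / math.pi)
```

With 100,000 draws the empirical mean lands within about ±0.005 of the theoretical value.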
Combination of two independent random variables
If X_{1} and X_{2} are two independent standard normal random variables, then
 Their sum and difference are distributed normally with mean zero and variance two: X_{1} ± X_{2} ~ N(0, 2).
 Their product Z = X_{1}X_{2} follows the "product-normal" distribution with density function f(z) = (1/π) K_{0}(|z|), where K_{0} is the modified Bessel function of the second kind. This distribution is symmetric around zero, unbounded at z = 0, and has the characteristic function φ_{Z}(t) = (1 + t^{2})^{−1/2}.
 Their ratio follows the standard Cauchy distribution: X_{1}/X_{2} ~ Cauchy(0, 1).
 Their Euclidean norm √(X_{1}^{2} + X_{2}^{2}) has the Rayleigh distribution, also known as the chi distribution with 2 degrees of freedom.
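The Rayleigh property can be illustrated numerically: the norm of two independent standard normals has mean √(π/2). A sketch with an arbitrary seed and sample size:

```python
import math
import random
import statistics

random.seed(1)
# sqrt(X1^2 + X2^2) for independent standard normals is Rayleigh-distributed,
# with mean sqrt(pi/2) ≈ 1.2533.
norms = [math.hypot(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]
m = statistics.fmean(norms)
```

The empirical mean comes out within a few thousandths of √(π/2).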
Combination of two or more independent random variables
 If X_{1}, X_{2}, …, X_{n} are independent standard normal random variables, then the sum of their squares has the chi-squared distribution with n degrees of freedom: X_{1}^{2} + ⋯ + X_{n}^{2} ~ χ^{2}_{n}.
 If X_{1}, X_{2}, …, X_{n} are independent normally distributed random variables with mean μ and variance σ^{2}, then their sample mean is independent of the sample standard deviation, which can be demonstrated using Basu's theorem or Cochran's theorem. The ratio of these two quantities will have the Student's t-distribution with n − 1 degrees of freedom:

t = (x̄ − μ) / (s/√n) ~ t_{n−1}
 If X_{1}, …, X_{n}, Y_{1}, …, Y_{m} are independent standard normal random variables, then the ratio of their normalized sums of squares will have the F-distribution with (n, m) degrees of freedom:

F = ( (X_{1}^{2} + ⋯ + X_{n}^{2})/n ) / ( (Y_{1}^{2} + ⋯ + Y_{m}^{2})/m ) ~ F(n, m)
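The chi-squared relationship in the first item is simple to check by simulation: the sum of squares of k standard normals has mean k and variance 2k. A sketch with an arbitrary seed and sample sizes:

```python
import random
import statistics

random.seed(3)
k, trials = 5, 50_000
# The sum of squares of k standard normals is chi-squared with k degrees of
# freedom, which has mean k and variance 2k.
chis = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(trials)]
m = statistics.fmean(chis)
v = statistics.pvariance(chis)
```

With these sample sizes the empirical mean and variance land close to 5 and 10.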
Operations on the density function
The split normal distribution is most directly defined by joining scaled sections of the density functions of different normal distributions at their common mode and rescaling the density to integrate to one. The truncated normal distribution results from rescaling a section of a single density function.
Extensions
The notion of a normal distribution, being one of the most important distributions in probability theory, has been extended far beyond the standard framework of the univariate (that is, one-dimensional) case. All these extensions are also called normal or Gaussian laws, so a certain ambiguity in names exists.
 The multivariate normal distribution describes the Gaussian law in the k-dimensional Euclidean space. A vector X is multivariate-normally distributed if any linear combination of its components has a (univariate) normal distribution. The variance of X is a k×k symmetric positive-definite matrix V.
 The rectified Gaussian distribution is a rectified version of the normal distribution, with all negative elements reset to 0.
 The complex normal distribution deals with complex normal vectors. A complex vector X is said to be normal if both its real and imaginary components jointly possess a 2k-dimensional multivariate normal distribution. The variance-covariance structure of X is described by two matrices: the variance matrix Γ and the relation matrix C.
 The matrix normal distribution describes the case of normally distributed matrices.
 Gaussian processes are the normally distributed stochastic processes. These can be viewed as elements of some infinite-dimensional Hilbert space H, and thus are the analogues of multivariate normal vectors for the case k = ∞. A random element h ∈ H is said to be normal if for any constant a ∈ H the scalar product (a, h) has a (univariate) normal distribution. The variance structure of such a Gaussian random element can be described in terms of the linear covariance operator K: H → H. Several Gaussian processes became popular enough to have their own names:
 Brownian motion,
 Brownian bridge,
 Ornstein–Uhlenbeck process.
 The Gaussian q-distribution is an abstract mathematical construction which represents a "q-analogue" of the normal distribution.
 The q-Gaussian is an analogue of the Gaussian distribution in the sense that it maximises the Tsallis entropy, and is one type of Tsallis distribution. Note that this distribution is different from the Gaussian q-distribution above.
One of the main practical uses of the Gaussian law is to model the empirical distributions of many different random variables encountered in practice. In such a case a possible extension is a richer family of distributions, having more than two parameters and therefore able to fit the empirical distribution more accurately. Examples of such extensions are:
 Pearson distribution: a four-parameter family of probability distributions that extend the normal law to include different skewness and kurtosis values.
Normality tests
Normality tests assess the likelihood that the given data set {x_{1}, …, x_{n}} comes from a normal distribution. Typically the null hypothesis H_{0} is that the observations are distributed normally with unspecified mean μ and variance σ^{2}, versus the alternative H_{a} that the distribution is arbitrary. A great number of tests (over 40) have been devised for this problem; the more prominent of them are outlined below:
 "Visual" tests are more intuitively appealing but subjective at the same time, as they rely on informal human judgement to accept or reject the null hypothesis.
 QQ plot: a plot of the sorted values from the data set against the expected values of the corresponding quantiles from the standard normal distribution. That is, it is a plot of points of the form (Φ^{−1}(p_{k}), x_{(k)}), where the plotting points p_{k} are equal to p_{k} = (k − α)/(n + 1 − 2α) and α is an adjustment constant which can be anything between 0 and 1. If the null hypothesis is true, the plotted points should approximately lie on a straight line.
 PP plot: similar to the QQ plot, but used much less frequently. This method consists of plotting the points (Φ(z_{(k)}), p_{k}), where z_{(k)} are the standardized order statistics. For normally distributed data this plot should lie on the 45° line between (0, 0) and (1, 1).
 The Shapiro–Wilk test employs the fact that the line in the QQ plot has slope σ. The test compares the least squares estimate of that slope with the value of the sample variance, and rejects the null hypothesis if these two quantities differ significantly.
 Normal probability plot (rankit plot)
 Moment tests:
 D'Agostino's K-squared test
 Jarque–Bera test
 Empirical distribution function tests:
 Lilliefors test (an adaptation of the Kolmogorov–Smirnov test)
 Anderson–Darling test
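The QQ plot described above is straightforward to compute; a sketch (the function name `qq_points` is our own, and the default α = 3/8 is one common convention) using the standard library:

```python
from statistics import NormalDist

def qq_points(data, alpha=0.375):
    """(theoretical quantile, order statistic) pairs for a normal Q-Q plot,
    using plotting positions p_k = (k - alpha)/(n + 1 - 2*alpha)."""
    xs = sorted(data)
    n = len(xs)
    ps = [(k - alpha) / (n + 1 - 2 * alpha) for k in range(1, n + 1)]
    return [(NormalDist().inv_cdf(p), x) for p, x in zip(ps, xs)]
```

Plotting the returned pairs, or regressing the second coordinate on the first, gives the visual and slope-based checks discussed above.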
Estimation of parameters
It is often the case that we do not know the parameters of the normal distribution but instead want to estimate them. That is, having a sample (x_{1}, …, x_{n}) from a normal population, we would like to learn the approximate values of the parameters μ and σ^{2}. The standard approach to this problem is the maximum likelihood method, which requires maximization of the log-likelihood function:

ln L(μ, σ^{2}) = −(n/2) ln(2π) − (n/2) ln σ^{2} − (1/(2σ^{2})) ∑ (x_{i} − μ)^{2}
Taking derivatives with respect to μ and σ^{2} and solving the resulting system of first-order conditions yields the maximum likelihood estimates:

μ̂ = x̄ = (1/n) ∑ x_{i},   σ̂^{2} = (1/n) ∑ (x_{i} − x̄)^{2}
The estimator μ̂ is called the sample mean, since it is the arithmetic mean of all observations. The statistic x̄ is complete and sufficient for μ, and therefore by the Lehmann–Scheffé theorem, μ̂ is the uniformly minimum variance unbiased (UMVU) estimator. In finite samples it is distributed normally:

μ̂ ~ N(μ, σ^{2}/n)
The variance of this estimator is equal to the μμ-element of the inverse Fisher information matrix I^{−1}. This implies that the estimator is finite-sample efficient. Of practical importance is the fact that the standard error of μ̂ is proportional to 1/√n; that is, to decrease the standard error by a factor of 10 one must increase the number of points in the sample by a factor of 100. This fact is widely used in determining sample sizes for opinion polls and the number of trials in Monte Carlo simulations.
From the standpoint of the asymptotic theory, μ̂ is consistent, that is, it converges in probability to μ as n → ∞. The estimator is also asymptotically normal, which is a simple corollary of the fact that it is normal in finite samples:

√n (μ̂ − μ) → N(0, σ^{2}) in distribution
The estimator σ̂^{2} is called the sample variance, since it is the variance of the sample (x_{1}, …, x_{n}). In practice, another estimator is often used instead of σ̂^{2}. This other estimator is denoted s^{2}, and is also called the sample variance, which represents a certain ambiguity in terminology; its square root s is called the sample standard deviation. The estimator s^{2} differs from σ̂^{2} by having n − 1 instead of n in the denominator (the so-called Bessel's correction):

s^{2} = (1/(n − 1)) ∑ (x_{i} − x̄)^{2} = (n/(n − 1)) σ̂^{2}
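All three estimators are available directly in Python's statistics module; a sketch with example data of our choosing:

```python
import statistics

data = [1.0, 2.0, 3.0, 4.0]
mu_hat = statistics.fmean(data)           # sample mean, the MLE of mu
sigma2_hat = statistics.pvariance(data)   # MLE of sigma^2: divides by n
s2 = statistics.variance(data)            # Bessel-corrected: divides by n - 1
```

For this sample, `pvariance` returns 1.25 while `variance` returns 5/3, reflecting the n versus n − 1 denominators.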
The difference between s^{2} and σ̂^{2} becomes negligibly small for large n. In finite samples, however, the motivation behind the use of s^{2} is that it is an unbiased estimator of the underlying parameter σ^{2}, whereas σ̂^{2} is biased. Also, by the Lehmann–Scheffé theorem the estimator s^{2} is uniformly minimum variance unbiased (UMVU), which makes it the "best" estimator among all unbiased ones. However, it can be shown that the biased estimator σ̂^{2} is "better" than s^{2} in terms of the mean squared error (MSE) criterion. In finite samples both s^{2} and σ̂^{2} have a scaled chi-squared distribution with n − 1 degrees of freedom:

s^{2} ~ (σ^{2}/(n − 1)) · χ^{2}_{n−1},   σ̂^{2} ~ (σ^{2}/n) · χ^{2}_{n−1}
The first of these expressions shows that the variance of s^{2} is equal to 2σ^{4}/(n − 1), which is slightly greater than the σσ-element of the inverse Fisher information matrix I^{−1}. Thus, s^{2} is not an efficient estimator for σ^{2}, and moreover, since s^{2} is UMVU, we can conclude that a finite-sample efficient estimator for σ^{2} does not exist.
Applying the asymptotic theory, both estimators s^{2} and $\hat\sigma^2$ are consistent, that is, they converge in probability to σ^{2} as the sample size n → ∞. The two estimators are also both asymptotically normal:
$$ \sqrt{n}\,(\hat\sigma^2 - \sigma^2) \;\xrightarrow{d}\; \mathcal{N}(0,\, 2\sigma^4), \qquad \sqrt{n}\,(s^2 - \sigma^2) \;\xrightarrow{d}\; \mathcal{N}(0,\, 2\sigma^4). $$
In particular, both estimators are asymptotically efficient for σ^{2}.
By Cochran's theorem, for normal distributions the sample mean $\hat\mu$ and the sample variance s^{2} are independent, which means there can be no gain in considering their joint distribution. There is also a converse theorem: if in a sample the sample mean and sample variance are independent, then the sample must have come from a normal distribution. The independence between $\hat\mu$ and s can be employed to construct the so-called t-statistic:
$$ t = \frac{\hat\mu - \mu}{s/\sqrt{n}} \;\sim\; t_{n-1}. $$
This quantity t has the Student's t-distribution with n − 1 degrees of freedom, and it is an ancillary statistic (independent of the value of the parameters). Inverting the distribution of this t-statistic allows us to construct the confidence interval for μ; similarly, inverting the χ^{2} distribution of the statistic s^{2} gives us the confidence interval for σ^{2}:
$$ \mu \in \left[\, \hat\mu - t_{n-1,1-\alpha/2}\,\tfrac{s}{\sqrt{n}},\;\; \hat\mu + t_{n-1,1-\alpha/2}\,\tfrac{s}{\sqrt{n}} \,\right] \;\approx\; \hat\mu \pm |z_{\alpha/2}|\,\tfrac{s}{\sqrt{n}}, $$
$$ \sigma^2 \in \left[\, \tfrac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}},\;\; \tfrac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}} \,\right] \;\approx\; s^2 \pm |z_{\alpha/2}|\,\sqrt{\tfrac{2}{n}}\; s^2, $$
where t_{k,p} and χ^{2}_{k,p} are the p^{th} quantiles of the t- and χ^{2}-distributions respectively. These confidence intervals are of level 1 − α, meaning that the true values μ and σ^{2} fall outside of these intervals with probability α. In practice people usually take α = 5%, resulting in 95% confidence intervals. The approximate formulas in the display above were derived from the asymptotic distributions of $\hat\mu$ and s^{2}. The approximate formulas become valid for large values of n, and are more convenient for manual calculation since the standard normal quantiles z_{α/2} do not depend on n. In particular, the most popular value α = 5% results in |z_{0.025}| = 1.96.
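The approximate (large-n) intervals can be coded directly from the asymptotic formulas; this Python sketch uses `statistics.NormalDist` (available since Python 3.8) for the standard normal quantile, and the function name is illustrative:

```python
import math
from statistics import NormalDist

def approx_confidence_intervals(xs, alpha=0.05):
    """Large-sample CIs for mu and sigma^2 from the asymptotic formulas."""
    n = len(xs)
    mu_hat = sum(xs) / n
    s2 = sum((x - mu_hat) ** 2 for x in xs) / (n - 1)   # sample variance
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)         # |z_{alpha/2}|, ~1.96
    half_mu = z * math.sqrt(s2 / n)
    half_var = z * math.sqrt(2.0 / n) * s2
    return (mu_hat - half_mu, mu_hat + half_mu), (s2 - half_var, s2 + half_var)

mu_ci, var_ci = approx_confidence_intervals([2.1, 1.9, 2.4, 1.7, 2.0, 2.3])
```

For small n the exact intervals based on the t and χ^{2} quantiles should be preferred.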
Occurrence
The occurrence of normal distribution in practical problems can be loosely classified into three categories:
Exactly normal distributions;
Approximately normal laws, for example when such approximation is justified by the central limit theorem; and
Distributions modeled as normal — the normal distribution being the distribution with maximum entropy for a given mean and variance.
Exact normality
Certain quantities in physics are distributed normally, as was first demonstrated by James Clerk Maxwell. Examples of such quantities are:
Velocities of the molecules in an ideal gas. More generally, velocities of the particles in any system in thermodynamic equilibrium will have a normal distribution, due to the maximum entropy principle.
The probability density function of the ground state of a quantum harmonic oscillator.
The position of a particle that experiences diffusion. If initially the particle is located at a specific point (that is, its probability distribution is the Dirac delta function), then after time t its location is described by a normal distribution with variance t, which satisfies the diffusion equation $\frac{\partial}{\partial t} f(x,t) = \tfrac{1}{2} \frac{\partial^2}{\partial x^2} f(x,t)$. If the initial location is given by a certain density function g(x), then the density at time t is the convolution of g and the normal PDF.
Approximate normality
Approximately normal distributions occur in many situations, as explained by the central limit theorem. When the outcome is produced by many small effects acting additively and independently, its distribution will be close to normal. The normal approximation will not be valid if the effects act multiplicatively (instead of additively), or if there is a single external influence that has a considerably larger magnitude than the rest of the effects.
In counting problems, where the central limit theorem includes a discrete-to-continuum approximation and where infinitely divisible and decomposable distributions are involved, such as
Binomial random variables, associated with binary response variables;
Poisson random variables, associated with rare events;
Thermal light has a Bose–Einstein distribution on very short time scales, and a normal distribution on longer timescales due to the central limit theorem.
Assumed normality
There are statistical methods to empirically test the assumption of normality; see the Normality tests section above.
In biology, the logarithms of various variables tend to have a normal distribution, that is, the variables tend to have a log-normal distribution (after separation on male/female subpopulations), with examples including:
Measures of size of living tissue (length, height, skin area, weight);
The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth; presumably the thickness of tree bark also falls under this category;
Certain physiological measurements, such as blood pressure of adult humans.
In finance, in particular the Black–Scholes model, changes in the logarithm of exchange rates, price indices, and stock market indices are assumed normal (these variables behave like compound interest, not like simple interest, and so are multiplicative). Some mathematicians such as Benoît Mandelbrot have argued that log-Lévy distributions, which possess heavy tails, would be a more appropriate model, in particular for the analysis of stock market crashes.
Measurement errors in physical experiments are often modeled by a normal distribution. This use of a normal distribution does not imply that one is assuming the measurement errors are normally distributed; rather, using the normal distribution produces the most conservative predictions possible given only knowledge of the mean and variance of the errors.
In standardized testing, results can be made to have a normal distribution, either by selecting the number and difficulty of questions (as in the IQ test) or by transforming the raw test scores into "output" scores by fitting them to the normal distribution. For example, the SAT's traditional range of 200–800 is based on a normal distribution with a mean of 500 and a standard deviation of 100.
Many scores are derived from the normal distribution, including percentile ranks ("percentiles" or "quantiles"), normal curve equivalents, stanines, z-scores, and T-scores. Additionally, a number of behavioral statistical procedures are based on the assumption that scores are normally distributed; for example, t-tests and ANOVAs. Bell curve grading assigns relative grades based on a normal distribution of scores.
In hydrology the distribution of long-duration river discharge or rainfall (e.g. monthly and yearly totals, consisting of the sum of 30 and 360 daily values, respectively) is often thought to be practically normal according to the central limit theorem. The blue picture illustrates an example of fitting the normal distribution to ranked October rainfalls, showing the 90% confidence belt based on the binomial distribution. The rainfall data are represented by plotting positions as part of the cumulative frequency analysis.
Generating values from normal distribution
In computer simulations, especially in applications of the Monte Carlo method, it is often desirable to generate values that are normally distributed. The algorithms listed below all generate standard normal deviates, since a N(μ, σ^{2}) variate can be generated as X = μ + σZ, where Z is standard normal. All these algorithms rely on the availability of a random number generator U capable of producing uniform random variates.
The most straightforward method is based on the probability integral transform property: if U is distributed uniformly on (0,1), then Φ^{−1}(U) will have the standard normal distribution. The drawback of this method is that it relies on calculation of the probit function Φ^{−1}, which cannot be done analytically. Some approximate methods are described in the literature and in the erf article.
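As an illustration (not one of the published approximations), Φ^{−1} can be evaluated numerically, here by simple bisection on the exact CDF built from `math.erf`; production libraries use rational approximations instead, and the function names here are illustrative:

```python
import math

def norm_cdf_exact(x):
    """Phi(x) via the error function: Phi(x) = (1 + erf(x/sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probit(p, lo=-10.0, hi=10.0):
    """Invert Phi by bisection; Phi is strictly increasing, so this converges."""
    for _ in range(80):                  # 20 / 2**80 is far below double eps
        mid = 0.5 * (lo + hi)
        if norm_cdf_exact(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Inverse-transform sampling: if U ~ Uniform(0,1), then probit(U) ~ N(0,1).
```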
An easy-to-program approximate approach that relies on the central limit theorem is as follows: generate 12 uniform U(0,1) deviates, add them all up, and subtract 6 — the resulting random variable will have approximately standard normal distribution. In truth, the distribution will be Irwin–Hall, which is a 12-section eleventh-order polynomial approximation to the normal distribution. This random deviate will have a limited range of (−6, 6).
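A minimal Python sketch of this sum-of-12 recipe (function name illustrative):

```python
import random

def approx_std_normal(rng=random):
    """CLT-based approximation: sum of 12 Uniform(0,1) deviates, minus 6."""
    return sum(rng.random() for _ in range(12)) - 6.0

random.seed(1)
samples = [approx_std_normal() for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
# mean should be near 0 and variance near 1; the range is limited to (-6, 6)
```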
The Box–Muller method uses two independent random numbers U and V distributed uniformly on (0,1). Then the two random variables X and Y,
$$ X = \sqrt{-2 \ln U}\, \cos(2\pi V), \qquad Y = \sqrt{-2 \ln U}\, \sin(2\pi V), $$
will both have the standard normal distribution, and will be independent. This formulation arises because for a bivariate normal random vector (X, Y) the squared norm X^{2} + Y^{2} will have the chi-squared distribution with two degrees of freedom, which is an easily generated exponential random variable corresponding to the quantity −2ln(U) in these equations; and the angle is distributed uniformly around the circle, chosen by the random variable V.
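In Python the Box–Muller transform reads as follows (using 1 − U to guard against log(0), since `random.random()` returns values in [0, 1)):

```python
import math
import random

def box_muller(u, v):
    """Map two independent U(0,1) deviates to two independent N(0,1) deviates."""
    r = math.sqrt(-2.0 * math.log(u))    # radius: -2 ln U is Exp-distributed
    theta = 2.0 * math.pi * v            # uniform angle around the circle
    return r * math.cos(theta), r * math.sin(theta)

random.seed(2)
pairs = [box_muller(1.0 - random.random(), random.random())  # u in (0, 1]
         for _ in range(50_000)]
values = [z for pair in pairs for z in pair]
```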
The Marsaglia polar method is a modification of the Box–Muller algorithm which does not require computation of the functions sin and cos. In this method U and V are drawn from the uniform (−1,1) distribution, and then S = U^{2} + V^{2} is computed. If S is greater than or equal to one then the method starts over; otherwise the two quantities
$$ X = U\sqrt{\frac{-2\ln S}{S}}, \qquad Y = V\sqrt{\frac{-2\ln S}{S}} $$
are returned. Again, X and Y will be independent and standard normally distributed.
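A Python sketch of the polar method; since the acceptance probability is π/4, on average about 1.27 candidate pairs are drawn per accepted pair:

```python
import math
import random

def polar_pair(rng=random):
    """Marsaglia polar method: Box-Muller without sin/cos."""
    while True:
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:                # keep points strictly inside unit disk
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor

random.seed(3)
values = [z for _ in range(25_000) for z in polar_pair()]
```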
The ratio method is a rejection method. The algorithm proceeds as follows:
Generate two independent uniform deviates U and V;
Compute X = √(8/e) (V − 0.5)/U;
If X^{2} ≤ 5 − 4e^{1/4}U then accept X and terminate the algorithm;
If X^{2} ≥ 4e^{−1.35}/U + 1.4 then reject X and start over from step 1;
If X^{2} ≤ −4 ln U then accept X, otherwise start over the algorithm.
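The three-step test above translates directly into Python; the √(8/e) scaling in step 2 is the standard Kinderman–Monahan choice and is assumed here:

```python
import math
import random

SCALE = math.sqrt(8.0 / math.e)

def ratio_method(rng=random):
    """Ratio-of-uniforms sampler; the two squeeze tests avoid most log calls."""
    while True:
        u = 1.0 - rng.random()               # U in (0, 1], so log(u) is safe
        v = rng.random()
        x = SCALE * (v - 0.5) / u
        x2 = x * x
        if x2 <= 5.0 - 4.0 * math.exp(0.25) * u:
            return x                         # quick accept
        if x2 >= 4.0 * math.exp(-1.35) / u + 1.4:
            continue                         # quick reject, start over
        if x2 <= -4.0 * math.log(u):
            return x                         # exact acceptance test

random.seed(4)
values = [ratio_method() for _ in range(50_000)]
```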
The ziggurat algorithm is faster than the Box–Muller transform and still exact. In about 97% of all cases it uses only two random numbers, one random integer and one random uniform, one multiplication and an if-test. Only in the 3% of cases where the combination of those two falls outside the "core of the ziggurat" does a kind of rejection sampling using logarithms, exponentials and more uniform random numbers have to be employed.
There has also been some investigation into the connection between the fast Hadamard transform and the normal distribution, since the transform employs just addition and subtraction and by the central limit theorem random numbers from almost any distribution will be transformed into the normal distribution. In this regard a series of Hadamard transforms can be combined with random permutations to turn arbitrary data sets into normally distributed data.
Numerical approximations for the normal CDF
The standard normal CDF is widely used in scientific and statistical computing. The values Φ(x) may be approximated very accurately by a variety of methods, such as numerical integration, Taylor series, asymptotic series and continued fractions. Different approximations are used depending on the desired level of accuracy.
Abramowitz & Stegun (algorithm 26.2.17) give the approximation for Φ(x) for x > 0 with absolute error ε(x) < 7.5·10^{−8}:
$$ \Phi(x) \approx 1 - \varphi(x)\left(b_1 t + b_2 t^2 + b_3 t^3 + b_4 t^4 + b_5 t^5\right), \qquad t = \frac{1}{1 + b_0 x}, $$
where ϕ(x) is the standard normal PDF, and b_{0} = 0.2316419, b_{1} = 0.319381530, b_{2} = −0.356563782, b_{3} = 1.781477937, b_{4} = −1.821255978, b_{5} = 1.330274429.
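Coding the quoted coefficients directly gives a short sketch (function names illustrative; the symmetry Φ(−x) = 1 − Φ(x) extends it to negative arguments):

```python
import math

B0 = 0.2316419
B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def phi_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf_as(x):
    """Approximation with the coefficients above; |error| < 7.5e-8."""
    if x < 0.0:
        return 1.0 - norm_cdf_as(-x)         # Phi(-x) = 1 - Phi(x)
    t = 1.0 / (1.0 + B0 * x)
    poly = sum(b * t ** (k + 1) for k, b in enumerate(B))
    return 1.0 - phi_pdf(x) * poly
```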
Hart (1968) lists almost a hundred rational function approximations for the erfc function. His algorithms vary in degree of complexity and resulting precision, with maximum absolute precision of 24 digits. A later algorithm combines Hart's algorithm 5666 with a continued fraction approximation in the tail to provide a fast computation algorithm with 16-digit precision.
Cody (1969), after recalling that the Hart (1968) solution is not suited for erf, gives a solution for both erf and erfc, with maximal relative error bound, via rational Chebyshev approximation (Cody, W. J. (1969), "Rational Chebyshev Approximations for the Error Function").
A simple algorithm has also been suggested based on the Taylor series expansion
$$ \Phi(x) = \frac{1}{2} + \varphi(x)\left( x + \frac{x^3}{3} + \frac{x^5}{3\cdot 5} + \frac{x^7}{3\cdot 5\cdot 7} + \cdots \right) $$
for calculating Φ(x) with arbitrary precision. The drawback of this algorithm is its comparatively slow calculation time (for example, it takes over 300 iterations to calculate the function with 16 digits of precision when x = 10).
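A Python sketch of this series in double precision (arbitrary precision would require `decimal` or a similar library; the function name is illustrative):

```python
import math

def norm_cdf_series(x, tol=1e-17):
    """Phi(x) = 1/2 + phi(x) * (x + x^3/3 + x^5/(3*5) + x^7/(3*5*7) + ...)."""
    term = total = x
    k = 0
    while abs(term) > tol * abs(total):
        k += 1
        term *= x * x / (2 * k + 1)      # next term: multiply by x^2/(2k+1)
        total += term
    return 0.5 + total * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
```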
The GNU Scientific Library calculates values of the standard normal CDF using Hart's algorithms and approximations with Chebyshev polynomials.
Development
Some authors attribute the credit for the discovery of the normal distribution to de Moivre, who in 1738 published in the second edition of his "The Doctrine of Chances" the study of the coefficients in the binomial expansion of (a + b)^{n}. (De Moivre first published his findings in 1733, in a pamphlet "Approximatio ad Summam Terminorum Binomii in Seriem Expansi" that was designated for private circulation only; it was not until 1738 that he made his results publicly available. The original pamphlet was reprinted several times.) De Moivre proved that the middle term in this expansion has the approximate magnitude of $2/\sqrt{2\pi n}$, and that "If m or ½n be a Quantity infinitely great, then the Logarithm of the Ratio, which a Term distant from the middle by the Interval ℓ, has to the middle Term, is $-\frac{2\ell\ell}{n}$." Although this theorem can be interpreted as the first obscure expression for the normal probability law, Stigler points out that de Moivre himself did not interpret his results as anything more than the approximate rule for the binomial coefficients, and in particular de Moivre lacked the concept of the probability density function.
In 1809 Gauss published his monograph "Theoria motus corporum coelestium in sectionibus conicis solem ambientium" where among other things he introduced several important statistical concepts, such as the method of least squares, the method of maximum likelihood, and the normal distribution. Gauss used M, M′, M″, … to denote the measurements of some unknown quantity V, and sought the "most probable" estimator: the one which maximizes the probability of obtaining the observed experimental results. In his notation φΔ is the probability law of the measurement errors of magnitude Δ. Not knowing what the function φ is, Gauss required that his method should reduce to the well-known answer: the arithmetic mean of the measured values. ("It has been customary certainly to regard as an axiom the hypothesis that if any quantity has been determined by several direct observations, made under the same circumstances and with equal care, the arithmetical mean of the observed values affords the most probable value, if not rigorously, yet very nearly at least, so that it is always most safe to adhere to it.") Starting from these principles, Gauss demonstrated that the only law which rationalizes the choice of arithmetic mean as an estimator of the location parameter is the normal law of errors:
$$
\varphi\mathit{\Delta} = \frac{h}{\surd\pi}\, e^{-\mathrm{hh}\Delta\Delta},
$$
where h is "the measure of the precision of the observations". Using this normal law as a generic model for errors in experiments, Gauss formulated what is now known as the non-linear weighted least squares (NWLS) method.
Although Gauss was the first to suggest the normal distribution law, Laplace made significant contributions. ("My custom of terming the curve the Gauss–Laplacian or normal curve saves us from proportioning the merit of discovery between the two great astronomer mathematicians.") It was Laplace who first posed the problem of aggregating several observations in 1774, although his own solution led to the Laplacian distribution. It was Laplace who first calculated the value of the integral $\int e^{-t^2}\,dt = \sqrt{\pi}$ in 1782, providing the normalization constant for the normal distribution.
In probability theoryProbability theoryProbability theory is the branch of mathematics concerned with analysis of random phenomena. The central objects of probability theory are random variables, stochastic processes, and events: mathematical abstractions of nondeterministic events or measured quantities that may either be single..., the normal (or Gaussian) distribution is a continuous probability distribution that has a bellshaped probability density functionProbability density functionIn probability theory, a probability density function , or density of a continuous random variable is a function that describes the relative likelihood for this random variable to occur at a given point. The probability for the random variable to fall within a particular region is given by the..., known as the Gaussian function or informally the bell curve:The designation "bell curve" is ambiguous: there are many other distributions which are "bell"shaped: the Cauchy distributionCauchy distributionThe Cauchy–Lorentz distribution, named after Augustin Cauchy and Hendrik Lorentz, is a continuous probability distribution. As a probability distribution, it is known as the Cauchy distribution, while among physicists, it is known as the Lorentz distribution, Lorentz function, or Breit–Wigner..., Student's tdistribution, generalized normal, logistic, etc.
The density is

    f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},

where the parameter μ is the mean or expectation (the location of the peak) and σ² is the variance, the mean of the squared deviation (a measure of the width of the distribution); σ is the standard deviation. The distribution with μ = 0 and σ² = 1 is called the standard normal. A normal distribution is often used as a first approximation to describe real-valued random variables that cluster around a single mean value.
The normal distribution is considered the most prominent probability distribution in statistics, for several reasons. First, the normal distribution is very tractable analytically: a large number of results involving this distribution can be derived in explicit form. Second, the normal distribution arises as the outcome of the central limit theorem, which states that under mild conditions the sum of a large number of independent random variables is distributed approximately normally. Finally, the "bell" shape of the normal distribution makes it a convenient choice for modelling a large variety of random variables encountered in practice.
For this reason, the normal distribution is commonly encountered in practice, and is used throughout statistics, the natural sciences, and the social sciences as a simple model for complex phenomena. For example, the observational error in an experiment is usually assumed to follow a normal distribution, and the propagation of uncertainty is computed using this assumption. Note that a normally distributed variable has a symmetric distribution about its mean. Quantities that grow exponentially, such as prices, incomes or populations, are often skewed to the right, and hence may be better described by other distributions, such as the log-normal distribution or the Pareto distribution. In addition, the probability of seeing a normally distributed value that is far from the mean (i.e. more than a few standard deviations away) drops off extremely rapidly. As a result, statistical inference using a normal distribution is not robust to the presence of outliers (data unexpectedly far from the mean, due to exceptional circumstances, observational error, etc.). When outliers are expected, data may be better described using a heavy-tailed distribution such as Student's t-distribution.
From a technical perspective, alternative characterizations are possible, for example:
The normal distribution is the only absolutely continuous distribution all of whose cumulants beyond the first two (i.e. other than the mean and variance) are zero.
For a given mean and variance, the corresponding normal distribution is the continuous distribution with maximum entropy.
Definition
The simplest case of a normal distribution is known as the standard normal distribution, described by the probability density function

    \phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}.

The factor 1/\sqrt{2\pi} in this expression ensures that the total area under the curve ϕ(x) is equal to one (see the Gaussian integral for a proof), and the 1/2 in the exponent makes the "width" of the curve (measured as half the distance between the inflection points) also equal to one. It is traditional in statistics to denote this function with the Greek letter ϕ (phi), whereas density functions for all other distributions are usually denoted with the letters f or p. The alternative glyph φ is also used quite often; however, within this article "φ" is reserved to denote characteristic functions.
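The normalizing role of the factor 1/√(2π) can be checked numerically. The sketch below is a quick verification in plain Python (the `simpson` helper is a minimal composite Simpson's rule written for illustration, not a library routine):

```python
import math

def phi(x):
    """Standard normal density: exp(-x^2/2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def simpson(f, a, b, n=10_000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# The tails beyond +-10 contribute less than 1e-23, so [-10, 10] suffices.
area = simpson(phi, -10, 10)
print(area)  # ≈ 1.0
```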
More generally, a normal distribution results from exponentiating a quadratic function (just as an exponential distribution results from exponentiating a linear function):

    f(x) = e^{ax^2 + bx + c}.
This yields the classic "bell curve" shape, provided that a < 0 so that the quadratic function is concave; note that f(x) > 0 everywhere. One can adjust a to control the "width" of the bell, then adjust b to move the central peak of the bell along the x-axis, and finally adjust c to control the "height" of the bell. For f(x) to be a true probability density function over R, one must choose c such that the total area under the curve is one (which is only possible when a < 0).
Rather than using a, b, and c, it is far more common to describe a normal distribution by its mean μ = −b/(2a) and variance σ² = −1/(2a). Changing to these new parameters allows one to rewrite the probability density function in a convenient standard form,

    f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right).
For a standard normal distribution, μ = 0 and σ² = 1. The last part of the equation above shows that any other normal distribution can be regarded as a version of the standard normal distribution that has been stretched horizontally by a factor σ and then translated rightward by a distance μ. Thus, μ specifies the position of the bell curve's central peak, and σ specifies the "width" of the bell curve.
The parameter μ is at the same time the mean, the median and the mode of the normal distribution. The parameter σ² is called the variance; as for any random variable, it describes how concentrated the distribution is around its mean. The square root of σ² is called the standard deviation and is a measure of the width of the density function.
The normal distribution is usually denoted by N(μ, σ²). Commonly the letter N is written in calligraphic font (typed as \mathcal{N} in LaTeX). Thus when a random variable X is distributed normally with mean μ and variance σ², we write

    X \sim N(\mu, \sigma^2).
Alternative formulations
Some authors advocate using the precision instead of the variance, defining it variously as τ = 1/σ² or τ = 1/σ. This parametrization has an advantage in numerical applications where σ² is very close to zero, and it is more convenient to work with in analysis, as τ = 1/σ² is a natural parameter of the normal distribution. It is also convenient in the study of conditional distributions in the multivariate normal case.
The question of which normal distribution should be called the "standard" one is also answered differently by various authors. Starting from the works of Gauss, the standard normal was sometimes taken to be the one with variance σ² = 1/2:

    f(x) = \frac{1}{\sqrt{\pi}}\, e^{-x^2}.

Stigler (1982) goes even further and insists that the standard normal be the one with variance σ² = 1/(2π):

    f(x) = e^{-\pi x^2}.

According to Stigler, this formulation is advantageous because of a much simpler and easier-to-remember formula, the fact that the pdf has unit height at zero, and simple approximate formulas for the quantiles of the distribution. In Stigler's formulation the density of a normal N(μ, τ) with mean μ and precision τ equals

    f(x;\, \mu, \tau) = \tau\, e^{-\pi \tau^2 (x-\mu)^2}.
Characterization
In the previous section the normal distribution was defined by specifying its probability density function. However, there are other ways to characterize a probability distribution. They include: the cumulative distribution function, the moments, the cumulants, the characteristic function, the moment-generating function, etc.
Probability density function
The probability density function (pdf) of a random variable describes the relative frequencies of different values for that random variable. The pdf of the normal distribution is given by the formula explained in detail in the previous section:

    f(x;\, \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.

This is a proper function only when the variance σ² is not equal to zero. In that case it is a continuous smooth function, defined on the entire real line, and is called the "Gaussian function".
Properties:
Function f(x) is unimodal and symmetric around the point x = μ, which is at the same time the mode, the median and the mean of the distribution.
The inflection points of the curve occur one standard deviation away from the mean, i.e. at x = μ − σ and x = μ + σ.
Function f(x) is log-concave.
The standard normal density ϕ(x) is an eigenfunction of the Fourier transform.
The function is supersmooth of order 2, implying that it is infinitely differentiable.
The first derivative of ϕ(x) is ϕ′(x) = −x·ϕ(x); the second derivative is ϕ′′(x) = (x² − 1)ϕ(x). More generally, the nth derivative is given by ϕ⁽ⁿ⁾(x) = (−1)ⁿ Hₙ(x) ϕ(x), where Hₙ is the Hermite polynomial of order n.
When σ² = 0, the density function does not exist as an ordinary function. However, the distribution can be represented by a generalized function that defines a measure on the real line and can be used, for example, to compute expected values:

    f(x) = \delta(x - \mu),

where δ(x) is the Dirac delta function, which is zero for all x ≠ 0 and whose integral over the real line equals one.
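The derivative identities above are easy to sanity-check against finite differences. A minimal sketch in plain Python (the step sizes and tolerances are my own choices for illustration):

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def dphi_numeric(x, h=1e-5):
    """Central finite-difference estimate of phi'(x)."""
    return (phi(x + h) - phi(x - h)) / (2 * h)

def d2phi_numeric(x, h=1e-4):
    """Second-order finite-difference estimate of phi''(x)."""
    return (phi(x + h) - 2 * phi(x) + phi(x - h)) / (h * h)

for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    # phi'(x) = -x * phi(x)
    assert abs(dphi_numeric(x) - (-x * phi(x))) < 1e-8
    # phi''(x) = (x^2 - 1) * phi(x)
    assert abs(d2phi_numeric(x) - (x * x - 1) * phi(x)) < 1e-5
```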
Cumulative distribution function
The cumulative distribution function (CDF) describes the probability of a random variable falling in the interval (−∞, x].
The CDF of the standard normal distribution is denoted with the capital Greek letter Φ (phi), and can be computed as an integral of the probability density function:

    \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right].

This integral cannot be expressed in terms of elementary functions, so it is written in terms of a special function, the error function erf. Numerical methods for calculation of the standard normal CDF are discussed below. For a generic normal random variable with mean μ and variance σ² > 0, the CDF equals

    F(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right].
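Since Python's standard library exposes erf, the relation above translates directly into code. The helper names here are illustrative, not a library API:

```python
import math

def std_normal_cdf(x):
    """Phi(x) = (1 + erf(x / sqrt(2))) / 2, via the stdlib error function."""
    return (1 + math.erf(x / math.sqrt(2))) / 2

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2), by standardizing the argument."""
    return std_normal_cdf((x - mu) / sigma)

print(std_normal_cdf(0.0))             # 0.5
print(round(std_normal_cdf(1.96), 4))  # 0.975
```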
The complement of the standard normal CDF, Q(x) = 1 − Φ(x), is referred to as the Q-function, especially in engineering texts. It gives the tail probability of the Gaussian distribution, that is, the probability that a standard normal random variable X is greater than the number x. Other definitions of the Q-function, all of which are simple transformations of Φ, are also used occasionally.
Properties:
The standard normal CDF is 2-fold rotationally symmetric around the point (0, ½): Φ(−x) = 1 − Φ(x).
The derivative of Φ(x) is equal to the standard normal pdf ϕ(x): Φ′(x) = ϕ(x).
The antiderivative of Φ(x) is ∫ Φ(x) dx = x Φ(x) + ϕ(x) + C.
For a normal distribution with zero variance, the CDF is the Heaviside step function (with the H(0) = 1 convention):

    F(x) = H(x - \mu).
Quantile function
The inverse of the standard normal CDF, called the quantile function or probit function, is expressed in terms of the inverse error function:

    \Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1), \qquad p \in (0, 1).
Quantiles of the standard normal distribution are commonly denoted as z_p. The quantile z_p is the value such that a standard normal random variable X falls inside the interval (−∞, z_p] with probability exactly p. The quantiles are used in hypothesis testing, the construction of confidence intervals and Q-Q plots. The most "famous" normal quantile is 1.96 = z_{0.975}: a standard normal random variable is greater than 1.96 in absolute value in 5% of cases.
For a normal random variable with mean μ and variance σ², the quantile function is

    F^{-1}(p) = \mu + \sigma\,\Phi^{-1}(p) = \mu + \sigma\sqrt{2}\,\operatorname{erf}^{-1}(2p - 1).
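The Python standard library has erf but no inverse error function, so one simple (if slow) way to sketch the quantile function is to invert Φ by bisection. This is an illustrative approach, not how production libraries compute it:

```python
import math

def std_normal_cdf(x):
    return (1 + math.erf(x / math.sqrt(2))) / 2

def std_normal_quantile(p, tol=1e-12):
    """Invert Phi by bisection -- illustrative, not the fastest approach."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = -10.0, 10.0   # Phi is essentially 0 / 1 beyond +-10
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if std_normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def normal_quantile(p, mu=0.0, sigma=1.0):
    """Quantile of N(mu, sigma^2) via the standard normal quantile."""
    return mu + sigma * std_normal_quantile(p)

print(round(std_normal_quantile(0.975), 4))  # 1.96
```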
Characteristic function and moment generating function
The characteristic function φ_X(t) of a random variable X is defined as the expected value of e^{itX}, where i is the imaginary unit, and t ∈ R is the argument of the characteristic function. Thus the characteristic function is the Fourier transform of the density. For a normally distributed X with mean μ and variance σ², the characteristic function is

    \varphi_X(t) = e^{\,i\mu t - \frac{1}{2}\sigma^2 t^2}.
The characteristic function can be analytically extended to the entire complex plane: one defines φ(z) = e^{iμz − σ²z²/2} for all z ∈ C.
The moment generating function is defined as the expected value of e^{tX}. For a normal distribution, the moment generating function exists and is equal to

    M(t) = \operatorname{E}\!\left[e^{tX}\right] = e^{\,\mu t + \frac{1}{2}\sigma^2 t^2}.
The cumulant generating function is the logarithm of the moment generating function:

    g(t) = \ln M(t) = \mu t + \tfrac{1}{2}\sigma^2 t^2.

Since this is a quadratic polynomial in t, only the first two cumulants are nonzero.
Moments
{{see also|List of integrals of Gaussian functions}}
The normal distribution has moments of all orders. That is, for a normally distributed X with mean μ and variance σ², the expectation E[|X|^p] exists and is finite for all p such that Re[p] > −1. Usually we are interested only in moments of integer order: p = 1, 2, 3, …
Central moments are the moments of X around its mean μ. Thus, a central moment of order p is the expected value of (X − μ)^p. Using standardization of normal random variables, this expectation equals σ^p · E[Z^p], where Z is standard normal:

    \operatorname{E}\!\left[(X-\mu)^p\right] = \begin{cases} 0 & \text{if } p \text{ is odd,} \\ \sigma^p\,(p-1)!! & \text{if } p \text{ is even.} \end{cases}

Here n!! denotes the double factorial, that is, the product of every odd number from n down to 1.
Central absolute moments are the moments of |X − μ|. They coincide with regular central moments for all even orders, but are nonzero for all odd p:

    \operatorname{E}\!\left[\,|X-\mu|^p\,\right] = \sigma^p\,\frac{2^{p/2}\,\Gamma\!\left(\frac{p+1}{2}\right)}{\sqrt{\pi}}.

The last formula is also valid for any non-integer p > −1.
Raw moments and raw absolute moments are the moments of X and |X| respectively. The formulas for these moments are much more complicated, and are given in terms of the confluent hypergeometric functions ₁F₁ and U.{{Citation needed|date=June 2010}}
These expressions remain valid even if p is not an integer. See also generalized Hermite polynomials.
The first two cumulants are equal to μ and σ² respectively, whereas all higher-order cumulants are equal to zero.
Order | Raw moment | Central moment | Cumulant
1 | μ | 0 | μ
2 | μ² + σ² | σ² | σ²
3 | μ³ + 3μσ² | 0 | 0
4 | μ⁴ + 6μ²σ² + 3σ⁴ | 3σ⁴ | 0
5 | μ⁵ + 10μ³σ² + 15μσ⁴ | 0 | 0
6 | μ⁶ + 15μ⁴σ² + 45μ²σ⁴ + 15σ⁶ | 15σ⁶ | 0
7 | μ⁷ + 21μ⁵σ² + 105μ³σ⁴ + 105μσ⁶ | 0 | 0
8 | μ⁸ + 28μ⁶σ² + 210μ⁴σ⁴ + 420μ²σ⁶ + 105σ⁸ | 105σ⁸ | 0
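The central-moment column can be cross-checked numerically. The sketch below (plain Python; the Simpson's-rule quadrature and 12-sigma integration window are my own choices for illustration) verifies that the even central moments of a standard normal equal (p − 1)!! and the odd ones vanish:

```python
import math

def double_factorial(n):
    """n!! = n * (n - 2) * ... down to 1 (returns 1 for n <= 0)."""
    return 1 if n <= 0 else n * double_factorial(n - 2)

def central_moment(p, mu=0.0, sigma=1.0, n=20_000):
    """E[(X - mu)^p] for X ~ N(mu, sigma^2), by composite Simpson's rule."""
    a, b = mu - 12 * sigma, mu + 12 * sigma   # tails beyond 12 sigma are negligible
    h = (b - a) / n
    def f(x):
        z = (x - mu) / sigma
        return (x - mu) ** p * math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# Even central moments of the standard normal: (p - 1)!!  ->  1, 3, 15, 105
for p in (2, 4, 6, 8):
    assert abs(central_moment(p) - double_factorial(p - 1)) < 1e-4
# Odd central moments vanish by symmetry.
for p in (1, 3, 5, 7):
    assert abs(central_moment(p)) < 1e-9
```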
Standardizing normal random variables
As a consequence of property 1, it is possible to relate all normal random variables to the standard normal. For example, if X is normal with mean μ and variance σ², then
Z = (X − μ)/σ
has mean zero and unit variance; that is, Z has the standard normal distribution. Conversely, given a standard normal random variable Z, we can always construct another normal random variable with a specified mean μ and variance σ²:
X = μ + σZ.
This "standardizing" transformation is convenient, as it allows one to compute the PDF and especially the CDF of any normal distribution from a table of PDF and CDF values for the standard normal. They are related via
F_X(x) = Φ((x − μ)/σ),   f_X(x) = φ((x − μ)/σ)/σ.
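A minimal sketch of this reduction (my own function names; `math.erf` supplies the standard normal CDF through Φ(z) = (1 + erf(z/√2))/2):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), computed through the standardized variable."""
    return phi((x - mu) / sigma)

# One standard deviation above the mean gives the same value for any mu, sigma:
assert abs(normal_cdf(7.0, 5.0, 2.0) - phi(1.0)) < 1e-15
```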
Standard deviation and confidence intervals
About 68% of values drawn from a normal distribution lie within one standard deviation σ of the mean; about 95% lie within two standard deviations; and about 99.7% lie within three standard deviations. This fact is known as the 68-95-99.7 rule, the empirical rule, or the 3-sigma rule.
To be more precise, the area under the bell curve between μ − nσ and μ + nσ is given by
F(μ + nσ) − F(μ − nσ) = Φ(n) − Φ(−n) = erf(n/√2),
where erf is the error function. To 12 decimal places, the values for the 1- through 6-sigma points are:
n   erf(n/√2)         1 − erf(n/√2)     i.e. 1 in
1   0.682689492137    0.317310507863    3.15148718753
2   0.954499736104    0.045500263896    21.9778945080
3   0.997300203937    0.002699796063    370.398347345
4   0.999936657516    0.000063342484    15787.1927673
5   0.999999426697    0.000000573303    1744277.89362
6   0.999999998027    0.000000001973    506797345.897
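The entries above can be reproduced with a short stdlib-only sketch: the probability of falling within n standard deviations of the mean is erf(n/√2).

```python
from math import erf, sqrt

def within_n_sigma(n):
    """P(|X - mu| <= n*sigma) for a normally distributed X."""
    return erf(n / sqrt(2))

# Spot-check against the tabulated 12-decimal values
table = {1: 0.682689492137, 2: 0.954499736104, 3: 0.997300203937}
for n, p in table.items():
    assert abs(within_n_sigma(n) - p) < 1e-12
```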
The next table gives the reverse relation: the sigma multiples n corresponding to a few often-used values of the area p under the bell curve. These values are useful for determining (asymptotic) confidence intervals of the specified levels based on normally distributed (or asymptotically normal) estimators:
p       n                  p             n
0.80    1.281551565545     0.999         3.290526731492
0.90    1.644853626951     0.9999        3.890591886413
0.95    1.959963984540     0.99999       4.417173413469
0.98    2.326347874041     0.999999      4.891638475699
0.99    2.575829303549     0.9999999     5.326723886384
0.995   2.807033768344     0.99999999    5.730728868236
0.998   3.090232306168     0.999999999   6.109410204869
where the value on the left of the table is the proportion of values that will fall within a given interval and n is a multiple of the standard deviation that specifies the width of the interval.
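Without an inverse error function in the standard library, the sigma multiples in this table can be recovered by bisection on erf (a sketch; `sigma_multiple` is an illustrative name):

```python
from math import erf, sqrt

def sigma_multiple(p, tol=1e-13):
    """Solve erf(n / sqrt(2)) = p for n by bisection on [0, 10]."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if erf(mid / sqrt(2)) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

assert abs(sigma_multiple(0.95) - 1.959963984540) < 1e-9
assert abs(sigma_multiple(0.99) - 2.575829303549) < 1e-9
```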
Central limit theorem
{{MainCentral limit theorem}}
The theorem states that under certain (fairly common) conditions, the sum of a large number of random variables will have an approximately normal distribution. For example, if x1, …, xn is a sequence of i.i.d. random variables, each having mean μ and variance σ², then the central limit theorem states that
√n (x̄ − μ)  →d  N(0, σ²)   as n → ∞.
The theorem holds even if the summands xi are not i.i.d., although some constraints on the degree of dependence and on the growth rate of the moments still have to be imposed.
The importance of the central limit theorem cannot be overemphasized. A great number of test statistics, scores, and estimators encountered in practice contain sums of certain random variables; even more estimators can be represented as sums of random variables through the use of influence functions. All of these quantities are governed by the central limit theorem, and will have an asymptotically normal distribution as a result.
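A classic illustration (a seeded Monte Carlo sketch, not from the original text): the sum of twelve Uniform(0, 1) variables has mean 6 and variance 1, and its distribution is already close to standard normal.

```python
import random
from math import erf, sqrt

random.seed(3)
n_terms, n_reps = 12, 100_000

# Sum of 12 Uniform(0,1) draws: mean 12 * 1/2 = 6, variance 12 * 1/12 = 1
frac_below = sum(
    (sum(random.random() for _ in range(n_terms)) - 6.0) <= 1.0
    for _ in range(n_reps)
) / n_reps

phi_1 = 0.5 * (1 + erf(1 / sqrt(2)))  # standard normal CDF at 1, about 0.8413
assert abs(frac_below - phi_1) < 0.01
```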
Another practical consequence of the central limit theorem is that certain other distributions can be approximated by the normal distribution, for example:
The binomial distribution B(n, p) is approximately normal N(np, np(1 − p)) for large n and for p not too close to zero or one.
The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
The chi-squared distribution χ²(k) is approximately normal N(k, 2k) for large k.
The Student's t-distribution t(ν) is approximately normal N(0, 1) when ν is large.
Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution.
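The binomial case can be checked directly (a stdlib-only sketch; the exact CDF uses `math.comb`, the approximation the usual continuity correction):

```python
from math import comb, erf, sqrt

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation N(np, np(1-p)) with continuity correction."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return 0.5 * (1 + erf((k + 0.5 - mu) / (sigma * sqrt(2))))

# For n = 100, p = 0.5 the two agree to a few parts in a thousand
assert abs(binom_cdf(55, 100, 0.5) - normal_approx_cdf(55, 100, 0.5)) < 0.005
```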
A general upper bound for the approximation error in the central limit theorem is given by the Berry–Esseen theorem; improvements of the approximation are given by the Edgeworth expansions.
Miscellaneous
The family of normal distributions is closed under linear transformations. That is, if X is normally distributed with mean μ and variance σ², then the linear transform aX + b (for some real numbers a and b) is also normally distributed:
aX + b ∼ N(aμ + b, a²σ²).
Also, if X1, X2 are two independent normal random variables, with means μ1, μ2 and standard deviations σ1, σ2, then any linear combination of them will also be normally distributed (see sum of normally distributed random variables for a proof):
aX1 + bX2 ∼ N(aμ1 + bμ2, a²σ1² + b²σ2²).
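A seeded Monte Carlo sketch of the sum rule (the parameter values are arbitrary):

```python
import random
from statistics import fmean, pvariance

random.seed(0)
mu1, s1, mu2, s2 = 1.0, 2.0, -3.0, 0.5
samples = [random.gauss(mu1, s1) + random.gauss(mu2, s2) for _ in range(200_000)]

# Theory: X1 + X2 ~ N(mu1 + mu2, s1^2 + s2^2)
assert abs(fmean(samples) - (mu1 + mu2)) < 0.05
assert abs(pvariance(samples) - (s1**2 + s2**2)) < 0.1
```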
The converse of (1) is also true: if X1 and X2 are independent and their sum X1 + X2 is distributed normally, then both X1 and X2 must also be normal. This is known as Cramér's decomposition theorem. The interpretation of this property is that a normal distribution is only divisible by other normal distributions. Another application of this property is in connection with the central limit theorem: although the CLT asserts that the distribution of a sum of arbitrary non-normal i.i.d. random variables is approximately normal, Cramér's theorem shows that it can never become exactly normal.
If the characteristic function φX of some random variable X is of the form φX(t) = e^{Q(t)}, where Q(t) is a polynomial, then the Marcinkiewicz theorem (named after Józef Marcinkiewicz) asserts that Q can be at most a quadratic polynomial, and therefore X is a normal random variable. A consequence of this result is that the normal distribution is the only distribution with a finite number (two) of nonzero cumulants.
If X and Y are jointly normal and uncorrelated, then they are independent. The requirement that X and Y be jointly normal is essential; without it the property does not hold (see normally distributed and uncorrelated does not imply independent for a proof). For non-normal random variables uncorrelatedness does not imply independence.
If X and Y are independent N(μ, σ²) random variables, then X + Y and X − Y are also independent and identically distributed (this follows from the polarization identity). This property uniquely characterizes the normal distribution, as can be seen from Bernstein's theorem: if X and Y are independent and such that X + Y and X − Y are also independent, then both X and Y must necessarily have normal distributions.
More generally, if X1, ..., Xn are independent random variables, then two linear combinations ∑akXk and ∑bkXk will be independent if and only if all the Xk are normal and ∑akbkσk² = 0, where σk² denotes the variance of Xk.
The normal distribution is infinitely divisible: for a normally distributed X with mean μ and variance σ², we can find n independent random variables X1, …, Xn, each distributed normally with mean μ/n and variance σ²/n, such that the sum X1 + X2 + ⋯ + Xn has the same N(μ, σ²) distribution as X.
The normal distribution is stable (with exponent α = 2): if X1, X2 are two independent N(μ, σ²) random variables and a, b are arbitrary real numbers, then
aX1 + bX2 ∼ √(a² + b²) · X3 + (a + b − √(a² + b²)) μ,
where X3 is also N(μ, σ²). This relationship directly follows from property (1).
The Kullback–Leibler divergence between two normal distributions X1 ∼ N(μ1, σ1²) and X2 ∼ N(μ2, σ2²) is given by:
DKL(X1 ∥ X2) = ln(σ2/σ1) + (σ1² + (μ1 − μ2)²) / (2σ2²) − 1/2.
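A numerical sketch of this divergence (the well-known closed form, implemented directly; the function name is my own):

```python
from math import log

def kl_normal(mu1, sigma1, mu2, sigma2):
    """Kullback-Leibler divergence D( N(mu1, sigma1^2) || N(mu2, sigma2^2) )."""
    return (log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)
            - 0.5)

assert kl_normal(0, 1, 0, 1) == 0.0   # identical distributions
assert kl_normal(1, 1, 0, 1) == 0.5   # pure mean shift of 1: 1^2 / 2
```

Note the divergence is not symmetric: D(X1 ∥ X2) generally differs from D(X2 ∥ X1) when the variances differ.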
The Hellinger distance between the same distributions is equal to
The Fisher information matrix for the normal distribution is diagonal and takes the form
I(μ, σ²) = diag(1/σ², 1/(2σ⁴)).
The normal distribution belongs to an exponential family with natural parameters θ1 = μ/σ² and θ2 = −1/(2σ²), and natural statistics x and x². The dual, expectation parameters for the normal distribution are η1 = μ and η2 = μ² + σ².
The conjugate prior of the mean of a normal distribution is another normal distribution. Specifically, if x1, …, xn are i.i.d. N(μ, σ²) and the prior is μ ∼ N(μ0, σ0²), then the posterior distribution of μ will be
μ | x1, …, xn ∼ N( (μ0/σ0² + n x̄/σ²) / (1/σ0² + n/σ²), (1/σ0² + n/σ²)⁻¹ ).
Of all probability distributions over the reals with mean μ and variance σ², the normal distribution N(μ, σ²) is the one with maximum entropy.
The family of normal distributions forms a manifold with constant curvature −1. The same family is flat with respect to the (±1)-connections ∇(e) and ∇(m).
Operations on a single random variable
If X is distributed normally with mean μ and variance σ2, then
The exponential of X is distributed log-normally: e^X ∼ ln N(μ, σ²).
The absolute value of X has a folded normal distribution: |X| ∼ Nf(μ, σ²). If μ = 0 this is known as the half-normal distribution.
The square of X/σ has the noncentral chi-squared distribution with one degree of freedom: X²/σ² ∼ χ²₁(μ²/σ²). If μ = 0, the distribution is called simply chi-squared.
The distribution of the variable X restricted to an interval [a, b] is called the truncated normal distribution.
(X − μ)⁻² has a Lévy distribution with location 0 and scale σ⁻².
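For the log-normal case, the mean of e^X is exp(μ + σ²/2), which a seeded Monte Carlo sketch confirms:

```python
import random
from math import exp

random.seed(2)
mu, sigma = 0.3, 0.5
n = 200_000
mean_exp = sum(exp(random.gauss(mu, sigma)) for _ in range(n)) / n

# If X ~ N(mu, sigma^2), then e^X is log-normal with mean exp(mu + sigma^2 / 2)
assert abs(mean_exp - exp(mu + sigma**2 / 2)) < 0.02
```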
Combination of two independent random variables
If X1 and X2 are two independent standard normal random variables, then
Their sum and difference are each distributed normally with mean zero and variance two: X1 ± X2 ∼ N(0, 2).
Their product Z = X1·X2 follows the "product-normal" distribution with density function fZ(z) = π⁻¹ K0(|z|), where K0 is the modified Bessel function of the second kind. This distribution is symmetric around zero, unbounded at z = 0, and has the characteristic function φZ(t) = (1 + t²)^{−1/2}.
Their ratio follows the standard Cauchy distribution: X1 ÷ X2 ∼ Cauchy(0, 1).
Their Euclidean norm √(X1² + X2²) has the Rayleigh distribution, also known as the chi distribution with 2 degrees of freedom.
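The Rayleigh claim can be spot-checked with a seeded simulation, using the unit-scale Rayleigh CDF P(√(X1² + X2²) ≤ r) = 1 − e^{−r²/2}:

```python
import random
from math import exp, hypot

random.seed(1)
n, r = 200_000, 1.5
frac = sum(
    hypot(random.gauss(0, 1), random.gauss(0, 1)) <= r for _ in range(n)
) / n

# Rayleigh CDF with unit scale: 1 - exp(-r^2 / 2)
assert abs(frac - (1 - exp(-r**2 / 2))) < 0.01
```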
Combination of two or more independent random variables
If X1, X2, …, Xn are independent standard normal random variables, then the sum of their squares has the chi-squared distribution with n degrees of freedom: X1² + ⋯ + Xn² ∼ χ²n.
If X1, X2, …, Xn are independent normally distributed random variables with mean μ and variance σ², then their sample mean is independent of the sample standard deviation, which can be demonstrated using Basu's theorem or Cochran's theorem. The ratio of these two quantities will have the Student's t-distribution with n − 1 degrees of freedom:
t = (x̄ − μ) / (s/√n) ∼ t(n − 1).
If X1, …, Xn, Y1, …, Ym are independent standard normal random variables, then the ratio of their normalized sums of squares will have the F-distribution with (n, m) degrees of freedom:
F = [(X1² + ⋯ + Xn²)/n] / [(Y1² + ⋯ + Ym²)/m] ∼ F(n, m).
Operations on the density function
The split normal distribution is most directly defined by joining scaled sections of the density functions of different normal distributions at the mode and rescaling the density to integrate to one. The truncated normal distribution results from rescaling a section of a single density function.
Extensions
The notion of normal distribution, being one of the most important distributions in probability theory, has been extended far beyond the standard framework of the univariate (that is, one-dimensional) case. All of these extensions are also called normal or Gaussian laws, so a certain ambiguity in names exists.
The multivariate normal distribution describes the Gaussian law in the k-dimensional Euclidean space. A vector X ∈ R^k is multivariate-normally distributed if any linear combination of its components, ∑(j=1..k) aj Xj, has a (univariate) normal distribution. The variance of X is a k×k symmetric positive-definite matrix V.
The rectified Gaussian distribution is a modification of the normal distribution with all its negative elements reset to 0.
The complex normal distribution deals with complex normal vectors. A complex vector X ∈ C^k is said to be normal if both its real and imaginary components jointly possess a 2k-dimensional multivariate normal distribution. The variance-covariance structure of X is described by two matrices: the variance matrix Γ and the relation matrix C.
The matrix normal distribution describes the case of normally distributed matrices.
Gaussian processes are the normally distributed stochastic processes. These can be viewed as elements of some infinite-dimensional Hilbert space H, and thus are the analogues of multivariate normal vectors for the case k = ∞. A random element h ∈ H is said to be normal if for any constant a ∈ H the scalar product (a, h) has a (univariate) normal distribution. The variance structure of such a Gaussian random element can be described in terms of the linear covariance operator K: H → H. Several Gaussian processes became popular enough to have their own names:
Brownian motion (the Wiener process),
Brownian bridge,
Ornstein–Uhlenbeck process.
The Gaussian q-distribution is an abstract mathematical construction which represents a "q-analogue" of the normal distribution.
The q-Gaussian is an analogue of the Gaussian distribution, in the sense that it maximises the Tsallis entropy, and is one type of Tsallis distribution. Note that this distribution is different from the Gaussian q-distribution above.
One of the main practical uses of the Gaussian law is to model the empirical distributions of many different random variables encountered in practice. In such a case a possible extension would be a richer family of distributions, having more than two parameters and therefore being able to fit the empirical distribution more accurately. Examples of such extensions are:
The Pearson distribution — a four-parameter family of probability distributions that extends the normal law to include different skewness and kurtosis values.
Normality tests
{{MainNormality tests}}
Normality tests assess the likelihood that the given data set {x1, …, xn} comes from a normal distribution. Typically the null hypothesis H0 is that the observations are distributed normally with unspecified mean μ and variance σ², versus the alternative Ha that the distribution is arbitrary. A great number of tests (over 40) have been devised for this problem; the more prominent of them are outlined below:
"Visual" tests are more intuitively appealing but subjective at the same time, as they rely on informal human judgement to accept or reject the null hypothesis.
QQ plot — a plot of the sorted values from the data set against the expected values of the corresponding quantiles from the standard normal distribution. That is, it is a plot of points of the form (Φ⁻¹(pk), x(k)), where the plotting points pk are equal to pk = (k − α)/(n + 1 − 2α) and α is an adjustment constant, which can be anything between 0 and 1. If the null hypothesis is true, the plotted points should approximately lie on a straight line.
PP plot — similar to the QQ plot, but used much less frequently. This method consists of plotting the points (Φ(z(k)), pk), where z(k) = (x(k) − μ̂)/σ̂ are the standardized sorted observations. For normally distributed data this plot should lie on a 45° line between (0, 0) and (1, 1).
The Shapiro–Wilk test employs the fact that the line in the QQ plot has slope σ. The test compares the least squares estimate of that slope with the value of the sample variance, and rejects the null hypothesis if these two quantities differ significantly.
Normal probability plot (rankit plot)
Moment tests:
D'Agostino's K-squared test
Jarque–Bera test
Empirical distribution function tests:
Lilliefors test (an adaptation of the Kolmogorov–Smirnov test)
Anderson–Darling test
Estimation of parameters
It is often the case that we do not know the parameters of the normal distribution but instead want to estimate them. That is, having a sample (x1, …, xn) from a normal N(μ, σ²) population, we would like to learn the approximate values of the parameters μ and σ². The standard approach to this problem is the maximum likelihood method, which requires maximization of the log-likelihood function:
ln L(μ, σ²) = −(n/2) ln(2π) − (n/2) ln σ² − (1/(2σ²)) ∑i (xi − μ)².
Taking derivatives with respect to μ and σ² and solving the resulting system of first-order conditions yields the maximum likelihood estimates:
μ̂ = x̄ = (1/n) ∑i xi,    σ̂² = (1/n) ∑i (xi − x̄)².
Estimator $\backslash scriptstyle\backslash hat\backslash mu$ is called the sample mean, since it is the arithmetic mean of all observations. The statistic $\backslash scriptstyle\backslash overline\{x\}$ is complete and sufficient for μ, and therefore by the Lehmann–Scheffé theoremLehmann–Scheffé theoremIn statistics, the Lehmann–Scheffé theorem is prominent in mathematical statistics, tying together the ideas of completeness, sufficiency, uniqueness, and best unbiased estimation..., $\backslash scriptstyle\backslash hat\backslash mu$ is the uniformly minimum variance unbiased (UMVU) estimator. In finite samples it is distributed normally:
The variance of this estimator is equal to the μμelement of the inverse Fisher information matrix $\backslash scriptstyle\backslash mathcal\{I\}^\{1\}$. This implies that the estimator is finitesample efficient. Of practical importance is the fact that the standard errorStandard error (statistics)The standard error is the standard deviation of the sampling distribution of a statistic. The term may also be used to refer to an estimate of that standard deviation, derived from a particular sample used to compute the estimate.... of $\backslash scriptstyle\backslash hat\backslash mu$ is proportional to $\backslash scriptstyle1/\backslash sqrt\{n\}$, that is, if one wishes to decrease the standard error by a factor of 10, one must increase the number of points in the sample by a factor of 100. This fact is widely used in determining sample sizes for opinion pollOpinion pollAn opinion poll, sometimes simply referred to as a poll is a survey of public opinion from a particular sample. Opinion polls are usually designed to represent the opinions of a population by conducting a series of questions and then extrapolating generalities in ratio or within confidence...s and the number of trials in Monte Carlo simulations.
From the standpoint of asymptotic theory, $\hat\mu$ is consistent, that is, it converges in probability to μ as n → ∞. The estimator is also asymptotically normal, which is a simple corollary of the fact that it is normal in finite samples:
The estimator $\hat\sigma^2$ is called the sample variance, since it is the variance of the sample (x1, …, xn). In practice, another estimator is often used instead of $\hat\sigma^2$. This other estimator is denoted s², and is also called the sample variance, which represents a certain ambiguity in terminology; its square root s is called the sample standard deviation. The estimator s² differs from $\hat\sigma^2$ by having (n − 1) instead of n in the denominator (the so-called Bessel's correction):
The difference between s² and $\hat\sigma^2$ becomes negligibly small for large n. In finite samples, however, the motivation behind the use of s² is that it is an unbiased estimator of the underlying parameter σ², whereas $\hat\sigma^2$ is biased. Also, by the Lehmann–Scheffé theorem the estimator s² is uniformly minimum variance unbiased (UMVU), which makes it the "best" estimator among all unbiased ones. However it can be shown that the biased estimator $\hat\sigma^2$ is "better" than s² in terms of the mean squared error (MSE) criterion. In finite samples both s² and $\hat\sigma^2$ have a scaled chi-squared distribution with (n − 1) degrees of freedom:
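As a quick numerical sketch of the distinction between the two estimators (the sample size, seed, and parameters below are arbitrary choices for illustration):

```python
import random

random.seed(0)
n = 10_000
xs = [random.gauss(5.0, 2.0) for _ in range(n)]   # true sigma^2 = 4

mean = sum(xs) / n
sigma2_hat = sum((x - mean) ** 2 for x in xs) / n    # MLE: divides by n
s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)      # Bessel-corrected: n - 1
# The two always differ by exactly the factor n / (n - 1),
# so s2 is slightly larger and the gap vanishes as n grows.
```

For n = 10,000 the two values agree to roughly four significant figures, which is why the distinction matters mainly for small samples.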
The first of these expressions shows that the variance of s² is equal to 2σ⁴/(n − 1), which is slightly greater than the σσ-element of the inverse Fisher information matrix $\mathcal{I}^{-1}$. Thus, s² is not an efficient estimator for σ², and moreover, since s² is UMVU, we can conclude that a finite-sample efficient estimator for σ² does not exist.
Applying the asymptotic theory, both estimators s² and $\hat\sigma^2$ are consistent, that is, they converge in probability to σ² as the sample size n → ∞. The two estimators are also both asymptotically normal:
In particular, both estimators are asymptotically efficient for σ².
By Cochran's theorem, for the normal distribution the sample mean $\hat\mu$ and the sample variance s² are independent, which means there can be no gain in considering their joint distribution. There is also a converse theorem: if in a sample the sample mean and sample variance are independent, then the sample must have come from the normal distribution. The independence between $\hat\mu$ and s can be employed to construct the so-called t-statistic:
This quantity t has the Student's t-distribution with (n − 1) degrees of freedom, and it is an ancillary statistic (independent of the value of the parameters). Inverting the distribution of this t-statistic allows us to construct the confidence interval for μ; similarly, inverting the χ² distribution of the statistic s² gives us the confidence interval for σ²:
where tk,p and χ²k,p are the pth quantiles of the t- and χ²-distributions respectively. These confidence intervals are of the level 1 − α, meaning that the true values μ and σ² fall outside of these intervals with probability α. In practice people usually take α = 5%, resulting in the 95% confidence intervals. The approximate formulas in the display above were derived from the asymptotic distributions of $\hat\mu$ and s². The approximate formulas become valid for large values of n, and are more convenient for manual calculation since the standard normal quantiles zα/2 do not depend on n. In particular, the most popular value α = 5% results in |z0.025| = 1.96.
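A minimal sketch of the approximate large-n intervals just described, using the quantile z0.025 ≈ 1.96 (the data, seed, and parameter values are hypothetical):

```python
import math
import random

random.seed(1)
n = 5_000
mu_true, sigma_true = 10.0, 3.0
xs = [random.gauss(mu_true, sigma_true) for _ in range(n)]

xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
z = 1.96  # standard normal 97.5% quantile, so the interval level is 95%

# Approximate 95% confidence intervals from the asymptotic distributions:
#   mu:      xbar +- z * sqrt(s2 / n)
#   sigma^2: s2 +- z * s2 * sqrt(2 / n)
mu_ci = (xbar - z * math.sqrt(s2 / n), xbar + z * math.sqrt(s2 / n))
var_ci = (s2 - z * s2 * math.sqrt(2.0 / n), s2 + z * s2 * math.sqrt(2.0 / n))
```

Because these are the asymptotic versions, they avoid the t and χ² quantiles entirely; for small n the exact intervals in the display above should be used instead.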
Occurrence
The occurrence of the normal distribution in practical problems can be loosely classified into three categories:
Exactly normal distributions;
Approximately normal laws, for example when such approximation is justified by the central limit theorem; and
Distributions modeled as normal — the normal distribution being the distribution with maximum entropy for a given mean and variance.
Exact normality
Certain quantities in physics are distributed normally, as was first demonstrated by James Clerk Maxwell. Examples of such quantities are:
Velocities of the molecules in an ideal gas. More generally, velocities of the particles in any system in thermodynamic equilibrium will have a normal distribution, due to the maximum entropy principle.
Probability density function of a ground state in a quantum harmonic oscillator.
The position of a particle which experiences diffusion. If initially the particle is located at a specific point (that is, its probability distribution is the Dirac delta function), then after time t its location is described by a normal distribution with variance t, which satisfies the diffusion equation $\frac{\partial}{\partial t} f(x,t) = \tfrac{1}{2} \frac{\partial^2}{\partial x^2} f(x,t)$. If the initial location is given by a certain density function g(x), then the density at time t is the convolution of g and the normal PDF.
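The variance-t behavior of a diffusing particle can be checked with a small simulation, approximating the trajectory by a sum of independent Gaussian increments (step count, trial count, and seed are arbitrary illustrative choices):

```python
import random

random.seed(2)
t, steps, trials = 4.0, 50, 10_000
dt = t / steps

positions = []
for _ in range(trials):
    x = 0.0                               # particle starts at the origin
    for _ in range(steps):
        x += random.gauss(0.0, dt ** 0.5) # each increment has variance dt
    positions.append(x)

# Since the mean position is 0, this estimates Var of the final position,
# which should be close to t = 4.
var = sum(p * p for p in positions) / trials
```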
Approximate normality
Approximately normal distributions occur in many situations, as explained by the central limit theorem. When the outcome is produced by a large number of small effects acting additively and independently, its distribution will be close to normal. The normal approximation will not be valid if the effects act multiplicatively (instead of additively), or if there is a single external influence which has a considerably larger magnitude than the rest of the effects.
In counting problems, where the central limit theorem includes a discrete-to-continuum approximation and where infinitely divisible and decomposable distributions are involved, such as
Binomial random variables, associated with binary response variables;
Poisson random variables, associated with rare events;
Thermal light has a Bose–Einstein distribution on very short time scales, and a normal distribution on longer timescales due to the central limit theorem.
Assumed normality
"I can only recognize the occurrence of the normal curve — the Laplacian curve of errors — as a very abnormal phenomenon. It is roughly approximated to in certain distributions; for this reason, and on account of its beautiful simplicity, we may, perhaps, use it as a first approximation, particularly in theoretical investigations." — (Pearson 1901)
There are statistical methods to test that assumption empirically; see the Normality tests section above.
In biology, the logarithm of various variables tends to have a normal distribution, that is, the variables tend to have a log-normal distribution (after separation on male/female subpopulations), with examples including:
Measures of size of living tissue (length, height, skin area, weight);
The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth; presumably the thickness of tree bark also falls under this category;
Certain physiological measurements, such as blood pressure of adult humans.
In finance, in particular the Black–Scholes model, changes in the logarithm of exchange rates, price indices, and stock market indices are assumed normal (these variables behave like compound interest, not like simple interest, and so are multiplicative). Some mathematicians such as Benoît Mandelbrot have argued that log-Lévy distributions, which possess heavy tails, would be a more appropriate model, in particular for the analysis of stock market crashes.
Measurement errors in physical experiments are often modeled by a normal distribution. This use of a normal distribution does not imply that one is assuming the measurement errors are normally distributed; rather, using the normal distribution produces the most conservative predictions possible given only knowledge about the mean and variance of the errors.
In standardized testing, results can be made to have a normal distribution. This is done by either selecting the number and difficulty of questions (as in the IQ test), or by transforming the raw test scores into "output" scores by fitting them to the normal distribution. For example, the SAT's traditional range of 200–800 is based on a normal distribution with a mean of 500 and a standard deviation of 100.
Many scores are derived from the normal distribution, including percentile ranks ("percentiles" or "quantiles"), normal curve equivalents, stanines, z-scores, and T-scores. Additionally, a number of behavioral statistical procedures are based on the assumption that scores are normally distributed; for example, t-tests and ANOVAs. Bell curve grading assigns relative grades based on a normal distribution of scores.
In hydrology the distribution of long-duration river discharge or rainfall (e.g. monthly and yearly totals, consisting of the sum of 30 or 360 daily values respectively) is often thought to be practically normal according to the central limit theorem. The blue picture illustrates an example of fitting the normal distribution to ranked October rainfalls showing the 90% confidence belt based on the binomial distribution. The rainfall data are represented by plotting positions as part of the cumulative frequency analysis.
Generating values from the normal distribution
In computer simulations, especially in applications of the Monte Carlo method, it is often desirable to generate values that are normally distributed. The algorithms listed below all generate standard normal deviates, since an N(μ, σ²) variate can be generated as X = μ + σZ, where Z is standard normal. All these algorithms rely on the availability of a random number generator U capable of producing uniform random variates.
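The shift-and-scale step is a single line; in this sketch `random.gauss` merely stands in for whichever standard normal source one of the algorithms below provides (the parameter values are arbitrary):

```python
import random

random.seed(3)
mu, sigma = 7.0, 0.5   # target distribution N(7, 0.25)

def normal_deviate(mu, sigma):
    z = random.gauss(0.0, 1.0)   # stand-in standard normal deviate Z
    return mu + sigma * z        # X = mu + sigma * Z is N(mu, sigma^2)

sample = [normal_deviate(mu, sigma) for _ in range(50_000)]
mean = sum(sample) / len(sample)
var = sum((x - mean) ** 2 for x in sample) / len(sample)
```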
The most straightforward method is based on the probability integral transform property: if U is distributed uniformly on (0,1), then Φ−1(U) will have the standard normal distribution. The drawback of this method is that it relies on calculation of the probit function Φ−1, which cannot be done analytically. Some approximate methods are described in (Hart 1968) and in the erf article.
An easy-to-program approximate approach that relies on the central limit theorem is as follows: generate 12 uniform U(0,1) deviates, add them all up, and subtract 6 — the resulting random variable will have approximately standard normal distribution. In truth, the distribution will be Irwin–Hall, which is a 12-section eleventh-order polynomial approximation to the normal distribution. This random deviate will have a limited range of (−6, 6).
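The twelve-uniforms recipe can be sketched directly (sample size and seed are arbitrary):

```python
import random

random.seed(4)

def approx_std_normal():
    # Sum of 12 U(0,1) deviates has mean 6 and variance 12 * (1/12) = 1,
    # so subtracting 6 gives an approximately standard normal deviate.
    return sum(random.random() for _ in range(12)) - 6.0

sample = [approx_std_normal() for _ in range(100_000)]
mean = sum(sample) / len(sample)
var = sum((x - mean) ** 2 for x in sample) / len(sample)
```

The variance works out exactly because each uniform contributes 1/12; the approximation error lives in the tails, which are cut off at ±6.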
The Box–Muller method uses two independent random numbers U and V distributed uniformly on (0,1). Then the two random variables X and Y
will both have the standard normal distribution, and will be independent. This formulation arises because for a bivariate normal random vector (X, Y) the squared norm X² + Y² will have the chi-squared distribution with two degrees of freedom, which is an easily generated exponential random variable corresponding to the quantity −2ln(U) in these equations; and the angle is distributed uniformly around the circle, chosen by the random variable V.
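A self-contained Box–Muller sketch (seed and sample size are arbitrary):

```python
import math
import random

random.seed(5)

def box_muller():
    u = 1.0 - random.random()          # in (0, 1], avoids log(0)
    v = random.random()
    r = math.sqrt(-2.0 * math.log(u))  # radius: -2 ln U is exponential
    theta = 2.0 * math.pi * v          # uniform angle on the circle
    return r * math.cos(theta), r * math.sin(theta)

xs = [box_muller()[0] for _ in range(50_000)]
mean_x = sum(xs) / len(xs)
var_x = sum((x - mean_x) ** 2 for x in xs) / len(xs)
```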
The Marsaglia polar method is a modification of the Box–Muller algorithm which does not require computation of the functions sin and cos. In this method U and V are drawn from the uniform (−1,1) distribution, and then S = U² + V² is computed. If S is greater than or equal to one, the method starts over; otherwise the two quantities
are returned. Again, X and Y will be independent and standard normally distributed.
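A sketch of the polar method as just described; note the rejection loop and the absence of trigonometric calls (seed and sample size are arbitrary):

```python
import math
import random

random.seed(6)

def polar_deviate():
    while True:
        u = 2.0 * random.random() - 1.0   # uniform on (-1, 1)
        v = 2.0 * random.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:                 # keep only points inside the unit disk
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor             # v * factor is the second, independent deviate

sample = [polar_deviate() for _ in range(50_000)]
mean = sum(sample) / len(sample)
var = sum((x - mean) ** 2 for x in sample) / len(sample)
```

Roughly π/4 ≈ 79% of candidate points fall inside the disk, so the loop rarely runs more than twice.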
The Ratio method is a rejection method. The algorithm proceeds as follows:
Generate two independent uniform deviates U and V;
Compute X = √(8/e) (V − 0.5)/U;
If X² ≤ 5 − 4e^{1/4}U then accept X and terminate the algorithm;
If X² ≥ 4e^{−1.35}/U + 1.4 then reject X and start over from step 1;
If X² ≤ −4 ln U then accept X; otherwise start over the algorithm.
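The steps above can be sketched directly; the quick accept and quick reject tests exist only to avoid the logarithm in the exact test on most iterations (seed and sample size are arbitrary):

```python
import math
import random

random.seed(7)

def ratio_deviate():
    while True:
        u = random.random()
        if u == 0.0:
            continue                                   # guard against log(0) below
        v = random.random()
        x = math.sqrt(8.0 / math.e) * (v - 0.5) / u    # step 2
        x2 = x * x
        if x2 <= 5.0 - 4.0 * math.exp(0.25) * u:       # step 3: quick accept
            return x
        if x2 >= 4.0 * math.exp(-1.35) / u + 1.4:      # step 4: quick reject
            continue
        if x2 <= -4.0 * math.log(u):                   # step 5: exact test
            return x

sample = [ratio_deviate() for _ in range(50_000)]
mean = sum(sample) / len(sample)
var = sum((x - mean) ** 2 for x in sample) / len(sample)
```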
The ziggurat algorithm (Marsaglia & Tsang 2000) is faster than the Box–Muller transform and still exact. In about 97% of all cases it uses only two random numbers, one random integer and one random uniform, one multiplication and an if-test. Only in the 3% of cases where the combination of those two falls outside the "core of the ziggurat" does a kind of rejection sampling using logarithms, exponentials and more uniform random numbers have to be employed.
There is also some investigation into the connection between the fast Hadamard transform and the normal distribution, since the transform employs just addition and subtraction and by the central limit theorem random numbers from almost any distribution will be transformed into the normal distribution. In this regard a series of Hadamard transforms can be combined with random permutations to turn arbitrary data sets into normally distributed data.
Numerical approximations for the normal CDF
The standard normal CDF is widely used in scientific and statistical computing. The values Φ(x) may be approximated very accurately by a variety of methods, such as numerical integration, Taylor series, asymptotic series and continued fractions. Different approximations are used depending on the desired level of accuracy.
(Abramowitz & Stegun 1964) give the approximation for Φ(x) for x > 0 with absolute error ε(x) < 7.5·10⁻⁸ (algorithm 26.2.17):
where ϕ(x) is the standard normal PDF, and b0 = 0.2316419, b1 = 0.319381530, b2 = −0.356563782, b3 = 1.781477937, b4 = −1.821255978, b5 = 1.330274429.
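A sketch of algorithm 26.2.17 with the coefficients listed above; negative arguments are handled through the symmetry Φ(−x) = 1 − Φ(x), which is an implementation choice rather than part of the published algorithm:

```python
import math

# Coefficients b0..b5 from Abramowitz & Stegun, algorithm 26.2.17
B0, B1, B2, B3, B4, B5 = (0.2316419, 0.319381530, -0.356563782,
                          1.781477937, -1.821255978, 1.330274429)

def std_normal_cdf(x):
    if x < 0.0:
        return 1.0 - std_normal_cdf(-x)   # symmetry handles negative x
    t = 1.0 / (1.0 + B0 * x)
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)   # phi(x)
    # Horner evaluation of b1*t + b2*t^2 + ... + b5*t^5
    poly = t * (B1 + t * (B2 + t * (B3 + t * (B4 + t * B5))))
    return 1.0 - pdf * poly
```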
(Hart 1968) lists almost a hundred rational function approximations for the erfc function. His algorithms vary in degree of complexity and resulting precision, with a maximum absolute precision of 24 digits. An algorithm by (West 2009) combines Hart's algorithm 5666 with a continued fraction approximation in the tail to provide a fast computation algorithm with 16-digit precision.
(Cody 1969), after noting that Hart's 1968 solution is not suited for erf, gives a solution for both erf and erfc, with a maximal relative error bound, via rational Chebyshev approximation (Cody, W. J. (1969). "Rational Chebyshev Approximations for the Error Function").
(Marsaglia 2004) suggested a simple algorithm (given, for example, in the article on the bc programming language) based on the Taylor series expansion
for calculating Φ(x) with arbitrary precision. The drawback of this algorithm is its comparatively slow calculation time (for example, it takes over 300 iterations to calculate the function with 16 digits of precision when x = 10).
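A sketch of the series evaluation, using the form Φ(x) = 1/2 + ϕ(x)·(x + x³/3 + x⁵/(3·5) + x⁷/(3·5·7) + …); the stopping tolerance is an arbitrary choice:

```python
import math

def std_normal_cdf_series(x, tol=1e-17):
    # Each successive term multiplies the previous one by x^2 / (2k + 1),
    # building the odd double-factorial denominators 1, 3, 3*5, 3*5*7, ...
    term = x
    total = x
    k = 1.0
    while abs(term) > tol:
        term *= x * x / (2.0 * k + 1.0)
        total += term
        k += 1.0
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)   # phi(x)
    return 0.5 + pdf * total
```

The series converges quickly for moderate |x| but, as noted above, needs hundreds of iterations far out in the tail.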
The GNU Scientific Library calculates values of the standard normal CDF using Hart's algorithms and approximations with Chebyshev polynomials.
Development
Some authors attribute the credit for the discovery of the normal distribution to de Moivre, who in 1738
De Moivre first published his findings in 1733, in a pamphlet "Approximatio ad Summam Terminorum Binomii (a + b)^n in Seriem Expansi" that was designated for private circulation only. It was not until 1738 that he made his results publicly available. The original pamphlet was reprinted several times, see for example (Walker 1985).
published in the second edition of his "The Doctrine of Chances" the study of the coefficients in the binomial expansion of (a + b)^n. De Moivre proved that the middle term in this expansion has the approximate magnitude of $2/\sqrt{2\pi n}$, and that "If m or ½n be a Quantity infinitely great, then the Logarithm of the Ratio, which a Term distant from the middle by the Interval ℓ, has to the middle Term, is $-\frac{2\ell\ell}{n}$." Although this theorem can be interpreted as the first obscure expression for the normal probability law, Stigler points out that de Moivre himself did not interpret his results as anything more than an approximate rule for the binomial coefficients, and in particular de Moivre lacked the concept of the probability density function.
In 1809 Carl Friedrich Gauss published his monograph "Theoria motus corporum coelestium in sectionibus conicis solem ambientium", where among other things he introduced several important statistical concepts, such as the method of least squares, the method of maximum likelihood, and the normal distribution. Gauss used M, M′, M′′, … to denote the measurements of some unknown quantity V, and sought the "most probable" estimator: the one which maximizes the probability φ(M − V) · φ(M′ − V) · φ(M′′ − V) · … of obtaining the observed experimental results. In his notation, φΔ is the probability law of the measurement errors of magnitude Δ. Not knowing what the function φ is, Gauss required that his method should reduce to the well-known answer: the arithmetic mean of the measured values. ("It has been customary certainly to regard as an axiom the hypothesis that if any quantity has been determined by several direct observations, made under the same circumstances and with equal care, the arithmetical mean of the observed values affords the most probable value, if not rigorously, yet very nearly at least, so that it is always most safe to adhere to it." — {{harvtxt|Gauss|1809|loc=section 177}}.) Starting from these principles, Gauss demonstrated that the only law which rationalizes the choice of arithmetic mean as an estimator of the location parameter is the normal law of errors:
$$
\varphi\mathit{\Delta} = \frac{h}{\surd\pi}\, e^{-hh\Delta\Delta},
$$
where h is "the measure of the precision of the observations". Using this normal law as a generic model for errors in experiments, Gauss formulated what is now known as the non-linear weighted least squares (NWLS) method.
Although Gauss was the first to suggest the normal distribution law, Laplace made significant contributions. ("My custom of terming the curve the Gauss–Laplacian or normal curve saves us from proportioning the merit of discovery between the two great astronomer mathematicians." — {{harvtxt|Pearson|1905|loc=p. 189}}.) It was Laplace who first posed the problem of aggregating several observations in 1774, although his own solution led to the Laplacian distribution. It was also Laplace who first calculated the value of the integral $\int e^{-t^2}\,dt = \sqrt{\pi}$, providing the normalization constant for the normal distribution.
{{About|the univariate normal distribution|normally distributed vectors|Multivariate normal distribution}}
{{Probability distribution
 name =
 type = density
 pdf_image = The red line is the standard normal distribution
 cdf_image = Colors match the image above
 notation =
 parameters = μ ∈ R — mean (location); σ2 > 0 — variance (squared scale)
 support = x ∈ R
 pdf =
 cdf =
 mean = μ
 median = μ
 mode = μ
 variance = σ2
 skewness = 0
 kurtosis = 0 (excess)
 entropy =
 mgf =
 char =
 fisher =
 conjugate prior = Normal distribution
}}
In probability theory, the normal (or Gaussian) distribution is a continuous probability distribution that has a bell-shaped probability density function, known as the Gaussian function or informally the bell curve (the designation "bell curve" is ambiguous: there are many other distributions which are "bell"-shaped, such as the Cauchy distribution, Student's t-distribution, the generalized normal, the logistic, etc.):
$$
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},
$$
where the parameter μ is the mean or expectation (location of the peak) and σ² is the variance, the mean of the squared deviation (a "measure" of the width of the distribution); σ is the standard deviation. The distribution with μ = 0 and σ² = 1 is called the standard normal distribution. A normal distribution is often used as a first approximation to describe real-valued random variables that cluster around a single mean value.
The normal distribution is considered the most prominent probability distribution in statistics. There are several reasons for this. First, the normal distribution is very tractable analytically: a large number of results involving this distribution can be derived in explicit form. Second, the normal distribution arises as the outcome of the central limit theorem, which states that under mild conditions the sum of a large number of random variables is distributed approximately normally. Finally, the "bell" shape of the normal distribution makes it a convenient choice for modelling a large variety of random variables encountered in practice.
For this reason, the normal distribution is commonly encountered in practice, and is used throughout statistics, the natural sciences, and the social sciences as a simple model for complex phenomena. For example, the observational error in an experiment is usually assumed to follow a normal distribution, and the propagation of uncertainty is computed using this assumption. Note that a normally distributed variable has a symmetric distribution about its mean. Quantities that grow exponentially, such as prices, incomes or populations, are often skewed to the right, and hence may be better described by other distributions, such as the log-normal distribution or the Pareto distribution. In addition, the probability of seeing a normally distributed value that is far from the mean (i.e. more than a few standard deviations away) drops off extremely rapidly. As a result, statistical inference using a normal distribution is not robust to the presence of outliers (data that are unexpectedly far from the mean, due to exceptional circumstances, observational error, etc.). When outliers are expected, data may be better described using a heavy-tailed distribution such as Student's t-distribution.
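The rapid decay of the tails is easy to quantify: for a normal variable, P(|X − μ| ≤ kσ) = erf(k/√2). A short Python illustration (the function name is our own) prints these probabilities for a few values of k:

```python
import math

def within_k_sigma(k):
    """P(|X - mu| <= k*sigma) for a normal variable, via the error function."""
    return math.erf(k / math.sqrt(2.0))

# The familiar 68-95-99.7 rule, and the vanishing mass beyond 5 sigma.
for k in (1, 2, 3, 5):
    print(f"P(|X - mu| <= {k} sigma) = {within_k_sigma(k):.10f}")
```

Roughly 68% of the mass lies within one standard deviation, 95% within two, and less than one part in a million beyond five.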
From a technical perspective, alternative characterizations are possible, for example:
The normal distribution is the only absolutely continuous distribution all of whose cumulants beyond the first two (i.e., other than the mean and variance) are zero.
For a given mean and variance, the corresponding normal distribution is the continuous distribution with the maximum entropy.
Definition
The simplest case of a normal distribution is known as the standard normal distribution, described by the probability density function
$$
\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}.
$$
The factor $1/\sqrt{2\pi}$ in this expression ensures that the total area under the curve ϕ(x) is equal to one (see the Gaussian integral for a proof), and the factor ½ in the exponent makes the "width" of the curve (measured as half the distance between the inflection points) also equal to one. It is traditional in statistics to denote this function with the Greek letter ϕ (phi), whereas density functions for all other distributions are usually denoted with the letters f or p. The alternative glyph φ is also used quite often; however, within this article "φ" is reserved to denote characteristic functions.
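Both facts can be verified numerically. The following Python sketch (an illustration with our own function names) integrates ϕ with a midpoint rule and evaluates the second derivative, ϕ′′(x) = (x² − 1)ϕ(x), at x = 1, where the inflection occurs:

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# Midpoint-rule integration over [-8, 8]; the mass outside is negligible.
n, a, b = 100_000, -8.0, 8.0
h = (b - a) / n
area = sum(phi(a + (i + 0.5) * h) for i in range(n)) * h
print(area)  # very close to 1

# The second derivative (x^2 - 1)*phi(x) vanishes at x = +/-1,
# so the inflection points sit exactly one unit from the peak.
print((1.0 ** 2 - 1.0) * phi(1.0))  # exactly 0.0
```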
More generally, a normal distribution results from exponentiating a quadratic function (just as an exponential distribution results from exponentiating a linear function):
$$
f(x) = e^{ax^2 + bx + c}.
$$
This yields the classic "bell curve" shape, provided that a < 0 so that the quadratic function is concave; f(x) > 0 everywhere. One can adjust a to control the "width" of the bell, then adjust b to move the central peak of the bell along the x-axis, and finally adjust c to control the "height" of the bell. For f(x) to be a true probability density function over R, one must choose c such that $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (which is only possible when a < 0).
Rather than using a, b, and c, it is far more common to describe a normal distribution by its mean μ = −b/(2a) and variance σ² = −1/(2a). Changing to these new parameters allows one to rewrite the probability density function in a convenient standard form,
$$
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right).
$$
For a standard normal distribution, μ = 0 and σ² = 1. The last part of the equation above shows that any other normal distribution can be regarded as a version of the standard normal distribution that has been stretched horizontally by a factor σ and then translated rightward by a distance μ. Thus, μ specifies the position of the bell curve's central peak, and σ specifies the "width" of the bell curve.
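The stretch-and-shift relation can be made concrete in a few lines of Python (a sketch with our own function names): evaluating the general density as φ((x − μ)/σ)/σ agrees with evaluating the general formula directly.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def normal_pdf(x, mu, sigma):
    """General normal density via the stretch-and-shift relation f(x) = phi((x-mu)/sigma)/sigma."""
    return phi((x - mu) / sigma) / sigma

def normal_pdf_direct(x, mu, sigma):
    """The same density from the explicit formula, for comparison."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

print(normal_pdf(3.0, 1.0, 2.0), normal_pdf_direct(3.0, 1.0, 2.0))  # equal values
```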
The parameter μ is at the same time the mean, the median and the mode of the normal distribution. The parameter σ² is called the variance; as for any random variable, it describes how concentrated the distribution is around its mean. The square root of σ² is called the standard deviation and determines the width of the density function.
The normal distribution is usually denoted by N(μ, σ²). Commonly the letter N is written in calligraphic font (typed as \mathcal{N} in LaTeX). Thus when a random variable X is distributed normally with mean μ and variance σ², we write
$$
X \sim \mathcal{N}(\mu,\, \sigma^2).
$$
Alternative formulations
Some authors advocate using the precision instead of the variance, and variously define it as τ = σ⁻² or τ = σ⁻¹. This parametrization has an advantage in numerical applications where σ² is very close to zero, and is more convenient to work with in analysis, as τ is a natural parameter of the normal distribution. Another advantage of this parametrization is in the study of conditional distributions in the multivariate normal case.
The question of which normal distribution should be called the "standard" one is also answered differently by various authors. Starting from the works of Gauss, the standard normal was considered to be the one with variance σ² = ½:
$$
f(x) = \frac{1}{\sqrt{\pi}}\, e^{-x^2}.
$$
{{harvtxt|Stigler|1982}} goes even further and insists that the standard normal should have variance σ² = 1/(2π):
$$
f(x) = e^{-\pi x^2}.
$$
According to the author, this formulation is advantageous because of a much simpler and easier-to-remember formula, the fact that the pdf has unit height at zero, and simple approximate formulas for the quantiles of the distribution. In Stigler's formulation the density of a normal N(μ, τ) with mean μ and precision τ is equal to
$$
f(x;\,\mu,\tau) = \tau\, e^{-\pi\tau^2 (x-\mu)^2}.
$$
Characterization
In the previous section the normal distribution was defined by specifying its probability density function. However, there are other ways to characterize a probability distribution. They include: the cumulative distribution function, the moments, the cumulants, the characteristic function, the moment-generating function, etc.
Probability density function
The probability density function (pdf) of a random variable describes the relative frequencies of different values for that random variable. The pdf of the normal distribution is given by the formula explained in detail in the previous section:
$$
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.
$$
This is a proper function only when the variance σ² is not equal to zero. In that case it is a continuous smooth function, defined on the entire real line, and is called the "Gaussian function".
Properties:
Function f(x) is unimodal and symmetric around the point x = μ, which is at the same time the mode, the median and the mean of the distribution.
The inflection points of the curve occur one standard deviation away from the mean (i.e., at x = μ − σ and x = μ + σ).
Function f(x) is logconcave.
The standard normal density ϕ(x) is an eigenfunction of the Fourier transform.
The function is supersmooth of order 2, implying that it is infinitely differentiable.
The first derivative of ϕ(x) is ϕ′(x) = −x·ϕ(x); the second derivative is ϕ′′(x) = (x² − 1)ϕ(x). More generally, the nth derivative is given by $\phi^{(n)}(x) = (-1)^n H_n(x)\,\phi(x)$, where $H_n$ is the Hermite polynomial of order n.
When σ² = 0, the density function does not exist as an ordinary function. However, the distribution can be represented by a generalized function that defines a measure on the real line and can be used, for example, to calculate expected values:
$$
f(x) = \delta(x - \mu),
$$
where δ(x) is the Dirac delta function, which is equal to infinity at x = 0 and zero elsewhere.
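The derivative identities listed above are easy to verify numerically. The following Python sketch (an illustration with our own variable names) checks ϕ′(x) = −x·ϕ(x) and ϕ′′(x) = (x² − 1)ϕ(x) against central finite differences:

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

x, h = 0.7, 1e-5

# First derivative: central difference vs. the closed form -x*phi(x).
numeric1 = (phi(x + h) - phi(x - h)) / (2.0 * h)
exact1 = -x * phi(x)
print(numeric1, exact1)

# Second derivative: three-point stencil vs. (x^2 - 1)*phi(x),
# the n = 2 case of the Hermite-polynomial formula.
numeric2 = (phi(x + h) - 2.0 * phi(x) + phi(x - h)) / (h * h)
exact2 = (x * x - 1.0) * phi(x)
print(numeric2, exact2)
```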
Cumulative distribution function
The cumulative distribution function (CDF) describes the probability of a random variable falling in the interval (−∞, x].
The CDF of the standard normal distribution is denoted with the capital Greek letter Φ (phi), and can be computed as an integral of the probability density function:
$$
\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt.
$$
This integral cannot be expressed in terms of elementary functions, so it is commonly written using the error function erf, a special function:
$$
\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right].
$$
Numerical methods for calculation of the standard normal CDF are discussed below. For a generic normal random variable with mean μ and variance σ² > 0 the CDF will be equal to
$$
F(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right].
$$
The complement of the standard normal CDF, Q(x) = 1 − Φ(x), is referred to as the Q-function, especially in engineering texts. It gives the tail probability of the Gaussian distribution, that is, the probability that a standard normal random variable X is greater than the number x. Other definitions of the Q-function, all of which are simple transformations of Φ, are also used occasionally.
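The erf form of the CDF translates directly into code. A minimal Python sketch (function names are our own; for the tail probability, `math.erfc` avoids the cancellation in 1 − Φ(x) at large x):

```python
import math

def Phi(x):
    """Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Q(x):
    """Tail probability Q(x) = 1 - Phi(x), computed with erfc for accuracy."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

print(Phi(0.0))   # 0.5 by symmetry
print(Phi(1.96))  # about 0.975
print(Q(1.96))    # about 0.025
```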
Properties:
The standard normal CDF is 2-fold rotationally symmetric around the point (0, ½): Φ(−x) = 1 − Φ(x).
The derivative of Φ(x) is equal to the standard normal pdf ϕ(x): Φ′(x) = ϕ(x).
The antiderivative of Φ(x) is ∫ Φ(x) dx = x Φ(x) + ϕ(x) + C.
For a normal distribution with zero variance, the CDF is the Heaviside step function (with the H(0) = 1 convention):
$$
F(x) = H(x - \mu).
$$
Quantile function
The inverse of the standard normal CDF, called the quantile function or probit function, is expressed in terms of the inverse error function:
$$
\Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1), \qquad p \in (0, 1).
$$
Quantiles of the standard normal distribution are commonly denoted as z_p. The quantile z_p is the value such that a standard normal random variable X falls inside the interval (−∞, z_p] with probability exactly p. Quantiles are used in hypothesis testing, in the construction of confidence intervals, and in Q–Q plots. The most "famous" normal quantile is 1.96 = z_{0.975}: a standard normal random variable is greater than 1.96 in absolute value in 5% of cases.
For a normal random variable with mean μ and variance σ², the quantile function is
$$
F^{-1}(p) = \mu + \sigma\,\Phi^{-1}(p) = \mu + \sigma\sqrt{2}\,\operatorname{erf}^{-1}(2p - 1), \qquad p \in (0, 1).
$$
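The quantile function is available in the Python standard library as `statistics.NormalDist.inv_cdf` (Python 3.8+); the example values below (μ = 100, σ = 15) are our own illustration:

```python
from statistics import NormalDist

# The "famous" 97.5% quantile of the standard normal, approximately 1.96.
z = NormalDist().inv_cdf(0.975)
print(z)

# General quantile via F^{-1}(p) = mu + sigma * Phi^{-1}(p):
# both lines should print the same value.
mu, sigma, p = 100.0, 15.0, 0.975
print(NormalDist(mu, sigma).inv_cdf(p))
print(mu + sigma * z)
```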
Characteristic function and moment generating function
The characteristic function φX(t) of a random variable X is defined as the expected value of eitX, where i is the imaginary unit and t ∈ R is the argument of the characteristic function. Thus the characteristic function is the Fourier transform of the density ϕ(x). For a normally distributed X with mean μ and variance σ2, the characteristic function is
The characteristic function can be analytically extended to the entire complex plane: one defines φ(z) = eiμz − σ2z2/2 for all z ∈ C.
The moment generating function is defined as the expected value of etX. For a normal distribution, the moment generating function exists and is equal to
The cumulant generating function is the logarithm of the moment generating function:
Since this is a quadratic polynomial in t, only the first two cumulants are nonzero.
Moments
See also: List of integrals of Gaussian functions
The normal distribution has moments of all orders. That is, for a normally distributed X with mean μ and variance σ2, the expectation E[|X|p] exists and is finite for all p such that Re[p] > −1. Usually we are interested only in moments of integer orders: p = 1, 2, 3, ….
Central moments are the moments of X around its mean μ. Thus, a central moment of order p is the expected value of (X − μ)p. Using standardization of normal random variables, this expectation equals σp · E[Zp], where Z is standard normal.
Here n!! denotes the double factorial, that is the product of every odd number from n to 1.
Central absolute moments are the moments of |X − μ|. They coincide with the plain central moments for all even orders, but are nonzero for all odd p.
The last formula is true for any noninteger {{nowrapp > −1}}.
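The central-moment formulas above can be checked numerically. The sketch below (plain Python, no third-party libraries) integrates the normal density by a simple midpoint rule and compares even-order central moments against σp (p − 1)!!; the σ value and grid resolution are illustrative choices:

```python
import math

def double_factorial(n):
    """Product n * (n - 2) * ... down to 1 (n odd here)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def central_moment(p, sigma=1.0, steps=200_000, span=12.0):
    """E[(X - mu)^p] for X ~ N(mu, sigma^2), by midpoint integration of x^p * pdf(x)."""
    h = 2 * span * sigma / steps
    total = 0.0
    for i in range(steps):
        x = -span * sigma + (i + 0.5) * h
        pdf = math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))
        total += (x ** p) * pdf * h
    return total

# Even orders match sigma^p * (p - 1)!!; odd orders vanish by symmetry.
for p in (2, 4, 6):
    print(p, round(central_moment(p, sigma=1.5), 4),
          round(1.5 ** p * double_factorial(p - 1), 4))
```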
Raw moments and raw absolute moments are the moments of X and |X| respectively. The formulas for these moments are much more complicated, and are given in terms of confluent hypergeometric functions 1F1 and U.
These expressions remain valid even if p is not an integer. See also generalized Hermite polynomials.
The first two cumulants are equal to μ and σ 2 respectively, whereas all higher-order cumulants are zero.
Order | Raw moment | Central moment | Cumulant
1 | μ | 0 | μ
2 | μ2 + σ2 | σ 2 | σ 2
3 | μ3 + 3μσ2 | 0 | 0
4 | μ4 + 6μ2σ2 + 3σ4 | 3σ 4 | 0
5 | μ5 + 10μ3σ2 + 15μσ4 | 0 | 0
6 | μ6 + 15μ4σ2 + 45μ2σ4 + 15σ6 | 15σ 6 | 0
7 | μ7 + 21μ5σ2 + 105μ3σ4 + 105μσ6 | 0 | 0
8 | μ8 + 28μ6σ2 + 210μ4σ4 + 420μ2σ6 + 105σ8 | 105σ 8 | 0
Standardizing normal random variables
As a consequence of property 1, it is possible to relate all normal random variables to the standard normal. For example, if X is normal with mean μ and variance σ2, then
has mean zero and unit variance, that is Z has the standard normal distribution. Conversely, having a standard normal random variable Z we can always construct another normal random variable with specific mean μ and variance σ2:
This "standardizing" transformation is convenient, as it allows one to compute the PDF and especially the CDF of a normal distribution from a table of PDF and CDF values for the standard normal. They will be related via
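A minimal sketch of this standardization, using the stdlib's `statistics.NormalDist`; the parameters and evaluation point are purely illustrative:

```python
from statistics import NormalDist

# Hypothetical parameters for illustration.
mu, sigma = 100.0, 15.0
x = 130.0

# Standardize: Z = (X - mu) / sigma is standard normal, so the CDF of X
# can be read off the standard normal CDF: F_X(x) = Phi((x - mu) / sigma).
z = (x - mu) / sigma
p_via_standard = NormalDist().cdf(z)
p_direct = NormalDist(mu, sigma).cdf(x)
print(round(p_via_standard, 10), round(p_direct, 10))  # the two agree
```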
Standard deviation and confidence intervals
About 68% of values drawn from a normal distribution lie within one standard deviation σ of the mean; about 95% of the values lie within two standard deviations; and about 99.7% are within three standard deviations. This fact is known as the 68–95–99.7 rule, the empirical rule, or the 3-sigma rule.
To be more precise, the area under the bell curve between μ − nσ and μ + nσ is given by
where erf is the error function. To 12 decimal places, the values for the 1-, 2-, up to 6-sigma points are:
n | F(μ + nσ) − F(μ − nσ) | i.e. 1 minus … | or 1 in …
1 | 0.682689492137 | 0.317310507863 | 3.15148718753
2 | 0.954499736104 | 0.045500263896 | 21.9778945080
3 | 0.997300203937 | 0.002699796063 | 370.398347345
4 | 0.999936657516 | 0.000063342484 | 15787.1927673
5 | 0.999999426697 | 0.000000573303 | 1744277.89362
6 | 0.999999998027 | 0.000000001973 | 506797345.897
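These sigma-point areas follow directly from the erf expression above and can be reproduced with the standard library alone:

```python
import math

# Area under the normal curve within n standard deviations of the mean:
# F(mu + n*sigma) - F(mu - n*sigma) = erf(n / sqrt(2)).
for n in range(1, 7):
    inside = math.erf(n / math.sqrt(2))
    print(n, f"{inside:.12f}", f"{1 - inside:.12f}")
```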
The next table gives the reverse relation: sigma multiples corresponding to a few often-used values for the area under the bell curve. These values are useful to determine (asymptotic) confidence intervals of the specified levels based on normally distributed (or asymptotically normal) estimators:
p | n | p | n
0.80 | 1.281551565545 | 0.999 | 3.290526731492
0.90 | 1.644853626951 | 0.9999 | 3.890591886413
0.95 | 1.959963984540 | 0.99999 | 4.417173413469
0.98 | 2.326347874041 | 0.999999 | 4.891638475699
0.99 | 2.575829303549 | 0.9999999 | 5.326723886384
0.995 | 2.807033768344 | 0.99999999 | 5.730728868236
0.998 | 3.090232306168 | 0.999999999 | 6.109410204869
where the value on the left of the table is the proportion of values that will fall within a given interval and n is a multiple of the standard deviation that specifies the width of the interval.
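The sigma multiples in the table can be recovered from the inverse CDF: for a two-sided interval covering proportion p, the half-width in sigmas is n = Φ−1((1 + p)/2). A sketch using the stdlib's `statistics.NormalDist`:

```python
from statistics import NormalDist

# For a two-sided interval covering proportion p of the distribution,
# the half-width in sigmas is n = Phi^{-1}((1 + p) / 2).
for p in (0.80, 0.90, 0.95, 0.99):
    n = NormalDist().inv_cdf((1 + p) / 2)
    print(p, round(n, 12))
```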
Central limit theorem
Main article: Central limit theorem
The theorem states that under certain (fairly common) conditions, the sum of a large number of random variables will have an approximately normal distribution. For example, if (x1, …, xn) is a sequence of iid random variables, each having mean μ and variance σ2, then the central limit theorem states that
The theorem will hold even if the summands xi are not iid, although some constraints on the degree of dependence and the growth rate of moments still have to be imposed.
The importance of the central limit theorem cannot be overemphasized. A great number of test statistics, scores, and estimators encountered in practice contain sums of certain random variables in them, and even more estimators can be represented as sums of random variables through the use of influence functions; all of these quantities are governed by the central limit theorem and will have an asymptotically normal distribution as a result.
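The CLT can be seen empirically with nothing but the standard library. In this sketch (the term and sample counts are arbitrary choices), sums of 48 iid Uniform(0, 1) draws should be approximately N(24, 4), i.e. standard deviation 2:

```python
import random
import statistics

random.seed(0)

# Sum of 48 iid Uniform(0,1) draws: mean 48 * 0.5 = 24, variance 48 / 12 = 4,
# so by the CLT the sums are approximately N(24, 4).
n_terms, n_samples = 48, 20_000
sums = [sum(random.random() for _ in range(n_terms)) for _ in range(n_samples)]

print(round(statistics.mean(sums), 2))   # near 24
print(round(statistics.stdev(sums), 2))  # near 2
# Empirical check of the 68% rule for the approximating normal:
inside = sum(1 for s in sums if abs(s - 24) < 2) / n_samples
print(round(inside, 2))                  # near 0.68
```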
Another practical consequence of the central limit theorem is that certain other distributions can be approximated by the normal distribution, for example:
The binomial distribution B(n, p) is approximately normal N(np, np(1 − p)) for large n and for p not too close to zero or one.
The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
The chi-squared distribution χ2(k) is approximately normal N(k, 2k) for large k.
The Student's tdistribution t(ν) is approximately normal N(0, 1) when ν is large.
Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution.
A general upper bound for the approximation error in the central limit theorem is given by the Berry–Esseen theorem; improvements of the approximation are given by the Edgeworth expansions.
Miscellaneous
The family of normal distributions is closed under linear transformations. That is, if X is normally distributed with mean μ and variance σ2, then a linear transform aX + b (for some real numbers a and b) is also normally distributed:
Also if X1, X2 are two independent normal random variables, with means μ1, μ2 and standard deviations σ1, σ2, then their linear combination will also be normally distributed:
The converse of (1) is also true: if X1 and X2 are independent and their sum X1 + X2 is distributed normally, then both X1 and X2 must also be normal. This is known as Cramér's decomposition theorem. The interpretation of this property is that a normal distribution is only divisible by other normal distributions. Another application of this property is in connection with the central limit theorem: although the CLT asserts that the distribution of a sum of arbitrary non-normal iid random variables is approximately normal, Cramér's theorem shows that it can never become exactly normal.
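A quick Monte Carlo sanity check of closure under linear transformation, with arbitrary illustrative parameters: if X ~ N(2, 9), then 3X + 1 should be N(7, 81):

```python
import random
import statistics

random.seed(1)

# X ~ N(2, 3^2); then 3X + 1 has mean 3*2 + 1 = 7 and variance 3^2 * 9 = 81.
xs = [random.gauss(2, 3) for _ in range(50_000)]
ys = [3 * x + 1 for x in xs]

print(round(statistics.mean(ys), 1))      # near 7
print(round(statistics.variance(ys), 0))  # near 81
```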
If the characteristic function φX of some random variable X is of the form φX(t) = eQ(t), where Q(t) is a polynomial, then the Marcinkiewicz theorem (named after Józef Marcinkiewicz) asserts that Q can be at most a quadratic polynomial, and therefore X is a normal random variable. The consequence of this result is that the normal distribution is the only distribution with a finite number (two) of nonzero cumulants.
If X and Y are jointly normal and uncorrelated, then they are independent. The requirement that X and Y be jointly normal is essential; without it the property does not hold. For non-normal random variables uncorrelatedness does not imply independence.
If X and Y are independent N(μ, σ 2) random variables, then X + Y and X − Y are also independent and identically distributed (this follows from the polarization identity). This property uniquely characterizes the normal distribution, as can be seen from Bernstein's theorem: if X and Y are independent and such that X + Y and X − Y are also independent, then both X and Y must necessarily have normal distributions.
More generally, if X1, ..., Xn are independent random variables, then two linear combinations ∑akXk and ∑bkXk will be independent if and only if all Xk are normal and ∑akbkσk² = 0, where σk² denotes the variance of Xk.
The normal distribution is infinitely divisible: for a normally distributed X with mean μ and variance σ2 we can find n independent random variables {X1, …, Xn}, each distributed normally with mean μ/n and variance σ2/n, such that
The normal distribution is stable (with exponent α = 2): if X1, X2 are two independent N(μ, σ2) random variables and a, b are arbitrary real numbers, then
where X3 is also N(μ, σ2). This relationship directly follows from property (1).
The Kullback–Leibler divergence between two normal distributions X1 ∼ N(μ1, σ1²) and X2 ∼ N(μ2, σ2²) is given by:
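The KL divergence between two univariate normals has a well-known closed form, sketched here as a small helper (function name and example arguments are illustrative):

```python
import math

def kl_normal(mu1, s1, mu2, s2):
    """KL divergence D( N(mu1, s1^2) || N(mu2, s2^2) ) in nats:
    log(s2/s1) + (s1^2 + (mu1 - mu2)^2) / (2 * s2^2) - 1/2."""
    return (math.log(s2 / s1)
            + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2)
            - 0.5)

print(kl_normal(0, 1, 0, 1))            # 0.0 for identical distributions
print(round(kl_normal(1, 1, 0, 1), 3))  # a one-sigma mean shift gives 0.5
```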
The Hellinger distance between the same distributions is equal to
The Fisher information matrix for a normal distribution is diagonal and takes the form
The normal distributions belong to an exponential family with natural parameters θ1 = μ/σ2 and θ2 = −1/(2σ2), and natural statistics x and x2. The dual, expectation parameters for the normal distribution are η1 = μ and η2 = μ2 + σ2.
The conjugate prior of the mean of a normal distribution is another normal distribution. Specifically, if x1, …, xn are iid N(μ, σ2) and the prior is μ ~ N(μ0, σ0²), then the posterior distribution for the estimator of μ will be
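The conjugate update for the mean (with the variance σ2 treated as known) can be sketched directly: the posterior precision is the sum of the prior and data precisions, and the posterior mean is a precision-weighted average. The function name and the numbers below are purely illustrative:

```python
# Conjugate update for the mean of a normal with known variance sigma2.
# Prior: mu ~ N(mu0, tau0_2). Posterior precision = prior + data precisions;
# posterior mean = precision-weighted average of prior mean and data.
def posterior_for_mean(data, sigma2, mu0, tau0_2):
    n = len(data)
    precision = 1 / tau0_2 + n / sigma2
    post_var = 1 / precision
    post_mean = post_var * (mu0 / tau0_2 + sum(data) / sigma2)
    return post_mean, post_var

# Hypothetical observations and a nearly flat prior for illustration.
data = [4.8, 5.1, 5.3, 4.9, 5.2]
post_mean, post_var = posterior_for_mean(data, sigma2=1.0, mu0=0.0, tau0_2=100.0)
print(round(post_mean, 3), round(post_var, 3))  # pulled almost entirely to the sample mean
```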
Of all probability distributions over the reals with mean μ and variance σ2, the normal distribution N(μ, σ2) is the one with the maximum entropy.
The family of normal distributions forms a manifold with constant curvature −1. The same family is flat with respect to the (±1)-connections ∇(e) and ∇(m).
Operations on a single random variable
If X is distributed normally with mean μ and variance σ2, then
The exponential of X is distributed log-normally: eX ~ ln N(μ, σ2).
The absolute value of X has a folded normal distribution: |X| ~ Nf(μ, σ2). If μ = 0 this is known as the half-normal distribution.
The square of X/σ has the noncentral chi-squared distribution with one degree of freedom: X²/σ² ~ χ1²(μ²/σ²). If μ = 0, the distribution is called simply chi-squared.
The distribution of the variable X restricted to an interval [a, b] is called the truncated normal distribution.
(X − μ)−2 has a Lévy distribution with location 0 and scale σ−2.
Combination of two independent random variables
If X1 and X2 are two independent standard normal random variables, then
Their sum and difference are distributed normally with mean zero and variance two: X1 ± X2 ∼ N(0, 2).
Their product Z = X1·X2 follows the "product-normal" distribution with density function fZ(z) = π−1K0(|z|), where K0 is the modified Bessel function of the second kind. This distribution is symmetric around zero, unbounded at z = 0, and has the characteristic function φZ(t) = (1 + t²)−1/2.
Their ratio follows the standard Cauchy distribution: X1 ÷ X2 ∼ Cauchy(0, 1).
Their Euclidean norm √(X1² + X2²) has the Rayleigh distribution, also known as the chi distribution with 2 degrees of freedom.
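Two of these combinations are easy to check by simulation with the standard library; the sample size is an arbitrary choice. The sum should have variance 2, and the Euclidean norm should follow a Rayleigh distribution with mean √(π/2) ≈ 1.2533:

```python
import math
import random
import statistics

random.seed(2)

n = 100_000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]

# Sum of two independent standard normals: N(0, 2), so variance near 2.
sums = [a + b for a, b in zip(x1, x2)]
print(round(statistics.variance(sums), 2))  # near 2

# Euclidean norm: Rayleigh distribution, whose mean is sqrt(pi / 2).
norms = [math.hypot(a, b) for a, b in zip(x1, x2)]
print(round(statistics.mean(norms), 2))     # near 1.25
```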
Combination of two or more independent random variables
If X1, X2, …, Xn are independent standard normal random variables, then the sum of their squares has the chi-squared distribution with n degrees of freedom: X1² + ⋯ + Xn² ∼ χn².
If X1, X2, …, Xn are independent normally distributed random variables with mean μ and variance σ2, then their sample mean is independent from the sample standard deviation, which can be demonstrated using Basu's theorem or Cochran's theorem. The ratio of these two quantities will have the Student's t-distribution with n − 1 degrees of freedom:
If X1, …, Xn, Y1, …, Ym are independent standard normal random variables, then the ratio of their normalized sums of squares will have the F-distribution with (n, m) degrees of freedom:
Operations on the density function
The split normal distribution is most directly defined in terms of joining scaled sections of the density functions of different normal distributions and rescaling the density to integrate to one. The truncated normal distribution results from rescaling a section of a single density function.
Extensions
The notion of normal distribution, being one of the most important distributions in probability theory, has been extended far beyond the standard framework of the univariate (that is, one-dimensional) case. All these extensions are also called normal or Gaussian laws, so a certain ambiguity in names exists.
Multivariate normal distribution describes the Gaussian law in the k-dimensional Euclidean space. A vector X ∈ Rk is multivariate-normally distributed if any linear combination of its components ∑j=1..k aj Xj has a (univariate) normal distribution. The variance of X is a k×k symmetric positive-definite matrix V.
Rectified Gaussian distribution — a rectified version of the normal distribution, with all negative elements reset to 0.
Complex normal distribution deals with complex normal vectors. A complex vector X ∈ Ck is said to be normal if both its real and imaginary components jointly possess a 2k-dimensional multivariate normal distribution. The variance-covariance structure of X is described by two matrices: the variance matrix Γ and the relation matrix C.
Matrix normal distribution describes the case of normally distributed matrices.
Gaussian processes are the normally distributed stochastic processes. These can be viewed as elements of some infinite-dimensional Hilbert space H, and thus are the analogues of multivariate normal vectors for the case k = ∞. A random element h ∈ H is said to be normal if for any constant a ∈ H the scalar product (a, h) has a (univariate) normal distribution. The variance structure of such a Gaussian random element can be described in terms of the linear covariance operator K: H → H. Several Gaussian processes became popular enough to have their own names:
Brownian motion,
Brownian bridge,
Ornstein–Uhlenbeck process.
Gaussian q-distribution is an abstract mathematical construction which represents a "q-analogue" of the normal distribution.
the q-Gaussian is an analogue of the Gaussian distribution, in the sense that it maximises the Tsallis entropy, and is one type of Tsallis distribution. Note that this distribution is different from the Gaussian q-distribution above.
One of the main practical uses of the Gaussian law is to model the empirical distributions of many different random variables encountered in practice. In that case a possible extension would be a richer family of distributions, having more than two parameters and therefore being able to fit the empirical distribution more accurately. Examples of such extensions are:
Pearson distribution — a four-parameter family of probability distributions that extend the normal law to include different skewness and kurtosis values.
Normality tests
Main article: Normality tests
Normality tests assess the likelihood that the given data set {x1, …, xn} comes from a normal distribution. Typically the null hypothesis H0 is that the observations are distributed normally with unspecified mean μ and variance σ2, versus the alternative Ha that the distribution is arbitrary. A great number of tests (over 40) have been devised for this problem; the more prominent of them are outlined below:
"Visual" tests are more intuitively appealing but subjective at the same time, as they rely on informal human judgement to accept or reject the null hypothesis.
QQ plotQQ plotIn statistics, a QQ plot is a probability plot, which is a graphical method for comparing two probability distributions by plotting their quantiles against each other. First, the set of intervals for the quantiles are chosen... — is a plot of the sorted values from the data set against the expected values of the corresponding quantiles from the standard normal distribution. That is, it's a plot of point of the form (Φ−1(pk), x(k)), where plotting points pk are equal to pk = (k − α)/(n + 1 − 2α) and α is an adjustment constant which can be anything between 0 and 1. If the null hypothesis is true, the plotted points should approximately lie on a straight line.
P-P plot — similar to the Q-Q plot, but used much less frequently. This method consists of plotting the points (Φ(z(k)), pk), where z(k) = (x(k) − μ̂)/σ̂ are the standardized sample values. For normally distributed data this plot should lie on the 45° line between (0, 0) and (1, 1).
The Shapiro–Wilk test employs the fact that the line in the Q-Q plot has slope σ. The test compares the least squares estimate of that slope with the value of the sample variance, and rejects the null hypothesis if these two quantities differ significantly.
Normal probability plot (rankit plot)
Moment tests:
D'Agostino's K-squared test
Jarque–Bera test
Empirical distribution function tests:
Lilliefors test (an adaptation of the Kolmogorov–Smirnov test)
Anderson–Darling test
Estimation of parameters
It is often the case that we do not know the parameters of the normal distribution but instead want to estimate them. That is, having a sample (x1, …, xn) from a normal N(μ, σ2) population, we would like to learn the approximate values of the parameters μ and σ2. The standard approach to this problem is the maximum likelihood method, which requires maximization of the log-likelihood function:
Taking derivatives with respect to μ and σ2 and solving the resulting system of first order conditions yields the maximum likelihood estimates:
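The two maximum likelihood estimates are simply the sample mean and the mean squared deviation from it. A minimal sketch (the function name `normal_mle` is illustrative):

```python
def normal_mle(sample):
    """Maximum likelihood estimates for a normal sample:
    mu_hat = arithmetic mean, sigma2_hat = mean squared deviation (divisor n)."""
    n = len(sample)
    mu_hat = sum(sample) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in sample) / n
    return mu_hat, sigma2_hat

mu_hat, sigma2_hat = normal_mle([1.0, 2.0, 3.0, 4.0])
# mu_hat = 2.5; squared deviations (2.25, 0.25, 0.25, 2.25) average to 1.25
```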
The estimator $\hat\mu$ is called the sample mean, since it is the arithmetic mean of all observations. The statistic $\overline{x}$ is complete and sufficient for μ, and therefore by the Lehmann–Scheffé theorem, $\hat\mu$ is the uniformly minimum variance unbiased (UMVU) estimator. In finite samples it is distributed normally:
The variance of this estimator is equal to the μμ-element of the inverse Fisher information matrix $\mathcal{I}^{-1}$. This implies that the estimator is finite-sample efficient. Of practical importance is the fact that the standard error of $\hat\mu$ is proportional to $1/\sqrt{n}$; that is, if one wishes to decrease the standard error by a factor of 10, one must increase the number of points in the sample by a factor of 100. This fact is widely used in determining sample sizes for opinion polls and the number of trials in Monte Carlo simulations.
From the standpoint of asymptotic theory, $\hat\mu$ is consistent, that is, it converges in probability to μ as n → ∞. The estimator is also asymptotically normal, which is a simple corollary of the fact that it is normal in finite samples:
The estimator $\hat\sigma^2$ is called the sample variance, since it is the variance of the sample (x1, …, xn). In practice, another estimator is often used instead of $\hat\sigma^2$. This other estimator is denoted s2, and is also called the sample variance, which represents a certain ambiguity in terminology; its square root s is called the sample standard deviation. The estimator s2 differs from $\hat\sigma^2$ by having (n − 1) instead of n in the denominator (the so-called Bessel's correction):
The difference between s2 and $\hat\sigma^2$ becomes negligibly small for large n. In finite samples, however, the motivation behind the use of s2 is that it is an unbiased estimator of the underlying parameter σ2, whereas $\hat\sigma^2$ is biased. Also, by the Lehmann–Scheffé theorem the estimator s2 is uniformly minimum variance unbiased (UMVU), which makes it the "best" estimator among all unbiased ones. However, it can be shown that the biased estimator $\hat\sigma^2$ is "better" than s2 in terms of the mean squared error (MSE) criterion. In finite samples both s2 and $\hat\sigma^2$ have a scaled chi-squared distribution with (n − 1) degrees of freedom:
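The relationship between the two estimators is s2 = n/(n − 1) · σ̂2, so their ratio shrinks toward 1 as n grows. A small sketch comparing them (the function name `sample_variances` is illustrative):

```python
def sample_variances(sample):
    """Return (sigma2_hat, s2): the ML estimator (divisor n) and the
    unbiased sample variance with Bessel's correction (divisor n - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    return ss / n, ss / (n - 1)

sigma2_hat, s2 = sample_variances([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
# Here n = 8, so s2 / sigma2_hat = 8/7; the gap vanishes for large n.
```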
The first of these expressions shows that the variance of s2 is equal to 2σ4/(n − 1), which is slightly greater than the σσ-element of the inverse Fisher information matrix $\mathcal{I}^{-1}$. Thus, s2 is not an efficient estimator for σ2; moreover, since s2 is UMVU, we can conclude that a finite-sample efficient estimator for σ2 does not exist.
Applying the asymptotic theory, both estimators s2 and $\hat\sigma^2$ are consistent, that is, they converge in probability to σ2 as the sample size n → ∞. The two estimators are also both asymptotically normal:
In particular, both estimators are asymptotically efficient for σ2.
By Cochran's theorem, for the normal distribution the sample mean $\hat\mu$ and the sample variance s2 are independent, which means there can be no gain in considering their joint distribution. There is also a converse theorem: if in a sample the sample mean and sample variance are independent, then the sample must have come from the normal distribution. The independence between $\hat\mu$ and s can be employed to construct the so-called t-statistic:
This quantity t has the Student's t-distribution with (n − 1) degrees of freedom, and it is an ancillary statistic (independent of the value of the parameters). Inverting the distribution of this t-statistic will allow us to construct the confidence interval for μ; similarly, inverting the χ2 distribution of the statistic s2 will give us the confidence interval for σ2:
where tk,p and χ2k,p are the pth quantiles of the t- and χ2-distributions respectively. These confidence intervals are of the level 1 − α, meaning that the true values μ and σ2 fall outside of these intervals with probability α. In practice people usually take α = 5%, resulting in the 95% confidence intervals. The approximate formulas in the display above were derived from the asymptotic distributions of $\hat\mu$ and s2. The approximate formulas become valid for large values of n, and are more convenient for manual calculation since the standard normal quantiles zα/2 do not depend on n. In particular, the most popular value of α = 5% results in |z0.025| = 1.96.
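The large-n approximate intervals can be computed without t- or χ2-tables, since only the standard normal quantile zα/2 is needed. A sketch under that large-sample assumption (the function name `approx_normal_cis` is illustrative; the σ2 interval uses the asymptotic variance 2σ4/n):

```python
import math
import statistics

def approx_normal_cis(sample, alpha=0.05):
    """Large-sample approximate confidence intervals for mu and sigma^2,
    built from the standard normal quantile z_{alpha/2} (|z_0.025| ~ 1.96)."""
    n = len(sample)
    mu_hat = statistics.fmean(sample)
    s2 = statistics.variance(sample)          # unbiased sample variance
    z = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_mu = z * math.sqrt(s2 / n)           # z * s / sqrt(n)
    half_s2 = z * math.sqrt(2.0 / n) * s2     # from Var(s^2) ~ 2*sigma^4 / n
    return (mu_hat - half_mu, mu_hat + half_mu), (s2 - half_s2, s2 + half_s2)
```

For small n the exact t- and χ2-based intervals should be preferred; the approximation above is only justified asymptotically.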
Occurrence
The occurrence of the normal distribution in practical problems can be loosely classified into three categories:
Exactly normal distributions;
Approximately normal laws, for example when such approximation is justified by the central limit theorem; and
Distributions modeled as normal — the normal distribution being the distribution with maximum entropy for a given mean and variance.
Exact normality
Certain quantities in physics are distributed normally, as was first demonstrated by James Clerk Maxwell. Examples of such quantities are:
Velocities of the molecules in an ideal gas. More generally, velocities of the particles in any system in thermodynamic equilibrium will have a normal distribution, due to the maximum entropy principle.
The probability density function of a ground state in a quantum harmonic oscillator.
The position of a particle that experiences diffusion. If initially the particle is located at a specific point (that is, its probability distribution is the Dirac delta function), then after time t its location is described by a normal distribution with variance t, which satisfies the diffusion equation ∂f(x,t)/∂t = (1/2) ∂2f(x,t)/∂x2. If the initial location is given by a certain density function g(x), then the density at time t is the convolution of g and the normal PDF.
Approximate normality
Approximately normal distributions occur in many situations, as explained by the central limit theorem. When the outcome is produced by a large number of small effects acting additively and independently, its distribution will be close to normal. The normal approximation will not be valid if the effects act multiplicatively (instead of additively), or if there is a single external influence that has a considerably larger magnitude than the rest of the effects.
In counting problems, where the central limit theorem includes a discrete-to-continuum approximation and where infinitely divisible and decomposable distributions are involved, such as:
Binomial random variables, associated with binary response variables;
Poisson random variables, associated with rare events;
Thermal light has a Bose–Einstein distribution on very short time scales, and a normal distribution on longer time scales due to the central limit theorem.
Assumed normality
"I can only recognize the occurrence of the normal curve — the Laplacian curve of errors — as a very abnormal phenomenon. It is roughly approximated to in certain distributions; for this reason, and on account of its beautiful simplicity, we may, perhaps, use it as a first approximation, particularly in theoretical investigations." — Pearson (1901)
There are statistical methods to empirically test that assumption, see the above Normality tests section.
In biology, the logarithm of various variables tends to have a normal distribution; that is, the variables tend to have a log-normal distribution (after separation into male/female subpopulations), with examples including:
Measures of size of living tissue (length, height, skin area, weight);
The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth; presumably the thickness of tree bark also falls under this category;
Certain physiological measurements, such as blood pressure of adult humans.
In finance, in particular the Black–Scholes model, changes in the logarithm of exchange rates, price indices, and stock market indices are assumed normal (these variables behave like compound interest, not like simple interest, and so are multiplicative). Some mathematicians such as Benoît Mandelbrot have argued that log-Lévy distributions, which possess heavy tails, would be a more appropriate model, in particular for the analysis of stock market crashes.
Measurement errors in physical experiments are often modeled by a normal distribution. This use of a normal distribution does not imply that one is assuming the measurement errors are normally distributed; rather, using the normal distribution produces the most conservative predictions possible given only knowledge about the mean and variance of the errors.
In standardized testing, results can be made to have a normal distribution. This is done by either selecting the number and difficulty of questions (as in the IQ test), or by transforming the raw test scores into "output" scores by fitting them to the normal distribution. For example, the SAT's traditional range of 200–800 is based on a normal distribution with a mean of 500 and a standard deviation of 100.
Many scores are derived from the normal distribution, including percentile ranks ("percentiles" or "quantiles"), normal curve equivalents, stanines, z-scores, and T-scores. Additionally, a number of behavioral statistical procedures are based on the assumption that scores are normally distributed; examples include t-tests and ANOVAs. Bell curve grading assigns relative grades based on a normal distribution of scores.
In hydrology the distribution of long-duration river discharge or rainfall (e.g. monthly and yearly totals, consisting of the sum of 30 or 360 daily values, respectively) is often thought to be practically normal according to the central limit theorem. The blue picture illustrates an example of fitting the normal distribution to ranked October rainfalls, showing the 90% confidence belt based on the binomial distribution. The rainfall data are represented by plotting positions as part of the cumulative frequency analysis.
Generating values from normal distribution
In computer simulations, especially in applications of the Monte Carlo method, it is often desirable to generate values that are normally distributed. The algorithms listed below all generate standard normal deviates, since a N(μ, σ2) variate can be generated as X = μ + σZ, where Z is standard normal. All these algorithms rely on the availability of a random number generator U capable of producing uniform random variates.
The most straightforward method is based on the probability integral transform property: if U is distributed uniformly on (0,1), then Φ−1(U) will have the standard normal distribution. The drawback of this method is that it relies on calculation of the probit function Φ−1, which cannot be done analytically. Some approximate methods are described in Hart (1968) and in the erf article.
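In Python the probit function is available in the standard library as `statistics.NormalDist().inv_cdf`, which makes the inverse transform method a one-liner. A minimal sketch (the function name `normal_via_probit` is illustrative):

```python
import random
import statistics

def normal_via_probit(rng=random):
    """Inverse transform sampling: Phi^-1(U) for U ~ Uniform(0,1) is a
    standard normal deviate.  inv_cdf plays the role of the probit function."""
    u = rng.random()
    while u == 0.0:   # inv_cdf requires u in the open interval (0, 1)
        u = rng.random()
    return statistics.NormalDist().inv_cdf(u)

random.seed(0)
draws = [normal_via_probit() for _ in range(10_000)]
```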
An easy-to-program approximate approach that relies on the central limit theorem is as follows: generate 12 uniform U(0,1) deviates, add them all up, and subtract 6 — the resulting random variable will have approximately a standard normal distribution. In truth, the distribution will be Irwin–Hall, which is a 12-section eleventh-order polynomial approximation to the normal distribution. This random deviate will have a limited range of (−6, 6).
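The sum-of-12 recipe above can be sketched directly (the function name `approx_normal_sum12` is illustrative):

```python
import random
import statistics

def approx_normal_sum12(rng=random):
    """CLT-based approximation: the sum of 12 Uniform(0,1) deviates has mean 6
    and variance 12 * (1/12) = 1, so subtracting 6 gives an approximately
    standard normal (Irwin-Hall) deviate confined to (-6, 6)."""
    return sum(rng.random() for _ in range(12)) - 6.0

random.seed(1)
zs = [approx_normal_sum12() for _ in range(10_000)]
```

Note the hard cutoff at ±6: tail probabilities beyond that range are exactly zero, which is the price of the approximation.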
The Box–Muller method uses two independent random numbers U and V distributed uniformly on (0,1). Then the two random variables X and Y
will both have the standard normal distribution, and will be independent. This formulation arises because for a bivariate normal random vector (X, Y) the squared norm X2 + Y2 will have the chi-squared distribution with two degrees of freedom, which is an easily generated exponential random variable corresponding to the quantity −2ln(U) in these equations; and the angle is distributed uniformly around the circle, chosen by the random variable V.
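The radius-and-angle decomposition just described can be sketched as follows (the function name `box_muller` is illustrative; U is shifted into (0,1] so the logarithm is always defined):

```python
import math
import random
import statistics

def box_muller(rng=random):
    """One Box-Muller step: -2*ln(U) gives the chi-squared(2) squared radius,
    2*pi*V the uniform angle; together they yield two independent standard
    normal deviates."""
    u = 1.0 - rng.random()        # shift [0, 1) to (0, 1] so log(u) is defined
    v = rng.random()
    r = math.sqrt(-2.0 * math.log(u))
    theta = 2.0 * math.pi * v
    return r * math.cos(theta), r * math.sin(theta)
```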
The Marsaglia polar method is a modification of the Box–Muller method which does not require computation of the functions sin and cos. In this method U and V are drawn from the uniform (−1,1) distribution, and then S = U2 + V2 is computed. If S is greater than or equal to one, the method starts over; otherwise the two quantities
are returned. Again, X and Y will be independent and standard normally distributed.
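The accept-reject loop of the polar method can be sketched as follows (the function name `marsaglia_polar` is illustrative; the degenerate case S = 0 is also rejected so the logarithm is defined):

```python
import math
import random
import statistics

def marsaglia_polar(rng=random):
    """Marsaglia polar method: draw (U, V) uniform on (-1, 1), keep them only
    when S = U^2 + V^2 lands inside the unit disk, then rescale -- no sin/cos."""
    while True:
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:                       # reject points outside the disk
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor       # two independent N(0, 1) deviates
```

On average π/4 ≈ 79% of candidate pairs are accepted, so the saved trigonometric calls come at the cost of roughly 27% more uniform draws.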
The Ratio method is a rejection method. The algorithm proceeds as follows:
Generate two independent uniform deviates U and V;
Compute X = √(8/e) (V − 0.5)/U;
If X2 ≤ 5 − 4e1/4U then accept X and terminate algorithm;
If X2 ≥ 4e−1.35/U + 1.4 then reject X and start over from step 1;
If X2 ≤ −4 lnU then accept X; otherwise start the algorithm over.
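The three steps above can be sketched as follows; the final exact acceptance test used here is X2 ≤ −4 ln U (the standard Kinderman–Monahan condition), and the function name `ratio_method` is illustrative:

```python
import math
import random
import statistics

def ratio_method(rng=random):
    """Ratio (rejection) method for standard normal deviates.  The first two
    quadratic tests are cheap bounds that usually decide acceptance or
    rejection without evaluating the logarithm."""
    while True:
        u = rng.random()
        if u == 0.0:                                   # ln(0) is undefined
            continue
        v = rng.random()
        x = math.sqrt(8.0 / math.e) * (v - 0.5) / u
        x2 = x * x
        if x2 <= 5.0 - 4.0 * math.exp(0.25) * u:       # quick accept
            return x
        if x2 >= 4.0 * math.exp(-1.35) / u + 1.4:      # quick reject
            continue
        if x2 <= -4.0 * math.log(u):                   # exact acceptance test
            return x
```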
The ziggurat algorithm (Marsaglia & Tsang 2000) is faster than the Box–Muller transform and still exact. In about 97% of all cases it uses only two random numbers, one random integer and one random uniform, one multiplication, and an if-test. Only in the 3% of cases where the combination of those two falls outside the "core of the ziggurat" must a kind of rejection sampling using logarithms, exponentials, and more uniform random numbers be employed.
There is also some investigation[citation needed] into the connection between the fast Hadamard transform and the normal distribution, since the transform employs just addition and subtraction and by the central limit theorem random numbers from almost any distribution will be transformed into the normal distribution. In this regard a series of Hadamard transforms can be combined with random permutations to turn arbitrary data sets into normally distributed data.
Numerical approximations for the normal CDF
The standard normal CDF is widely used in scientific and statistical computing. The values Φ(x) may be approximated very accurately by a variety of methods, such as numerical integration, Taylor series, asymptotic series, and continued fractions. Different approximations are used depending on the desired level of accuracy.
Abramowitz & Stegun (1964) give the approximation for Φ(x) for x > 0 with absolute error ε(x) < 7.5·10⁻⁸ (algorithm 26.2.17):
$$
\Phi(x) = 1 - \phi(x)\left(b_1 t + b_2 t^2 + b_3 t^3 + b_4 t^4 + b_5 t^5\right) + \varepsilon(x), \qquad t = \frac{1}{1 + b_0 x},
$$
where ϕ(x) is the standard normal PDF, and b0 = 0.2316419, b1 = 0.319381530, b2 = −0.356563782, b3 = 1.781477937, b4 = −1.821255978, b5 = 1.330274429.
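A direct transcription of algorithm 26.2.17 with the coefficients quoted above might look like the following sketch (function names are illustrative):

```python
import math

# Coefficients of Abramowitz & Stegun algorithm 26.2.17, as quoted above.
B0 = 0.2316419
B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def std_normal_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x: float) -> float:
    """Phi(x) for x >= 0, with absolute error below 7.5e-8."""
    t = 1.0 / (1.0 + B0 * x)
    # Polynomial b1*t + b2*t^2 + ... + b5*t^5.
    poly = sum(b * t ** (k + 1) for k, b in enumerate(B))
    return 1.0 - std_normal_pdf(x) * poly
```

For negative arguments one would use the symmetry Φ(−x) = 1 − Φ(x).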
Hart (1968) lists almost a hundred rational-function approximations for the erfc function. His algorithms vary in degree of complexity and resulting precision, with maximum absolute precision of 24 digits. An algorithm by West (2009) combines Hart's algorithm 5666 with a continued-fraction approximation in the tail to provide a fast computation algorithm with 16-digit precision.
Cody (1969), after noting that Hart's 1968 solution is not suited for erf, gives a solution for both erf and erfc, with a maximal relative error bound, via rational Chebyshev approximation (Cody, W. J. (1969), "Rational Chebyshev Approximations for the Error Function").
Marsaglia (2004) suggested a simple algorithm (given, for example, in the article on the bc programming language) based on the Taylor series expansion
$$
\Phi(x) = \frac{1}{2} + \phi(x)\left(x + \frac{x^3}{3} + \frac{x^5}{3\cdot 5} + \frac{x^7}{3\cdot 5\cdot 7} + \cdots\right)
$$
for calculating Φ(x) with arbitrary precision. The drawback of this algorithm is its comparatively slow calculation time: for example, it takes over 300 iterations to calculate the function with 16 digits of precision when x = 10.
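Marsaglia's series can be sketched as follows (the termination tolerance and names are illustrative; true arbitrary precision would require an arbitrary-precision type such as Python's `decimal`):

```python
import math

def std_normal_cdf_taylor(x: float, tol: float = 1e-17) -> float:
    """Evaluate Phi(x) via the Taylor series
    Phi(x) = 1/2 + phi(x) * (x + x^3/3 + x^5/(3*5) + x^7/(3*5*7) + ...).
    Accurate for moderate |x|, but convergence slows as |x| grows."""
    term = x          # current term x^(2n+1) / (1*3*5*...*(2n+1))
    total = x
    n = 0
    while abs(term) > tol:
        n += 1
        term *= x * x / (2 * n + 1)  # recurrence between consecutive terms
        total += term
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return 0.5 + phi * total
```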
The GNU Scientific Library calculates values of the standard normal CDF using Hart's algorithms and approximations with Chebyshev polynomials.
Development
Some authors attribute the credit for the discovery of the normal distribution to de Moivre, who in 1738 published in the second edition of his "The Doctrine of Chances" the study of the coefficients in the binomial expansion of (a + b)ⁿ. (De Moivre first published his findings in 1733, in a pamphlet "Approximatio ad Summam Terminorum Binomii (a + b)ⁿ in Seriem Expansi" designated for private circulation only; it was not until 1738 that he made his results publicly available. The pamphlet was reprinted several times; see for example Walker (1985).) De Moivre proved that the middle term in this expansion has the approximate magnitude of $2/\sqrt{2\pi n}$, and that "If m or ½n be a Quantity infinitely great, then the Logarithm of the Ratio, which a Term distant from the middle by the Interval ℓ, has to the middle Term, is $-\frac{2\ell\ell}{n}$." Although this theorem can be interpreted as the first obscure expression for the normal probability law, Stigler points out that de Moivre himself did not interpret his results as anything more than an approximate rule for the binomial coefficients, and in particular lacked the concept of the probability density function.
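De Moivre's approximation for the magnitude of the middle term can be checked numerically; this sketch (variable names illustrative) compares the exact relative magnitude C(n, n/2)/2ⁿ with 2/√(2πn):

```python
import math

# Numerical check of de Moivre's result: the middle binomial coefficient
# C(n, n/2), as a fraction of 2^n, approaches 2/sqrt(2*pi*n) as n grows.
n = 1000
exact = math.comb(n, n // 2) / 2 ** n        # exact relative magnitude
approx = 2.0 / math.sqrt(2.0 * math.pi * n)  # de Moivre's approximation
rel_err = abs(exact - approx) / exact        # relative error, ~1/(4n)
```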
In 1809 Gauss published his monograph "Theoria motus corporum coelestium in sectionibus conicis solem ambientium", in which, among other things, he introduces several important statistical concepts, such as the method of least squares, the method of maximum likelihood, and the normal distribution. Gauss used M, M′, M′′, … to denote the measurements of some unknown quantity V, and sought the "most probable" estimator: the one which maximizes the probability φ(M−V) · φ(M′−V) · φ(M′′−V) · … of obtaining the observed experimental results. In his notation φΔ is the probability law of the measurement errors of magnitude Δ. Not knowing what the function φ is, Gauss requires that his method should reduce to the well-known answer: the arithmetic mean of the measured values. ("It has been customary certainly to regard as an axiom the hypothesis that if any quantity has been determined by several direct observations, made under the same circumstances and with equal care, the arithmetical mean of the observed values affords the most probable value, if not rigorously, yet very nearly at least, so that it is always most safe to adhere to it." — Gauss (1809), section 177.) Starting from these principles, Gauss demonstrates that the only law which rationalizes the choice of the arithmetic mean as an estimator of the location parameter is the normal law of errors:
$$
\varphi\mathit{\Delta} = \frac{h}{\surd\pi}\, e^{-hh\Delta\Delta},
where h is "the measure of the precision of the observations". Using this normal law as a generic model for errors in the experiments, Gauss formulates what is now known as the nonlinear weighted least squares (NWLS) method.
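In modern notation Gauss's precision h corresponds to 1/(σ√2); substituting this into his law of errors recovers the familiar form of the normal density:

```latex
\varphi(\Delta) = \frac{h}{\sqrt{\pi}}\, e^{-h^2 \Delta^2},
\qquad h = \frac{1}{\sigma\sqrt{2}}
\;\Longrightarrow\;
\varphi(\Delta) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{\Delta^2}{2\sigma^2}}.
```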
Although Gauss was the first to suggest the normal distribution law, Laplace made significant contributions. ("My custom of terming the curve the Gauss–Laplacian or normal curve saves us from proportioning the merit of discovery between the two great astronomer mathematicians." — Pearson (1905), p. 189.) It was Laplace who first posed the problem of aggregating several observations in 1774, although his own solution led to the Laplacian distribution. It was Laplace who first calculated the value of the Gaussian integral $\int e^{-t^2}\,dt = \sqrt{\pi}$ in 1782, providing the normalization constant for the normal distribution. Finally, it was Laplace who in 1810 proved and presented to the Academy the fundamental central limit theorem, which emphasized the theoretical importance of the normal distribution.
It is of interest to note that in 1809 the American mathematician Adrain published two derivations of the normal probability law, simultaneously and independently of Gauss. His works remained largely unnoticed by the scientific community until 1871, when they were "rediscovered" by Abbe.
In the middle of the 19th century Maxwell demonstrated that the normal distribution is not just a convenient mathematical tool, but may also occur in natural phenomena: "The number of particles whose velocity, resolved in a certain direction, lies between x and x + dx is $\mathrm{N}\,\frac{1}{\alpha\sqrt{\pi}}\,e^{-\frac{x^2}{\alpha^2}}\,dx$."
Naming
Since its introduction, the normal distribution has been known by many different names: the law of error, the law of facility of errors, Laplace's second law, Gaussian law, etc. By the end of the 19th century some authors (besides those specifically referenced here, such use is encountered in the works of Peirce, Galton, and Lexis, approximately around 1875{{Citation needed|date=June 2011}}) had started using the name normal distribution, where the word "normal" was used as an adjective — the term was derived from the fact that this distribution was seen as typical, common, normal. Peirce (one of those authors) once defined "normal" thus: "...the 'normal' is not the average (or any other kind of mean) of what actually occurs, but of what would, in the long run, occur under certain circumstances." Around the turn of the 20th century Pearson popularized the term normal as a designation for this distribution.
"Many years ago I called the Laplace–Gaussian curve the normal curve, which name, while it avoids an international question of priority, has the disadvantage of leading people to believe that all other distributions of frequency are in one sense or another 'abnormal'." — Pearson (1920)
Also, it was Pearson who first wrote the distribution in terms of the standard deviation σ, as in modern notation. Soon after this, in 1915, Fisher added the location parameter to the formula for the normal distribution, expressing it in the way it is written nowadays:
The term "standard normal", denoting the normal distribution with zero mean and unit variance, came into general use around the 1950s, appearing in the popular textbooks by P. G. Hoel (1947), "Introduction to Mathematical Statistics", and A. M. Mood (1950), "Introduction to the Theory of Statistics".
The "Gaussian distribution" is named after Carl Friedrich Gauss, who introduced the distribution in 1809 as a way of rationalizing the method of least squares, as outlined above. The related work of Laplace, also outlined above, has led to the normal distribution being sometimes called Laplacian,{{Citation needed|date=October 2010}} especially in French-speaking countries. Among English speakers, both "normal distribution" and "Gaussian distribution" are in common use, with different terms preferred by different communities.
See also
Behrens–Fisher problem—the long-standing problem of testing whether two normal samples with different variances have the same means
Erdős–Kac theorem—on the occurrence of the normal distribution in number theory
Gaussian blur—convolution which uses the normal distribution as a kernel
Sum of normally distributed random variables
External links
Normal Distribution Video Tutorial Part 12
An 8 feet (2.4 m) Probability Machine (named Sir Francis) comparing stock market returns to the randomness of the beans dropping through the quincunx pattern. YouTube link originating from Index Funds Advisors