# Normal distribution

In probability theory, the normal (or Gaussian) distribution is a continuous probability distribution that has a bell-shaped probability density function, known as the Gaussian function or informally the bell curve. (The designation "bell curve" is ambiguous: there are many other distributions which are "bell"-shaped, such as the Cauchy distribution, Student's t-distribution, the generalized normal distribution, and the logistic distribution.) The density is

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},$$

where the parameter μ is the mean or expectation (the location of the peak) and σ² is the variance, the mean of the squared deviation (a "measure" of the width of the distribution). σ is the standard deviation. The distribution with μ = 0 and σ² = 1 is called the standard normal distribution. A normal distribution is often used as a first approximation to describe real-valued random variables that cluster around a single mean value.

The normal distribution is considered the most prominent probability distribution in statistics. There are several reasons for this: First, the normal distribution is very tractable analytically; that is, a large number of results involving this distribution can be derived in explicit form. Second, the normal distribution arises as the outcome of the central limit theorem, which states that under mild conditions the sum of a large number of random variables is distributed approximately normally. Finally, the "bell" shape of the normal distribution makes it a convenient choice for modelling a large variety of random variables encountered in practice.
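The central limit theorem can be illustrated with a short sketch (not part of the original article; the sample and term counts are arbitrary choices): sums of many independent Uniform(0, 1) variables behave approximately like a normal with mean n/2 and variance n/12.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

n_terms = 48      # number of uniforms summed per sample
n_samples = 20000 # number of sums drawn

# Each Uniform(0, 1) has mean 1/2 and variance 1/12, so the sum has
# mean n_terms/2 = 24 and standard deviation sqrt(n_terms/12) = 2.
sums = [sum(random.random() for _ in range(n_terms)) for _ in range(n_samples)]

mean = statistics.fmean(sums)
stdev = statistics.pstdev(sums)
print(round(mean, 1))   # close to 24.0
print(round(stdev, 1))  # close to 2.0

# Roughly 68% of the sums fall within one standard deviation of the mean,
# as the normal approximation predicts.
within = sum(abs(s - mean) <= stdev for s in sums) / n_samples
print(0.66 < within < 0.71)
```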

For this reason, the normal distribution is commonly encountered in practice, and is used throughout statistics, the natural sciences, and the social sciences as a simple model for complex phenomena. For example, the observational error in an experiment is usually assumed to follow a normal distribution, and the propagation of uncertainty is computed using this assumption. Note that a normally distributed variable has a symmetric distribution about its mean. Quantities that grow exponentially, such as prices, incomes or populations, are often skewed to the right, and hence may be better described by other distributions, such as the log-normal distribution or the Pareto distribution. In addition, the probability of seeing a normally distributed value that is far (i.e. more than a few standard deviations) from the mean drops off extremely rapidly. As a result, statistical inference using a normal distribution is not robust to the presence of outliers (data that are unexpectedly far from the mean, due to exceptional circumstances, observational error, etc.). When outliers are expected, data may be better described using a heavy-tailed distribution such as Student's t-distribution.

From a technical perspective, alternative characterizations are possible, for example:
• The normal distribution is the only absolutely continuous distribution all of whose cumulants beyond the first two (i.e. other than the mean and variance) are zero.
• For a given mean and variance, the corresponding normal distribution is the continuous distribution with the maximum entropy.

## Definition

The simplest case of a normal distribution is known as the standard normal distribution, described by the probability density function

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}.$$

The factor $\scriptstyle\ 1/\sqrt{2\pi}$ in this expression ensures that the total area under the curve ϕ(x) is equal to one (for a proof, see the Gaussian integral),
and the factor 1/2 in the exponent makes the "width" of the curve (measured as half the distance between the inflection points) also equal to one. It is traditional in statistics to denote this function with the Greek letter ϕ (phi), whereas density functions for all other distributions are usually denoted with the letters f or p. The alternative glyph φ is also used quite often; however, within this article "φ" is reserved to denote characteristic functions.

More generally, a normal distribution results from exponentiating a quadratic function (just as an exponential distribution results from exponentiating a linear function):

$$f(x) = e^{ax^2 + bx + c}.$$

This yields the classic "bell curve" shape, provided that a < 0, so that the quadratic function is concave everywhere. One can adjust a to control the "width" of the bell, then adjust b to move the central peak of the bell along the x-axis, and finally adjust c to control the "height" of the bell. For f(x) to be a true probability density function over R, one must choose c such that $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (which is only possible when a < 0).

Rather than using a, b, and c, it is far more common to describe a normal distribution by its mean μ = −b/(2a) and variance σ² = −1/(2a). Changing to these new parameters allows one to rewrite the probability density function in a convenient standard form,

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right).$$

For a standard normal distribution, μ = 0 and σ² = 1. The last part of the equation above shows that any other normal distribution can be regarded as a version of the standard normal distribution that has been stretched horizontally by a factor σ and then translated rightward by a distance μ. Thus, μ specifies the position of the bell curve's central peak, and σ specifies the "width" of the bell curve.
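As an illustrative sketch (not part of the original article; `normal_pdf` is our own helper), the stretch-and-translate relation between a general normal density and the standard normal density ϕ can be checked with Python's stdlib `statistics.NormalDist` (Python 3.8+):

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2), written directly from the standard form."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# The general density is the standard normal density phi, stretched
# horizontally by sigma and translated rightward by mu:
#   f(x) = phi((x - mu) / sigma) / sigma
phi = NormalDist(0, 1).pdf
mu, sigma, x = 2.0, 3.0, 4.5

direct = normal_pdf(x, mu, sigma)
via_phi = phi((x - mu) / sigma) / sigma
print(math.isclose(direct, via_phi))                        # the two forms agree
print(math.isclose(direct, NormalDist(mu, sigma).pdf(x)))   # matches the stdlib
```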

The parameter μ is at the same time the mean, the median and the mode of the normal distribution. The parameter σ² is called the variance; as for any random variable, it describes how concentrated the distribution is around its mean. The square root of σ² is called the standard deviation and is the width of the density function.

The normal distribution is usually denoted by N(μ, σ²). Commonly the letter N is written in calligraphic font (typed as \mathcal{N} in LaTeX). Thus when a random variable X is distributed normally with mean μ and variance σ², we write

$$X \sim \mathcal{N}(\mu, \sigma^2).$$

### Alternative formulations

Some authors advocate using the precision instead of the variance, defined variously as τ = 1/σ² or τ = 1/σ. This parametrization has an advantage in numerical applications where σ² is very close to zero, and is more convenient to work with in analysis, as τ is a natural parameter of the normal distribution. Another advantage of using this parametrization is in the study of conditional distributions in the multivariate normal case.

The question of which normal distribution should be called the "standard" one is also answered differently by various authors. Starting from the works of Gauss, the standard normal was considered to be the one with variance σ² = 1/2:

$$f(x) = \frac{1}{\sqrt{\pi}}\, e^{-x^2}.$$

Stigler goes even further and insists the standard normal should be the one with variance σ² = 1/(2π):

$$f(x) = e^{-\pi x^2}.$$

According to the author, this formulation is advantageous because of a much simpler and easier-to-remember formula, the fact that the pdf has unit height at zero, and simple approximate formulas for the quantiles of the distribution. In Stigler's formulation the density of a normal with mean μ and precision τ is equal to

$$f(x;\mu,\tau) = \tau\, e^{-\pi\tau^2 (x-\mu)^2}.$$

## Characterization

In the previous section the normal distribution was defined by specifying its probability density function. However, there are other ways to characterize a probability distribution. They include: the cumulative distribution function, the moments, the cumulants, the characteristic function, the moment-generating function, etc.

### Probability density function

The probability density function (pdf) of a random variable describes the relative frequencies of different values for that random variable. The pdf of the normal distribution is given by the formula explained in detail in the previous section:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad x \in \mathbb{R}.$$

This is a proper function only when the variance σ² is not equal to zero. In that case it is a continuous smooth function, defined on the entire real line, which is called the "Gaussian function".

Properties:
• The function f(x) is unimodal and symmetric around the point x = μ, which is at the same time the mode, the median and the mean of the distribution.
• The inflection points of the curve occur one standard deviation away from the mean (i.e., at x = μ − σ and x = μ + σ).
• The function f(x) is log-concave.
• The standard normal density ϕ(x) is an eigenfunction of the Fourier transform.
• The function is supersmooth of order 2, implying that it is infinitely differentiable.
• The first derivative of ϕ(x) is $\phi'(x) = -x\,\phi(x)$; the second derivative is $\phi''(x) = (x^2 - 1)\,\phi(x)$. More generally, the n-th derivative is given by $\phi^{(n)}(x) = (-1)^n H_n(x)\,\phi(x)$, where Hn is the Hermite polynomial of order n.
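The derivative identities for ϕ can be spot-checked numerically; this sketch (our own, not from the article) compares central finite differences against the closed forms using the stdlib `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

phi = NormalDist(0, 1).pdf
h = 1e-5  # finite-difference step

for x in (-1.3, 0.0, 0.7, 2.1):
    # Central differences approximate the first and second derivatives.
    d1 = (phi(x + h) - phi(x - h)) / (2 * h)
    d2 = (phi(x + h) - 2 * phi(x) + phi(x - h)) / h**2
    assert math.isclose(d1, -x * phi(x), abs_tol=1e-7)          # phi'(x)  = -x phi(x)
    assert math.isclose(d2, (x**2 - 1) * phi(x), abs_tol=1e-4)  # phi''(x) = (x^2 - 1) phi(x)

print("derivative identities hold")
```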

When σ² = 0, the density function does not exist. However, the distribution can be represented by a generalized function

$$f(x) = \delta(x - \mu),$$

where δ(x) is the Dirac delta function, which is equal to infinity at x = 0 and zero elsewhere. This defines a measure on the real line, and it can be used to calculate, for example, the expected value.

### Cumulative distribution function

The cumulative distribution function (CDF) describes the probability of a random variable X falling in the interval (−∞, x].

The CDF of the standard normal distribution is denoted with the capital Greek letter Φ (phi), and can be computed as an integral of the probability density function:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right].$$

This integral cannot be expressed in terms of elementary functions, so it is written in terms of the error function, or erf, a special function. Numerical methods for calculation of the standard normal CDF are discussed below. For a generic normal random variable with mean μ and variance σ² > 0 the CDF will be equal to

$$F(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right].$$

The complement of the standard normal CDF, Q(x) = 1 − Φ(x), is referred to as the Q-function, especially in engineering texts. This represents the tail probability of the Gaussian distribution, that is, the probability that a standard normal random variable X is greater than the number x. Other definitions of the Q-function, all of which are simple transformations of Φ, are also used occasionally.
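A minimal sketch (ours, not from the article) of these relations in Python: `math.erf` gives the standard normal CDF and its Q-function complement, which can be compared against the stdlib `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

def Phi(x):
    """Standard normal CDF via the error function: Phi(x) = (1 + erf(x/sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Q(x):
    """Tail probability (Q-function): Q(x) = 1 - Phi(x)."""
    return 1.0 - Phi(x)

print(round(Phi(0.0), 6))   # 0.5 by symmetry
print(round(Q(1.96), 3))    # 0.025: the upper 2.5% tail
print(math.isclose(Phi(1.0), NormalDist().cdf(1.0)))  # matches the stdlib CDF

# General normal CDF: F(x) = Phi((x - mu) / sigma)
mu, sigma = 10.0, 2.0
print(math.isclose(Phi((13.0 - mu) / sigma), NormalDist(mu, sigma).cdf(13.0)))
```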

Properties:
• The standard normal CDF is 2-fold rotationally symmetric around the point (0, ½): Φ(−x) = 1 − Φ(x).
• The derivative of Φ(x) is equal to the standard normal pdf ϕ(x): Φ′(x) = ϕ(x).
• The antiderivative of Φ(x) is $\int \Phi(x)\,dx = x\,\Phi(x) + \phi(x) + C$.

For a normal distribution with zero variance, the CDF is the Heaviside step function (with the H(0) = ½ convention):

$$F(x) = H(x - \mu).$$

### Quantile function

The inverse of the standard normal CDF, called the quantile function or probit function, is expressed in terms of the inverse error function:

$$\Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1), \qquad p \in (0, 1).$$

Quantiles of the standard normal distribution are commonly denoted as zp. The quantile zp represents such a value that a standard normal random variable X has the probability of exactly p to fall inside the (−∞, zp] interval. The quantiles are used in hypothesis testing, construction of confidence intervals and Q-Q plots. The most "famous" normal quantile is z0.975 = 1.96. A standard normal random variable is greater than 1.96 in absolute value in 5% of cases.

For a normal random variable with mean μ and variance σ², the quantile function is

$$F^{-1}(p) = \mu + \sigma\,\Phi^{-1}(p) = \mu + \sigma\sqrt{2}\,\operatorname{erf}^{-1}(2p - 1), \qquad p \in (0, 1).$$
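The quantile function is available in the stdlib as `statistics.NormalDist.inv_cdf`; this sketch (ours, with arbitrary example parameters) recovers the famous z0.975 quantile and the affine form of the general quantile:

```python
import math
from statistics import NormalDist

std = NormalDist(0, 1)

# The "famous" quantile z_{0.975}: the value a standard normal variable
# exceeds with probability 0.025.
z = std.inv_cdf(0.975)
print(round(z, 6))  # 1.959964

# A standard normal variable exceeds 1.96 in absolute value in ~5% of cases.
p_outside = 2 * (1 - std.cdf(z))
print(round(p_outside, 3))  # 0.05

# General normal quantile: F^{-1}(p) = mu + sigma * Phi^{-1}(p)
mu, sigma, p = 100.0, 15.0, 0.9
print(math.isclose(mu + sigma * std.inv_cdf(p), NormalDist(mu, sigma).inv_cdf(p)))
```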

### Characteristic function and moment generating function

The characteristic function φX(t) of a random variable X is defined as the expected value of e^{itX}, where i is the imaginary unit, and t ∈ R is the argument of the characteristic function. Thus the characteristic function is the Fourier transform of the density. For a normally distributed X with mean μ and variance σ², the characteristic function is

$$\varphi(t) = e^{i\mu t - \frac{1}{2}\sigma^2 t^2}.$$

The characteristic function can be analytically extended to the entire complex plane: one defines $\varphi(z) = e^{i\mu z - \frac{1}{2}\sigma^2 z^2}$ for all z ∈ C.

The moment generating function is defined as the expected value of e^{tX}. For a normal distribution, the moment generating function exists and is equal to

$$M(t) = \mathrm{E}\!\left[e^{tX}\right] = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.$$

The cumulant generating function is the logarithm of the moment generating function:

$$g(t) = \ln M(t) = \mu t + \tfrac{1}{2}\sigma^2 t^2.$$

Since this is a quadratic polynomial in t, only the first two cumulants are nonzero.
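The closed form of the moment generating function can be verified numerically; this sketch (ours, with arbitrary parameters and integration limits) integrates e^{tx} f(x) with the trapezoidal rule and compares against exp(μt + σ²t²/2):

```python
import math

mu, sigma = 1.0, 0.5  # example parameters (arbitrary)

def pdf(x):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def mgf_numeric(t, lo=-20.0, hi=20.0, n=200_000):
    """E[exp(tX)] by trapezoidal integration of exp(t*x) * f(x)."""
    h = (hi - lo) / n
    total = 0.5 * (math.exp(t * lo) * pdf(lo) + math.exp(t * hi) * pdf(hi))
    total += sum(math.exp(t * (lo + i * h)) * pdf(lo + i * h) for i in range(1, n))
    return total * h

for t in (0.0, 0.5, 1.0):
    closed_form = math.exp(mu * t + 0.5 * sigma**2 * t**2)  # M(t) = e^{mu t + sigma^2 t^2 / 2}
    assert math.isclose(mgf_numeric(t), closed_form, rel_tol=1e-6)

# log M(t) is quadratic in t, so only the first two cumulants
# (mu and sigma^2) are nonzero.
print("MGF matches closed form")
```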

### Moments

The normal distribution has moments of all orders. That is, for a normally distributed X with mean μ and variance σ², the expectation E[|X|^p] exists and is finite for all p such that p > −1. Usually we are interested only in moments of integer orders: p = 1, 2, 3, ….

• Central moments are the moments of X around its mean μ. Thus, a central moment of order p is the expected value of (X − μ)^p. Using standardization of normal random variables, this expectation will be equal to σ^p · E[Z^p], where Z is standard normal:

$$\mathrm{E}\!\left[(X-\mu)^p\right] = \begin{cases} 0 & \text{if } p \text{ is odd,} \\ \sigma^p\,(p-1)!! & \text{if } p \text{ is even.} \end{cases}$$

Here n!! denotes the double factorial, that is, the product of every odd number from n down to 1.

• Central absolute moments are the moments of |X − μ|. They coincide with regular central moments for all even orders, but are nonzero for all odd p's:

$$\mathrm{E}\!\left[|X-\mu|^p\right] = \sigma^p\,(p-1)!! \cdot \begin{cases} \sqrt{2/\pi} & \text{if } p \text{ is odd,} \\ 1 & \text{if } p \text{ is even} \end{cases} \;=\; \sigma^p \cdot \frac{2^{p/2}\,\Gamma\!\left(\frac{p+1}{2}\right)}{\sqrt{\pi}}.$$

The last formula is valid for any non-integer p > −1.

• Raw moments and raw absolute moments are the moments of X and |X| respectively. The formulas for these moments are much more complicated, and are given in terms of the confluent hypergeometric functions 1F1 and U. These expressions remain valid even if p is not an integer. See also generalized Hermite polynomials.

• The first two cumulants are equal to μ and σ² respectively, whereas all higher-order cumulants are equal to zero.

| Order | Raw moment | Central moment | Cumulant |
|---|---|---|---|
| 1 | μ | 0 | μ |
| 2 | μ² + σ² | σ² | σ² |
| 3 | μ³ + 3μσ² | 0 | 0 |
| 4 | μ⁴ + 6μ²σ² + 3σ⁴ | 3σ⁴ | 0 |
| 5 | μ⁵ + 10μ³σ² + 15μσ⁴ | 0 | 0 |
| 6 | μ⁶ + 15μ⁴σ² + 45μ²σ⁴ + 15σ⁶ | 15σ⁶ | 0 |
| 7 | μ⁷ + 21μ⁵σ² + 105μ³σ⁴ + 105μσ⁶ | 0 | 0 |
| 8 | μ⁸ + 28μ⁶σ² + 210μ⁴σ⁴ + 420μ²σ⁶ + 105σ⁸ | 105σ⁸ | 0 |
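
The even-order central-moment formula can be checked numerically. The following sketch (pure Python, illustrative function names) compares the closed form σ^p (p − 1)!! against a brute-force midpoint-rule integral of the density:

```python
import math

def central_moment(p, sigma):
    # Closed form: E[(X - mu)^p] = 0 for odd p, sigma^p * (p - 1)!! for even p
    if p % 2 == 1:
        return 0.0
    df = 1
    for k in range(p - 1, 0, -2):  # double factorial (p - 1)!!
        df *= k
    return sigma**p * df

def numeric_central_moment(p, sigma, n=100000, width=10.0):
    # Midpoint-rule integral of x^p times the N(0, sigma^2) density
    h = 2 * width * sigma / n
    total = 0.0
    for i in range(n):
        x = -width * sigma + (i + 0.5) * h
        pdf = math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))
        total += x**p * pdf * h
    return total

for p in range(1, 7):
    exact = central_moment(p, 1.5)
    approx = numeric_central_moment(p, 1.5)
    assert abs(exact - approx) < 1e-5 * max(1.0, abs(exact))
```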

### Standardizing normal random variables

As a consequence of property 1, it is possible to relate all normal random variables to the standard normal. For example, if X is normal with mean μ and variance σ², then

Z = (X − μ) / σ

has mean zero and unit variance; that is, Z has the standard normal distribution. Conversely, given a standard normal random variable Z we can always construct another normal random variable with specified mean μ and variance σ²:

X = μ + σZ

This "standardizing" transformation is convenient because it allows one to compute the PDF and especially the CDF of any normal distribution from a table of PDF and CDF values for the standard normal. They are related via

f_X(x) = (1/σ) φ((x − μ)/σ),  F_X(x) = Φ((x − μ)/σ)

where φ and Φ denote the PDF and CDF of the standard normal distribution.
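
The standardization can be sketched in a few lines using the identity Φ(z) = (1 + erf(z/√2))/2 (a minimal example; the function name is illustrative):

```python
import math

def normal_cdf(x, mu, sigma):
    # Standardize, then use Phi(z) = (1 + erf(z / sqrt(2))) / 2
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# P(X <= mu) is exactly 1/2 for any normal distribution
assert abs(normal_cdf(10.0, 10.0, 3.0) - 0.5) < 1e-12
# About 84.13% of the mass lies below mu + sigma
assert abs(normal_cdf(13.0, 10.0, 3.0) - 0.841344746068543) < 1e-9
```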

### Standard deviation and confidence intervals

About 68% of values drawn from a normal distribution are within one standard deviation σ of the mean; about 95% of the values lie within two standard deviations; and about 99.7% are within three standard deviations. This fact is known as the 68-95-99.7 rule, or the empirical rule, or the 3-sigma rule.
To be more precise, the area under the bell curve between μ − nσ and μ + nσ is given by

F(μ + nσ) − F(μ − nσ) = Φ(n) − Φ(−n) = erf(n / √2)

where erf is the error function. To 12 decimal places, the values for the 1-, 2-, up to 6-sigma points are:

| n | erf(n/√2) | i.e. 1 minus … | or 1 in … |
|---|---|---|---|
| 1 | 0.682689492137 | 0.317310507863 | 3.15 |
| 2 | 0.954499736104 | 0.045500263896 | 21.98 |
| 3 | 0.997300203937 | 0.002699796063 | 370.4 |
| 4 | 0.999936657516 | 0.000063342484 | 15,787 |
| 5 | 0.999999426697 | 0.000000573303 | 1,744,278 |
| 6 | 0.999999998027 | 0.000000001973 | 506,797,346 |

The next table gives the reverse relation of sigma multiples corresponding to a few often-used values for the area under the bell curve. These values are useful to determine (asymptotic) confidence intervals of the specified levels based on normally distributed (or asymptotically normal) estimators:
| confidence level | n | confidence level | n |
|---|---|---|---|
| 0.80 | 1.281551565545 | 0.999 | 3.290526731492 |
| 0.90 | 1.644853626951 | 0.9999 | 3.890591886413 |
| 0.95 | 1.959963984540 | 0.99999 | 4.417173413469 |
| 0.98 | 2.326347874041 | 0.999999 | 4.891638475699 |
| 0.99 | 2.575829303549 | 0.9999999 | 5.326723886384 |
| 0.995 | 2.807033768344 | 0.99999999 | 5.730728868236 |
| 0.998 | 3.090232306168 | 0.999999999 | 6.109410204869 |

where the value on the left of the table is the proportion of values that will fall within a given interval and n is a multiple of the standard deviation that specifies the width of the interval.
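
The sigma multiple for a given confidence level can be recovered by inverting erf numerically, e.g. by bisection (a minimal sketch; the function name is illustrative):

```python
import math

def z_multiple(conf, lo=0.0, hi=10.0, iters=100):
    # Invert conf = erf(n / sqrt(2)) by bisection to find the sigma multiple n
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.erf(mid / math.sqrt(2.0)) < conf:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

assert abs(z_multiple(0.95) - 1.959963984540) < 1e-9
assert abs(z_multiple(0.99) - 2.575829303549) < 1e-9
```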

### Central limit theorem

The theorem states that under certain (fairly common) conditions, the sum of a large number of random variables will have an approximately normal distribution. For example, if (x1, …, xn) is a sequence of iid random variables, each having mean μ and variance σ², then the central limit theorem states that

√n ( x̄ − μ ) → N(0, σ²) in distribution, where x̄ = (x1 + ⋯ + xn)/n.

The theorem will hold even if the summands xi are not iid, although some constraints on the degree of dependence and the growth rate of moments still have to be imposed.
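
A quick seeded simulation illustrates the theorem: standardized means of iid Uniform(0, 1) draws behave approximately like a standard normal (a sketch; the sample sizes are arbitrary):

```python
import math
import random

random.seed(0)

def standardized_mean(n):
    # Mean of n iid Uniform(0,1) draws, standardized using mu=1/2, sigma^2=1/12
    xs = [random.random() for _ in range(n)]
    xbar = sum(xs) / n
    return (xbar - 0.5) * math.sqrt(n) / math.sqrt(1.0 / 12.0)

# Empirical P(Z <= 1) over many replications should approach Phi(1) ~ 0.8413
samples = [standardized_mean(30) for _ in range(40000)]
frac = sum(1 for z in samples if z <= 1.0) / len(samples)
assert abs(frac - 0.8413) < 0.015
```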

The importance of the central limit theorem cannot be overemphasized. A great number of test statistics, scores, and estimators encountered in practice contain sums of certain random variables, and even more estimators can be represented as sums of random variables through the use of influence functions; all of these quantities are governed by the central limit theorem and will have an asymptotically normal distribution as a result.
Another practical consequence of the central limit theorem is that certain other distributions can be approximated by the normal distribution, for example:
• The binomial distribution B(n, p) is approximately normal N(np, np(1 − p)) for large n and for p not too close to zero or one.
• The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
• The chi-squared distribution χ2(k) is approximately normal N(k, 2k) for large k.
• The Student's t-distribution t(ν) is approximately normal N(0, 1) when ν is large.

Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution.
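
The binomial case can be checked directly by comparing exact binomial probabilities against the N(np, np(1 − p)) approximation with a continuity correction (a sketch; the parameter values are arbitrary):

```python
import math

def binom_prob(n, p, lo, hi):
    # Exact P(lo <= B(n, p) <= hi) via the binomial pmf
    total = 0.0
    for k in range(lo, hi + 1):
        total += math.comb(n, k) * p**k * (1 - p)**(n - k)
    return total

def normal_approx(n, p, lo, hi):
    # Normal approximation N(np, np(1-p)) with continuity correction
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    cdf = lambda x: 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))
    return cdf(hi + 0.5) - cdf(lo - 0.5)

exact = binom_prob(400, 0.3, 110, 130)
approx = normal_approx(400, 0.3, 110, 130)
assert abs(exact - approx) < 0.005
```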

A general upper bound for the approximation error in the central limit theorem is given by the Berry–Esseen theorem; improvements of the approximation are given by the Edgeworth expansions.

### Miscellaneous

1. The family of normal distributions is closed under linear transformations. That is, if X is normally distributed with mean μ and variance σ², then the linear transform aX + b (for some real numbers a and b) is also normally distributed:

aX + b ~ N(aμ + b, a²σ²)

Also, if X1 and X2 are two independent normal random variables, with means μ1, μ2 and standard deviations σ1, σ2, then their linear combination will also be normally distributed:

aX1 + bX2 ~ N(aμ1 + bμ2, a²σ1² + b²σ2²)
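
A seeded simulation verifies the mean and variance of a linear combination of independent normals (a sketch with arbitrary illustrative parameters):

```python
import random
import statistics

random.seed(1)
a, b = 2.0, -3.0
mu1, s1, mu2, s2 = 1.0, 0.5, -2.0, 1.5
ys = [a * random.gauss(mu1, s1) + b * random.gauss(mu2, s2) for _ in range(100000)]
# Theory: mean = a*mu1 + b*mu2 = 8, variance = a^2 s1^2 + b^2 s2^2 = 21.25
assert abs(statistics.fmean(ys) - 8.0) < 0.1
assert abs(statistics.pvariance(ys) - 21.25) < 0.5
```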

2. The converse of (1) is also true: if X1 and X2 are independent and their sum X1 + X2 is distributed normally, then both X1 and X2 must also be normal. This is known as Cramér's decomposition theorem. The interpretation of this property is that a normal distribution is only divisible by other normal distributions. Another application of this property is in connection with the central limit theorem: although the CLT asserts that the distribution of a sum of arbitrary non-normal iid random variables is approximately normal, Cramér's theorem shows that it can never become exactly normal.

3. If the characteristic function φX of some random variable X is of the form φX(t) = exp(Q(t)), where Q(t) is a polynomial, then the Marcinkiewicz theorem (named after Józef Marcinkiewicz) asserts that Q can be at most a quadratic polynomial, and therefore X is a normal random variable. The consequence of this result is that the normal distribution is the only distribution with a finite number (two) of non-zero cumulants.

4. If X and Y are jointly normal and uncorrelated, then they are independent. The requirement that X and Y be jointly normal is essential; without it the property does not hold. For non-normal random variables uncorrelatedness does not imply independence.

5. If X and Y are independent N(0, σ²) random variables, then X + Y and X − Y are also independent and identically distributed (this follows from the polarization identity). This property uniquely characterizes the normal distribution, as can be seen from Bernstein's theorem: if X and Y are independent and such that X + Y and X − Y are also independent, then both X and Y must necessarily have normal distributions.

More generally, if X1, ..., Xn are independent random variables, then two linear combinations ∑akXk and ∑bkXk will be independent if and only if all Xk are normal and ∑akbkσk² = 0, where σk² denotes the variance of Xk.

6. The normal distribution is infinitely divisible: for a normally distributed X with mean μ and variance σ² we can find n independent random variables {X1, …, Xn}, each distributed normally with mean μ/n and variance σ²/n, such that

X1 + X2 + ⋯ + Xn ~ N(μ, σ²), i.e. the sum has the same distribution as X.

7. The normal distribution is stable (with exponent α = 2): if X1, X2 are two independent N(μ, σ²) random variables and a, b are arbitrary real numbers, then

aX1 + bX2 ~ √(a² + b²) · X3 + (a + b − √(a² + b²)) μ

where X3 is also N(μ, σ²). This relationship directly follows from property (1).

8. The Kullback–Leibler divergence between two normal distributions X1 ~ N(μ1, σ1²) and X2 ~ N(μ2, σ2²) is given by:

D_KL(X1 ‖ X2) = (μ1 − μ2)² / (2σ2²) + ½ ( σ1²/σ2² − 1 − ln(σ1²/σ2²) )

The Hellinger distance between the same distributions is equal to

H²(X1, X2) = 1 − √( 2σ1σ2 / (σ1² + σ2²) ) · exp( −(μ1 − μ2)² / (4(σ1² + σ2²)) )
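
The KL divergence between two normals has a well-known closed form; the sketch below (pure Python, illustrative function names) checks it against a brute-force numerical integral of p1 log(p1/p2):

```python
import math

def normal_pdf(x, mu, s):
    return math.exp(-(x - mu)**2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def kl_closed_form(mu1, s1, mu2, s2):
    # D_KL(N(mu1, s1^2) || N(mu2, s2^2)) via the closed-form expression
    r = (s1 * s1) / (s2 * s2)
    return (mu1 - mu2)**2 / (2 * s2 * s2) + 0.5 * (r - 1 - math.log(r))

def kl_numeric(mu1, s1, mu2, s2, n=200000, width=15.0):
    # Midpoint-rule integral of p1(x) * log(p1(x) / p2(x))
    a = mu1 - width * s1
    h = 2 * width * s1 / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        p1 = normal_pdf(x, mu1, s1)
        p2 = normal_pdf(x, mu2, s2)
        total += p1 * math.log(p1 / p2) * h
    return total

assert abs(kl_closed_form(0.0, 1.0, 1.0, 2.0) - kl_numeric(0.0, 1.0, 1.0, 2.0)) < 1e-5
```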

9. The Fisher information matrix for a normal distribution is diagonal and takes the form

𝓘(μ, σ²) = diag( 1/σ², 1/(2σ⁴) )

10. The normal distribution belongs to an exponential family with natural parameters η1 = μ/σ² and η2 = −1/(2σ²), and natural statistics x and x². The dual, expectation parameters for the normal distribution are η̄1 = μ and η̄2 = μ² + σ².

11. The conjugate prior of the mean of a normal distribution is another normal distribution. Specifically, if x1, …, xn are iid N(μ, σ²) with σ² known, and the prior is μ ~ N(μ0, σ0²), then the posterior distribution for μ will be

μ | x1, …, xn ~ N( (μ0/σ0² + n x̄/σ²) / (1/σ0² + n/σ²), (1/σ0² + n/σ²)⁻¹ )

where x̄ is the sample mean.

12. Of all probability distributions over the reals with mean μ and variance σ², the normal distribution N(μ, σ²) is the one with maximum entropy.

13. The family of normal distributions forms a manifold with constant curvature −1. The same family is flat with respect to the (±1)-connections ∇(e) and ∇(m).

### Operations on a single random variable

If X is distributed normally with mean μ and variance σ², then

• The exponential of X is distributed log-normally: e^X ~ ln N(μ, σ²).
• The absolute value of X has a folded normal distribution: |X| ~ Nf(μ, σ²). If μ = 0 this is known as the half-normal distribution.
• The square of X/σ has the noncentral chi-squared distribution with one degree of freedom: X²/σ² ~ χ²₁(μ²/σ²). If μ = 0, the distribution is called simply chi-squared.
• The distribution of the variable X restricted to an interval [a, b] is called the truncated normal distribution.
• (X − μ)⁻² has a Lévy distribution with location 0 and scale σ⁻².
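
The log-normal relation can be illustrated with a seeded simulation: if X ~ N(μ, σ²), then E[e^X] = exp(μ + σ²/2) (a sketch with arbitrary illustrative parameters):

```python
import math
import random

random.seed(2)
mu, sigma = 0.3, 0.8
ys = [math.exp(random.gauss(mu, sigma)) for _ in range(200000)]
# Log-normal mean: E[e^X] = exp(mu + sigma^2 / 2) ~ 1.86 here
expected = math.exp(mu + sigma**2 / 2)
assert abs(sum(ys) / len(ys) - expected) < 0.02
```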

#### Combination of two independent random variables

If X1 and X2 are two independent standard normal random variables, then

• Their sum and difference is distributed normally with mean zero and variance two: X1 ± X2 ~ N(0, 2).
• Their product Z = X1·X2 follows the "product-normal" distribution with density function fZ(z) = (1/π) K0(|z|), where K0 is the modified Bessel function of the second kind. This distribution is symmetric around zero, unbounded at z = 0, and has the characteristic function φZ(t) = (1 + t²)^(−1/2).
• Their ratio follows the standard Cauchy distribution: X1/X2 ~ Cauchy(0, 1).
• Their Euclidean norm $\scriptstyle\sqrt{X_1^2\,+\,X_2^2}$ has the Rayleigh distribution, also known as the chi distribution with 2 degrees of freedom.
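
The ratio-to-Cauchy fact can be checked empirically: the standard Cauchy CDF is F(x) = 1/2 + arctan(x)/π, so F(1) = 0.75 (a seeded simulation sketch):

```python
import random

random.seed(3)
ratios = [random.gauss(0, 1) / random.gauss(0, 1) for _ in range(100000)]
# Standard Cauchy: F(1) = 1/2 + arctan(1)/pi = 0.75
frac_below_1 = sum(1 for r in ratios if r <= 1.0) / len(ratios)
assert abs(frac_below_1 - 0.75) < 0.01
```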

#### Combination of two or more independent random variables

• If X1, X2, …, Xn are independent standard normal random variables, then the sum of their squares has the chi-squared distribution with n degrees of freedom: $\scriptstyle X_1^2 + \cdots + X_n^2\ \sim\ \chi_n^2$.
• If X1, X2, …, Xn are independent normally distributed random variables with mean μ and variance σ², then their sample mean is independent of the sample standard deviation, which can be demonstrated using Basu's theorem or Cochran's theorem. The ratio of these two quantities will have the Student's t-distribution with n − 1 degrees of freedom:

t = (x̄ − μ) / (s/√n) ~ t(n − 1)

• If X1, …, Xn, Y1, …, Ym are independent standard normal random variables, then the ratio of their normalized sums of squares will have the F-distribution with (n, m) degrees of freedom:

F = ( (X1² + ⋯ + Xn²)/n ) / ( (Y1² + ⋯ + Ym²)/m ) ~ F(n, m)
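
The chi-squared relation from the first bullet of this section is easy to verify by simulation: a sum of n squared standard normals has mean n and variance 2n (a seeded sketch):

```python
import random
import statistics

random.seed(4)
n = 5
chi2 = [sum(random.gauss(0, 1)**2 for _ in range(n)) for _ in range(100000)]
# Chi-squared with n degrees of freedom has mean n and variance 2n
assert abs(statistics.fmean(chi2) - n) < 0.05
assert abs(statistics.pvariance(chi2) - 2 * n) < 0.3
```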

#### Operations on the density function

The split normal distribution is most directly defined in terms of joining scaled sections of the density functions of different normal distributions and rescaling the density to integrate to one. The truncated normal distribution results from rescaling a section of a single density function.

### Extensions

The notion of normal distribution, being one of the most important distributions in probability theory, has been extended far beyond the standard framework of the univariate (that is, one-dimensional) case. All these extensions are also called normal or Gaussian laws, so a certain ambiguity in names exists.
• Multivariate normal distribution describes the Gaussian law in the k-dimensional Euclidean space. A vector is multivariate-normally distributed if any linear combination of its components has a (univariate) normal distribution. The variance of X is a k×k symmetric positive-definite matrix V.
• Rectified Gaussian distribution is a rectified version of the normal distribution with all the negative elements reset to 0.
• Complex normal distribution deals with complex normal vectors. A complex vector is said to be normal if both its real and imaginary components jointly possess a 2k-dimensional multivariate normal distribution. The variance-covariance structure of X is described by two matrices: the variance matrix Γ, and the relation matrix C.
• Matrix normal distribution describes the case of normally distributed matrices.
• Gaussian processes are the normally distributed stochastic processes. These can be viewed as elements of some infinite-dimensional Hilbert space H, and thus are the analogues of multivariate normal vectors for the case k = ∞. A random element h ∈ H is said to be normal if for any constant a ∈ H the scalar product (a, h) has a (univariate) normal distribution. The variance structure of such a Gaussian random element can be described in terms of the linear covariance operator K: H → H. Several Gaussian processes became popular enough to have their own names:
• Brownian motion (the Wiener process),
• Brownian bridge,
• Ornstein–Uhlenbeck process.
• Gaussian q-distribution is an abstract mathematical construction which represents a "q-analogue" of the normal distribution.
• The q-Gaussian is an analogue of the Gaussian distribution, in the sense that it maximises the Tsallis entropy, and is one type of Tsallis distribution. Note that this distribution is different from the Gaussian q-distribution above.

One of the main practical uses of the Gaussian law is to model the empirical distributions of many different random variables encountered in practice. In such cases a possible extension would be a richer family of distributions, having more than two parameters and therefore being able to fit the empirical distribution more accurately. Examples of such extensions are:
• Pearson distribution — a four-parameter family of probability distributions that extend the normal law to include different skewness and kurtosis values.

## Normality tests

Normality tests assess the likelihood that the given data set {x1, …, xn} comes from a normal distribution. Typically the null hypothesis H0 is that the observations are distributed normally with unspecified mean μ and variance σ², versus the alternative Ha that the distribution is arbitrary. Many tests (over 40) have been devised for this problem; the more prominent of them are outlined below:
• "Visual" tests are more intuitively appealing but subjective at the same time, as they rely on informal human judgement to accept or reject the null hypothesis.
• Q-Q plot — a plot of the sorted values from the data set against the expected values of the corresponding quantiles from the standard normal distribution. That is, it is a plot of points of the form (Φ⁻¹(pk), x(k)), where the plotting points pk are equal to pk = (k − α)/(n + 1 − 2α) and α is an adjustment constant, which can be anything between 0 and 1. If the null hypothesis is true, the plotted points should approximately lie on a straight line.
• P-P plot — similar to the Q-Q plot, but used much less frequently. This method consists of plotting the points (Φ(z(k)), pk), where z(k) = (x(k) − μ̂)/σ̂ are the standardized order statistics. For normally distributed data this plot should lie on a 45° line between (0, 0) and (1, 1).
• Shapiro–Wilk test employs the fact that the line in the Q-Q plot has slope σ. The test compares the least squares estimate of that slope with the value of the sample variance, and rejects the null hypothesis if these two quantities differ significantly.
• Normal probability plot (rankit plot)
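
The Q-Q construction can be sketched in a few lines; the probit (inverse standard normal CDF) is obtained here by bisection on erf, and α = 3/8 (Blom's choice) is one common plotting-position constant (function names are illustrative):

```python
import math

def probit(p, lo=-10.0, hi=10.0):
    # Inverse standard normal CDF via bisection on erf
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def qq_points(data, alpha=0.375):
    # (theoretical quantile, order statistic) pairs, Blom's plotting positions
    xs = sorted(data)
    n = len(xs)
    return [(probit((k + 1 - alpha) / (n + 1 - 2 * alpha)), xs[k]) for k in range(n)]

data = [2.1, 1.3, 3.0, 2.6, 1.8, 2.2, 2.9, 1.5, 2.4, 2.0]
pts = qq_points(data)
# For roughly normal data both coordinates increase together (near a line)
assert all(pts[i][0] < pts[i + 1][0] for i in range(len(pts) - 1))
assert abs(probit(0.975) - 1.959963984540) < 1e-8
```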

• Moment tests:
• D'Agostino's K-squared test
• Jarque–Bera test
• Empirical distribution function tests:
• Lilliefors test (an adaptation of the Kolmogorov–Smirnov test)
• Anderson–Darling test

## Estimation of parameters

It is often the case that we do not know the parameters of the normal distribution but instead want to estimate them. That is, having a sample (x1, …, xn) from a normal N(μ, σ²) population we would like to learn the approximate values of the parameters μ and σ². The standard approach to this problem is the maximum likelihood method, which requires maximization of the log-likelihood function:

ln L(μ, σ²) = −(n/2) ln(2π) − (n/2) ln σ² − (1/(2σ²)) ∑ (xi − μ)²

Taking derivatives with respect to μ and σ² and solving the resulting system of first-order conditions yields the maximum likelihood estimates:

μ̂ = x̄ = (1/n) ∑ xi,   σ̂² = (1/n) ∑ (xi − x̄)²
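
The maximum likelihood estimates are simple to compute directly (a minimal sketch; the function name is illustrative):

```python
def mle_normal(xs):
    # Maximum likelihood estimates: sample mean and (biased) sample variance
    n = len(xs)
    mu_hat = sum(xs) / n
    var_hat = sum((x - mu_hat)**2 for x in xs) / n
    return mu_hat, var_hat

xs = [4.0, 5.0, 6.0, 5.0, 4.5, 5.5]
mu_hat, var_hat = mle_normal(xs)
assert abs(mu_hat - 5.0) < 1e-12
assert abs(var_hat - sum((x - 5.0)**2 for x in xs) / 6) < 1e-12
```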

Estimator $\scriptstyle\hat\mu$ is called the sample mean, since it is the arithmetic mean of all observations. The statistic $\scriptstyle\overline{x}$ is complete and sufficient for μ, and therefore by the Lehmann–Scheffé theorem, $\scriptstyle\hat\mu$ is the uniformly minimum variance unbiased (UMVU) estimator. In finite samples it is distributed normally:

$\scriptstyle\hat\mu$ ~ N(μ, σ²/n)

The variance of this estimator is equal to the μμ-element of the inverse Fisher information matrix $\scriptstyle\mathcal{I}^{-1}$. This implies that the estimator is finite-sample efficient. Of practical importance is the fact that the standard error of $\scriptstyle\hat\mu$ is proportional to $\scriptstyle1/\sqrt{n}$; that is, if one wishes to decrease the standard error by a factor of 10, one must increase the number of points in the sample by a factor of 100. This fact is widely used in determining sample sizes for opinion polls and the number of trials in Monte Carlo simulations.

From the standpoint of the asymptotic theory, $\scriptstyle\hat\mu$ is consistent, that is, it converges in probability to μ as n → ∞. The estimator is also asymptotically normal, which is a simple corollary of the fact that it is normal in finite samples:

√n ( $\scriptstyle\hat\mu$ − μ ) → N(0, σ²) in distribution.

The estimator $\scriptstyle\hat\sigma^2$ is called the sample variance, since it is the variance of the sample (x1, …, xn). In practice, another estimator is often used instead of $\scriptstyle\hat\sigma^2$. This other estimator is denoted s2, and is also called the sample variance, which represents a certain ambiguity in terminology; its square root s is called the sample standard deviation. The estimator s2 differs from $\scriptstyle\hat\sigma^2$ by having n − 1 instead of n in the denominator (the so-called Bessel's correction):

$s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \overline{x})^2.$

The difference between s2 and $\scriptstyle\hat\sigma^2$ becomes negligibly small for large n. In finite samples, however, the motivation behind the use of s2 is that it is an unbiased estimator of the underlying parameter σ2, whereas $\scriptstyle\hat\sigma^2$ is biased. Also, by the Lehmann–Scheffé theorem the estimator s2 is uniformly minimum variance unbiased (UMVU), which makes it the "best" estimator among all unbiased ones. However it can be shown that the biased estimator $\scriptstyle\hat\sigma^2$ is "better" than s2 in terms of the mean squared error (MSE) criterion. In finite samples both s2 and $\scriptstyle\hat\sigma^2$ have a scaled chi-squared distribution with n − 1 degrees of freedom:

$s^2 \sim \frac{\sigma^2}{n-1}\,\chi^2_{n-1}, \qquad \hat\sigma^2 \sim \frac{\sigma^2}{n}\,\chi^2_{n-1}.$

The first of these expressions shows that the variance of s2 is equal to 2σ4/(n − 1), which is slightly greater than the σσ-element of the inverse Fisher information matrix $\scriptstyle\mathcal{I}^{-1}$, 2σ4/n. Thus, s2 is not an efficient estimator for σ2, and moreover, since s2 is UMVU, we can conclude that a finite-sample efficient estimator for σ2 does not exist.
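The bias/MSE trade-off between s2 and $\scriptstyle\hat\sigma^2$ can be verified numerically. A sketch; the sample size n = 5, the true variance, and the replication count are arbitrary illustrative choices:

```python
import random

def compare_variance_estimators(n=5, sigma2=1.0, reps=100_000, seed=42):
    # Monte Carlo bias and MSE for the two sample-variance estimators:
    # s2 divides by n-1 (unbiased), mle divides by n (biased, lower MSE).
    rng = random.Random(seed)
    bias_s2 = bias_mle = mse_s2 = mse_mle = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        s2, mle = ss / (n - 1), ss / n
        bias_s2 += (s2 - sigma2) / reps
        bias_mle += (mle - sigma2) / reps
        mse_s2 += (s2 - sigma2) ** 2 / reps
        mse_mle += (mle - sigma2) ** 2 / reps
    return bias_s2, bias_mle, mse_s2, mse_mle

b_s2, b_mle, m_s2, m_mle = compare_variance_estimators()
# s2 is nearly unbiased; the biased estimator nevertheless has smaller MSE.
print(abs(b_s2) < abs(b_mle), m_mle < m_s2)
```

For n = 5 the theory predicts bias −σ²/5 for the MLE and MSE values of 2σ⁴/4 = 0.5 versus (2·5 − 1)σ⁴/25 = 0.36, which the simulation reproduces to within sampling noise.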

Applying the asymptotic theory, both estimators s2 and $\scriptstyle\hat\sigma^2$ are consistent, that is, they converge in probability to σ2 as the sample size n → ∞. The two estimators are also both asymptotically normal:

$\sqrt{n}(s^2 - \sigma^2) \,\xrightarrow{d}\, \mathcal{N}(0,\ 2\sigma^4), \qquad \sqrt{n}(\hat\sigma^2 - \sigma^2) \,\xrightarrow{d}\, \mathcal{N}(0,\ 2\sigma^4).$

In particular, both estimators are asymptotically efficient for σ2.

By Cochran's theorem, for the normal distribution the sample mean $\scriptstyle\hat\mu$ and the sample variance s2 are independent, which means there can be no gain in considering their joint distribution. There is also a converse theorem: if in a sample the sample mean and sample variance are independent, then the sample must have come from the normal distribution. The independence between $\scriptstyle\hat\mu$ and s can be employed to construct the so-called t-statistic:

$t = \frac{\hat\mu - \mu}{s/\sqrt{n}}.$

This quantity t has the Student's t-distribution with n − 1 degrees of freedom, and it is an ancillary statistic (independent of the value of the parameters). Inverting the distribution of this t-statistic allows us to construct the confidence interval for μ; similarly, inverting the χ2 distribution of the statistic s2 gives us the confidence interval for σ2:

$\mu \in \Big[\, \hat\mu - t_{n-1,1-\alpha/2}\,\tfrac{s}{\sqrt{n}},\ \ \hat\mu + t_{n-1,1-\alpha/2}\,\tfrac{s}{\sqrt{n}} \,\Big] \approx \Big[\, \hat\mu \pm |z_{\alpha/2}|\tfrac{s}{\sqrt{n}} \,\Big],$

$\sigma^2 \in \Big[\, \tfrac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}},\ \ \tfrac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}} \,\Big] \approx \Big[\, s^2 \pm |z_{\alpha/2}|\tfrac{\sqrt{2}\,s^2}{\sqrt{n}} \,\Big],$

where tk,p and χ2k,p are the pth quantiles of the t- and χ2-distributions respectively. These confidence intervals are of level 1 − α, meaning that the true values μ and σ2 fall outside of these intervals with probability α. In practice one usually takes α = 5%, resulting in 95% confidence intervals. The approximate formulas in the display above were derived from the asymptotic distributions of $\scriptstyle\hat\mu$ and s2. The approximate formulas become valid for large values of n, and are more convenient for manual calculation since the standard normal quantiles zα/2 do not depend on n. In particular, the most popular value α = 5% results in |z0.025| = 1.96.
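For large n the interval for μ can be computed with nothing but the standard library, since statistics.NormalDist supplies the quantile zα/2. A sketch using a small made-up data set (the numbers are purely illustrative):

```python
from statistics import NormalDist, fmean, stdev
from math import sqrt

def approx_mean_ci(data, alpha=0.05):
    # Large-sample confidence interval for mu: xbar +/- z_{alpha/2} * s / sqrt(n).
    n = len(data)
    xbar, s = fmean(data), stdev(data)        # stdev uses the n-1 denominator (s)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 5%
    half = z * s / sqrt(n)
    return xbar - half, xbar + half

# Illustrative sample drawn around mu = 10.
sample = [9.8, 10.2, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3, 9.6, 10.0]
lo, hi = approx_mean_ci(sample)
print(lo < 10 < hi)
```

For small n the exact interval would use the t-quantile instead of z, which widens the interval slightly.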

## Occurrence

The occurrence of normal distribution in practical problems can be loosely classified into three categories:
1. Exactly normal distributions;
2. Approximately normal laws, for example when such approximation is justified by the central limit theorem; and
3. Distributions modeled as normal, the normal distribution being the distribution with maximum entropy for a given mean and variance.

### Exact normality

Certain quantities in physics are distributed normally, as was first demonstrated by James Clerk Maxwell. Examples of such quantities are:
• Velocities of the molecules in an ideal gas. More generally, velocities of the particles in any system in thermodynamic equilibrium will have a normal distribution, due to the maximum entropy principle.
• The probability density function of the ground state of a quantum harmonic oscillator.
• The position of a particle that experiences diffusion. If the particle is initially located at a specific point (that is, its probability distribution is the Dirac delta function), then after time t its location is described by a normal distribution with variance t, which satisfies the diffusion equation $\tfrac{\partial}{\partial t} f(x,t) = \tfrac{1}{2}\tfrac{\partial^2}{\partial x^2} f(x,t)$. If the initial location is given by a density function g(x), then the density at time t is the convolution of g and the normal PDF.

### Approximate normality

Approximately normal distributions occur in many situations, as explained by the central limit theorem. When the outcome is produced by a large number of small effects acting additively and independently, its distribution will be close to normal. The normal approximation will not be valid if the effects act multiplicatively (instead of additively), or if there is a single external influence which has a considerably larger magnitude than the rest of the effects.
• In counting problems, where the central limit theorem includes a discrete-to-continuum approximation and where infinitely divisible and decomposable distributions are involved, such as
• Binomial random variables, associated with binary response variables;
• Poisson random variables, associated with rare events;
• Thermal light, which has a Bose–Einstein distribution on very short time scales and a normal distribution on longer time scales due to the central limit theorem.

### Assumed normality

There are statistical methods to empirically test the assumption of normality; see the Normality tests section above.
• In biology, the logarithms of various variables tend to have a normal distribution; that is, the variables tend to have a log-normal distribution (after separation into male/female subpopulations), with examples including:
• Measures of size of living tissue (length, height, skin area, weight);
• The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth; presumably the thickness of tree bark also falls under this category;
• Certain physiological measurements, such as blood pressure of adult humans.
• In finance, in particular the Black–Scholes model, changes in the logarithm of exchange rates, price indices, and stock market indices are assumed normal (these variables behave like compound interest, not like simple interest, and so are multiplicative). Some mathematicians, such as Benoît Mandelbrot, have argued that log-Lévy distributions, which possess heavy tails, would be a more appropriate model, in particular for the analysis of stock market crashes.
• Measurement errors in physical experiments are often modeled by a normal distribution. This use of a normal distribution does not imply that one is assuming the measurement errors are normally distributed; rather, using the normal distribution produces the most conservative predictions possible given only knowledge of the mean and variance of the errors.

• In standardized testing, results can be made to have a normal distribution. This is done by either selecting the number and difficulty of questions (as in the IQ test), or by transforming the raw test scores into "output" scores by fitting them to the normal distribution. For example, the SAT's traditional range of 200–800 is based on a normal distribution with a mean of 500 and a standard deviation of 100.
• Many scores are derived from the normal distribution, including percentile ranks ("percentiles" or "quantiles"), normal curve equivalents, stanines, z-scores, and T-scores. Additionally, a number of behavioral statistical procedures are based on the assumption that scores are normally distributed; for example, t-tests and ANOVAs. Bell curve grading assigns relative grades based on a normal distribution of scores.
• In hydrology the distribution of long-duration river discharge or rainfall (e.g. monthly and yearly totals, consisting of the sum of 30 and 360 daily values respectively) is often thought to be practically normal according to the central limit theorem. The blue picture illustrates an example of fitting the normal distribution to ranked October rainfalls, showing the 90% confidence belt based on the binomial distribution. The rainfall data are represented by plotting positions as part of the cumulative frequency analysis.

## Generating values from normal distribution

In computer simulations, especially in applications of the Monte Carlo method, it is often desirable to generate values that are normally distributed. The algorithms listed below all generate standard normal deviates, since an N(μ, σ2) variate can be generated as X = μ + σZ, where Z is standard normal. All these algorithms rely on the availability of a random number generator U capable of producing uniform random variates.

• The most straightforward method is based on the probability integral transform property: if U is distributed uniformly on (0,1), then Φ−1(U) will have the standard normal distribution. The drawback of this method is that it relies on calculation of the probit function Φ−1, which cannot be done analytically. Some approximate methods are described in the section on numerical approximations below and in the erf article.

• An easy-to-program approximate approach that relies on the central limit theorem is as follows: generate 12 uniform U(0,1) deviates, add them all up, and subtract 6; the resulting random variable will have an approximately standard normal distribution. In fact, the distribution will be Irwin–Hall, which is a 12-section eleventh-order polynomial approximation to the normal distribution. This random deviate will have the limited range (−6, 6).
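A sketch of this sum-of-twelve approach; the seed and the sample size used for the moment check are arbitrary:

```python
import random

def approx_standard_normal(rng):
    # Sum of 12 U(0,1) deviates has mean 6 and variance 12 * (1/12) = 1;
    # subtracting 6 gives an approximately standard normal value in (-6, 6).
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(0)
draws = [approx_standard_normal(rng) for _ in range(100_000)]
mean = sum(draws) / len(draws)
var = sum(d * d for d in draws) / len(draws) - mean ** 2
print(round(mean, 2), round(var, 2))  # should be near 0 and 1
```

The hard range cutoff at ±6 is the visible symptom of the approximation: true normal tails extend beyond it, so this generator is unsuitable when extreme tail behavior matters.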

• The Box–Muller method uses two independent random numbers U and V distributed uniformly on (0,1). Then the two random variables X and Y

$X = \sqrt{-2\ln U}\,\cos(2\pi V), \qquad Y = \sqrt{-2\ln U}\,\sin(2\pi V)$

will both have the standard normal distribution, and will be independent. This formulation arises because for a bivariate normal random vector (X, Y) the squared norm X2 + Y2 will have the chi-squared distribution with two degrees of freedom, which is an easily generated exponential random variable corresponding to the quantity −2ln(U) in these equations; and the angle is distributed uniformly around the circle, chosen by the random variable V.
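A direct transcription of the Box–Muller transform, with a simple moment check; the seed and sample size are arbitrary:

```python
import math
import random

def box_muller(rng):
    # Two independent U(0,1) deviates -> two independent standard normals.
    u = 1.0 - rng.random()             # shift into (0, 1] so log(u) is finite
    v = rng.random()
    r = math.sqrt(-2.0 * math.log(u))  # radius; r*r is chi-squared with 2 df
    theta = 2.0 * math.pi * v          # uniform angle around the circle
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(7)
pairs = [box_muller(rng) for _ in range(50_000)]
xs = [x for x, _ in pairs]
mean = sum(xs) / len(xs)
var = sum(x * x for x in xs) / len(xs) - mean ** 2
print(round(mean, 2), round(var, 2))  # should be near 0 and 1
```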

• The Marsaglia polar method is a modification of the Box–Muller algorithm which does not require computation of the functions sin and cos. In this method U and V are drawn from the uniform (−1,1) distribution, and then S = U2 + V2 is computed. If S is greater than or equal to one, the method starts over; otherwise the two quantities

$X = U\sqrt{\frac{-2\ln S}{S}}, \qquad Y = V\sqrt{\frac{-2\ln S}{S}}$

are returned. Again, X and Y will be independent and standard normally distributed.
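The polar method in code; note how the rejection loop replaces the trigonometric calls of Box–Muller (seed and sample size are arbitrary):

```python
import math
import random

def marsaglia_polar(rng):
    # Rejection step: draw (U, V) uniform on (-1,1)^2 until strictly inside
    # the unit circle; then scale by sqrt(-2 ln S / S) instead of using sin/cos.
    while True:
        u = rng.uniform(-1.0, 1.0)
        v = rng.uniform(-1.0, 1.0)
        s = u * u + v * v
        if 0.0 < s < 1.0:
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor

rng = random.Random(3)
xs = [marsaglia_polar(rng)[0] for _ in range(50_000)]
mean = sum(xs) / len(xs)
var = sum(x * x for x in xs) / len(xs) - mean ** 2
print(round(mean, 2), round(var, 2))  # should be near 0 and 1
```

The acceptance probability is π/4 ≈ 79%, so on average fewer than 1.3 candidate pairs are needed per returned pair.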

• The Ratio method is a rejection method. The algorithm proceeds as follows:
• Generate two independent uniform deviates U and V;
• Compute X = √(8/e) (V − 0.5)/U;
• If X² ≤ 5 − 4e^{1/4}U then accept X and terminate the algorithm;
• If X² ≥ 4e^{−1.35}/U + 1.4 then reject X and start over from step 1;
• If X² ≤ −4 ln U then accept X; otherwise start the algorithm over.
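A minimal sketch of the ratio-of-uniforms idea behind this method, keeping only the final acceptance test (the intermediate inequalities are fast-path checks that merely avoid computing the logarithm); the √(8/e) scale factor maps V onto the bounding rectangle of the acceptance region:

```python
import math
import random

def ratio_method(rng):
    # Ratio-of-uniforms sampler for the standard normal: with u in (0,1] and
    # x = sqrt(8/e)*(v - 0.5)/u, accept whenever x^2 <= -4 ln u.
    scale = math.sqrt(8.0 / math.e)
    while True:
        u = 1.0 - rng.random()          # in (0, 1], keeps log(u) finite
        v = rng.random()
        x = scale * (v - 0.5) / u
        if x * x <= -4.0 * math.log(u):
            return x

rng = random.Random(11)
xs = [ratio_method(rng) for _ in range(50_000)]
mean = sum(xs) / len(xs)
var = sum(x * x for x in xs) / len(xs) - mean ** 2
print(round(mean, 2), round(var, 2))  # should be near 0 and 1
```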

• The ziggurat algorithm is faster than the Box–Muller transform and still exact. In about 97% of all cases it uses only two random numbers (one random integer and one random uniform), one multiplication, and an if-test. Only in the roughly 3% of cases where the combination of those two falls outside the "core of the ziggurat" must a kind of rejection sampling using logarithms, exponentials and more uniform random numbers be employed.

• There is also some investigation into the connection between the fast Hadamard transform and the normal distribution, since the transform employs just addition and subtraction, and by the central limit theorem random numbers from almost any distribution will be transformed into the normal distribution. In this regard, a series of Hadamard transforms can be combined with random permutations to turn arbitrary data sets into normally distributed data.

## Numerical approximations for the normal CDF

The standard normal CDF is widely used in scientific and statistical computing. The values Φ(x) may be approximated very accurately by a variety of methods, such as numerical integration, Taylor series, asymptotic series and continued fractions. Different approximations are used depending on the desired level of accuracy.

• A classical approximation for Φ(x), valid for x > 0, has absolute error |ε(x)| < 7.5·10−8 (algorithm 26.2.17):

$\Phi(x) \approx 1 - \varphi(x)\left(b_1 t + b_2 t^2 + b_3 t^3 + b_4 t^4 + b_5 t^5\right), \qquad t = \frac{1}{1 + b_0 x},$

where ϕ(x) is the standard normal PDF, and b0 = 0.2316419, b1 = 0.319381530, b2 = −0.356563782, b3 = 1.781477937, b4 = −1.821255978, b5 = 1.330274429.
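This approximation is simple to implement. The sketch below evaluates the polynomial via the coefficients quoted above and checks it against an exact Φ built from math.erf:

```python
import math

B0 = 0.2316419
B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def phi_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf_approx(x):
    # Polynomial approximation (algorithm 26.2.17), stated for x >= 0;
    # symmetry Phi(-x) = 1 - Phi(x) covers negative arguments.
    if x < 0:
        return 1.0 - norm_cdf_approx(-x)
    t = 1.0 / (1.0 + B0 * x)
    poly = sum(b * t ** (k + 1) for k, b in enumerate(B))
    return 1.0 - phi_pdf(x) * poly

def norm_cdf_exact(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Scan [0, 6] in steps of 0.01 and record the worst absolute error.
worst = max(abs(norm_cdf_approx(k / 100) - norm_cdf_exact(k / 100))
            for k in range(0, 601))
print(worst)  # should stay within the advertised 7.5e-8 bound
```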

• Hart (1968) lists almost a hundred rational function approximations for the erfc function. His algorithms vary in the degree of complexity and the resulting precision, with maximum absolute precision of 24 digits. A later algorithm combines Hart's algorithm 5666 with a continued fraction approximation in the tail to provide a fast computation algorithm with 16-digit precision.

• Cody (1969), after recalling that the Hart (1968) solution is not suited for erf, gives a solution for both erf and erfc, with a maximal relative error bound, via rational Chebyshev approximation (Cody, W. J. (1969). "Rational Chebyshev Approximations for the Error Function").

• A simple algorithm based on the Taylor series expansion

$\Phi(x) = \frac{1}{2} + \varphi(x)\left(x + \frac{x^3}{3} + \frac{x^5}{3\cdot5} + \frac{x^7}{3\cdot5\cdot7} + \cdots\right)$

has been suggested for calculating Φ(x) with arbitrary precision. (This algorithm is given, for example, in the article on the bc programming language.) The drawback of this algorithm is its comparatively slow calculation time: for example, it takes over 300 iterations to calculate the function with 16 digits of precision when x = 10.
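The Taylor-series approach in code; each term is obtained from the previous one by multiplying by x2/(2k + 1), so higher precision only requires iterating longer (plain floats are used here, so accuracy caps near machine epsilon):

```python
import math

def norm_cdf_taylor(x, tol=1e-15):
    # Phi(x) = 1/2 + phi(x) * (x + x^3/3 + x^5/(3*5) + x^7/(3*5*7) + ...),
    # summed until the terms drop below tol.
    term = x
    total = 0.0
    k = 0
    while abs(term) > tol:
        total += term
        k += 1
        term *= x * x / (2 * k + 1)   # next term of the series
    return 0.5 + math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi) * total

print(abs(norm_cdf_taylor(1.0) - 0.8413447460685429) < 1e-9)
```

The series converges for every x, but for large |x| the terms first grow before shrinking, which is why the iteration count climbs quickly away from the origin.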

• The GNU Scientific Library calculates values of the standard normal CDF using Hart's algorithms and approximations with Chebyshev polynomials.

### Development

Some authors attribute the credit for the discovery of the normal distribution to de Moivre, who in 1738 published in the second edition of his "The Doctrine of Chances" the study of the coefficients in the binomial expansion of (a + b)n. (De Moivre first published his findings in 1733, in a pamphlet "Approximatio ad Summam Terminorum Binomii in Seriem Expansi" designated for private circulation only; it was not until 1738 that he made the results publicly available. The original pamphlet was reprinted several times.) De Moivre proved that the middle term in this expansion has the approximate magnitude of $\scriptstyle 2/\sqrt{2\pi n}$, and that "If m or ½n be a Quantity infinitely great, then the Logarithm of the Ratio, which a Term distant from the middle by the Interval ℓ, has to the middle Term, is $\scriptstyle -\frac{2\ell\ell}{n}$." Although this theorem can be interpreted as the first obscure expression for the normal probability law, Stigler points out that de Moivre himself did not interpret his results as anything more than an approximate rule for the binomial coefficients, and in particular de Moivre lacked the concept of the probability density function.

In 1809 Gauss published his monograph "Theoria motus corporum coelestium in sectionibus conicis solem ambientium", where among other things he introduces several important statistical concepts, such as the method of least squares, the method of maximum likelihood, and the normal distribution. Gauss used M, M′, M′′, … to denote the measurements of some unknown quantity V, and sought the "most probable" estimator: the one which maximizes the probability of obtaining the observed experimental results. In his notation φΔ is the probability law of the measurement errors of magnitude Δ. Not knowing what the function φ is, Gauss requires that his method should reduce to the well-known answer: the arithmetic mean of the measured values. ("It has been customary certainly to regard as an axiom the hypothesis that if any quantity has been determined by several direct observations, made under the same circumstances and with equal care, the arithmetical mean of the observed values affords the most probable value, if not rigorously, yet very nearly at least, so that it is always most safe to adhere to it.") Starting from these principles, Gauss demonstrates that the only law which rationalizes the choice of arithmetic mean as an estimator of the location parameter is the normal law of errors:


\varphi\mathit{\Delta} = \frac{h}{\surd\pi}\, e^{-\mathrm{hh}\Delta\Delta},

where h is "the measure of the precision of the observations". Using this normal law as a generic model for errors in experiments, Gauss formulated what is now known as the non-linear weighted least squares (NWLS) method.

Although Gauss was the first to suggest the normal distribution law, Laplace made significant contributions. ("My custom of terming the curve the Gauss–Laplacian or normal curve saves us from proportioning the merit of discovery between the two great astronomer mathematicians.") It was Laplace who first posed the problem of aggregating several observations, in 1774, although his own solution led to the Laplacian distribution. It was also Laplace who first calculated the value of the integral
{{About|the univariate normal distribution|normally distributed vectors|Multivariate normal distribution}}

{{Probability distribution
| name =
| type = density
| pdf_image = (caption: The red line is the standard normal distribution)
| cdf_image = (caption: Colors match the image above)
| notation =
| parameters = {{nowrap|μ ∈ R}} — mean (location)
{{nowrap|σ2 > 0}} — variance (squared scale)
| support = x ∈ R
| pdf =
| cdf =
| mean = μ
| median = μ
| mode = μ
| variance = σ2
| skewness = 0
| kurtosis = 0
| entropy =
| mgf =
| char =
| fisher =
| conjugate prior = Normal distribution
}}

In probability theory, the normal (or Gaussian) distribution is a continuous probability distribution that has a bell-shaped probability density function, known as the Gaussian function or, informally, the bell curve (the designation "bell curve" is ambiguous: many other distributions are "bell"-shaped, such as the Cauchy distribution, Student's t-distribution, the generalized normal, and the logistic):

f(x;\,\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},

where parameter μ is the mean or expectation (location of the peak) and σ2 is the variance, the mean of the squared deviation (a "measure" of the width of the distribution); σ is the standard deviation. The distribution with {{nowrap|μ {{=}} 0}} and {{nowrap|σ2 {{=}} 1}} is called the standard normal. A normal distribution is often used as a first approximation to describe real-valued random variables that cluster around a single mean value.

The normal distribution is considered the most prominent probability distribution in statistics. There are several reasons for this. First, the normal distribution is very tractable analytically: a large number of results involving this distribution can be derived in explicit form. Second, the normal distribution arises as the outcome of the central limit theorem, which states that under mild conditions the sum of a large number of random variables is distributed approximately normally. Finally, the "bell" shape of the normal distribution makes it a convenient choice for modelling a large variety of random variables encountered in practice.
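To make the central limit theorem concrete, here is a small simulation sketch (my own illustration, not from the article): standardized sums of independent Uniform(0, 1) variables behave approximately like draws from a standard normal distribution.

```python
import random
import statistics

# Illustration of the central limit theorem: standardized sums of
# independent Uniform(0,1) variables are approximately standard normal.
# The sample sizes and seed are arbitrary choices for this sketch.
random.seed(0)

def clt_sample(n_terms, n_samples=20_000):
    """Standardized sums of n_terms Uniform(0,1) variables."""
    mu, var = 0.5, 1.0 / 12.0   # mean and variance of Uniform(0,1)
    out = []
    for _ in range(n_samples):
        s = sum(random.random() for _ in range(n_terms))
        out.append((s - n_terms * mu) / (n_terms * var) ** 0.5)
    return out

z = clt_sample(12)
# For a standard normal, the mean is 0, the standard deviation is 1,
# and about 68.3% of the mass lies within one standard deviation.
frac_within_1sd = sum(abs(x) < 1 for x in z) / len(z)
print(round(statistics.mean(z), 2),
      round(statistics.stdev(z), 2),
      round(frac_within_1sd, 2))
```

Increasing n_terms makes the empirical distribution of z agree ever more closely with the standard normal.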

For this reason, the normal distribution is commonly encountered in practice, and is used throughout statistics, the natural sciences, and the social sciences as a simple model for complex phenomena. For example, the observational error in an experiment is usually assumed to follow a normal distribution, and the propagation of uncertainty is computed using this assumption. Note that a normally distributed variable has a symmetric distribution about its mean. Quantities that grow exponentially, such as prices, incomes or populations, are often skewed to the right, and hence may be better described by other distributions, such as the log-normal distribution or Pareto distribution. In addition, the probability of seeing a normally distributed value that is far from the mean (i.e., more than a few standard deviations away) drops off extremely rapidly. As a result, statistical inference using a normal distribution is not robust to the presence of outliers (data unexpectedly far from the mean, due to exceptional circumstances, observational error, etc.). When outliers are expected, data may be better described using a heavy-tailed distribution such as Student's t-distribution.

From a technical perspective, alternative characterizations are possible, for example:
• The normal distribution is the only absolutely continuous distribution all of whose cumulants beyond the first two (i.e., other than the mean and variance) are zero.
• For a given mean and variance, the corresponding normal distribution is the continuous distribution with the maximum entropy.

## Definition

The simplest case of a normal distribution is known as the standard normal distribution, described by the probability density function

\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}.

The factor $\scriptstyle\ 1/\sqrt{2\pi}$ in this expression ensures that the total area under the curve ϕ(x) is equal to one (for a proof, see the Gaussian integral), and the {{frac2|1|2}} in the exponent makes the "width" of the curve (measured as half the distance between the inflection points) also equal to one. It is traditional in statistics to denote this function with the Greek letter ϕ (phi), whereas density functions for all other distributions are usually denoted with the letters f or p. The alternative glyph φ is also used quite often; however, within this article "φ" is reserved to denote characteristic functions.

More generally, a normal distribution results from exponentiating a quadratic function (just as an exponential distribution results from exponentiating a linear function):

f(x) = e^{a x^2 + b x + c}.

This yields the classic "bell curve" shape, provided that {{nowrap|a < 0}} so that the quadratic function is concave; then {{nowrap|f(x) > 0}} everywhere. One can adjust a to control the "width" of the bell, then adjust b to move the central peak of the bell along the x-axis, and finally adjust c to control the "height" of the bell. For f(x) to be a true probability density function over R, one must choose c such that $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (which is only possible when {{nowrap|a < 0}}).

Rather than using a, b, and c, it is far more common to describe a normal distribution by its mean {{nowrap|μ {{=}} − {{frac2|b|2a}}}} and variance {{nowrap|σ2 {{=}} − {{frac2|1|2a}}}}. Changing to these new parameters allows one to rewrite the probability density function in a convenient standard form,

f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right).

For a standard normal distribution, {{nowrap|1=μ = 0}} and {{nowrap|1=σ2 = 1}}. The last part of the equation above shows that any other normal distribution can be regarded as a version of the standard normal distribution that has been stretched horizontally by a factor σ and then translated rightward by a distance μ. Thus, μ specifies the position of the bell curve's central peak, and σ specifies the "width" of the bell curve.
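This stretch-and-translate relationship is easy to check numerically. In the sketch below (function names are mine, not from the article), the N(μ, σ2) density is computed both directly and as the rescaled standard density φ((x − μ)/σ)/σ:

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def normal_pdf(x, mu, sigma):
    """N(mu, sigma^2) pdf, written directly."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# The general density is the standard one, stretched horizontally by sigma
# and shifted by mu:  f(x) = phi((x - mu) / sigma) / sigma.
x, mu, sigma = 1.3, 2.0, 0.7
assert math.isclose(normal_pdf(x, mu, sigma), phi((x - mu) / sigma) / sigma)
```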

The parameter μ is at the same time the mean, the median and the mode of the normal distribution. The parameter σ2 is called the variance; as for any random variable, it describes how concentrated the distribution is around its mean. The square root of σ2 is called the standard deviation and measures the width of the density function.

The normal distribution is usually denoted by N(μ, σ2). Commonly the letter N is written in calligraphic font (typed as \mathcal{N} in LaTeX). Thus when a random variable X is distributed normally with mean μ and variance σ2, we write

X\ \sim\ \mathcal{N}(\mu,\,\sigma^2).

### Alternative formulations

Some authors advocate using the precision instead of the variance, and variously define it as {{nowrap|τ {{=}} σ−2}} or {{nowrap|τ {{=}} σ−1}}. This parametrization has an advantage in numerical applications where σ2 is very close to zero, and it is more convenient in analysis, since τ is a natural parameter of the normal distribution. It is also advantageous when studying conditional distributions in the multivariate normal case.

The question of which normal distribution should be called the "standard" one is answered differently by various authors. Starting from the works of Gauss, the standard normal was considered to be the one with variance {{nowrap|σ2 {{=}} {{frac2|1|2}}}}:

f(x) = \frac{e^{-x^2}}{\sqrt{\pi}}.

{{harvtxt|Stigler|1982}} goes even further and insists that the standard normal be the one with variance {{nowrap|σ2 {{=}} {{frac2|1|2π}}}}:

f(x) = e^{-\pi x^2}.

According to the author, this formulation is advantageous because of a much simpler and easier-to-remember formula, the fact that the pdf has unit height at zero, and simple approximate formulas for the quantiles of the distribution. In Stigler's formulation the density of a normal {{nowrap|N(μ, τ)}} with mean μ and precision τ will be equal to

## Characterization

In the previous section the normal distribution was defined by specifying its probability density function. However, there are other ways to characterize a probability distribution. They include the cumulative distribution function, the moments, the cumulants, the characteristic function, the moment-generating function, etc.

### Probability density function

The probability density function (pdf) of a random variable describes the relative likelihood of different values for that random variable. The pdf of the normal distribution is given by the formula explained in detail in the previous section:

f(x;\,\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.

This is a proper function only when the variance σ2 is not equal to zero. In that case it is a continuous smooth function, defined on the entire real line, and is called the "Gaussian function".

Properties:
• Function f(x) is unimodal and symmetric around the point {{nowrap|x {{=}} μ}}, which is at the same time the mode, the median and the mean of the distribution.
• The inflection points of the curve occur one standard deviation away from the mean (i.e., at {{nowrap|x {{=}} μ − σ}} and {{nowrap|x {{=}} μ + σ}}).
• Function f(x) is log-concave.
• The standard normal density ϕ(x) is an eigenfunction of the Fourier transform.
• The function is supersmooth of order 2, implying that it is infinitely differentiable.
• The first derivative of ϕ(x) is {{nowrap|ϕ′(x) {{=}} −x·ϕ(x)}}; the second derivative is {{nowrap|ϕ′′(x) {{=}} (x2 − 1)ϕ(x)}}. More generally, the n-th derivative is given by {{nowrap|ϕ(n)(x) {{=}} (−1)nHn(x)ϕ(x)}}, where Hn is the Hermite polynomial of order n.
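The derivative identities above lend themselves to a quick numerical check; the following sketch (my own, using central finite differences) verifies ϕ′(x) = −x·ϕ(x) and ϕ′′(x) = (x2 − 1)ϕ(x) at a few sample points:

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def d(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-1.7, 0.0, 0.4, 2.3):
    # phi'(x) = -x * phi(x)
    assert math.isclose(d(phi, x), -x * phi(x), abs_tol=1e-8)
    # phi''(x) = (x^2 - 1) * phi(x); nested differences are less accurate
    assert math.isclose(d(lambda t: d(phi, t), x), (x * x - 1) * phi(x), abs_tol=1e-4)
```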

When {{nowrap|σ2 {{=}} 0}}, the density function does not exist. However, the distribution can be represented as a generalized function that defines a measure on the real line, and it can be used to calculate, for example, expected values:

f(x;\,\mu,0) = \delta(x - \mu),

where δ(x) is the Dirac delta function, which is equal to infinity at {{nowrap|x {{=}} 0}} and is zero elsewhere.

### Cumulative distribution function

The cumulative distribution function (CDF) describes the probability of a random variable falling in the interval {{nowrap|(−∞, x]}}.

The CDF of the standard normal distribution is denoted with the capital Greek letter Φ (phi), and can be computed as an integral of the probability density function:

\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-t^2/2}\,dt.

This integral cannot be expressed in terms of elementary functions; instead it is written using the error function erf, a special function:

\Phi(x) = \frac12\left(1 + \operatorname{erf}\left(\frac{x}{\sqrt2}\right)\right).

Numerical methods for calculation of the standard normal CDF are discussed below. For a generic normal random variable with mean μ and variance {{nowrap|σ2 > 0}} the CDF will be equal to

F(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right).

The complement of the standard normal CDF, {{nowrap|Q(x) {{=}} 1 − Φ(x)}}, is referred to as the Q-function, especially in engineering texts. It gives the tail probability of the Gaussian distribution: the probability that a standard normal random variable X is greater than the number x. Other definitions of the Q-function, all of which are simple transformations of Φ, are also used occasionally.
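For example, in Python both Φ and Q can be computed from the standard library's math.erf via the identity Φ(x) = (1 + erf(x/√2))/2 (a sketch; production code would typically use a statistics library):

```python
import math

def Phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Q(x):
    """Q-function: tail probability 1 - Phi(x)."""
    return 1.0 - Phi(x)

print(round(Phi(0.0), 4))   # 0.5
print(round(Q(1.96), 4))    # 0.025
```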

Properties:
• The standard normal CDF is 2-fold rotationally symmetric around point (0, ½):  {{nowrap| Φ(−x) {{=}} 1 − Φ(x)}}.
• The derivative of Φ(x) is equal to the standard normal pdf ϕ(x):  {{nowrap| Φ′(x) {{=}} ϕ(x)}}.
• The antiderivative of Φ(x) is:  {{nowrap|1 = ∫ Φ(x) dx = x Φ(x) + ϕ(x)}}.

For a normal distribution with zero variance, the CDF is the Heaviside step function (with the {{nowrap|H(0) {{=}} 1}} convention):

F(x;\,\mu,0) = H(x - \mu).

### Quantile function

The inverse of the standard normal CDF, called the quantile function or probit function, is expressed in terms of the inverse error function:

\Phi^{-1}(p) = \sqrt2\;\operatorname{erf}^{-1}(2p - 1), \qquad p \in (0, 1).

Quantiles of the standard normal distribution are commonly denoted as zp. The quantile zp represents the value such that a standard normal random variable X has probability exactly p of falling inside the {{nowrap|(−∞, zp]}} interval. The quantiles are used in hypothesis testing, construction of confidence intervals and Q-Q plots. The most "famous" normal quantile is {{nowrap|1.96 {{=}} z0.975}}: a standard normal random variable is greater than 1.96 in absolute value in 5% of cases.
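Python's standard library, for instance, exposes this quantile function as NormalDist.inv_cdf; the sketch below recovers z0.975 ≈ 1.96 and the corresponding 5% two-sided tail probability:

```python
from statistics import NormalDist

std = NormalDist(mu=0.0, sigma=1.0)

# The 97.5% quantile of the standard normal.
z_975 = std.inv_cdf(0.975)
print(round(z_975, 3))        # 1.96

# |X| exceeds z_975 with probability about 5%.
p_outside = 2.0 * (1.0 - std.cdf(z_975))
print(round(p_outside, 3))    # 0.05
```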

For a normal random variable with mean μ and variance σ2, the quantile function is

F^{-1}(p) = \mu + \sigma\,\Phi^{-1}(p).

### Characteristic function and moment generating function

The characteristic function φX(t) of a random variable X is defined as the expected value of eitX, where i is the imaginary unit and {{nowrap|t ∈ R}} is the argument of the characteristic function. Thus the characteristic function is the Fourier transform of the density. For a normally distributed X with mean μ and variance σ2, the characteristic function is

\varphi_X(t) = e^{\,i\mu t - \frac12 \sigma^2 t^2}.

The characteristic function can be analytically extended to the entire complex plane: one defines φ(z) {{=}} eiμz − {{frac2|1|2}}σ2z2 for all z ∈ C.

The moment generating function is defined as the expected value of etX. For a normal distribution, the moment generating function exists and is equal to

$M_X(t) = \operatorname{E}\bigl[e^{tX}\bigr] = e^{\,\mu t + \frac{1}{2}\sigma^2 t^2}.$

The cumulant generating function is the logarithm of the moment generating function:

$g(t) = \ln M_X(t) = \mu t + \tfrac{1}{2}\sigma^2 t^2.$

Since this is a quadratic polynomial in t, only the first two cumulants are nonzero.
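The closed form of the moment generating function can be checked numerically. The sketch below (plain Python; the function names are illustrative) compares E[etX], computed by trapezoid-rule quadrature against the density, with eμt + σ²t²/2:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def mgf_numeric(t, mu, sigma, lo=-40.0, hi=40.0, n=100_000):
    # E[e^{tX}] by the trapezoid rule over a wide truncation of the real line
    h = (hi - lo) / n
    total = 0.5 * (math.exp(t * lo) * normal_pdf(lo, mu, sigma)
                   + math.exp(t * hi) * normal_pdf(hi, mu, sigma))
    for k in range(1, n):
        x = lo + k * h
        total += math.exp(t * x) * normal_pdf(x, mu, sigma)
    return total * h

def mgf_closed(t, mu, sigma):
    # M_X(t) = exp(mu*t + sigma^2 * t^2 / 2)
    return math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)

print(mgf_numeric(0.5, 1.0, 2.0), mgf_closed(0.5, 1.0, 2.0))
```

The two values agree to many decimal places, since the integrand decays rapidly well inside the truncated range.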

### Moments

{{see also|List of integrals of Gaussian functions}}
The normal distribution has moments of all orders. That is, for a normally distributed X with mean μ and variance {{nowrap|σ 2}}, the expectation {{nowrap|E[{{!}}X{{!}}p]}} exists and is finite for all p such that {{nowrap|Re[p] > −1}}. Usually we are interested only in moments of integer orders: {{nowrap|p {{=}} 1, 2, 3, …}}.

• Central moments are the moments of X around its mean μ. Thus, a central moment of order p is the expected value of {{nowrap|(X − μ) p}}. Using standardization of normal random variables, this expectation will be equal to {{nowrap|σ p · E[Zp]}}, where Z is standard normal:

$\operatorname{E}\bigl[(X-\mu)^p\bigr] = \begin{cases} 0 & \text{if } p \text{ is odd,} \\ \sigma^p\,(p-1)!! & \text{if } p \text{ is even.} \end{cases}$

Here n!! denotes the double factorial, that is, the product of every odd number from n down to 1.

• Central absolute moments are the moments of |X − μ|. They coincide with regular central moments for all even orders, but are nonzero for all odd p:

$\operatorname{E}\bigl[\,|X-\mu|^p\,\bigr] = \sigma^p \cdot \frac{2^{p/2}\,\Gamma\!\bigl(\tfrac{p+1}{2}\bigr)}{\sqrt{\pi}}.$

The last formula is valid for any real {{nowrap|p > −1}}, integer or not.

• Raw moments and raw absolute moments are the moments of X and |X| respectively. The formulas for these moments are much more complicated, and are given in terms of confluent hypergeometric functions 1F1 and U.{{Citation needed|date=June 2010}}

These expressions remain valid even if p is not integer. See also generalized Hermite polynomials.

• The first two cumulants are equal to μ and σ 2 respectively, whereas all higher-order cumulants are equal to zero.

Order Raw moment Central moment Cumulant
1 μ 0 μ
2 μ2 + σ2 σ 2 σ 2
3 μ3 + 3μσ2 0 0
4 μ4 + 6μ2σ2 + 3σ4 3σ 4 0
5 μ5 + 10μ3σ2 + 15μσ4 0 0
6 μ6 + 15μ4σ2 + 45μ2σ4 + 15σ6 15σ 6 0
7 μ7 + 21μ5σ2 + 105μ3σ4 + 105μσ6 0 0
8 μ8 + 28μ6σ2 + 210μ4σ4 + 420μ2σ6 + 105σ8 105σ 8 0
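The central-moment column of the table follows from the double-factorial formula above. A small illustrative script (helper names are ours):

```python
def double_factorial(n):
    # n!! = n * (n - 2) * (n - 4) * ... down to 1
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def central_moment(p, sigma):
    # E[(X - mu)^p] = 0 for odd p, sigma^p * (p - 1)!! for even p
    return 0 if p % 2 == 1 else sigma ** p * double_factorial(p - 1)

# Reproduce the "Central moment" column for sigma = 1: 0, 1, 0, 3, 0, 15, 0, 105
print([central_moment(p, 1.0) for p in range(1, 9)])
```

For general σ the even entries scale as σ², 3σ⁴, 15σ⁶, 105σ⁸, matching the table.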

### Standardizing normal random variables

As a consequence of property 1, it is possible to relate all normal random variables to the standard normal. For example, if X is normal with mean μ and variance σ2, then

$Z = \frac{X - \mu}{\sigma}$

has mean zero and unit variance, that is, Z has the standard normal distribution. Conversely, given a standard normal random variable Z we can always construct another normal random variable with specified mean μ and variance σ2:

$X = \sigma Z + \mu.$

This "standardizing" transformation is convenient because it allows one to compute the PDF and especially the CDF of any normal distribution from a table of PDF and CDF values for the standard normal. They are related via

$F(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right), \qquad f(x) = \frac{1}{\sigma}\,\varphi\!\left(\frac{x-\mu}{\sigma}\right).$
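A minimal sketch of this reduction in Python, using math.erf for the standard normal CDF (helper names are illustrative):

```python
import math

def std_normal_cdf(z):
    # Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    # F(x) = Phi((x - mu) / sigma): standardize, then use the standard normal table
    return std_normal_cdf((x - mu) / sigma)

# P(X <= 110) for X ~ N(100, 15^2), via the standardized value z = (110 - 100)/15
print(normal_cdf(110, 100, 15))
```

The same standardization underlies the classical use of printed standard-normal tables.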

### Standard deviation and confidence intervals

About 68% of values drawn from a normal distribution are within one standard deviation σ of the mean; about 95% of the values lie within two standard deviations; and about 99.7% are within three standard deviations. This fact is known as the 68-95-99.7 rule, or the empirical rule, or the 3-sigma rule.
To be more precise, the area under the bell curve between {{nowrap|μ − nσ}} and {{nowrap|μ + nσ}} is given by

$F(\mu + n\sigma) - F(\mu - n\sigma) = \Phi(n) - \Phi(-n) = \operatorname{erf}\!\left(\frac{n}{\sqrt{2}}\right),$

where erf is the error function. To 12 decimal places, the values for the 1-, 2-, up to 6-sigma points are:

n erf(n/√2) i.e. 1 minus ... or 1 in ...
1 {{val|0.682689492137}} {{val|0.317310507863}} {{val|3.15148718753}}
2 {{val|0.954499736104}} {{val|0.045500263896}} {{val|21.9778945080}}
3 {{val|0.997300203937}} {{val|0.002699796063}} {{val|370.398347345}}
4 {{val|0.999936657516}} {{val|0.000063342484}} {{val|15787.1927673}}
5 {{val|0.999999426697}} {{val|0.000000573303}} {{val|1744277.89362}}
6 {{val|0.999999998027}} {{val|0.000000001973}} {{val|506797345.897}}

The next table gives the reverse relation of sigma multiples corresponding to a few often used values for the area under the bell curve. These values are useful to determine (asymptotic) confidence intervals of the specified levels based on normally distributed (or asymptotically normal) estimators:

p n p n
0.80 {{val|1.281551565545}} 0.999 {{val|3.290526731492}}
0.90 {{val|1.644853626951}} 0.9999 {{val|3.890591886413}}
0.95 {{val|1.959963984540}} 0.99999 {{val|4.417173413469}}
0.98 {{val|2.326347874041}} 0.999999 {{val|4.891638475699}}
0.99 {{val|2.575829303549}} 0.9999999 {{val|5.326723886384}}
0.995 {{val|2.807033768344}} 0.99999999 {{val|5.730728868236}}
0.998 {{val|3.090232306168}} 0.999999999 {{val|6.109410204869}}

where the value on the left of the table is the proportion of values that will fall within a given interval and n is a multiple of the standard deviation that specifies the width of the interval.
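Both tables can be regenerated with Python's standard library, assuming statistics.NormalDist for the inverse CDF (the function name sigma_multiple is ours):

```python
from math import erf, sqrt
from statistics import NormalDist

# Area within n standard deviations of the mean: erf(n / sqrt(2))
for n in range(1, 4):
    print(n, erf(n / sqrt(2)))

def sigma_multiple(p):
    # Reverse relation: the n with central area p is Phi^{-1}((1 + p) / 2)
    return NormalDist().inv_cdf((1.0 + p) / 2.0)

print(sigma_multiple(0.95))
```

sigma_multiple(0.95) recovers the 1.959963984540 entry of the second table.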

### Central limit theorem

{{Main|Central limit theorem}}

The theorem states that under certain (fairly common) conditions, the sum of a large number of random variables will have an approximately normal distribution. For example, if (x1, …, xn) is a sequence of iid random variables, each having mean μ and variance σ2, then the central limit theorem states that

$\sqrt{n}\,\bigl(\bar{x}_n - \mu\bigr)\ \xrightarrow{d}\ N(0,\ \sigma^2), \qquad \bar{x}_n = \tfrac{1}{n}(x_1 + \cdots + x_n).$

The theorem will hold even if the summands xi are not iid, although some constraints on the degree of dependence and the growth rate of moments still have to be imposed.

The importance of the central limit theorem cannot be overemphasized. A great number of test statistics, scores, and estimators encountered in practice contain sums of random variables in them; even more estimators can be represented as sums of random variables through the use of influence functions. All of these quantities are governed by the central limit theorem and will have an asymptotically normal distribution as a result.
Another practical consequence of the central limit theorem is that certain other distributions can be approximated by the normal distribution, for example:
• The binomial distribution B(n, p) is approximately normal N(np, np(1 − p)) for large n and for p not too close to zero or one.
• The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
• The chi-squared distribution χ2(k) is approximately normal N(k, 2k) for large k.
• The Student's t-distribution t(ν) is approximately normal N(0, 1) when ν is large.
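As an illustration of the first approximation, the sketch below (standard-library Python; helper names are ours) compares the exact binomial CDF with the continuity-corrected normal approximation N(np, np(1 − p)):

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    # Exact P(X <= k) for X ~ B(n, p)
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))

def binom_cdf_normal_approx(k, n, p):
    # Normal approximation N(np, np(1-p)), with the usual continuity correction
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

n, p, k = 100, 0.4, 45
print(binom_cdf(k, n, p), binom_cdf_normal_approx(k, n, p))
```

For n = 100 the two values agree to about two decimal places; as noted below, the agreement is worst in the tails.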

Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution.

A general upper bound for the approximation error in the central limit theorem is given by the Berry–Esseen theorem; improvements of the approximation are given by the Edgeworth expansions.

### Miscellaneous

1. The family of normal distributions is closed under linear transformations. That is, if X is normally distributed with mean μ and variance σ2, then a linear transform {{nowrap|aX + b}} (for some real numbers a and b) is also normally distributed:

$aX + b\ \sim\ N(a\mu + b,\ a^2\sigma^2).$

Also if X1, X2 are two independent normal random variables, with means μ1, μ2 and standard deviations σ1, σ2, then their linear combination will also be normally distributed: [proof]

$aX_1 + bX_2\ \sim\ N(a\mu_1 + b\mu_2,\ a^2\sigma_1^2 + b^2\sigma_2^2).$

2. The converse of (1) is also true: if X1 and X2 are independent and their sum {{nowrap|X1 + X2}} is distributed normally, then both X1 and X2 must also be normal. This is known as Cramér's decomposition theorem. The interpretation of this property is that a normal distribution is only divisible by other normal distributions. Another application of this property is in connection with the central limit theorem: although the CLT asserts that the distribution of a sum of arbitrary non-normal iid random variables is approximately normal, Cramér's theorem shows that it can never become exactly normal.

3. If the characteristic function φX of some random variable X is of the form {{nowrap|φX(t) {{=}} eQ(t)}}, where Q(t) is a polynomial, then the Marcinkiewicz theorem (named after Józef Marcinkiewicz) asserts that Q can be at most a quadratic polynomial, and therefore X is a normal random variable. A consequence of this result is that the normal distribution is the only distribution with a finite number (two) of non-zero cumulants.

4. If X and Y are jointly normal and uncorrelated, then they are independent. The requirement that X and Y be jointly normal is essential; without it the property does not hold. [proof] For non-normal random variables, uncorrelatedness does not imply independence.

5. If X and Y are independent {{nowrap|N(μ, σ 2)}} random variables, then {{nowrap|X + Y}} and {{nowrap|X − Y}} are also independent and identically distributed (this follows from the polarization identity). This property uniquely characterizes the normal distribution, as can be seen from Bernstein's theorem: if X and Y are independent and such that {{nowrap|X + Y}} and {{nowrap|X − Y}} are also independent, then both X and Y must necessarily have normal distributions.

More generally, if X1, ..., Xn are independent random variables, then two linear combinations ∑akXk and ∑bkXk will be independent if and only if all Xks are normal and {{nowrap|∑akbk{{SubSup|σ|k|2}} {{=}} 0}}, where {{SubSup|σ|k|2}} denotes the variance of Xk.

6. The normal distribution is infinitely divisible: for a normally distributed X with mean μ and variance σ2 we can find n independent random variables {X1, …, Xn}, each distributed normally with mean μ/n and variance σ2/n, such that

$X_1 + X_2 + \cdots + X_n\ \sim\ N(\mu,\ \sigma^2).$

7. The normal distribution is stable (with exponent α = 2): if X1, X2 are two independent {{nowrap|N(μ, σ2)}} random variables and a, b are arbitrary real numbers, then

$aX_1 + bX_2\ \sim\ cX_3 + d, \qquad c = \sqrt{a^2 + b^2}, \quad d = (a + b - c)\mu,$

where X3 is also {{nowrap|N(μ, σ2)}}. This relationship follows directly from property (1).

8. The Kullback–Leibler divergence between two normal distributions {{nowrap|1=X1 ∼ N(μ1, σ21)}} and {{nowrap|1=X2 ∼ N(μ2, σ22)}} is given by:

$D_{\mathrm{KL}}(X_1\,\|\,X_2) = \ln\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}.$

The Hellinger distance between the same distributions is equal to

$H^2(X_1, X_2) = 1 - \sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2}}\; e^{-\frac{(\mu_1 - \mu_2)^2}{4(\sigma_1^2 + \sigma_2^2)}}.$
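A direct transcription of the Kullback–Leibler formula for univariate normals (an illustrative sketch; kl_normal is our name) also shows that the divergence is asymmetric:

```python
import math

def kl_normal(mu1, s1, mu2, s2):
    # D_KL( N(mu1, s1^2) || N(mu2, s2^2) )
    return (math.log(s2 / s1)
            + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2)
            - 0.5)

# Zero iff the two distributions coincide; generally asymmetric in its arguments
print(kl_normal(0, 1, 0, 1))
print(kl_normal(0, 1, 1, 2), kl_normal(1, 2, 0, 1))
```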

9. The Fisher information matrix for a normal distribution (with respect to the parameters μ and σ2) is diagonal and takes the form

$\mathcal{I} = \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix}.$

10. The normal distribution belongs to an exponential family with natural parameters {{nowrap|1=θ1 = μ/σ2}} and {{nowrap|1=θ2 = −1/(2σ2)}}, and natural statistics x and x2. The dual, expectation parameters for the normal distribution are {{nowrap|1=η1 = μ}} and {{nowrap|1=η2 = μ2 + σ2}}.

11. The conjugate prior of the mean of a normal distribution is another normal distribution. Specifically, if x1, …, xn are iid {{nowrap|N(μ, σ2)}} and the prior is {{nowrap|μ ~ N(μ0, σ{{su|p=2|b=0}})}}, then the posterior distribution for μ will be

$\mu \mid x_1,\ldots,x_n\ \sim\ N\!\left( \frac{\frac{1}{\sigma_0^2}\mu_0 + \frac{n}{\sigma^2}\bar{x}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}},\ \left( \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} \right)^{-1} \right).$
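A sketch of this conjugate update in Python (assuming known σ; the helper posterior_for_mean is ours):

```python
from statistics import NormalDist

def posterior_for_mean(data, sigma, mu0, sigma0):
    """Posterior for the mean mu of N(mu, sigma^2) with known sigma,
    under the conjugate prior mu ~ N(mu0, sigma0^2)."""
    n = len(data)
    xbar = sum(data) / n
    precision = n / sigma ** 2 + 1 / sigma0 ** 2   # posterior precisions add
    mean = (n * xbar / sigma ** 2 + mu0 / sigma0 ** 2) / precision
    return NormalDist(mean, precision ** -0.5)

post = posterior_for_mean([9.8, 10.4, 10.1, 9.9], sigma=0.5, mu0=0.0, sigma0=10.0)
print(post.mean, post.stdev)
```

With a vague prior (large σ0), the posterior mean is pulled almost entirely to the sample mean, as expected.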

12. Of all probability distributions over the reals with mean μ and variance σ2, the normal distribution {{nowrap|N(μ, σ2)}} is the one with the maximum entropy.

13. The family of normal distributions forms a manifold with constant curvature −1. The same family is flat with respect to the (±1)-connections ∇(e) and ∇(m).

### Operations on a single random variable

If X is distributed normally with mean μ and variance σ2, then

• The exponential of X is distributed log-normally: {{nowrap|eX ~ ln N(μ, σ2)}}.
• The absolute value of X has a folded normal distribution: {{nowrap|{{!}}X{{!}} ~ Nf (μ, σ2)}}. If {{nowrap|μ {{=}} 0}} this is known as the half-normal distribution.
• The square of X/σ has the noncentral chi-squared distribution with one degree of freedom: {{nowrap|1= X2/σ2 ~ χ21(μ2/σ2)}}. If {{nowrap|μ {{=}} 0}}, the distribution is called simply chi-squared.
• The distribution of the variable X restricted to an interval [a, b] is called the truncated normal distribution.
• (X − μ)−2 has a Lévy distribution with location 0 and scale σ−2.

#### Combination of two independent random variables

If X1 and X2 are two independent standard normal random variables, then

• Their sum and difference are each distributed normally with mean zero and variance two: {{nowrap|X1 ± X2 ∼ N(0, 2)}}.
• Their product {{nowrap|Z {{=}} X1·X2}} follows the "product-normal" distribution with density function {{nowrap|fZ(z) {{=}} π−1K0({{!}}z{{!}}),}} where K0 is the modified Bessel function of the second kind. This distribution is symmetric around zero, unbounded at z = 0, and has the characteristic function {{nowrap|1= φZ(t) = (1 + t 2)−1/2}}.
• Their ratio follows the standard Cauchy distribution: {{nowrap|X1 ÷ X2 ∼ Cauchy(0, 1)}}.
• Their Euclidean norm $\scriptstyle\sqrt{X_1^2\,+\,X_2^2}$ has the Rayleigh distribution, also known as the chi distribution with 2 degrees of freedom.
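The first bullet can be verified numerically: convolving two standard normal densities by quadrature recovers the N(0, 2) density (illustrative sketch, trapezoid rule):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def convolved_pdf(z, lo=-10.0, hi=10.0, n=4000):
    # (phi * phi)(z) = integral of phi(x) * phi(z - x) dx, by the trapezoid rule
    h = (hi - lo) / n
    total = 0.5 * (normal_pdf(lo) * normal_pdf(z - lo) + normal_pdf(hi) * normal_pdf(z - hi))
    for k in range(1, n):
        x = lo + k * h
        total += normal_pdf(x) * normal_pdf(z - x)
    return total * h

# The convolution of two standard normal densities is the N(0, 2) density
for z in (0.0, 1.0, 2.5):
    print(z, convolved_pdf(z), normal_pdf(z, sigma=math.sqrt(2)))
```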

#### Combination of two or more independent random variables

• If X1, X2, …, Xn are independent standard normal random variables, then the sum of their squares has the chi-squared distribution with n degrees of freedom: $\scriptstyle X_1^2 + \cdots + X_n^2\ \sim\ \chi_n^2$.
• If X1, X2, …, Xn are independent normally distributed random variables with mean μ and variance σ2, then their sample mean is independent from the sample standard deviation, which can be demonstrated using Basu's theorem or Cochran's theorem. The ratio of these two quantities will have the Student's t-distribution with n − 1 degrees of freedom:

$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}\ \sim\ t_{n-1}, \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i, \quad S^2 = \frac{1}{n-1}\sum_{i=1}^n \bigl(X_i - \bar{X}\bigr)^2.$

• If X1, …, Xn, Y1, …, Ym are independent standard normal random variables, then the ratio of their normalized sums of squares will have the {{nowrap|F-distribution}} with (n, m) degrees of freedom:

$F = \frac{\bigl(X_1^2 + \cdots + X_n^2\bigr)/n}{\bigl(Y_1^2 + \cdots + Y_m^2\bigr)/m}\ \sim\ F_{n,\,m}.$

#### Operations on the density function

The split normal distribution is most directly defined in terms of joining scaled sections of the density functions of different normal distributions and rescaling the density to integrate to one. The truncated normal distribution results from rescaling a section of a single density function.

### Extensions

The notion of the normal distribution, being one of the most important distributions in probability theory, has been extended far beyond the standard framework of the univariate (that is, one-dimensional) case. All these extensions are also called normal or Gaussian laws, so a certain ambiguity in names exists.
• Multivariate normal distribution describes the Gaussian law in the k-dimensional Euclidean space. A vector {{nowrap|X ∈ Rk}} is multivariate-normally distributed if any linear combination of its components {{nowrap|∑{{su|p=k|b=j=1}}aj Xj}} has a (univariate) normal distribution. The variance of X is a k×k symmetric positive-definite matrix V.
• Rectified Gaussian distribution is a rectified version of the normal distribution with all the negative elements reset to 0.
• Complex normal distribution deals with the complex normal vectors. A complex vector {{nowrap|X ∈ Ck}} is said to be normal if both its real and imaginary components jointly possess a 2k-dimensional multivariate normal distribution. The variance-covariance structure of X is described by two matrices: the variance matrix Γ, and the relation matrix C.
• Matrix normal distribution describes the case of normally distributed matrices.
• Gaussian processes are the normally distributed stochastic processes. These can be viewed as elements of some infinite-dimensional Hilbert space H, and thus are the analogues of multivariate normal vectors for the case {{nowrap|k {{=}} ∞}}. A random element {{nowrap|h ∈ H}} is said to be normal if for any constant {{nowrap|a ∈ H}} the scalar product {{nowrap|(a, h)}} has a (univariate) normal distribution. The variance structure of such a Gaussian random element can be described in terms of the linear covariance {{nowrap|operator K: H → H}}. Several Gaussian processes became popular enough to have their own names:
• Brownian motion,
• Brownian bridge,
• Ornstein–Uhlenbeck process.
• Gaussian q-distribution is an abstract mathematical construction which represents a "q-analogue" of the normal distribution.
• The q-Gaussian is an analogue of the Gaussian distribution, in the sense that it maximises the Tsallis entropy, and is one type of Tsallis distribution. Note that this distribution is different from the Gaussian q-distribution above.

One of the main practical uses of the Gaussian law is to model the empirical distributions of many different random variables encountered in practice. In such a case a possible extension would be a richer family of distributions, having more than two parameters and therefore being able to fit the empirical distribution more accurately. Examples of such extensions are:
• Pearson distribution — a four-parameter family of probability distributions that extend the normal law to include different skewness and kurtosis values.

## Normality tests

{{Main|Normality tests}}

Normality tests assess the likelihood that the given data set {x1, …, xn} comes from a normal distribution. Typically the null hypothesis H0 is that the observations are distributed normally with unspecified mean μ and variance σ2, versus the alternative Ha that the distribution is arbitrary. Many tests (over 40) have been devised for this problem; the more prominent of them are outlined below:
• "Visual" tests are more intuitively appealing but subjective at the same time, as they rely on informal human judgement to accept or reject the null hypothesis.
• Q-Q plot — a plot of the sorted values from the data set against the expected values of the corresponding quantiles from the standard normal distribution. That is, it is a plot of points of the form (Φ−1(pk), x(k)), where the plotting points pk are equal to pk = (k − α)/(n + 1 − 2α) and α is an adjustment constant which can be anything between 0 and 1. If the null hypothesis is true, the plotted points should approximately lie on a straight line.
• P-P plot — similar to the Q-Q plot, but used much less frequently. This method consists of plotting the points (Φ(z(k)), pk), where the z(k) are the standardized sorted values. For normally distributed data this plot should lie on a 45° line between (0, 0) and (1, 1).
• Shapiro–Wilk test employs the fact that the line in the Q-Q plot has the slope of σ. The test compares the least squares estimate of that slope with the value of the sample variance, and rejects the null hypothesis if these two quantities differ significantly.
• Normal probability plot (rankit plot)

• Moment tests:
• D'Agostino's K-squared test
• Jarque–Bera test
• Empirical distribution function tests:
• Lilliefors test (an adaptation of the Kolmogorov–Smirnov test)
• Anderson–Darling test
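Several of the tests above are available in SciPy. A minimal sketch (assuming `scipy` and `numpy` are installed; the function names are SciPy's, not part of this article) of running two of them on an idealized sample:

```python
import numpy as np
from scipy import stats

# Deterministic "ideally normal" sample: the expected standard normal
# quantiles at evenly spaced plotting positions (a rankit-style sample).
n = 500
p = (np.arange(1, n + 1) - 0.5) / n
x = stats.norm.ppf(p)

sw = stats.shapiro(x)       # Shapiro-Wilk test
jb = stats.jarque_bera(x)   # Jarque-Bera moment test

# Large p-values: we fail to reject the null hypothesis of normality.
print(sw.pvalue, jb.pvalue)
```

For a real data set the p-values quantify how surprising the sample would be under H0; values below the chosen significance level lead to rejection of normality.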

## Estimation of parameters

It is often the case that we do not know the parameters of the normal distribution but instead want to estimate them. That is, having a sample (x1, …, xn) from a normal {{nowrap|N(μ, σ2)}} population, we would like to learn the approximate values of the parameters μ and σ2. The standard approach to this problem is the maximum likelihood method, which requires maximization of the log-likelihood function:

$\ln\mathcal{L}(\mu,\sigma^2) = \sum_{i=1}^n \ln f(x_i;\mu,\sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2$

Taking derivatives with respect to μ and σ2 and solving the resulting system of first-order conditions yields the maximum likelihood estimates:

$\hat\mu = \overline{x} = \frac{1}{n}\sum_{i=1}^n x_i, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \overline{x})^2$

Estimator $\scriptstyle\hat\mu$ is called the sample mean, since it is the arithmetic mean of all observations. The statistic $\scriptstyle\overline{x}$ is complete and sufficient for μ, and therefore by the Lehmann–Scheffé theorem, $\scriptstyle\hat\mu$ is the uniformly minimum variance unbiased (UMVU) estimator. In finite samples it is distributed normally:

$\hat\mu \sim \mathcal{N}\!\left(\mu,\ \sigma^2/n\right)$

The variance of this estimator is equal to the μμ-element of the inverse Fisher information matrix $\scriptstyle\mathcal{I}^{-1}$. This implies that the estimator is finite-sample efficient. Of practical importance is the fact that the standard error of $\scriptstyle\hat\mu$ is proportional to $\scriptstyle1/\sqrt{n}$; that is, if one wishes to decrease the standard error by a factor of 10, one must increase the number of points in the sample by a factor of 100. This fact is widely used in determining sample sizes for opinion polls and the number of trials in Monte Carlo simulations.
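The 1/√n scaling is easy to check by simulation; an illustrative sketch (not from the article; assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(42)

def se_of_mean(n, reps):
    # Empirical standard deviation of the sample mean over many samples
    # drawn from a standard normal population.
    means = rng.normal(0.0, 1.0, size=(reps, n)).mean(axis=1)
    return means.std()

se_small = se_of_mean(100, reps=2000)    # theory: 1/sqrt(100)   = 0.10
se_large = se_of_mean(10_000, reps=300)  # theory: 1/sqrt(10000) = 0.01
print(se_small, se_large)  # 100x more points -> ~10x smaller standard error
```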

From the standpoint of the asymptotic theory, $\scriptstyle\hat\mu$ is consistent; that is, it converges in probability to μ as n → ∞. The estimator is also asymptotically normal, which is a simple corollary of the fact that it is normal in finite samples:

$\sqrt{n}\,(\hat\mu - \mu) \ \xrightarrow{d}\ \mathcal{N}(0,\ \sigma^2)$

The estimator $\scriptstyle\hat\sigma^2$ is called the sample variance, since it is the variance of the sample (x1, …, xn). In practice, another estimator is often used instead of $\scriptstyle\hat\sigma^2$. This other estimator is denoted s2 and is also called the sample variance, which represents a certain ambiguity in terminology; its square root s is called the sample standard deviation. The estimator s2 differs from $\scriptstyle\hat\sigma^2$ by having {{nowrap|(n − 1)}} instead of n in the denominator (the so-called Bessel's correction):

$s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \overline{x})^2$

The difference between s2 and $\scriptstyle\hat\sigma^2$ becomes negligibly small for large n. In finite samples, however, the motivation behind the use of s2 is that it is an unbiased estimator of the underlying parameter σ2, whereas $\scriptstyle\hat\sigma^2$ is biased. Also, by the Lehmann–Scheffé theorem the estimator s2 is uniformly minimum variance unbiased (UMVU), which makes it the "best" estimator among all unbiased ones. However, it can be shown that the biased estimator $\scriptstyle\hat\sigma^2$ is "better" than s2 in terms of the mean squared error (MSE) criterion. In finite samples, both s2 and $\scriptstyle\hat\sigma^2$ have a scaled chi-squared distribution with {{nowrap|(n − 1)}} degrees of freedom:

$s^2 \sim \frac{\sigma^2}{n-1}\,\chi^2_{n-1}, \qquad \hat\sigma^2 \sim \frac{\sigma^2}{n}\,\chi^2_{n-1}$
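The distinction between the two estimators can be seen directly in NumPy, whose `ddof` parameter selects the denominator (a sketch assuming NumPy; the data below are arbitrary illustration values):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)

sigma2_hat = np.var(x)   # denominator n:   the biased MLE estimator
s2 = np.var(x, ddof=1)   # denominator n-1: Bessel-corrected, unbiased

# The two differ exactly by the factor n/(n-1).
print(sigma2_hat, s2)
```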

The first of these expressions shows that the variance of s2 is equal to {{nowrap|2σ4/(n−1)}}, which is slightly greater than the σσ-element of the inverse Fisher information matrix $\scriptstyle\mathcal{I}^{-1}$. Thus, s2 is not an efficient estimator for σ2, and moreover, since s2 is UMVU, we can conclude that the finite-sample efficient estimator for σ2 does not exist.

Applying the asymptotic theory, both estimators s2 and $\scriptstyle\hat\sigma^2$ are consistent; that is, they converge in probability to σ2 as the sample size {{nowrap|n → ∞}}. The two estimators are also both asymptotically normal:

$\sqrt{n}\,(s^2 - \sigma^2) \ \xrightarrow{d}\ \mathcal{N}(0,\ 2\sigma^4), \qquad \sqrt{n}\,(\hat\sigma^2 - \sigma^2) \ \xrightarrow{d}\ \mathcal{N}(0,\ 2\sigma^4)$

In particular, both estimators are asymptotically efficient for σ2.

By Cochran's theorem, for the normal distribution the sample mean $\scriptstyle\hat\mu$ and the sample variance s2 are independent, which means there can be no gain in considering their joint distribution. There is also a converse theorem: if in a sample the sample mean and sample variance are independent, then the sample must have come from the normal distribution. The independence between $\scriptstyle\hat\mu$ and s can be employed to construct the so-called t-statistic:

$t = \frac{\overline{x} - \mu}{s/\sqrt{n}}$

This quantity t has the Student's t-distribution with {{nowrap|(n − 1)}} degrees of freedom, and it is an ancillary statistic (independent of the value of the parameters). Inverting the distribution of this t-statistic allows us to construct the confidence interval for μ; similarly, inverting the χ2 distribution of the statistic s2 gives the confidence interval for σ2:

$\mu \in \left[\,\hat\mu - t_{n-1,1-\alpha/2}\,\tfrac{s}{\sqrt{n}},\ \hat\mu + t_{n-1,1-\alpha/2}\,\tfrac{s}{\sqrt{n}}\,\right] \approx \hat\mu \pm |z_{\alpha/2}|\,\tfrac{s}{\sqrt{n}},$

$\sigma^2 \in \left[\,\tfrac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}},\ \tfrac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}}\,\right] \approx s^2 \pm |z_{\alpha/2}|\,\sqrt{\tfrac{2}{n}}\,s^2,$

where tk,p and {{SubSup|χ|k,p|2}} are the pth quantiles of the t- and χ2-distributions respectively. These confidence intervals are of the level {{nowrap|1 − α}}, meaning that the true values μ and σ2 fall outside of these intervals with probability α. In practice people usually take {{nowrap|α {{=}} 5%}}, resulting in 95% confidence intervals. The approximate formulas were derived from the asymptotic distributions of $\scriptstyle\hat\mu$ and s2. They become valid for large values of n, and are more convenient for manual calculation since the standard normal quantiles zα/2 do not depend on n. In particular, the most popular value {{nowrap|α {{=}} 5%}} results in {{nowrap|{{!}}z0.025{{!}} {{=}} 1.96}}.
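A sketch (assuming SciPy; the data are arbitrary illustration values) of computing the exact intervals from the t and χ2 quantiles:

```python
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n, alpha = len(x), 0.05
mu_hat = x.mean()
s2 = x.var(ddof=1)          # Bessel-corrected sample variance
s = np.sqrt(s2)

# Exact 95% confidence interval for mu from the t distribution.
t_q = stats.t.ppf(1 - alpha / 2, df=n - 1)
mu_ci = (mu_hat - t_q * s / np.sqrt(n), mu_hat + t_q * s / np.sqrt(n))

# Exact 95% confidence interval for sigma^2 from the chi-squared distribution.
chi_lo = stats.chi2.ppf(alpha / 2, df=n - 1)
chi_hi = stats.chi2.ppf(1 - alpha / 2, df=n - 1)
var_ci = ((n - 1) * s2 / chi_hi, (n - 1) * s2 / chi_lo)

print(mu_ci, var_ci)
```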

## Occurrence

The occurrence of normal distribution in practical problems can be loosely classified into three categories:
1. Exactly normal distributions;
2. Approximately normal laws, for example when such approximation is justified by the central limit theorem; and
3. Distributions modeled as normal — the normal distribution being the distribution with maximum entropy for a given mean and variance.

### Exact normality

Certain quantities in physics are distributed normally, as was first demonstrated by James Clerk Maxwell. Examples of such quantities are:
• Velocities of the molecules in an ideal gas. More generally, velocities of the particles in any system in thermodynamic equilibrium will have a normal distribution, due to the maximum entropy principle.
• The probability density function of the ground state of a quantum harmonic oscillator.
• The position of a particle that experiences diffusion. If initially the particle is located at a specific point (that is, its probability distribution is the Dirac delta function), then after time t its location is described by a normal distribution with variance t, which satisfies the diffusion equation {{nowrap|1={{frac2|∂|∂t}} f(x,t) = {{frac2|1|2}} {{frac2|∂2|∂x2}} f(x,t)}}. If the initial location is given by a certain density function g(x), then the density at time t is the convolution of g and the normal PDF.
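As a quick numerical sanity check (an illustration, not from the article), the N(0, t) density can be verified against the diffusion equation with central finite differences:

```python
import math

def f(x, t):
    # Density of N(0, t): the heat-kernel solution started from a point.
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

x, t, h = 0.3, 1.0, 1e-3
dfdt = (f(x, t + h) - f(x, t - h)) / (2.0 * h)
d2fdx2 = (f(x + h, t) - 2.0 * f(x, t) + f(x - h, t)) / (h * h)

print(abs(dfdt - 0.5 * d2fdx2))  # tiny: the PDE is satisfied
```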

### Approximate normality

Approximately normal distributions occur in many situations, as explained by the central limit theorem. When the outcome is produced by a large number of small effects acting additively and independently, its distribution will be close to normal. The normal approximation will not be valid if the effects act multiplicatively (instead of additively), or if there is a single external influence which has a considerably larger magnitude than the rest of the effects.
• In counting problems, where the central limit theorem includes a discrete-to-continuum approximation and where infinitely divisible and decomposable distributions are involved, such as:
• Binomial random variables, associated with binary response variables;
• Poisson random variables, associated with rare events;
• Thermal light has a Bose–Einstein distribution on very short time scales, and a normal distribution on longer time scales due to the central limit theorem.

### Assumed normality

{{cquote|I can only recognize the occurrence of the normal curve — the Laplacian curve of errors — as a very abnormal phenomenon. It is roughly approximated to in certain distributions; for this reason, and on account of its beautiful simplicity, we may, perhaps, use it as a first approximation, particularly in theoretical investigations. — {{harvtxt|Pearson|1901}}}}
There are statistical methods to empirically test that assumption; see the Normality tests section above.
• In biology, the logarithms of various variables tend to have a normal distribution; that is, they tend to have a log-normal distribution (after separation into male/female subpopulations), with examples including:
• Measures of size of living tissue (length, height, skin area, weight);
• The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth; presumably the thickness of tree bark also falls under this category;
• Certain physiological measurements, such as blood pressure of adult humans.
• In finance, in particular the Black–Scholes model, changes in the logarithm of exchange rates, price indices, and stock market indices are assumed normal (these variables behave like compound interest, not like simple interest, and so are multiplicative). Some mathematicians such as Benoît Mandelbrot have argued that log-Lévy distributions, which possess heavy tails, would be a more appropriate model, in particular for the analysis of stock market crashes.
• Measurement errors in physical experiments are often modeled by a normal distribution. This use of a normal distribution does not imply that one is assuming the measurement errors are normally distributed; rather, using the normal distribution produces the most conservative predictions possible given only knowledge of the mean and variance of the errors.

• In standardized testing, results can be made to have a normal distribution, by either selecting the number and difficulty of questions (as in the IQ test) or transforming the raw test scores into "output" scores by fitting them to the normal distribution. For example, the SAT's traditional range of 200–800 is based on a normal distribution with a mean of 500 and a standard deviation of 100.
• Many scores are derived from the normal distribution, including percentile ranks ("percentiles" or "quantiles"), normal curve equivalents, stanines, z-scores, and T-scores. Additionally, a number of behavioral statistical procedures are based on the assumption that scores are normally distributed; for example, t-tests and ANOVAs. Bell curve grading assigns relative grades based on a normal distribution of scores.
• In hydrology, the distribution of long-duration river discharge or rainfall (e.g. monthly and yearly totals, consisting of the sum of 30 or 360 daily values respectively) is often thought to be practically normal, according to the central limit theorem. For example, a normal distribution fitted to ranked October rainfalls can be shown with a 90% confidence belt based on the binomial distribution, the rainfall data being represented by plotting positions as part of a cumulative frequency analysis.

## Generating values from normal distribution

In computer simulations, especially in applications of the Monte Carlo method, it is often desirable to generate values that are normally distributed. The algorithms listed below all generate standard normal deviates, since an {{nowrap|N(μ, σ{{su|p=2}})}} variate can be generated as {{nowrap|X {{=}} μ + σZ}}, where Z is standard normal. All these algorithms rely on the availability of a random number generator U capable of producing uniform random variates.

• The most straightforward method is based on the probability integral transform property: if U is distributed uniformly on (0,1), then Φ−1(U) will have the standard normal distribution. The drawback of this method is that it relies on calculation of the probit function Φ−1, which cannot be done analytically. Some approximate methods are described in {{harvtxt|Hart|1968}} and in the erf article.
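A sketch of the inverse-transform method, assuming SciPy's `norm.ppf` as the probit function Φ−1:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)   # U ~ Uniform(0, 1)
z = stats.norm.ppf(u)           # Phi^{-1}(U) is standard normal

print(z.mean(), z.std())        # should be close to 0 and 1
```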

• An easy-to-program approximate approach that relies on the central limit theorem is as follows: generate 12 uniform U(0,1) deviates, add them all up, and subtract 6 — the resulting random variable will have approximately a standard normal distribution. In truth, the distribution will be Irwin–Hall, which is a 12-section eleventh-order polynomial approximation to the normal distribution. This random deviate will have a limited range of (−6, 6).
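The sum-of-twelve-uniforms recipe can be sketched as (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)

def approx_normal(size):
    # Each U(0,1) has mean 1/2 and variance 1/12, so the sum of 12 of them
    # has mean 6 and variance 1; subtracting 6 recenters it at zero.
    return rng.uniform(size=(size, 12)).sum(axis=1) - 6.0

z = approx_normal(100_000)
print(z.mean(), z.var(), z.min(), z.max())  # support limited to (-6, 6)
```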

• The Box–Muller method uses two independent random numbers U and V distributed uniformly on (0,1). Then the two random variables

$X = \sqrt{-2\ln U}\,\cos(2\pi V), \qquad Y = \sqrt{-2\ln U}\,\sin(2\pi V)$

will both have the standard normal distribution, and will be independent. This formulation arises because for a bivariate normal random vector (X, Y) the squared norm {{nowrap|X2 + Y2}} will have the chi-squared distribution with two degrees of freedom, which is an easily generated exponential random variable corresponding to the quantity −2ln(U) in these equations; and the angle is distributed uniformly around the circle, chosen by the random variable V.
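The transform can be sketched as (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)

def box_muller(size):
    u = 1.0 - rng.uniform(size=size)  # in (0, 1], avoids log(0)
    v = rng.uniform(size=size)
    r = np.sqrt(-2.0 * np.log(u))     # radius: -2 ln(U) is chi^2 with 2 df
    x = r * np.cos(2.0 * np.pi * v)   # uniform angle chosen by V
    y = r * np.sin(2.0 * np.pi * v)
    return x, y

x, y = box_muller(100_000)
print(x.mean(), x.std(), np.corrcoef(x, y)[0, 1])
```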

• The Marsaglia polar method is a modification of the Box–Muller algorithm which does not require computation of the functions sin and cos. In this method U and V are drawn from the uniform (−1,1) distribution, and then S = U2 + V2 is computed. If S is greater than or equal to one, the method starts over; otherwise the two quantities

$X = U\sqrt{\frac{-2\ln S}{S}}, \qquad Y = V\sqrt{\frac{-2\ln S}{S}}$

are returned. Again, X and Y will be independent and standard normally distributed.
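A scalar sketch of the polar method, following the steps above (assuming Python's standard `random` module):

```python
import math
import random

def polar_pair(rng=random):
    # Draw (U, V) uniformly on (-1, 1)^2 until the point falls strictly
    # inside the unit disk, then rescale; no sin/cos needed.
    while True:
        u = rng.uniform(-1.0, 1.0)
        v = rng.uniform(-1.0, 1.0)
        s = u * u + v * v
        if 0.0 < s < 1.0:
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor

random.seed(3)
xs = [polar_pair()[0] for _ in range(20_000)]
print(sum(xs) / len(xs))  # close to 0
```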

• The ratio method is a rejection method. The algorithm proceeds as follows:
• Generate two independent uniform deviates U and V;
• Compute X = {{sqrt|8/e}} (V − 0.5)/U;
• If X2 ≤ 5 − 4e1/4U then accept X and terminate the algorithm;
• If X2 ≥ 4e−1.35/U + 1.4 then reject X and start over from step 1;
• If X2 ≤ −4 lnU then accept X, otherwise start over from step 1.
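The steps above can be sketched directly (assuming Python's standard `random` module):

```python
import math
import random

def ratio_normal(rng=random):
    while True:
        u = rng.random()
        v = rng.random()
        if u == 0.0:
            continue  # guard against division by zero and log(0) below
        x = math.sqrt(8.0 / math.e) * (v - 0.5) / u
        x2 = x * x
        if x2 <= 5.0 - 4.0 * math.exp(0.25) * u:    # quick acceptance
            return x
        if x2 >= 4.0 * math.exp(-1.35) / u + 1.4:   # quick rejection
            continue
        if x2 <= -4.0 * math.log(u):                # exact acceptance test
            return x

random.seed(4)
xs = [ratio_normal() for _ in range(20_000)]
print(sum(xs) / len(xs))  # close to 0
```

The two "quick" tests are cheap squeeze bounds that usually avoid evaluating the logarithm in the exact test.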

• The ziggurat algorithm {{harv|Marsaglia|Tsang|2000}} is faster than the Box–Muller transform and still exact. In about 97% of all cases it uses only two random numbers, one random integer and one random uniform, one multiplication and an if-test. Only in the 3% of cases where the combination of those two falls outside the "core of the ziggurat" must a kind of rejection sampling using logarithms, exponentials and more uniform random numbers be employed.

• There is also some investigation{{Citation needed|date=June 2010}} into the connection between the fast Hadamard transform and the normal distribution, since the transform employs just addition and subtraction, and by the central limit theorem random numbers from almost any distribution will be transformed into the normal distribution. In this regard, a series of Hadamard transforms can be combined with random permutations to turn arbitrary data sets into normally distributed data.

## Numerical approximations for the normal CDF

The standard normal CDF is widely used in scientific and statistical computing. The values Φ(x) may be approximated very accurately by a variety of methods, such as numerical integration, Taylor series, asymptotic series and continued fractions. Different approximations are used depending on the desired level of accuracy.

• {{harvtxt|Abramowitz|Stegun|1964}} give the approximation for Φ(x) for x > 0 with the absolute error |ε(x)| < 7.5·10−8 (algorithm 26.2.17):

$\Phi(x) = 1 - \phi(x)\left(b_1 t + b_2 t^2 + b_3 t^3 + b_4 t^4 + b_5 t^5\right) + \varepsilon(x), \qquad t = \frac{1}{1 + b_0 x},$

where ϕ(x) is the standard normal PDF, and b0 = 0.2316419, b1 = 0.319381530, b2 = −0.356563782, b3 = 1.781477937, b4 = −1.821255978, b5 = 1.330274429.
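As a sketch of how these coefficients are used in practice (the helper names are illustrative, not code from the cited source), the approximation can be implemented in a few lines:

```python
import math

# Abramowitz & Stegun algorithm 26.2.17: polynomial approximation of the
# standard normal CDF for x > 0, with absolute error below 7.5e-8.
B0 = 0.2316419
B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def std_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def std_cdf(x):
    """Approximate Phi(x); symmetry Phi(-x) = 1 - Phi(x) handles x < 0."""
    if x < 0:
        return 1.0 - std_cdf(-x)
    t = 1.0 / (1.0 + B0 * x)
    poly = sum(b * t ** (i + 1) for i, b in enumerate(B))
    return 1.0 - std_pdf(x) * poly
```

For instance, std_cdf(1.96) agrees with the exact value Φ(1.96) ≈ 0.9750021 to about seven decimal places.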

• {{harvtxt|Hart|1968}} lists almost a hundred rational function approximations for the erfc function. His algorithms vary in degree of complexity and resulting precision, with a maximum absolute precision of 24 digits. An algorithm by {{harvtxt|West|2009}} combines Hart's algorithm 5666 with a continued fraction approximation in the tail to provide a fast computation algorithm with 16-digit precision.

• {{harvtxt|W. J. Cody|1969}}, after recalling that the Hart (1968) solution is not suited for erf, gives a solution for both erf and erfc, with a maximal relative error bound, via rational Chebyshev approximation (Cody, W. J. (1969), "Rational Chebyshev Approximations for the Error Function").

• {{harvtxt|Marsaglia|2004}} suggested a simple algorithm (given, for example, in the article on the Bc programming language) based on the Taylor series expansion

$\Phi(x) = \frac12 + \phi(x)\left(x + \frac{x^3}{3} + \frac{x^5}{3\cdot5} + \frac{x^7}{3\cdot5\cdot7} + \cdots\right)$

for calculating Φ(x) with arbitrary precision. The drawback of this algorithm is its comparatively slow calculation time: for example, it takes over 300 iterations to calculate the function with 16 digits of precision when {{nowrap|1=x = 10}}.
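The series converges for every x, so it can simply be truncated once the terms fall below the target precision. A minimal sketch in double precision (function names are illustrative, not from Marsaglia's paper):

```python
import math

def taylor_cdf(x, tol=1e-17):
    """Phi(x) via the Taylor series
    Phi(x) = 1/2 + phi(x) * (x + x^3/3 + x^5/(3*5) + ...)."""
    term = x
    total = x
    n = 0
    while abs(term) > tol:
        n += 1
        term *= x * x / (2 * n + 1)  # next term: multiply by x^2/(2n+1)
        total += term
    return 0.5 + total * math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
```

taylor_cdf(1.0) matches Φ(1) ≈ 0.8413447 to full double precision, but the iteration count grows rapidly with |x|, which is the slowness the text describes.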

• The GNU Scientific Library calculates values of the standard normal CDF using Hart's algorithms and approximations with Chebyshev polynomials.

### Development

Some authors attribute the credit for the discovery of the normal distribution to de Moivre, who in 1738 published in the second edition of his "The Doctrine of Chances" the study of the coefficients in the binomial expansion of {{nowrap|(a + b)n}}. (De Moivre first published his findings in 1733, in a pamphlet "Approximatio ad Summam Terminorum Binomii {{nowrap|(a + b)n}} in Seriem Expansi" designated for private circulation only; it was not until 1738 that he made his results publicly available. The original pamphlet was reprinted several times, see for example {{harvtxt|Walker|1985}}.) De Moivre proved that the middle term in this expansion has the approximate magnitude of $\scriptstyle 2/\sqrt{2\pi n}$, and that "If m or ½n be a Quantity infinitely great, then the Logarithm of the Ratio, which a Term distant from the middle by the Interval ℓ, has to the middle Term, is $\scriptstyle -\frac{2\ell\ell}{n}$." Although this theorem can be interpreted as the first obscure expression for the normal probability law, Stigler points out that de Moivre himself did not interpret his results as anything more than an approximate rule for the binomial coefficients, and in particular that de Moivre lacked the concept of the probability density function.

In 1809 Gauss published his monograph "Theoria motus corporum coelestium in sectionibus conicis solem ambientium", where among other things he introduced several important statistical concepts, such as the method of least squares, the method of maximum likelihood, and the normal distribution. Gauss used M, {{nobr|M′}}, {{nobr|M′′, …}} to denote the measurements of some unknown quantity V, and sought the "most probable" estimator: the one which maximizes the probability {{nobr|φ(M−V) · φ(M′−V) · φ(M′′−V) · …}} of obtaining the observed experimental results. In his notation φΔ is the probability law of the measurement errors of magnitude Δ. Not knowing what the function φ is, Gauss required that his method should reduce to the well-known answer: the arithmetic mean of the measured values. ("It has been customary certainly to regard as an axiom the hypothesis that if any quantity has been determined by several direct observations, made under the same circumstances and with equal care, the arithmetical mean of the observed values affords the most probable value, if not rigorously, yet very nearly at least, so that it is always most safe to adhere to it." — {{harvtxt|Gauss|1809|loc=section 177}}) Starting from these principles, Gauss demonstrated that the only law which rationalizes the choice of the arithmetic mean as an estimator of the location parameter is the normal law of errors:


$\varphi\mathit{\Delta} = \frac{h}{\surd\pi}\, e^{-\mathrm{hh}\Delta\Delta},$

where h is "the measure of the precision of the observations". Using this normal law as a generic model for errors in the experiments, Gauss formulates what is now known as the non-linear weighted least squares (NWLS) method.

Although Gauss was the first to suggest the normal distribution law, Laplace made significant contributions. ("My custom of terming the curve the Gauss–Laplacian or normal curve saves us from proportioning the merit of discovery between the two great astronomer mathematicians." — {{harvtxt|Pearson|1905|loc=p. 189}}) It was Laplace who first posed the problem of aggregating several observations in 1774, although his own solution led to the Laplacian distribution. And it was Laplace who first calculated the value of the integral $\scriptstyle\int e^{-t^2}\,dt \;=\; \sqrt{\pi}$ in 1782, providing the normalization constant for the normal distribution.
{{About|the univariate normal distribution|normally distributed vectors|Multivariate normal distribution}}

{{Probability distribution
| name = Normal distribution
| type = density
| pdf_image =
(image omitted; the red line is the standard normal distribution)
| cdf_image =
(image omitted; colors match the image above)
| notation =
| parameters = {{nowrap|μ ∈ R}} — mean (location)
{{nowrap|σ2 > 0}} — variance (squared scale)
| support = x ∈ R
| pdf =
| cdf =
| mean = μ
| median = μ
| mode = μ
| variance = σ2
| skewness = 0
| kurtosis = 0
| entropy =
| mgf =
| char =
| fisher =
| conjugate prior = Normal distribution
}}

In probability theory, the normal (or Gaussian) distribution is a continuous probability distribution that has a bell-shaped probability density function, known as the Gaussian function or informally the bell curve (the designation "bell curve" is ambiguous: there are many other distributions which are "bell"-shaped, such as the Cauchy distribution, Student's t-distribution, the generalized normal, the logistic, etc.):

$f(x;\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},$

where the parameter μ is the mean or expectation (location of the peak) and σ2 is the variance, the mean of the squared deviation (a "measure" of the width of the distribution); σ is the standard deviation. The distribution with {{nowrap|μ {{=}} 0}} and {{nowrap|σ2 {{=}} 1}} is called the standard normal. A normal distribution is often used as a first approximation to describe real-valued random variables that cluster around a single mean value.

The normal distribution is considered the most prominent probability distribution in statistics. There are several reasons for this. First, the normal distribution is very tractable analytically; that is, a large number of results involving this distribution can be derived in explicit form. Second, the normal distribution arises as the outcome of the central limit theorem, which states that under mild conditions the sum of a large number of random variables is distributed approximately normally. Finally, the "bell" shape of the normal distribution makes it a convenient choice for modelling a large variety of random variables encountered in practice.
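The central limit theorem can be illustrated empirically: a sum of twelve independent Uniform(0, 1) variables has mean 6 and variance 1, and its distribution is already close to the normal bell shape. A quick simulation (the sample sizes here are arbitrary choices for illustration):

```python
import random
import statistics

random.seed(0)  # reproducible run
n_terms, n_samples = 12, 10_000
# Each Uniform(0,1) draw has mean 1/2 and variance 1/12, so the sum of
# twelve draws has mean 6 and variance 1.
sums = [sum(random.random() for _ in range(n_terms)) for _ in range(n_samples)]
print(statistics.mean(sums), statistics.stdev(sums))  # close to 6 and 1
```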

For this reason, the normal distribution is commonly encountered in practice, and is used throughout statistics, the natural sciences, and the social sciences as a simple model for complex phenomena. For example, the observational error in an experiment is usually assumed to follow a normal distribution, and the propagation of uncertainty is computed using this assumption. Note that a normally distributed variable has a symmetric distribution about its mean. Quantities that grow exponentially, such as prices, incomes or populations, are often skewed to the right, and hence may be better described by other distributions, such as the log-normal distribution or the Pareto distribution. In addition, the probability of seeing a normally distributed value that is far from the mean (i.e., more than a few standard deviations away) drops off extremely rapidly. As a result, statistical inference using a normal distribution is not robust to the presence of outliers (data that are unexpectedly far from the mean, due to exceptional circumstances, observational error, etc.). When outliers are expected, data may be better described using a heavy-tailed distribution such as Student's t-distribution.

From a technical perspective, alternative characterizations are possible, for example:
• The normal distribution is the only absolutely continuous distribution all of whose cumulants beyond the first two (i.e., other than the mean and variance) are zero.
• For a given mean and variance, the corresponding normal distribution is the continuous distribution with the maximum entropy.

## Definition

The simplest case of a normal distribution is known as the standard normal distribution, described by the probability density function

$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}.$

The factor $\scriptstyle\ 1/\sqrt{2\pi}$ in this expression ensures that the total area under the curve ϕ(x) is equal to one (for a proof, see the Gaussian integral), and the {{frac2|1|2}} in the exponent makes the "width" of the curve (measured as half the distance between the inflection points) also equal to one. It is traditional in statistics to denote this function with the Greek letter ϕ (phi), whereas density functions for all other distributions are usually denoted with the letters f or p. The alternative glyph φ is also used quite often; however, within this article "φ" is reserved to denote characteristic functions.

More generally, a normal distribution results from exponentiating a quadratic function (just as an exponential distribution results from exponentiating a linear function):

$f(x) = e^{a x^2 + b x + c}.$

This yields the classic "bell curve" shape, provided that {{nowrap|a < 0}} so that the quadratic function is concave; {{nowrap|f(x) > 0}} everywhere. One can adjust a to control the "width" of the bell, then adjust b to move the central peak of the bell along the x-axis, and finally adjust c to control the "height" of the bell. For f(x) to be a true probability density function over R, one must choose c such that $\scriptstyle\int_{-\infty}^{\infty} f(x)\,dx \;=\; 1$ (which is only possible when a < 0).

Rather than using a, b, and c, it is far more common to describe a normal distribution by its mean {{nowrap|μ {{=}} −{{frac2|b|2a}}}} and variance {{nowrap|σ2 {{=}} −{{frac2|1|2a}}}}. Changing to these new parameters allows one to rewrite the probability density function in a convenient standard form,

$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right).$

For a standard normal distribution, {{nowrap|1=μ = 0}} and {{nowrap|1=σ2 = 1}}. The last part of the equation above shows that any other normal distribution can be regarded as a version of the standard normal distribution that has been stretched horizontally by a factor σ and then translated rightward by a distance μ. Thus, μ specifies the position of the bell curve's central peak, and σ specifies the "width" of the bell curve.
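This stretch-and-shift relationship translates directly into code; a small sketch (helper names are illustrative):

```python
import math

def phi(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_pdf(x, mu=0.0, sigma=1.0):
    """General normal density as a shifted and scaled standard normal:
    f(x) = (1/sigma) * phi((x - mu) / sigma)."""
    return phi((x - mu) / sigma) / sigma
```

The division by sigma compensates for the horizontal stretch, so the total area under the curve stays equal to one.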

The parameter μ is at the same time the mean, the median and the mode of the normal distribution. The parameter σ2 is called the variance; as for any random variable, it describes how concentrated the distribution is around its mean. The square root of σ2 is called the standard deviation and is the width of the density function.

The normal distribution is usually denoted by N(μ, σ2). Commonly the letter N is written in calligraphic font (typed as \mathcal{N} in LaTeX). Thus when a random variable X is distributed normally with mean μ and variance σ2, we write

$X \sim \mathcal{N}(\mu,\, \sigma^2).$

### Alternative formulations

Some authors advocate using the precision instead of the variance, variously defining it as {{nowrap|τ {{=}} σ−2}} or {{nowrap|τ {{=}} σ−1}}. This parametrization has an advantage in numerical applications where σ2 is very close to zero, and it is more convenient to work with in analysis, as τ is a natural parameter of the normal distribution. It is also useful in the study of conditional distributions in the multivariate normal case.

The question of which normal distribution should be called the "standard" one is also answered differently by various authors. Starting from the works of Gauss, the standard normal was considered to be the one with variance {{nowrap|σ2 {{=}} {{frac2|1|2}}}}:

$f(x) = \frac{1}{\sqrt{\pi}}\, e^{-x^2}.$

{{harvtxt|Stigler|1982}} goes even further and insists the standard normal should be the one with variance {{nowrap|σ2 {{=}} {{frac2|1|2π}}}}:

$f(x) = e^{-\pi x^2}.$

According to the author, this formulation is advantageous because of a much simpler and easier-to-remember formula, the fact that the pdf has unit height at zero, and simple approximate formulas for the quantiles of the distribution. In Stigler's formulation the density of a normal {{nowrap|N(μ, τ)}} with mean μ and precision τ is equal to

$f(x;\mu,\tau) = \tau\, e^{-\pi\tau^2 (x-\mu)^2}.$

## Characterization

In the previous section the normal distribution was defined by specifying its probability density function. However, there are other ways to characterize a probability distribution. They include: the cumulative distribution function, the moments, the cumulants, the characteristic function, the moment-generating function, etc.

### Probability density function

The probability density function (pdf) of a random variable describes the relative frequencies of different values for that random variable. The pdf of the normal distribution is given by the formula explained in detail in the previous section:

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad x \in \mathbb{R}.$

This is a proper function only when the variance σ2 is not equal to zero; in that case it is a continuous smooth function, defined on the entire real line, known as the "Gaussian function".

Properties:
• The function f(x) is unimodal and symmetric around the point {{nowrap|x {{=}} μ}}, which is at the same time the mode, the median and the mean of the distribution.
• The inflection points of the curve occur one standard deviation away from the mean (i.e., at {{nowrap|x {{=}} μ − σ}} and {{nowrap|x {{=}} μ + σ}}).
• The function f(x) is log-concave.
• The standard normal density ϕ(x) is an eigenfunction of the Fourier transform.
• The function is supersmooth of order 2, implying that it is infinitely differentiable.
• The first derivative of ϕ(x) is {{nowrap|ϕ′(x) {{=}} −x·ϕ(x)}}; the second derivative is {{nowrap|ϕ′′(x) {{=}} (x2 − 1)ϕ(x)}}. More generally, the n-th derivative is given by {{nowrap|ϕ(n)(x) {{=}} (−1)nHn(x)ϕ(x)}}, where Hn is the Hermite polynomial of order n.
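The derivative identities in the last item are easy to verify numerically with central finite differences (a sanity-check sketch, not part of the original text):

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Central differences approximate the first and second derivatives;
# compare them against the closed forms -x*phi(x) and (x^2 - 1)*phi(x).
x, h = 0.7, 1e-5
d1 = (phi(x + h) - phi(x - h)) / (2 * h)
d2 = (phi(x + h) - 2 * phi(x) + phi(x - h)) / (h * h)
print(abs(d1 - (-x * phi(x))), abs(d2 - (x * x - 1) * phi(x)))  # both tiny
```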

When {{nowrap|σ2 {{=}} 0}}, the density function does not exist as an ordinary function. However, a generalized function that defines a measure on the real line can be used instead, and it allows one to calculate, for example, expected values:

$f(x) = \delta(x - \mu),$

where δ(x) is the Dirac delta function, which is equal to infinity at {{nowrap|x {{=}} 0}} and zero elsewhere.

### Cumulative distribution function

The cumulative distribution function (CDF) describes the probability that a random variable falls in the interval {{nowrap|(−∞, x]}}.

The CDF of the standard normal distribution is denoted with the capital Greek letter Φ (phi), and can be computed as an integral of the probability density function:

This integral cannot be expressed in terms of elementary functions; it is conventionally written in terms of the error function erf, a special function. Numerical methods for calculation of the standard normal CDF are discussed below. For a generic normal random variable with mean μ and variance σ2 > 0, the CDF will be equal to

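The erf-based expression can be sketched in code. A minimal implementation using `math.erf` from the Python standard library, cross-checked against `statistics.NormalDist` (available since Python 3.8); the parameter values are illustrative:

```python
import math
from statistics import NormalDist

def normal_cdf(x, mu=0.0, sigma=1.0):
    # F(x) = (1/2) * (1 + erf((x - mu) / (sigma * sqrt(2))))
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

p = normal_cdf(1.0, mu=2.0, sigma=3.0)
```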
The complement of the standard normal CDF, {{nowrap|Q(x) {{=}} 1 − Φ(x)}}, is referred to as the Q-function, especially in engineering texts. It gives the tail probability of the Gaussian distribution, that is, the probability that a standard normal random variable X is greater than the number x. Other definitions of the Q-function, all of which are simple transformations of Φ, are also used occasionally.

Properties:
• The standard normal CDF is 2-fold rotationally symmetric around point (0, ½):  {{nowrap| Φ(−x) {{=}} 1 − Φ(x)}}.
• The derivative of Φ(x) is equal to the standard normal pdf ϕ(x):  {{nowrap| Φ′(x) {{=}} ϕ(x)}}.
• The antiderivative of Φ(x) is:  {{nowrap|1 = ∫ Φ(x) dx = x Φ(x) + ϕ(x)}}.

For a normal distribution with zero variance, the CDF is the Heaviside step function (with the {{nowrap|H(0) {{=}} 1}} convention):

### Quantile function

The inverse of the standard normal CDF, called the quantile function or probit function, is expressed in terms of the inverse error function:

Quantiles of the standard normal distribution are commonly denoted as zp. The quantile zp is the value such that a standard normal random variable X falls in the interval {{nowrap|(−∞, zp]}} with probability exactly p. Quantiles are used in hypothesis testing, in the construction of confidence intervals, and in Q-Q plots. The most "famous" normal quantile is {{nowrap|1.96 {{=}} z0.975}}: a standard normal random variable is greater than 1.96 in absolute value in 5% of cases.

For a normal random variable with mean μ and variance σ2, the quantile function is

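As a quick numerical illustration, the quantile function can be evaluated with `statistics.NormalDist.inv_cdf` from the Python standard library, which computes Φ−1 directly; for general μ and σ this is the usual shift-and-scale form {{nowrap|F−1(p) {{=}} μ + σ Φ−1(p)}}:

```python
from statistics import NormalDist

def normal_quantile(p, mu=0.0, sigma=1.0):
    # F^{-1}(p) = mu + sigma * Phi^{-1}(p)
    return NormalDist(mu, sigma).inv_cdf(p)

z975 = normal_quantile(0.975)   # the "famous" 1.96 quantile
```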
### Characteristic function and moment generating function

The characteristic function φX(t) of a random variable X is defined as the expected value of eitX, where i is the imaginary unit and {{nowrap|t ∈ R}} is the argument of the characteristic function. Thus the characteristic function is the Fourier transform of the density ϕ(x). For a normally distributed X with mean μ and variance σ2, the characteristic function is

The characteristic function can be analytically extended to the entire complex plane: one defines φ(z) {{=}} eiμz − {{frac2|1|2}}σ2z2 for all z ∈ C.

The moment generating function is defined as the expected value of etX. For a normal distribution, the moment generating function exists and is equal to

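The closed form of the moment generating function, {{nowrap|M(t) {{=}} exp(μt + σ2t2/2)}}, can be verified against a direct numerical evaluation of E[etX]; a sketch with illustrative parameter values (the trapezoid rule on a wide grid, where the integrand is negligible beyond μ ± 12σ):

```python
import math

mu, sigma, t = 0.5, 1.2, 0.3

def integrand(x):
    # e^{tx} times the N(mu, sigma^2) density
    pdf = math.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))
    return math.exp(t * x) * pdf

lo, hi, steps = mu - 12 * sigma, mu + 12 * sigma, 20_000
h = (hi - lo) / steps
numeric = h * (sum(integrand(lo + i * h) for i in range(1, steps))
               + (integrand(lo) + integrand(hi)) / 2)
closed_form = math.exp(mu * t + sigma ** 2 * t ** 2 / 2)
```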
The cumulant generating function is the logarithm of the moment generating function:

Since this is a quadratic polynomial in t, only the first two cumulants are nonzero.

### Moments

{{see also|List of integrals of Gaussian functions}}
The normal distribution has moments of all orders. That is, for a normally distributed X with mean μ and variance {{nowrap|σ 2}}, the expectation {{nowrap|E[{{!}}X{{!}}p]}} exists and is finite for all p such that {{nowrap|Re[p] > −1}}. Usually we are interested only in moments of integer order: {{nowrap|p {{=}} 1, 2, 3, …}}.

• Central moments are the moments of X around its mean μ. Thus, a central moment of order p is the expected value of {{nowrap|(X − μ) p}}. Using standardization of normal random variables, this expectation will be equal to {{nowrap|σ p · E[Zp]}}, where Z is standard normal.

Here n!! denotes the double factorial, that is the product of every odd number from n to 1.

• Central absolute moments are the moments of |X − μ|. They coincide with ordinary central moments for all even orders, but are nonzero for odd orders (for which the ordinary central moments vanish).

The last formula is true for any non-integer {{nowrap|p > −1}}.

• Raw moments and raw absolute moments are the moments of X and |X| respectively. The formulas for these moments are much more complicated, and are given in terms of the confluent hypergeometric functions 1F1 and U.{{Citation needed|date=June 2010}}

These expressions remain valid even if p is not integer. See also generalized Hermite polynomials.

• The first two cumulants are equal to μ and σ2 respectively, whereas all higher-order cumulants are equal to zero.

Order  Raw moment                                 Central moment  Cumulant
1      μ                                          0               μ
2      μ2 + σ2                                    σ2              σ2
3      μ3 + 3μσ2                                  0               0
4      μ4 + 6μ2σ2 + 3σ4                           3σ4             0
5      μ5 + 10μ3σ2 + 15μσ4                        0               0
6      μ6 + 15μ4σ2 + 45μ2σ4 + 15σ6                15σ6            0
7      μ7 + 21μ5σ2 + 105μ3σ4 + 105μσ6             0               0
8      μ8 + 28μ6σ2 + 210μ4σ4 + 420μ2σ6 + 105σ8    105σ8           0

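The table entries follow mechanically from the double-factorial formula for central moments and a binomial expansion for raw moments; a short sketch (the parameter values are illustrative):

```python
import math

def central_moment(p, sigma):
    # E[(X - mu)^p]: zero for odd p, sigma^p * (p - 1)!! for even p
    if p % 2 == 1:
        return 0.0
    return sigma ** p * math.prod(range(p - 1, 0, -2))

def raw_moment(p, mu, sigma):
    # binomial expansion of E[((X - mu) + mu)^p]
    return sum(math.comb(p, k) * mu ** (p - k) * central_moment(k, sigma)
               for k in range(p + 1))

mu, sigma = 1.5, 0.7
m3 = raw_moment(3, mu, sigma)   # table row 3: mu^3 + 3*mu*sigma^2
m4 = raw_moment(4, mu, sigma)   # table row 4: mu^4 + 6*mu^2*sigma^2 + 3*sigma^4
```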
### Standardizing normal random variables

As a consequence of property 1, it is possible to relate all normal random variables to the standard normal. For example if X is normal with mean μ and variance σ2, then

has mean zero and unit variance, that is Z has the standard normal distribution. Conversely, having a standard normal random variable Z we can always construct another normal random variable with specific mean μ and variance σ2:

This "standardizing" transformation is convenient as it allows one to compute the PDF and especially the CDF of a normal distribution having the table of PDF and CDF values for the standard normal. They will be related via

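The standardizing relation can be illustrated with `statistics.NormalDist` from the Python standard library; the numbers below are illustrative:

```python
from statistics import NormalDist

mu, sigma = 5.0, 2.0
x = 6.5
z = (x - mu) / sigma                     # standardized value
# F_X(x) = Phi((x - mu)/sigma): one standard-normal table serves every (mu, sigma)
cdf_direct = NormalDist(mu, sigma).cdf(x)
cdf_via_std = NormalDist().cdf(z)
```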
### Standard deviation and confidence intervals

About 68% of values drawn from a normal distribution are within one standard deviation σ of the mean; about 95% of the values lie within two standard deviations; and about 99.7% are within three standard deviations. This fact is known as the 68-95-99.7 rule, the empirical rule, or the 3-sigma rule.
To be more precise, the area under the bell curve between {{nowrap|μ − nσ}} and {{nowrap|μ + nσ}} is given by

where erf is the error function. To 12 decimal places, the values for the 1-, 2-, up to 6-sigma points are:
n   fraction inside μ ± nσ   fraction outside (i.e. 1 minus …)   equivalent odds (or 1 in …)
1 {{val|0.682689492137}} {{val|0.317310507863}} {{val|3.15148718753}}
2 {{val|0.954499736104}} {{val|0.045500263896}} {{val|21.9778945080}}
3 {{val|0.997300203937}} {{val|0.002699796063}} {{val|370.398347345}}
4 {{val|0.999936657516}} {{val|0.000063342484}} {{val|15787.1927673}}
5 {{val|0.999999426697}} {{val|0.000000573303}} {{val|1744277.89362}}
6 {{val|0.999999998027}} {{val|0.000000001973}} {{val|506797345.897}}
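
The "fraction inside" column is erf(n/√2), so the table can be regenerated in a couple of lines of Python:

```python
import math

# P(|X - mu| < n*sigma) = erf(n / sqrt(2)) for a normal random variable X
within = {n: math.erf(n / math.sqrt(2)) for n in range(1, 7)}
```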

The next table gives the reverse relation: the sigma multiples corresponding to a few commonly used values of the area under the bell curve. These values are useful for determining (asymptotic) confidence intervals of the specified levels based on normally distributed (or asymptotically normal) estimators:
p     n      p     n
0.80 {{val|1.281551565545}} 0.999 {{val|3.290526731492}}
0.90 {{val|1.644853626951}} 0.9999 {{val|3.890591886413}}
0.95 {{val|1.959963984540}} 0.99999 {{val|4.417173413469}}
0.98 {{val|2.326347874041}} 0.999999 {{val|4.891638475699}}
0.99 {{val|2.575829303549}} 0.9999999 {{val|5.326723886384}}
0.995 {{val|2.807033768344}} 0.99999999 {{val|5.730728868236}}
0.998 {{val|3.090232306168}} 0.999999999 {{val|6.109410204869}}

where the value on the left of the table is the proportion of values that will fall within a given interval and n is a multiple of the standard deviation that specifies the width of the interval.
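
The reverse relation is the standard normal quantile of {{nowrap|(1 + p)/2}}; a sketch using `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

def sigma_multiple(p):
    # width n such that the interval mu +/- n*sigma has probability p:
    # n = Phi^{-1}((1 + p) / 2)
    return NormalDist().inv_cdf((1.0 + p) / 2.0)

n95 = sigma_multiple(0.95)
n99 = sigma_multiple(0.99)
```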

### Central limit theorem

{{Main|Central limit theorem}}

The central limit theorem states that under certain (fairly common) conditions, the sum of a large number of random variables has an approximately normal distribution. For example, if X1, …, Xn is a sequence of iid random variables, each having mean μ and variance σ2, then the theorem states that the distribution of the standardized sample mean tends to the standard normal as {{nowrap|n → ∞}}.

The theorem will hold even if the summands xi are not iid, although some constraints on the degree of dependence and the growth rate of moments still have to be imposed.

The importance of the central limit theorem cannot be overemphasized. A great number of test statistics, scores, and estimators encountered in practice contain sums of certain random variables in them; even more estimators can be represented as sums of random variables through the use of influence functions. All of these quantities are governed by the central limit theorem and will have an asymptotically normal distribution as a result.
Another practical consequence of the central limit theorem is that certain other distributions can be approximated by the normal distribution, for example:
• The binomial distribution B(n, p) is approximately normal N(np, np(1 − p)) for large n and for p not too close to zero or one.
• The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
• The chi-squared distribution χ2(k) is approximately normal N(k, 2k) for large k.
• The Student's t-distribution t(ν) is approximately normal N(0, 1) when ν is large.
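
The binomial case can be checked directly; a sketch comparing the exact binomial CDF with the normal approximation, including the usual 0.5 continuity correction (the values of n, p, and k are illustrative):

```python
import math
from statistics import NormalDist

n, p, k = 400, 0.3, 130
# exact binomial tail P(X <= k)
exact = sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))
# normal approximation N(np, np(1 - p)), with a continuity correction of 0.5
approx = NormalDist(n * p, math.sqrt(n * p * (1 - p))).cdf(k + 0.5)
```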

Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution.

A general upper bound for the approximation error in the central limit theorem is given by the Berry–Esseen theorem; improvements of the approximation are given by the Edgeworth expansions.

### Miscellaneous

1. The family of normal distributions is closed under linear transformations. That is, if X is normally distributed with mean μ and variance σ2, then a linear transform {{nowrap|aX + b}} (for some real numbers a and b) is also normally distributed:

Also, if X1, X2 are two independent normal random variables, with means μ1, μ2 and standard deviations σ1, σ2, then their linear combination will also be normally distributed. [proof]

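Property 1 can be illustrated with `statistics.NormalDist`, whose arithmetic operators implement exactly these closure rules (adding two instances treats them as independent, per the standard library documentation); the parameters are illustrative:

```python
from statistics import NormalDist

X = NormalDist(2.0, 3.0)        # N(mu = 2, sigma = 3)
Y = 5 * X + 1                   # aX + b is N(a*mu + b, |a|*sigma) = N(11, 15)
X1 = NormalDist(1.0, 2.0)
X2 = NormalDist(3.0, 4.0)
S = X1 + X2                     # sum of independents: N(4, sqrt(2^2 + 4^2))
```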
2. The converse of (1) is also true: if X1 and X2 are independent and their sum {{nowrap|X1 + X2}} is distributed normally, then both X1 and X2 must also be normal. This is known as Cramér's decomposition theorem. The interpretation of this property is that a normal distribution is only divisible by other normal distributions. Another application of this property is in connection with the central limit theorem: although the CLT asserts that the distribution of a sum of arbitrary non-normal iid random variables is approximately normal, Cramér's theorem shows that it can never become exactly normal.

3. If the characteristic function φX of some random variable X is of the form {{nowrap|φX(t) {{=}} eQ(t)}}, where Q(t) is a polynomial, then the Marcinkiewicz theorem (named after Józef Marcinkiewicz) asserts that Q can be at most a quadratic polynomial, and therefore X is a normal random variable. A consequence of this result is that the normal distribution is the only distribution with a finite number (two) of non-zero cumulants.

4. If X and Y are jointly normal and uncorrelated, then they are independent. The requirement that X and Y be jointly normal is essential; without it the property does not hold. [proof] For non-normal random variables, uncorrelatedness does not imply independence.

5. If X and Y are independent {{nowrap|N(μ, σ 2)}} random variables, then {{nowrap|X + Y}} and {{nowrap|X − Y}} are also independent and identically distributed (this follows from the polarization identity). This property uniquely characterizes the normal distribution, as can be seen from Bernstein's theorem: if X and Y are independent and such that {{nowrap|X + Y}} and {{nowrap|X − Y}} are also independent, then both X and Y must necessarily have normal distributions.

More generally, if X1, ..., Xn are independent random variables, then two linear combinations ∑akXk and ∑bkXk will be independent if and only if all Xks are normal and {{nowrap|∑akbk{{SubSup|σ|k|2}} {{=}} 0}}, where {{SubSup|σ|k|2}} denotes the variance of Xk.

6. The normal distribution is infinitely divisible: for a normally distributed X with mean μ and variance σ2, we can find n independent random variables {X1, …, Xn}, each distributed normally with mean μ/n and variance σ2/n, such that

7. Normal distribution is stable (with exponent α = 2): if X1, X2 are two independent {{nowrap|N(μ, σ2)}} random variables and a, b are arbitrary real numbers, then

where X3 is also {{nowrap|N(μ, σ2)}}. This relationship directly follows from property (1).

8. The Kullback–Leibler divergence between two normal distributions {{nowrap|1=X1 ∼ N(μ1, σ{{su|p=2|b=1}})}} and {{nowrap|1=X2 ∼ N(μ2, σ{{su|p=2|b=2}})}} is given by:

The Hellinger distance between the same distributions is equal to

9. The Fisher information matrix for normal distribution is diagonal and takes form

10. The normal distribution belongs to an exponential family with natural parameters {{nowrap|1=θ1 = μ/σ2}} and {{nowrap|1=θ2 = −1/(2σ2)}}, and natural statistics x and x2. The dual, expectation parameters for the normal distribution are {{nowrap|1=η1 = μ}} and {{nowrap|1=η2 = μ2 + σ2}}.

11. The conjugate prior of the mean of a normal distribution is another normal distribution. Specifically, if x1, …, xn are iid {{nowrap|N(μ, σ2)}} and the prior is {{nowrap|μ ~ N(μ0, σ{{su|p=2|b=0}})}}, then the posterior distribution for the estimator of μ will be

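The posterior in (11) has the standard precision-weighted closed form; a sketch assuming σ is known (the helper name and data values are illustrative):

```python
def posterior(xs, sigma, mu0, s0):
    # conjugate update: posterior precision = prior precision + n / sigma^2,
    # posterior mean = precision-weighted average of prior mean and data
    n = len(xs)
    prec = 1.0 / s0 ** 2 + n / sigma ** 2
    mean = (mu0 / s0 ** 2 + sum(xs) / sigma ** 2) / prec
    return mean, prec ** -0.5              # posterior mean and std dev

post_mean, post_sd = posterior([4.0, 5.0, 6.0], sigma=1.0, mu0=0.0, s0=10.0)
```

With a diffuse prior (s0 large), the posterior mean stays close to the sample mean of 5.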
12. Of all probability distributions over the reals with mean μ and variance σ2, the normal distribution {{nowrap|N(μ, σ2)}} is the one with the maximum entropy.

13. The family of normal distributions forms a manifold with constant curvature −1. The same family is flat with respect to the (±1)-connections ∇(e) and ∇(m).

### Operations on a single random variable

If X is distributed normally with mean μ and variance σ2, then

• The exponential of X is distributed log-normally: {{nowrap|eX ~ lnN (μ, σ2)}}.
• The absolute value of X has a folded normal distribution: {{nowrap|{{!}}X{{!}} ~ Nf (μ, σ2)}}. If {{nowrap|μ {{=}} 0}} this is known as the half-normal distribution.
• The square of X/σ has the noncentral chi-squared distribution with one degree of freedom: {{nowrap|1= X2/σ2 ~ χ{{su|p=2|b=1}}(μ2/σ2)}}. If {{nowrap|μ {{=}} 0}}, the distribution is called simply chi-squared.
• The distribution of the variable X restricted to an interval [a, b] is called the truncated normal distribution.
• (X − μ)−2 has a Lévy distribution with location 0 and scale σ−2.

#### Combination of two independent random variables

If X1 and X2 are two independent standard normal random variables, then

• Their sum and difference are each distributed normally with mean zero and variance two: {{nowrap|X1 ± X2 ∼ N(0, 2)}}.
• Their product {{nowrap|Z {{=}} X1·X2}} follows the "product-normal" distribution with density function {{nowrap|fZ(z) {{=}} π−1K0({{!}}z{{!}})}}, where K0 is the modified Bessel function of the second kind. This distribution is symmetric around zero, unbounded at {{nowrap|z {{=}} 0}}, and has the characteristic function {{nowrap|1= φZ(t) = (1 + t 2)−1/2}}.
• Their ratio follows the standard Cauchy distribution: {{nowrap|X1 ÷ X2 ∼ Cauchy(0, 1)}}.
• Their Euclidean norm $\scriptstyle\sqrt{X_1^2\,+\,X_2^2}$ has the Rayleigh distribution, also known as the chi distribution with 2 degrees of freedom.
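
Two of these facts can be checked by a seeded simulation using only the Python standard library; the sample size and seed are illustrative (the standard Cauchy satisfies {{nowrap|1=P({{!}}Z{{!}} ≤ 1) = 1/2}} exactly, and the chi-squared distribution with 2 degrees of freedom has mean 2):

```python
import random

random.seed(1)
N = 100_000
x1 = [random.gauss(0, 1) for _ in range(N)]
x2 = [random.gauss(0, 1) for _ in range(N)]

# X1/X2 ~ Cauchy(0, 1): fraction with |X1/X2| <= 1 should be near 1/2
frac_ratio = sum(abs(a) <= abs(b) for a, b in zip(x1, x2)) / N

# X1^2 + X2^2 ~ chi-squared with 2 degrees of freedom: sample mean near 2
mean_sq = sum(a * a + b * b for a, b in zip(x1, x2)) / N
```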

#### Combination of two or more independent random variables

• If X1, X2, …, Xn are independent standard normal random variables, then the sum of their squares has the chi-squared distribution with n degrees of freedom: $\scriptstyle X_1^2 + \cdots + X_n^2\ \sim\ \chi_n^2$.
• If X1, X2, …, Xn are independent normally distributed random variables with mean μ and variance σ2, then their sample mean is independent of the sample standard deviation, which can be demonstrated using Basu's theorem or Cochran's theorem. The ratio of these two quantities will have the Student's t-distribution with {{nowrap|n − 1}} degrees of freedom:

• If X1, …, Xn, Y1, …, Ym are independent standard normal random variables, then the ratio of their normalized sums of squares will have the {{nowrap|F-distribution}} with (n, m) degrees of freedom:

#### Operations on the density function

The split normal distribution is most directly defined by joining scaled sections of the density functions of different normal distributions at the mode and rescaling the density to integrate to one. The truncated normal distribution results from rescaling a section of a single density function.

### Extensions

The notion of normal distribution, being one of the most important distributions in probability theory, has been extended far beyond the standard framework of the univariate (that is, one-dimensional) case. All these extensions are also called normal or Gaussian laws, so a certain ambiguity in names exists.
• Multivariate normal distribution describes the Gaussian law in the k-dimensional Euclidean space. A vector {{nowrap|X ∈ Rk}} is multivariate-normally distributed if any linear combination of its components {{nowrap|∑{{su|p=k|b=j=1}}aj Xj}} has a (univariate) normal distribution. The variance of X is a k×k symmetric positive-definite matrix V.
• Rectified Gaussian distribution is a rectified version of the normal distribution with all negative elements reset to 0.
• Complex normal distribution deals with complex normal vectors. A complex vector {{nowrap|X ∈ Ck}} is said to be normal if both its real and imaginary components jointly possess a 2k-dimensional multivariate normal distribution. The variance-covariance structure of X is described by two matrices: the variance matrix Γ and the relation matrix C.
• Matrix normal distribution describes the case of normally distributed matrices.
• Gaussian processes are the normally distributed stochastic processes. These can be viewed as elements of some infinite-dimensional Hilbert space H, and thus are the analogues of multivariate normal vectors for the case {{nowrap|k {{=}} ∞}}. A random element {{nowrap|h ∈ H}} is said to be normal if for any constant {{nowrap|a ∈ H}} the scalar product {{nowrap|(a, h)}} has a (univariate) normal distribution. The variance structure of such a Gaussian random element can be described in terms of the linear covariance operator {{nowrap|K: H → H}}. Several Gaussian processes became popular enough to have their own names:
• Brownian motion (the Wiener process),
• Brownian bridge,
• Ornstein–Uhlenbeck process.
• Gaussian q-distribution is an abstract mathematical construction which represents a "q-analogue" of the normal distribution.
• the q-Gaussian is an analogue of the Gaussian distribution, in the sense that it maximises the Tsallis entropy, and is one type of Tsallis distribution. Note that this distribution is different from the Gaussian q-distribution above.
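As a small illustration of the multivariate case, a bivariate normal vector with a given covariance matrix V can be generated by applying a Cholesky factor of V to independent standard normal variates. This is a sketch in pure Python; the particular V is an arbitrary example:

```python
import math
import random

random.seed(42)

# Target covariance V = [[1.0, 0.5], [0.5, 1.0]] (arbitrary example).
# Its lower-triangular Cholesky factor L (with L·Lᵀ = V) is
#   L = [[1.0, 0.0], [0.5, sqrt(0.75)]].
l21, l22 = 0.5, math.sqrt(0.75)

def bivariate_normal():
    """One draw of (X1, X2): zero means, unit variances, Cov(X1, X2) = 0.5."""
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    return z1, l21 * z1 + l22 * z2  # X = L·Z

n = 200_000
draws = [bivariate_normal() for _ in range(n)]
cov = sum(x1 * x2 for x1, x2 in draws) / n  # means are 0, so this estimates the covariance
print(round(cov, 2))
```

Every linear combination a1X1 + a2X2 here equals (a1 + 0.5·a2)Z1 + a2·√0.75·Z2, a sum of independent normals, which is exactly the defining property of a multivariate normal vector.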

One of the main practical uses of the Gaussian law is to model the empirical distributions of many different random variables encountered in practice. In such cases a possible extension would be a richer family of distributions, having more than two parameters and therefore being able to fit the empirical distribution more accurately. Examples of such extensions are:
• Pearson distribution — a four-parameter family of probability distributions that extend the normal law to include different skewness and kurtosis values.

## Normality tests

{{Main|Normality tests}}

Normality tests assess the likelihood that the given data set {x1, …, xn} comes from a normal distribution. Typically the null hypothesis H0 is that the observations are distributed normally with unspecified mean μ and variance σ2, versus the alternative Ha that the distribution is arbitrary. A great number of tests (over 40) have been devised for this problem; the more prominent of them are outlined below:
• "Visual" tests are more intuitively appealing but subjective at the same time, as they rely on informal human judgement to accept or reject the null hypothesis.
• Q-Q plot — a plot of the sorted values from the data set against the expected values of the corresponding quantiles from the standard normal distribution. That is, it's a plot of points of the form (Φ−1(pk), x(k)), where the plotting points pk are equal to pk = (k − α)/(n + 1 − 2α) and α is an adjustment constant which can be anything between 0 and 1. If the null hypothesis is true, the plotted points should approximately lie on a straight line.
• P-P plot — similar to the Q-Q plot, but used much less frequently. This method consists of plotting the points (Φ(z(k)), pk), where {{nowrap|z(k) {{=}} (x(k) − $\scriptstyle\hat\mu$)/$\scriptstyle\hat\sigma$}}. For normally distributed data this plot should lie on a 45° line between (0, 0) and (1, 1).
• Shapiro–Wilk test employs the fact that the line in the Q-Q plot has slope σ. The test compares the least squares estimate of that slope with the value of the sample variance, and rejects the null hypothesis if these two quantities differ significantly.
• Normal probability plot (rankit plot)

• Moment tests:
• D'Agostino's K-squared test
• Jarque–Bera test
• Empirical distribution function tests:
• Lilliefors test (an adaptation of the Kolmogorov–Smirnov test)
• Anderson–Darling test
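The Q-Q construction described above is short to sketch in code. This is pure Python; the standard library's `NormalDist().inv_cdf` serves as Φ−1, and α = 0.375 is just one common choice of the adjustment constant:

```python
import random
from statistics import NormalDist

random.seed(0)
x = sorted(random.gauss(0, 1) for _ in range(500))  # data drawn under H0

n = len(x)
alpha = 0.375  # one common plotting-position constant (Blom)
p = [(k - alpha) / (n + 1 - 2 * alpha) for k in range(1, n + 1)]
q = [NormalDist().inv_cdf(pk) for pk in p]  # theoretical quantiles Φ⁻¹(p_k)

# If H0 holds, the points (q_k, x_(k)) lie near a straight line; the
# correlation of the two coordinates is a numeric stand-in for that visual check.
mq, mx = sum(q) / n, sum(x) / n
num = sum((a - mq) * (b - mx) for a, b in zip(q, x))
den = (sum((a - mq) ** 2 for a in q) * sum((b - mx) ** 2 for b in x)) ** 0.5
r = num / den
print(round(r, 3))  # very close to 1 for normal data
```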

## Estimation of parameters

It is often the case that we don't know the parameters of the normal distribution, but instead want to estimate them. That is, having a sample (x1, …, xn) from a normal {{nowrap|N(μ, σ2)}} population, we would like to learn the approximate values of the parameters μ and σ2. The standard approach to this problem is the maximum likelihood method, which requires maximization of the log-likelihood function:

$\ln\mathcal{L}(\mu,\sigma^2) \;=\; \sum_{i=1}^n \ln f(x_i;\mu,\sigma^2) \;=\; -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2.$

Taking derivatives with respect to μ and σ2 and solving the resulting system of first order conditions yields the maximum likelihood estimates:

$\hat\mu = \overline{x} = \frac{1}{n}\sum_{i=1}^n x_i, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \overline{x})^2.$

Estimator $\scriptstyle\hat\mu$ is called the sample mean, since it is the arithmetic mean of all observations. The statistic $\scriptstyle\overline{x}$ is complete and sufficient for μ, and therefore by the Lehmann–Scheffé theorem, $\scriptstyle\hat\mu$ is the uniformly minimum variance unbiased (UMVU) estimator. In finite samples it is distributed normally:

$\hat\mu \;\sim\; N\!\left(\mu,\ \sigma^2/n\right).$

The variance of this estimator is equal to the μμ-element of the inverse Fisher information matrix $\scriptstyle\mathcal{I}^{-1}$. This implies that the estimator is finite-sample efficient. Of practical importance is the fact that the standard error of $\scriptstyle\hat\mu$ is proportional to $\scriptstyle1/\sqrt{n}$, that is, if one wishes to decrease the standard error by a factor of 10, one must increase the number of points in the sample by a factor of 100. This fact is widely used in determining sample sizes for opinion polls and the number of trials in Monte Carlo simulations.
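The 1/√n scaling itself is a one-line computation (σ = 2 and n = 25 are arbitrary example values):

```python
import math

sigma, n = 2.0, 25                     # arbitrary example values
se = sigma / math.sqrt(n)              # standard error of the sample mean
se_100n = sigma / math.sqrt(100 * n)   # same, with 100× as many points

print(se / se_100n)  # 10.0: a 100-fold larger sample gives a 10-fold smaller error
```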

From the standpoint of the asymptotic theory, $\scriptstyle\hat\mu$ is consistent, that is, it converges in probability to μ as n → ∞. The estimator is also asymptotically normal, which is a simple corollary of the fact that it is normal in finite samples:

$\sqrt{n}\,(\hat\mu - \mu) \;\xrightarrow{d}\; N(0,\ \sigma^2).$

The estimator $\scriptstyle\hat\sigma^2$ is called the sample variance, since it is the variance of the sample (x1, …, xn). In practice, another estimator is often used instead of $\scriptstyle\hat\sigma^2$. This other estimator is denoted s2, and is also called the sample variance, which represents a certain ambiguity in terminology; its square root s is called the sample standard deviation. The estimator s2 differs from $\scriptstyle\hat\sigma^2$ by having {{nowrap|(n − 1)}} instead of n in the denominator (the so-called Bessel's correction):

$s^2 \;=\; \frac{1}{n-1}\sum_{i=1}^n (x_i - \overline{x})^2.$

The difference between s2 and $\scriptstyle\hat\sigma^2$ becomes negligibly small for large n. In finite samples, however, the motivation behind the use of s2 is that it is an unbiased estimator of the underlying parameter σ2, whereas $\scriptstyle\hat\sigma^2$ is biased. Also, by the Lehmann–Scheffé theorem the estimator s2 is uniformly minimum variance unbiased (UMVU), which makes it the "best" estimator among all unbiased ones. However it can be shown that the biased estimator $\scriptstyle\hat\sigma^2$ is "better" than s2 in terms of the mean squared error (MSE) criterion. In finite samples both s2 and $\scriptstyle\hat\sigma^2$ have scaled chi-squared distribution with {{nowrap|(n − 1)}} degrees of freedom:

$s^2 \;\sim\; \frac{\sigma^2}{n-1}\,\chi^2_{n-1}, \qquad \hat\sigma^2 \;\sim\; \frac{\sigma^2}{n}\,\chi^2_{n-1}.$

The first of these expressions shows that the variance of s2 is equal to {{nowrap|2σ4/(n−1)}}, which is slightly greater than the σσ-element of the inverse Fisher information matrix $\scriptstyle\mathcal{I}^{-1}$. Thus, s2 is not an efficient estimator for σ2, and moreover, since s2 is UMVU, we can conclude that the finite-sample efficient estimator for σ2 does not exist.
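The two competing variance estimators differ only in their denominators, which is easy to check numerically. A sketch in pure Python on arbitrary data; the standard library's `pvariance`/`variance` implement the same two formulas:

```python
from statistics import pvariance, variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # arbitrary example data
n = len(data)
mean = sum(data) / n                      # sample mean (the MLE of μ)
ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations

sigma2_hat = ss / n   # MLE: biased, but smaller mean squared error
s2 = ss / (n - 1)     # Bessel-corrected: unbiased, UMVU

# The standard library agrees with both formulas:
assert abs(sigma2_hat - pvariance(data)) < 1e-12
assert abs(s2 - variance(data)) < 1e-12
print(mean, sigma2_hat, round(s2, 3))  # 5.0 4.0 4.571
```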

Applying the asymptotic theory, both estimators s2 and $\scriptstyle\hat\sigma^2$ are consistent, that is, they converge in probability to σ2 as the sample size {{nowrap|n → ∞}}. The two estimators are also both asymptotically normal:

$\sqrt{n}\,(s^2 - \sigma^2) \;\xrightarrow{d}\; N(0,\ 2\sigma^4), \qquad \sqrt{n}\,(\hat\sigma^2 - \sigma^2) \;\xrightarrow{d}\; N(0,\ 2\sigma^4).$

In particular, both estimators are asymptotically efficient for σ2.

By Cochran's theorem, for the normal distribution the sample mean $\scriptstyle\hat\mu$ and the sample variance s2 are independent, which means there can be no gain in considering their joint distribution. There is also a converse theorem: if in a sample the sample mean and sample variance are independent, then the sample must have come from the normal distribution. The independence between $\scriptstyle\hat\mu$ and s can be employed to construct the so-called t-statistic:

$t \;=\; \frac{\hat\mu - \mu}{s/\sqrt{n}}.$

This quantity t has the Student's t-distribution with {{nowrap|(n − 1)}} degrees of freedom, and it is an ancillary statistic (independent of the value of the parameters). Inverting the distribution of this t-statistic allows us to construct the confidence interval for μ; similarly, inverting the χ2 distribution of the statistic s2 gives the confidence interval for σ2:

$\mu \in \left[\,\hat\mu - t_{n-1,\,1-\alpha/2}\,\tfrac{s}{\sqrt{n}},\ \hat\mu + t_{n-1,\,1-\alpha/2}\,\tfrac{s}{\sqrt{n}}\,\right] \approx \left[\,\hat\mu \pm |z_{\alpha/2}|\,\tfrac{s}{\sqrt{n}}\,\right],$

$\sigma^2 \in \left[\,\tfrac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}},\ \tfrac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}}\,\right] \approx \left[\,s^2 \pm |z_{\alpha/2}|\,\tfrac{\sqrt{2}\,s^2}{\sqrt{n}}\,\right],$

where tk,p and {{SubSup|χ|k,p|2}} are the pth quantiles of the t- and χ2-distributions respectively. These confidence intervals are of the level {{nowrap|1 − α}}, meaning that the true values μ and σ2 fall outside of these intervals with probability α. In practice people usually take {{nowrap|α {{=}} 5%}}, resulting in the 95% confidence intervals. The approximate formulas in the display above were derived from the asymptotic distributions of $\scriptstyle\hat\mu$ and s2. The approximate formulas become valid for large values of n, and are more convenient for manual calculation since the standard normal quantiles zα/2 do not depend on n. In particular, the most popular value {{nowrap|α {{=}} 5%}} results in {{nowrap|{{!}}z0.025{{!}} {{=}} 1.96}}.
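The large-n approximate interval for μ is then a few lines of code. A sketch in pure Python on an arbitrary sample; `NormalDist().inv_cdf` provides the standard normal quantile:

```python
import math
from statistics import NormalDist, mean, stdev

sample = [4.9, 5.1, 4.7, 5.3, 5.0, 4.8, 5.2, 5.1, 4.9, 5.0]  # arbitrary data
n = len(sample)
m, s = mean(sample), stdev(sample)  # sample mean and s (n − 1 denominator)

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)  # |z_0.025| ≈ 1.96
half = z * s / math.sqrt(n)              # half-width of the approximate 95% CI
print(round(z, 2), (round(m - half, 2), round(m + half, 2)))
```

For n = 10 the exact t-based interval would be somewhat wider (t9, 0.975 ≈ 2.26 instead of 1.96), which is why the z-based approximation is recommended only for large n.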

## Occurrence

The occurrence of normal distribution in practical problems can be loosely classified into three categories:
1. Exactly normal distributions;
2. Approximately normal laws, for example when such approximation is justified by the central limit theorem; and
3. Distributions modeled as normal — the normal distribution being the distribution with maximum entropy for a given mean and variance.

### Exact normality

Certain quantities in physics are distributed normally, as was first demonstrated by James Clerk Maxwell. Examples of such quantities are:
• Velocities of the molecules in the ideal gas. More generally, velocities of the particles in any system in thermodynamic equilibrium will have a normal distribution, due to the maximum entropy principle.
• Probability density function of a ground state in a quantum harmonic oscillator.
• The position of a particle which experiences diffusion. If initially the particle is located at a specific point (that is, its probability distribution is the Dirac delta function), then after time t its location is described by a normal distribution with variance t, which satisfies the diffusion equation {{nowrap|1={{frac2|∂|∂t}} f(x,t) = {{frac2|1|2}} {{frac2|∂2|∂x2}} f(x,t)}}. If the initial location is given by a certain density function g(x), then the density at time t is the convolution of g and the normal PDF.
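Indeed, one can verify directly that the N(0, t) density is a solution of that diffusion equation:

```latex
f(x,t) = \frac{1}{\sqrt{2\pi t}}\exp\!\left(-\frac{x^{2}}{2t}\right),
\qquad
\frac{\partial f}{\partial t}
  = \left(\frac{x^{2}}{2t^{2}} - \frac{1}{2t}\right) f(x,t)
  = \frac{1}{2}\,\frac{\partial^{2} f}{\partial x^{2}},
```

and as t → 0+ this density concentrates at the origin, recovering the Dirac-delta initial condition.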

### Approximate normality

Approximately normal distributions occur in many situations, as explained by the central limit theorem. When the outcome is produced by a large number of small effects acting additively and independently, its distribution will be close to normal. The normal approximation will not be valid if the effects act multiplicatively (instead of additively), or if there is a single external influence which has a considerably larger magnitude than the rest of the effects.
• In counting problems, where the central limit theorem includes a discrete-to-continuum approximation and where infinitely divisible and decomposable distributions are involved, such as
• Binomial random variables, associated with binary response variables;
• Poisson random variables, associated with rare events;
• Thermal light has a Bose–Einstein distribution on very short time scales, and a normal distribution on longer timescales due to the central limit theorem.
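The additive mechanism behind these examples is easy to simulate: summing a dozen independent uniform effects already produces a nearly normal outcome. A sketch in pure Python; the 12-term sum is a classic illustrative choice, since its variance is exactly 1:

```python
import random
from statistics import mean, stdev

random.seed(1)

# Each outcome is the sum of 12 small independent Uniform(0,1) effects.
# Theory: mean = 12 · 0.5 = 6, variance = 12 · (1/12) = 1.
outcomes = [sum(random.random() for _ in range(12)) for _ in range(100_000)]

within_1sd = sum(abs(y - 6.0) < 1.0 for y in outcomes) / len(outcomes)
print(round(mean(outcomes), 2), round(stdev(outcomes), 2), round(within_1sd, 2))
# mean ≈ 6, sd ≈ 1, and ≈ 68% of outcomes within one sd — as the normal law predicts
```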

### Assumed normality

{{cquote|I can only recognize the occurrence of the normal curve — the Laplacian curve of errors — as a very abnormal phenomenon. It is roughly approximated to in certain distributions; for this reason, and on account of its beautiful simplicity, we may, perhaps, use it as a first approximation, particularly in theoretical investigations. — {{harvtxt|Pearson|1901}}}}
There are statistical methods to empirically test that assumption; see the Normality tests section above.
• In biology, the logarithms of various variables tend to have a normal distribution, that is, the variables tend to have a log-normal distribution (after separation into male/female subpopulations), with examples including:
• Measures of size of living tissue (length, height, skin area, weight);
• The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth; presumably the thickness of tree bark also falls under this category;
• Certain physiological measurements, such as blood pressure of adult humans.
• In finance, in particular the Black–Scholes model, changes in the logarithm of exchange rates, price indices, and stock market indices are assumed normal (these variables behave like compound interest, not like simple interest, and so are multiplicative). Some mathematicians such as Benoît Mandelbrot have argued that log-Lévy distributions, which possess heavy tails, would be a more appropriate model, in particular for the analysis of stock market crashes.
• Measurement errors in physical experiments are often modeled by a normal distribution. This use of a normal distribution does not imply that one is assuming the measurement errors are normally distributed; rather, using the normal distribution produces the most conservative predictions possible given only knowledge about the mean and variance of the errors.

• In standardized testing, results can be made to have a normal distribution. This is done by either selecting the number and difficulty of questions (as in the IQ test), or by transforming the raw test scores into "output" scores by fitting them to the normal distribution. For example, the SAT's traditional range of 200–800 is based on a normal distribution with a mean of 500 and a standard deviation of 100.
• Many scores are derived from the normal distribution, including percentile ranks ("percentiles" or "quantiles"), normal curve equivalents, stanines, z-scores, and T-scores. Additionally, a number of behavioral statistical procedures are based on the assumption that scores are normally distributed; for example, t-tests and ANOVAs. Bell curve grading assigns relative grades based on a normal distribution of scores.
• In hydrology the distribution of long duration river discharge or rainfall (e.g. monthly and yearly totals, consisting of the sum of 30 or 360 daily values respectively) is often thought to be practically normal according to the central limit theorem. The blue picture illustrates an example of fitting the normal distribution to ranked October rainfalls, showing the 90% confidence belt based on the binomial distribution. The rainfall data are represented by plotting positions as part of the cumulative frequency analysis.

## Generating values from normal distribution

In computer simulations, especially in applications of the Monte Carlo method, it is often desirable to generate values that are normally distributed. The algorithms listed below all generate standard normal deviates, since a {{nowrap|N(μ, σ{{su|p=2}})}} variate can be generated as {{nowrap|X {{=}} μ + σZ}}, where Z is standard normal. All these algorithms rely on the availability of a random number generator U capable of producing uniform
Uniform distribution (continuous)
In probability theory and statistics, the continuous uniform distribution or rectangular distribution is a family of probability distributions such that for each member of the family, all intervals of the same length on the distribution's support are equally probable. The support is defined by...

random variates.

• The most straightforward method is based on the probability integral transform
Probability integral transform
In statistics, the probability integral transform or transformation relates to the result that data values that are modelled as being random variables from any given continuous distribution can be converted to random variables having a uniform distribution...

property: if U is distributed uniformly on (0,1), then Φ−1(U) will have the standard normal distribution. The drawback of this method is that it relies on calculation of the probit function Φ−1, which cannot be done analytically. Some approximate methods are described in {{harvtxt|Hart|1968}} and in the erf
Error function
In mathematics, the error function is a special function of sigmoid shape which occurs in probability, statistics and partial differential equations...

article.
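As a sketch, the probit approach amounts to a few lines of Python. Here `statistics.NormalDist.inv_cdf` from the standard library stands in for whichever approximation of Φ−1 is chosen; the function name `inverse_transform_normal` is illustrative:

```python
import random
from statistics import NormalDist

def inverse_transform_normal(rng=random.random):
    """One standard normal deviate via the probit of a uniform draw.

    NormalDist().inv_cdf plays the role of the probit approximations
    (e.g. Hart 1968) discussed in the text.
    """
    u = rng()
    while u == 0.0:                 # inv_cdf is undefined at 0
        u = rng()
    return NormalDist().inv_cdf(u)  # Phi^{-1}(U) ~ N(0, 1)
```

Any sufficiently accurate probit approximation can be substituted for `inv_cdf` without changing the structure of the method.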

• An easy-to-program approximate approach, which relies on the central limit theorem
Central limit theorem
In probability theory, the central limit theorem states conditions under which the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed. The central limit theorem has a number of variants. In its common...

, is as follows: generate 12 uniform U(0,1) deviates, add them all up, and subtract 6 — the resulting random variable will have approximately a standard normal distribution. In truth, the distribution will be Irwin–Hall, which is a 12-section eleventh-order polynomial approximation to the normal distribution. This random deviate will have a limited range of (−6, 6).
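The recipe above is one line of Python (the function name is illustrative):

```python
import random

def clt_normal(rng=random.random):
    """Approximate N(0, 1) deviate: sum of 12 Uniform(0, 1) draws minus 6.

    This is exactly an Irwin-Hall(12) variate shifted by -6, so its
    range is limited to (-6, 6).
    """
    return sum(rng() for _ in range(12)) - 6.0
```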

• The Box–Muller method uses two independent random numbers U and V distributed uniformly
Uniform distribution (continuous)
In probability theory and statistics, the continuous uniform distribution or rectangular distribution is a family of probability distributions such that for each member of the family, all intervals of the same length on the distribution's support are equally probable. The support is defined by...

on (0,1). Then the two random variables X and Y

will both have the standard normal distribution, and will be independent. This formulation arises because for a bivariate normal random vector (X, Y) the squared norm {{nowrap|X2 + Y2}} will have the chi-squared distribution with two degrees of freedom, which is an easily generated exponential
Exponential distribution
In probability theory and statistics, the exponential distribution is a family of continuous probability distributions. It describes the time between events in a Poisson process, i.e...

random variable corresponding to the quantity −2ln(U) in these equations; and the angle is distributed uniformly around the circle, chosen by the random variable V.
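Since the displayed formulas are not reproduced here, the sketch below assumes the standard basic form of the transform, X = √(−2 ln U) cos(2πV) and Y = √(−2 ln U) sin(2πV); the function name is illustrative:

```python
import math
import random

def box_muller(rng=random.random):
    """Pair of independent standard normal deviates from two uniforms."""
    u = rng()
    while u == 0.0:                    # avoid log(0)
        u = rng()
    v = rng()
    r = math.sqrt(-2.0 * math.log(u))  # radius: sqrt of a chi-squared(2) draw
    theta = 2.0 * math.pi * v          # uniform angle around the circle
    return r * math.cos(theta), r * math.sin(theta)
```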

• Marsaglia polar method
Marsaglia polar method
The polar method is a pseudo-random number sampling method for generating a pair of independent standard normal random variables...

is a modification of the Box–Muller algorithm which does not require computation of the functions sin and cos. In this method U and V are drawn from the uniform (−1,1) distribution, and then S = U2 + V2 is computed. If S is greater than or equal to one, the method starts over; otherwise the two quantities

are returned. Again, X and Y will be independent and standard normally distributed.
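A minimal sketch of the polar method (the function name is illustrative); the returned quantities are U·√(−2 ln S / S) and V·√(−2 ln S / S):

```python
import math
import random

def polar_normal(rng=random.random):
    """Marsaglia polar method: a Box-Muller variant with no sin/cos calls."""
    while True:
        u = 2.0 * rng() - 1.0          # U ~ Uniform(-1, 1)
        v = 2.0 * rng() - 1.0          # V ~ Uniform(-1, 1)
        s = u * u + v * v
        if 0.0 < s < 1.0:              # reject points outside the unit disk
            f = math.sqrt(-2.0 * math.log(s) / s)
            return u * f, v * f        # two independent N(0, 1) deviates
```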

• The Ratio method is a rejection method. The algorithm proceeds as follows:
• Generate two independent uniform deviates U and V;
• Compute X = {{sqrt|8/e}} (V − 0.5)/U;
• If X2 ≤ 5 − 4e{{su|p=1/4}}U then accept X and terminate the algorithm;
• If X2 ≥ 4e{{su|p=−1.35}}/U + 1.4 then reject X and start over from step 1;
• If X2 ≤ −4 / ln U then accept X; otherwise start the algorithm over.
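The steps above can be sketched as follows (the function name is illustrative); the two squeeze tests merely avoid evaluating the logarithm on most iterations:

```python
import math
import random

def ratio_normal(rng=random.random):
    """Ratio-of-uniforms sampler following the steps listed above."""
    while True:
        u = rng()
        if u == 0.0:
            continue
        v = rng()
        x = math.sqrt(8.0 / math.e) * (v - 0.5) / u
        x2 = x * x
        if x2 <= 5.0 - 4.0 * math.exp(0.25) * u:    # quick accept
            return x
        if x2 >= 4.0 * math.exp(-1.35) / u + 1.4:   # quick reject
            continue
        if x2 <= -4.0 * math.log(u):                # exact boundary test
            return x
```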

• The ziggurat algorithm
Ziggurat algorithm
The ziggurat algorithm is an algorithm for pseudo-random number sampling. Belonging to the class of rejection sampling algorithms, it relies on an underlying source of uniformly-distributed random numbers, typically from a pseudo-random number generator, as well as precomputed tables. The...

{{harv|Marsaglia|Tsang|2000}} is faster than the Box–Muller transform and still exact. In about 97% of all cases it uses only two random numbers (one random integer and one random uniform), one multiplication and an if-test. Only in the 3% of cases where the combination of those two falls outside the "core of the ziggurat" must a kind of rejection sampling, using logarithms, exponentials and additional uniform random numbers, be employed.

• There is also some investigation{{Citation needed|date=June 2010}} into the connection between the fast Hadamard transform
Hadamard transform
The Hadamard transform is an example of a generalized class of Fourier transforms...

and the normal distribution, since the transform employs just addition and subtraction and, by the central limit theorem, random numbers from almost any distribution will be transformed into the normal distribution. In this regard a series of Hadamard transforms can be combined with random permutations to turn arbitrary data sets into normally distributed data.

## Numerical approximations for the normal CDF

The standard normal CDF
Cumulative distribution function
In probability theory and statistics, the cumulative distribution function , or just distribution function, describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x. Intuitively, it is the "area so far"...

is widely used in scientific and statistical computing. The values Φ(x) may be approximated very accurately by a variety of methods, such as numerical integration
Numerical integration
In numerical analysis, numerical integration constitutes a broad family of algorithms for calculating the numerical value of a definite integral, and by extension, the term is also sometimes used to describe the numerical solution of differential equations. This article focuses on calculation of...

, Taylor series
Taylor series
In mathematics, a Taylor series is a representation of a function as an infinite sum of terms that are calculated from the values of the function's derivatives at a single point....

, asymptotic series and continued fractions. Different approximations are used depending on the desired level of accuracy.
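As a sketch of the numerical-integration route, Simpson's rule applied to the density recovers Φ(x) to high accuracy; the function name `phi_cdf` and the subinterval count are illustrative choices:

```python
import math

def phi_cdf(x, n=1000):
    """Standard normal CDF via Simpson's rule on the density over [0, x].

    Uses Phi(x) = 1/2 + integral of the pdf from 0 to x, and symmetry
    for negative arguments; n is the (even) number of subintervals.
    """
    if x < 0.0:
        return 1.0 - phi_cdf(-x, n)

    def pdf(t):
        return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

    h = x / n
    s = pdf(0.0) + pdf(x)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * pdf(i * h)   # Simpson weights
    return 0.5 + s * h / 3.0
```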

• {{harvtxt|Abramowitz|Stegun|1964}} give an approximation for Φ(x) for x > 0 with absolute error |ε(x)| < 7.5·10−8 (algorithm 26.2.17):

where ϕ(x) is the standard normal PDF, and b0 = 0.2316419, b1 = 0.319381530, b2 = −0.356563782, b3 = 1.781477937, b4 = −1.821255978, b5 = 1.330274429.
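The approximation can be implemented directly from the listed coefficients, with t = 1/(1 + b0x) and Φ(x) ≈ 1 − ϕ(x)(b1t + … + b5t5) for x ≥ 0; the function name `phi_as` is illustrative, and Φ(−x) = 1 − Φ(x) covers negative arguments:

```python
import math

# Coefficients from Abramowitz & Stegun, algorithm 26.2.17
B0 = 0.2316419
B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def phi_as(x):
    """Standard normal CDF for x >= 0, absolute error below 7.5e-8."""
    t = 1.0 / (1.0 + B0 * x)
    poly = sum(b * t ** (i + 1) for i, b in enumerate(B))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return 1.0 - pdf * poly
```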

• {{harvtxt|Hart|1968}} lists almost a hundred rational function
Rational function
In mathematics, a rational function is any function which can be written as the ratio of two polynomial functions. Neither the coefficients of the polynomials nor the values taken by the function are necessarily rational.-Definitions:...

approximations for the erfc function. His algorithms vary in degree of complexity and resulting precision, with a maximum absolute precision of 24 digits. An algorithm by {{harvtxt|West|2009}} combines Hart's algorithm 5666 with a continued fraction
Continued fraction
In mathematics, a continued fraction is an expression obtained through an iterative process of representing a number as the sum of its integer part and the reciprocal of another number, then writing this other number as the sum of its integer part and another reciprocal, and so on...

approximation in the tail to provide a fast computation algorithm with a 16-digit precision.

• {{harvtxt|W. J. Cody|1969}}, after recalling that the Hart (1968) solution is not suited for erf, gives a solution for both erf and erfc, with a maximal relative error bound, via rational Chebyshev approximation
Rational function
In mathematics, a rational function is any function which can be written as the ratio of two polynomial functions. Neither the coefficients of the polynomials nor the values taken by the function are necessarily rational.-Definitions:...

. (Cody, W. J. (1969). "Rational Chebyshev Approximations for the Error Function".)

• {{harvtxt|Marsaglia|2004}} suggested a simple algorithm (given, for example, in the article on the Bc programming language) based on the Taylor series expansion

for calculating Φ(x) with arbitrary precision. The drawback of this algorithm is its comparatively slow calculation time (for example, it takes over 300 iterations to calculate the function to 16 digits of precision when {{nowrap|1=x = 10}}).
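A minimal sketch of such a Taylor-series evaluation, assuming the standard expansion Φ(x) = 1/2 + ϕ(x)·(x + x3/3 + x5/(3·5) + x7/(3·5·7) + …); the function name is illustrative:

```python
import math

def phi_taylor(x):
    """Standard normal CDF via the Taylor series
    Phi(x) = 1/2 + phi(x) * sum of x^(2k+1) / (1*3*5*...*(2k+1))."""
    s, term, i = x, x, 1.0
    while True:
        i += 2.0
        term *= x * x / i          # next term of the double-factorial series
        if s + term == s:          # converged at machine precision
            return 0.5 + s * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        s += term
```

Note that all terms of the series are positive for x > 0, so convergence is slow for large x, matching the iteration counts quoted above.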

• The GNU Scientific Library
GNU Scientific Library
In computing, the GNU Scientific Library is a software library written in the C programming language for numerical calculations in applied mathematics and science...

calculates values of the standard normal CDF using Hart's algorithms and approximations with Chebyshev polynomials.

### Development

Some authors attribute the credit for the discovery of the normal distribution to de Moivre
Abraham de Moivre
Abraham de Moivre was a French mathematician famous for de Moivre's formula, which links complex numbers and trigonometry, and for his work on the normal distribution and probability theory. He was a friend of Isaac Newton, Edmund Halley, and James Stirling...

, who in 1738

De Moivre first published his findings in 1733, in a pamphlet "Approximatio ad Summam Terminorum Binomii {{nowrap|(a + b)n}} in Seriem Expansi" that was designated for private circulation only; it was not until 1738 that he made his results publicly available. The original pamphlet was reprinted several times; see for example {{harvtxt|Walker|1985}}.

published in the second edition of his "The Doctrine of Chances
The Doctrine of Chances
The Doctrine of Chances was the first textbook on probability theory, written by 18th-century French mathematician Abraham de Moivre and first published in 1718. De Moivre wrote in English because he resided in England at the time, having fled France to escape the persecution of Huguenots...

" the study of the coefficients in the binomial expansion of {{nowrap|(a + b)n}}. De Moivre proved that the middle term in this expansion has the approximate magnitude of $\scriptstyle 2/\sqrt{2\pi n}$, and that "If m or ½n be a Quantity infinitely great, then the Logarithm of the Ratio, which a Term distant from the middle by the Interval ℓ, has to the middle Term, is $\scriptstyle -\frac{2\ell\ell}{n}$." Although this theorem can be interpreted as the first obscure expression for the normal probability law, Stigler
Stephen Stigler
Stephen Mack Stigler is Ernest DeWitt Burton Distinguished Service Professor at the Department of Statistics of the University of Chicago. His research has focused on statistical theory of robust estimators and the history of statistics...

points out that de Moivre himself did not interpret his results as anything more than the approximate rule for the binomial coefficients, and in particular de Moivre lacked the concept of the probability density function.

In 1809 Gauss
Carl Friedrich Gauss
Johann Carl Friedrich Gauss was a German mathematician and scientist who contributed significantly to many fields, including number theory, statistics, analysis, differential geometry, geodesy, geophysics, electrostatics, astronomy and optics.Sometimes referred to as the Princeps mathematicorum...

published his monograph "Theoria motus corporum coelestium in sectionibus conicis solem ambientium", where among other things he introduced several important statistical concepts, such as the method of least squares, the method of maximum likelihood, and the normal distribution. Gauss used M, {{nobr|M′}}, {{nobr|M′′, …}} to denote the measurements of some unknown quantity V, and sought the "most probable" estimator: the one which maximizes the probability {{nobr|φ(M−V) · φ(M′−V) · φ(M′′−V) · …}} of obtaining the observed experimental results. In his notation φΔ is the probability law of the measurement errors of magnitude Δ. Not knowing what the function φ is, Gauss required that his method should reduce to the well-known answer: the arithmetic mean of the measured values. ("It has been customary certainly to regard as an axiom the hypothesis that if any quantity has been determined by several direct observations, made under the same circumstances and with equal care, the arithmetical mean of the observed values affords the most probable value, if not rigorously, yet very nearly at least, so that it is always most safe to adhere to it." — {{harvtxt|Gauss|1809|loc=section 177}}) Starting from these principles, Gauss demonstrated that the only law which rationalizes the choice of arithmetic mean as an estimator of the location parameter is the normal law of errors:


\varphi\mathit{\Delta} = \frac{h}{\surd\pi}\, e^{-\mathrm{hh}\Delta\Delta},

where h is "the measure of the precision of the observations". Using this normal law as a generic model for errors in the experiments, Gauss formulates what is now known as the non-linear weighted least squares (NWLS) method.

Although Gauss was the first to suggest the normal distribution law, Laplace made significant contributions. ("My custom of terming the curve the Gauss–Laplacian or normal curve saves us from proportioning the merit of discovery between the two great astronomer mathematicians." — quote from {{harvtxt|Pearson|1905|loc=p. 189}}) It was Laplace who first posed the problem of aggregating several observations in 1774, although his own solution led to the Laplacian distribution. It was Laplace who first calculated the value of the integral {{nowrap|∫ e{{su|p=−t2}} dt {{=}} {{sqrt|π}}}}
Gaussian integral
The Gaussian integral, also known as the Euler-Poisson integral or Poisson integral, is the integral of the Gaussian function e−x2 over the entire real line.It is named after the German mathematician and...

in 1782, providing the normalization constant for the normal distribution. Finally, it was Laplace who in 1810 proved and presented to the Academy the fundamental central limit theorem, which emphasized the theoretical importance of the normal distribution.

It is of interest to note that in 1809 the American mathematician Adrain
Robert Adrain
Robert Adrain was a scientist and mathematician, considered one of the most brilliant mathematical minds of the time in America....

published two derivations of the normal probability law, simultaneously and independently of Gauss. His works remained largely unnoticed by the scientific community until 1871, when they were "rediscovered" by Abbe
Cleveland Abbe
Cleveland Abbe was an American meteorologist and advocate of time zones. While director of the Cincinnati Observatory in Cincinnati, Ohio, he developed a system of telegraphic weather reports, daily weather maps, and weather forecasts. Congress in 1870 established the U.S. Weather Bureau and...

.

In the middle of the 19th century Maxwell
James Clerk Maxwell
James Clerk Maxwell of Glenlair was a Scottish physicist and mathematician. His most prominent achievement was formulating classical electromagnetic theory. This united all previously unrelated observations, experiments and equations of electricity, magnetism and optics into a consistent theory...

demonstrated that the normal distribution is not just a convenient mathematical tool, but may also occur in natural phenomena: "The number of particles whose velocity, resolved in a certain direction, lies between x and x + dx is

### Naming

Since its introduction, the normal distribution has been known by many different names: the law of error, the law of facility of errors, Laplace's second law, Gaussian law, etc. By the end of the 19th century some authors (besides those specifically referenced here, such use is encountered in the works of Peirce, Galton
Francis Galton
Sir Francis Galton /ˈfrɑːnsɪs ˈgɔːltn̩/ FRS , cousin of Douglas Strutt Galton, half-cousin of Charles Darwin, was an English Victorian polymath: anthropologist, eugenicist, tropical explorer, geographer, inventor, meteorologist, proto-geneticist, psychometrician, and statistician...

{{full}} and Lexis
Wilhelm Lexis
Wilhelm Lexis was an eminent German statistician, economist, and social scientist and a founder of the interdisciplinary study of insurance....

{{full}}, approximately around 1875{{Citation needed|date=June 2011}})
had started using the name normal distribution, where the word "normal" was used as an adjective — the term derived from the fact that this distribution was seen as typical, common, normal. Peirce (one of those authors) once defined "normal" thus: "...the 'normal' is not the average (or any other kind of mean) of what actually occurs, but of what would, in the long run, occur under certain circumstances." Around the turn of the 20th century Pearson
Karl Pearson
Karl Pearson FRS was an influential English mathematician who has been credited for establishing the disciplineof mathematical statistics....

popularized the term normal as a designation for this distribution.
{{cquote|Many years ago I called the Laplace–Gaussian curve the normal curve, which name, while it avoids an international question of priority, has the disadvantage of leading people to believe that all other distributions of frequency are in one sense or another 'abnormal'. — {{harvtxt|Pearson|1920}}}}

Also, it was Pearson who first wrote the distribution in terms of the standard deviation σ, as in modern notation. Soon after this, in 1915, Fisher
Ronald Fisher
Sir Ronald Aylmer Fisher FRS was an English statistician, evolutionary biologist, eugenicist and geneticist. Among other things, Fisher is well known for his contributions to statistics by creating Fisher's exact test and Fisher's equation...

added the location parameter to the formula for normal distribution, expressing it in the way it is written nowadays:

The term "standard normal", which denotes the normal distribution with zero mean and unit variance, came into general use around the 1950s, appearing in the popular textbooks by P. G. Hoel (1947) "Introduction to Mathematical Statistics" and A. M. Mood (1950) "Introduction to the Theory of Statistics".

The "Gaussian distribution" is named after Carl Friedrich Gauss
Carl Friedrich Gauss
Johann Carl Friedrich Gauss was a German mathematician and scientist who contributed significantly to many fields, including number theory, statistics, analysis, differential geometry, geodesy, geophysics, electrostatics, astronomy and optics.Sometimes referred to as the Princeps mathematicorum...

, who introduced the distribution in 1809 as a way of rationalizing the method of least squares, as outlined above. The related work of Laplace, also outlined above, has led to the normal distribution sometimes being called Laplacian,{{Citation needed|date=October 2010}} especially in French-speaking countries. Among English speakers, both "normal distribution" and "Gaussian distribution" are in common use, with different terms preferred by different communities.

{{Portal|Statistics}}
• Behrens–Fisher problem—the long-standing problem of testing whether two normal samples with different variances have the same means;
• Erdős–Kac theorem
Erdos–Kac theorem
In number theory, the Erdős–Kac theorem, named after Paul Erdős and Mark Kac, and also known as the fundamental theorem of probabilistic number theory, states that if ω is the number of distinct prime factors of n, then, loosely speaking, the probability distribution ofis the standard normal...

—on the occurrence of the normal distribution in number theory
Number theory
Number theory is a branch of pure mathematics devoted primarily to the study of the integers. Number theorists study prime numbers as well...

• Gaussian blur
Gaussian blur
A Gaussian blur is the result of blurring an image by a Gaussian function. It is a widely used effect in graphics software, typically to reduce image noise and reduce detail...

—convolution
Convolution
In mathematics and, in particular, functional analysis, convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions. Convolution is similar to cross-correlation...

which uses the normal distribution as a kernel
• Sum of normally distributed random variables
Sum of normally distributed random variables
In probability theory, calculation of the sum of normally distributed random variables is an instance of the arithmetic of random variables, which can be quite complex based on the probability distributions of the random variables involved and their relationships.-Independent random variables:If X...