Normal distribution



 
 
The normal distribution, also called the Gaussian distribution, is an important family of continuous probability distributions, applicable in many fields. Each member of the family may be defined by two parameters, location and scale: the mean ("average", µ) and the variance (standard deviation squared, σ²), respectively. The standard normal distribution is the normal distribution with a mean of zero and a variance of one (the red curves in the plots to the right). Carl Friedrich Gauss became associated with this set of distributions when he analyzed astronomical data using them and defined the equation of its probability density function. It is often called the bell curve because the graph of its probability density resembles a bell.

The importance of the normal distribution as a model of quantitative phenomena in the natural and behavioral sciences is due in part to the central limit theorem. Many measurements, ranging from psychological to physical phenomena (in particular, thermal noise), can be approximated, to varying degrees, by the normal distribution. While the mechanisms underlying these phenomena are often unknown, the use of the normal model can be theoretically justified by assuming that many small, independent effects contribute additively to each observation. The normal distribution is also important for its relationship to least-squares estimation, one of the simplest and oldest methods of statistical estimation.

The normal distribution also arises in many areas of statistics. For example, the sampling distribution of the sample mean is approximately normal for sufficiently large samples, even if the distribution of the population from which the sample is taken is not normal. In addition, the normal distribution maximizes information entropy among all distributions with known mean and variance, which makes it the natural choice of underlying distribution for data summarized in terms of sample mean and variance. The normal distribution is the most widely used family of distributions in statistics, and many statistical tests are based on the assumption of normality. In probability theory, normal distributions arise as the limiting distributions of several continuous and discrete families of distributions.
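The statement about the sampling distribution of the sample mean can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be Uniform(0, 1), which is clearly not normal, and the sample size, number of trials and seed are arbitrary choices that do not come from the article.

```python
import random
import statistics

def sample_means(n, trials, seed=0):
    """Return `trials` sample means, each computed from n Uniform(0, 1) draws."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    return [statistics.fmean([rng.random() for _ in range(n)])
            for _ in range(trials)]

n, trials = 30, 5000  # arbitrary illustrative choices
means = sample_means(n, trials)

# Uniform(0, 1) has mean 1/2 and variance 1/12, so the sample mean has
# mean 1/2 and standard deviation sqrt(1 / (12 n)).
expected_sd = (1 / (12 * n)) ** 0.5
centre = statistics.fmean(means)
spread = statistics.stdev(means)

# A normal distribution puts about 68.3% of its mass within one standard
# deviation of the mean; the simulated sample means should be close to that.
within_one_sd = sum(abs(m - 0.5) <= expected_sd for m in means) / trials

print(centre, spread, expected_sd, within_one_sd)
```

The fraction within one standard deviation is only a rough check of normality; a histogram or a normal probability plot of `means` would show the bell shape more directly.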

History


The normal distribution was first introduced by Abraham de Moivre in an article in 1733, which was reprinted in the second edition of his The Doctrine of Chances (1738), in the context of approximating certain binomial distributions for large n. His result was extended by Laplace in his book Analytical Theory of Probabilities (1812), and is now called the theorem of de Moivre-Laplace.

Laplace used the normal distribution in the analysis of errors of experiments. The important method of least squares was introduced by Legendre in 1805. Gauss, who claimed to have used the method since 1794, justified it rigorously in 1809 by assuming a normal distribution of the errors. The fact that the distribution is sometimes called Gaussian is an example of Stigler's Law.

The name "bell curve" goes back to Esprit Jouffret, who first used the term "bell surface" in 1872 for a bivariate normal with independent components. The name "normal distribution" was coined independently by Charles S. Peirce, Francis Galton and Wilhelm Lexis around 1875. Despite this terminology, other probability distributions may be more appropriate in some contexts; see the discussion of occurrence, below.

Characterization

There are various ways to characterize a probability distribution. The most visual is the probability density function (PDF). Equivalent ways are the cumulative distribution function, the moments, the cumulants, the characteristic function, the moment-generating function, the cumulant-generating function, and Maxwell's theorem. See probability distribution for a discussion.

To indicate that a real-valued random variable X is normally distributed with mean µ and variance σ² ≥ 0, we write

$$X \sim N(\mu, \sigma^2).$$

While it is certainly useful for certain limit theorems (e.g. asymptotic normality of estimators) and for the theory of Gaussian processes to consider the probability distribution concentrated at µ (see Dirac measure) as a distribution with mean µ and variance σ² = 0, this degenerate case is often excluded from consideration because no density with respect to the Lebesgue measure exists.

The normal distribution may also be parameterized using a precision parameter τ, defined as the reciprocal of σ². This parameterization has an advantage in numerical applications where σ² is very close to zero, and it is more convenient to work with in analysis because τ is a natural parameter of the normal distribution.

Probability density function

The continuous probability density function of the normal distribution is the Gaussian function

$$\varphi_{\mu,\sigma^2}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \frac{1}{\sigma}\,\varphi\left(\frac{x-\mu}{\sigma}\right), \quad x \in \mathbb{R},$$

where σ > 0 is the standard deviation, the real parameter µ is the expected value, and

$$\varphi(x) = \varphi_{0,1}(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right), \quad x \in \mathbb{R},$$

is the density function of the "standard" normal distribution, i.e., the normal distribution with µ = 0 and σ = 1. The integral of the density over the real line is equal to one, as shown in the Gaussian integral article.
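The density can be evaluated directly from its definition. The following is a minimal sketch rather than a reference implementation; the function names `normal_pdf` and `total_mass`, the integration half-width of 10 standard deviations and the step count are assumptions made here for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # standardized value
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def total_mass(mu, sigma, half_width=10.0, steps=20000):
    """Trapezoidal estimate of the integral of the density over mu +/- half_width * sigma."""
    a = mu - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    inner = sum(normal_pdf(a + k * h, mu, sigma) for k in range(1, steps))
    ends = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(a + steps * h, mu, sigma))
    return h * (inner + ends)

print(normal_pdf(0.0))       # peak of the standard normal density, 1 / sqrt(2 * pi)
print(total_mass(1.0, 2.0))  # numerically very close to 1
```

The first function also makes the scaling relation visible: `normal_pdf(x, mu, sigma)` equals `normal_pdf((x - mu) / sigma) / sigma`.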

As a Gaussian function with the denominator of the exponent equal to 2, the standard normal density function is an eigenfunction of the Fourier transform.

The probability density function has notable properties including:
  • symmetry about its mean µ
  • the mode and median both equal the mean µ
  • the inflection points of the curve occur one standard deviation away from the mean, i.e. at µ − σ and µ + σ.
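The statements about the mode and the inflection points can be checked by differentiating the density twice. In the short derivation below, f denotes the normal density with mean µ and standard deviation σ; the symbol f is introduced here only for brevity.

```latex
\begin{align*}
f(x)   &= \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \\
f'(x)  &= -\frac{x-\mu}{\sigma^2}\, f(x), \\
f''(x) &= \left(\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right) f(x).
\end{align*}
```

Since f(x) > 0 everywhere, f' vanishes only at x = µ (the mode), and f'' vanishes and changes sign exactly where (x − µ)² = σ², that is, at x = µ − σ and x = µ + σ.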


Cumulative distribution function

The cumulative distribution function (cdf) of a probability distribution, evaluated at a number (lower-case) x, is the probability of the event that a random variable (capital) X with that distribution is less than or equal to x. The cumulative distribution function of the normal distribution is expressed in terms of the density function as follows:

$$\Phi_{\mu,\sigma^2}(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{(u-\mu)^2}{2\sigma^2}\right) du = \Phi\left(\frac{x-\mu}{\sigma}\right), \quad x \in \mathbb{R}.$$

The standard normal cdf is just the general cdf evaluated with µ = 0 and σ = 1:

$$\Phi(x) = \Phi_{0,1}(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{u^2}{2}\right) du, \quad x \in \mathbb{R}.$$

The standard normal cdf can be expressed in terms of a special function called the error function, as

$$\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right], \quad x \in \mathbb{R},$$

and the cdf itself can hence be expressed as

$$\Phi_{\mu,\sigma^2}(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right], \quad x \in \mathbb{R}.$$

The complement of the standard normal cdf, 1 − Φ(x), is often denoted Q(x), and is sometimes referred to simply as the Q-function, especially in engineering texts. This represents the tail probability of the Gaussian distribution. Other definitions of the Q-function, all of which are simple transformations of Φ, are also used occasionally.
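The error-function form translates directly into code. The sketch below uses `math.erf` and `math.erfc` from the Python standard library; the function names are chosen here for illustration, and computing the Q-function through `erfc` rather than as 1 − Φ(x) is a design choice made in this sketch to avoid cancellation for large x.

```python
import math

SQRT2 = math.sqrt(2.0)

def standard_normal_cdf(x):
    """Standard normal cdf written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / SQRT2))

def normal_cdf(x, mu, sigma):
    """Cdf of the normal distribution with mean mu and standard deviation sigma > 0."""
    return standard_normal_cdf((x - mu) / sigma)

def q_function(x):
    """Upper tail 1 - Phi(x), computed with erfc so that tiny tails are not rounded away."""
    return 0.5 * math.erfc(x / SQRT2)

print(standard_normal_cdf(1.96))   # roughly 0.975
print(normal_cdf(3.0, 1.0, 2.0))   # same as standard_normal_cdf(1.0)
print(q_function(10.0))            # tiny but nonzero
```

For moderate x the two routes agree to rounding error; for large x, subtracting `standard_normal_cdf(x)` from 1 in floating point loses most or all significant digits, while `q_function(x)` does not.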

The inverse standard normal cumulative distribution function, or quantile function, can be expressed in terms of the inverse error function:

$$\Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1), \quad p \in (0, 1),$$

and the inverse cumulative distribution function can hence be expressed as

$$\Phi^{-1}_{\mu,\sigma^2}(p) = \mu + \sigma\,\Phi^{-1}(p) = \mu + \sigma\sqrt{2}\,\operatorname{erf}^{-1}(2p - 1), \quad p \in (0, 1).$$

This quantile function is sometimes called the probit function. There is no elementary primitive for the probit function. This is not to say merely that none is known, but rather that the non-existence of such an elementary primitive has been proven. Several accurate methods exist for approximating the quantile function for the normal distribution; see quantile function for a discussion and references.

The values Φ(x) may be approximated very accurately by a variety of methods, such as numerical integration, Taylor series, asymptotic series and continued fractions.
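As one concrete instance of these methods, the Taylor series of the standard normal cdf about zero can be summed term by term. This is a minimal sketch: the series is obtained by integrating the power series of exp(−u²/2), and the function name `cdf_taylor` and the default of 60 terms are choices made here, not part of the article.

```python
import math

def cdf_taylor(x, terms=60):
    """Approximate the standard normal cdf by its Taylor series about 0:

    Phi(x) = 1/2 + (1 / sqrt(2*pi)) * sum_{n>=0} (-1)^n x^(2n+1) / (n! * 2^n * (2n+1))
    """
    t = x        # t_n = (-1)^n x^(2n+1) / (n! * 2^n), starting from t_0 = x
    total = 0.0
    for n in range(terms):
        total += t / (2 * n + 1)
        t *= -x * x / (2 * (n + 1))  # recurrence from t_n to t_(n+1)
    return 0.5 + total / math.sqrt(2.0 * math.pi)

for x in (-2.0, 0.0, 1.0, 3.0):
    exact = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    print(x, cdf_taylor(x), exact)
```

Because the series alternates, it is best suited to moderate |x|; for large |x| the terms grow before they shrink and cancellation degrades the result, which is where asymptotic series or continued fractions become preferable.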

Strict lower and upper bounds for the cdf

For large x the standard normal cdf Φ(x) is close to 1 and Φ(−x) = 1 − Φ(x) is close to 0. The elementary bounds

$$\frac{x}{1+x^2}\,\varphi(x) < 1 - \Phi(x) < \frac{\varphi(x)}{x}, \quad x > 0,$$

in terms of the density φ are useful.

Using the substitution v = u²/2, the upper bound is derived as follows:

$$1 - \Phi(x) = \int_x^\infty \varphi(u)\,du < \int_x^\infty \frac{u}{x}\,\varphi(u)\,du = \int_{x^2/2}^\infty \frac{e^{-v}}{x\sqrt{2\pi}}\,dv = -\frac{e^{-v}}{x\sqrt{2\pi}}\Bigg|_{x^2/2}^{\infty} = \frac{\varphi(x)}{x}.$$

Similarly, using φ′(u) = −u φ(u) and the quotient rule,

$$\left(1 + \frac{1}{x^2}\right)\bigl(1 - \Phi(x)\bigr) = \int_x^\infty \left(1 + \frac{1}{x^2}\right)\varphi(u)\,du > \int_x^\infty \left(1 + \frac{1}{u^2}\right)\varphi(u)\,du = -\frac{\varphi(u)}{u}\Bigg|_x^\infty = \frac{\varphi(x)}{x}.$$

Solving for 1 − Φ(x) provides the lower bound.
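The bounds are easy to check numerically. The sketch below compares the upper tail, computed with `math.erfc`, against the lower bound x φ(x) / (1 + x²) and the upper bound φ(x) / x; the function names and the test points are assumptions chosen here for illustration.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def upper_tail(x):
    """1 - Phi(x), computed with erfc to stay accurate for large x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def tail_bounds(x):
    """Return (lower, upper) bounds on 1 - Phi(x); valid only for x > 0."""
    if x <= 0:
        raise ValueError("the bounds hold only for x > 0")
    d = phi(x)
    return x / (1.0 + x * x) * d, d / x

for x in (0.5, 1.0, 2.0, 5.0, 10.0):
    lower, upper = tail_bounds(x)
    print(x, lower, upper_tail(x), upper)
```

The ratio of the upper to the lower bound is (1 + x²) / x², so the two bounds pin down the tail probability ever more tightly as x grows.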

Generating functions


Moment generating function

The moment generating function is defined as the expected value of exp(tX). For a normal distribution, the moment generating function is

$$M_X(t) = \mathrm{E}\left[\exp(tX)\right] = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \exp(tx)\,dx = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right),$$

as can be seen by completing the square in the exponent.
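The completing-the-square step can be spelled out as follows. This is a sketch of the standard argument, with X normally distributed with mean µ and variance σ²:

```latex
\begin{align*}
tx - \frac{(x-\mu)^2}{2\sigma^2}
  &= -\frac{\bigl(x - (\mu + \sigma^2 t)\bigr)^2}{2\sigma^2} + \mu t + \frac{\sigma^2 t^2}{2}, \\
\mathrm{E}\bigl[e^{tX}\bigr]
  &= e^{\mu t + \sigma^2 t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}
     \exp\left(-\frac{\bigl(x - (\mu + \sigma^2 t)\bigr)^2}{2\sigma^2}\right) dx
   = e^{\mu t + \sigma^2 t^2/2}.
\end{align*}
```

The remaining integrand is the density of a normal distribution with mean µ + σ²t and variance σ², so the integral equals one.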

Cumulant generating function
The cumulant generating function is the logarithm of the moment generating function: g(t) = µt + σ²t²/2. Since this is a quadratic polynomial in t, only the first two cumulants (the mean µ and the variance σ²) are nonzero.

Characteristic function

The characteristic function is defined as the expected value of exp(itX), where i is the imaginary unit. So the characteristic function is obtained by replacing t with it in the moment-generating function.

For a normal distribution, the characteristic function is

$$\chi_X(t) = \mathrm{E}\left[\exp(itX)\right] = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \exp(itx)\,dx = \exp\left(i\mu t - \frac{\sigma^2 t^2}{2}\right).$$
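The closed form exp(iµt − σ²t²/2) for the characteristic function of a normal distribution can be compared with a direct numerical evaluation of the expected value of exp(itX). The sketch below is an illustration only; the function names, the trapezoidal rule, the integration range of 10 standard deviations and the sample parameters are all assumptions made here.

```python
import cmath
import math

def char_fn_closed_form(t, mu, sigma):
    """Closed form exp(i*mu*t - sigma^2 * t^2 / 2)."""
    return cmath.exp(1j * mu * t - 0.5 * sigma ** 2 * t ** 2)

def char_fn_numeric(t, mu, sigma, half_width=10.0, steps=20000):
    """Trapezoidal estimate of E[exp(i*t*X)] for X normal with mean mu, std dev sigma."""
    a = mu - half_width * sigma
    h = 2.0 * half_width * sigma / steps

    def integrand(x):
        z = (x - mu) / sigma
        density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        return cmath.exp(1j * t * x) * density

    inner = sum(integrand(a + k * h) for k in range(1, steps))
    return h * (inner + 0.5 * (integrand(a) + integrand(a + steps * h)))

t, mu, sigma = 0.7, 1.0, 2.0  # arbitrary illustrative values
print(char_fn_closed_form(t, mu, sigma))
print(char_fn_numeric(t, mu, sigma))
```

The two printed complex numbers should agree to many digits; at t = 0 both reduce to 1, as any characteristic function must.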



Properties


Some properties of the normal distribution:

  1. If $X \sim N(\mu, \sigma^2)$ and a and b are real numbers, then $aX + b \sim N(a\mu + b, (a\sigma)^2)$ (see expected value and variance).
  2. If $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ are independent normal random variables, then:
    • Their sum is normally distributed with $U = X + Y \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$ (proof). Interestingly, the converse holds: if two independent random variables have a normally distributed sum, then they must be normal themselves; this is known as Cramér's theorem.
    • Their difference is normally distributed with $V = X - Y \sim N(\mu_X - \mu_Y, \sigma_X^2 + \sigma_Y^2)$.
    • If the variances of X and Y are equal, then U and V are independent of each other.
    • The Kullback-Leibler divergence is $D_{\mathrm{KL}}(X \,\|\, Y) = \frac{1}{2}\left(\log\frac{\sigma_Y^2}{\sigma_X^2} + \frac{\sigma_X^2}{\sigma_Y^2} + \frac{(\mu_Y - \mu_X)^2}{\sigma_Y^2} - 1\right)$.
  3. If X ~ N(0, σ_X²) and Y ~ N(0, σ_Y²) are independent normal random variables, then:
    • Their product XY follows a distribution with density p given by p(z) = K_0(|z|/(σ_X σ_Y)) / (π σ_X σ_Y), where K_0 is a modified Bessel function of the second kind.
    • Their ratio follows a Cauchy distribution with X/Y ~ Cauchy(0, σ_X/σ_Y). Thus the Cauchy distribution is a special kind of ratio distribution.
  4. If X1, ..., Xn are independent standard normal variables, then X1² + ... + Xn² has a chi-square distribution with n degrees of freedom.
  5. If X1, ..., Xn are independent standard normal variables, then the sample mean X̄ = (X1 + ... + Xn)/n and the sample variance S² = [(X1 − X̄)² + ... + (Xn − X̄)²]/(n − 1) are independent. This property characterizes normal distributions (and helps to explain why the F-test is non-robust with respect to non-normality).
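The affine and sum/difference properties can be illustrated with Python's standard-library `statistics.NormalDist`, which supports scaling, shifting, and adding independent normal variables. This is a sketch only: the parameter values are arbitrary, and `kl_normal` is a hypothetical helper implementing the standard closed-form Kullback-Leibler divergence between two univariate normals.

```python
import math
from statistics import NormalDist

X = NormalDist(mu=1.0, sigma=2.0)
Y = NormalDist(mu=3.0, sigma=1.5)

# Property 1: aX + b ~ N(a*mu + b, (a*sigma)^2)
a, b = 3.0, -4.0
affine = X * a + b

# Property 2: sum and difference of independent normals;
# the means add or subtract, the variances add in both cases.
total = X + Y
diff = X - Y

def kl_normal(p, q):
    """Closed-form D_KL(p || q) for two univariate normal distributions."""
    return (math.log(q.stdev / p.stdev)
            + (p.variance + (p.mean - q.mean) ** 2) / (2.0 * q.variance)
            - 0.5)
```

Here `total` and `diff` share the same standard deviation, since the variances add in both cases.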


Standardizing normal random variables


As a consequence of Property 1, it is possible to relate all normal random variables to the standard normal.

If X ~ N(µ, σ²), then

  Z = (X − µ)/σ

is a standard normal random variable: Z ~ N(0, 1). An important consequence is that the cdf of a general normal distribution is therefore

  Pr(X ≤ x) = Φ((x − µ)/σ) = (1/2) [1 + erf((x − µ)/(σ√2))].

Conversely, if Z is a standard normal random variable, Z ~ N(0, 1), then

  X = σZ + µ

is a normal random variable with mean µ and variance σ².

The standard normal distribution has been tabulated (usually in the form of values of the cumulative distribution function Φ), and the other normal distributions are simple transformations of the standard one, as described above. Therefore, one can use tabulated values of the cdf of the standard normal distribution to find values of the cdf of a general normal distribution.
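The lookup described above amounts to standardizing x and evaluating the standard normal cdf. A minimal sketch, using `math.erf` in place of a printed table (the function names are illustrative assumptions):

```python
import math

def std_normal_cdf(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """cdf of N(mu, sigma^2): standardize, then use the standard normal cdf."""
    return std_normal_cdf((x - mu) / sigma)
```

For example, `normal_cdf(12.0, 10.0, 2.0)` evaluates the standard normal cdf at z = 1.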

Moments

The first few moments of the normal distribution are:
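Number   Raw moment                                   Central moment   Cumulant
0        1                                            1
1        µ                                            0                µ
2        µ² + σ²                                      σ²               σ²
3        µ³ + 3µσ²                                    0                0
4        µ⁴ + 6µ²σ² + 3σ⁴                             3σ⁴              0
5        µ⁵ + 10µ³σ² + 15µσ⁴                          0                0
6        µ⁶ + 15µ⁴σ² + 45µ²σ⁴ + 15σ⁶                  15σ⁶             0
7        µ⁷ + 21µ⁵σ² + 105µ³σ⁴ + 105µσ⁶               0                0
8        µ⁸ + 28µ⁶σ² + 210µ⁴σ⁴ + 420µ²σ⁶ + 105σ⁸      105σ⁸            0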


All cumulants of the normal distribution beyond the second are zero.

Higher central moments (of order 2k with µ = 0) are given by the formula

  E[X^(2k)] = (2k)! σ^(2k) / (2^k k!).
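The even central moments, (2k)! σ^(2k) / (2^k k!), and the raw moments built from them by binomial expansion about the mean can be computed directly. This is a sketch with illustrative function names; odd central moments are returned as zero by symmetry.

```python
import math

def central_moment(order, sigma):
    """Central moment E[(X - mu)^order] of N(mu, sigma^2)."""
    if order % 2 == 1:
        return 0.0  # odd central moments vanish by symmetry
    k = order // 2
    return math.factorial(2 * k) / (2 ** k * math.factorial(k)) * sigma ** (2 * k)

def raw_moment(order, mu, sigma):
    """Raw moment E[X^order], expanding X = mu + (X - mu) binomially."""
    return sum(
        math.comb(order, j) * mu ** (order - j) * central_moment(j, sigma)
        for j in range(order + 1)
    )
```

With σ = 1 the even central moments of orders 4, 6 and 8 come out as 3, 15 and 105.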



The central limit theorem



Under certain conditions (such as being independent and identically-distributed with finite variance), the sum of a large number of random variables is approximately normally distributed; this is the central limit theorem.

The practical importance of the central limit theorem is that the normal cumulative distribution function can be used as an approximation to some other cumulative distribution functions, for example:

  • A binomial distribution with parameters n and p is approximately normal for large n and p not too close to 1 or 0 (some books recommend using this approximation only if np and n(1 − p) are both at least 5; in this case, a continuity correction should be applied). The approximating normal distribution has parameters µ = np, σ² = np(1 − p).


  • A Poisson distribution with parameter λ is approximately normal for large λ. The approximating normal distribution has parameters µ = σ² = λ.


Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution. A general upper bound of the approximation error of the cumulative distribution function is given by the Berry–Esséen theorem.
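To see how well the normal approximation to a binomial cdf can work, and what the continuity correction contributes, the exact and approximate values can be computed side by side. This is a sketch under assumed, arbitrary parameters (n = 100, p = 0.5); the function names are illustrative.

```python
import math

def binom_cdf(k, n, p):
    """Exact Pr(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))

def normal_approx_binom_cdf(k, n, p, continuity=True):
    """Normal approximation with mu = n*p and sigma^2 = n*p*(1 - p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    x = k + 0.5 if continuity else k  # continuity correction shifts by 1/2
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

exact = binom_cdf(55, 100, 0.5)
with_correction = normal_approx_binom_cdf(55, 100, 0.5)
without_correction = normal_approx_binom_cdf(55, 100, 0.5, continuity=False)
```

In this case the corrected approximation should land much closer to the exact value than the uncorrected one.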

Infinite divisibility


The normal distributions are infinitely divisible probability distributions: given a mean µ, a variance σ² ≥ 0, and a natural number n, the sum X1 + . . . + Xn of n independent random variables, each distributed as

  N(µ/n, σ²/n),

has this specified normal distribution (to verify this, use characteristic functions or convolution and mathematical induction).
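Infinite divisibility can be checked by splitting N(µ, σ²) into n independent pieces N(µ/n, σ²/n) and adding them back together. The sketch below relies on `statistics.NormalDist` addition, which treats its operands as independent; `split_normal` and the chosen parameters are illustrative assumptions.

```python
from functools import reduce
from operator import add
from statistics import NormalDist

def split_normal(mu, sigma, n):
    """Return n identical pieces N(mu/n, sigma^2/n), to be summed as independent."""
    return [NormalDist(mu / n, sigma / n ** 0.5) for _ in range(n)]

parts = split_normal(5.0, 2.0, 8)
recombined = reduce(add, parts)  # sum of the independent pieces
```

Up to floating-point rounding, `recombined` has mean 5 and standard deviation 2 again.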

Stability


The normal distributions are strictly stable probability distributions.

Standard deviation and confidence intervals


About 68% of values drawn from a normal distribution are within one standard deviation σ > 0 away from the mean µ; about 95% of the values are within two standard deviations and about 99.7% lie within three standard deviations. This is known as the "68-95-99.7 rule" or the "empirical rule."

To be more precise, the area under the bell curve between µ − nσ and µ + nσ in terms of the cumulative normal distribution function is given by

  Φ(n) − Φ(−n) = 2Φ(n) − 1 = erf(n/√2),

where erf is the error function. To 12 decimal places, the values for the 1-, 2-, up to 6-sigma points are:
  
n   erf(n/√2)
1   0.682689492137
2   0.954499736104
3   0.997300203937
4   0.999936657516
5   0.999999426697
6   0.999999998027
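The tabulated areas are values of erf(n/√2), so they can be regenerated with the standard library. A small sketch (the name `coverage` is an illustrative assumption):

```python
import math

def coverage(n):
    """Probability that a normal variable lies within n standard deviations of its mean."""
    return math.erf(n / math.sqrt(2.0))

for n in range(1, 7):
    print(n, f"{coverage(n):.12f}")
```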


The next table gives the reverse relation of sigma multiples corresponding to a few often used values for the area under the bell curve. These values are useful to determine (asymptotic) confidence intervals of the specified levels based on normally distributed (or asymptotically normal) estimators:

   
erf(n/√2)   n
0.80        1.28155
0.90        1.64485
0.95        1.95996
0.98        2.32635
0.99        2.57583
0.995       2.80703
0.998       3.09023
0.999       3.29052
0.9999      3.8906
0.99999     4.4172


where the value on the left of the table is the proportion of values that will fall within a given interval and n is a multiple of the standard deviation that specifies the width of the interval.
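The reverse lookup, from a central area p to the sigma multiple n, is the standard normal quantile at (1 + p)/2. The sketch below uses `statistics.NormalDist.inv_cdf`; the helper names, and the confidence-interval function for a sample mean with known σ, are illustrative assumptions rather than anything prescribed by the article.

```python
from statistics import NormalDist

def sigma_multiple(p):
    """Number of standard deviations n such that the central area equals p."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0)

def mean_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Interval for the mean when the standard deviation sigma is known."""
    half_width = sigma_multiple(level) * sigma / n ** 0.5
    return sample_mean - half_width, sample_mean + half_width
```

For instance, `sigma_multiple(0.95)` should reproduce the 1.95996 entry of the table.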

Exponential family form


The normal distribution is a two-parameter exponential family with natural parameters µ and 1/σ², and natural statistics X and X². The canonical form has parameters µ/σ² and 1/σ² and sufficient statistics ΣX and −(1/2)ΣX².

Complex Gaussian process


Consider a complex Gaussian random variable,

  Z = X + iY,

where X and Y are real and independent zero-mean Gaussian variables with equal variances σ_r². The pdf of the joint variables is then

  (1/(2πσ_r²)) exp(−(x² + y²)/(2σ_r²)).

Because σ_Z = √2 σ_r, the resulting pdf for the complex Gaussian variable Z is

  f(z) = (1/(πσ_Z²)) exp(−|z|²/σ_Z²).

Related distributions

  • R ~ Rayleigh(σ) is a Rayleigh distribution if R = √(X² + Y²) where X ~ N(0, σ²) and Y ~ N(0, σ²) are two independent normal distributions.
  • Y ~ χ²_ν is a chi-square distribution with ν degrees of freedom if Y = X1² + ... + Xν² where Xk ~ N(0, 1) for k = 1, ..., ν and are independent.
  • Y ~ Cauchy(µ = 0, θ = 1) is a Cauchy distribution if Y = X1/X2 for X1 ~ N(0, 1) and X2 ~ N(0, 1) are two independent normal distributions.


  • Y ~ Log-N(µ, σ²) is a log-normal distribution if Y = exp(X) and X ~ N(µ, σ²).


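  • Relation to stable distribution: if X ~ N(µ, σ²) then X is stable with stability index α = 2, scale σ/√2 and location µ (the skewness parameter has no effect when α = 2).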


  • Truncated normal distribution. If X ~ N(µ, σ²) then truncating X below at A and above at B will lead to a random variable with mean E(X) = µ + σ(φ1 − φ2)/T, where T = Φ((B − µ)/σ) − Φ((A − µ)/σ), φ1 = φ((A − µ)/σ), φ2 = φ((B − µ)/σ), and φ is the probability density function of a standard normal random variable.


  • If X is a random variable with a normal distribution, and Y = |X|, then Y has a folded normal distribution.
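The mean of a truncated normal variable follows the standard formula µ + σ(φ(α) − φ(β))/(Φ(β) − Φ(α)), with α = (A − µ)/σ and β = (B − µ)/σ for truncation points A < B. A minimal sketch using the standard library; the function and argument names are illustrative assumptions.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for its pdf and cdf

def truncated_normal_mean(mu, sigma, lower, upper):
    """Mean of N(mu, sigma^2) conditioned on lower <= X <= upper."""
    alpha = (lower - mu) / sigma
    beta = (upper - mu) / sigma
    mass = _STD.cdf(beta) - _STD.cdf(alpha)  # probability kept by the truncation
    return mu + sigma * (_STD.pdf(alpha) - _STD.pdf(beta)) / mass
```

A symmetric truncation leaves the mean at µ, while truncating a standard normal to (0, 10) gives a mean very close to √(2/π), the mean of the half-normal distribution.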


Descriptive and inferential statistics


Scores

Many scores are derived from the normal distribution, including percentile ranks ("percentiles" or "quantiles"), normal curve equivalents, stanines, z-scores, and T-scores. Additionally, a number of behavioral statistical procedures are based on the assumption that scores are normally distributed; for example, t-tests and ANOVAs (see below). Bell curve grading assigns relative grades based on a normal distribution of scores.

Normality tests


Normality tests check a given set of data for similarity to the normal distribution. The null hypothesis is that the data set is similar to the normal distribution; therefore a sufficiently small P-value indicates non-normal data.

  • Kolmogorov–Smirnov test
  • Lilliefors test
  • Anderson–Darling test
  • Ryan–Joiner test
  • Shapiro–Wilk test
  • Normal probability plot (rankit plot)
  • Jarque–Bera test
  • Spiegelhalter's omnibus test
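As one concrete instance of such a test, the Jarque–Bera statistic combines sample skewness S and kurtosis K as JB = (n/6)(S² + (K − 3)²/4) and is compared with a chi-square distribution with 2 degrees of freedom. The sketch below is illustrative only: the function name is an assumption, and the p-value uses the asymptotic chi-square reference, which can be inaccurate for small samples.

```python
import math

def jarque_bera(data):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central sample moment
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # The chi-square survival function with 2 degrees of freedom is exp(-x/2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value
```

A small p-value is evidence against normality; a large one only means that the skewness and kurtosis do not contradict it.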


Estimation of parameters


Maximum likelihood estimation of parameters

Suppose

  X1, ..., Xn

are independent and each is normally distributed with expectation µ and variance σ² > 0. In the language of statisticians, the observed values of these n random variables make up a "sample of size n from a normally distributed population." It is desired to estimate the "population mean" µ and the "population standard deviation" σ, based on the observed values of this sample. The continuous joint probability density function of these n independent random variables is

  f(x1, ..., xn; µ, σ) = (2πσ²)^(−n/2) exp(−[(x1 − µ)² + ... + (xn − µ)²]/(2σ²)).

As a function of µ and σ, the likelihood function based on the observations X1, ..., Xn is

  L(µ, σ) = (C/σ^n) exp(−[(X1 − µ)² + ... + (Xn − µ)²]/(2σ²)),

with some constant C > 0 (which in general would even be allowed to depend on X1, ..., Xn, but will vanish anyway when partial derivatives of the log-likelihood function with respect to the parameters are computed, see below).

In the method of maximum likelihood, the values of µ and σ that maximize the likelihood function are taken as estimates of the population parameters µ and σ.

Usually in maximizing a function of two variables, one might consider partial derivatives. But here we will exploit the fact that the value of µ that maximizes the likelihood function with σ fixed does not depend on σ. Therefore, we can find that value of µ, then substitute it for µ in the likelihood function, and finally find the value of σ that maximizes the resulting expression.

It is evident that the likelihood function is a decreasing function of the sum

  (X1 − µ)² + ... + (Xn − µ)².

So we want the value of µ that minimizes this sum. Let

  X̄ = (X1 + ... + Xn)/n

be the "sample mean" based on the n observations. Observe that

  (X1 − µ)² + ... + (Xn − µ)² = (X1 − X̄)² + ... + (Xn − X̄)² + n(X̄ − µ)².

Only the last term depends on µ and it is minimized by

  µ̂ = X̄.

That is the maximum-likelihood estimate of µ based on the n observations X1, ..., Xn. When we substitute that estimate for µ into the likelihood function, we get

  L(X̄, σ) = (C/σ^n) exp(−[(X1 − X̄)² + ... + (Xn − X̄)²]/(2σ²)).

It is conventional to denote the "log-likelihood function", i.e., the logarithm of the likelihood function, by a lower-case ℓ, and we have

  ℓ(X̄, σ) = log C − n log σ − [(X1 − X̄)² + ... + (Xn − X̄)²]/(2σ²),

and then

  ∂ℓ/∂σ = −n/σ + [(X1 − X̄)² + ... + (Xn − X̄)²]/σ³ = −(n/σ³) (σ² − (1/n)[(X1 − X̄)² + ... + (Xn − X̄)²]).

This derivative is positive, zero, or negative according as σ² is between 0 and

  σ̂² = (1/n) [(X1 − X̄)² + ... + (Xn − X̄)²],

or equal to that quantity, or greater than that quantity. (If there is just one observation, meaning that n = 1, or if X1 = ... = Xn, which only happens with probability zero, then σ̂² = 0 by this formula, reflecting the fact that in these cases the likelihood function is unbounded as σ decreases to zero.)

Consequently this average of squares of residuals is the maximum-likelihood estimate of σ², and its square root is the maximum-likelihood estimate of σ based on the n observations. This estimator is biased, but has a smaller mean squared error than the usual unbiased estimator, which is n/(n − 1) times this estimator.
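The estimates derived above are simple to compute: the sample mean, the average squared residual (the maximum-likelihood variance), and the same sum of squares divided by n − 1 (the usual unbiased variance). A minimal sketch; the function name and the example data are illustrative assumptions.

```python
def normal_mle(sample):
    """Return (mean, maximum-likelihood variance, unbiased variance) for a sample."""
    n = len(sample)
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)  # sum of squared residuals
    var_mle = sum_sq / n  # biased maximum-likelihood estimate
    var_unbiased = sum_sq / (n - 1) if n > 1 else float("nan")
    return mean, var_mle, var_unbiased

mean, var_mle, var_unbiased = normal_mle([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

The two variance estimates always differ by the factor n/(n − 1), so the gap shrinks as the sample grows.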

Surprising generalization

The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is subtle. It involves the spectral theorem and the reason it can be better to view a scalar as the trace of a 1×1 matrix than as a mere scalar. See estimation of covariance matrices.

Unbiased estimation of parameters
The maximum likelihood estimator of the population mean µ from a sample is an unbiased estimator of the mean. The maximum likelihood estimator of the variance is unbiased if we assume the population mean is known a priori, but in practice that does not happen. However, if we are faced with a sample and have no knowledge of the mean or the variance of the population from which it is drawn, as assumed in the maximum likelihood derivation above, then the maximum likelihood estimator of the variance is biased. An unbiased estimator of the variance σ² is:

  S² = [(X1 − X̄)² + ... + (Xn − X̄)²]/(n − 1).

This "sample variance" follows a Gamma distribution if all Xi are independent and identically-distributed normal variables:

  S² ~ Gamma((n − 1)/2, 2σ²/(n − 1)),

where the first parameter is the shape and the second is the scale, with mean E(S²) = σ² and variance var(S²) = 2σ⁴/(n − 1).

The maximum likelihood estimate of the standard deviation is the square root of the maximum likelihood estimate of the variance. However, neither this nor the square root of the sample variance provides an unbiased estimate for standard deviation: see unbiased estimation of standard deviation for formulae particular to the normal distribution.

Occurrence


Approximately normal distributions occur in many situations, as explained by the central limit theorem. When there is reason to suspect the presence of a large number of small effects acting additively and independently, it is reasonable to assume that observations will be normal. There are statistical methods to empirically test that assumption, for example the Kolmogorov-Smirnov test.

Effects can also act as multiplicative (rather than additive) modifications. In that case, the assumption of normality is not justified, and it is the logarithm of the variable of interest that is normally distributed. The distribution of the directly observed variable is then called log-normal.

Finally, if there is a single external influence which has a large effect on the variable under consideration, the assumption of normality is not justified either. This is true even if, when the external variable is held constant, the resulting marginal distributions are indeed normal. The full distribution will be a superposition of normal variables, which is not in general normal. This is related to the theory of errors (see below).

To summarize, here is a list of situations where approximate normality is sometimes assumed. For a fuller discussion, see below.
  • In counting problems, where the central limit theorem includes a discrete-to-continuum approximation and where infinitely divisible and decomposable distributions are involved, such as
    • Binomial random variables, associated with yes/no questions;
    • Poisson random variables, associated with rare events;
  • In physiological measurements of biological specimens:
    • The logarithm of measures of size of living tissue (length, height, skin area, weight);
    • The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth; presumably the thickness of tree bark also falls under this category;
    • Other physiological measures may be normally distributed, but there is no reason to expect that a priori;
  • Measurement errors are often assumed to be normally distributed, and any deviation from normality is considered something which should be explained;
  • Financial variables, in the Black–Scholes model
    • Changes in the logarithm of exchange rates, price indices, and stock market indices; these variables behave like compound interest, not like simple interest, and so are multiplicative;
    • While the Black–Scholes model assumes normality, in reality these variables exhibit heavy tails, as seen in stock market crashes;
    • Other financial variables may be normally distributed, but there is no reason to expect that a priori;
  • Light intensity
    • The intensity of laser light is normally distributed;
    • Thermal light has a Bose–Einstein distribution on very short time scales, and a normal distribution on longer timescales due to the central limit theorem.


Of relevance to biology and economics is the fact that complex systems tend to display power laws rather than normality.

Photon counting


Light intensity from a single source varies with time, and thermal fluctuations can be observed if the light is analyzed at sufficiently high time resolution. Quantum mechanics interprets measurements of light intensity as photon counting, where the natural assumption is to use the Poisson distribution. When light intensity is integrated over times much longer than the coherence time, the Poisson-to-normal approximation is appropriate.
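
The Poisson-to-normal approximation can be illustrated numerically. The following is a minimal sketch, not a model of any particular detector: it assumes a Poisson photon count with mean lam and compares its probabilities with the density of a normal distribution of the same mean and variance. The function names are chosen for this example only.

```python
import math

def poisson_pmf(k, lam):
    """Exact Poisson probability of counting k photons when the mean count is lam."""
    # Work in log space so that large k and lam do not overflow.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def max_abs_difference(lam):
    """Largest gap between the Poisson pmf and the N(lam, lam) density
    over the integers within five standard deviations of the mean."""
    lo = max(0, int(lam - 5 * math.sqrt(lam)))
    hi = int(lam + 5 * math.sqrt(lam))
    return max(abs(poisson_pmf(k, lam) - normal_pdf(k, lam, lam))
               for k in range(lo, hi + 1))

# The gap shrinks as the mean count (longer integration time) grows.
for mean_count in (2, 20, 200, 2000):
    print(mean_count, max_abs_difference(mean_count))
```

In this sketch a larger mean count stands in for a longer integration time, and the printed gap decreases as the mean count increases.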

Measurement errors


Normality is the central assumption of the mathematical theory of errors. Similarly, in statistical model-fitting, an indicator of goodness of fit is that the residuals (as the errors are called in that setting) be independent and normally distributed. The assumption is that any deviation from normality needs to be explained. In that sense, both in model-fitting and in the theory of errors, normality is the only observation that need not be explained, being expected. However, if the original data are not normally distributed (for instance if they follow a Cauchy distribution), then the residuals will also not be normally distributed. This fact is usually ignored in practice.

Repeated measurements of the same quantity are expected to yield results which are clustered around a particular value. If all major sources of errors have been taken into account, it is assumed that the remaining error must be the result of a large number of very small additive effects, and hence normal. Deviations from normality are interpreted as indications of systematic errors which have not been taken into account. Whether this assumption is valid is debatable.

A famous and oft-quoted remark attributed to Gabriel Lippmann says: "Everyone believes in the [normal] law of errors: the mathematicians, because they think it is an experimental fact; and the experimenters, because they suppose it is a theorem of mathematics." Another source may be .

Physical characteristics of biological specimens


The sizes of full-grown animals are approximately lognormal. The evidence and an explanation based on models of growth were first published in the 1932 book Problems of Relative Growth by Julian Huxley.

Differences in size due to sexual dimorphism, or other polymorphisms like the worker/soldier/queen division in social insects, further make the distribution of sizes deviate from lognormality.

The assumption that linear size of biological specimens is normal (rather than lognormal) leads to a non-normal distribution of weight (since weight or volume is roughly proportional to the 2nd or 3rd power of length, and Gaussian distributions are only preserved by linear transformations), and conversely assuming that weight is normal leads to non-normal lengths. This is a problem, because there is no a priori reason why one of length, or body mass, and not the other, should be normally distributed. Lognormal distributions, on the other hand, are preserved by powers so the "problem" goes away if lognormality is assumed.
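
The closure of lognormal distributions under powers follows in one line. As an illustrative assumption, let L be a length whose logarithm is normal with mean \mu and variance \sigma^2, and let W = c L^k be a weight with constants c > 0 and k (these symbols are introduced here for the example only):

```latex
\ln L \sim N(\mu, \sigma^2)
\quad\Longrightarrow\quad
\ln W = \ln c + k \ln L \sim N\!\left(\ln c + k\mu,\; k^2 \sigma^2\right)
```

So W is again lognormal, because taking a power only applies a linear transformation to the logarithm. The same step fails for a normal L when k is not 1, since L^k is then not a linear function of L.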

On the other hand, there are some biological measures where normality is assumed, such as blood pressure of adult humans. This is supposed to be normally distributed, but only after separating males and females into different populations (each of which is normally distributed).

Financial variables


Already in 1900 Louis Bachelier proposed representing price changes of stocks using the normal distribution. This approach has since been modified slightly. Because of the multiplicative nature of compounding of returns, financial indicators such as stock values and commodity prices exhibit "multiplicative behavior". As such, their periodic changes (e.g., yearly changes) are not normal, but rather lognormal, i.e. logarithmic returns, as opposed to values, are normally distributed. This is still the most commonly used hypothesis in finance, in particular in option pricing in the Black–Scholes model.
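
The lognormal hypothesis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-period log return is drawn independently from a normal distribution with hypothetical parameters mu and sigma, and the function name simulate_prices is invented for this example. It is not an implementation of the Black–Scholes model.

```python
import math
import random

def simulate_prices(start_price, mu, sigma, periods, rng):
    """Build a price path whose per-period log returns are normal(mu, sigma).

    Each price is the previous one multiplied by exp(log_return), which is the
    multiplicative (compounding) behaviour described in the text.
    """
    prices = [start_price]
    for _ in range(periods):
        log_return = rng.gauss(mu, sigma)   # normally distributed log return
        prices.append(prices[-1] * math.exp(log_return))
    return prices

rng = random.Random(2024)                   # fixed seed for repeatability
path = simulate_prices(100.0, 0.0, 0.01, 250, rng)
print(min(path), max(path))
```

Because every step multiplies by a positive factor, simulated prices stay positive, which an additive normal model for the price itself would not guarantee.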

However, in reality financial variables exhibit heavy tails, and thus the assumption of normality understates the probability of extreme events such as stock market crashes. Corrections to this model have been suggested by mathematicians such as Benoît Mandelbrot, who observed that the changes in logarithm over short periods (such as a day) are approximated well by distributions that do not have a finite variance, and therefore the central limit theorem does not apply. Rather, the sum of many such changes gives log-Levy distributions.

Distribution in testing and intelligence


Sometimes, the difficulty and number of questions on an IQ test are selected in order to yield normally distributed results. Alternatively, the raw test scores are converted to IQ values by fitting them to the normal distribution. In either case, it is the deliberate result of test construction or score interpretation that leads to IQ scores being normally distributed for the majority of the population. However, the question of whether intelligence itself is normally distributed is more involved, because intelligence is a latent variable, and therefore its distribution cannot be observed directly.

Diffusion equation

The probability density function of the normal distribution is closely related to the (homogeneous and isotropic) diffusion equation and therefore also to the heat equation. This partial differential equation describes the time evolution of a mass-density function under diffusion. In particular, the probability density function

$$\varphi_{0,t}(x) = \frac{1}{\sqrt{2\pi t}} \exp\left(-\frac{x^2}{2t}\right)$$

for the normal distribution with expected value 0 and variance t satisfies the diffusion equation:

$$\frac{\partial}{\partial t} \varphi_{0,t}(x) = \frac{1}{2} \frac{\partial^2}{\partial x^2} \varphi_{0,t}(x).$$

If the mass-density at time t = 0 is given by a Dirac delta, which essentially means that all mass is initially concentrated in a single point, then the mass-density function at time t will have the form of the normal probability density function with variance linearly growing with t. This connection is no coincidence: diffusion is due to Brownian motion, which is mathematically described by a Wiener process, and such a process at time t will also result in a normal distribution with variance linearly growing with t.

More generally, if the initial mass-density is given by a function f(x), then the mass-density at time t will be given by the convolution of f and a normal probability density function.
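
The link between the normal density and the diffusion equation can be checked numerically. The sketch below assumes the equation in the form d/dt phi = (1/2) d²/dx² phi, which is the form satisfied by the density of a normal distribution with mean 0 and variance t. It estimates both sides with central finite differences; the function names and the step size h are choices made for this example.

```python
import math

def heat_kernel(x, t):
    """Density at x of the normal distribution with mean 0 and variance t."""
    return math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)

def pde_residual(x, t, h=1e-3):
    """Finite-difference estimate of d/dt phi - (1/2) d2/dx2 phi at (x, t).

    For an exact solution the residual is zero up to discretisation error.
    """
    d_dt = (heat_kernel(x, t + h) - heat_kernel(x, t - h)) / (2 * h)
    d2_dx2 = (heat_kernel(x + h, t) - 2 * heat_kernel(x, t)
              + heat_kernel(x - h, t)) / (h * h)
    return d_dt - 0.5 * d2_dx2

for x in (-2.0, 0.0, 0.7, 3.0):
    for t in (0.5, 1.0, 2.0):
        print(x, t, pde_residual(x, t))
```

The printed residuals should be close to zero, with the remaining error coming from the finite-difference approximation rather than from the density itself.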

Use in computational statistics


Generating values for normal random variables

For computer simulations, it is often useful to generate values that have a normal distribution. There are several methods; the most basic is to invert the standard normal cdf. More efficient methods are also known, one such method being the Box-Muller transform. An even faster algorithm is the ziggurat algorithm. These are discussed below. A simple approach that is easy to program is to sum 12 uniform (0,1) deviates and subtract 6 (half of 12). This is quite usable in many applications. The sum of these 12 values has an Irwin-Hall distribution; 12 is chosen to give the sum a variance of exactly one. The resulting random deviates are limited to the range (−6, 6) and have a density which is a 12-section eleventh-order polynomial approximation to the normal distribution.
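
Both the sum-of-12-uniforms approach and the inverse-cdf approach can be sketched with the Python standard library. This is an illustrative sketch, not a recommended production generator. The function names are invented here, and the inverse cdf comes from statistics.NormalDist (available from Python 3.8).

```python
import random
import statistics

def approx_standard_normal(rng):
    """Sum 12 uniform (0,1) deviates and subtract 6.

    Each uniform deviate has mean 1/2 and variance 1/12, so the shifted sum
    has mean 0 and variance 1, but it can never leave the range (-6, 6).
    """
    return sum(rng.random() for _ in range(12)) - 6.0

def inverse_cdf_standard_normal(rng):
    """Inverse-transform sampling: map a uniform deviate through the inverse cdf."""
    u = rng.random()
    while u == 0.0:          # inv_cdf requires 0 < u < 1
        u = rng.random()
    return statistics.NormalDist().inv_cdf(u)

rng = random.Random(12345)   # fixed seed for repeatability
samples = [approx_standard_normal(rng) for _ in range(100_000)]
print(statistics.fmean(samples), statistics.pvariance(samples))
```

The sample mean and variance should come out near 0 and 1. The approximation is weakest in the tails, since values beyond 6 in absolute value cannot occur.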

The Box-Muller method says that, if you have two independent random numbers U and V uniformly distributed on (0, 1] (e.g. the output from a random number generator), then two independent standard normally distributed random variables are X and Y, where:

$$X = \sqrt{-2 \ln U} \, \cos(2 \pi V),$$
$$Y = \sqrt{-2 \ln U} \, \sin(2 \pi V).$$

This formulation arises because the chi-square distribution with two degrees of freedom (see property 4 above) is an easily-generated exponential random variable (which corresponds to the quantity −2 ln U in these equations). Thus an angle is chosen uniformly around the circle via the random variable V, a squared radius is chosen to be exponential, and the pair is then transformed to (normally distributed) x and y coordinates.
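
A minimal Python sketch of the transform, assuming its standard form X = sqrt(−2 ln U) cos(2πV), Y = sqrt(−2 ln U) sin(2πV). The function name box_muller is chosen for this example. Since random.Random.random() returns values in [0, 1), the code uses 1 − random() to obtain U in (0, 1] so that the logarithm is always finite.

```python
import math
import random

def box_muller(rng):
    """Return a pair of independent standard normal deviates."""
    u = 1.0 - rng.random()                    # uniform on (0, 1], so log(u) is finite
    v = rng.random()                          # uniform on [0, 1), sets the angle
    radius = math.sqrt(-2.0 * math.log(u))    # radius**2 is exponential with mean 2
    angle = 2.0 * math.pi * v
    return radius * math.cos(angle), radius * math.sin(angle)

rng = random.Random(2718)                     # fixed seed for repeatability
pairs = [box_muller(rng) for _ in range(50_000)]
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]
print(sum(xs) / len(xs), sum(ys) / len(ys))
```

Each call consumes two uniform deviates and yields two normal deviates, so nothing is wasted, at the cost of one logarithm, one square root and two trigonometric evaluations per pair.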

A method that is much faster than the Box-Muller transform but which is still exact is the so-called ziggurat algorithm developed by George Marsaglia. In about 97% of all cases it uses only two random numbers (one random integer and one random uniform), one multiplication and an if-test. Only in the remaining 3% of cases, where the combination of those two falls outside the "core of the ziggurat", does a kind of rejection sampling using logarithms, exponentials and more uniform random numbers have to be employed.

There is also some investigation into the connection between the fast Hadamard transform and the normal distribution, since the transform employs just addition and subtraction and, by the central limit theorem, random numbers from almost any distribution will be transformed into the normal distribution. In this regard a series of Hadamard transforms can be combined with random permutations to turn arbitrary data sets into normally distributed data.

Numerical approximations of the normal distribution and its cdf


The normal distribution function is widely used in scientific and statistical computing. Therefore, it has been implemented in various ways.

The GNU Scientific Library calculates values of the standard normal cdf using piecewise approximations by rational functions. Another approximation method uses third-degree polynomials on intervals. The article on the bc programming language gives an example of how to compute the cdf in GNU bc.
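
In languages whose standard library provides the complementary error function, the cdf can be computed directly from the identity Φ(x) = ½ erfc(−x/√2). The Python sketch below uses math.erfc for this. It does not reproduce the piecewise rational or polynomial approximations mentioned above, and the function names are chosen for this example.

```python
import math

def standard_normal_cdf(x):
    """Phi(x), the cdf of the standard normal distribution.

    Using erfc rather than 1 + erf avoids cancellation in the lower tail.
    """
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def normal_cdf(x, mean, std_dev):
    """Cdf of a normal distribution with the given mean and standard deviation."""
    return standard_normal_cdf((x - mean) / std_dev)

for x in (-3.0, -1.0, 0.0, 1.0, 1.96, 3.0):
    print(x, standard_normal_cdf(x))
```

The general case reduces to the standard one by standardising x, which is why only the standard normal cdf needs a numerical routine.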

For a more detailed discussion of how to calculate the normal distribution, see Knuth's The Art of Computer Programming, section 3.4.1C.

See also

  • Behrens–Fisher problem
  • Bell curve grading
  • Central limit theorem - re-averaged sum of a sufficiently large number of identically distributed independent random variables each with finite mean and variance will be approximately normally distributed
  • Chi square distribution
  • Data transformation (statistics) - simple techniques to transform data into normal distribution
  • Erdős–Kac theorem, on the occurrence of the normal distribution in number theory
  • Gaussian blur, convolution using the normal distribution as a kernel
  • Gaussian function
  • Gaussian process
    • Wiener process
    • Brownian bridge
    • Ornstein-Uhlenbeck process
  • Iannis Xenakis, Gaussian distribution in music.
  • Inverse Gaussian distribution
  • Logit function
  • Lognormal distribution
  • Multivariate normal distribution
  • Matrix normal distribution
  • Normal-gamma distribution
  • Normally distributed and uncorrelated does not imply independent (an example of two normally distributed uncorrelated random variables that are not independent; this cannot happen in the presence of joint normality)
  • Pearson distribution - generalized family of probability distributions that extend the Gaussian distribution to include different skewness and kurtosis values
  • Probit function
  • Sample size
  • Skew normal distribution
  • Student's t-distribution
  • Truncated normal distribution
  • Tweedie distributions


External links


Online results and applications
  • – Calculates probabilities and critical values for normal, t, chi-square and F-distribution.
  • from Daniel Soper's Free Statistics Calculators website.
  • Quickly Visualize the one and two-tailed area of the Standard Normal Curve


Algorithms and approximations
  • , sitmo.com
  • by Peter J. Acklam – has examples for several programming languages
  • , gatech.edu
  • , Abramowitz and Stegun