In probability theory, a probability mass, probability density, or probability distribution is a function that describes the probability of a random variable taking certain values.
For a more precise definition one needs to distinguish between discrete and continuous random variables. In the discrete case, one can easily assign a probability to each possible value: when throwing a fair die, each of the six values 1 to 6 has the probability 1/6. In contrast, when a random variable takes values from a continuum, probabilities are nonzero only if they refer to intervals of nonzero length: in quality control one might demand that the probability of a "500 g" package containing between 500 g and 510 g should be no less than 98%.
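The two cases can be sketched in a few lines of Python. The die part follows the text directly. The package part is only an illustration under an assumed model: the normal distribution with mean 505 g and standard deviation 2 g, and the names `normal_cdf`, `MEAN_G` and `SD_G`, are hypothetical choices, not part of the quality-control example itself.

```python
from fractions import Fraction
from math import erf, sqrt

# Discrete case: a fair six-sided die assigns probability 1/6 to each face.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def normal_cdf(x, mean, sd):
    """P(W <= x) for a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

# Continuous case (assumed model): package weight in grams ~ Normal(505, 2).
MEAN_G, SD_G = 505.0, 2.0
p_in_spec = normal_cdf(510.0, MEAN_G, SD_G) - normal_cdf(500.0, MEAN_G, SD_G)

print(die_pmf[3])           # probability of rolling a 3
print(round(p_in_spec, 4))  # probability that 500 g <= weight <= 510 g
print(p_in_spec >= 0.98)    # does the assumed process meet the 98% demand?
```

Under these assumed parameters the interval 500 g to 510 g spans 2.5 standard deviations on either side of the mean, so the requirement is met; a wider spread would fail it.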
If a total order is defined for the values of the random variable, the cumulative distribution function gives the probability that the random variable is not larger than a given value; in the continuous case it is the integral of the noncumulative distribution, and in the discrete case the corresponding sum.
Terminology
As probability theory is used in quite diverse applications, terminology is not uniform and sometimes confusing. The following terms are used for noncumulative probability distribution functions:
 Probability mass, Probability mass function, p.m.f.: for discrete random variables.
 Categorical distribution: for discrete random variables with a finite set of values.
 Probability density, Probability density function, p.d.f.: most often reserved for continuous random variables.
The following terms are somewhat ambiguous as they can refer to noncumulative or cumulative distributions, depending on authors' preferences:
 Probability distribution function: Continuous or discrete, noncumulative or cumulative.
 Probability function: Even more ambiguous, can mean any of the above, or anything else.
Finally,
 Probability distribution: either the same as probability distribution function, or understood as something more fundamental underlying an actual mass or density function.
Basic terms
 Mode: most frequently occurring value in a distribution
 Tail: region of least frequently occurring values in a distribution
Discrete probability distribution
A discrete probability distribution shall be understood as a probability distribution characterized by a probability mass function. Thus, the distribution of a random variable X is discrete, and X is then called a discrete random variable, if
$$\sum_u \Pr(X = u) = 1 \qquad (1)$$
as u runs through the set of all possible values of X. It follows that such a random variable can assume only a finite or countably infinite number of values.
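The requirement that the probabilities of all possible values add up to one can be checked numerically even when there are countably infinitely many values. The sketch below uses the geometric distribution (number of failures before the first success) as an example; the success probability `P_SUCCESS` and the function names are illustrative assumptions.

```python
def geometric_pmf(k, p):
    """P(X = k): probability of exactly k failures before the first success."""
    return (1.0 - p) ** k * p

def partial_sum(n_terms, p):
    """Sum of the first n_terms probabilities P(X = 0), ..., P(X = n_terms - 1)."""
    return sum(geometric_pmf(k, p) for k in range(n_terms))

P_SUCCESS = 0.3  # hypothetical success probability

# The partial sums increase towards 1 as more values are included.
for n in (1, 5, 20, 100):
    print(n, partial_sum(n, P_SUCCESS))
```

The partial sum over the first n values equals 1 − (1 − p)^n, so it approaches 1 without any finite set of values carrying all of the probability.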
In cases more frequently considered, this set of possible values is a topologically discrete set in the sense that all its points are isolated points. But there are discrete random variables for which this countable set is dense on the real line (for example, a distribution over rational numbers).
Among the most well-known discrete probability distributions that are used for statistical modeling are the Poisson distribution, the Bernoulli distribution, the binomial distribution, the geometric distribution, and the negative binomial distribution. In addition, the discrete uniform distribution is commonly used in computer programs that make equal-probability random selections between a number of choices.
Cumulative distribution function
Equivalently to the above, a discrete random variable can be defined as a random variable whose cumulative distribution function (cdf) increases only by jump discontinuities; that is, its cdf increases only where it "jumps" to a higher value, and is constant between those jumps. The points where jumps occur are precisely the values which the random variable may take. The number of such jumps may be finite or countably infinite. The set of locations of such jumps need not be topologically discrete; for example, the cdf might jump at each rational number.
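The step-like behaviour of such a cdf can be seen in a small sketch for a fair die. The helper name `cdf` and the chosen evaluation points are illustrative only.

```python
from fractions import Fraction

die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x, pmf):
    """F(x) = P(X <= x): add up the mass of every value not larger than x."""
    return sum(p for value, p in pmf.items() if value <= x)

# The cdf is constant between the faces and jumps by 1/6 at each face.
for x in (0.5, 1, 1.5, 2, 3.99, 6, 10):
    print(x, cdf(x, die_pmf))
```

Evaluating just below and at a face (for example 3.99 and 4) shows the jump, while any two points between the same pair of faces give the same value.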
Delta-function representation
Consequently, a discrete probability distribution is often represented as a generalized probability density function involving Dirac delta functions, which substantially unifies the treatment of continuous and discrete distributions. This is especially useful when dealing with probability distributions involving both a continuous and a discrete part.
Indicator-function representation
For a discrete random variable X, let u_{0}, u_{1}, ... be the values it can take with nonzero probability. Denote
$$\Omega_i = \{\omega : X(\omega) = u_i\}, \qquad i = 0, 1, 2, \dots$$
These are disjoint sets, and by formula (1)
$$\Pr\Big(\bigcup_i \Omega_i\Big) = \sum_i \Pr(\Omega_i) = \sum_i \Pr(X = u_i) = 1$$
It follows that the probability that X takes any value except for u_{0}, u_{1}, ... is zero, and thus one can write X as
$$X = \sum_i u_i 1_{\Omega_i}$$
except on a set of probability zero, where 1_{A} is the indicator function of A. This may serve as an alternative definition of discrete random variables.
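A finite example may make the indicator-function representation concrete. In the sketch below the sample space is the set of faces of a die, and the random variable `X` (the squared distance of the face from 3) is a hypothetical choice made purely for illustration. The code builds the set of outcomes on which X takes each value and checks that X equals the weighted sum of the indicators of those sets.

```python
OMEGA = [1, 2, 3, 4, 5, 6]  # sample space: faces of a die

def X(omega):
    """Hypothetical random variable: squared distance of the face from 3."""
    return (omega - 3) ** 2

# Distinct values u_0, u_1, ... and the set of outcomes on which X takes each one.
values = sorted({X(w) for w in OMEGA})
level_sets = {u: {w for w in OMEGA if X(w) == u} for u in values}

def indicator(A, omega):
    """1 if omega belongs to the set A, else 0."""
    return 1 if omega in A else 0

def X_from_indicators(omega):
    """Rebuild X(omega) as the sum of u * indicator(level set of u)."""
    return sum(u * indicator(level_sets[u], omega) for u in values)

for w in OMEGA:
    print(w, X(w), X_from_indicators(w))
```

Because the level sets are disjoint, exactly one indicator is nonzero for each outcome, so the sum picks out the single value that X takes there.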
Continuous probability distribution
A continuous probability distribution shall be understood as a probability distribution that has a probability density function. Mathematicians also call such a distribution absolutely continuous, since its cumulative distribution function is absolutely continuous with respect to the Lebesgue measure λ. If the distribution of X is continuous, then X is called a continuous random variable. There are many examples of continuous probability distributions: normal, uniform, chi-squared, and others.
Intuitively, a continuous random variable is one which can take a continuous range of values, as opposed to a discrete distribution, where the set of possible values for the random variable is at most countable. While for a discrete distribution an event with probability zero is impossible (e.g. rolling 3½ on a standard die is impossible, and has probability zero), this is not so in the case of a continuous random variable. For example, if one measures the width of an oak leaf, the result of 3½ cm is possible; however, it has probability zero because there are uncountably many other potential values even between 3 cm and 4 cm. Each of these individual outcomes has probability zero, yet the probability that the outcome will fall into the interval (3 cm, 4 cm) is nonzero. This apparent paradox is resolved by the fact that the probability that X attains some value within an infinite set, such as an interval, cannot be found by naively adding the probabilities for individual values. Informally, each value has an infinitesimally small probability, which statistically is equivalent to zero.
Formally, if X is a continuous random variable, then it has a probability density function ƒ(x), and therefore its probability to fall into a given interval, say [a, b], is given by the integral
$$\Pr[a \le X \le b] = \int_a^b f(x)\,dx$$
In particular, the probability for X to take any single value a (that is, a ≤ X ≤ a) is zero, because an integral with coinciding upper and lower limits is always equal to zero.
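The relation between a density and interval probabilities can be illustrated numerically. The sketch below assumes an exponential density with rate 1 as the example and uses a simple midpoint rule; the names `integrate` and `exp_pdf` are illustrative, and any other density or quadrature method would serve equally well.

```python
from math import exp

def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def exp_pdf(x, rate=1.0):
    """Density of an exponential distribution (assumed example density)."""
    return rate * exp(-rate * x) if x >= 0 else 0.0

p_interval = integrate(exp_pdf, 1.0, 2.0)  # P(1 <= X <= 2)
p_point = integrate(exp_pdf, 1.0, 1.0)     # P(X = 1): coinciding limits

print(p_interval)           # numerical value
print(exp(-1) - exp(-2))    # closed form for comparison
print(p_point)              # 0.0
```

The interval has positive probability although every single point in it has probability zero, which is the situation described for the oak-leaf example.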
The definition states that a continuous probability distribution must possess a density, or equivalently, its cumulative distribution function must be absolutely continuous. This requirement is stronger than simple continuity of the cdf, and there is a special class of distributions, singular distributions, which are neither continuous nor discrete nor a mixture of the two. An example is given by the Cantor distribution. Such singular distributions, however, are rarely if ever encountered in practice.
Note on terminology: some authors use the term "continuous distribution" to denote a distribution with a continuous cdf. Thus, their definition includes both the (absolutely) continuous and singular distributions.
By one convention, a probability distribution is called continuous if its cumulative distribution function is continuous and, therefore, the probability measure of singletons is zero: Pr[X = x] = 0 for all x.
Another convention reserves the term continuous probability distribution for absolutely continuous distributions. These distributions can be characterized by a probability density function: a nonnegative Lebesgue integrable function ƒ defined on the real numbers such that
$$F(x) = \Pr[X \le x] = \int_{-\infty}^{x} f(t)\,dt$$
Discrete distributions and some continuous distributions (like the Cantor distribution) do not admit such a density.
Probability distributions of real-valued random variables
Because a probability distribution Pr on the real line is determined by the probability of a real-valued random variable X being in a half-open interval (−∞, x], the probability distribution is completely characterized by its cumulative distribution function:
$$F(x) = \Pr[X \le x] \qquad \text{for all } x \in \mathbb{R}$$
Terminology
The support of a distribution is the smallest closed interval/set whose complement has probability zero. It may be understood as the points or elements that are actual members of the distribution.
Some properties
 The probability density function of the sum of two independent random variables is the convolution of each of their density functions.
 The probability density function of the difference of two independent random variables is the cross-correlation of their density functions.
 Probability distributions are not a vector space: they are not closed under linear combinations, as these do not preserve nonnegativity or total integral 1. They are, however, closed under convex combination, thus forming a convex subset of the space of functions (or measures).
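These properties have direct discrete analogues, with probability mass functions in place of densities and sums in place of integrals. The sketch below uses that discrete analogue, not the density form stated above; the helper names and the die and coin examples are illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction

def pmf_of_sum(pmf_x, pmf_y):
    """Discrete convolution: P(X + Y = s) = sum over x of P(X = x) * P(Y = s - x)."""
    out = defaultdict(Fraction)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[x + y] += px * py
    return dict(out)

def pmf_of_difference(pmf_x, pmf_y):
    """Discrete cross-correlation: P(X - Y = d) = sum over y of P(X = y + d) * P(Y = y)."""
    out = defaultdict(Fraction)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[x - y] += px * py
    return dict(out)

def convex_combination(pmf_a, pmf_b, weight):
    """weight * pmf_a + (1 - weight) * pmf_b, for 0 <= weight <= 1."""
    support = set(pmf_a) | set(pmf_b)
    return {v: weight * pmf_a.get(v, 0) + (1 - weight) * pmf_b.get(v, 0)
            for v in support}

die = {face: Fraction(1, 6) for face in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

two_dice = pmf_of_sum(die, die)          # independent dice assumed
diff = pmf_of_difference(die, die)
mix = convex_combination(die, coin, Fraction(1, 4))

print(two_dice[7], diff[0], sum(mix.values()))
```

The convex combination is again nonnegative and sums to 1, whereas a general linear combination such as 2·die − coin would assign negative mass to the value 0 and so would not be a distribution.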
Random number generation
A frequent problem in statistical simulations (Monte Carlo method) is the generation of pseudorandom numbers that are distributed in a given way. Most algorithms are based on a pseudorandom number generator that produces numbers X that are uniformly distributed in the interval [0,1). These X are then transformed to some u(X) that satisfy a given distribution f(u).
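One common way to obtain such a transformation u(X) is inverse-transform sampling, in which u is the inverse of the target cumulative distribution function. The sketch below applies it to the exponential distribution; the technique and the target are chosen here only as an example, and the rate `RATE`, the seed and the function name are assumptions.

```python
import math
import random

def sample_exponential(rate, rng):
    """Draw one exponential variate by inverse-transform sampling."""
    x = rng.random()                  # uniform on [0, 1)
    # Inverse of F(u) = 1 - exp(-rate * u); 1 - x lies in (0, 1], so the log is defined.
    return -math.log(1.0 - x) / rate

rng = random.Random(12345)            # fixed seed so the run is repeatable
RATE = 2.0                            # hypothetical rate parameter
samples = [sample_exponential(RATE, rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)

print(sample_mean)                    # should be close to 1 / RATE
```

Inverse-transform sampling needs a cdf whose inverse is cheap to evaluate; when it is not, other methods such as rejection sampling are often used instead.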
Kolmogorov definition
In the measure-theoretic formalization of probability theory, a random variable is defined as a measurable function X from a probability space $(\Omega, \mathcal{F}, \operatorname{P})$ to a measurable space $(\mathcal{X}, \mathcal{A})$. A probability distribution is the pushforward measure X_{*}P = PX^{−1} on $(\mathcal{X}, \mathcal{A})$.
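On a finite probability space the pushforward measure can be computed directly: the probability assigned to a value is the total probability of all outcomes that X maps to it. The sketch below assumes a space of two fair dice with X their sum; the names `omega`, `P`, `X` and `pushforward` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Finite probability space: ordered pairs of faces, each with probability 1/36.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """Random variable: the sum of the two faces."""
    return w[0] + w[1]

def pushforward(P, X):
    """Distribution of X: mass of value v is P of the preimage of {v} under X."""
    dist = {}
    for w, p in P.items():
        dist[X(w)] = dist.get(X(w), Fraction(0)) + p
    return dist

dist_X = pushforward(P, X)
print(dist_X[7], sum(dist_X.values()))
```

The resulting dictionary is a probability measure on the values 2 to 12 and no longer refers to the underlying outcomes, which is what makes the distribution a property of X alone.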
Applications
The concept of the probability distribution and the random variables which it describes underlies the mathematical discipline of probability theory, and the science of statistics. There is spread or variability in almost any value that can be measured in a population (e.g. height of people, durability of a metal, sales growth, traffic flow, etc.); almost all measurements are made with some intrinsic error; in physics many processes are described probabilistically, from the kinetic properties of gases to the quantum mechanical description of fundamental particles. For these and many other reasons, simple numbers are often inadequate for describing a quantity, while probability distributions are often more appropriate.
As a more specific example of an application, the cache language models and other statistical language models used in natural language processing to assign probabilities to the occurrence of particular words and word sequences do so by means of probability distributions.
Common probability distributions
The following is a list of some of the most common probability distributions, grouped by the type of process that they are related to. For a more complete list, see list of probability distributions, which groups by the nature of the outcome being considered (discrete, continuous, multivariate, etc.).
Note also that all of the univariate distributions below are singly peaked; that is, it is assumed that the values cluster around a single point. In practice, actually observed quantities may cluster around multiple values. Such quantities can be modeled using a mixture distribution.
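A mixture distribution can be sampled in two steps: first choose a component at random according to the mixture weights, then draw from that component. The sketch below assumes a mixture of two normal distributions; the weights, means, standard deviations and names are hypothetical.

```python
import random

def sample_mixture(weights, components, rng):
    """Pick a component with the given probabilities, then draw from it."""
    mean, sd = rng.choices(components, weights=weights, k=1)[0]
    return rng.gauss(mean, sd)

rng = random.Random(2024)                 # fixed seed so the run is repeatable
WEIGHTS = [0.3, 0.7]                      # hypothetical mixture weights
COMPONENTS = [(0.0, 1.0), (10.0, 1.0)]    # (mean, standard deviation) of each normal

samples = [sample_mixture(WEIGHTS, COMPONENTS, rng) for _ in range(50_000)]
sample_mean = sum(samples) / len(samples)
share_low = sum(s < 5.0 for s in samples) / len(samples)

print(sample_mean)   # close to 0.3 * 0 + 0.7 * 10 = 7
print(share_low)     # close to the weight of the first component
```

The samples cluster around 0 and around 10 rather than around the overall mean of 7, which is the multi-peaked behaviour that a single-peaked distribution cannot represent.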
Related to real-valued quantities that grow linearly (e.g. errors, offsets)
 Normal distribution (Gaussian distribution), for a single such quantity; the most common continuous distribution
Related to positive real-valued quantities that grow exponentially (e.g. prices, incomes, populations)
 Log-normal distribution, for a single such quantity whose log is normally distributed
 Pareto distribution, for a single such quantity whose log is exponentially distributed; the prototypical power law distribution
Related to real-valued quantities that are assumed to be uniformly distributed over a (possibly unknown) region
 Discrete uniform distribution, for a finite set of values (e.g. the outcome of a fair die)
 Continuous uniform distribution, for continuously distributed values
Related to Bernoulli trials (yes/no events, with a given probability)
 Basic distributions:
 Bernoulli distribution, for the outcome of a single Bernoulli trial (e.g. success/failure, yes/no)
 Binomial distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed total number of independent occurrences
 Negative binomial distribution, for binomial-type observations but where the quantity of interest is the number of failures before a given number of successes occurs
 Geometric distribution, for binomial-type observations but where the quantity of interest is the number of failures before the first success; a special case of the negative binomial distribution
 Related to sampling schemes over a finite population:
 Hypergeometric distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed number of total occurrences, using sampling without replacement
 Beta-binomial distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed number of total occurrences, sampling using a Polya urn scheme (in some sense, the "opposite" of sampling without replacement)
Related to categorical outcomes (events with K possible outcomes, with a given probability for each outcome)
 Categorical distribution, for a single categorical outcome (e.g. yes/no/maybe in a survey); a generalization of the Bernoulli distribution
 Multinomial distribution, for the number of each type of categorical outcome, given a fixed number of total outcomes; a generalization of the binomial distribution
 Multivariate hypergeometric distribution, similar to the multinomial distribution, but using sampling without replacement; a generalization of the hypergeometric distribution
Related to events in a Poisson process (events that occur independently with a given rate)
 Poisson distribution, for the number of occurrences of a Poisson-type event in a given period of time
 Exponential distribution, for the time before the next Poisson-type event occurs
Useful for hypothesis testing related to normally distributed outcomes
 Chi-squared distribution, the distribution of a sum of squared standard normal variables; useful e.g. for inference regarding the sample variance of normally distributed samples (see chi-squared test)
 Student's t-distribution, the distribution of the ratio of a standard normal variable and the square root of a scaled chi-squared variable; useful for inference regarding the mean of normally distributed samples with unknown variance (see Student's t-test)
 F-distribution, the distribution of the ratio of two scaled chi-squared variables; useful e.g. for inferences that involve comparing variances or involving R-squared (the squared correlation coefficient)
Useful as conjugate prior distributions in Bayesian inference
 Beta distribution, for a single probability (real number between 0 and 1); conjugate to the Bernoulli distribution and binomial distribution
 Gamma distribution, for a nonnegative scaling parameter; conjugate to the rate parameter of a Poisson distribution or exponential distribution, the precision (inverse variance) of a normal distribution, etc.
 Dirichlet distribution, for a vector of probabilities that must sum to 1; conjugate to the categorical distribution and multinomial distribution; generalization of the beta distribution
 Wishart distribution, for a symmetric nonnegative definite matrix; conjugate to the inverse of the covariance matrix of a multivariate normal distribution; generalization of the gamma distribution
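As a small illustration of conjugacy, a beta prior on the success probability of Bernoulli trials yields a beta posterior whose parameters are the prior parameters plus the observed counts of successes and failures. The sketch below assumes a uniform Beta(1, 1) prior and a made-up sequence of observations; the function names are illustrative.

```python
def beta_posterior(alpha, beta, observations):
    """Update Beta(alpha, beta) with Bernoulli observations coded as 1 (success) or 0 (failure)."""
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

prior = (1.0, 1.0)                  # assumed uniform prior on the success probability
data = [1, 0, 1, 1, 0, 1, 1, 1]     # hypothetical observations: 6 successes, 2 failures
posterior = beta_posterior(*prior, data)

print(posterior)                    # parameters of the posterior beta distribution
print(beta_mean(*posterior))        # posterior mean of the success probability
```

Because the posterior stays in the same family as the prior, further observations can be folded in by repeating the same update.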
See also
 Moment-generating function
 Copula (statistics)
 Histogram
 Likelihood function
 List of statistical topics
 Riemann–Stieltjes integral application to probability theory
External links