Probability theory is the branch of mathematics concerned with analysis of random phenomena. The central objects of probability theory are random variables, stochastic processes, and events: mathematical abstractions of non-deterministic events or measured quantities that may either be single occurrences or evolve over time in an apparently random fashion. If an individual coin toss or the roll of dice is considered to be a random event, then if repeated many times the sequence of random events will exhibit certain patterns, which can be studied and predicted. Two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem.
As a mathematical foundation for statistics, probability theory is essential to many human activities that involve quantitative analysis of large sets of data. Methods of probability theory also apply to descriptions of complex systems given only partial knowledge of their state, as in statistical mechanics. A great discovery of twentieth-century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum mechanics.
History
The mathematical theory of probability has its roots in attempts to analyze games of chance by Gerolamo Cardano in the sixteenth century, and by Pierre de Fermat and Blaise Pascal in the seventeenth century (for example the "problem of points").
Christiaan Huygens published a book on the subject in 1657.
Initially, probability theory mainly considered discrete events, and its methods were mainly combinatorial. Eventually, analytical considerations compelled the incorporation of continuous variables into the theory.
This culminated in modern probability theory, on foundations laid by Andrey Nikolaevich Kolmogorov. Kolmogorov combined the notion of sample space, introduced by Richard von Mises, with measure theory and presented his axiom system for probability theory in 1933. Fairly quickly this became the mostly undisputed axiomatic basis for modern probability theory, but alternatives exist, in particular the adoption of finite rather than countable additivity by Bruno de Finetti.
Treatment
Most introductions to probability theory treat discrete probability distributions and continuous probability distributions separately. The more mathematically advanced measure-theoretic treatment of probability covers the discrete case, the continuous case, any mix of the two, and more.
Motivation
Consider an experiment that can produce a number of outcomes. The collection of all possible results is called the sample space of the experiment. The power set of the sample space is formed by considering all different collections of possible results. For example, rolling a die produces one of six possible results. One collection of possible results corresponds to getting an odd number. Thus, the subset {1,3,5} is an element of the power set of the sample space of die rolls. These collections are called events. In this case, {1,3,5} is the event that the die falls on some odd number. If the results that actually occur fall in a given event, that event is said to have occurred.
Probability is a way of assigning every "event" a value between zero and one, with the requirement that the event made up of all possible results (in our example, the event {1,2,3,4,5,6}) be assigned a value of one. To qualify as a probability distribution, the assignment of values must satisfy the requirement that if you look at a collection of mutually exclusive events (events that contain no common results, e.g., the events {1,6}, {3}, and {2,4} are all mutually exclusive), the probability that at least one of the events will occur is given by the sum of the probabilities of all the individual events.
For a fair die, the probability that any one of the events {1,6}, {3}, or {2,4} will occur is 5/6. This is the same as saying that the probability of event {1,2,3,4,6} is 5/6. This event encompasses the possibility of any number except five being rolled. The mutually exclusive event {5} has a probability of 1/6, and the event {1,2,3,4,5,6} has a probability of 1, that is, absolute certainty. For convenience's sake, we ignore the possibility that the die, once rolled, will be obliterated before it can hit the table.
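The arithmetic of the die example can be checked with a short Python sketch. It assumes a fair die, so that every face carries weight 1/6; the names SAMPLE_SPACE, prob and mutually_exclusive are illustrative and do not come from the text.

```python
from fractions import Fraction

# Sample space for one roll of a six-sided die (assumed fair).
SAMPLE_SPACE = frozenset({1, 2, 3, 4, 5, 6})

def prob(event):
    """Probability of an event (a subset of SAMPLE_SPACE) under equal weights."""
    event = frozenset(event)
    if not event <= SAMPLE_SPACE:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(SAMPLE_SPACE))

def mutually_exclusive(events):
    """True when no two of the given events share a result."""
    events = [frozenset(e) for e in events]
    total = sum(len(e) for e in events)
    return total == len(frozenset().union(*events))

events = [{1, 6}, {3}, {2, 4}]
union = frozenset().union(*events)

print(mutually_exclusive(events))     # the three events share no result
print(sum(prob(e) for e in events))   # sum of the individual probabilities
print(prob(union))                    # probability of {1,2,3,4,6}
print(prob(SAMPLE_SPACE))             # probability of the whole sample space
```

Exact fractions are used instead of floating-point numbers so that the sum of the three probabilities and the probability of the union can be compared for equality.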
Discrete probability distributions
Discrete probability theory deals with events that occur in countable sample spaces.
Examples: throwing dice, experiments with decks of cards, and random walks.
Classical definition:
Initially the probability of an event was defined as the number of cases favorable for the event, divided by the total number of outcomes possible in an equiprobable sample space: see Classical definition of probability.
For example, if the event is "occurrence of an even number when a die is rolled", the probability is given by $\tfrac{3}{6} = \tfrac{1}{2}$, since 3 faces out of the 6 have even numbers and each face has the same probability of appearing.
Modern definition:
The modern definition starts with a finite or countable set called the sample space, which relates to the set of all possible outcomes in the classical sense, denoted by $\Omega$. It is then assumed that for each element $x \in \Omega$, an intrinsic "probability" value $f(x)$ is attached, which satisfies the following properties:
 $f(x) \in [0,1]$ for all $x \in \Omega$;
 $\sum_{x \in \Omega} f(x) = 1$.
That is, the probability function $f(x)$ lies between zero and one for every value of $x$ in the sample space $\Omega$, and the sum of $f(x)$ over all values $x$ in the sample space $\Omega$ is equal to 1. An event is defined as any subset $E$ of the sample space $\Omega$. The probability of the event $E$ is defined as
$$P(E) = \sum_{x \in E} f(x).$$
So, the probability of the entire sample space is 1, and the probability of the null event is 0.
The function $f(x)$ mapping a point in the sample space to the "probability" value is called a probability mass function, abbreviated as pmf. The modern definition does not try to answer how probability mass functions are obtained; instead it builds a theory that assumes their existence.
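A minimal Python sketch of this definition follows. It assumes, purely for illustration, the pmf of the total shown by two fair dice, which is a sample space whose points are not equiprobable; make_pmf and event_probability are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

def make_pmf(weights):
    """Validate a mapping outcome -> probability and return it as a dict."""
    pmf = dict(weights)
    if any(not (0 <= p <= 1) for p in pmf.values()):
        raise ValueError("each f(x) must lie in [0, 1]")
    if sum(pmf.values()) != 1:
        raise ValueError("the values f(x) must sum to 1")
    return pmf

def event_probability(pmf, event):
    """P(E): the sum of f(x) over the outcomes x in the event E."""
    return sum((pmf[x] for x in event if x in pmf), Fraction(0))

# pmf of the total of two fair dice: outcomes 2..12 with unequal weights.
counts = {}
for a, b in product(range(1, 7), repeat=2):
    counts[a + b] = counts.get(a + b, 0) + 1
two_dice = make_pmf({total: Fraction(n, 36) for total, n in counts.items()})

print(event_probability(two_dice, {7, 11}))    # event "total is 7 or 11"
print(event_probability(two_dice, two_dice))   # entire sample space
print(event_probability(two_dice, set()))      # null event
```

The last two calls correspond to the remark that the entire sample space has probability 1 and the null event has probability 0.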
Continuous probability distributions
Continuous probability theory deals with events that occur in a continuous sample space.
Classical definition:
The classical definition breaks down when confronted with the continuous case. See Bertrand's paradox.
Modern definition:
If the outcome space of a random variable $X$ is the set of real numbers ($\mathbb{R}$) or a subset thereof, then a function called the cumulative distribution function (or cdf) $F$ exists, defined by $F(x) = P(X \le x)$. That is, $F(x)$ returns the probability that $X$ will be less than or equal to $x$.
The cdf necessarily satisfies the following properties.
 $F$ is a monotonically non-decreasing, right-continuous function;
 $\lim_{x \to -\infty} F(x) = 0$;
 $\lim_{x \to \infty} F(x) = 1$.
If $F$ is absolutely continuous, i.e., its derivative exists and integrating the derivative gives us the cdf back again, then the random variable $X$ is said to have a probability density function or pdf or simply density $f(x) = \frac{dF(x)}{dx}$.
For a set $E \subseteq \mathbb{R}$, the probability of the random variable $X$ being in $E$ is
$$P(X \in E) = \int_{x \in E} dF(x).$$
In case the probability density function exists, this can be written as
$$P(X \in E) = \int_{x \in E} f(x)\,dx.$$
Whereas the pdf exists only for continuous random variables, the cdf exists for all random variables (including discrete random variables) that take values in $\mathbb{R}$.
These concepts can be generalized for multidimensional cases on $\mathbb{R}^n$ and other continuous sample spaces.
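The relation between a cdf and its density can be illustrated numerically. The sketch below assumes an exponential distribution with rate 2, chosen only because its cdf and pdf have simple closed forms; RATE, cdf, pdf and integrate are illustrative names, and the integral is approximated with a plain midpoint rule.

```python
import math

RATE = 2.0  # assumed rate parameter of an exponential distribution

def cdf(x):
    """F(x) = P(X <= x) for an exponential random variable."""
    return 1.0 - math.exp(-RATE * x) if x > 0 else 0.0

def pdf(x):
    """f(x) = dF/dx, the density of the same distribution."""
    return RATE * math.exp(-RATE * x) if x > 0 else 0.0

def integrate(func, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))

a, b = 0.5, 1.5
via_cdf = cdf(b) - cdf(a)        # P(a < X <= b) from the cdf
via_pdf = integrate(pdf, a, b)   # the same probability from the density
print(via_cdf, via_pdf)
```

The two printed values should agree up to the error of the numerical integration, which is the statement that integrating the density over an interval recovers the increment of the cdf.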
Measure-theoretic probability theory
The raison d'être of the measure-theoretic treatment of probability is that it unifies the discrete and the continuous cases, and makes the difference a question of which measure is used. Furthermore, it covers distributions that are neither discrete nor continuous nor mixtures of the two.
A simple case that is neither purely discrete nor purely continuous is a mix of the two: for example, a random variable that is 0 with probability 1/2, and takes a random value from a normal distribution with probability 1/2. It can still be studied to some extent by considering it to have a pdf of $(\delta(x) + \varphi(x))/2$, where $\varphi$ denotes the density of the normal distribution and $\delta$ is the Dirac delta function.
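The mixed random variable described above has no ordinary density, but its cdf is perfectly well defined. The following Python sketch assumes that the continuous part is a standard normal distribution (the text only says "a normal distribution"); mixed_cdf and sample_mixed are illustrative names.

```python
import math
import random

def normal_cdf(x):
    """cdf of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mixed_cdf(x):
    """cdf of X: point mass at 0 (weight 1/2) plus a standard normal (weight 1/2)."""
    atom = 1.0 if x >= 0 else 0.0
    return 0.5 * atom + 0.5 * normal_cdf(x)

def sample_mixed(rng):
    """Draw one value of X."""
    return 0.0 if rng.random() < 0.5 else rng.gauss(0.0, 1.0)

rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [sample_mixed(rng) for _ in range(100_000)]
share_at_zero = sum(1 for d in draws if d == 0.0) / len(draws)

jump = mixed_cdf(0.0) - mixed_cdf(-1e-12)   # size of the jump of the cdf at 0
print(jump, share_at_zero)
```

The cdf jumps by about 1/2 at the point 0, matching the share of simulated draws that are exactly 0, while everywhere else it increases smoothly like a scaled normal cdf.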
Other distributions may not even be a mix; for example, the Cantor distribution has no positive probability for any single point, nor does it have a density. The modern approach to probability theory solves these problems using measure theory to define the probability space:
Given any set $\Omega$ (also called the sample space) and a σ-algebra $\mathcal{F}$ on it, a measure $P$ defined on $\mathcal{F}$ is called a probability measure if $P(\Omega) = 1$.
If $\mathcal{F}$ is the Borel σ-algebra on the set of real numbers, then there is a unique probability measure on $\mathcal{F}$ for any cdf, and vice versa. The measure corresponding to a cdf is said to be induced by the cdf. This measure coincides with the pmf for discrete variables, and pdf for continuous variables, making the measure-theoretic approach free of fallacies.
The probability of a set $E$ in the σ-algebra $\mathcal{F}$ is defined as
$$P(E) = \int_{\omega \in E} \mu_F(d\omega),$$
where the integration is with respect to the measure $\mu_F$ induced by $F$.
Along with providing better understanding and unification of discrete and continuous probabilities, the measure-theoretic treatment also allows us to work on probabilities outside $\mathbb{R}^n$, as in the theory of stochastic processes. For example, to study Brownian motion, probability is defined on a space of functions.
Probability distributions
Certain random variables occur very often in probability theory because they well describe many natural or physical processes. Their distributions therefore have gained special importance in probability theory. Some fundamental discrete distributions are the discrete uniform, Bernoulli, binomial, negative binomial, Poisson and geometric distributions. Important continuous distributions include the continuous uniform, normal, exponential, gamma and beta distributions.
Convergence of random variables
In probability theory, there are several notions of convergence for random variables. They are listed below in increasing order of strength, i.e., any subsequent notion of convergence in the list implies convergence according to all of the preceding notions.
 Weak convergence: A sequence of random variables $X_1, X_2, \dots$ converges weakly to the random variable $X$ if their respective cumulative distribution functions $F_1, F_2, \dots$ converge to the cumulative distribution function $F$ of $X$, wherever $F$ is continuous. Weak convergence is also called convergence in distribution.
  Most common shorthand notation: $X_n \xrightarrow{\mathcal{D}} X$
 Convergence in probability: The sequence of random variables $X_1, X_2, \dots$ is said to converge towards the random variable $X$ in probability if $\lim_{n \to \infty} P(|X_n - X| \ge \varepsilon) = 0$ for every $\varepsilon > 0$.
  Most common shorthand notation: $X_n \xrightarrow{P} X$
 Strong convergence: The sequence of random variables $X_1, X_2, \dots$ is said to converge towards the random variable $X$ strongly if $P(\lim_{n \to \infty} X_n = X) = 1$. Strong convergence is also known as almost sure convergence.
  Most common shorthand notation: $X_n \xrightarrow{\mathrm{a.s.}} X$
As the names indicate, weak convergence is weaker than strong convergence. In fact, strong convergence implies convergence in probability, and convergence in probability implies weak convergence. The reverse statements are not always true.
Law of large numbers
Common intuition suggests that if a fair coin is tossed many times, then roughly half of the time it will turn up heads, and the other half it will turn up tails. Furthermore, the more often the coin is tossed, the more likely it should be that the ratio of the number of heads to the number of tails will approach unity. Modern probability provides a formal version of this intuitive idea, known as the law of large numbers. This law is remarkable because it is nowhere assumed in the foundations of probability theory, but instead emerges out of these foundations as a theorem. Since it links theoretically derived probabilities to their actual frequency of occurrence in the real world, the law of large numbers is considered as a pillar in the history of statistical theory.
The law of large numbers (LLN) states that the sample average $\overline{X}_n = \frac{1}{n} \sum_{k=1}^{n} X_k$ of a sequence of independent and identically distributed random variables $X_k$ converges towards their common expectation $\mu$, provided that the expectation of $|X_k|$ is finite.
It is the form of convergence of random variables that separates the weak and the strong law of large numbers: the weak law asserts that $\overline{X}_n$ converges to $\mu$ in probability, while the strong law asserts that $\overline{X}_n$ converges to $\mu$ almost surely.
It follows from the LLN that if an event of probability $p$ is observed repeatedly during independent experiments, the ratio of the observed frequency of that event to the total number of repetitions converges towards $p$.
For example, if $Y_1, Y_2, \dots$ are independent Bernoulli random variables taking values 1 with probability $p$ and 0 with probability $1 - p$, then $E(Y_i) = p$ for all $i$, so that $\overline{Y}_n$ converges to $p$ almost surely.
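The Bernoulli example can be simulated directly. The sketch below is only an illustration, not a proof: it assumes p = 0.3 and a fixed seed for Python's pseudo-random generator, and running_means is a hypothetical helper name.

```python
import random

def running_means(p, n, seed=0):
    """Sample means of the first 1..n draws of a Bernoulli(p) sequence."""
    rng = random.Random(seed)
    total = 0
    means = []
    for k in range(1, n + 1):
        total += 1 if rng.random() < p else 0
        means.append(total / k)
    return means

p = 0.3  # assumed success probability
means = running_means(p, 200_000)
for n in (10, 1_000, 200_000):
    print(n, means[n - 1])   # sample mean after n draws
```

With more draws the printed sample mean should settle closer to p, which is the behaviour the law of large numbers describes for a single realization of the sequence.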
Central limit theorem
"The central limit theorem (CLT) is one of the great results of mathematics." (Chapter 18 in )
It explains the ubiquitous occurrence of the normal distribution in nature.
The theorem states that the average of many independent and identically distributed random variables with finite variance tends towards a normal distribution irrespective of the distribution followed by the original random variables. Formally, let $X_1, X_2, \dots$ be independent, identically distributed random variables with mean $\mu$ and variance $\sigma^2 > 0$. Then the sequence of random variables
$$Z_n = \frac{\sum_{i=1}^{n} (X_i - \mu)}{\sigma \sqrt{n}}$$
converges in distribution to a standard normal random variable.
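A small simulation can make the statement concrete. The sketch below assumes uniform(0, 1) summands, whose mean is 1/2 and variance is 1/12, and forms the standardized sum: the sum of the deviations from the mean, divided by the standard deviation times the square root of n. The sample sizes and the seed are arbitrary choices, and standardized_sum is an illustrative name.

```python
import math
import random

MU = 0.5                      # mean of a uniform(0, 1) variable
SIGMA = math.sqrt(1.0 / 12)   # its standard deviation

def standardized_sum(n, rng):
    """Sum of (X_i - MU) over n uniform(0, 1) draws, divided by SIGMA * sqrt(n)."""
    total = sum(rng.random() for _ in range(n))
    return (total - n * MU) / (SIGMA * math.sqrt(n))

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [standardized_sum(50, rng) for _ in range(20_000)]

mean = sum(samples) / len(samples)
var = sum((z - mean) ** 2 for z in samples) / len(samples)
inside = sum(1 for z in samples if abs(z) <= 1.96) / len(samples)
print(mean, var, inside)   # a standard normal would give about 0, 1 and 0.95
```

Although the summands are uniform rather than normal, the empirical mean, variance and central 95% coverage of the standardized sums should come out close to those of a standard normal distribution.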
See also
 Expected value and Variance
 Fuzzy logic and Fuzzy measure theory
 Glossary of probability and statistics
 Likelihood function
 List of probability topics
 Catalog of articles in probability theory
 List of publications in statistics
 List of statistical topics
 Probabilistic proofs of non-probabilistic theorems
 Notation in probability
 Predictive modelling
 Probabilistic logic – A combination of probability theory and logic
 Probability axioms
 Probability interpretations
 Statistical independence
 Subjective logic
External Links