In probability theory, the expected value (or expectation, or mathematical expectation, or mean, or the first moment) of a random variable is the weighted average of all possible values that this random variable can take on. The weights used in computing this average correspond to the probabilities in the case of a discrete random variable, or to the densities in the case of a continuous random variable. From a rigorous theoretical standpoint, the expected value is the integral of the random variable with respect to its probability measure.
The expected value may be intuitively understood by the law of large numbers: the expected value, when it exists, is almost surely the limit of the sample mean as the sample size grows to infinity. More informally, it can be interpreted as the long-run average of the results of many independent repetitions of an experiment (e.g. a die roll). The value may not be expected in the ordinary sense: the "expected value" itself may be unlikely or even impossible (such as having 2.5 children), just like the sample mean.
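The long-run-average interpretation can be illustrated with a short simulation. The following Python sketch uses only the standard library. It rolls a simulated fair die and records the sample mean after each roll. The function name `running_means` and the fixed seed are illustrative choices.

```python
import random

def running_means(n_rolls, seed=0):
    """Roll a simulated fair six-sided die n_rolls times and return the
    sample mean after each roll."""
    rng = random.Random(seed)  # a seeded generator makes the run reproducible
    total = 0
    means = []
    for k in range(1, n_rolls + 1):
        total += rng.randint(1, 6)  # randint includes both endpoints
        means.append(total / k)
    return means

means = running_means(100_000)
for k in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {k:>6} rolls: sample mean = {means[k - 1]:.4f}")
```

The early means typically wander, while the later ones settle close to the expected value 3.5. The exact digits depend on the seed.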
The expected value does not exist for some distributions with large "tails", such as the Cauchy distribution.
It is possible to construct an expected value equal to the probability of an event by taking the expectation of an
indicator function that is one if the event has occurred and zero otherwise. This relationship can be used to translate properties of expected values into properties of probabilities, e.g. using the law of large numbers to justify estimating probabilities by frequencies.
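The following Python sketch illustrates this relationship. It computes the probability that two fair dice sum to 7 as the expectation of an indicator over the 36 equally likely outcomes. It uses `fractions.Fraction` for exact arithmetic, and the helper name `indicator_expectation` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def indicator_expectation(event, outcomes):
    """Average of the indicator of `event` over equally likely outcomes,
    i.e. E[1_event], which equals P(event)."""
    indicator_values = [1 if event(o) else 0 for o in outcomes]
    return Fraction(sum(indicator_values), len(indicator_values))

two_dice = product(range(1, 7), repeat=2)  # 36 equally likely ordered pairs
p_seven = indicator_expectation(lambda o: o[0] + o[1] == 7, two_dice)
print(p_seven)  # 1/6
```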
Discrete random variable, finite case
Suppose random variable $X$ can take value $x_1$ with probability $p_1$, value $x_2$ with probability $p_2$, and so on, up to value $x_k$ with probability $p_k$. Then the expectation of this random variable $X$ is defined as
$$\operatorname{E}[X] = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k.$$
Since all probabilities $p_i$ add up to one ($p_1 + p_2 + \cdots + p_k = 1$), the expected value can be viewed as the weighted average, with the $p_i$ being the weights:
$$\operatorname{E}[X] = \frac{x_1 p_1 + x_2 p_2 + \cdots + x_k p_k}{p_1 + p_2 + \cdots + p_k}.$$
If all outcomes $x_i$ are equally likely (that is, $p_1 = p_2 = \cdots = p_k$), then the weighted average turns into the simple average. This is intuitive: the expected value of a random variable is the average of all values it can take, so the expected value is what you expect to happen on average. If the outcomes $x_i$ are not equiprobable, then the simple average must be replaced with the weighted average, which takes into account the fact that some outcomes are more likely than others. The intuition, however, remains the same: the expected value of $X$ is what you expect to happen on average.
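Example 1. Let $X$ represent the outcome of a roll of a fair six-sided die. More specifically, $X$ will be the number of pips showing on the top face of the die after the toss. The possible values for $X$ are 1, 2, 3, 4, 5, and 6, all equally likely (each having the probability of 1/6). The expectation of $X$ is
$$\operatorname{E}[X] = 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{6} + 3 \cdot \tfrac{1}{6} + 4 \cdot \tfrac{1}{6} + 5 \cdot \tfrac{1}{6} + 6 \cdot \tfrac{1}{6} = \tfrac{21}{6} = 3.5.$$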
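The finite-case definition translates directly into code. In this Python sketch, a distribution is a mapping from each value $x_i$ to its probability $p_i$. `fractions.Fraction` keeps the arithmetic exact, and the function name `expectation` is an illustrative choice.

```python
from fractions import Fraction

def expectation(pmf):
    """Expected value of a finite discrete random variable.

    pmf maps each possible value x_i to its probability p_i; the result is
    the weighted average sum(x_i * p_i)."""
    total = sum(pmf.values())
    if total != 1:
        raise ValueError(f"probabilities sum to {total}, not 1")
    return sum(x * p for x, p in pmf.items())

die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expectation(die))  # 7/2, i.e. 3.5
```

A non-uniform distribution such as `{0: Fraction(1, 4), 10: Fraction(3, 4)}` gives 15/2. The heavier weight pulls the average toward the more likely value.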
If you roll the die $n$ times and compute the average (arithmetic mean) of the results, then as $n$ grows, the average will almost surely converge to the expected value, a fact known as the strong law of large numbers. One example sequence of ten rolls of the die is 2, 3, 1, 2, 5, 6, 2, 2, 2, 6, which has an average of 3.1, at a distance of 0.4 from the expected value of 3.5. The convergence is relatively slow: the probability that the average falls within the range 3.5 ± 0.1 is 21.6% for ten rolls, 46.1% for a hundred rolls and 93.7% for a thousand rolls. See the figure for an illustration of the averages of longer sequences of rolls of the die and how they converge to the expected value of 3.5. More generally, the rate of convergence can be roughly quantified by, for example, Chebyshev's inequality and the Berry–Esseen theorem.
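These percentages can be checked by computing the exact distribution of the sum of $n$ dice instead of simulating it. The following Python sketch assumes the range in question is 3.5 ± 0.1, i.e. an average between 3.4 and 3.6 inclusive. The function names are illustrative.

```python
def sum_distribution(n_dice, faces=6):
    """Exact distribution of the total of n_dice fair dice.

    Returns a list dist where dist[s] is P(total == s)."""
    dist = [1.0]  # zero dice: the total is 0 with probability 1
    for _ in range(n_dice):
        new = [0.0] * (len(dist) + faces)
        for s, p in enumerate(dist):
            if p:
                for face in range(1, faces + 1):
                    new[s + face] += p / faces  # convolve with one more die
        dist = new
    return dist

def prob_mean_within(n_dice):
    """P(3.4 <= sample mean <= 3.6) for n_dice rolls of a fair die."""
    dist = sum_distribution(n_dice)
    # |s/n - 3.5| <= 0.1 is equivalent to |10*s - 35*n| <= n (integers only)
    return sum(p for s, p in enumerate(dist) if abs(10 * s - 35 * n_dice) <= n_dice)

for n in (10, 100):
    print(n, round(prob_mean_within(n), 3))
```

For ten and a hundred rolls this gives values that round to 21.6% and 46.1%. A thousand rolls work the same way but are noticeably slower in pure Python.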
Example 2. The roulette game consists of a small ball and a wheel with 38 numbered pockets around the edge. As the wheel is spun, the ball bounces around randomly until it settles down in one of the pockets. Suppose random variable X represents the (monetary) outcome of a $1 bet on a single number ("straight up" bet). If the bet wins (which happens with probability 1/38), the payoff is $35; otherwise the player loses the bet. The expected profit from such a bet will be
$$\operatorname{E}[\text{gain from a }\$1\text{ bet}] = -\$1 \cdot \tfrac{37}{38} + \$35 \cdot \tfrac{1}{38} = -\$\tfrac{1}{19} \approx -\$0.0526.$$
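The same arithmetic in exact form, as a Python sketch. The constant names are illustrative.

```python
from fractions import Fraction

POCKETS = 38  # pockets on the wheel in the example
PAYOFF = 35   # net win, in dollars, when a $1 straight-up bet wins
STAKE = 1     # dollars lost when the bet loses

p_win = Fraction(1, POCKETS)
expected_profit = PAYOFF * p_win - STAKE * (1 - p_win)
print(expected_profit, float(expected_profit))  # -1/19 and roughly -0.0526
```

On average, then, the player loses a little over five cents per dollar bet.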
Discrete random variable, countable case
Let $X$ be a discrete random variable taking values $x_1, x_2, \ldots$ with probabilities $p_1, p_2, \ldots$ respectively. Then the expected value of this random variable is the infinite sum
$$\operatorname{E}[X] = \sum_{i=1}^\infty x_i\, p_i,$$
provided that this series converges absolutely (that is, the sum must remain finite if we were to replace all $x_i$ with their absolute values). If this series does not converge absolutely, we say that the expected value of $X$ does not exist.
For example, suppose random variable $X$ takes values 1, −2, 3, −4, ..., with respective probabilities $c/1^2, c/2^2, c/3^2, c/4^2, \ldots$, where $c = 6/\pi^2$ is a normalizing constant that ensures the probabilities sum up to one. Then the infinite sum
$$\sum_{i=1}^\infty x_i\, p_i = c\left(1 - \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{4} + \cdots\right)$$
converges and its sum is equal to $\tfrac{6}{\pi^2}\ln 2 \approx 0.421$. However, it would be incorrect to claim that the expected value of $X$ is equal to this number. In fact $\operatorname{E}[X]$ does not exist, because this series does not converge absolutely (see harmonic series).
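The contrast between conditional and absolute convergence can be seen numerically. The Python sketch below takes $x_k = (-1)^{k+1} k$ and $p_k = c/k^2$ with $c = 6/\pi^2$ (the values and probabilities of this example). It prints partial sums of $\sum x_k p_k$ and of $\sum |x_k| p_k$.

```python
import math

C = 6 / math.pi ** 2  # normalizing constant: sum of C / k**2 over k >= 1 equals 1

def partial_sums(n_terms):
    """Partial sums of x_k p_k and |x_k| p_k for k = 1..n_terms, where
    x_k = (-1)**(k + 1) * k (values 1, -2, 3, -4, ...) and p_k = C / k**2."""
    signed = 0.0
    absolute = 0.0
    for k in range(1, n_terms + 1):
        x_k = k if k % 2 == 1 else -k
        p_k = C / k ** 2
        signed += x_k * p_k
        absolute += abs(x_k) * p_k  # equals C / k, a scaled harmonic series
    return signed, absolute

for n in (10, 1_000, 100_000):
    s, a = partial_sums(n)
    print(f"n = {n:>6}: sum of x_k p_k = {s:.5f}, sum of |x_k| p_k = {a:.3f}")
print("(6 / pi^2) ln 2 =", C * math.log(2))
```

The signed partial sums settle near $\tfrac{6}{\pi^2}\ln 2 \approx 0.4214$. The absolute partial sums keep growing roughly like $c \ln n$, which is why $\operatorname{E}[X]$ is left undefined.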
Univariate continuous random variable
If the probability distribution of $X$ admits a probability density function $f(x)$, then the expected value can be computed as
$$\operatorname{E}[X] = \int_{-\infty}^{\infty} x\, f(x)\, \mathrm{d}x.$$
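The following Python sketch checks this formula by approximating the integral with the midpoint rule. The density $f(x) = 2x$ on $[0, 1]$ is a hypothetical example. It was chosen because its exact mean, $\int_0^1 2x^2\,\mathrm{d}x = 2/3$, is easy to verify by hand.

```python
def expectation_from_density(f, a, b, n=100_000):
    """Midpoint-rule approximation of E[X] = integral of x f(x) dx over [a, b].

    Assumes the density f is zero outside [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += x * f(x)
    return total * h

def triangular_density(x):
    """Hypothetical example density f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

mean = expectation_from_density(triangular_density, 0.0, 1.0)
print(mean)  # close to 2/3
```

A density with unbounded support would need the integral to be truncated or transformed before applying the same rule.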
General definition
In general, if $X$ is a random variable defined on a probability space $(\Omega, \Sigma, P)$, then the expected value of $X$, denoted by $\operatorname{E}[X]$, $\langle X \rangle$, $\overline{X}$ or $\mathbb{E}[X]$, is defined as the Lebesgue integral
$$\operatorname{E}[X] = \int_\Omega X \,\mathrm{d}P = \int_\Omega X(\omega)\, P(\mathrm{d}\omega).$$
When this integral exists, it is defined as the expectation of $X$. Not all random variables have a finite expected value, since the integral may not converge absolutely. For some random variables it is not defined at all (e.g., those following the Cauchy distribution). Two variables with the same probability distribution will have the same expected value, if it is defined.
It follows directly from the discrete case definition that if $X$ is a constant random variable, i.e. $X = b$ for some fixed real number $b$, then the expected value of $X$ is also $b$.
The expected value of an arbitrary function of $X$, $g(X)$, with respect to the probability density function $f(x)$ is given by the inner product of $f$ and $g$:
$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, \mathrm{d}x.$$
This is sometimes called the law of the unconscious statistician. Using representations as a Riemann–Stieltjes integral and integration by parts, the formula can be restated for differentiable $g$ (whenever the integrals involved are well defined) as
$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, \mathrm{d}\Pr(X \le x) = \begin{cases} g(a) + \int_a^\infty g'(x)\, \Pr(X > x)\, \mathrm{d}x & \text{if } \Pr(X \ge a) = 1, \\ g(b) - \int_{-\infty}^b g'(x)\, \Pr(X \le x)\, \mathrm{d}x & \text{if } \Pr(X \le b) = 1. \end{cases}$$
As a special case, let $\alpha$ denote a positive real number. Then
$$\operatorname{E}\big[|X|^\alpha\big] = \alpha \int_0^\infty t^{\alpha - 1}\, \Pr(|X| > t)\, \mathrm{d}t.$$
In particular, for $\alpha = 1$, this reduces to
$$\operatorname{E}[X] = \int_0^\infty \{1 - F(t)\}\, \mathrm{d}t \quad \text{if } \Pr(X \ge 0) = 1,$$
where $F$ is the cumulative distribution function of $X$.
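The practical content of the law of the unconscious statistician is that $\operatorname{E}[g(X)]$ can be computed from the distribution of $X$ without first deriving the distribution of $g(X)$. The Python sketch below checks this for the discrete analogue, $\operatorname{E}[g(X)] = \sum_x g(x)\Pr(X = x)$. It uses a hypothetical $X$ uniform on $\{-2, -1, 0, 1, 2\}$ and $g(x) = x^2$.

```python
from collections import defaultdict
from fractions import Fraction

pmf_x = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}  # hypothetical X

def g(x):
    return x * x

# Route 1 (law of the unconscious statistician): weight g(x) by P(X = x).
lotus = sum(g(x) * p for x, p in pmf_x.items())

# Route 2: build the distribution of Y = g(X) first, then take E[Y].
pmf_y = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_y[g(x)] += p
direct = sum(y * p for y, p in pmf_y.items())

print(lotus, direct)  # 2 2
```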
Conventional terminology
 When one speaks of the "expected price", "expected height", etc. one means the expected value of a random variable that is a price, a height, etc.
 When one speaks of the "expected number of attempts needed to get one successful attempt", one means the expected number of independent attempts up to and including the first success. If each attempt succeeds with the same probability, this equals the reciprocal of that probability. Cf. expected value of the geometric distribution.
Constants
The expected value of a constant is equal to the constant itself; i.e., if $c$ is a constant, then $\operatorname{E}[c] = c$.
Monotonicity
If $X$ and $Y$ are random variables such that $X \le Y$ almost surely, then $\operatorname{E}[X] \le \operatorname{E}[Y]$.
Linearity
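The expected value operator (or expectation operator) $\operatorname{E}$ is linear in the sense that
$$\operatorname{E}[X + c] = \operatorname{E}[X] + c,$$
$$\operatorname{E}[X + Y] = \operatorname{E}[X] + \operatorname{E}[Y],$$
$$\operatorname{E}[aX] = a \operatorname{E}[X].$$
Note that the second result is valid even if $X$ is not statistically independent of $Y$. Combining the results from the previous three equations, we can see that
$$\operatorname{E}[aX + bY] = a \operatorname{E}[X] + b \operatorname{E}[Y]$$
for any two random variables $X$ and $Y$ (which need to be defined on the same probability space) and any real numbers $a$ and $b$.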
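Linearity does not require independence. The Python sketch below checks $\operatorname{E}[aX + bY] = a\operatorname{E}[X] + b\operatorname{E}[Y]$ exactly for two deliberately dependent variables on the space of two fair dice. Here $X$ is the first die, $Y$ is the total, and the choice $a = 2$, $b = 3$ is arbitrary.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair dice, 36 equally likely ordered outcomes.
OUTCOMES = list(product(range(1, 7), repeat=2))
P = Fraction(1, len(OUTCOMES))

def E(rv):
    """Expectation of a random variable given as a function of the outcome."""
    return sum(rv(o) * P for o in OUTCOMES)

def X(o):
    return o[0]          # value of the first die

def Y(o):
    return o[0] + o[1]   # total of both dice, which depends on X

a, b = 2, 3
lhs = E(lambda o: a * X(o) + b * Y(o))
rhs = a * E(X) + b * E(Y)
print(lhs, rhs)  # 28 28
```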
Iterated expectation for discrete random variables
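For any two discrete random variables $X$, $Y$ one may define the conditional expectation:
$$\operatorname{E}[X \mid Y = y] = \sum_x x \cdot \Pr(X = x \mid Y = y),$$
which means that $\operatorname{E}[X \mid Y](y) = \operatorname{E}[X \mid Y = y]$ is a function of $y$.
Averaging this function over the distribution of $Y$ then recovers the expectation of $X$:
$$\begin{aligned}
\operatorname{E}\big[\operatorname{E}[X \mid Y]\big] &= \sum_y \operatorname{E}[X \mid Y = y] \cdot \Pr(Y = y) \\
&= \sum_y \Big( \sum_x x \cdot \Pr(X = x \mid Y = y) \Big) \cdot \Pr(Y = y) \\
&= \sum_y \sum_x x \cdot \Pr(X = x, Y = y) \\
&= \sum_x x \sum_y \Pr(X = x, Y = y) \\
&= \sum_x x \cdot \Pr(X = x) \\
&= \operatorname{E}[X].
\end{aligned}$$
Hence, the following equation holds:
$$\operatorname{E}[X] = \operatorname{E}\big[\operatorname{E}[X \mid Y]\big],$$
that is,
$$\operatorname{E}[X] = \sum_y \operatorname{E}[X \mid Y = y] \cdot \Pr(Y = y).$$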
The right-hand side of this equation is referred to as the iterated expectation and is also sometimes called the tower rule or the tower property. This proposition is also known as the law of total expectation.
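The tower property can be verified exactly on a small joint distribution. In this Python sketch the joint pmf is a hypothetical example: two fair dice are rolled, $X$ is the larger face, and $Y$ is the first die. The code computes $\operatorname{E}[X \mid Y = y]$ from the definition of conditional expectation and averages it over $y$.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Joint pmf of (X, Y): X = larger of two fair dice, Y = first die.
joint = defaultdict(Fraction)
for d1, d2 in product(range(1, 7), repeat=2):
    joint[(max(d1, d2), d1)] += Fraction(1, 36)

def p_y(y):
    """Marginal probability P(Y = y)."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def cond_exp_x(y):
    """E[X | Y = y] = sum over x of x * P(X = x | Y = y)."""
    return sum(x * p / p_y(y) for (x, yy), p in joint.items() if yy == y)

ys = sorted({y for (_, y) in joint})
tower = sum(cond_exp_x(y) * p_y(y) for y in ys)     # E[E[X | Y]]
direct = sum(x * p for (x, _), p in joint.items())  # E[X]
print(tower, direct)  # 161/36 161/36
```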
Iterated expectation for continuous random variables
In the continuous case, the results are completely analogous. The definition of conditional expectation would use inequalities, density functions, and integrals to replace equalities, mass functions, and summations, respectively. The main result still holds:
$$\operatorname{E}[X] = \operatorname{E}\big[\operatorname{E}[X \mid Y]\big].$$
Inequality
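If a random variable $X$ is always less than or equal to another random variable $Y$, the expectation of $X$ is less than or equal to that of $Y$:
If $X \le Y$, then $\operatorname{E}[X] \le \operatorname{E}[Y]$.
In particular, if we set $Y$ to $|X|$, we know $X \le Y$ and $-X \le Y$. Therefore we know $\operatorname{E}[X] \le \operatorname{E}[Y]$ and $\operatorname{E}[-X] \le \operatorname{E}[Y]$. From the linearity of expectation we know $-\operatorname{E}[X] \le \operatorname{E}[Y]$.
Therefore the absolute value of the expectation of a random variable is less than or equal to the expectation of its absolute value:
$$\big|\operatorname{E}[X]\big| \le \operatorname{E}\big[|X|\big].$$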
Non-multiplicativity
If one considers the joint probability density function of $X$ and $Y$, say $j(x, y)$, then the expectation of $XY$ is
$$\operatorname{E}[XY] = \iint x y \, j(x, y)\, \mathrm{d}x\, \mathrm{d}y.$$
In general, the expected value operator is not multiplicative, i.e. $\operatorname{E}[XY]$ is not necessarily equal to $\operatorname{E}[X] \cdot \operatorname{E}[Y]$. In fact, the amount by which multiplicativity fails is called the covariance:
$$\operatorname{Cov}(X, Y) = \operatorname{E}[XY] - \operatorname{E}[X]\operatorname{E}[Y].$$
Thus multiplicativity holds precisely when $\operatorname{Cov}(X, Y) = 0$, in which case $X$ and $Y$ are said to be uncorrelated (independent variables are a notable case of uncorrelated variables).
Now if $X$ and $Y$ are independent, then by definition $j(x, y) = f(x) g(y)$, where $f$ and $g$ are the marginal PDFs for $X$ and $Y$. Then
$$\operatorname{E}[XY] = \iint x y \, j(x, y)\, \mathrm{d}x\, \mathrm{d}y = \iint x f(x)\, y g(y)\, \mathrm{d}y\, \mathrm{d}x = \int x f(x)\, \mathrm{d}x \int y g(y)\, \mathrm{d}y = \operatorname{E}[X]\operatorname{E}[Y]$$
and $\operatorname{Cov}(X, Y) = 0$.
Observe that independence of $X$ and $Y$ is required only to write $j(x, y) = f(x) g(y)$, and this is required to establish the second equality above. The third equality follows from a basic application of the Fubini–Tonelli theorem.
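The following Python sketch gives a small exact check of these statements over two independent fair dice. The covariance of the two faces is zero. The covariance of the first face with the total is not, so multiplicativity fails there. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

OUTCOMES = list(product(range(1, 7), repeat=2))  # two independent fair dice
P = Fraction(1, len(OUTCOMES))

def E(f):
    """Expectation of f(d1, d2) over the 36 equally likely outcomes."""
    return sum(f(d1, d2) * P for d1, d2 in OUTCOMES)

def cov(f, g):
    """Cov = E[FG] - E[F]E[G], the amount by which multiplicativity fails."""
    return E(lambda d1, d2: f(d1, d2) * g(d1, d2)) - E(f) * E(g)

def first(d1, d2):
    return d1

def second(d1, d2):
    return d2

def total(d1, d2):
    return d1 + d2

print(cov(first, second))  # 0: independent faces, so E[XY] = E[X]E[Y]
print(cov(first, total))   # 35/12: dependent, so E[XY] != E[X]E[Y]
```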
Functional non-invariance
In general, the expectation operator and functions of random variables do not commute; that is,
$$\operatorname{E}[g(X)] = \int_\Omega g(X)\, \mathrm{d}P \neq g\big(\operatorname{E}[X]\big).$$
A notable inequality concerning this topic is Jensen's inequality, involving expected values of convex (or concave) functions.
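For example, $g(x) = 1/x$ is convex on the positive reals, so Jensen's inequality gives $\operatorname{E}[1/X] \ge 1/\operatorname{E}[X]$ for a positive $X$. The Python sketch below evaluates both sides exactly for a fair die.

```python
from fractions import Fraction

die = {face: Fraction(1, 6) for face in range(1, 7)}

def E(g):
    """E[g(X)] for a fair die, computed as the sum of g(x) P(X = x)."""
    return sum(g(x) * p for x, p in die.items())

def reciprocal(x):
    return 1 / Fraction(x)  # g(x) = 1/x, convex for x > 0

lhs = E(reciprocal)               # E[g(X)] = E[1/X]
rhs = reciprocal(E(lambda x: x))  # g(E[X]) = 1/E[X]
print(lhs, rhs, lhs >= rhs)  # 49/120 2/7 True
```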
Uses and applications
The expected values of the powers of $X$ are called the moments of $X$. The moments about the mean of $X$ are expected values of powers of $X - \operatorname{E}[X]$. The moments of some random variables can be used to specify their distributions, via their moment generating functions.
To empirically estimate the expected value of a random variable, one repeatedly measures observations of the variable and computes the arithmetic mean of the results. If the expected value exists, this procedure estimates the true expected value in an unbiased manner. It also minimizes the sum of the squares of the residuals (the sum of the squared differences between the observations and the estimate). The law of large numbers demonstrates (under fairly mild conditions) that, as the size of the sample gets larger, this estimate converges to the expected value. When $X$ has finite variance $\sigma^2$, the variance of the mean of $n$ independent observations is $\sigma^2/n$, so it gets smaller as the sample grows.
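The least-squares property of the sample mean can be seen directly. For any constant $c$, the sum $\sum_i (x_i - c)^2$ equals its value at the sample mean $\bar{x}$ plus $n(c - \bar{x})^2$. The Python sketch below uses simulated die rolls with an arbitrary seed and evaluates the sum of squared residuals at the sample mean and at nearby values.

```python
import random
import statistics

rng = random.Random(1)  # arbitrary seed, for reproducibility
observations = [rng.randint(1, 6) for _ in range(1_000)]  # simulated die rolls
estimate = statistics.fmean(observations)  # arithmetic mean of the observations

def sum_sq_residuals(c):
    """Sum of squared differences between the observations and c."""
    return sum((x - c) ** 2 for x in observations)

# Moving away from the sample mean in either direction increases the sum.
for shift in (-0.1, -0.01, 0.0, 0.01, 0.1):
    print(f"c = mean {shift:+.2f}: SSR = {sum_sq_residuals(estimate + shift):.3f}")
```

The smallest sum appears at a shift of 0.00.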
This property is often exploited in a wide variety of applications, including general problems of statistical estimation and machine learning, to estimate (probabilistic) quantities of interest via Monte Carlo methods. This works because most quantities of interest can be written in terms of expectation, e.g. $\Pr(X \in \mathcal{A}) = \operatorname{E}[I_{\mathcal{A}}(X)]$, where $I_{\mathcal{A}}$ is the indicator function for the set $\mathcal{A}$, i.e. $I_{\mathcal{A}}(x) = 1$ if $x \in \mathcal{A}$ and $I_{\mathcal{A}}(x) = 0$ otherwise.
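A minimal Monte Carlo sketch of this idea in Python follows. A point drawn uniformly from the unit square lands in the quarter disc $x^2 + y^2 \le 1$ with probability $\pi/4$. The code estimates that probability as the sample mean of the indicator of the set. The function names and the seed are illustrative.

```python
import math
import random

def estimate_probability(indicator, sampler, n, seed=0):
    """Monte Carlo estimate of P(X in A) as the sample mean of 1_A(X)."""
    rng = random.Random(seed)
    return sum(indicator(sampler(rng)) for _ in range(n)) / n

def uniform_square(rng):
    return rng.random(), rng.random()  # a point uniform on [0, 1)^2

def in_quarter_disc(point):
    x, y = point
    return 1 if x * x + y * y <= 1 else 0  # indicator of the set A

p_hat = estimate_probability(in_quarter_disc, uniform_square, 200_000)
print(p_hat, math.pi / 4)
```

With 200,000 samples the standard error is below 0.001, so the estimate typically agrees with $\pi/4 \approx 0.7854$ to two or three decimal places.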
In classical mechanics, the center of mass is an analogous concept to expectation. For example, suppose $X$ is a discrete random variable with values $x_i$ and corresponding probabilities $p_i$. Now consider a weightless rod on which weights are placed at locations $x_i$ along the rod, with masses $p_i$ (whose sum is one). The point at which the rod balances is $\operatorname{E}[X]$.
Expected values can also be used to compute the variance, by means of the computational formula for the variance
$$\operatorname{Var}(X) = \operatorname{E}[X^2] - \big(\operatorname{E}[X]\big)^2.$$
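For the fair die of Example 1, the Python sketch below compares the computational formula $\operatorname{Var}(X) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2$ with the definition $\operatorname{E}[(X - \operatorname{E}[X])^2]$ exactly.

```python
from fractions import Fraction

die = {face: Fraction(1, 6) for face in range(1, 7)}

mean = sum(x * p for x, p in die.items())                           # E[X] = 7/2
second_moment = sum(x ** 2 * p for x, p in die.items())             # E[X^2] = 91/6
var_shortcut = second_moment - mean ** 2                            # E[X^2] - (E[X])^2
var_definition = sum((x - mean) ** 2 * p for x, p in die.items())   # E[(X - E[X])^2]
print(var_shortcut, var_definition)  # 35/12 35/12
```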
A very important application of the expectation value is in the field of quantum mechanics. The expectation value of a quantum mechanical operator $\hat{A}$ operating on a quantum state vector $|\psi\rangle$ is written as $\langle \hat{A} \rangle = \langle \psi | \hat{A} | \psi \rangle$. The uncertainty in $\hat{A}$ can be calculated using the formula
$$(\Delta A)^2 = \langle \hat{A}^2 \rangle - \langle \hat{A} \rangle^2.$$
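The following is a numerical sketch in plain Python, with no linear-algebra library assumed. The example observable is the Pauli-Z operator and the normalized state is $|\psi\rangle = (3/5,\ 4i/5)$. For these, $\langle \hat{A} \rangle = \langle \psi | \hat{A} | \psi \rangle$ and $(\Delta A)^2 = \langle \hat{A}^2 \rangle - \langle \hat{A} \rangle^2$ give $\langle \hat{A} \rangle = -7/25$ and $\Delta A = 24/25$.

```python
import math

def apply(op, vec):
    """Multiply a 2x2 matrix (nested lists) by a length-2 vector."""
    return [op[0][0] * vec[0] + op[0][1] * vec[1],
            op[1][0] * vec[0] + op[1][1] * vec[1]]

def braket(u, v):
    """Inner product <u|v>, conjugating the entries of the bra u."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

sigma_z = [[1, 0], [0, -1]]  # Pauli-Z, used here only as an example observable
psi = [3 / 5, 4j / 5]         # normalized: |3/5|^2 + |4/5|^2 = 1

mean_a = braket(psi, apply(sigma_z, psi)).real                   # <A>
mean_a2 = braket(psi, apply(sigma_z, apply(sigma_z, psi))).real  # <A^2>
delta_a = math.sqrt(mean_a2 - mean_a ** 2)                       # uncertainty
print(mean_a, delta_a)  # about -0.28 and 0.96
```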
Expectation of matrices
If $X$ is an $m \times n$ matrix, then the expected value of the matrix is defined as the matrix of expected values:
$$\operatorname{E}[X] = \operatorname{E}\begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m,1} & x_{m,2} & \cdots & x_{m,n} \end{pmatrix} = \begin{pmatrix} \operatorname{E}[x_{1,1}] & \operatorname{E}[x_{1,2}] & \cdots & \operatorname{E}[x_{1,n}] \\ \operatorname{E}[x_{2,1}] & \operatorname{E}[x_{2,2}] & \cdots & \operatorname{E}[x_{2,n}] \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{E}[x_{m,1}] & \operatorname{E}[x_{m,2}] & \cdots & \operatorname{E}[x_{m,n}] \end{pmatrix}.$$
This is utilized in covariance matrices.
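As an example of the entry-wise definition, the covariance matrix of a random vector $Z$ is the expected value of the random matrix $(Z - \operatorname{E}[Z])(Z - \operatorname{E}[Z])^{\mathsf T}$. The Python sketch below computes it exactly for a hypothetical vector $Z$ = (first die, total of two dice). The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

OUTCOMES = list(product(range(1, 7), repeat=2))  # two fair dice
P = Fraction(1, len(OUTCOMES))

def expect_matrix(random_matrix, rows, cols):
    """Entry-wise expectation of a rows x cols random matrix,
    given as a function from an outcome to a nested list."""
    result = [[Fraction(0)] * cols for _ in range(rows)]
    for o in OUTCOMES:
        m = random_matrix(o)
        for i in range(rows):
            for j in range(cols):
                result[i][j] += m[i][j] * P
    return result

def z(o):
    """Random vector Z = (first die, total of both dice)."""
    return [o[0], o[0] + o[1]]

mu = expect_matrix(lambda o: [z(o)], 1, 2)[0]  # E[Z] as a 1 x 2 matrix: [7/2, 7]

def centered_outer(o):
    d = [zi - mi for zi, mi in zip(z(o), mu)]
    return [[d[i] * d[j] for j in range(2)] for i in range(2)]

cov_matrix = expect_matrix(centered_outer, 2, 2)
print(cov_matrix)
```

The result is [[35/12, 35/12], [35/12, 35/6]], printed as `Fraction` objects. The diagonal holds the variances and the off-diagonal entries hold the covariance.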
Discrete distribution taking only non-negative integer values
When a random variable takes only values in $\{0, 1, 2, 3, \ldots\}$, we can use the following formula for computing its expectation (even when the expectation is infinite):
$$\operatorname{E}[X] = \sum_{i=1}^\infty \Pr(X \ge i).$$
Proof:
$$\sum_{i=1}^\infty \Pr(X \ge i) = \sum_{i=1}^\infty \sum_{j=i}^\infty \Pr(X = j);$$
interchanging the order of summation (allowed because all terms are non-negative), we have
$$\sum_{i=1}^\infty \sum_{j=i}^\infty \Pr(X = j) = \sum_{j=1}^\infty \sum_{i=1}^{j} \Pr(X = j) = \sum_{j=1}^\infty j \Pr(X = j) = \operatorname{E}[X],$$
as claimed. This result can be a useful computational shortcut. For example, suppose we toss a coin where the probability of heads is $p$. How many tosses can we expect until the first heads (not including the heads itself)? Let $X$ be this number. Note that we are counting only the tails and not the heads which ends the experiment; in particular, we can have $X = 0$. The expectation of $X$ may be computed by $\sum_{i=1}^\infty (1 - p)^i = \frac{1 - p}{p}$. This is because $X \ge i$ exactly when the first $i$ tosses all yielded tails, which happens with probability $(1 - p)^i$. This matches the expectation of a geometric random variable counting the number of failures before the first success.
We used the formula for a geometric progression, here with $r = 1 - p$:
$$\sum_{k=1}^\infty r^k = \frac{r}{1 - r}, \qquad |r| < 1.$$
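The tail-sum formula $\operatorname{E}[X] = \sum_{i \ge 1} \Pr(X \ge i)$ is easy to check for the coin example. This Python sketch assumes $p = 1/3$ for illustration only, so the claimed value is $(1 - p)/p = 2$. It truncates the infinite sums after 200 terms, which leaves an error far below double precision.

```python
from fractions import Fraction

p = Fraction(1, 3)  # hypothetical probability of heads
N = 200             # truncation point for the infinite sums

def p_at_least(i):
    """P(X >= i): the first i tosses are all tails."""
    return (1 - p) ** i

tail_sum = sum(p_at_least(i) for i in range(1, N + 1))   # sum of P(X >= i)
direct = sum(k * (1 - p) ** k * p for k in range(N + 1))  # sum of k P(X = k)
print(float(tail_sum), float(direct), (1 - p) / p)  # all approximately 2
```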
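Continuous distribution taking non-negative values
Analogously with the discrete case above, when a continuous random variable $X$ takes only non-negative values, we can use the following formula for computing its expectation (even when the expectation is infinite):
$$\operatorname{E}[X] = \int_0^\infty \Pr(X \ge x)\, \mathrm{d}x.$$
Proof: It is first assumed that $X$ has a density $f_X(x)$ and cumulative distribution function $F(x)$. We present two techniques:
 Using integration by parts (a special case of the integration-by-parts formula for $\operatorname{E}[g(X)]$ in the general definition above):
$$\operatorname{E}[X] = \int_0^\infty x f_X(x)\, \mathrm{d}x = \Big[-x\big(1 - F(x)\big)\Big]_0^\infty + \int_0^\infty \big(1 - F(x)\big)\, \mathrm{d}x,$$
and the bracket vanishes because $x\big(1 - F(x)\big) \to 0$ as $x \to \infty$ (when $\operatorname{E}[X]$ is finite).
 Using an interchange in the order of integration:
$$\int_0^\infty \Pr(X \ge x)\, \mathrm{d}x = \int_0^\infty \int_x^\infty f_X(t)\, \mathrm{d}t\, \mathrm{d}x = \int_0^\infty \int_0^t f_X(t)\, \mathrm{d}x\, \mathrm{d}t = \int_0^\infty t f_X(t)\, \mathrm{d}t = \operatorname{E}[X].$$
In case no density exists, it is seen that
$$\operatorname{E}[X] = \int_0^\infty \int_0^x \mathrm{d}t\, \mathrm{d}F(x) = \int_0^\infty \int_t^\infty \mathrm{d}F(x)\, \mathrm{d}t = \int_0^\infty \big(1 - F(t)\big)\, \mathrm{d}t.$$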
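The formula can be checked numerically for an exponential distribution, whose survival function is $\Pr(X \ge x) = e^{-\lambda x}$ and whose mean is $1/\lambda$. The Python sketch below assumes a rate $\lambda = 2$ for illustration. It truncates the integral at $x = 20$, where the remaining tail $e^{-40}/2$ is negligible, and applies the midpoint rule.

```python
import math

def expectation_from_survival(survival, upper, n=200_000):
    """Approximate E[X] = integral over [0, inf) of P(X >= x) dx for X >= 0,
    truncating at `upper` and using the midpoint rule."""
    h = upper / n
    return h * sum(survival((i + 0.5) * h) for i in range(n))

RATE = 2.0  # hypothetical rate parameter lambda of an exponential distribution

def exponential_survival(x):
    return math.exp(-RATE * x)  # P(X >= x) = 1 - F(x) for x >= 0

estimate = expectation_from_survival(exponential_survival, upper=20.0)
print(estimate, 1 / RATE)  # both about 0.5
```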
History
The idea of the expected value originated in the middle of the 17th century from the study of the so-called problem of points. This problem is: how to divide the stakes in a fair way between two players who have to end their game before it is properly finished? This problem had been debated for centuries, and many conflicting proposals and solutions had been suggested over the years, when it was posed in 1654 to Blaise Pascal by the French writer Chevalier de Méré. De Méré claimed that this problem could not be solved and that it showed just how flawed mathematics was when it came to its application to the real world. Pascal, being a mathematician, was provoked and determined to solve the problem once and for all. He began to discuss the problem in a now famous series of letters to Pierre de Fermat. Soon enough they both independently came up with a solution. They solved the problem in different computational ways, but their results were identical because their computations were based on the same fundamental principle. The principle is that the value of a future gain should be directly proportional to the chance of getting it. This principle seemed to have come naturally to both of them. They were very pleased by the fact that they had found essentially the same solution, and this in turn made them absolutely convinced they had solved the problem conclusively. However, they did not publish their findings. They only informed a small circle of mutual scientific friends in Paris about it.
Three years later, in 1657, the Dutch mathematician Christiaan Huygens, who had just visited Paris, published a treatise "De ratiociniis in ludo aleæ" on probability theory. In this book he considered the problem of points and presented a solution based on the same principle as the solutions of Pascal and Fermat. Huygens also extended the concept of expectation by adding rules for how to calculate expectations in more complicated situations than the original problem (e.g., for three or more players). In this sense this book can be seen as the first successful attempt at laying down the foundations of the theory of probability.
In the foreword to his book, Huygens wrote: "It should be said, also, that for some time some of the best mathematicians of France have occupied themselves with this kind of calculus so that no one should attribute to me the honour of the first invention. This does not belong to me. But these savants, although they put each other to the test by proposing to each other many questions difficult to solve, have hidden their methods. I have had therefore to examine and go deeply for myself into this matter by beginning with the elements, and it is impossible for me for this reason to affirm that I have even started from the same principle. But finally I have found that my answers in many cases do not differ from theirs." Thus, Huygens learned about de Méré's problem in 1655 during his visit to France. Later, in 1656, he learned from his correspondence with Carcavi that his method was essentially the same as Pascal's. So before his book went to press in 1657, he knew about Pascal's priority in this subject.
Neither Pascal nor Huygens used the term "expectation" in its modern sense. In particular, Huygens writes: "That my Chance or Expectation to win any thing is worth just such a Sum, as wou'd procure me in the same Chance and Expectation at a fair Lay. ... If I expect a or b, and have an equal Chance of gaining them, my Expectation is worth (a+b)/2." More than a hundred years later, in 1814, Pierre-Simon Laplace published his tract "Théorie analytique des probabilités", where the concept of expected value was defined explicitly.
The use of the letter E to denote expected value goes back to W. A. Whitworth (1901) "Choice and chance". The symbol became popular because for English writers it stood for "Expectation", for Germans "Erwartungswert", and for French "Espérance mathématique".
See also
 Conditional expectation
 An inequality on location and scale parameters
 Expected value is also a key concept in economics, finance, and many other subjects
 The general term expectation
 Moment (mathematics)
 Expectation value (quantum mechanics)
 Wald's equation for calculating the expected value of a random number of random variables
Literature