Bias of an estimator

In statistics, the difference between an estimator's expected value and the true value of the parameter being estimated is called the bias. An estimator or decision rule having nonzero bias is said to be biased.

Although the term bias sounds pejorative, it is not necessarily used in that way in statistics. Biased estimators may have desirable properties. Not only do they sometimes have a smaller mean squared error than any unbiased estimator, but in some cases the only unbiased estimators are not even within the convex hull of the parameter space, so they can produce values that are impossible for the parameter (such as a negative estimate of a quantity that must be positive).

Definition


Suppose we are trying to estimate the parameter $\theta$ using an estimator $\hat{\theta}$ (that is, some function of the observed data). Then the bias of $\hat{\theta}$ is defined to be

$$\operatorname{Bias}[\hat{\theta}] = \operatorname{E}[\hat{\theta}] - \theta.$$

In words, this would be "the expected value of the estimator minus the true value $\theta$." This may be rewritten as

$$\operatorname{Bias}[\hat{\theta}] = \operatorname{E}[\hat{\theta} - \theta],$$

which would read "the expected value of the difference between the estimator and the true value" (the expected value of the constant $\theta$ is precisely $\theta$). An estimator is said to be unbiased if its bias is zero for every possible value of $\theta$.
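The definition lends itself to a simulation check: draw many samples from a distribution with a known $\theta$, apply the estimator to each, and subtract $\theta$ from the average estimate. The Python sketch below illustrates this under that assumption; `monte_carlo_bias` and its parameters are hypothetical names rather than part of any library, and the result only approximates the bias up to Monte Carlo error.

```python
import random
import statistics
from typing import Callable, Sequence


def monte_carlo_bias(
    estimator: Callable[[Sequence[float]], float],
    draw_sample: Callable[[random.Random], Sequence[float]],
    theta: float,
    trials: int = 20_000,
    seed: int = 0,
) -> float:
    """Approximate Bias = E[estimator] - theta by averaging over simulated samples."""
    rng = random.Random(seed)  # fixed seed so the approximation is reproducible
    estimates = [estimator(draw_sample(rng)) for _ in range(trials)]
    return statistics.fmean(estimates) - theta


# Sample mean of 10 normal draws (mean 3, sd 2): an unbiased estimator of 3.
mean_bias = monte_carlo_bias(
    statistics.fmean, lambda r: [r.gauss(3.0, 2.0) for _ in range(10)], theta=3.0
)
# Sample maximum of 4 Uniform(0, 1) draws: a biased estimator of the upper limit 1.
max_bias = monte_carlo_bias(
    max, lambda r: [r.uniform(0.0, 1.0) for _ in range(4)], theta=1.0
)
print(f"sample mean bias ~ {mean_bias:.3f}")  # should be near 0
print(f"sample max bias  ~ {max_bias:.3f}")   # should be near -1/(4 + 1) = -0.2
```

The second case uses the standard fact that the maximum of n draws from Uniform(0, θ) has expectation nθ/(n + 1), so its bias is −θ/(n + 1).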

Examples


Estimating variance


Suppose X1, ..., Xn are independent and identically distributed (i.i.d.) normal random variables with expectation μ and variance σ². Let

$$\overline{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

be the "sample average", and let

$$S^2 = \frac{1}{n}\sum_{i=1}^{n} \left(X_i - \overline{X}\right)^2$$

be a "sample variance".

Then S² is a "biased estimator" of σ² because

$$\operatorname{E}[S^2] = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2.$$

In other words, the expected value of this sample variance does not equal the population variance σ² unless it is multiplied by the normalization factor n/(n − 1); the rescaled estimator $\frac{n}{n-1}S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \overline{X})^2$ is unbiased.
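To see the factor (n − 1)/n numerically, one can simulate many normal samples and average both versions of the sample variance. The Python sketch below assumes n = 5, μ = 10, σ = 2, a fixed seed and 50,000 repetitions (all arbitrary illustrative choices); `sample_variances` and `average_variances` are hypothetical helper names. With these settings the average of S² should land near (n − 1)/n · σ² = 3.2 and the rescaled version near σ² = 4, up to simulation noise.

```python
import random


def sample_variances(xs):
    """Return (S^2 with divisor n, rescaled version with divisor n - 1) for one sample."""
    n = len(xs)
    xbar = sum(xs) / n                       # sample average
    ss = sum((x - xbar) ** 2 for x in xs)    # squared deviations from the sample average
    return ss / n, ss / (n - 1)


def average_variances(n, mu, sigma, trials=50_000, seed=1):
    """Average both variance estimators over many simulated i.i.d. normal samples."""
    rng = random.Random(seed)
    total_biased = total_rescaled = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        biased, rescaled = sample_variances(xs)
        total_biased += biased
        total_rescaled += rescaled
    return total_biased / trials, total_rescaled / trials


n, sigma = 5, 2.0
mean_biased, mean_rescaled = average_variances(n, mu=10.0, sigma=sigma)
print(f"average S^2         = {mean_biased:.3f}  vs (n-1)/n * sigma^2 = {(n - 1) / n * sigma**2:.3f}")
print(f"average n/(n-1) S^2 = {mean_rescaled:.3f}  vs sigma^2 = {sigma**2:.3f}")
```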

The reason that the above version of the sample variance is biased is that the sample mean is generally somewhat closer to the observations in the sample than the population mean is to those observations. This is so because the sample mean is, by definition, in the middle of the sample, while the population mean may even lie outside the sample; more precisely, the sample mean minimizes the sum of squared deviations, so the sum of squared deviations from the sample mean is never larger than the sum of squared deviations from the population mean. As a result, if the same formula is applied to both, the variance estimate will on average be somewhat smaller in the sample than in the population.
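The argument can be made exact with the decomposition $\sum_i (X_i - \mu)^2 = \sum_i (X_i - \overline{X})^2 + n(\overline{X} - \mu)^2$. Taking expectations, the left side has expectation $n\sigma^2$ and the last term has expectation $n \operatorname{Var}(\overline{X}) = \sigma^2$, so $\operatorname{E}\left[\sum_i (X_i - \overline{X})^2\right] = (n-1)\sigma^2$, which is where the factor (n − 1)/n comes from. The short Python check below (with a hypothetical helper `squared_deviation_sums`) verifies the decomposition and the resulting inequality on random samples; it illustrates the algebra rather than proving it.

```python
import random


def squared_deviation_sums(xs, mu):
    """Return (sum about sample mean, sum about mu, n * (xbar - mu)^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    about_sample_mean = sum((x - xbar) ** 2 for x in xs)
    about_true_mean = sum((x - mu) ** 2 for x in xs)
    return about_sample_mean, about_true_mean, n * (xbar - mu) ** 2


rng = random.Random(42)
for _ in range(1_000):
    size = rng.randint(2, 20)
    xs = [rng.gauss(0.0, 1.0) for _ in range(size)]
    ss_xbar, ss_mu, gap = squared_deviation_sums(xs, mu=0.0)
    # Decomposition: sum (x - mu)^2 = sum (x - xbar)^2 + n (xbar - mu)^2
    assert abs(ss_mu - (ss_xbar + gap)) < 1e-9
    # The gap term is non-negative, so the sample-mean sum is never larger
    assert ss_xbar <= ss_mu + 1e-12
print("decomposition and inequality hold on all simulated samples")
```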

Estimating a Poisson probability

A far more extreme case of a biased estimator being better than any unbiased estimator is well known: Suppose X has a Poisson distribution with expectation λ. It is desired to estimate

$$\operatorname{P}(X=0)^2 = e^{-2\lambda}$$

from this single observation X.

(For example, when incoming calls at a telephone switchboard are modeled as a Poisson process, and λ is the average number of calls per minute, then $e^{-2\lambda}$ is the probability that no calls arrive in the next two minutes.)

Since the expectation of an unbiased estimator δ(X) is equal to the estimand, i.e.

$$\operatorname{E}[\delta(X)] = \sum_{x=0}^{\infty} \delta(x)\,\frac{\lambda^x e^{-\lambda}}{x!} = e^{-2\lambda},$$

the only function of the data constituting an unbiased estimator is

$$\delta(x) = (-1)^x.$$

To see this, note that when factoring $e^{-\lambda}$ out of the above expression for the expectation, the sum that is left is a Taylor series expansion of $e^{-\lambda}$ as well, yielding $e^{-\lambda}e^{-\lambda} = e^{-2\lambda}$ (see Characterizations of the exponential function). Uniqueness follows from the same comparison: multiplying both sides by $e^{\lambda}$ gives $\sum_x \delta(x)\lambda^x/x! = e^{-\lambda} = \sum_x (-1)^x \lambda^x/x!$ for every λ > 0, and the coefficients of a power series are uniquely determined.
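The unbiasedness of $(-1)^X$ can also be checked numerically by summing the Poisson series directly. This Python sketch truncates the infinite sum after 200 terms (an assumption that is harmless for the moderate values of λ used here) and compares the result with $e^{-2\lambda}$; `poisson_expectation` and `unbiased_estimator` are illustrative names only.

```python
import math


def poisson_expectation(g, lam, terms=200):
    """E[g(X)] for X ~ Poisson(lam), summing the first `terms` terms of the series."""
    total = 0.0
    pmf = math.exp(-lam)          # P(X = 0)
    for x in range(terms):
        total += g(x) * pmf
        pmf *= lam / (x + 1)      # P(X = x + 1) = P(X = x) * lam / (x + 1)
    return total


def unbiased_estimator(x):
    """The estimator (-1)^x of exp(-2 * lam)."""
    return (-1) ** x


for lam in (0.5, 1.0, 3.0, 10.0):
    lhs = poisson_expectation(unbiased_estimator, lam)
    print(f"lam={lam:5.1f}  E[(-1)^X]={lhs:.6e}  exp(-2 lam)={math.exp(-2 * lam):.6e}")

# The estimates for X = 100 and X = 101
print(unbiased_estimator(100), unbiased_estimator(101))  # 1 -1
```

The last line reproduces the two estimates discussed next: 1 for X = 100 and −1 for X = 101.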

If the observed value of X is 100, then the estimate is 1, although the true value of the quantity being estimated is obviously very likely to be near 0, which is the opposite extreme. And if X is observed to be 101, then the estimate is even more absurd: it is −1, although the quantity being estimated obviously must be positive.

The (biased) maximum likelihood estimator

$$e^{-2X}$$

is far better than this unbiased estimator. Not only is its value always positive, but it is also more accurate in the sense that its mean squared error (MSE)

$$e^{\lambda(e^{-4}-1)} - 2e^{\lambda(e^{-2}-3)} + e^{-4\lambda}$$

is smaller; compare the unbiased estimator's MSE of

$$1 - e^{-4\lambda}.$$

The MSEs are functions of the true value λ. The bias of the maximum-likelihood estimator is:

$$e^{\lambda(e^{-2}-1)} - e^{-2\lambda}.$$
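The MSE formulas and the bias can be cross-checked by evaluating the expectations as truncated Poisson series and comparing them with the closed-form expressions (written out in the code). The sketch below is a minimal Python check under that approach; the helper names are hypothetical, and the 200-term truncation is an assumption adequate for the λ values shown. For each λ, the series and closed-form values should agree to floating-point precision, and the MLE's MSE should come out well below that of $(-1)^X$.

```python
import math


def poisson_expectation(g, lam, terms=200):
    """E[g(X)] for X ~ Poisson(lam), summing the first `terms` terms of the series."""
    total, pmf = 0.0, math.exp(-lam)
    for x in range(terms):
        total += g(x) * pmf
        pmf *= lam / (x + 1)
    return total


def mse(estimator, lam):
    """Mean squared error of estimator(X) as an estimate of exp(-2 * lam)."""
    target = math.exp(-2 * lam)
    return poisson_expectation(lambda x: (estimator(x) - target) ** 2, lam)


def unbiased(x):
    return (-1) ** x


def mle(x):
    return math.exp(-2 * x)


# Closed-form expressions for the two MSEs and the MLE's bias
def mse_unbiased_closed(lam):
    return 1 - math.exp(-4 * lam)


def mse_mle_closed(lam):
    return (math.exp(lam * (math.exp(-4) - 1))
            - 2 * math.exp(lam * (math.exp(-2) - 3))
            + math.exp(-4 * lam))


def bias_mle_closed(lam):
    return math.exp(lam * (math.exp(-2) - 1)) - math.exp(-2 * lam)


for lam in (0.1, 1.0, 5.0):
    print(f"lam={lam}: MSE unbiased={mse(unbiased, lam):.4f} "
          f"(closed {mse_unbiased_closed(lam):.4f}), "
          f"MSE MLE={mse(mle, lam):.4f} (closed {mse_mle_closed(lam):.4f})")
```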

Maximum of a discrete uniform distribution

The bias of maximum-likelihood estimators can be substantial. Consider a case where n tickets numbered from 1 through to n are placed in a box and one is selected at random, giving a value X. If n is unknown, then the maximum-likelihood estimator of n is X, even though the expectation of X is only (n + 1)/2; we can only be certain that n is at least X and is probably more. In this case, the natural unbiased estimator is 2X − 1.
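These expectations can be checked exactly: since X is uniform on {1, ..., n}, E[X] = (n + 1)/2, so the maximum-likelihood estimator X has bias (n + 1)/2 − n = −(n − 1)/2, while E[2X − 1] = n. The Python sketch below (with a hypothetical helper `expectation_uniform_tickets`) computes both expectations with exact rational arithmetic from `fractions.Fraction`, so no simulation noise is involved.

```python
from fractions import Fraction


def expectation_uniform_tickets(g, n):
    """E[g(X)] when X is drawn uniformly from the tickets 1, ..., n."""
    return sum(Fraction(g(x)) for x in range(1, n + 1)) / n


for n in (1, 2, 7, 100):
    mle_mean = expectation_uniform_tickets(lambda x: x, n)               # E[X]
    unbiased_mean = expectation_uniform_tickets(lambda x: 2 * x - 1, n)  # E[2X - 1]
    print(n, mle_mean, unbiased_mean)
```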

Effect of transformations

Note that when a transformation is applied to an unbiased estimator, the result is not necessarily itself an unbiased estimate of its corresponding population statistic. That is, for a non-linear function f and an unbiased estimator U of a parameter p, f(U) is usually not an unbiased estimator of f(p). For example, the square root of the unbiased estimator of the population variance is not an unbiased estimator of the population standard deviation. Writing $s^2$ for that unbiased variance estimator, the square root is strictly concave, so Jensen's inequality gives $\operatorname{E}[\sqrt{s^2}] < \sqrt{\operatorname{E}[s^2]} = \sigma$ whenever $s^2$ is not almost surely constant; the square root therefore underestimates σ on average.
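A small simulation makes the effect visible. The Python sketch below assumes normal samples of size n = 5 with σ = 2, a fixed seed and 20,000 repetitions (arbitrary illustrative choices; `average_variance_and_sd` is a hypothetical name). It averages both $s^2$ and $s = \sqrt{s^2}$: the first average should be close to σ² = 4, while the second should sit visibly below σ = 2.

```python
import math
import random


def average_variance_and_sd(n, sigma, trials=20_000, seed=7):
    """Average s^2 (divisor n - 1) and s = sqrt(s^2) over simulated normal samples of size n."""
    rng = random.Random(seed)
    total_var = total_sd = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # unbiased for sigma^2
        total_var += s2
        total_sd += math.sqrt(s2)                         # square root: biased for sigma
    return total_var / trials, total_sd / trials


mean_var, mean_sd = average_variance_and_sd(n=5, sigma=2.0)
print(f"average s^2 = {mean_var:.3f}  (sigma^2 = 4)")
print(f"average s   = {mean_sd:.3f}  (sigma = 2)")
```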

See also

  • Omitted-variable bias
  • Consistency (statistics)