Mean squared error - AbsoluteAstronomy.com

In statistics, the mean squared error (MSE) of an estimator is one of many ways to quantify the difference between values implied by an estimator and the true values of the quantity being estimated. MSE is a risk function, corresponding to the expected value of the squared error loss or quadratic loss. MSE measures the average of the squares of the "errors." The error is the amount by which the value implied by the estimator differs from the quantity to be estimated. The difference occurs because of randomness or because the estimator doesn't account for information that could produce a more accurate estimate.

The MSE is the second moment (about the origin) of the error, and thus incorporates both the variance of the estimator and its bias. For an unbiased estimator, the MSE is the variance. Like the variance, MSE has the same units of measurement as the square of the quantity being estimated. In an analogy to standard deviation, taking the square root of MSE yields the root mean square error or root mean square deviation (RMSE or RMSD), which has the same units as the quantity being estimated; for an unbiased estimator, the RMSE is the square root of the variance, known as the standard deviation.

Definition and basic properties

The MSE of an estimator $\hat{\theta}$ with respect to the estimated parameter $\theta$ is defined as

$$\operatorname{MSE}(\hat{\theta}) = \operatorname{E}\left[(\hat{\theta} - \theta)^2\right].$$

The MSE is equal to the sum of the variance and the squared bias of the estimator:

$$\operatorname{MSE}(\hat{\theta}) = \operatorname{Var}(\hat{\theta}) + \left(\operatorname{Bias}(\hat{\theta}, \theta)\right)^2.$$

The MSE thus assesses the quality of an estimator in terms of its variation and unbiasedness. Note that the MSE is not equivalent to the expected value of the absolute error.

Since MSE is an expectation, it is not a random variable. It may be a function of the unknown parameter $\theta$, but it does not depend on any random quantities. However, when MSE is computed for a particular estimator of $\theta$, the true value of which is not known, it will be subject to estimation error. In a Bayesian sense, this means that there are cases in which it may be treated as a random variable.
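The decomposition of MSE into variance plus squared bias can be checked numerically. The following is a minimal sketch, not a definitive implementation: the estimator `shrunk_mean`, the Gaussian population, and all numeric settings are assumptions chosen purely for illustration. It applies a deliberately biased estimator to many simulated samples and compares the empirical MSE with the empirical variance plus squared bias.

```python
import random
import statistics


def simulate_estimates(estimator, theta, sigma, n, trials, seed=0):
    """Draw `trials` Gaussian samples of size n and apply `estimator` to each."""
    rng = random.Random(seed)
    return [
        estimator([rng.gauss(theta, sigma) for _ in range(n)])
        for _ in range(trials)
    ]


def empirical_mse(estimates, theta):
    """Average squared error of the estimates around the true value."""
    return statistics.fmean((e - theta) ** 2 for e in estimates)


def shrunk_mean(sample):
    """Hypothetical biased estimator: the sample mean shrunk toward zero."""
    return 0.8 * statistics.fmean(sample)


theta = 5.0  # true parameter, known here only because the data are simulated
estimates = simulate_estimates(shrunk_mean, theta, sigma=2.0, n=10, trials=5000)

mse = empirical_mse(estimates, theta)
variance = statistics.pvariance(estimates)   # variance of the estimates (divide by N)
bias = statistics.fmean(estimates) - theta   # average estimate minus true value

# The two printed numbers agree up to floating-point rounding.
print(mse, variance + bias ** 2)
```

For the empirical moments the identity holds exactly, so the two printed values differ only by rounding. In this setup the bias is close to -1, which dominates the MSE even though the shrunk estimator has a smaller variance than the plain sample mean.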

Alternative usages

In regression analysis, the term mean squared error is sometimes used to refer to the estimate of error variance: the residual sum of squares divided by the number of degrees of freedom. This is an observed quantity given a particular sample (and hence is sample-dependent), whereas the MSE defined above is an expectation that depends only on the estimator and the unknown parameters of the underlying probability distribution. For more details, see errors and residuals in statistics.

Also in regression analysis, "mean squared error", often referred to as "out-of-sample mean squared error", can refer to the mean value of the squared deviations of the predictions from the true values, over an out-of-sample test space, generated by a model estimated over a particular sample space. This also is an observed quantity, and it varies by sample and by out-of-sample test space.
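The two regression usages can be illustrated side by side. The sketch below rests on assumptions made only for the example: a simulated straight-line relationship with unit noise variance, a closed-form simple least-squares fit, and the helper names `fit_line`, `residual_sum_of_squares`, and `make_data`. It computes the residual sum of squares divided by the degrees of freedom on the fitting sample, and the mean squared prediction error on a separate held-out sample.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_sum_of_squares(xs, ys, intercept, slope):
    """Sum of squared differences between observations and fitted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


rng = random.Random(1)


def make_data(n):
    """Simulated data: y = 2 + 0.5 x + noise, with noise variance 1 (assumed)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 1.0) for x in xs]
    return xs, ys


train_x, train_y = make_data(200)
test_x, test_y = make_data(200)
intercept, slope = fit_line(train_x, train_y)

# Estimate of error variance: RSS divided by degrees of freedom.
# Two coefficients were fitted, so there are n - 2 degrees of freedom.
rss = residual_sum_of_squares(train_x, train_y, intercept, slope)
error_variance_estimate = rss / (len(train_x) - 2)

# Out-of-sample MSE: mean squared prediction error on data not used for fitting.
out_of_sample_mse = residual_sum_of_squares(test_x, test_y, intercept, slope) / len(test_x)

print(error_variance_estimate, out_of_sample_mse)
```

Both numbers should land near the simulated noise variance of 1, but each changes with the particular sample drawn, which is the sense in which they are observed, sample-dependent quantities.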

Examples

Suppose we have a random sample of size n from a population, $X_1, \dots, X_n$. The usual estimator for the mean is the sample average

$$\overline{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,$$

which has an expected value of μ (so it is unbiased) and a mean square error of

$$\operatorname{MSE}(\overline{X}) = \operatorname{E}\left[(\overline{X} - \mu)^2\right] = \left(\frac{\sigma}{\sqrt{n}}\right)^2 = \frac{\sigma^2}{n}.$$

For a Gaussian distribution this is the best unbiased estimator (that is, it has the lowest MSE among all unbiased estimators), but not, say, for a uniform distribution.

The usual estimator for the variance is

$$S^2_{n-1} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \overline{X}\right)^2.$$

This is unbiased (its expected value is $\sigma^2$), and its MSE is

$$\operatorname{MSE}(S^2_{n-1}) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right) = \frac{1}{n}\left(\gamma_2 + \frac{2n}{n-1}\right)\sigma^4,$$

where $\mu_4$ is the fourth central moment of the distribution or population and $\gamma_2 = \mu_4/\sigma^4 - 3$ is the excess kurtosis.

However, one can use other estimators for $\sigma^2$ which are proportional to $S^2_{n-1}$, and an appropriate choice can always give a lower mean square error. If we define

$$S^2_a = \frac{n-1}{a} S^2_{n-1} = \frac{1}{a}\sum_{i=1}^{n}\left(X_i - \overline{X}\right)^2,$$

then the MSE is

$$\operatorname{MSE}(S^2_a) = \operatorname{E}\left[\left(\frac{n-1}{a} S^2_{n-1} - \sigma^2\right)^2\right] = \frac{n-1}{n a^2}\left[(n-1)\gamma_2 + n^2 + n\right]\sigma^4 - \frac{2(n-1)}{a}\sigma^4 + \sigma^4.$$

This is minimized when

$$a = \frac{(n-1)\gamma_2 + n^2 + n}{n} = n + 1 + \frac{n-1}{n}\gamma_2.$$

For a Gaussian distribution, where $\gamma_2 = 0$, this means the MSE is minimized when dividing the sum by $a = n + 1$, whereas for a Bernoulli distribution with p = 1/2 (a coin flip), $\gamma_2 = -2$, the MSE is minimized for $a = n - 1 + 2/n$. (Note that this particular case of the Bernoulli distribution has the lowest possible excess kurtosis; this can be proved by Jensen's inequality as follows. The fourth central moment is an upper bound for the square of the variance, so the least value for their ratio is one; therefore, the least value for the excess kurtosis is -2, achieved, for instance, by a Bernoulli with p = 1/2.) So no matter what the kurtosis, we get a "better" estimate (in the sense of having a lower MSE) by scaling down the unbiased estimator a little bit. Even among unbiased estimators, if the distribution is not Gaussian the best (minimum mean square error) estimator of the variance may not be $S^2_{n-1}$.
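The effect of the divisor can be evaluated directly. The sketch below assumes the standard i.i.d. result that dividing the sum of squared deviations by a gives an estimator with MSE equal to σ⁴ times (n − 1)[(n − 1)γ₂ + n² + n]/(n a²) − 2(n − 1)/a + 1, where γ₂ denotes the excess kurtosis. The function names and the choice n = 10 are illustrative only.

```python
def mse_scaled_variance(n, a, excess_kurtosis, sigma2=1.0):
    """MSE of (1/a) * sum((x_i - xbar)^2) as an estimator of sigma2, i.i.d. data."""
    sigma4 = sigma2 ** 2
    second_moment = (n - 1) / (n * a ** 2) * ((n - 1) * excess_kurtosis + n ** 2 + n)
    return (second_moment - 2 * (n - 1) / a + 1) * sigma4


def optimal_divisor(n, excess_kurtosis):
    """Divisor a that minimizes the MSE above: n + 1 + (n - 1)/n * excess kurtosis."""
    return n + 1 + (n - 1) / n * excess_kurtosis


n = 10
gaussian_a = optimal_divisor(n, 0.0)      # Gaussian: excess kurtosis 0, gives n + 1
bernoulli_a = optimal_divisor(n, -2.0)    # fair coin: excess kurtosis -2, gives n - 1 + 2/n

# Gaussian MSEs (with sigma2 = 1) for the divisors n - 1, n, and n + 1.
for a in (n - 1, n, n + 1):
    print(a, mse_scaled_variance(n, a, 0.0))
```

With n = 10 the Gaussian MSEs come out as 2/9, 19/100, and 2/11 for the divisors 9, 10, and 11, so the unbiased choice has the largest MSE of the three. For the fair coin the optimal divisor is 9.2, which is still larger than n − 1.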

The following table gives several estimators of the true parameters of the population, μ and σ², for the Gaussian case.

True value	Estimator	Mean squared error
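θ = μ	$\hat{\theta}$ = the unbiased estimator of the population mean, $\overline{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$	$\operatorname{MSE}(\overline{X}) = \operatorname{E}\left[(\overline{X} - \mu)^2\right] = \frac{\sigma^2}{n}$
θ = σ²	$\hat{\theta}$ = the unbiased estimator of the population variance, $S^2_{n-1} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \overline{X})^2$	$\operatorname{MSE}(S^2_{n-1}) = \operatorname{E}\left[(S^2_{n-1} - \sigma^2)^2\right] = \frac{2}{n-1}\sigma^4$
θ = σ²	$\hat{\theta}$ = the biased estimator of the population variance, $S^2_{n} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \overline{X})^2$	$\operatorname{MSE}(S^2_{n}) = \operatorname{E}\left[(S^2_{n} - \sigma^2)^2\right] = \frac{2n-1}{n^2}\sigma^4$
θ = σ²	$\hat{\theta}$ = the biased estimator of the population variance, $S^2_{n+1} = \frac{1}{n+1}\sum_{i=1}^{n}(X_i - \overline{X})^2$	$\operatorname{MSE}(S^2_{n+1}) = \operatorname{E}\left[(S^2_{n+1} - \sigma^2)^2\right] = \frac{2}{n+1}\sigma^4$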

Note that:

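The MSEs shown for the variance estimators assume $X_i \sim \mathrm{N}(\mu, \sigma^2)$ i.i.d. so that $\frac{(n-1)S^2_{n-1}}{\sigma^2} \sim \chi^2_{n-1}$. The result for $S^2_{n-1}$ follows easily from the $\chi^2_{n-1}$ variance that is $2n - 2$.
Unbiased estimators may not produce estimates with the smallest total variation (as measured by MSE): the MSE of $S^2_{n-1}$ is larger than that of $S^2_{n+1}$ or $S^2_{n}$.
Estimators with the smallest total variation may produce biased estimates: $S^2_{n+1}$ underestimates σ² by $\frac{2}{n+1}\sigma^2$ on average.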
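The Gaussian comparison of variance estimators can also be checked by simulation. This is a rough Monte Carlo sketch under stated assumptions: standard normal data, n = 10, 20000 trials, a fixed seed, and helper names invented for the example. It compares the simulated MSE for the divisors n − 1, n, and n + 1 with the closed-form Gaussian values 2σ⁴/(n − 1), (2n − 1)σ⁴/n², and 2σ⁴/(n + 1).

```python
import random


def sum_squared_deviations(sample):
    """Sum of squared deviations of the sample from its own mean."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)


def simulated_mse(n, divisors, sigma=1.0, trials=20000, seed=0):
    """Monte Carlo MSE of sum_squared_deviations / a for each divisor a."""
    rng = random.Random(seed)
    squared_error = {a: 0.0 for a in divisors}
    for _ in range(trials):
        ss = sum_squared_deviations([rng.gauss(0.0, sigma) for _ in range(n)])
        for a in divisors:
            squared_error[a] += (ss / a - sigma ** 2) ** 2
    return {a: total / trials for a, total in squared_error.items()}


n = 10
results = simulated_mse(n, divisors=(n - 1, n, n + 1))

# Closed-form Gaussian values with sigma = 1, for comparison.
theory = {n - 1: 2 / (n - 1), n: (2 * n - 1) / n ** 2, n + 1: 2 / (n + 1)}

for a in (n - 1, n, n + 1):
    print(a, round(results[a], 3), round(theory[a], 3))
```

The simulated values should sit close to the theoretical ones, with the unbiased divisor n − 1 giving the largest MSE. All three estimators are evaluated on the same simulated samples, which keeps the comparison between them less noisy than three independent runs would be.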

Interpretation

An MSE of zero, meaning that the estimator $\hat{\theta}$ predicts observations of the parameter $\theta$ with perfect accuracy, is the ideal, but is practically never possible.

Values of MSE may be used for comparative purposes. Two or more statistical models may be compared using their MSEs as a measure of how well they explain a given set of observations: the unbiased model with the smallest MSE is generally interpreted as best explaining the variability in the observations and is called the best unbiased estimator or MVUE (Minimum Variance Unbiased Estimator).

Both analysis of variance and linear regression techniques estimate the MSE as part of the analysis and use the estimated MSE to determine the statistical significance of the factors or predictors under study. The goal of experimental design is to construct experiments in such a way that when the observations are analyzed, the MSE is close to zero relative to the magnitude of at least one of the estimated treatment effects.

MSE is also used in several stepwise regression techniques as part of the determination as to how many predictors from a candidate set to include in a model for a given set of observations.

Applications

Minimizing MSE is a key criterion in selecting estimators: see minimum mean-square error. Among unbiased estimators, minimizing the MSE is equivalent to minimizing the variance, and the minimum is attained by the MVUE. However, a biased estimator may have lower MSE; see estimator bias.

In statistical modelling the MSE, representing the difference between the actual observations and the response predicted by the model, is used to assess whether the model fits the data adequately and whether the model can be simplified by removing terms.

As a loss function

Squared error loss is one of the most widely used loss functions in statistics, though its widespread use stems more from mathematical convenience than considerations of actual loss in applications. Carl Friedrich Gauss, who introduced the use of mean squared error, was aware of its arbitrariness and was in agreement with objections to it on these grounds. The mathematical benefits of mean squared error are particularly evident in its use in analyzing the performance of linear regression, as it allows one to partition the variation in a dataset into variation explained by the model and variation explained by randomness.

Criticism
The use of mean squared error without question has been criticized by the decision theorist James Berger. Mean squared error is the negative of the expected value of one specific utility function, the quadratic utility function, which may not be the appropriate utility function to use under a given set of circumstances. There are, however, some scenarios where mean squared error can serve as a good approximation to a loss function occurring naturally in an application.

Like variance, mean squared error has the disadvantage of heavily weighting outliers. This is a result of the squaring of each term, which effectively weights large errors more heavily than small ones. This property, undesirable in many applications, has led researchers to use alternatives such as the mean absolute error, or those based on the median.
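The sensitivity to outliers is easy to see on a small set of errors. In the sketch below the error values and the function names are made up for illustration. It computes the mean squared error, the mean absolute error, and a median-based alternative (the median absolute error) before and after a single large error is added.

```python
import statistics


def mean_squared_error(errors):
    """Average of the squared errors."""
    return sum(e * e for e in errors) / len(errors)


def mean_absolute_error(errors):
    """Average of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)


def median_absolute_error(errors):
    """Median of the absolute errors, one possible median-based measure."""
    return statistics.median(abs(e) for e in errors)


clean = [1.0, -1.0, 1.0, -1.0]
with_outlier = clean + [10.0]   # one large error added

for name, errors in (("clean", clean), ("with outlier", with_outlier)):
    print(
        name,
        mean_squared_error(errors),
        mean_absolute_error(errors),
        median_absolute_error(errors),
    )
# clean         1.0   1.0  1.0
# with outlier  20.8  2.8  1.0
```

One outlier raises the MSE from 1.0 to 20.8, while the mean absolute error rises only to 2.8 and the median absolute error does not move at all.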


See also