In statistics, the **fraction of variance unexplained (FVU)** in the context of a regression task is the fraction of variance of the regressand *Y* which cannot be explained, i.e., which is not correctly predicted, by the explanatory variables *X*.

For a more general definition of explained/unexplained variation/randomness/variance, see the article explained variation.

## Formal definition

Given a regression function *ƒ* yielding for each *y*_{i}, 1 ≤ *i* ≤ *N*, an estimate *ŷ*_{i} = *ƒ*(*x*_{i}), we have:

$$\mathrm{FVU} = \frac{SS_{\text{err}}}{SS_{\text{tot}}} = 1 - \frac{SS_{\text{reg}}}{SS_{\text{tot}}} = 1 - R^2,$$

where *R*^{2} is the coefficient of determination, and *SS*_{err} (the sum of squared prediction errors, equivalently the residual sum of squares), *SS*_{tot} (the total sum of squares), and *SS*_{reg} (the sum of squares of the regression, equivalently the explained sum of squares) are given by

$$SS_{\text{err}} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2, \qquad SS_{\text{tot}} = \sum_{i=1}^{N} (y_i - \bar{y})^2, \qquad SS_{\text{reg}} = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2,$$

where $\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i$ is the sample mean of the regressand.
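As a concrete illustration, the sums of squares and the FVU can be computed directly from data. The following is a minimal sketch using NumPy; the data points and the least-squares line are invented for the example:

```python
import numpy as np

# Toy data (invented for illustration): a noisy linear relationship.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 5.8])

# Ordinary least-squares fit y ~ a*x + b. Fitting with an intercept
# guarantees the decomposition SS_tot = SS_err + SS_reg holds exactly.
a, b = np.polyfit(x, y, 1)
y_hat = a * x + b

ss_err = np.sum((y - y_hat) ** 2)         # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares
ss_reg = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares

fvu = ss_err / ss_tot                     # fraction of variance unexplained
r_squared = 1.0 - fvu                     # coefficient of determination
```

Note that the identity FVU = 1 − *SS*_{reg}/*SS*_{tot} relies on the decomposition *SS*_{tot} = *SS*_{err} + *SS*_{reg}, which holds for a least-squares fit with an intercept; the ratio *SS*_{err}/*SS*_{tot} is meaningful for any predictor.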

Alternatively, the fraction of variance unexplained can be defined as:

$$\mathrm{FVU} = \frac{\mathrm{MSE}(f)}{\mathrm{Var}[Y]},$$

where MSE(*ƒ*) is the mean squared error of the regression function *ƒ*.
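In-sample, the two definitions coincide: the MSE is *SS*_{err}/*N* and the (biased, 1/*N*) variance of *Y* is *SS*_{tot}/*N*, so their ratio is *SS*_{err}/*SS*_{tot}. A short sketch, with data invented for the example:

```python
import numpy as np

# Toy data (invented for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 5.8])
a, b = np.polyfit(x, y, 1)
y_hat = a * x + b

# MSE(f) = SS_err / N; np.var uses the 1/N convention by default,
# so Var(y) = SS_tot / N, and the N cancels in the ratio.
mse = np.mean((y - y_hat) ** 2)
var_y = np.var(y)
fvu = mse / var_y
```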

## Explanation

It is useful to consider the second definition to get the idea behind the FVU. When trying to predict *Y*, the most naïve regression function that we can think of is the constant function predicting the mean of *Y*, i.e., *ƒ*(*x*_{i}) = *ȳ* for all *i*. It follows that the MSE of this function equals the variance of *Y*; that is, *SS*_{err} = *SS*_{tot} and *SS*_{reg} = 0. In this case, no variation in *Y* can be accounted for, and the FVU then has its maximum value of 1.

More generally, the FVU will be 1 if the explanatory variables *X* tell us nothing about *Y*, in the sense that the predicted values of *Y* do not covary with *Y*. But as prediction gets better and the MSE can be reduced, the FVU goes down. In the case of perfect prediction, where *ŷ*_{i} = *y*_{i} for all *i*, the MSE is 0, *SS*_{err} = 0, *SS*_{reg} = *SS*_{tot}, and the FVU is 0.
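The two extreme cases can be checked numerically. A minimal sketch (the data vector is invented for the example):

```python
import numpy as np

# Toy data (invented for illustration).
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 5.8])

def fvu(y, y_hat):
    """Fraction of variance unexplained: MSE of the predictions
    divided by the (1/N-convention) variance of y."""
    return np.mean((y - y_hat) ** 2) / np.var(y)

# Naive predictor: always predict the mean of y. Its MSE equals
# Var(y), so the FVU takes its maximum value of 1.
fvu_mean = fvu(y, np.full_like(y, y.mean()))

# Perfect predictor: y_hat = y. The MSE is 0, so the FVU is 0.
fvu_perfect = fvu(y, y)
```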

## See also

- Coefficient of determination
- Correlation
- Regression analysis
- Linear regression