Explained sum of squares - AbsoluteAstronomy.com

Statistics

Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

, the explained sum of squares (ESS) is a quantity used in describing how well a model, often a regression model

Regression analysis

In statistics, regression analysis includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables...

, represents the data being modelled. In particular, the explained sum of squares measures how much variation there is in the modelled values and this is compared to the total sum of squares

Total sum of squares

In statistical data analysis the total sum of squares is a quantity that appears as part of a standard way of presenting results of such analyses...

, which measures how much variation there is in the observed data, and to the residual sum of squares

Residual sum of squares

In statistics, the residual sum of squares is the sum of squares of residuals. It is also known as the sum of squared residuals or the sum of squared errors of prediction . It is a measure of the discrepancy between the data and an estimation model...

, which measures the variation in the modelling errors.

Definition

The explained sum of squares (ESS) is the sum of the squares of the deviations of the predicted values from the mean value of a response variable, in a standard regression model — for example, , where y_i is the i ^th observation of the response variable, x_ji is the i ^th observation of the j ^th explanatory variable, a and b_i are coefficient

Coefficient

In mathematics, a coefficient is a multiplicative factor in some term of an expression ; it is usually a number, but in any case does not involve any variables of the expression...

s, i indexes the observations from 1 to n, and ε_i is the i ^th value of the error term. In general, the greater the ESS, the better the estimated model performs.

If

and

are the estimated coefficient

Coefficient

In mathematics, a coefficient is a multiplicative factor in some term of an expression ; it is usually a number, but in any case does not involve any variables of the expression...

s, then

is the i^th predicted value of the response variable. The ESS is the sum of the squares of the differences of the predicted values and the mean value of the response variable:

In general: total sum of squares

Total sum of squares

In statistical data analysis the total sum of squares is a quantity that appears as part of a standard way of presenting results of such analyses...

= explained sum of squares + residual sum of squares

Residual sum of squares

Partitioning in simple linear regression

The following equality, stating that the total sum of squares equals the residual sum of squares plus the explained sum of squares, is generally true in simple linear regression:

Simple derivation

Square both sides and sum over all i:

Simple linear regression

In statistics, simple linear regression is the least squares estimator of a linear regression model with a single explanatory variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that makes the sum of squared residuals of the model as...

gives

. What follows depends on this.

Again simple linear regression

Simple linear regression

gives

Partitioning in the general OLS model

The general regression model with n observations and k explanators, the first of which is a constant unit vector whose coefficient is the regression intercept, is

where y is an n × 1 vector of dependent variable observations, each column of the n × k matrix X is a vector of observations on one of the k explanators,

is a k × 1 vector of true coefficients, and e is an n× 1 vector of the true underlying errors. The ordinary least squares

Ordinary least squares

In statistics, ordinary least squares or linear least squares is a method for estimating the unknown parameters in a linear regression model. This method minimizes the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear...

estimator for

The residual vector

, so the residual sum of squares

is, after simplification,

Denote as

the constant vector all of whose elements are the sample mean

of the dependent variable values in the vector y. Then the total sum of squares is

The explained sum of squares, defined as the sum of squared deviations of the predicted values from the observed mean of y, is

Using

in this, and simplifying to obtain

, gives the result that TSS = ESS + RSS if and only if

. The left side of this is

times the sum of the elements of y, and the right side is

times the sum of the elements of

, so the condition is that the sum of the elements of y equals the sum of the elements of

, or equivalently that the sum of the prediction errors (residuals)

is zero. This can be seen to be true by noting the well-known OLS property that the k × 1 vector

: since the first column of X is a vector of ones, the first element of this vector

is the sum of the residuals and is equal to zero. This proves that the condition holds for the result that TSS = ESS + RSS.

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.