Lack-of-fit sum of squares
In statistics, a sum of squares due to lack of fit, or more tersely a lack-of-fit sum of squares, is one of the components of a partition of the sum of squares in an analysis of variance, used in the numerator in an F-test of the null hypothesis that says that a proposed model fits well.

Sketch of the idea

In order to have a lack-of-fit sum of squares, one observes more than one value of the response variable for each value of the set of predictor variables. For example, consider fitting a line

y = \alpha x + \beta

by the method of least squares. One takes as estimates of α and β the values that minimize the sum of squares of residuals, i.e., the sum of squares of the differences between each observed y-value and the corresponding fitted y-value. To have a lack-of-fit sum of squares, one observes more than one y-value for each x-value. One then partitions the "sum of squares due to error", i.e., the sum of squares of residuals, into two components:
sum of squares due to error = (sum of squares due to "pure" error) + (sum of squares due to lack of fit).


The sum of squares due to "pure" error is the sum of squares of the differences between each observed y-value and the average of all y-values corresponding to the same x-value.

The sum of squares due to lack of fit is the weighted sum of squares of the differences between each average of the y-values corresponding to the same x-value and the corresponding fitted y-value, the weight in each case being simply the number of observed y-values for that x-value.


In order that the sum of squares due to error equal the sum of these two components, it is necessary that the vector whose components are the "pure errors" and the vector of lack-of-fit components be orthogonal to each other, and one may check that they are orthogonal by doing some algebra.
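A minimal numerical sketch of this partition (not from the original article; the data values below are made up for illustration) fits a line by least squares to replicated observations and checks that the residual sum of squares splits into the pure-error and lack-of-fit pieces:

```python
import numpy as np

# Made-up example: replicated y-observations at each distinct x-value.
x = np.array([1, 1, 1, 2, 2, 3, 3, 3, 4, 4], dtype=float)
y = np.array([1.9, 2.1, 2.3, 3.8, 4.2, 6.3, 5.9, 6.1, 7.7, 8.3])

# Straight line fitted by least squares: y = a*x + b.
a, b = np.polyfit(x, y, deg=1)
fitted = a * x + b

# Sum of squares due to error (residual sum of squares).
sse = np.sum((y - fitted) ** 2)

# Average of the y-values sharing the same x-value, one entry per observation.
means = np.array([y[x == xi].mean() for xi in x])

# Pure-error component: each y compared with its own group average.
ss_pure_error = np.sum((y - means) ** 2)

# Lack-of-fit component: group average versus fitted value; summing over
# observations weights each squared difference by the replicate count n_i.
ss_lack_of_fit = np.sum((means - fitted) ** 2)

print(sse, ss_pure_error + ss_lack_of_fit)  # the two totals agree
```

The identity holds exactly (up to floating-point rounding) because, within each group of replicates, a residual splits into (observation minus group mean) plus (group mean minus fitted value), and the cross terms cancel, which is the orthogonality mentioned above.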

Mathematical details

Consider fitting a line with one predictor variable, where i is an index of each of the n distinct x-values, j is an index of an observation for a given x-value, and n_i is the number of y-values observed for the i-th x-value. The value of each observation can be represented by

Y_{ij} = \alpha x_i + \beta + \varepsilon_{ij}, \qquad i = 1, \dots, n, \quad j = 1, \dots, n_i.
Let

\widehat{\alpha}, \widehat{\beta}

be the least-squares estimates of the unobservable parameters α and β, based on the observed values of x_i and Y_ij.

Let

\widehat{Y}_i = \widehat{\alpha} x_i + \widehat{\beta}

be the fitted values of the response variable. Then

\widehat{\varepsilon}_{ij} = Y_{ij} - \widehat{Y}_i

are the residuals, which are observable estimates of the unobservable values of the error term ε_ij. Because of the nature of the method of least squares, the whole vector of residuals, with

N = \sum_{i=1}^{n} n_i

scalar components, necessarily satisfies the two constraints

\sum_{i=1}^{n} \sum_{j=1}^{n_i} \widehat{\varepsilon}_{ij} = 0

\sum_{i=1}^{n} \left( x_i \sum_{j=1}^{n_i} \widehat{\varepsilon}_{ij} \right) = 0.
It is thus constrained to lie in an (N − 2)-dimensional subspace of R^N, i.e. there are N − 2 "degrees of freedom for error".
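As a quick illustration of these two constraints (a sketch using the same made-up data as above, not part of the original article), one can check numerically that the least-squares residual vector is orthogonal both to the vector of ones and to the vector of x-values:

```python
import numpy as np

# Same made-up replicated data as in the earlier sketch.
x = np.array([1, 1, 1, 2, 2, 3, 3, 3, 4, 4], dtype=float)
y = np.array([1.9, 2.1, 2.3, 3.8, 4.2, 6.3, 5.9, 6.1, 7.7, 8.3])

a, b = np.polyfit(x, y, deg=1)   # least-squares slope and intercept
residuals = y - (a * x + b)      # one residual per observation, N in total

# The two linear constraints on the residual vector (both ~0 up to rounding),
# which is why only N - 2 of its components are free to vary.
print(residuals.sum())           # orthogonal to the vector of ones
print((x * residuals).sum())     # orthogonal to the vector of x-values
```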

Now let

\overline{Y}_{i\bullet} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}

be the average of all Y-values associated with the i-th x-value.

We partition the sum of squares due to error into two components:

\sum_{i=1}^{n} \sum_{j=1}^{n_i} \widehat{\varepsilon}_{ij}^{\,2}
  = \underbrace{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( Y_{ij} - \overline{Y}_{i\bullet} \right)^2}_{\text{sum of squares due to pure error}}
  + \underbrace{\sum_{i=1}^{n} n_i \left( \overline{Y}_{i\bullet} - \widehat{Y}_i \right)^2}_{\text{sum of squares due to lack of fit}}.
Sums of squares

Suppose the error terms ε_ij are independent and normally distributed with expected value 0 and variance σ². We treat each x_i as constant rather than random. Then the response variables Y_ij are random only because the errors ε_ij are random.

It can be shown to follow that if the straight-line model is correct, then the sum of squares due to error divided by the error variance,

\frac{1}{\sigma^2} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \widehat{\varepsilon}_{ij}^{\,2},

has a chi-squared distribution with N − 2 degrees of freedom.

Moreover:
  • The sum of squares due to pure error, divided by the error variance σ², has a chi-squared distribution with N − n degrees of freedom;
  • The sum of squares due to lack of fit, divided by the error variance σ², has a chi-squared distribution with n − 2 degrees of freedom;
  • The two sums of squares are probabilistically independent.
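These distributional claims can be illustrated with a small Monte Carlo sketch (not from the original article; the design, parameter values, and replicate counts are arbitrary choices): simulating from a correct straight-line model, the scaled pure-error and lack-of-fit sums of squares should average roughly N − n and n − 2, the means of the corresponding chi-squared distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary design: n = 5 distinct x-values, 3 replicates each, so N = 15.
x = np.repeat([1.0, 2.0, 3.0, 4.0, 5.0], 3)
alpha, beta, sigma = 2.0, 1.0, 0.5   # a correct straight-line model
N, n = len(x), len(np.unique(x))

ss_pe, ss_lof = [], []
for _ in range(5000):
    y = alpha * x + beta + rng.normal(0.0, sigma, size=N)
    a, b = np.polyfit(x, y, deg=1)
    fitted = a * x + b
    means = np.array([y[x == xi].mean() for xi in x])
    ss_pe.append(np.sum((y - means) ** 2) / sigma**2)
    ss_lof.append(np.sum((means - fitted) ** 2) / sigma**2)

# A chi-squared variable's mean equals its degrees of freedom.
print(np.mean(ss_pe), N - n)    # both close to 10
print(np.mean(ss_lof), n - 2)   # both close to 3
```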

The test statistic

It then follows that the statistic

F = \frac{\text{lack-of-fit sum of squares} / (n - 2)}{\text{pure-error sum of squares} / (N - n)}
  = \frac{\sum_{i=1}^{n} n_i \left( \overline{Y}_{i\bullet} - \widehat{Y}_i \right)^2 / (n - 2)}{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( Y_{ij} - \overline{Y}_{i\bullet} \right)^2 / (N - n)}
has an F-distribution with n − 2 degrees of freedom in the numerator and N − n in the denominator, provided that the straight-line model is correct. If the model is wrong, then the probability distribution of the denominator is still as stated above, and the numerator and denominator are still independent. But the numerator then has a noncentral chi-squared distribution, and consequently the quotient as a whole has a non-central F-distribution.

One uses this F-statistic to test the null hypothesis that the straight-line model is right. Since the non-central F-distribution is stochastically larger than the (central) F-distribution, one rejects the null hypothesis if the F-statistic is too big. How big is too big depends on the level of the test: the critical value is the corresponding percentage point of the F-distribution.
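Putting the pieces together, here is an end-to-end sketch of the test (illustrative only; the data are made up, and SciPy's F-distribution routines are used for the p-value and the critical value):

```python
import numpy as np
from scipy import stats

# Made-up data with replicates at each x-value.
x = np.array([1, 1, 1, 2, 2, 3, 3, 3, 4, 4], dtype=float)
y = np.array([1.9, 2.1, 2.3, 3.8, 4.2, 6.3, 5.9, 6.1, 7.7, 8.3])

N, n = len(x), len(np.unique(x))          # N observations, n distinct x-values

a, b = np.polyfit(x, y, deg=1)
fitted = a * x + b
means = np.array([y[x == xi].mean() for xi in x])

ss_lof = np.sum((means - fitted) ** 2)    # lack-of-fit SS (weights n_i built in)
ss_pe = np.sum((y - means) ** 2)          # pure-error SS

F = (ss_lof / (n - 2)) / (ss_pe / (N - n))
p_value = stats.f.sf(F, n - 2, N - n)       # upper-tail probability under H0
critical = stats.f.ppf(0.95, n - 2, N - n)  # 5%-level percentage point

print(F, p_value, critical)
```

With the 5% level chosen here, one rejects the straight-line model when the computed F exceeds the critical value, equivalently when the p-value falls below 0.05.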

The assumptions of normal distribution of errors and statistical independence can be shown to entail that this lack-of-fit test is the likelihood-ratio test of this null hypothesis.