Principal component regression
In statistics, principal component regression (PCR) is a regression analysis technique that uses principal component analysis when estimating regression coefficients. It is a procedure used to overcome the problems that arise when the explanatory variables are close to being collinear.

In PCR, instead of regressing the dependent variable on the independent variables directly, the principal components of the independent variables are used as regressors.
Typically only a subset of the principal components is used in the regression, making PCR a kind of regularized estimation.

Often the principal components with the highest variance are selected.
However, the low-variance principal components may also be important, in some cases even more important.
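
A minimal sketch of this idea in Python is given below, assuming the NumPy and scikit-learn libraries. scikit-learn has no dedicated PCR estimator, so PCR is obtained here by chaining principal component analysis with an ordinary least squares fit; the synthetic data and the choice of two retained components are purely illustrative.

  # A minimal PCR sketch: PCA followed by ordinary least squares in a pipeline.
  import numpy as np
  from sklearn.decomposition import PCA
  from sklearn.linear_model import LinearRegression
  from sklearn.pipeline import make_pipeline

  rng = np.random.default_rng(0)

  # Synthetic data with nearly collinear explanatory variables.
  n = 200
  x1 = rng.normal(size=n)
  x2 = x1 + rng.normal(scale=0.01, size=n)   # almost a copy of x1
  x3 = rng.normal(size=n)
  X = np.column_stack([x1, x2, x3])
  y = 2.0 * x1 + 0.5 * x3 + rng.normal(scale=0.1, size=n)

  # Keep only the first two principal components (those with the highest
  # variance), then regress y on them by ordinary least squares.
  pcr = make_pipeline(PCA(n_components=2), LinearRegression())
  pcr.fit(X, y)
  print("R^2 on the training data:", pcr.score(X, y))

Because PCA in this pipeline retains the components with the highest variance, the sketch illustrates the common default described above rather than selection by correlation with the dependent variable.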

The principle

As a regression method, PCR can be divided into three steps:
  1. The first step is to run a principal components analysis on the table of the explanatory variables.
  2. The second step is to run an ordinary least squares regression (linear regression) on the selected components: the components that are most correlated with the dependent variable are the ones selected.
  3. Finally, the parameters of the model are computed for the selected explanatory variables, as illustrated in the sketch after this list.
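
A rough NumPy sketch of these three steps is given below. The function name pcr_fit, the number k of retained components, and the choice to rank components by their absolute correlation with the dependent variable are illustrative assumptions rather than a fixed part of the method.

  # A rough sketch of the three PCR steps; k and the selection rule are
  # illustrative choices, not the only possible ones.
  import numpy as np

  def pcr_fit(X, y, k):
      # Centre the data so that the principal components are well defined.
      x_mean, y_mean = X.mean(axis=0), y.mean()
      Xc, yc = X - x_mean, y - y_mean

      # Step 1: principal components analysis of the explanatory variables,
      # via the singular value decomposition of the centred data matrix.
      U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
      T = Xc @ Vt.T        # scores: the data expressed in the components

      # Select the k components most correlated with the dependent variable.
      corr = np.abs([np.corrcoef(T[:, j], yc)[0, 1] for j in range(T.shape[1])])
      keep = np.argsort(corr)[::-1][:k]

      # Step 2: ordinary least squares regression of y on the selected scores.
      gamma, *_ = np.linalg.lstsq(T[:, keep], yc, rcond=None)

      # Step 3: express the fit as coefficients on the original variables.
      beta = Vt.T[:, keep] @ gamma
      intercept = y_mean - x_mean @ beta
      return intercept, beta

Ranking the components by variance instead would simply replace the correlation step with the first k columns of the score matrix.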

See also

  • Canonical correlation
  • Partial least squares regression
  • Total sum of squares
