Least-angle regression
In statistics, least-angle regression (LARS) is a regression algorithm for high-dimensional data, developed by Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani.

Suppose we expect a response variable to be determined by a linear combination of a subset of potential covariates. Then the LARS algorithm provides a means of producing an estimate of which variables to include, as well as their coefficients.

Instead of giving a vector result, the LARS solution consists of a curve denoting the solution for each value of the L1 norm of the parameter vector. The algorithm is similar to forward stepwise regression, but instead of including variables at each step, the estimated parameters are increased in a direction equiangular to each one's correlations with the residual.
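As a concrete sketch of this variable-selection behavior, the following uses scikit-learn's Lars estimator on synthetic data (the data, dimensions, and stopping choice here are illustrative assumptions, not part of the original algorithm description):

```python
# A minimal sketch of LARS-based variable selection, assuming scikit-learn
# is available; the synthetic data and names below are illustrative only.
import numpy as np
from sklearn.linear_model import Lars

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))

# The response depends on only two of the ten potential covariates.
beta = np.zeros(p)
beta[0], beta[3] = 3.0, -2.0
y = X @ beta + 0.1 * rng.standard_normal(n)

model = Lars(n_nonzero_coefs=2)   # stop once two variables have entered
model.fit(X, y)
print(model.active_)   # indices of the covariates LARS selected
print(model.coef_)     # estimated coefficients (zero for unselected ones)
```

With a strong signal and little noise, the two truly active covariates enter the path first, giving both the selected subset and its coefficient estimates in one fit.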

The advantages of the LARS method are:
  1. It is computationally just as fast as forward selection.
  2. It produces a full piecewise linear solution path, which is useful in cross-validation or similar attempts to tune the model.
  3. If two variables are almost equally correlated with the response, then their coefficients should increase at approximately the same rate. The algorithm thus behaves as intuition would expect, and also is more stable.
  4. It is easily modified to produce solutions for other estimators, like the lasso.
  5. It is effective in contexts where p >> n (i.e., when the number of dimensions is significantly greater than the number of points).


The disadvantages of the LARS method include:
  1. With any amount of noise in the dependent variable and with high-dimensional multicollinear independent variables, there is no reason to believe that the selected variables will have a high probability of being the actual underlying causal variables. This problem is not unique to LARS; it is a general problem with variable selection approaches that seek to find underlying deterministic components. Yet, because LARS is based upon an iterative refitting of the residuals, it would appear to be especially sensitive to the effects of noise. This problem is discussed in detail by Weisberg in the discussion section of the Efron et al. (2004) Annals of Statistics article. Weisberg provides an empirical example, based on a re-analysis of data originally used to validate LARS, showing that the variable selection appears to have problems with highly correlated variables.
  2. Since almost all high dimensional data in the real world will just by chance exhibit some fair degree of collinearity across at least some variables, the problem that LARS has with correlated variables may limit its application to high dimensional data.
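The correlated-variables issue can be seen in a synthetic toy example (this is an illustrative sketch of the general phenomenon, not Weisberg's original re-analysis):

```python
# Toy illustration of LARS with nearly collinear covariates, assuming
# scikit-learn; a synthetic sketch, not Weisberg's re-analysis.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(2)
n = 40
x = rng.standard_normal(n)
# Columns 0 and 1 are nearly identical copies of the same signal.
X = np.column_stack([
    x,
    x + 1e-3 * rng.standard_normal(n),
    rng.standard_normal(n),
])
y = x + 0.1 * rng.standard_normal(n)

_, active, _ = lars_path(X, y, method="lar")
# Either of the two near-duplicates may enter the path first; tiny
# perturbations of the noise decide which, so the identity of the
# "selected" variable carries little meaning.
first = active[0]
print(first in (0, 1))   # True: a near-duplicate enters first
```

Which of the two near-duplicate columns is chosen is essentially arbitrary, which is exactly why interpreting the selected set causally is hazardous under collinearity.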
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.