Sliced inverse regression

Sliced inverse regression (SIR) is a tool for dimension reduction in the field of multivariate statistics.

In statistics, regression analysis is a popular way of studying the relationship between a response variable $y$ and its explanatory variable $x$, which is a $p$-dimensional vector. Several approaches fall under the term regression: parametric methods include multiple linear regression, and non-parametric techniques include local smoothing.

With high-dimensional data (as $p$ grows), the number of observations needed to use local smoothing methods escalates exponentially. Reducing the number of dimensions makes the operation computable. Dimension reduction aims to show only the most important directions of the data. SIR uses the inverse regression curve, $E(x \mid y)$, to perform a weighted principal component analysis, with which one identifies the effective dimension reducing directions.

This article first introduces the subject of dimension reduction and the model on which SIR is based. A short review of inverse regression follows, and the later sections bring these pieces together.

Model

Given a response variable $Y$ and a (random) vector $X \in \mathbb{R}^p$ of explanatory variables, SIR is based on the model

$$Y = f(\beta_1^\top X, \ldots, \beta_k^\top X, \varepsilon) \qquad (1)$$

where $\beta_1, \ldots, \beta_k$ are unknown projection vectors. $k$ is an unknown number (the dimensionality of the space we try to reduce our data to) and, as we want to reduce the dimension, it is smaller than $p$. $f$ is an unknown function on $\mathbb{R}^{k+1}$, as it depends only on the $k$ projections and the error term, and $\varepsilon$ is the error with $E[\varepsilon \mid X] = 0$ and finite variance $\sigma^2$. The model describes an ideal solution, where $Y$ depends on $X \in \mathbb{R}^p$ only through a $k$-dimensional subspace. That is, one can reduce the dimension of the explanatory variable from $p$ to a smaller number $k$ without losing any information.

An equivalent version of the model (1) is: the conditional distribution of $Y$ given $X$ depends on $X$ only through the $k$-dimensional variable $(\beta_1^\top X, \ldots, \beta_k^\top X)$. This perfectly reduced variable can be seen as being as informative as the original $X$ in explaining $Y$.

The unknown $\beta_i$ are called the effective dimension reducing directions (EDR-directions). The space spanned by these vectors is called the effective dimension reducing space (EDR-space).
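To make the model concrete, the following is a minimal simulation sketch in Python with NumPy. Everything specific in it is an assumption chosen for illustration and is not part of the article: the link function `link`, the choice $k = 2$ with the first two coordinate axes as EDR-directions, the standard normal design, and the noise level `sigma`.

```python
import numpy as np

def link(proj, eps):
    # Hypothetical link function f: it sees X only through the k = 2
    # projections in `proj`, plus the error term.
    return proj[:, 0] / (0.5 + (proj[:, 1] + 1.5) ** 2) + eps

def simulate_model(n=500, p=5, sigma=0.1, seed=0):
    """Draw (X, Y) from Y = f(beta_1' X, beta_2' X, eps) with assumed settings."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))       # assumed design: standard normal
    beta = np.zeros((p, 2))
    beta[0, 0] = 1.0                      # assumed EDR-direction beta_1 = e_1
    beta[1, 1] = 1.0                      # assumed EDR-direction beta_2 = e_2
    eps = sigma * rng.standard_normal(n)  # error with mean zero, finite variance
    Y = link(X @ beta, eps)
    return X, Y, beta

X, Y, beta = simulate_model()

# Moving X in directions orthogonal to the EDR-space does not change f.
X_moved = X.copy()
X_moved[:, 2:] += 10.0
same = np.allclose(link(X @ beta, 0.0), link(X_moved @ beta, 0.0))
print(X.shape, Y.shape, same)
```

The last check illustrates the statement of the model: although $X$ has $p = 5$ coordinates here, the response reacts only to the two projections, so the remaining three directions carry no information about $Y$.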

Relevant linear algebra background

To help visualize the model, here is a short review of vector spaces.

For the definition of a vector space and some further properties, see the article Linear Algebra and Gram-Schmidt Orthogonalization or any textbook on linear algebra; only the facts most important for understanding the model are mentioned here.

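As the EDR-space is a $k$-dimensional subspace, we need to know what a subspace is. A non-empty subset $U \subseteq \mathbb{R}^n$ is called a subspace of $\mathbb{R}^n$ if it holds that

$$a, b \in U \Rightarrow a + b \in U$$
$$a \in U,\ \lambda \in \mathbb{R} \Rightarrow \lambda a \in U$$

Given $a_1, \ldots, a_r \in \mathbb{R}^n$, then $V := L(a_1, \ldots, a_r)$, the set of all linear combinations of these vectors, is called a linear subspace and is therefore a vector space. One says that the vectors $a_1, \ldots, a_r$ span $V$. But the vectors that span a space $V$ are not unique. This leads us to the concept of a basis and the dimension of a vector space: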

A set $B = \{b_1, \ldots, b_r\}$ of linearly independent vectors of a vector space $V$ is called a basis of $V$ if it holds that

$$V = L(b_1, \ldots, b_r)$$

The dimension of $V \subseteq \mathbb{R}^n$ is equal to the maximum number of linearly independent vectors in $V$. A set of $n$ linearly independent vectors of $\mathbb{R}^n$ forms a basis of $\mathbb{R}^n$. The dimension of a vector space is unique, but the basis itself is not: several bases can span the same space.
Linearly dependent vectors also span a space, but its dimension is smaller than the number of vectors; two dependent vectors, for example, span at most a straight line. As we are searching for a $k$-dimensional subspace, we are interested in finding $k$ linearly independent vectors that span the $k$-dimensional subspace we want to project our data on.
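The facts about spans, bases and dependence can be checked numerically. The sketch below uses small example matrices chosen purely for illustration; `same_span` is a hypothetical helper name, and the rank comparison it uses is one simple way to test whether two sets of vectors span the same subspace.

```python
import numpy as np

# Two different bases (as columns) of the same 2-dimensional subspace of R^3.
B1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0]])
B2 = np.array([[1.0, 1.0],
               [1.0, -1.0],
               [0.0, 0.0]])

# Linearly dependent vectors: the second column is twice the first.
D = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

def same_span(A, B):
    """Return True if the columns of A and of B span the same subspace."""
    rank_a = np.linalg.matrix_rank(A)
    rank_b = np.linalg.matrix_rank(B)
    # Stacking the columns adds no new direction exactly when the spans agree.
    rank_ab = np.linalg.matrix_rank(np.hstack([A, B]))
    return bool(rank_a == rank_b == rank_ab)

print(same_span(B1, B2))          # different bases, same subspace
print(np.linalg.matrix_rank(D))   # dependent columns span only a line
```

For SIR this means that the EDR-directions themselves are not identifiable, only the space they span; any $k$ linearly independent vectors in the EDR-space serve equally well.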

Curse of dimensionality

The reason why we want to reduce the dimension of the data is the "curse of dimensionality" and, of course, graphical purposes. The curse of dimensionality is due to the rapid increase in volume when more dimensions are added to a (mathematical) space. For example, consider 100 observations from support $[0, 1]$, which cover the interval quite well, and compare them to 100 observations from the corresponding $p$-dimensional unit hypercube for a large $p$ (say $p = 10$), which are isolated points in a vast empty space. It is easy to draw inferences about the underlying properties of the data in the first case, whereas in the latter it is not. For more information, see the article Curse of dimensionality.
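The comparison of 100 points on an interval with 100 points in a high-dimensional cube can be illustrated with a small experiment. The sketch below is an assumption-laden illustration, not part of the article: it draws uniform points, uses the average distance to the nearest neighbour as a rough measure of how isolated the points are, and takes 10 as the high dimension. The function name `mean_nearest_neighbor_distance` is hypothetical.

```python
import numpy as np

def mean_nearest_neighbor_distance(n, dim, seed=0):
    """Average distance from each of n uniform points in [0, 1]^dim
    to its nearest neighbour among the other points."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(size=(n, dim))
    # Pairwise Euclidean distances via broadcasting (n x n matrix).
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)   # ignore the distance of a point to itself
    return dist.min(axis=1).mean()

d1 = mean_nearest_neighbor_distance(100, 1)
d10 = mean_nearest_neighbor_distance(100, 10)
print(d1, d10)
```

With the same number of observations, the nearest neighbour in the one-dimensional case is very close, while in ten dimensions it is typically a substantial fraction of the side length away, which is why local smoothing needs far more data as $p$ grows.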

Inverse regression

Computing the inverse regression curve (IR) means that instead of looking for

$E[Y \mid X = x]$, which is a regression surface over $\mathbb{R}^p$,

we calculate

$E[X \mid Y = y]$, which is a curve in $\mathbb{R}^p$ consisting of $p$ one-dimensional regressions.

The center of the inverse regression curve is located at $E[E[X \mid Y]] = E[X]$. Therefore, the centered inverse regression curve is

$$E[X \mid Y = y] - E[X]$$

which is a $p$-dimensional curve in $\mathbb{R}^p$. In what follows we will consider this centered inverse regression curve and we will see that it lies on a $k$-dimensional subspace spanned by the vectors $\Sigma_{xx}\beta_i$, where $\Sigma_{xx} = \mathrm{Cov}(X)$.

Before seeing why this holds, we look at how the inverse regression curve is computed within the SIR algorithm, which is introduced in detail later. This is the "sliced" part of SIR. We estimate the inverse regression curve by dividing the range of $Y$ into $H$ nonoverlapping intervals (slices) and then computing the sample mean $\hat{m}_h$ of the observations of $X$ in each slice. These sample means are used as a crude estimate of the IR-curve, denoted $m(y)$. There are several ways to define the slices: either each slice contains the same number of observations, or each slice covers a fixed range, so that different proportions of the $y_i$ fall into each slice.
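The slicing step can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions: the slices are formed so that each contains (nearly) the same number of observations, the helper name `slice_means` is hypothetical, and the toy data are generated only to show the output.

```python
import numpy as np

def slice_means(X, y, n_slices=10):
    """Split the observations into slices of (nearly) equal size along the
    ordered values of y and return the mean of X within each slice."""
    order = np.argsort(y)                       # indices sorted by response
    groups = np.array_split(order, n_slices)    # consecutive, nonoverlapping slices
    means = np.array([X[idx].mean(axis=0) for idx in groups])
    counts = np.array([len(idx) for idx in groups])
    return means, counts

# Toy data (assumed): y depends on the first coordinate of X only.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = X[:, 0] + 0.1 * rng.standard_normal(200)

means, counts = slice_means(X, y, n_slices=10)
print(counts)
print(means[:, 0].round(2))   # slice means of the first coordinate
```

In this toy example the slice means of the first coordinate move from clearly negative to clearly positive as $y$ increases, while those of the other coordinates stay near zero; the sequence of slice means is the crude estimate of the inverse regression curve. Slices of fixed width could be formed instead, for example by binning $y$ on a regular grid.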

Inverse regression versus dimension reduction

As mentioned above, the centered inverse regression curve lies on a $k$-dimensional subspace spanned by the vectors $\Sigma_{xx}\beta_i$ (and therefore so does, up to sampling error, the crude estimate we compute). This is the connection between our model and inverse regression. It holds under only one condition on the design distribution:

$$\forall\, b \in \mathbb{R}^p: \quad E[b^\top X \mid \beta_1^\top X = \beta_1^\top x, \ldots, \beta_k^\top X = \beta_k^\top x] = c_0 + \sum_{i=1}^{k} c_i \beta_i^\top x$$

That is, the conditional expectation is linear in $\beta_1^\top X, \ldots, \beta_k^\top X$ for some constants $c_0, \ldots, c_k$. This condition is satisfied when the distribution of $X$ is elliptically symmetric (e.g. the normal distribution). This is a fairly strong requirement. It can help, for example, to examine the distribution of the data more closely, so that outliers can be removed or clusters separated before the analysis.

Given this condition and the model (1), it is indeed true that the centered inverse regression curve $E[X \mid Y = y] - E[X]$ is contained in the linear subspace spanned by $\Sigma_{xx}\beta_i$ $(i = 1, \ldots, k)$, where $\Sigma_{xx} = \mathrm{Cov}(X)$. The proof is provided by Duan and Li in Journal of the American Statistical Association (1991).
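The statement can be checked numerically in a simple case. The sketch below is only an illustration under assumptions that are not in the article: a normal design with a hand-picked covariance matrix `Sigma`, a single direction `beta` ($k = 1$), a cubic link with additive noise, and equal-count slices. It compares the dominant direction of the centered slice means with $\Sigma_{xx}\beta$ and with $\beta$ itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_slices = 20000, 10

# Assumed covariance matrix and EDR-direction (k = 1).
Sigma = np.array([[2.0, 1.0, 0.0, 0.0],
                  [1.0, 2.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.5],
                  [0.0, 0.0, 0.5, 1.0]])
beta = np.array([1.0, 0.0, 0.0, 0.0])

# Rows of X are draws from N(0, Sigma); the normal law is elliptically symmetric.
X = rng.standard_normal((n, 4)) @ np.linalg.cholesky(Sigma).T
y = (X @ beta) ** 3 + 0.5 * rng.standard_normal(n)   # assumed link function

# Centered slice means as a crude estimate of the centered IR-curve.
order = np.argsort(y)
means = np.array([X[idx].mean(axis=0) for idx in np.array_split(order, n_slices)])
centered = means - X.mean(axis=0)

# Dominant direction of the centered slice means (first right singular vector).
_, _, vt = np.linalg.svd(centered, full_matrices=False)
direction = vt[0]

def abs_cos(u, v):
    """Absolute cosine of the angle between two vectors."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

cos_sigma_beta = abs_cos(direction, Sigma @ beta)
cos_beta = abs_cos(direction, beta)
print(cos_sigma_beta, cos_beta)
```

In this setting the slice means line up with $\Sigma_{xx}\beta$ rather than with $\beta$ itself, which is why the data are standardized before the principal component analysis and the resulting directions are transformed back afterwards.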

Estimation of the EDR-directions

Having looked at the theoretical properties, our aim is now to estimate the EDR-directions. For that purpose, we conduct a (weighted) principal component analysis of the sample means $\hat{m}_h$, after having standardized $X$ to $Z = \Sigma_{xx}^{-1/2}\{X - E(X)\}$. Corresponding to the theorem above, the IR-curve $m_1(y) = E[Z \mid Y = y]$ lies in the space spanned by $(\eta_1, \ldots, \eta_k)$, where $\eta_i = \Sigma_{xx}^{1/2}\beta_i$. (In keeping with the terminology introduced before, the $\eta_i$ are called the standardized effective dimension reducing directions.) As a consequence, the covariance matrix $\mathrm{Cov}[E[Z \mid Y]]$ is degenerate in any direction orthogonal to the $\eta_i$. Therefore, the eigenvectors $\eta_i$ $(i = 1, \ldots, k)$ associated with the $k$ largest eigenvalues are the standardized EDR-directions.

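Back to PCA: we calculate the estimate of $\mathrm{Cov}\{m_1(y)\}$,

$$\hat{V} = n^{-1} \sum_{h=1}^{H} n_h \bar{z}_h \bar{z}_h^\top$$

where $n_h$ is the number of observations in slice $h$ and $\bar{z}_h$ is the mean of the standardized observations in that slice, and identify the eigenvalues $\hat{\lambda}_i$ and the eigenvectors $\hat{\eta}_i$ of $\hat{V}$, which are the standardized EDR-directions. (For more details, see the next section, Algorithm.) Remember that the main idea of the PC transformation is to find the most informative projections, those that maximize variance.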

Note that in some situations SIR does not find the EDR-directions. One can overcome this difficulty by considering the conditional covariance $\mathrm{Cov}(X \mid Y)$. The principle remains the same as before, but one investigates the IR-curve with the conditional covariance instead of the conditional expectation. For further details and an example where SIR fails, see Härdle and Simar (2003).
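A standard situation of this kind is a link function that is symmetric in the projection, so that the inverse regression curve is flat. The sketch below is a hypothetical illustration (it is not the example from Härdle and Simar): with a standard normal design, it compares the slice means for a monotone link with those for a quadratic link. All settings, including the helper name `slice_means`, are assumptions.

```python
import numpy as np

def slice_means(X, y, n_slices=10):
    """Mean of X within equal-count slices formed along the ordered y."""
    order = np.argsort(y)
    return np.array([X[idx].mean(axis=0) for idx in np.array_split(order, n_slices)])

rng = np.random.default_rng(0)
n = 20000
X = rng.standard_normal((n, 3))          # assumed design, symmetric about zero
eps = 0.1 * rng.standard_normal(n)

y_monotone = X[:, 0] + eps               # link is monotone in the first coordinate
y_symmetric = X[:, 0] ** 2 + eps         # link is symmetric in the first coordinate

# Largest absolute entry of the slice means: how far the estimated
# inverse regression curve moves away from its center E[X] = 0.
spread_monotone = np.abs(slice_means(X, y_monotone)).max()
spread_symmetric = np.abs(slice_means(X, y_symmetric)).max()
print(spread_monotone, spread_symmetric)
```

In the symmetric case positive and negative values of the projection fall into the same slices and cancel, so the slice means carry almost no information about the direction, whereas the spread of $X$ within the slices still changes with $y$. This is the information that a method based on the conditional covariance can use.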

Algorithm

The algorithm to estimate the EDR-directions via SIR is as follows. It is taken from the textbook Applied Multivariate Statistical Analysis (Härdle and Simar 2003).

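1. Let $\Sigma_{xx}$ be the covariance matrix of $X$. Standardize $X$ to

$$Z = \Sigma_{xx}^{-1/2}\{X - E(X)\}$$

(We can therefore rewrite the model (1) as

$$Y = f(\eta_1^\top Z, \ldots, \eta_k^\top Z, \varepsilon)$$

where $\eta_i = \Sigma_{xx}^{1/2}\beta_i$ for all $i$. For the standardized variable $Z$ it holds that $E[Z] = 0$ and $\mathrm{Cov}(Z) = I$.)

2. Divide the range of $y_i$ into $H$ nonoverlapping slices $S_h$ $(h = 1, \ldots, H)$. $n_h$ is the number of observations within slice $S_h$ and $I_{S_h}$ the indicator function for this slice:

$$n_h = \sum_{i=1}^{n} I_{S_h}(y_i)$$

3. Compute the mean of $z_i$ within each slice, which is a crude estimate $\hat{m}_1$ of the inverse regression curve $m_1$:

$$\bar{z}_h = n_h^{-1} \sum_{i=1}^{n} z_i I_{S_h}(y_i)$$

4. Calculate the estimate for $\mathrm{Cov}\{m_1(y)\}$:

$$\hat{V} = n^{-1} \sum_{h=1}^{H} n_h \bar{z}_h \bar{z}_h^\top$$

5. Identify the eigenvalues $\hat{\lambda}_i$ and the eigenvectors $\hat{\eta}_i$ of $\hat{V}$, which are the standardized EDR-directions.

6. Transform the standardized EDR-directions back to the original scale. The estimates for the EDR-directions are given by:

$$\hat{\beta}_i = \hat{\Sigma}_{xx}^{-1/2} \hat{\eta}_i$$

(which are not necessarily orthogonal)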
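The six steps can be sketched compactly in Python with NumPy. This is a minimal illustration under assumptions, not the textbook's implementation: the function name `sir_directions` and its parameters are hypothetical, slices are formed with equal counts, the sample mean and sample covariance replace $E(X)$ and $\Sigma_{xx}$, and the test data (covariance matrix, direction, cubic link) are invented for the example.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=1):
    """Estimate EDR-directions by sliced inverse regression (sketch)."""
    n, p = X.shape

    # Step 1: standardize X using the sample mean and sample covariance.
    mean = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ Sigma_inv_sqrt

    # Step 2: nonoverlapping slices with (nearly) equal numbers of observations.
    slices = np.array_split(np.argsort(y), n_slices)

    # Steps 3 and 4: slice means of Z and their weighted covariance estimate.
    V = np.zeros((p, p))
    for idx in slices:
        z_bar = Z[idx].mean(axis=0)
        V += len(idx) * np.outer(z_bar, z_bar)
    V /= n

    # Step 5: eigenvectors for the largest eigenvalues (standardized directions).
    lam, eta = np.linalg.eigh(V)
    top = np.argsort(lam)[::-1][:n_directions]

    # Step 6: transform back to the original scale.
    beta_hat = Sigma_inv_sqrt @ eta[:, top]
    return lam[top], beta_hat

# Assumed test setting: normal design, one true direction, cubic link.
rng = np.random.default_rng(0)
Sigma_true = np.array([[2.0, 1.0, 0.0, 0.0],
                       [1.0, 2.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0, 0.5],
                       [0.0, 0.0, 0.5, 1.0]])
beta = np.array([1.0, 0.0, 0.0, 0.0])
X = rng.standard_normal((5000, 4)) @ np.linalg.cholesky(Sigma_true).T
y = (X @ beta) ** 3 + 0.5 * rng.standard_normal(5000)

lam, beta_hat = sir_directions(X, y, n_slices=10, n_directions=1)
b = beta_hat[:, 0]
cosine = abs(b @ beta) / (np.linalg.norm(b) * np.linalg.norm(beta))
print(lam, cosine)
```

The estimated direction is determined only up to sign and scale, so it is compared with the true direction through the absolute cosine of the angle between them. The number of slices is a tuning choice; 10 is used here only as an example.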

For examples, see the book by Härdle and Simar (2003).

External links

Sliced Inverse Regression: Literature Collection

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.