Imputation (statistics) - AbsoluteAstronomy.com

In statistics, imputation is the substitution of some value for a missing data point or a missing component of a data point. Once all missing values have been imputed, the dataset can be analysed using standard techniques for complete data. Ideally, however, the analysis should take into account that there is a greater degree of uncertainty than if the imputed values had actually been observed, and this generally requires some modification of the standard complete-data analysis methods. Many imputation techniques are available.

A once-common method of imputation was hot-deck imputation, in which a missing value was imputed from a randomly selected similar record. The term "hot deck" dates back to the storage of data on punched cards, and indicates that the information donors come from the same dataset as the recipients. The stack of cards was "hot" because it was currently being processed.
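As a rough illustration of random hot-deck imputation, the sketch below fills each missing value from a randomly chosen donor in the same dataset. It assumes that "similar" means sharing the value of a grouping field; the function name `hot_deck_impute`, the parameters `field` and `key`, and the use of `None` to mark missing values are all hypothetical choices made for this example.

```python
import random

def hot_deck_impute(records, field, key, rng=None):
    """Fill missing `field` values (None) with the value of a randomly
    chosen donor record from the same dataset that shares the same `key`."""
    rng = rng or random.Random()
    # Collect observed values, grouped by the similarity key (assumption:
    # records are "similar" when they have the same value of `key`).
    donors = {}
    for rec in records:
        if rec[field] is not None:
            donors.setdefault(rec[key], []).append(rec[field])
    imputed = []
    for rec in records:
        new = dict(rec)  # copy so the input records are left untouched
        if new[field] is None and donors.get(new[key]):
            new[field] = rng.choice(donors[new[key]])
        imputed.append(new)
    return imputed

survey = [
    {"region": "north", "income": 31000},
    {"region": "north", "income": 35000},
    {"region": "north", "income": None},   # recipient
    {"region": "south", "income": 28000},
]
completed = hot_deck_impute(survey, "income", "region", rng=random.Random(1))
```

A recipient with no donor in its group is left missing in this sketch; a production routine would need a fallback rule for that case.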

Cold-deck imputation, by contrast, selects donors from another dataset. Since computer power has advanced rapidly and punched cards are no longer used, more sophisticated methods of imputation, such as nearest neighbour hot-deck imputation and the approximate Bayesian bootstrap, have generally superseded the original random and sorted hot-deck imputation techniques.
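The two later methods named above can be sketched in a few lines each. This is a minimal sketch under stated assumptions, not a reference implementation: the function names are hypothetical, missing values are marked with `None`, the nearest neighbour is taken to be the donor closest on a single fully observed numeric auxiliary field `aux`, and the approximate Bayesian bootstrap is written in its usual two-stage form (resample the observed values with replacement, then draw the imputations from that resample).

```python
import random

def nearest_neighbour_impute(records, field, aux):
    """Impute missing `field` from the donor whose `aux` value is closest."""
    donors = [r for r in records if r[field] is not None]
    out = []
    for r in records:
        new = dict(r)
        if new[field] is None and donors:
            # Pick the donor with the smallest distance on the auxiliary variable.
            donor = min(donors, key=lambda d: abs(d[aux] - new[aux]))
            new[field] = donor[field]
        out.append(new)
    return out

def abb_impute(values, rng=None):
    """Approximate Bayesian bootstrap for a single list of values."""
    rng = rng or random.Random()
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to draw from
    # Stage 1: draw as many values as were observed, with replacement.
    pool = [rng.choice(observed) for _ in observed]
    # Stage 2: draw each imputation, with replacement, from that pool.
    return [v if v is not None else rng.choice(pool) for v in values]
```

Repeating `abb_impute` with different random states yields several completed datasets, which is how the method is typically used for multiple imputation.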

Since standard analysis techniques do not reflect the additional uncertainty due to imputing for missing data, further adjustments (such as multiple imputation or a Rao–Shao correction) are necessary to account for this.
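For multiple imputation, the adjustment is usually made by analysing each completed dataset separately and then pooling the results with the standard combining rules (often called Rubin's rules). The sketch below shows only that pooling step; the function name `pool_rubin` and its inputs (one point estimate and one squared standard error per completed dataset) are assumptions made for illustration.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m point estimates and their within-imputation variances.

    Returns (pooled_estimate, total_variance). Requires m >= 2.
    """
    m = len(estimates)
    q_bar = mean(estimates)   # pooled point estimate
    w = mean(variances)       # average within-imputation variance
    b = variance(estimates)   # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b   # total variance, inflated for imputation uncertainty
    return q_bar, t

# Hypothetical results from three completed datasets.
estimate, total_var = pool_rubin([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
```

The between-imputation term is what a single-imputation analysis leaves out: when the completed datasets disagree, the total variance grows accordingly.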

Alternatives to imputing missing data

Imputation is not the only method available for handling missing data. It usually gives better results than listwise deletion (in which all subjects with any missing values are omitted from the analysis) and may be competitive with a maximum likelihood approach in many circumstances. The expectation-maximization algorithm is a method for finding maximum likelihood estimates that has been widely applied to missing data problems. Computational intelligence methods have also been applied successfully.
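Listwise deletion, the baseline mentioned above, is simple enough to sketch directly. The function name `listwise_delete` and the use of `None` for missing values are assumptions for this example.

```python
def listwise_delete(records):
    """Keep only the records that have no missing (None) values."""
    return [r for r in records if all(v is not None for v in r.values())]

subjects = [
    {"age": 34, "income": 31000},
    {"age": None, "income": 28000},   # dropped: age is missing
    {"age": 51, "income": None},      # dropped: income is missing
    {"age": 45, "income": 40000},
]
complete_cases = listwise_delete(subjects)
```

In this small example half of the subjects are discarded even though each dropped subject has one observed value, which illustrates why the method can waste information.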

In machine learning, it is sometimes possible to train a classifier directly on the original data without imputing it first. This has been shown to yield better performance in cases where the missing data are structurally absent, rather than missing due to measurement noise.


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
