Calibration (statistics)
There are two main uses of the term calibration in statistics that denote special types of statistical inference problems. Thus "calibration" can mean:
  • A reverse process to regression, where instead of a future dependent variable being predicted from known explanatory variables, a known observation of the dependent variable is used to predict a corresponding explanatory variable.
  • Procedures in statistical classification to determine class membership probabilities, which assess the uncertainty of a given new observation belonging to each of the already established classes.

In addition, "calibration" is used in statistics with the usual general meaning of calibration. For example, model calibration can also be used to refer to Bayesian inference about the value of a model's parameters, given some data set, or more generally to any type of fitting of a statistical model.

In regression

The calibration problem in regression is the use of known data on the observed relationship between a dependent variable and an independent variable to make estimates of other values of the independent variable from new observations of the dependent variable. This is sometimes known as "inverse regression"; see also sliced inverse regression.

One example is that of dating objects, using observable evidence such as tree rings for dendrochronology or carbon-14 for radiometric dating. The observation is caused by the age of the object being dated, rather than the reverse, and the aim is to use the method for estimating dates based on new observations. The problem is whether the model used for relating known ages with observations should aim to minimise the error in the observation, or minimise the error in the date. The two approaches will produce different results, and the difference will increase if the model is then used for extrapolation at some distance from the known results.
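
The contrast between the two approaches can be illustrated with a minimal sketch in Python. It assumes a straight-line relationship fitted by ordinary least squares; the data values and the function names (fit_line, classical_estimate, inverse_estimate, gap) are invented for illustration and do not come from any real dating study. The first estimator fits the observation on the age and inverts the fitted line; the second fits the age on the observation directly.

```python
# Illustrative (made-up) calibration data: known ages and a measured quantity
# that depends on age. These numbers are assumptions, not real dating data.
ages = [100.0, 200.0, 300.0, 400.0, 500.0]
observations = [12.1, 19.8, 32.5, 38.9, 52.0]


def fit_line(u, v):
    """Ordinary least squares fit of v = intercept + slope * u."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    slope = s_uv / s_uu
    return mean_v - slope * mean_u, slope


def classical_estimate(y_new):
    """Fit observation on age (minimise error in the observation), then invert."""
    intercept, slope = fit_line(ages, observations)
    return (y_new - intercept) / slope


def inverse_estimate(y_new):
    """Fit age on observation (minimise error in the date), then predict."""
    intercept, slope = fit_line(observations, ages)
    return intercept + slope * y_new


def gap(y_new):
    """Absolute disagreement between the two age estimates."""
    return abs(classical_estimate(y_new) - inverse_estimate(y_new))


mean_obs = sum(observations) / len(observations)
# Compare the estimates at the centre of the data, nearby, and far outside it.
for y_new in (mean_obs, 35.0, 100.0):
    print(y_new, classical_estimate(y_new), inverse_estimate(y_new))
```

In this sketch the two estimates coincide at the mean of the calibration data and drift apart as the new observation moves away from it, which is the extrapolation effect described above.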

In classification

Calibration in classification (see Classification (machine learning) and Statistical classification) is used to transform classifier scores into class membership probabilities. An overview of calibration methods for two-class and multi-class classification tasks is given by Gebel (2009).

The following univariate calibration methods exist for transforming classifier scores into class membership probabilities in the two-class case:
  • Assignment value approach, see Garczarek (2002)
  • Bayes approach, see Bennett (2002)
  • Isotonic regression, see Zadrozny and Elkan (2002)
  • Logistic regression, see Platt (1999)
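
As an illustration of the logistic regression approach, the following is a simplified sketch in Python, not Platt's published procedure: it fits a sigmoid to classifier scores by plain gradient descent on the log-loss and omits the target smoothing and the optimiser used in the original paper. The scores, labels, and function names (fit_sigmoid, calibrated_probability) are assumptions made up for the example.

```python
import math

# Made-up classifier scores and true labels (1 = positive class).
# These values are assumptions for illustration only.
scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_sigmoid(scores, labels, learning_rate=0.5, steps=5000):
    """Fit P(label = 1 | score) = sigmoid(a * score + b) by gradient descent."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log-loss w.r.t. the logit
            grad_a += err * s / n
            grad_b += err / n
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


a, b = fit_sigmoid(scores, labels)


def calibrated_probability(score):
    """Map a raw classifier score to an estimated class membership probability."""
    return sigmoid(a * score + b)


for s in (-2.0, 0.0, 2.0):
    print(s, calibrated_probability(s))
```

In practice the sigmoid would be fitted on scores from data held out from classifier training, so that the calibration map is not biased by overfitted scores.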


The following multivariate calibration methods exist for transforming classifier scores into class membership probabilities in the case with more than two classes:
  • Reduction to binary tasks and subsequent pairwise coupling, see Hastie and Tibshirani (1998)
  • Dirichlet calibration, see Gebel (2009)
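
The pairwise coupling step can be sketched as follows. This is a simplified illustration under stated assumptions, not a reproduction of the published algorithm: every pair of classes is given equal weight (the original weights each pair by its number of training items), all classes are updated simultaneously in each pass, and the function name couple_pairwise and the example probabilities are invented. Given pairwise estimates r[i][j] of the probability of class i among items of class i or j, the sketch searches for a single probability vector p whose implied pairwise probabilities p[i] / (p[i] + p[j]) match them.

```python
def couple_pairwise(r, steps=1000):
    """Combine pairwise estimates r[i][j] ~ P(class i | class i or j)
    into one vector of class probabilities (equal pair weights assumed)."""
    k = len(r)
    p = [1.0 / k] * k  # start from the uniform distribution
    for _ in range(steps):
        new_p = []
        for i in range(k):
            # Pairwise evidence for class i from the binary classifiers ...
            observed = sum(r[i][j] for j in range(k) if j != i)
            # ... versus what the current p implies for the same pairs.
            implied = sum(p[i] / (p[i] + p[j]) for j in range(k) if j != i)
            new_p.append(p[i] * observed / implied)
        total = sum(new_p)
        p = [value / total for value in new_p]  # renormalise to sum to one
    return p


# Consistency check on made-up data: build pairwise probabilities from a
# known vector and verify that coupling approximately recovers it.
true_p = [0.5, 0.3, 0.2]
r = [
    [0.0 if i == j else true_p[i] / (true_p[i] + true_p[j]) for j in range(3)]
    for i in range(3)
]
coupled = couple_pairwise(r)
print(coupled)
```

With real binary classifiers the pairwise estimates are generally not mutually consistent, so the coupled vector is a compromise rather than an exact recovery.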
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.