Calibration (statistics)

There are two main uses of the term calibration in statistics that denote special types of statistical inference problems. Thus "calibration" can mean:
  • A reverse process to regression, where instead of a future dependent variable being predicted from known explanatory variables, a known observation of the dependent variable is used to predict a corresponding explanatory variable.
  • Procedures in statistical classification that determine estimated probabilities that a given new observation belongs to each of the already established classes.

In addition, "calibration" is used in statistics with the usual general meaning of calibration. For example, model calibration can also refer to Bayesian inference about the value of a model's parameters, given some data set, or more generally to any type of fitting of a statistical model.

In regression


The calibration problem in regression is the use of known data on the observed relationship between a dependent variable and an independent variable to make estimates of other values of the independent variable from new observations of the dependent variable.
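A minimal sketch of this procedure, assuming a straight-line relationship fitted by ordinary least squares: the line is fitted with the dependent variable y as the response, then inverted to estimate x for a new observation of y. The function names and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, 1)  # polyfit returns coefficients highest degree first
    return a, b

def classical_calibrate(a, b, y_new):
    """Invert the fitted line to estimate x from a new observation of y."""
    return (np.asarray(y_new, dtype=float) - a) / b

# Hypothetical calibration data: known x values (e.g. ages) and observed y.
# The points lie exactly on a line here so the result is easy to check.
x_known = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_known = 2.0 + 0.5 * x_known
a, b = fit_line(x_known, y_known)
x_hat = classical_calibrate(a, b, 3.25)  # estimate of x for the new observation y = 3.25
```

With real, noisy calibration data the same two steps apply; only the fitted a and b change.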

One example is that of dating objects, using observable evidence such as tree rings for dendrochronology or carbon-14 for radiometric dating. The observation is caused by the age of the object being dated, rather than the reverse, and the aim is to use the method for estimating dates based on new observations. The problem is whether the model used for relating known ages with observations should aim to minimise the error in the observation, or minimise the error in the date. The two approaches will produce different results, and the difference will increase if the model is then used for extrapolation at some distance from the known results.
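A short derivation shows why, assuming a straight-line model fitted by ordinary least squares to calibration pairs (x_i, y_i) and a new observation y_0. Minimising the error in the observation corresponds to regressing y on x and inverting the line (the classical estimator \hat{x}_C); minimising the error in the date corresponds to regressing x directly on y (the inverse estimator \hat{x}_I). The notation is introduced here for illustration, with S_{xx} = \sum_i (x_i - \bar{x})^2, S_{yy} = \sum_i (y_i - \bar{y})^2, S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y}) and r^2 = S_{xy}^2 / (S_{xx} S_{yy}).

$$
\hat{x}_C = \bar{x} + \frac{y_0 - \bar{y}}{\hat{b}}, \qquad \hat{b} = \frac{S_{xy}}{S_{xx}}
$$

$$
\hat{x}_I = \bar{x} + \hat{d}\,(y_0 - \bar{y}), \qquad \hat{d} = \frac{S_{xy}}{S_{yy}} = \frac{r^2}{\hat{b}}
$$

$$
\hat{x}_C - \hat{x}_I = (1 - r^2)\,\frac{y_0 - \bar{y}}{\hat{b}}
$$

Unless the calibration data lie exactly on a line (r^2 = 1), the two estimates differ, and under this model the gap grows in proportion to the distance of y_0 from \bar{y}, which matches the extrapolation effect described above.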

In classification


Calibration in classification (see Classification (machine learning) and Statistical classification) is used to transform classifier scores into class membership probabilities.
A thorough overview of calibration methods for two-class and multi-class classification tasks is given by Gebel (2009).

The following univariate calibration methods exist for transforming classifier scores into class membership probabilities in the two-class case:
  • Assignment value approach, see Garczarek (2002)
  • Bayes approach, see Bennett (2002)
  • Isotonic regression, see Zadrozny and Elkan (2002)
  • Logistic regression, see Platt (1999)
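The last two methods in this list can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the exact procedures of Platt (1999) or Zadrozny and Elkan (2002): the logistic (sigmoid) map is fitted by plain maximum likelihood with Newton's method, omitting Platt's target smoothing, and the isotonic fit uses the pool-adjacent-violators algorithm on distinct scores. The function names and toy data are hypothetical.

```python
import numpy as np

def platt_fit(scores, labels, n_iter=50, tol=1e-12):
    """Fit P(y=1 | s) = 1 / (1 + exp(A*s + B)) by maximum likelihood.

    Simplified sketch: plain 0/1 targets and Newton's method, without the
    target smoothing of Platt's original procedure. Assumes the classes are
    not perfectly separated by the scores (otherwise the MLE does not exist).
    """
    s = np.asarray(scores, dtype=float)
    t = np.asarray(labels, dtype=float)
    X = np.column_stack([s, np.ones_like(s)])   # columns: score, intercept
    w = np.zeros(2)                             # w = (-A, -B)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - t)                    # gradient of the log-loss
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)      # Newton step
        w -= step
        if np.max(np.abs(step)) < tol:
            break
    return -w[0], -w[1]

def platt_predict(A, B, scores):
    """Map raw scores to calibrated probabilities with the fitted sigmoid."""
    return 1.0 / (1.0 + np.exp(A * np.asarray(scores, dtype=float) + B))

def isotonic_fit(scores, labels):
    """Pool-adjacent-violators (PAV) fit of a non-decreasing step function.

    Returns the scores in sorted order and the fitted probability for each.
    Assumes distinct scores (tied scores are not pooled in this sketch).
    """
    order = np.argsort(scores, kind="mergesort")
    s = np.asarray(scores, dtype=float)[order]
    y = np.asarray(labels, dtype=float)[order]
    blocks = []                                 # each block: [sum of labels, count]
    for v in y:
        blocks.append([v, 1])
        # Merge backwards while an earlier block's mean exceeds the later one's
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, n = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += n
    fitted = np.concatenate([np.full(n, total / n) for total, n in blocks])
    return s, fitted

# Hypothetical classifier scores and true 0/1 labels (not perfectly separable)
scores = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
A, B = platt_fit(scores, labels)
probs = platt_predict(A, B, scores)
s_sorted, iso = isotonic_fit(scores, labels)
```

The isotonic fit returns one probability per training score; calibrating a new score would additionally need a step lookup or interpolation over `s_sorted`, which is not shown here.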


The following multivariate calibration methods exist for transforming classifier scores into class membership probabilities in the case of more than two classes:
  • Reduction to binary tasks and subsequent pairwise coupling, see Hastie and Tibshirani (1998)
  • Dirichlet calibration, see Gebel (2009)
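The pairwise-coupling route can be sketched as follows, assuming each pair of classes (i, j) already has an estimate r_ij of P(class i | class i or j), for example from a calibrated two-class method. The sketch follows the iterative update commonly attributed to Hastie and Tibshirani, p_i ← p_i Σ_j r_ij / Σ_j μ_ij with μ_ij = p_i / (p_i + p_j), and assumes all pair weights are 1; the function name and toy input are hypothetical.

```python
import numpy as np

def pairwise_coupling(R, n_iter=1000, tol=1e-10):
    """Combine pairwise estimates R[i, j] ~ P(class i | class i or j)
    into one class-probability vector by iterative rescaling.

    Assumption: all pair weights n_ij are 1; diagonal entries of R are ignored.
    """
    R = np.asarray(R, dtype=float)
    k = R.shape[0]
    off = ~np.eye(k, dtype=bool)                # mask selecting j != i
    p = np.full(k, 1.0 / k)                     # start from the uniform vector
    for _ in range(n_iter):
        p_old = p.copy()
        for i in range(k):
            mu = p[i] / (p[i] + p)              # mu_ij = p_i / (p_i + p_j)
            p[i] *= R[i, off[i]].sum() / mu[off[i]].sum()
            p /= p.sum()                        # renormalise after each update
        if np.max(np.abs(p - p_old)) < tol:
            break
    return p

# Consistent toy input built from a known distribution (illustration only)
p_true = np.array([0.5, 0.3, 0.2])
R = p_true[:, None] / (p_true[:, None] + p_true[None, :])
p_est = pairwise_coupling(R)  # should recover p_true up to the tolerance
```

With real classifiers the pairwise estimates are generally not mutually consistent, and the iteration then returns a compromise vector rather than an exact solution.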