Applicability Domain - AbsoluteAstronomy.com

The Applicability Domain (AD) of a QSAR is the physico-chemical, structural or biological space, knowledge or information on which the training set of the model has been developed, and for which it is applicable to make predictions for new compounds.

The purpose of AD is to state whether the model's assumptions are met. In general, this is the case for interpolation

Interpolation

In the mathematical field of numerical analysis, interpolation is a method of constructing new data points within the range of a discrete set of known data points....

rather than for extrapolation

Extrapolation

In mathematics, extrapolation is the process of constructing new data points. It is similar to the process of interpolation, which constructs new points between known points, but the results of extrapolations are often less meaningful, and are subject to greater uncertainty. It may also mean...

. Although up to now there is no single generally accepted algorithm for determining the AD, there exists a rather systematic approach for defining interpolation regions. The process involves the removal of outliers and a probability density distribution method using kernel-weighted sampling. A recent rigorous benchmarking study of several AD algorithms identified standard-deviation of models as the most reliable approach .

To investigate the AD of a training set of chemicals one can directly analyse properties of the multivariate descriptor space of the training compounds or more indirectly via distance

Distance

Distance is a numerical description of how far apart objects are. In physics or everyday discussion, distance may refer to a physical length, or an estimation based on other criteria . In mathematics, a distance function or metric is a generalization of the concept of physical distance...

(or similarity) metrics. When using distance metrics care should be taken to use an orthogonal and significant vector space. This can be achieved by different means of feature selection and successive principal components analysis

Principal components analysis

Principal component analysis is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. The number of principal components is less than or equal to...

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.