In statistics, the absolute deviation of an element of a data set is the absolute difference between that element and a given point. Typically the point from which the deviation is measured is a measure of central tendency, most often the median or sometimes the mean of the data set.
$$D_i = |x_i - m(X)|$$
where
- $D_i$ is the absolute deviation,
- $x_i$ is the data element, and
- $m(X)$ is the chosen measure of central tendency of the data set: sometimes the mean ($\bar{x}$), but most often the median.
Measures of dispersion
Several measures of statistical dispersion are defined in terms of the absolute deviation.
Average absolute deviation
The average absolute deviation, or simply average deviation, of a data set is the average of the absolute deviations and is a summary statistic of statistical dispersion or variability. It is also called the mean absolute deviation, but this is easily confused with the median absolute deviation.
The average absolute deviation of a set $\{x_1, x_2, \ldots, x_n\}$ is
$$\frac{1}{n}\sum_{i=1}^{n} |x_i - m(X)|.$$
The choice of measure of central tendency, $m(X)$, has a marked effect on the value of the average deviation. For example, for the data set {2, 2, 3, 4, 14}:
| Measure of central tendency $m(X)$ | Average absolute deviation |
| Mean = 5 | (3 + 3 + 2 + 1 + 9) / 5 = 3.6 |
| Median = 3 | (1 + 1 + 0 + 1 + 11) / 5 = 2.8 |
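The effect of the chosen centre can be checked directly. The following is a minimal Python sketch; the function name `average_absolute_deviation` is an illustrative choice, not taken from any library.

```python
from statistics import mean, median

def average_absolute_deviation(data, center):
    """Return the mean of |x - center| over all x in data."""
    return sum(abs(x - center) for x in data) / len(data)

data = [2, 2, 3, 4, 14]

# Deviations about the mean (5): 3, 3, 2, 1, 9
aad_mean = average_absolute_deviation(data, mean(data))      # 3.6

# Deviations about the median (3): 1, 1, 0, 1, 11
aad_median = average_absolute_deviation(data, median(data))  # 2.8

print(aad_mean, aad_median)
```

The single large value 14 pulls the mean upward, which is why the deviation about the mean is larger than the deviation about the median here.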
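For the normal distribution, the ratio of the mean absolute deviation to the standard deviation is $\sqrt{2/\pi}$. Thus if X is a normally distributed random variable with expected value 0 then
$$\frac{E|X|}{\sqrt{E(X^2)}} = \sqrt{\frac{2}{\pi}} \approx 0.7979.$$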
In other words, for a Gaussian, mean absolute deviation is about 0.8 times the standard deviation.
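The factor of about 0.8 can be derived in a few lines. The sketch below assumes $X$ is normal with mean 0 and standard deviation $\sigma$; the symbol $\sigma$ is introduced here for illustration.

```latex
\begin{aligned}
E|X| &= 2\int_0^\infty x \,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/(2\sigma^2)}\,dx \\
     &= \frac{2}{\sigma\sqrt{2\pi}} \Bigl[-\sigma^2 e^{-x^2/(2\sigma^2)}\Bigr]_0^\infty \\
     &= \frac{2\sigma}{\sqrt{2\pi}} = \sigma\sqrt{\frac{2}{\pi}} \approx 0.7979\,\sigma .
\end{aligned}
```

The first step uses the symmetry of the density about 0, and the second uses the fact that $-\sigma^2 e^{-x^2/(2\sigma^2)}$ is an antiderivative of $x\,e^{-x^2/(2\sigma^2)}$.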
Mean absolute deviation
The mean absolute deviation (MAD), also referred to as the mean deviation, is the mean of the absolute deviations of a set of data about the data's mean. In other words, it is the average distance of the data from their mean. In forecasting, the same name is used for the average absolute forecast error over a given number of time periods.
The equation for MAD is as follows:
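$$\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n} |e_i|, \quad \text{where } e_i = F_i - D_i$$
Here $e_i$ is the error of the $i$-th forecast; $D_i$ in this formula is not the absolute deviation defined earlier.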
This measure of forecast accuracy is closely related to the mean squared error (MSE), which is the average squared error of the forecasts. Although the two measures are closely related, MAD is often preferred because it does not require squaring.
The equation for MSE is as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} e_i^2, \quad \text{where } e_i = F_i - D_i$$
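Both formulas are simple to compute side by side. The Python sketch below assumes that F_i is the forecast and D_i the observed value for period i, which the text does not state explicitly; the function names and the numbers are hypothetical and chosen only for illustration.

```python
def forecast_errors(forecasts, actuals):
    """Return e_i = F_i - D_i for each period (assumed meaning of F and D)."""
    return [f - d for f, d in zip(forecasts, actuals)]

def mean_absolute_deviation(errors):
    """MAD: average of the absolute forecast errors."""
    return sum(abs(e) for e in errors) / len(errors)

def mean_squared_error(errors):
    """MSE: average of the squared forecast errors."""
    return sum(e * e for e in errors) / len(errors)

# Hypothetical data for four periods.
forecasts = [100, 110, 120, 130]
actuals = [90, 115, 120, 140]

errors = forecast_errors(forecasts, actuals)  # [10, -5, 0, -10]
mad = mean_absolute_deviation(errors)         # (10 + 5 + 0 + 10) / 4 = 6.25
mse = mean_squared_error(errors)              # (100 + 25 + 0 + 100) / 4 = 56.25

print(mad, mse)
```

Because MSE squares each error, the two errors of size 10 dominate it far more than they dominate MAD.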
Median absolute deviation (MAD)
The median absolute deviation is the median of the absolute deviations from the median. It is a robust estimator of dispersion.
For the example {2, 2, 3, 4, 14}, the median is 3, so the absolute deviations from the median are {1, 1, 0, 1, 11}, or {0, 1, 1, 1, 11} when sorted. Their median is 1, so the median absolute deviation (also abbreviated MAD) is 1; in this case it is unaffected by the value of the outlier 14.
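The calculation above can be reproduced with a short Python sketch. The function name `median_absolute_deviation` is illustrative, and the second data set, with a much larger outlier, is a hypothetical addition used to show the robustness.

```python
from statistics import median

def median_absolute_deviation(data):
    """Median of the absolute deviations from the data's median."""
    center = median(data)
    return median([abs(x - center) for x in data])

mad_example = median_absolute_deviation([2, 2, 3, 4, 14])    # 1

# Making the outlier 100 times larger leaves the result unchanged.
mad_extreme = median_absolute_deviation([2, 2, 3, 4, 1400])  # 1

print(mad_example, mad_extreme)
```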
Maximum absolute deviation
The maximum absolute deviation about a point is the maximum of the absolute deviations of a sample from that point. It is realized by the sample maximum or sample minimum and cannot be less than half the range.
Minimization
The measures of statistical dispersion derived from absolute deviation characterize various measures of central tendency as minimizing dispersion. The median is the measure of central tendency most associated with the absolute deviation. The correspondence is as follows:
- L2 norm statistics: the mean minimizes the root-mean-square deviation, whose minimum value is the standard deviation;
- L1 norm statistics: the median minimizes the average absolute deviation;
- L∞ norm statistics: the mid-range minimizes the maximum absolute deviation; and
- trimmed L∞ norm statistics: for example, the midhinge (average of the first and third quartiles), which minimizes the median absolute deviation of the whole distribution, also minimizes the maximum absolute deviation of the distribution after the top and bottom 25% have been trimmed off.
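The first three correspondences can be checked numerically. The Python sketch below is a brute-force search over a grid of candidate centres for the example data set {2, 2, 3, 4, 14}; it is not a proof, and the grid spacing of 0.01 and all function names are illustrative choices.

```python
def mean_squared_deviation(data, c):
    """L2 criterion: average of (x - c)^2."""
    return sum((x - c) ** 2 for x in data) / len(data)

def average_absolute_deviation(data, c):
    """L1 criterion: average of |x - c|."""
    return sum(abs(x - c) for x in data) / len(data)

def maximum_absolute_deviation(data, c):
    """L-infinity criterion: largest |x - c|."""
    return max(abs(x - c) for x in data)

data = [2, 2, 3, 4, 14]

# Candidate centres from min(data) to max(data) in steps of 0.01.
candidates = [i / 100 for i in range(200, 1401)]

best_l2 = min(candidates, key=lambda c: mean_squared_deviation(data, c))
best_l1 = min(candidates, key=lambda c: average_absolute_deviation(data, c))
best_linf = min(candidates, key=lambda c: maximum_absolute_deviation(data, c))

# On this grid the minimizers coincide with the mean (5),
# the median (3) and the mid-range (2 + 14) / 2 = 8.
print(best_l2, best_l1, best_linf)
```

Minimizing the mean squared deviation is equivalent to minimizing its square root, so the same centre minimizes the root-mean-square deviation.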
Estimation
The mean absolute deviation of a sample is a biased estimator of the mean absolute deviation of the population.
In order for the sample absolute deviation to be an unbiased estimator, the expected value (average) of all the sample absolute deviations must equal the population absolute deviation. It does not. For the population 1, 2, 3 the population absolute deviation about the mean is 2/3, whereas the average of the absolute deviations about the sample mean, taken over all samples of size 3 that can be drawn with replacement from the population, is 44/81. Therefore the sample absolute deviation is a biased estimator.
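The bias can be verified by exhaustive enumeration. The Python sketch below assumes that samples are ordered draws with replacement (27 samples of size 3) and that each deviation is measured about that sample's own mean; under those assumptions the population value is 2/3 and the average over all samples is 44/81. Exact fractions are used to avoid rounding, and the function name `mean_abs_dev` is illustrative.

```python
from fractions import Fraction
from itertools import product

def mean_abs_dev(values):
    """Mean absolute deviation of values about their own mean, as a Fraction."""
    n = len(values)
    m = Fraction(sum(values), n)
    return sum(abs(x - m) for x in values) / n

population = [1, 2, 3]
population_mad = mean_abs_dev(population)      # Fraction(2, 3)

# All ordered samples of size 3 drawn with replacement: 3**3 = 27 samples.
samples = list(product(population, repeat=3))
expected_sample_mad = sum(mean_abs_dev(s) for s in samples) / len(samples)

print(population_mad, expected_sample_mad)     # 2/3 44/81
```

Since 44/81 is smaller than 2/3 = 54/81, the sample statistic underestimates the population value on average in this example.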
See also
- Deviation (statistics)
- Errors and residuals in statistics
- Least absolute deviations
- Loss function
- Median absolute deviation