In probability theory and statistics, a **median** is the numerical value separating the higher half of a sample, a population, or a probability distribution from the lower half. The *median* of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values.

In a sample of data, or a finite population, there may be no member of the sample whose value is identical to the median (in the case of an even sample size), and, if there is such a member, there may be more than one, so that the median may not uniquely identify a sample member. Nonetheless, the value of the median is uniquely determined with the usual definition. A related concept, in which the outcome is forced to correspond to a member of the sample, is the medoid.

At most half the population have values less than the *median*, and at most half have values greater than the median. If both groups contain less than half the population, then some of the population is exactly equal to the median. For example, if *a* < *b* < *c*, then the median of the list {*a*, *b*, *c*} is *b*, and, if *a* < *b* < *c* < *d*, then the median of the list {*a*, *b*, *c*, *d*} is the mean of *b* and *c*; i.e., it is (*b* + *c*)/2.

The median can be used as a measure of location when a distribution is skewed, when end-values are not known, or when one requires reduced importance to be attached to outliers, e.g., because they may be measurement errors.

## Notation

The median of some variable *x* is denoted either as $\tilde{x}$ or as $\mu_{1/2}(x)$.

## Measures of statistical dispersion

When the *median* is used as a location parameter in descriptive statistics, there are several choices for a measure of variability: the range, the interquartile range, the mean absolute deviation, and the median absolute deviation. Since the median is the same as the *second quartile*, its calculation is illustrated in the article on quartiles.
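The four measures listed above can be sketched in a few lines of Python. This is only an illustration: the function name `dispersion_about_median` and the sample values are assumptions made here, deviations are measured from the median, and the interquartile range depends on the quartile convention (the default of `statistics.quantiles` is used).

```python
import statistics

def dispersion_about_median(data):
    """Return several measures of spread for a list of numbers."""
    med = statistics.median(data)
    abs_dev = [abs(x - med) for x in data]       # absolute deviations from the median
    # Quartile conventions differ between libraries; this uses the default
    # method of statistics.quantiles, so other tools may report a different IQR.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "range": max(data) - min(data),
        "interquartile_range": q3 - q1,
        "mean_absolute_deviation": sum(abs_dev) / len(data),
        "median_absolute_deviation": statistics.median(abs_dev),
    }

summary = dispersion_about_median([1, 2, 2, 2, 3, 14])
```

For these values the range is 13, the mean absolute deviation about the median is 14/6, and the median absolute deviation is 0.5.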

## Medians of probability distributions

For any probability distribution on the real line with cumulative distribution function *F*, whether it is a continuous probability distribution (in particular an absolutely continuous distribution, which therefore has a probability density function) or a discrete probability distribution, a median *m* satisfies the inequalities

$$\operatorname{P}(X \le m) \ge \frac{1}{2} \quad\text{and}\quad \operatorname{P}(X \ge m) \ge \frac{1}{2}$$

or

$$\int_{(-\infty,\,m]} dF(x) \ge \frac{1}{2} \quad\text{and}\quad \int_{[m,\,\infty)} dF(x) \ge \frac{1}{2}$$

in which a Lebesgue–Stieltjes integral is used. For an absolutely continuous probability distribution with probability density function *ƒ*, we have

$$\operatorname{P}(X \le m) = \operatorname{P}(X \ge m) = \int_{-\infty}^{m} f(x)\,dx = \frac{1}{2}.$$

### Medians of particular distributions

The medians of certain types of distributions can be easily calculated from their parameters:

- The median of a normal distribution with mean μ and variance σ^{2} is μ. In fact, for a normal distribution, mean = median = mode.
- The median of a uniform distribution in the interval [*a*, *b*] is (*a* + *b*) / 2, which is also the mean.
- The median of a Cauchy distribution with location parameter *x*_{0} and scale parameter *y* is *x*_{0}, the location parameter.
- The median of an exponential distribution with rate parameter *λ* is the natural logarithm of 2 divided by the rate parameter: *λ*^{−1}ln 2.
- The median of a Weibull distribution with shape parameter *k* and scale parameter *λ* is *λ*(ln 2)^{1/*k*}.
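
As a sketch of where the last two formulas come from, one can set the cumulative distribution function equal to 1/2 and solve for the median *m*. The closed forms of *F* used below are the standard ones for these two families; they are assumed here rather than taken from the text above.

$$
\begin{aligned}
\text{Exponential: } & F(m) = 1 - e^{-\lambda m} = \tfrac{1}{2}
  \;\Longrightarrow\; e^{-\lambda m} = \tfrac{1}{2}
  \;\Longrightarrow\; m = \lambda^{-1}\ln 2, \\
\text{Weibull: } & F(m) = 1 - e^{-(m/\lambda)^{k}} = \tfrac{1}{2}
  \;\Longrightarrow\; (m/\lambda)^{k} = \ln 2
  \;\Longrightarrow\; m = \lambda(\ln 2)^{1/k}.
\end{aligned}
$$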

## Medians in descriptive statistics

The median is used primarily for skewed distributions, which it summarizes differently than the arithmetic mean. Consider the multiset { 1, 2, 2, 2, 3, 14 }. The median is 2 in this case, as is the mode, and it might be seen as a better indication of central tendency than the arithmetic mean of 4.

Calculation of medians is a popular technique in summary statistics, since the median is simple to understand and easy to calculate, while also giving a measure that is more robust in the presence of outlier values than is the mean.
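The robustness claim can be checked on the multiset used above. In this small sketch the second list is a hypothetical corruption of the first, in which the largest value is recorded as 1400 instead of 14; the variable names are illustrative only.

```python
import statistics

clean = [1, 2, 2, 2, 3, 14]
corrupted = [1, 2, 2, 2, 3, 1400]   # hypothetical recording error in the largest value

clean_mean, clean_median = statistics.mean(clean), statistics.median(clean)
bad_mean, bad_median = statistics.mean(corrupted), statistics.median(corrupted)
```

The mean moves from 4 to 235, while the median stays at 2 in both cases.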

### An optimality property

A median is also a central point that minimizes the arithmetic mean of the absolute deviations. In the above example, the median value of 2 minimizes the arithmetic mean of the absolute deviations (1 + 0 + 0 + 0 + 1 + 12) / 6 = 2.33; in contrast, the mean value of 4 minimizes the arithmetic mean of the squares (9 + 4 + 4 + 4 + 1 + 100) / 6 = 20.33. In the language of statistics, a value of *c* that minimizes

$$E(\left|X - c\right|)$$

is a median of the probability distribution of the random variable *X*.
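
A brute-force check of this property on the example multiset might look as follows. The helper names and the grid of candidate centres (0.0 to 15.0 in steps of 0.1) are assumptions made for illustration; a grid search is not how one would normally locate a median.

```python
def mean_abs_deviation(data, c):
    """Arithmetic mean of the absolute deviations from c."""
    return sum(abs(x - c) for x in data) / len(data)

def mean_sq_deviation(data, c):
    """Arithmetic mean of the squared deviations from c."""
    return sum((x - c) ** 2 for x in data) / len(data)

data = [1, 2, 2, 2, 3, 14]
candidates = [c / 10 for c in range(0, 151)]   # 0.0, 0.1, ..., 15.0

# The candidate with the smallest mean absolute deviation is the median (2),
# and the one with the smallest mean squared deviation is the mean (4).
best_abs = min(candidates, key=lambda c: mean_abs_deviation(data, c))
best_sq = min(candidates, key=lambda c: mean_sq_deviation(data, c))
```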

However, a median *c* need not be uniquely defined. Where exactly one median exists, statisticians speak of "the median" correctly; even when no unique median exists, some statisticians speak of "the median" informally.

See also *k*-medians clustering.
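
Non-uniqueness is easy to see for a discrete distribution. The sketch below tests the two defining inequalities, P(*X* ≤ *m*) ≥ 1/2 and P(*X* ≥ *m*) ≥ 1/2, for a fair six-sided die; the die and the function name `is_median` are illustrative assumptions.

```python
from fractions import Fraction

def is_median(m, pmf):
    """Check P(X <= m) >= 1/2 and P(X >= m) >= 1/2 for a discrete distribution.

    `pmf` maps each possible value to its probability.
    """
    half = Fraction(1, 2)
    at_or_below = sum(p for x, p in pmf.items() if x <= m)
    at_or_above = sum(p for x, p in pmf.items() if x >= m)
    return at_or_below >= half and at_or_above >= half

die = {face: Fraction(1, 6) for face in range(1, 7)}   # fair six-sided die
```

Every value from 3 to 4 inclusive passes the test (for example 3, 3.5 and 4), whereas 2.9 and 4.1 do not, so this distribution has a whole interval of medians.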

### An inequality relating means and medians

For continuous probability distributions, the difference between the median and the mean is never more than one standard deviation. See an inequality on location and scale parameters.
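
As a worked check of this bound, consider an exponential distribution with rate *λ*. Its median *λ*^{−1}ln 2 is given earlier in this article; the mean 1/*λ* and the standard deviation 1/*λ* are standard facts assumed here.

$$
\left|\,\text{mean} - \text{median}\,\right|
  = \frac{1}{\lambda} - \frac{\ln 2}{\lambda}
  = \frac{1 - \ln 2}{\lambda}
  \approx \frac{0.307}{\lambda}
  \;\le\; \frac{1}{\lambda} = \sigma .
$$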

### Efficient computation of the sample median

Even though sorting *n* items requires *O*(*n* log *n*) operations, selection algorithms can compute the *k*th-smallest of *n* items (e.g., the median) with only *O*(*n*) operations.
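
One simple selection algorithm is quickselect, sketched below. This is an illustrative choice rather than the only option: with random pivots it needs *O*(*n*) operations on average but can degrade to *O*(*n*^{2}) in the worst case, whereas worst-case linear methods such as median of medians are more involved. The function names are assumptions made for this example.

```python
import random

def quickselect(items, k):
    """Return the k-th smallest element of items, counting from 0.

    Expected linear time with random pivots; the worst case is quadratic.
    """
    pool = list(items)
    while True:
        pivot = random.choice(pool)
        lows = [x for x in pool if x < pivot]
        pivots = [x for x in pool if x == pivot]
        if k < len(lows):
            pool = lows                      # answer lies among the smaller items
        elif k < len(lows) + len(pivots):
            return pivot                     # answer is the pivot value itself
        else:
            k -= len(lows) + len(pivots)     # skip past everything <= pivot
            pool = [x for x in pool if x > pivot]

def median_by_selection(items):
    """Median without a full sort: one selection for odd n, two for even n."""
    n = len(items)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return quickselect(items, n // 2)
    return (quickselect(items, n // 2 - 1) + quickselect(items, n // 2)) / 2
```

For instance, `median_by_selection([1, 5, 2, 8, 7])` gives 5 and `median_by_selection([1, 6, 2, 8, 7, 2])` gives 4.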

#### For an odd number of values

As an example, we will calculate the sample median for the following set of observations: 1, 5, 2, 8, 7.

Start by sorting the values: 1, 2, 5, 7, 8.

In this case, the median is 5 since it is the middle observation in the ordered list.

The median is the ((

*n* + 1)/2)th item, where

*n* is the number of values. For example, for the list {1, 2, 5, 7, 8}, we have

*n* = 5, so the median is the ((5 + 1)/2)th item.

- median = (6/2)th item
- median = 3rd item
- median = 5

#### For an even number of values

As an example, we will calculate the sample median for the following set of observations: 1, 6, 2, 8, 7, 2.

Start by sorting the values: 1, 2, 2, 6, 7, 8.

In this case, the arithmetic mean of the two middlemost terms is (2 + 6)/2 = 4. Therefore, the median is 4 since it is the arithmetic mean of the middle observations in the ordered list.

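The ((*n* + 1)/2)th-item formula can also be used here, where *n* is the number of values, by interpolating between the neighbouring items when the position is not a whole number. For the sorted list 1, 2, 2, 6, 7, 8, we have *n* = 6, so the median is the ((6 + 1)/2)th = 3.5th item, which lies halfway between the 3rd item (2) and the 4th item (6).

- median = 2 + 0.5 × (4th item − 3rd item)
- median = 2 + 0.5 × (6 − 2)
- median = 2 + 0.5 × 4
- median = 2 + 2
- median = 4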
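
The position rule used in both worked examples can be sketched as a short function. The name `sample_median` is an assumption for this illustration; the built-in `statistics.median` in Python's standard library gives the same result.

```python
def sample_median(values):
    """Median via the ((n + 1) / 2)-th item rule, interpolating when needed."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    position = (n + 1) / 2            # 1-based position; ends in .5 when n is even
    lower = int(position)             # whole-number part of the position
    fraction = position - lower       # 0.0 for odd n, 0.5 for even n
    value = ordered[lower - 1]        # the lower-th item (lists are 0-based)
    if fraction:
        # Move half of the way from the lower-th item towards the next one.
        value += fraction * (ordered[lower] - ordered[lower - 1])
    return value
```

With the observations above, `sample_median([1, 5, 2, 8, 7])` gives 5 and `sample_median([1, 6, 2, 8, 7, 2])` gives 4.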

## Other estimates of the median

If data are represented by a statistical model specifying a particular family of probability distributions, then estimates of the median can be obtained by fitting that family of probability distributions to the data and calculating the theoretical median of the fitted distribution. See, for example, Pareto interpolation.
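
A minimal sketch of this fit-then-compute approach follows. It assumes, purely for illustration, that the data come from an exponential distribution (not the Pareto family mentioned above): the rate is fitted by maximum likelihood and the median formula *λ*^{−1}ln 2 is then applied to the fitted rate. The function name and the data are made up.

```python
import math

def fitted_exponential_median(sample):
    """Median of an exponential distribution fitted to the sample.

    Assumes the exponential family is appropriate for the data.
    """
    total = sum(sample)
    if not sample or total <= 0 or any(x < 0 for x in sample):
        raise ValueError("need non-negative values with a positive sum")
    rate = len(sample) / total         # maximum-likelihood estimate of the rate
    return math.log(2) / rate          # theoretical median of the fitted model

waiting_times = [0.5, 1.0, 1.5, 3.0, 4.0]   # made-up illustrative data
estimate = fitted_exponential_median(waiting_times)
```

Here the fitted rate is 0.5, so the model-based estimate is 2 ln 2 ≈ 1.386, which differs from the sample median of 1.5. How useful such an estimate is depends on how well the assumed family describes the data.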

## Median-unbiased estimators, and bias with respect to loss functions

Any *mean*-unbiased estimator minimizes the risk (expected loss) with respect to the squared-error loss function, as observed by Gauss. A *median*-unbiased estimator minimizes the risk with respect to the absolute-deviation loss function, as observed by Laplace. Other loss functions are used in statistical theory, particularly in robust statistics.

The theory of median-unbiased estimators was revived by George W. Brown in 1947:

An estimate of a one-dimensional parameter θ will be said to be median-unbiased, if, for fixed θ, the median of the distribution of the estimate is at the value θ; i.e., the estimate underestimates just as often as it overestimates. This requirement seems for most purposes to accomplish as much as the mean-unbiased requirement and has the additional property that it is invariant under one-to-one transformation. [page 584]

Further properties of median-unbiased estimators have been noted by Lehmann, Birnbaum, van der Vaart and Pfanzagl. In particular, median-unbiased estimators exist in cases where mean-unbiased and maximum-likelihood estimators do not exist. Besides being invariant under one-to-one transformations, median-unbiased estimators have surprising robustness.

## In image processing

In monochrome raster images there is a type of noise, known as salt and pepper noise, in which each pixel independently becomes black (with some small probability) or white (with some small probability), and is unchanged otherwise (with probability close to 1).

An image constructed of the median values of neighborhoods (such as 3×3 squares) can effectively reduce noise in this case.
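
A 3×3 median filter can be sketched with plain Python lists. Several details here are assumptions made for illustration: the function name, the clipping of neighbourhoods at the image border, and the use of `statistics.median_low` so that even-sized border windows still return an actual pixel value.

```python
import statistics

def median_filter_3x3(image):
    """Replace each pixel by the median of its 3x3 neighbourhood.

    `image` is a list of equal-length rows of grey levels. Neighbourhoods are
    clipped at the border, which is only one of several possible conventions.
    """
    height, width = len(image), len(image[0])
    result = []
    for r in range(height):
        row = []
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            row.append(statistics.median_low(window))
        result.append(row)
    return result

noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],   # isolated white ("salt") pixel
    [10, 10, 0, 10],     # isolated black ("pepper") pixel
    [10, 10, 10, 10],
]
cleaned = median_filter_3x3(noisy)
```

In this small example both corrupted pixels are replaced by the surrounding grey level of 10, because each is outnumbered within every window that contains it.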

## History

Gustav Fechner popularized the use of the median in the formal analysis of data, although it had been used previously by Laplace.

## See also

- Order statistic
- Quantile
- A median is the 2nd quartile, 5th decile, and 50th percentile.
- A sample-median is *median*-unbiased but can be a *mean*-biased estimator.
- Absolute deviation
- Concentration of measure for Lipschitz functions
- Standard deviation as a bound on the difference between the mean and the median
- Median voter theory
- Median graph
- Geometric median, a generalization of the median for data in higher dimensions
- Centerpoint, another generalization to higher dimensions
- Median search
- Hinges (statistics)
- Theil–Sen estimator, a method for robust linear regression based on finding medians of slopes.

## External links