Median



 
 
In probability theory and statistics, a median is the number separating the higher half of a sample, a population, or a probability distribution from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an even number of observations, the median is not unique, so one often takes the mean of the two middle values. At most half the population have values less than the median and at most half have values greater than the median. If both groups contain less than half the population, then some of the population is exactly equal to the median. For example, if a < b < c, then the median of the list a, b, c is b, and if a < b < c < d, then the median of the list a, b, c, d is the mean of b and c, i.e. it is (b + c)/2.
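The procedure just described (sort, then take the middle value or the mean of the two middle values) can be sketched in a few lines of Python. This is only an illustrative sketch; the function name `median_of_list` is a hypothetical choice, not something defined in the article.

```python
def median_of_list(values):
    """Return the median of a non-empty finite list of numbers.

    For an even number of observations this follows the common convention
    of taking the mean of the two middle values.
    """
    ordered = sorted(values)          # arrange from lowest to highest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of b and c


print(median_of_list([3, 1, 2]))      # 2
print(median_of_list([4, 1, 3, 2]))   # 2.5
```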

The median can be used when a distribution is skewed or when end values are not known. A disadvantage is the difficulty of handling it theoretically.

Notation

The median of some variable $x$ is denoted either as $\tilde{x}$ or as $\mu_{1/2}(x)$.

Measures of statistical dispersion

When the median is used as a location parameter in descriptive statistics, there are several choices for a measure of variability: the range, the interquartile range, the mean absolute deviation, and the median absolute deviation. Since the median is the same as the second quartile, its calculation is illustrated in the article on quartiles.
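The four measures of variability listed above can be sketched with Python's standard `statistics` module. The helper names and the sample data in the usage note are hypothetical. Note also that quartile conventions differ between textbooks and software, so the interquartile range computed here (using the default method of `statistics.quantiles`) is one reasonable choice rather than the only one.

```python
from statistics import median, quantiles


def value_range(data):
    """Length of the smallest interval containing all the data."""
    return max(data) - min(data)


def interquartile_range(data):
    """Difference between the third and first quartiles."""
    q1, _q2, q3 = quantiles(data, n=4)  # default 'exclusive' method
    return q3 - q1


def mean_absolute_deviation(data):
    """Mean of the absolute deviations from the median."""
    m = median(data)
    return sum(abs(x - m) for x in data) / len(data)


def median_absolute_deviation(data):
    """Median of the absolute deviations from the median."""
    m = median(data)
    return median(abs(x - m) for x in data)
```

For instance, for the made-up data `[2, 4, 4, 5, 7, 20]` the median is 4.5, the range is 18 and the median absolute deviation is 1.5. The single large value inflates the range far more than it affects the median absolute deviation.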

When working with computers, it is often desirable for a population of integers to have an integer median. Thus, for an integer population with an even number of elements, there are two medians, known as the lower median and the upper median. For a floating-point population, the median lies somewhere between the two middle elements, depending on the distribution. In every case, the median is the middle value after arranging the data in ascending or descending order.
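Python's standard `statistics` module happens to expose both conventions, which makes for a compact illustration. The integer population below is an arbitrary example with an even number of elements.

```python
from statistics import median, median_high, median_low

observations = [1, 5, 2, 10, 8, 7]   # illustrative integer population, even size

lower = median_low(observations)      # smaller of the two middle values
upper = median_high(observations)     # larger of the two middle values
interpolated = median(observations)   # mean of the two middle values

print(lower, upper, interpolated)     # 5 7 6.0
```

The lower and upper medians are always members of the data set, whereas the interpolated value need not be.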

Medians of probability distributions

For any probability distribution on the real line with cumulative distribution function F, whether it is a continuous probability distribution (in particular an absolutely continuous distribution, which therefore has a probability density function) or a discrete probability distribution, a median m satisfies the inequalities

$$P(X \le m) \ge \tfrac{1}{2} \quad\text{and}\quad P(X \ge m) \ge \tfrac{1}{2}$$

or

$$\int_{-\infty}^{m} dF(x) \ge \tfrac{1}{2} \quad\text{and}\quad \int_{m}^{\infty} dF(x) \ge \tfrac{1}{2}$$

in which a Riemann-Stieltjes integral is used. For an absolutely continuous probability distribution with probability density function ƒ, we have

$$P(X \le m) = P(X \ge m) = \int_{-\infty}^{m} f(x)\,dx = \tfrac{1}{2}.$$

Medians of particular distributions

The medians of certain types of distributions can be easily calculated from their parameters:

  • The median of a normal distribution with mean μ and variance σ² is μ. In fact, for a normal distribution, mean = median = mode.
  • The median of a uniform distribution in the interval [a, b] is (a + b) / 2, which is also the mean.
  • The median of a Cauchy distribution with location parameter x0 and scale parameter γ is x0, the location parameter.
  • The median of an exponential distribution with rate parameter λ is the natural logarithm of 2 divided by the rate parameter: (ln 2)/λ.
  • The median of a Weibull distribution with shape parameter k and scale parameter λ is λ(ln 2)^(1/k).
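These closed forms can be checked numerically: the cumulative distribution function of each distribution, evaluated at the stated median, should equal 1/2. The sketch below writes out the standard CDFs by hand; the function names and parameter values are arbitrary illustrative choices.

```python
import math
from statistics import NormalDist


def uniform_cdf(x, a, b):
    return (x - a) / (b - a)


def cauchy_cdf(x, location, scale):
    return 0.5 + math.atan((x - location) / scale) / math.pi


def exponential_cdf(x, rate):
    return 1 - math.exp(-rate * x)


def weibull_cdf(x, shape, scale):
    return 1 - math.exp(-((x / scale) ** shape))


# CDF evaluated at the claimed median, for hypothetical parameter values.
cdf_at_median = {
    "normal": NormalDist(mu=3.0, sigma=2.0).cdf(3.0),
    "uniform": uniform_cdf((1.0 + 5.0) / 2, 1.0, 5.0),
    "cauchy": cauchy_cdf(-2.0, location=-2.0, scale=0.7),
    "exponential": exponential_cdf(math.log(2) / 1.5, rate=1.5),
    "weibull": weibull_cdf(4.0 * math.log(2) ** (1 / 2.5), shape=2.5, scale=4.0),
}

for name, value in cdf_at_median.items():
    print(f"{name}: F(median) = {value:.6f}")
```

Each printed value should be 0.5 up to floating-point rounding.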

Medians in descriptive statistics

The median is primarily used for skewed distributions, which it summarizes differently than the arithmetic mean. Consider the multiset { 1, 2, 2, 2, 3, 9 }. The median is 2 in this case, as is the mode, and it might be seen as a better indication of central tendency than the arithmetic mean of 19/6 ≈ 3.167.

Calculation of medians is a popular technique in summary statistics and summarizing statistical data, since it is simple to understand and easy to calculate, while also giving a measure that is more robust in the presence of outlier values than is the mean.

Theoretical properties


An optimality property

The median is also the central point which minimizes the average of the absolute deviations; in the example above this would be (1 + 0 + 0 + 0 + 1 + 7) / 6 = 1.5 using the median, while it would be 1.944 using the mean. In the language of probability theory, the value of c that minimizes

$$E(|X - c|)$$

is the median of the probability distribution of the random variable X. Note, however, that c is not always unique, and therefore not well defined in general.
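The optimality property can be checked numerically. The sketch below assumes the data set is { 1, 2, 2, 2, 3, 9 }, which is consistent with the deviations (1, 0, 0, 0, 1, 7), the median of 2 and the mean of about 3.167 quoted in the text. The function name `average_absolute_deviation` is a hypothetical choice.

```python
from statistics import fmean, median


def average_absolute_deviation(data, c):
    """Average of |x - c| over the data: the empirical analogue of E(|X - c|)."""
    return sum(abs(x - c) for x in data) / len(data)


data = [1, 2, 2, 2, 3, 9]   # assumed example data, see the note above

at_median = average_absolute_deviation(data, median(data))
at_mean = average_absolute_deviation(data, fmean(data))
print(at_median)            # 1.5
print(round(at_mean, 3))    # 1.944

# Scan candidate centres from 0.00 to 10.00: none does better than the median.
candidates = [i / 100 for i in range(0, 1001)]
best = min(average_absolute_deviation(data, c) for c in candidates)
print(best >= at_median)    # True
```

A grid scan is only a sanity check, not a proof, but it shows the average absolute deviation bottoming out at the median.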

An inequality relating means and medians

For continuous probability distributions, the difference between the median and the mean is less than or equal to one standard deviation. See an inequality on location and scale parameters.
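As a quick sanity check of this bound, consider an exponential distribution with rate λ. This is a standard textbook example chosen here for illustration. Its mean μ and standard deviation σ are both 1/λ, and its median m is (ln 2)/λ, so

```latex
\[
|\mu - m|
  = \left| \frac{1}{\lambda} - \frac{\ln 2}{\lambda} \right|
  = \frac{1 - \ln 2}{\lambda}
  \approx \frac{0.307}{\lambda}
  \le \frac{1}{\lambda}
  = \sigma .
\]
```

The gap between mean and median is roughly 0.31 standard deviations in this case, comfortably within the bound.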

The sample median


Efficient computation of the sample median

Even though sorting n items takes in general O(n log n) operations, by using a "divide and conquer" algorithm the median of n items can be computed with only O(n) operations (in fact, the k-th smallest element of a list of values can always be found with this method; this is called the selection problem).
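One well-known divide-and-conquer selection method is quickselect, sketched below. This is an illustrative choice, not necessarily the algorithm the text has in mind: with a random pivot it runs in linear time on average, while a guaranteed worst-case linear bound needs a more careful pivot rule such as median of medians. The function names are hypothetical.

```python
import random


def select(values, k):
    """Return the k-th smallest element (0-based) of a non-empty list."""
    if not 0 <= k < len(values):
        raise IndexError("k is out of range")
    items = list(values)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k < len(lows):
            items = lows                       # answer lies below the pivot
        elif k < len(lows) + equal_count:
            return pivot                       # answer is the pivot itself
        else:
            k -= len(lows) + equal_count       # answer lies above the pivot
            items = [x for x in items if x > pivot]


def median_by_selection(values):
    """Median without a full sort: one or two selection calls."""
    n = len(values)
    if n % 2 == 1:
        return select(values, n // 2)
    return (select(values, n // 2 - 1) + select(values, n // 2)) / 2


print(median_by_selection([1, 5, 2, 8, 7]))      # 5
print(median_by_selection([1, 5, 2, 10, 8, 7]))  # 6.0
```

Each pass discards the part of the list that cannot contain the answer, which is why the total work is proportional to n on average rather than n log n.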

Easy explanation of the sample median


As an example, we will calculate the median of the following population of numbers: 1, 5, 2, 8, 7.

Start by sorting the numbers: 1, 2, 5, 7, 8.

In this case, 5 is the median, because when the numbers are sorted, it is the middle number.

For a list with an even number of elements:

As an example of this scenario, we will calculate the median of the following population of numbers: 1, 5, 2, 10, 8, 7.

Again, start by sorting the numbers: 1, 2, 5, 7, 8, 10.

In this case, both 5 and 7, and all numbers between 5 and 7 are medians of the data points.

Sometimes one takes the average of the two median numbers to get a unique value ((5 + 7)/2 = 12/2 = 6).

Other estimates of the median

If data are represented by a statistical model specifying a particular family of probability distributions, then estimates of the median can be obtained by fitting that family of probability distributions to the data and calculating the theoretical median of the fitted distribution. See, for example, Pareto interpolation.
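As a minimal sketch of this fit-then-compute idea, suppose (purely as an assumption for illustration) that the data are modelled by an exponential distribution. The maximum-likelihood estimate of the rate is the reciprocal of the sample mean, and the theoretical median of an exponential distribution is (ln 2) divided by the rate. The function name is hypothetical, and this is not Pareto interpolation.

```python
import math
from statistics import fmean


def exponential_median_estimate(sample):
    """Model-based median estimate assuming exponentially distributed data."""
    if not sample or min(sample) < 0:
        raise ValueError("need a non-empty sample of non-negative values")
    mean = fmean(sample)
    if mean == 0:
        raise ValueError("cannot fit an exponential model to all-zero data")
    fitted_rate = 1 / mean              # maximum-likelihood estimate of the rate
    return math.log(2) / fitted_rate    # theoretical median of the fitted model


print(exponential_median_estimate([1.0, 2.0, 3.0]))  # 2 * ln 2, about 1.386
```

Unlike the sample median, this estimate is only as good as the modelling assumption behind it.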

Medians in computer science

In computer science, a median calculation is often performed to determine the middle index of a sorted array. The middle index is computed as (A + B)/2, where A is the index of the smallest value, and B is the index of the largest value. Joshua Bloch, a Google software engineer, pointed out that if (A + B) is larger than the maximum allowed integer size, then an arithmetic overflow would occur. He suggested that an alternative calculation, A + ((B − A)/2), would avoid this problem. Note that the aforementioned calculations are for binary search and similar algorithms, and do not represent a true mathematical median.
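The difference between the two midpoint formulas can be demonstrated as follows. Python integers do not overflow, so this sketch simulates a signed 32-bit integer by wrapping the intermediate sum explicitly. The helper names (`wrap_int32`, `naive_mid`, `safe_mid`) are hypothetical.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reduce an integer to the signed 32-bit two's-complement range."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def naive_mid(low, high):
    """(A + B) / 2 with the sum wrapped as fixed-width arithmetic would wrap it.

    Python's // rounds toward negative infinity, whereas C and Java truncate
    toward zero; for this demonstration only the sign of the result matters.
    """
    return wrap_int32(low + high) // 2


def safe_mid(low, high):
    """A + (B - A) / 2: no intermediate exceeds high when 0 <= low <= high."""
    return low + (high - low) // 2


low, high = 2**30, INT32_MAX
print(naive_mid(low, high) < 0)   # True: the wrapped sum is negative
print(safe_mid(low, high))        # 1610612735
```

With small indices both formulas agree; they diverge only once A + B exceeds the largest representable integer.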

History


Gustav Fechner introduced the median into the formal analysis of data.

See also

  • Order statistic
  • An inequality on location and scale parameters
  • The median is the 2nd quartile, 5th decile, and 50th percentile.
  • Median voter theory
  • The median in general is a biased estimator.
  • Median graph
  • The centerpoint is a generalization of the median for data in higher dimensions.


External links

  • for Median computations and income inequality metrics