Summary statistics

In descriptive statistics, summary statistics are used to summarize a set of observations in order to communicate the largest amount of information as simply as possible. Statisticians commonly describe the observations using:
  1. a measure of location, or central tendency, such as the arithmetic mean, median, mode, or interquartile mean;
  2. a measure of statistical dispersion, such as the standard deviation, variance, range, interquartile range, or absolute deviation;
  3. a measure of the shape of the distribution, such as skewness or kurtosis.
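The measures of location and dispersion listed above can be computed in R, the language used in the example later on this page. The sketch below uses a small made-up data set so the results are easy to check by hand. Base R has no function for the statistical mode (`mode()` reports an object's storage mode instead), so `sample_mode()` is a hypothetical helper defined here. The interquartile mean is computed with `mean(x, trim = 0.25)`, which drops the lowest and highest quarter of the sorted values; this matches the interquartile mean exactly when the sample size is divisible by four, as it is here.

```r
# Small made-up data set (hypothetical values chosen for easy checking)
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# --- Measures of location ---
mean(x)     # arithmetic mean: sum(x) / length(x)
median(x)   # average of the two middle values, since length(x) is even

# Hypothetical helper: returns every value that occurs most often
# (several values if there is a tie)
sample_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}
sample_mode(x)

# Interquartile mean: trims 25% of the sorted values from each end
mean(x, trim = 0.25)

# --- Measures of dispersion ---
sd(x)                   # sample standard deviation (denominator n - 1)
var(x)                  # sample variance (denominator n - 1)
diff(range(x))          # range; range() itself returns c(min, max)
IQR(x)                  # interquartile range, quantile() type 7 by default
mean(abs(x - mean(x)))  # mean absolute deviation around the mean
mad(x)                  # median absolute deviation, scaled by 1.4826 by default
```

For these data the mean is 6.625, the median 6, the mode 4 and the interquartile mean 6.25. Note that `mad()` multiplies by 1.4826 by default so that it estimates the standard deviation for normally distributed data; pass `constant = 1` for the unscaled median absolute deviation.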



A common collection of order statistics used as summary statistics is the five-number summary (the sample minimum, lower quartile, median, upper quartile, and sample maximum), sometimes extended to a seven-number summary, together with the associated box plot, which displays these values graphically.
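In R, `fivenum()` returns Tukey's five-number summary, and `boxplot.stats()` returns the values a box plot would draw without opening a graphics device. The data below are made up, with one deliberately large value so that the usual box plot rule flags it as an outlier. Seven-number summaries are not built into base R and their exact definition varies between sources, so they are not shown here.

```r
# Made-up data: 11 observations, one unusually large value
x <- c(1, 3, 4, 6, 7, 8, 10, 12, 15, 18, 40)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
fivenum(x)

# The same idea via quantile(); its default (type 7) quartiles can differ
# slightly from Tukey's hinges for some sample sizes
quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))

# Numbers a box plot draws, computed without plotting:
# $stats holds the whisker ends, hinges and median; $out holds points lying
# more than 1.5 times the hinge spread beyond the box, drawn individually
bs <- boxplot.stats(x)
bs$stats
bs$out
```

For these data `fivenum(x)` gives 1, 5, 8, 13.5, 40. In `bs$stats` the upper whisker stops at 18, the largest value inside the 1.5 times hinge-spread limit, and 40 is reported separately in `bs$out`.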

Example


The following R example computes the standard summary statistics of a random sample of 50 values drawn from a normal distribution with a mean of 0 and a standard deviation of 1. Because no random seed is set, the exact values differ from run to run:
> x <- rnorm(n=50, mean=0, sd=1)
> summary(x)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-1.72700 -0.49650 -0.05157  0.07981  0.67640  2.46700
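The example above draws a fresh random sample each time, and `summary()` reports only location and order statistics. A possible extension, sketched below, fixes the seed with `set.seed()` so the output is reproducible (the seed value 42 is arbitrary) and adds the shape measures from the list above. Base R has no skewness or kurtosis functions, so `skewness()` and `kurtosis()` here are hypothetical helpers using the simple moment-based definitions, without small-sample corrections.

```r
# Arbitrary seed so that the sample, and therefore summary(x), is reproducible
set.seed(42)
x <- rnorm(n = 50, mean = 0, sd = 1)
summary(x)

# Hypothetical helper: third standardized central moment
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^(3 / 2)
}

# Hypothetical helper: fourth standardized central moment
kurtosis <- function(v) {
  m <- mean(v)
  mean((v - m)^4) / mean((v - m)^2)^2
}

skewness(x)       # expected to be close to 0 for normal data
kurtosis(x) - 3   # excess kurtosis; expected to be close to 0 for normal data
```

With only 50 observations both estimates are noisy, so values somewhat away from 0 are not by themselves evidence against normality.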

Examples of summary statistics


The Gini coefficient, a measure of statistical dispersion introduced by the Italian statistician Corrado Gini in his 1912 paper "Variability and Mutability", was originally developed to measure income inequality but can be applied to other quantities as well, such as wealth.
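One common way to write the Gini coefficient is as half the relative mean absolute difference, G = sum over i and j of |x_i - x_j| divided by (2 n^2 times the mean). A minimal sketch in R follows, assuming non-negative values with a positive mean, such as incomes; `gini()` is a hypothetical helper, not a base R function. With this definition G ranges from 0 (all values equal) to (n - 1)/n (one observation holds the entire total); some authors multiply by n/(n - 1) so that the maximum is exactly 1.

```r
# Hypothetical helper: Gini coefficient via the mean absolute difference.
# Builds an n-by-n matrix of pairwise differences, so it suits small samples.
gini <- function(v) {
  stopifnot(all(v >= 0), mean(v) > 0)
  n <- length(v)
  sum(abs(outer(v, v, "-"))) / (2 * n^2 * mean(v))
}

gini(c(10, 10, 10, 10))  # perfect equality: 0
gini(c(0, 0, 0, 100))    # one unit holds everything: (4 - 1) / 4 = 0.75
```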

See also

  • descriptive statistics
  • statistical theory
  • sufficient statistic