Histogram



 
 
In statistics, a histogram is a graphical display of tabulated frequencies, shown as bars. It shows what proportion of cases fall into each of several categories. The categories are usually specified as non-overlapping intervals of some variable. The categories (bars) must be adjacent. The intervals (or bands) should ideally be of the same size.

Histograms are used to plot density. When the bar heights are scaled as densities, the total area of the histogram equals 1. If the intervals on the x-axis all have length 1, then such a histogram is identical to a relative frequency plot.

The word histogram is derived from the Greek histos 'anything set upright' (as the masts of a ship, the bar of a loom, or the vertical bars of a histogram) and gramma 'drawing, record, writing'. The histogram is one of the seven basic tools of quality control, which also include the Pareto chart, check sheet, control chart, cause-and-effect diagram, flowchart, and scatter diagram. Kernel smoothing techniques generalize the histogram: they construct a smooth probability density function from the supplied data.
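As a rough illustration of kernel smoothing, the sketch below builds a Gaussian kernel density estimate in plain Python. The function name `gaussian_kde`, the sample values and the bandwidth are all made up for this example; the Gaussian kernel is only one possible choice and is not prescribed by the text above.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x.

    Each sample contributes a Gaussian bump of standard deviation
    `bandwidth`; the bumps are averaged so the result integrates to 1.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical data and bandwidth, chosen only for illustration.
samples = [1.0, 1.5, 2.0, 2.2, 3.5]
density = gaussian_kde(samples, bandwidth=0.5)

# Approximate the area under the estimate with a Riemann sum on [-5, 10).
step = 0.01
area = sum(density(-5.0 + i * step) * step for i in range(1500))
print(f"approximate area under the estimate: {area:.4f}")
```

Unlike a histogram, the estimate has no bin edges; the bandwidth plays the role that the bin width plays in a histogram.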

Examples

As an example we consider data collected by the U.S. Census Bureau on time to travel to work (2000 census, Table 2). The census found that there were 124 million people who work outside of their homes. In the table below, the count for the interval starting at 30 minutes is higher than the counts for the intervals on either side, which probably reflects people rounding their reported travel time. This rounding is a common phenomenon when collecting data from people.

Data by absolute numbers

Interval start (min)   Width (min)   Quantity (thousands)   Quantity/width
 0                      5             4180                   836
 5                      5            13687                  2737
10                      5            18618                  3723
15                      5            19634                  3926
20                      5            17981                  3596
25                      5             7190                  1438
30                      5            16369                  3273
35                      5             3212                   642
40                      5             4122                   824
45                     15             9200                   613
60                     30             6461                   215
90                     60             3435                    57


This histogram shows the number of cases per unit interval, so that the height of each bar equals the quantity divided by the interval width and the area of each bar equals the number of people in the survey who fall into that category. The total area of the bars represents the total number of cases (124 million). This type of histogram shows absolute numbers.
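The bar heights for this example can be recomputed from the first three columns of the table. The sketch below assumes, as the 124 million total suggests, that the quantities are given in thousands and the intervals in minutes; the variable names are only illustrative.

```python
# (interval start in minutes, width in minutes, quantity in thousands),
# copied from the "Data by absolute numbers" table.
rows = [
    (0, 5, 4180), (5, 5, 13687), (10, 5, 18618), (15, 5, 19634),
    (20, 5, 17981), (25, 5, 7190), (30, 5, 16369), (35, 5, 3212),
    (40, 5, 4122), (45, 15, 9200), (60, 30, 6461), (90, 60, 3435),
]

# Bar height = quantity per unit of interval width.
heights = [quantity / width for _, width, quantity in rows]

# Bar area = height * width, so the areas add back up to the total count.
total_area = sum(h * width for h, (_, width, _) in zip(heights, rows))

for (start, width, _), h in zip(rows, heights):
    print(f"{start:>3}-{start + width:<3} min: {h:8.1f} thousand per minute")
print(f"total area: {total_area:.0f} thousand")
```

Dividing by the width is what keeps the wide intervals at the right of the table (15, 30 and 60 minutes) comparable with the 5-minute ones.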


Data by proportion

Interval start (min)   Width (min)   Quantity Q (thousands)   Q/total/width
 0                      5             4180                     0.0067
 5                      5            13687                     0.0221
10                      5            18618                     0.0300
15                      5            19634                     0.0316
20                      5            17981                     0.0290
25                      5             7190                     0.0116
30                      5            16369                     0.0264
35                      5             3212                     0.0052
40                      5             4122                     0.0066
45                     15             9200                     0.0049
60                     30             6461                     0.0017
90                     60             3435                     0.0005


This histogram differs from the first only in the vertical scale. The area of each bar is the proportion of the total that the category represents (its height is that proportion divided by the interval width), and the total area of all the bars is equal to 1, the decimal equivalent of 100%. The curve displayed is a simple density estimate. This version shows proportions, and is also known as a unit area histogram.
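The unit-area property can be checked numerically. This sketch recomputes the Q/total/width column from the quantities and widths in the table and sums the bar areas; the variable names are illustrative, and the total is taken as the sum of the listed quantities.

```python
# (width in minutes, quantity Q) for each interval, in table order.
widths_and_quantities = [
    (5, 4180), (5, 13687), (5, 18618), (5, 19634), (5, 17981), (5, 7190),
    (5, 16369), (5, 3212), (5, 4122), (15, 9200), (30, 6461), (60, 3435),
]

total = sum(q for _, q in widths_and_quantities)

# Density height of each bar: proportion of the total per unit of width.
densities = [q / total / w for w, q in widths_and_quantities]

# Area of each bar = height * width = proportion of cases in that interval.
unit_area = sum(d * w for d, (w, _) in zip(densities, widths_and_quantities))

print([round(d, 4) for d in densities])
print(f"total area: {unit_area:.6f}")
```

Because the areas are proportions, they sum to 1 regardless of how the intervals are chosen.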


In other words, a histogram represents a frequency distribution by means of rectangles whose widths represent class intervals and whose areas are proportional to the corresponding frequencies. The bars are placed together only to make it easier to compare data.

Activities and demonstrations

The SOCR resource pages contain a number of hands-on interactive activities that demonstrate the concept of a histogram using Java applets.

Mathematical definition


In a more general mathematical sense, a histogram is a mapping $m_i$ that counts the number of observations that fall into various disjoint categories (known as bins), whereas the graph of a histogram is merely one way to represent a histogram. Thus, if we let $n$ be the total number of observations and $k$ be the total number of bins, the histogram $m_i$ meets the following condition:

\[ n = \sum_{i=1}^{k} m_i \]
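A minimal sketch of this definition: the hypothetical helper `histogram_counts` below maps each observation to one of several disjoint half-open bins and counts how many land in each, so that the counts add up to the number of observations. The data, the bin edges and the half-open convention are assumptions made for the example.

```python
def histogram_counts(data, edges):
    """Count observations in the half-open bins [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break  # bins are disjoint, so at most one bin matches
    return counts

# Made-up observations and bin edges.
data = [1, 2, 2, 3, 5, 7, 8, 8, 9]
edges = [0, 3, 6, 10]

m = histogram_counts(data, edges)
print(m)                      # [3, 2, 4]
print(sum(m) == len(data))    # True: the bin counts sum to n
```

The final check holds only because every observation falls inside some bin; values outside the outermost edges would simply not be counted.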

Cumulative histogram

A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to the specified bin. That is, the cumulative histogram $M_i$ of a histogram $m_j$ is defined as:

\[ M_i = \sum_{j=1}^{i} m_j \]
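In code, a cumulative histogram is a running sum over the bin counts. The sketch below uses `itertools.accumulate` from the Python standard library on made-up bin counts.

```python
from itertools import accumulate

# Hypothetical bin counts of an ordinary histogram.
bin_counts = [3, 2, 4]

# Entry i holds the number of observations in bins 1..i.
cumulative = list(accumulate(bin_counts))
print(cumulative)  # [3, 5, 9]
```

The last entry always equals the total number of observations.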

Number of bins and width

There is no "best" number of bins, and different bin sizes can reveal different features of the data. Some theoreticians have attempted to determine an optimal number of bins, but these methods generally make strong assumptions about the shape of the distribution. You should always experiment with bin widths before choosing one (or more) that illustrate the salient features in your data.

The number of bins $k$ can be calculated directly, or from a suggested bin width $h$:

\[ k = \left\lceil \frac{\max x - \min x}{h} \right\rceil \]

The braces indicate the ceiling function.

Sturges' formula: $k = \lceil \log_2 n + 1 \rceil$, which implicitly bases the bin sizes on the range of the data, and can perform poorly if $n < 30$.

Scott's choice: $h = \frac{3.5\,\sigma}{n^{1/3}}$, where $h$ is the common bin width, and $\sigma$ is the sample standard deviation.

Freedman-Diaconis' choice: $h = 2\,\frac{\operatorname{IQR}(x)}{n^{1/3}}$, which is based on the interquartile range.
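The three rules can be compared on the same data with a short sketch. It assumes the usual forms of the rules (Sturges: k = ceil(log2 n + 1); Scott: h = 3.5 s / n^(1/3); Freedman-Diaconis: h = 2 IQR / n^(1/3)) and converts a width to a bin count with k = ceil((max - min) / h). The function names are hypothetical, and the quartiles come from `statistics.quantiles` with its default method; other quartile conventions give slightly different widths.

```python
import math
import statistics

def bins_from_width(data, h):
    """Number of bins needed to cover the data range with bins of width h."""
    return math.ceil((max(data) - min(data)) / h)

def sturges_bins(n):
    """Sturges' formula: depends only on the number of observations n."""
    return math.ceil(math.log2(n) + 1)

def scott_width(data):
    """Scott's choice: bin width from the sample standard deviation."""
    return 3.5 * statistics.stdev(data) / len(data) ** (1 / 3)

def freedman_diaconis_width(data):
    """Freedman-Diaconis' choice: bin width from the interquartile range."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 2 * (q3 - q1) / len(data) ** (1 / 3)

# Made-up data: the integers 1..100 as floats.
data = [float(i) for i in range(1, 101)]

print("Sturges bins:          ", sturges_bins(len(data)))
print("Scott bins:            ", bins_from_width(data, scott_width(data)))
print("Freedman-Diaconis bins:",
      bins_from_width(data, freedman_diaconis_width(data)))
```

The rules need not agree, which is one reason to try several bin widths rather than rely on a single formula.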

Continuous data

The idea of a histogram can be generalized to continuous data. Let $f \in L^1(\mathbb{R})$ (see Lebesgue space); then the cumulative histogram operator $H$ can be defined by:

\[ H(f)(y) = \lambda(\{x : f(x) \le y\}) \]

where $\lambda$ denotes the Lebesgue measure. The histogram operator $h(f)$ is the derivative of $H(f)$ with respect to $y$. For $f$ with only finitely many intervals of monotony this can be rewritten as

\[ h(f)(y) = \sum_{\xi \in \{x : f(x) = y\}} \frac{1}{|f'(\xi)|} \]

$h(f)(y)$ is undefined if $y$ is the value of a stationary point.

See also

  • Density estimation
  • Freedman-Diaconis rule
  • Image histogram
  • Kernel density estimation, another method of visualizing probability density functions that can be preferred to histograms.


External links

  • (location of census document cited in example)