In probability theory and statistics, the discrete uniform distribution is a probability distribution whereby a finite number of equally spaced values are equally likely to be observed; every one of $n$ values has equal probability $1/n$. Another way of saying "discrete uniform distribution" would be "a known, finite number of equally spaced outcomes equally likely to happen."
If a random variable has any of $n$ possible values $k_1, k_2, \dots, k_n$ that are equally spaced and equally probable, then it has a discrete uniform distribution. The probability of any outcome $k_i$ is $1/n$. A simple example of the discrete uniform distribution is throwing a fair die. The possible values of $k$ are 1, 2, 3, 4, 5, 6, and each time the die is thrown, the probability of a given score is 1/6. If two dice are thrown and their values added, the uniform distribution no longer fits, since the sums from 2 to 12 do not have equal probabilities.
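The contrast between one die and the sum of two dice can be checked by direct enumeration. The following is a minimal Python sketch that assumes fair six-sided dice. The helper names `single_die_pmf` and `two_dice_sum_pmf` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # faces of a fair six-sided die

def single_die_pmf():
    """Each of the n = 6 faces gets probability 1/n."""
    n = len(FACES)
    return {face: Fraction(1, n) for face in FACES}

def two_dice_sum_pmf():
    """Distribution of the sum of two fair dice, found by counting all 36 outcomes."""
    counts = Counter(a + b for a, b in product(FACES, repeat=2))
    total = sum(counts.values())
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}

print(single_die_pmf())
print(two_dice_sum_pmf())
```

Every face of a single die gets probability 1/6. The sums behave differently: 2 and 12 each get 1/36, while 7 gets 6/36 = 1/6, so the sum is not uniform.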
The cumulative distribution function (CDF) can be expressed in terms of a degenerate distribution as

$$F(x) = \frac{1}{n}\sum_{i=1}^{n} H(x - k_i),$$

where the Heaviside step function $H(x - x_0)$ is the CDF of the degenerate distribution centered at $x_0$, using the convention that $H(0) = 1$.
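The step-function representation can be evaluated directly. The following is a minimal Python sketch that assumes the support is given as a list of the $n$ equally likely values. The names `heaviside` and `discrete_uniform_cdf` are hypothetical helpers.

```python
from fractions import Fraction

def heaviside(x):
    """Step function with the convention H(0) = 1 (right-continuous)."""
    return 1 if x >= 0 else 0

def discrete_uniform_cdf(x, support):
    """F(x) = (1/n) * sum_i H(x - k_i) for equally likely values k_1..k_n."""
    n = len(support)
    return Fraction(sum(heaviside(x - k) for k in support), n)

die = [1, 2, 3, 4, 5, 6]
for x in (0, 1, 3.5, 6, 7):
    print(x, discrete_uniform_cdf(x, die))
```

For a die, $F$ jumps by 1/6 at each face, so $F(1) = 1/6$ and $F(3.5) = 1/2$. The convention $H(0) = 1$ is what makes $F(1)$ include the mass at 1.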
Estimation of maximum
In this problem, a sample of $k$ observations is obtained from a uniform distribution on the integers $1, 2, \dots, N$, and the task is to estimate the unknown maximum $N$. This problem is commonly known as the German tank problem, following the application of maximum estimation to estimates of German tank production during World War II.
The UMVU (uniformly minimum-variance unbiased) estimator for the maximum is given by

$$\hat{N} = \frac{k+1}{k}\,m - 1 = m + \frac{m}{k} - 1,$$

where $m$ is the sample maximum and $k$ is the sample size, sampling without replacement. This can be seen as a very simple case of maximum spacing estimation.
The formula may be understood intuitively as:
- "The sample maximum plus the average gap between observations in the sample",
the gap being added to compensate for the negative bias of the sample maximum as an estimator for the population maximum.
[The sample maximum is never more than the population maximum, but can be less, hence it is a biased estimator: it will tend to underestimate the population maximum.]
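The estimator $m + m/k - 1$ can be checked exactly for small cases. The check averages over every possible sample of $k$ distinct values from $1, \dots, N$ (sampling without replacement). The following is a minimal Python sketch. The function names and the sample values are hypothetical, chosen only for illustration. The code writes the average gap as $(m - k)/k$, the number of unobserved integers below $m$ divided by $k$, so that $m + (m - k)/k = m + m/k - 1$.

```python
from fractions import Fraction
from itertools import combinations

def umvu_max_estimate(sample):
    """N_hat = m + (m - k)/k: sample maximum plus the average gap between observations."""
    k = len(sample)
    m = max(sample)
    return Fraction(m) + Fraction(m - k, k)

def exact_means(N, k):
    """Average sample maximum and average N_hat over all k-subsets of 1..N."""
    subsets = list(combinations(range(1, N + 1), k))
    mean_max = Fraction(sum(max(s) for s in subsets), len(subsets))
    mean_est = sum(umvu_max_estimate(s) for s in subsets) / len(subsets)
    return mean_max, mean_est

print(umvu_max_estimate([19, 40, 42, 60]))  # 74
print(exact_means(10, 3))
```

Take $N = 10$ and $k = 3$. Over all 120 possible samples, the sample maximum averages 33/4 = 8.25, which falls below $N$. The estimator averages exactly 10. The first figure shows the negative bias; the second shows how the added gap removes it.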
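This has a variance of

$$\operatorname{Var}(\hat{N}) = \frac{1}{k}\,\frac{(N-k)(N+1)}{k+2} \approx \frac{N^2}{k^2} \quad \text{for small samples } k \ll N,$$

so a standard deviation of approximately $N/k$, the (population) average size of a gap between samples; compare $m/k$ above.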
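The exact variance $\frac{1}{k}\frac{(N-k)(N+1)}{k+2}$ can likewise be checked for small $N$ and $k$. The check enumerates every sample drawn without replacement. The following is a minimal Python sketch, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def exact_variance_of_estimator(N, k):
    """Variance of N_hat = (k+1)/k * m - 1 over all k-subsets of 1..N (no replacement)."""
    estimates = [Fraction(k + 1, k) * max(s) - 1
                 for s in combinations(range(1, N + 1), k)]
    mean = sum(estimates) / len(estimates)
    return sum((e - mean) ** 2 for e in estimates) / len(estimates)

def variance_formula(N, k):
    """Closed form (1/k) * (N - k)(N + 1) / (k + 2)."""
    return Fraction((N - k) * (N + 1), k * (k + 2))

for N, k in [(10, 3), (20, 2), (25, 4)]:
    print(N, k, exact_variance_of_estimator(N, k), variance_formula(N, k))
```

Exact rational arithmetic through `Fraction` avoids floating-point noise, so the enumerated value and the closed form can be compared for equality rather than approximately.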
The sample maximum is the maximum likelihood estimator for the population maximum, but, as discussed above, it is biased.
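The claim about maximum likelihood can be seen from the likelihood itself. Assume the sample is treated as an unordered set of $k$ distinct values drawn without replacement from $1, \dots, N$. Each possible set then has probability $1/\binom{N}{k}$ when $N \ge m$, and probability 0 otherwise. That likelihood decreases as $N$ grows, so it peaks at $N = m$. The following is a minimal Python sketch. The names `likelihood` and `max_likelihood_N`, the search bound `upper`, and the sample values are hypothetical.

```python
from fractions import Fraction
from math import comb

def likelihood(N, sample):
    """Probability of drawing this particular set of k distinct values from 1..N."""
    k, m = len(sample), max(sample)
    if N < m:
        return Fraction(0)  # a value above N could not have been observed
    return Fraction(1, comb(N, k))

def max_likelihood_N(sample, upper):
    """Search N = 1..upper for the value with the highest likelihood."""
    return max(range(1, upper + 1), key=lambda N: likelihood(N, sample))

sample = [19, 40, 42, 60]
print(max_likelihood_N(sample, upper=200))  # 60, the sample maximum
```

For this hypothetical sample, the maximum likelihood value is 60. The UMVU estimate $m + m/k - 1$ is 74, which is larger; the difference is the correction for the downward bias of the sample maximum.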
If samples are not numbered but are recognizable or markable, one can instead estimate population size via the capture-recapture method.
See rencontres numbers for an account of the probability distribution of the number of fixed points of a uniformly distributed random permutation.