Likelihood function

 


In statistics, the likelihood function (often simply the likelihood) is a function of the parameters of a statistical model that plays a key role in statistical inference. In non-technical usage, "likelihood" is a synonym for "probability", but throughout this article only the technical definition is used. Informally, if "probability" allows us to predict unknown outcomes based on known parameters, then "likelihood" allows us to estimate unknown parameters based on known outcomes.

In a sense, likelihood works backwards from probability: given parameter B, we use the conditional probability P(A|B) to reason about outcome A, and given outcome A, we use the likelihood function L(B|A) to reason about parameter B. This mode of reasoning is formalized in Bayes' theorem:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

A likelihood function is a conditional probability function considered as a function of its second argument with its first argument held fixed, thus:

$$b \mapsto P(A \mid B = b),$$

and also any other function proportional to such a function. That is, the likelihood function for B is the equivalence class of functions

$$L(b \mid A) = \alpha \, P(A \mid B = b)$$

for any constant of proportionality $\alpha > 0$. The numerical value $L(b \mid A)$ alone is immaterial; all that matters are likelihood ratios of the form

$$\frac{L(b_2 \mid A)}{L(b_1 \mid A)} = \frac{\alpha\, P(A \mid B = b_2)}{\alpha\, P(A \mid B = b_1)} = \frac{P(A \mid B = b_2)}{P(A \mid B = b_1)},$$

which are invariant with respect to the constant of proportionality.
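A minimal sketch of this invariance, assuming a hypothetical binomial model (k heads in n tosses, success probability p) chosen only for illustration: scaling the likelihood by any constant α > 0 changes its values but leaves the ratio between two parameter values unchanged.

```python
from math import comb

def likelihood(p, k, n, alpha=1.0):
    """Likelihood of success probability p after k successes in n trials,
    scaled by an arbitrary constant of proportionality alpha > 0."""
    return alpha * comb(n, k) * p**k * (1 - p) ** (n - k)

def likelihood_ratio(p2, p1, k, n, alpha=1.0):
    """Ratio L(p2 | data) / L(p1 | data); alpha (and the binomial
    coefficient) cancel out."""
    return likelihood(p2, k, n, alpha) / likelihood(p1, k, n, alpha)

# Hypothetical data: 7 heads in 10 tosses.
k, n = 7, 10
for alpha in (1.0, 0.001, 42.0):
    # The likelihood value changes with alpha; the ratio does not.
    print(alpha, likelihood(0.7, k, n, alpha), likelihood_ratio(0.7, 0.5, k, n, alpha))
```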

For more about making inferences via likelihood functions, see also the method of maximum likelihood and likelihood-ratio testing.

Likelihood function of a parameterized model

Among many applications, we consider here one of broad theoretical and practical importance. Given a parameterized family of probability density functions (or probability mass functions in the case of discrete distributions)

$$x \mapsto f(x \mid \theta),$$

where θ is the parameter, the likelihood function is written

$$L(\theta \mid x) = f(x \mid \theta),$$

where x is the observed outcome of an experiment. In other words, when f(x | θ) is viewed as a function of x with θ fixed, it is a probability density function, and when viewed as a function of θ with x fixed, it is a likelihood function.

Note: The likelihood $L(\theta \mid x)$ is not the probability that θ is the right parameter value, given the observed sample. Attempting to interpret the likelihood of a hypothesis given observed evidence as the probability of the hypothesis is a common error, with potentially disastrous real-world consequences in medicine, engineering, or jurisprudence. See prosecutor's fallacy for an example of this.

From a geometric standpoint, if we consider f(x, θ) as a function of two variables, then the family of probability distributions can be viewed as curves (cross-sections of the surface at fixed θ) parallel to the x-axis, while the family of likelihood functions are the orthogonal curves (cross-sections at fixed x) parallel to the θ-axis.
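To make the two readings of f(x | θ) concrete, here is a small sketch assuming a hypothetical binomial model (x = number of heads in 3 tosses, θ = probability of heads). A slice at fixed θ sums to 1 over x, as a probability distribution must; a slice at fixed x, viewed as a function of θ, integrates to 1/4 rather than 1, so it is not a density for θ.

```python
from math import comb

N_TOSSES = 3  # hypothetical number of coin tosses

def f(x, theta, n=N_TOSSES):
    """Binomial pmf: probability of x heads in n tosses with P(heads) = theta."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)

thetas = [i / 1000 for i in range(1001)]  # grid along the theta-axis

# Fixing theta (a slice parallel to the x-axis) gives a probability distribution.
row_sum = sum(f(x, 0.3) for x in range(N_TOSSES + 1))

def column_integral(x):
    """Trapezoidal-rule integral over theta in [0, 1] of the likelihood
    theta -> f(x | theta) (a slice parallel to the theta-axis)."""
    h = thetas[1] - thetas[0]
    vals = [f(x, t) for t in thetas]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

print(row_sum)  # 1 up to rounding
print([round(column_integral(x), 4) for x in range(N_TOSSES + 1)])
```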

Likelihoods for continuous distributions


The use of the probability density instead of a probability in specifying the likelihood function above may be justified in a simple way. Suppose that, instead of an exact observation, x, the observation is the value in a short interval $(x_{j-1}, x_j)$, with length $\Delta_j$, where the subscripts refer to a predefined set of intervals. Then the probability of getting this observation (of being in interval j) is approximately

$$L_{\text{approx}}(\theta \mid x \text{ in interval } j) = \Delta_j \, f(x^* \mid \theta),$$

where $x^*$ can be any point in interval j. Then, recalling that the likelihood function is defined up to a multiplicative constant, it is just as valid to say that the likelihood function is approximately

$$L_{\text{approx}}(\theta \mid x \text{ in interval } j) = f(x^* \mid \theta),$$

and then, on considering the lengths of the intervals to decrease to zero,

$$L(\theta \mid x) = f(x \mid \theta).$$
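A numerical sketch of this limiting argument, assuming a normal model with unknown mean θ and unit variance (an illustrative choice, not part of the argument above). As Δ shrinks, the interval probability divided by Δ approaches the density, and the ratio of interval probabilities for two values of θ approaches the ratio of densities, because the common factor Δ cancels.

```python
from math import erf, exp, pi, sqrt

def normal_cdf(x, mu, sigma=1.0):
    """CDF of Normal(mu, sigma^2) via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def normal_pdf(x, mu, sigma=1.0):
    """Density of Normal(mu, sigma^2)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def interval_prob(x, delta, mu):
    """P(x - delta/2 < X < x + delta/2) for X ~ Normal(mu, 1)."""
    return normal_cdf(x + delta / 2, mu) - normal_cdf(x - delta / 2, mu)

x_obs = 1.3  # hypothetical observation
for delta in (1.0, 0.1, 0.01, 0.001):
    per_length = interval_prob(x_obs, delta, 0.0) / delta  # approaches f(x_obs | 0)
    ratio = interval_prob(x_obs, delta, 1.0) / interval_prob(x_obs, delta, 0.0)
    print(delta, per_length, ratio)

# Limits: the density and the density ratio at the observation.
print(normal_pdf(x_obs, 0.0), normal_pdf(x_obs, 1.0) / normal_pdf(x_obs, 0.0))
```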

Likelihoods for mixed continuous-discrete distributions


The above can be extended in a simple way to allow consideration of distributions which contain both discrete and continuous components. Suppose that the distribution consists of a number of discrete probability masses $p_k(\theta)$ and a density $f(x \mid \theta)$, where the sum of all the $p_k(\theta)$ added to the integral of f is always one. Assuming that it is possible to distinguish an observation corresponding to one of the discrete probability masses from one which corresponds to the density component, the likelihood function for an observation from the continuous component can be dealt with as above by setting the interval length short enough to exclude any of the discrete masses. For an observation from the discrete component, the probability can either be written down directly or treated within the above context by saying that the probability of getting an observation in an interval that does contain a discrete component (of being in interval j which contains discrete component k) is approximately

$$L_{\text{approx}}(\theta \mid x \text{ in interval } j) = p_k(\theta) + \Delta_j \, f(x^* \mid \theta),$$

where $x^*$ can be any point in interval j. Then, on considering the lengths of the intervals to decrease to zero, the likelihood function for an observation from the discrete component is

$$L(\theta \mid x) = p_k(\theta),$$

where k is the index of the discrete probability mass corresponding to observation x.

The fact that the likelihood function can be defined in a way that includes contributions that are not commensurate (the density and the probability mass) arises from the way in which the likelihood function is defined up to a constant of proportionality, where this "constant" can change with the observation x, but not with the parameter θ.
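As a sketch of how such a mixed likelihood is assembled, assume a hypothetical zero-inflated exponential model: X = 0 with probability π (a single discrete mass), and otherwise X has density (1 − π) λ e^(−λx) for x > 0. Zeros contribute log π and positive values contribute the log-density; the names `pi0`, `lam` and the data are illustrative only. For this particular model the maximisers have a simple closed form (fraction of zeros, and number of positives over their sum).

```python
from math import log

def log_likelihood(pi0, lam, data):
    """Log-likelihood for a hypothetical mixed model: point mass at 0 with
    probability pi0, plus density (1 - pi0) * lam * exp(-lam * x) for x > 0.
    Zeros contribute log p_k(theta); positive values contribute log f(x | theta)."""
    total = 0.0
    for x in data:
        if x == 0:
            total += log(pi0)
        else:
            total += log(1 - pi0) + log(lam) - lam * x
    return total

def mle(data):
    """Closed-form maximiser of the log-likelihood above."""
    positives = [x for x in data if x > 0]
    pi_hat = (len(data) - len(positives)) / len(data)
    lam_hat = len(positives) / sum(positives)
    return pi_hat, lam_hat

data = [0, 0, 1.2, 0.4, 0, 2.5, 0.9, 0, 0.3, 1.7]  # hypothetical observations
pi_hat, lam_hat = mle(data)
print(pi_hat, lam_hat, log_likelihood(pi_hat, lam_hat, data))
```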

Example 1

For example, suppose a coin is tossed with probability $p_H$ of landing heads up ('H'). The probability of getting two heads in two independent trials ('HH') is $p_H^2$. If $p_H = 0.5$, then the probability of seeing two heads is 0.25.

In symbols, we can say the above as

$$P(\text{HH} \mid p_H = 0.5) = 0.25.$$

Another way of saying this is to reverse it and say that "the likelihood of $p_H = 0.5$, given the observation 'HH', is 0.25", i.e.,

$$L(p_H = 0.5 \mid \text{HH}) = P(\text{HH} \mid p_H = 0.5) = 0.25.$$

But this is not the same as saying that the probability of $p_H = 0.5$, given the observation, is 0.25.

To take an extreme case, on this basis we can say "the likelihood of $p_H = 1$ given the observation 'HH' is 1". But it is clearly not the case that the probability of $p_H = 1$ given the observation is 1: the event 'HH' can occur for any $p_H > 0$ (and often does, in reality, for $p_H$ roughly 0.5). If the probability of $p_H = 1$ given the observation were 1, then 'HH' could occur only when $p_H = 1$, which is obviously not true.

The likelihood function is not a probability density function; for example, the integral of a likelihood function is not in general 1. In this example, the integral of the likelihood $p_H^2$ over the interval [0, 1] in $p_H$ is 1/3, demonstrating again that the likelihood function cannot be interpreted as a probability density function for $p_H$. On the other hand, given any particular value of $p_H$, e.g. $p_H = 0.5$, the probabilities of all possible outcomes of the two tosses (HH, HT, TH, TT) sum to 1.
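The numbers in this example can be checked directly. The sketch below assumes two independent tosses, as above, and uses a midpoint rule (an illustrative numerical choice) for the integral over $p_H$.

```python
def likelihood_hh(p_h):
    """L(p_H | 'HH') = P('HH' | p_H) = p_H**2 for two independent tosses."""
    return p_h**2

print(likelihood_hh(0.5))  # 0.25
print(likelihood_hh(1.0))  # 1.0

# Integral of the likelihood over p_H in [0, 1] (midpoint rule): close to 1/3, not 1.
m = 100_000
integral = sum(likelihood_hh((i + 0.5) / m) for i in range(m)) / m

# For fixed p_H = 0.5, the probabilities of the four outcomes do sum to 1.
p = 0.5
outcome_probs = {"HH": p * p, "HT": p * (1 - p), "TH": (1 - p) * p, "TT": (1 - p) ** 2}
print(integral, sum(outcome_probs.values()))
```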

Example 2


Consider a jar containing N lottery tickets numbered from 1 through N. If you pick a ticket randomly, you get number n with probability 1/N if n ≤ N, and zero if n > N. This is written

$$P(n \mid N) = \frac{[n \le N]}{N},$$

where the Iverson bracket [n ≤ N] is 1 when n ≤ N and 0 otherwise. When considered a function of n for fixed N, this is the probability distribution, but when considered a function of N for fixed n, this is a likelihood function. Since the likelihood is zero for N < n and decreases as 1/N for N ≥ n, the maximum likelihood estimate for N is $N_0 = n$ (by contrast, the unbiased estimate is $2n - 1$).

This likelihood function is not a probability distribution, because the total

$$\sum_{N=1}^{\infty} P(n \mid N) = \sum_{N=n}^{\infty} \frac{1}{N}$$

is a divergent series.
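A short sketch of the single-ticket case, assuming a hypothetical observed ticket number n = 17 and searching N only over a finite range. Exact fractions make the zero likelihood below n explicit, and the partial sums show the divergence (they grow roughly like the logarithm of the upper limit).

```python
from fractions import Fraction

def likelihood(N, n):
    """L(N | n) = P(n | N) = [n <= N] / N for one ticket drawn from 1..N."""
    return Fraction(1, N) if n <= N else Fraction(0)

n = 17  # hypothetical observed ticket number
candidates = range(1, 200)
mle = max(candidates, key=lambda N: likelihood(N, n))
print(mle)        # 17: zero likelihood below n, decreasing as 1/N above it
print(2 * n - 1)  # the unbiased estimate, for comparison

# Partial sums of the likelihood over N keep growing, so no normalising constant exists.
for upper in (10**2, 10**3, 10**4, 10**5):
    print(upper, sum(1.0 / N for N in range(n, upper + 1)))
```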

Suppose, however, that you pick two tickets rather than one.

The probability of the outcome $\{n_1, n_2\}$, where $n_1 < n_2$, is

$$P(\{n_1, n_2\} \mid N) = \frac{[n_2 \le N]}{\binom{N}{2}}.$$

When considered a function of N for fixed $n_2$, this is a likelihood function. The maximum likelihood estimate for N is $N_0 = n_2$.

This time the total

$$\sum_{N=1}^{\infty} P(\{n_1, n_2\} \mid N) = \sum_{N=n_2}^{\infty} \frac{2}{N(N-1)} = \frac{2}{n_2 - 1}$$

is a convergent series, and so this likelihood function can be normalized into a probability distribution.

If you pick 3 or more tickets, the likelihood function has a well-defined mean value, which is larger than the maximum likelihood estimate. If you pick 4 or more tickets, the likelihood function has a well-defined standard deviation too. (With k tickets the likelihood decays like $N^{-k}$, so the mean requires k ≥ 3 and the variance requires k ≥ 4.)
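A sketch of these claims, assuming k distinct tickets whose largest number is a hypothetical n_max = 10, so that the likelihood is 1/C(N, k) for N ≥ n_max. The infinite sums are truncated at a cutoff, which makes the results approximate. For k = 2 the total should be close to 2/(n_max − 1); for k = 3 the normalized mean is finite and, for n_max = 10, works out exactly to 18, above the maximum likelihood estimate of 10.

```python
from math import comb

def likelihood(N, n_max, k):
    """L(N | sample) = [n_max <= N] / C(N, k) for k distinct tickets drawn
    from 1..N, where n_max is the largest ticket number observed."""
    return 1 / comb(N, k) if N >= n_max else 0.0

def total_and_mean(n_max, k, cutoff=200_000):
    """Total of the likelihood over N, and the mean of the normalized
    likelihood, both truncated at `cutoff` (an approximation: the exact
    sums run to infinity, and for k <= 2 the mean does not converge)."""
    total = weighted = 0.0
    for N in range(n_max, cutoff + 1):
        w = likelihood(N, n_max, k)
        total += w
        weighted += N * w
    return total, weighted / total

n_max = 10  # hypothetical largest ticket number observed
total2, _ = total_and_mean(n_max, k=2)  # the k = 2 "mean" is a truncated divergent sum; ignore it
total3, mean3 = total_and_mean(n_max, k=3)
print(total2, 2 / (n_max - 1))  # truncated total vs exact total 2/(n_max - 1)
print(mean3)                    # finite mean for k = 3, larger than the MLE n_max
```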

Likelihoods that eliminate nuisance parameters

In many cases, the likelihood is a function of more than one parameter but interest focuses on the estimation of only one or at most a few of them, with the others being considered as nuisance parameters. Several alternative ways have been developed to eliminate such nuisance parameters so that a likelihood can be written as a function of the parameter (or parameters) of interest only, the main ones being marginal, conditional and profile likelihoods.

These are useful because standard likelihood methods can become unreliable or fail entirely when there are many nuisance parameters (or the nuisance parameter is high-dimensional), particularly when the number of nuisance parameters is a substantial fraction of the number of observations and this fraction does not decrease when the sample size increases. They can also be used to derive closed-form formulae for statistical tests when direct use of maximum likelihood requires iterative numerical methods, and find application in some specialized topics such as sequential analysis.

Conditional likelihood

Sometimes it is possible to find a sufficient statistic for the nuisance parameters, and conditioning on this statistic results in a likelihood which does not depend on the nuisance parameters.

One example occurs in 2×2 tables, where conditioning on all four marginal totals leads to a conditional likelihood based on the non-central hypergeometric distribution. (This form of conditioning is also the basis for Fisher's exact test.)
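A minimal sketch of such a conditional likelihood, assuming a hypothetical 2×2 table [[7, 3], [2, 8]] and taking the odds ratio ψ as the parameter of interest. Conditioning on the margins leaves the top-left cell with Fisher's non-central hypergeometric distribution, whose probabilities depend on ψ alone. The grid search on log ψ is an illustrative choice, not a recommended optimiser.

```python
from math import comb, exp

def conditional_likelihood(psi, a, n1, n2, m1):
    """P(top-left cell = a | all four margins; odds ratio psi) for a 2x2 table
    with row totals n1, n2 and first-column total m1 (Fisher's non-central
    hypergeometric distribution). No nuisance parameter appears."""
    support = range(max(0, m1 - n2), min(n1, m1) + 1)

    def weight(x):
        return comb(n1, x) * comb(n2, m1 - x) * psi**x

    return weight(a) / sum(weight(x) for x in support)

# Hypothetical table [[7, 3], [2, 8]]: a = 7, row totals 10 and 10, first-column total 9.
a, n1, n2, m1 = 7, 10, 10, 9

# Conditional maximum-likelihood estimate of psi by a grid search on log(psi).
grid = [exp(t / 1000) for t in range(-5000, 5001)]
psi_hat = max(grid, key=lambda psi: conditional_likelihood(psi, a, n1, n2, m1))
print(psi_hat)
```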

Marginal likelihood

Sometimes we can remove the nuisance parameters by considering a likelihood based on only part of the information in the data, for example by using the set of ranks rather than the numerical values. Another example occurs in linear mixed models, where considering a likelihood for the residuals only after fitting the fixed effects leads to residual maximum likelihood estimation of the variance components. (Note that there is a different meaning of marginal likelihood in Bayesian inference.)

Profile likelihood

It is often possible to write some parameters as functions of other parameters, thereby reducing the number of independent parameters. (The function gives the value of the eliminated parameter that maximises the likelihood for given values of the other parameters.) This procedure is called concentration of the parameters and results in the concentrated likelihood function, also occasionally known as the maximized likelihood function, but most often called the profile likelihood function.

For example, consider a regression analysis model with normally distributed errors and observations $(x_i, y_i)$, $i = 1, \dots, n$. For given values of the regression coefficients β, the most likely value of the error variance is the mean squared residual,

$$\hat\sigma^2(\beta) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - x_i^{\top}\beta\right)^2.$$

The residuals depend on all the other parameters; hence the variance parameter can be written as a function of the other parameters.
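A sketch of this concentration step, assuming for brevity a hypothetical no-intercept model $y_i = \beta x_i + e_i$ with normal errors and made-up data. Substituting $\hat\sigma^2(\beta) = \text{RSS}(\beta)/n$ into the normal log-likelihood gives the profile log-likelihood $-\tfrac{n}{2}\left(\log(2\pi\,\text{RSS}(\beta)/n) + 1\right)$, which depends on β only through the residual sum of squares and is therefore maximised at the least-squares slope.

```python
from math import log, pi

def profile_loglik(beta, xs, ys):
    """Profile log-likelihood of the slope beta in the hypothetical model
    y_i = beta * x_i + e_i, e_i ~ Normal(0, sigma^2), with sigma^2 replaced
    by its maximising value RSS(beta) / n."""
    n = len(xs)
    rss = sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = rss / n
    return -0.5 * n * (log(2 * pi * sigma2_hat) + 1)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data

# Least-squares slope for a line through the origin.
beta_ols = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(beta_ols, profile_loglik(beta_ols, xs, ys))
```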

Unlike conditional and marginal likelihoods, profile likelihood methods can always be used (even when the profile likelihood cannot be written down explicitly). However, the profile likelihood is not a true likelihood as it is not based directly on a probability distribution and this leads to some less satisfactory properties. (Attempts have been made to improve this, resulting in modified profile likelihood.)

The idea of profile likelihood can also be used to compute confidence intervals that often have better small-sample properties than those based on asymptotic standard errors calculated from the full likelihood.
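As a hedged sketch of a profile-likelihood confidence interval, assume a hypothetical normal sample with unknown mean μ (of interest) and unknown variance (nuisance, profiled out). The interval collects the values of μ whose profile log-likelihood lies within half the 95% chi-squared(1) quantile of its maximum. This calibration is the usual asymptotic one; the bracketing-plus-bisection search is an illustrative choice.

```python
from math import log

CHI2_1_95 = 3.841458820694124  # 95% quantile of the chi-squared distribution, 1 df

def profile_loglik_mean(mu, xs):
    """Profile log-likelihood of a normal mean mu, with the variance replaced by
    its maximising value mean((x - mu)^2); additive constants are dropped."""
    n = len(xs)
    return -0.5 * n * log(sum((x - mu) ** 2 for x in xs) / n)

def profile_ci(xs, crit=CHI2_1_95, tol=1e-10):
    """Endpoints of {mu : 2 * (l_p(mu_hat) - l_p(mu)) <= crit}, located by
    bracketing and bisection on each side of mu_hat (the sample mean).
    Assumes the xs are not all equal."""
    mu_hat = sum(xs) / len(xs)
    l_max = profile_loglik_mean(mu_hat, xs)

    def excess(mu):
        # Positive outside the interval, non-positive inside it.
        return 2 * (l_max - profile_loglik_mean(mu, xs)) - crit

    def endpoint(direction):
        step = 1.0
        while excess(mu_hat + direction * step) <= 0:  # expand until outside
            step *= 2
        inside, outside = mu_hat, mu_hat + direction * step
        while abs(outside - inside) > tol:
            mid = 0.5 * (inside + outside)
            if excess(mid) <= 0:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return endpoint(-1), endpoint(+1)

xs = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4]  # hypothetical sample
print(profile_ci(xs))
```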

Historical remarks

Some early thoughts on likelihood appeared in a book by Thorvald N. Thiele published in 1889. The first paper where the full idea of the "likelihood" appears was written by R.A. Fisher in 1922: "On the mathematical foundations of theoretical statistics". In that paper, Fisher also uses the term "method of maximum likelihood". Fisher argues against inverse probability as a basis for statistical inferences, and instead proposes inferences based on likelihood functions.

See also

  • Bayes factor
  • Bayesian inference
  • Conditional probability
  • Likelihood principle
  • Likelihood-ratio test
  • Principle of maximum entropy
  • Score (statistics)


External links