Mixture density
In probability and statistics, a mixture distribution is the probability distribution of a random variable whose values can be interpreted as being derived in a simple way from an underlying set of other random variables. In particular, the final outcome value is selected at random from among the underlying values, with a certain probability of selection being associated with each. Here the underlying random variables may be random vectors, each having the same dimension, in which case the mixture distribution is a multivariate distribution.

In cases where each of the underlying random variables is continuous, the outcome variable will also be continuous and its probability density function is sometimes referred to as a mixture density. The cumulative distribution function (and the probability density function if it exists) can be expressed as a convex combination (i.e. a weighted sum, with non-negative weights that sum to 1) of other distribution functions and density functions. The individual distributions that are combined to form the mixture distribution are called the mixture components, and the probabilities (or weights) associated with each component are called the mixture weights. The number of components in a mixture distribution is often restricted to being finite, although in some cases the components may be countably infinite. More general cases (i.e. an uncountable set of component distributions), as well as the countable case, are treated under the title of compound distributions.

A distinction needs to be made between a random variable whose distribution function or density is the sum of a set of components (i.e. a mixture distribution) and a random variable whose value is the sum of the values of two or more underlying random variables, in which case the distribution is given by the convolution operator. As an example, the sum of two normally distributed random variables, each with different means, will still have a normal distribution. On the other hand, a mixture density created as a mixture of two normal distributions with different means will have two peaks provided that the two means are far enough apart, showing that this distribution is radically different from a normal distribution.
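The contrast between a convolution (the sum of two random variables) and a mixture can be checked numerically. The sketch below, with illustrative means of 0 and 6 and unit variances, counts the local maxima of each density on a grid:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Sum of independent N(0, 1) and N(6, 1) variables: distributed N(6, 2), one peak.
def sum_pdf(x):
    return normal_pdf(x, 0.0 + 6.0, math.sqrt(1.0 + 1.0))

# Equal-weight mixture of N(0, 1) and N(6, 1): two peaks, since the means
# are far apart relative to the standard deviations.
def mixture_pdf(x):
    return 0.5 * normal_pdf(x, 0.0, 1.0) + 0.5 * normal_pdf(x, 6.0, 1.0)

def count_modes(pdf, lo=-5.0, hi=11.0, n=2000):
    """Count strict local maxima of pdf on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    ys = [pdf(x) for x in xs]
    return sum(1 for i in range(1, n) if ys[i] > ys[i - 1] and ys[i] > ys[i + 1])

modes_sum = count_modes(sum_pdf)       # the convolution is unimodal
modes_mix = count_modes(mixture_pdf)   # the mixture is bimodal
```

The grid-based mode count is a crude but adequate check here, since both densities are smooth and their peaks are well separated from the grid boundaries.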

Mixture distributions arise in many contexts in the literature and arise naturally where a statistical population contains two or more sub-populations. They are also sometimes used as a means of representing non-normal distributions. Data analysis concerning statistical models involving mixture distributions is discussed under the title of mixture models, while the present article concentrates on simple probabilistic and statistical properties of mixture distributions and how these relate to properties of the underlying distributions.

Finite and countable mixtures

Given a finite set of probability density functions p1(x), …, pn(x), or corresponding cumulative distribution functions P1(x), …, Pn(x), and weights w1, …, wn such that wi ≥ 0 and w1 + … + wn = 1, the mixture distribution can be represented by writing either the density, f, or the distribution function, F, as a sum (which in both cases is a convex combination):

F(x) = w1 P1(x) + … + wn Pn(x),
f(x) = w1 p1(x) + … + wn pn(x).

This type of mixture, being a finite sum, is called a finite mixture, and in applications, an unqualified reference to a "mixture density" usually means a finite mixture. The case of a countably infinite set of components is covered formally by allowing n = ∞.
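As an illustration, a finite mixture density can be evaluated as the convex combination f(x) = w1 p1(x) + … + wn pn(x), and sampled in two stages: first pick a component i with probability wi, then draw from pi. The sketch below uses an arbitrary three-component normal mixture (the weights, means, and standard deviations are illustrative choices):

```python
import math
import random

random.seed(0)

# Illustrative three-component normal mixture.
weights = [0.2, 0.5, 0.3]
means   = [-3.0, 0.0, 4.0]
sds     = [1.0, 0.5, 2.0]

def mixture_density(x):
    """f(x) = sum_i w_i p_i(x), a convex combination of component densities."""
    total = 0.0
    for w, m, s in zip(weights, means, sds):
        total += w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return total

def sample_mixture():
    """Two-stage sampling: pick component i with probability w_i, then draw from it."""
    i = random.choices(range(len(weights)), weights=weights)[0]
    return random.gauss(means[i], sds[i])

# Trapezoidal check that the mixture density integrates to 1 on a wide grid.
lo, hi, n = -15.0, 15.0, 6000
h = (hi - lo) / n
area = sum(mixture_density(lo + k * h) for k in range(n + 1)) * h
area -= 0.5 * h * (mixture_density(lo) + mixture_density(hi))
```

The two-stage sampler makes the "selected at random from among the underlying values" interpretation from the introduction concrete.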

Uncountable mixtures

Where the set of component distributions is uncountable, the result is often called a compound probability distribution. The construction of such distributions has a formal similarity to that of mixture distributions, with either infinite summations or integrals replacing the finite summations used for finite mixtures.

Consider a probability density function p(x; a) for a variable x, parameterized by a. That is, for each value of a in some set A, p(x; a) is a probability density function with respect to x. Given a probability density function w (meaning that w is nonnegative and integrates to 1), the function

f(x) = ∫_A p(x; a) w(a) da

is again a probability density function for x. A similar integral can be written for the cumulative distribution function. Note that the formulae here reduce to the case of a finite or infinite mixture if the density w is allowed to be a generalised function representing the "derivative" of the cumulative distribution function of a discrete distribution.
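The integral above can be checked numerically in a case where the compound density is known in closed form: if p(x; a) is the N(a, 1) density and w(a) is the N(0, 1) density, the compound distribution is N(0, 2), which gives an exact answer to compare against.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def compound_pdf(x, lo=-10.0, hi=10.0, n=4000):
    """Trapezoidal approximation of f(x) = integral of p(x; a) w(a) da,
    with p(x; a) = N(a, 1) and mixing density w(a) = N(0, 1)."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        a = lo + k * h
        weight = 0.5 if k in (0, n) else 1.0  # trapezoidal endpoint weights
        total += weight * normal_pdf(x, a, 1.0) * normal_pdf(a, 0.0, 1.0)
    return total * h

# Compare against the known closed form N(0, 2) at an arbitrary point.
exact = normal_pdf(1.3, 0.0, math.sqrt(2.0))
approx = compound_pdf(1.3)
```

The close agreement reflects the general fact that compounding a normal mean over a normal mixing distribution adds the variances.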

Mixtures of parametric families

The mixture components are often not arbitrary probability distributions, but instead are members of a parametric family (such as normal distributions), with different values for a parameter or parameters. In such cases, assuming that it exists, the density can be written in the form of a sum as:

f(x; a1, …, an) = ∑i wi p(x; ai)

for one parameter, or

f(x; a1, …, an, b1, …, bn) = ∑i wi p(x; ai, bi)

for two parameters, and so forth.

Convexity

A general linear combination of probability density functions is not necessarily a probability density, since it may be negative or it may integrate to something other than 1. However, a convex combination of probability density functions preserves both of these properties (non-negativity and integrating to 1), and thus mixture densities are themselves probability density functions.
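Both failure modes of a general linear combination can be seen numerically. In the sketch below, two arbitrary normal densities illustrate that a convex combination integrates to 1, while a linear combination with weights summing to 2.5 does not, and one with a negative weight dips below zero:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, lo=-20.0, hi=20.0, n=8000):
    """Trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    s = 0.5 * (f(lo) + f(hi)) + sum(f(lo + k * h) for k in range(1, n))
    return s * h

def p1(x):
    return normal_pdf(x, -2.0, 1.0)

def p2(x):
    return normal_pdf(x, 3.0, 1.5)

# Convex combination (non-negative weights summing to 1): still a density.
convex = integrate(lambda x: 0.4 * p1(x) + 0.6 * p2(x))

# Weights summing to 2.5: the "density" integrates to 2.5, not 1.
linear = integrate(lambda x: 2.0 * p1(x) + 0.5 * p2(x))

# A negative weight makes the combination negative wherever p2 dominates.
negative_somewhere = (2.0 * p1(4.0) - 1.0 * p2(4.0)) < 0
```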

Moments

Let X1, …, Xn denote random variables from the n component distributions, and let X denote a random variable from the mixture distribution. Then, for any function H(·) for which E[H(X)] exists, and assuming that the component densities pi(x) exist,

E[H(X)] = ∫ H(x) ∑i wi pi(x) dx = ∑i wi ∫ H(x) pi(x) dx = ∑i wi E[H(Xi)].

The relation

E[H(X)] = ∑i wi E[H(Xi)]

holds more generally, since it does not require the component densities to exist.

It is a trivial matter to note that the jth moment about zero (i.e. choosing H(x) = x^j) is simply a weighted average of the jth moments of the components. Moments about the mean μ = E[X] involve a binomial expansion:

E[(X − μ)^j] = ∑i wi E[((Xi − μi) + (μi − μ))^j]
             = ∑i ∑k=0…j (j choose k) wi (μi − μ)^(j−k) E[(Xi − μi)^k],

where μi denotes the mean of the ith component.
In the case of a mixture of one-dimensional normal distributions with weights wi, means μi and variances σi², the total mean and variance will be:

E[X] = μ = ∑i wi μi,
E[(X − μ)²] = σ² = ∑i wi (μi² + σi²) − μ² = ∑i wi ((μi − μ)² + σi²).

(These two formulas in fact hold for any mixture whose components have finite second moments, not only for normal components.)
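The mean and variance relations can be checked directly; the sketch below uses an illustrative two-component mixture and confirms that the two forms of the variance formula agree:

```python
# Illustrative two-component mixture: weights, component means, component sds.
w  = [0.3, 0.7]
mu = [0.0, 5.0]
sd = [1.0, 2.0]

# Total mean: weighted average of the component means.
mean = sum(wi * mi for wi, mi in zip(w, mu))

# Total variance, in the two equivalent forms:
# sum_i w_i (mu_i^2 + sigma_i^2) - mu^2
var  = sum(wi * (mi**2 + si**2) for wi, mi, si in zip(w, mu, sd)) - mean**2
# sum_i w_i ((mu_i - mu)^2 + sigma_i^2)
var2 = sum(wi * ((mi - mean)**2 + si**2) for wi, mi, si in zip(w, mu, sd))
```

Note that the total variance exceeds the weighted average of the component variances whenever the component means differ, because the spread between the means contributes the extra (μi − μ)² terms.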
These relations highlight the potential of mixture distributions to display non-trivial higher-order moments such as skewness and kurtosis (fat tails) and multi-modality, even in the absence of such features within the components themselves. Marron and Wand (1992) give an illustrative account of the flexibility of this framework.

Modes

The question of multimodality is simple for some cases, such as mixtures of exponential distributions: all such mixtures are unimodal. However, for the case of mixtures of normal distributions, it is a complex one. Conditions for the number of modes in a multivariate normal mixture are explored by Ray and Lindsay, extending earlier work on univariate (Robertson and Fryer, 1969; Behboodian, 1970) and multivariate (Carreira-Perpinan and Williams, 2003) normal mixtures.

Here the problem of evaluating the modes of an n-component mixture in a D-dimensional space is reduced to identifying the critical points (local minima, maxima and saddle points) on a manifold referred to as the ridgeline surface, the image of the ridgeline function

x*(α) = [∑i αi Σi⁻¹]⁻¹ [∑i αi Σi⁻¹ μi],

where α belongs to the (n − 1)-dimensional unit simplex and Σi, μi correspond to the covariance and mean of the ith component. Ray and Lindsay consider the case n − 1 < D, showing a one-to-one correspondence between modes of the mixture and critical points of the elevation function h(α) = f(x*(α)); thus one may identify the modes by solving dh(α)/dα = 0 with respect to α and determining the value x*(α).

Using graphical tools, the potential multi-modality of mixtures with n ∈ {2, 3} components is demonstrated; in particular it is shown that the number of modes may exceed n and that the modes may not coincide with the component means. For two components they develop a graphical tool for analysis by instead solving the aforementioned differential equation with respect to w1 and expressing the solutions as a function Π(α), α ∈ [0, 1], so that the number and location of modes for a given value of w1 corresponds to the number of intersections of the graph with the line Π(α) = w1. This in turn can be related to the number of oscillations of the graph and therefore to solutions of dΠ(α)/dα = 0, leading to an explicit solution for the case of a two-component homoscedastic mixture given by the roots of

α(1 − α) d_M(μ1, μ2, Σ)² = 1,

where d_M(μ1, μ2, Σ) = √((μ2 − μ1)ᵀ Σ⁻¹ (μ2 − μ1)) is the Mahalanobis distance between the component means.

Since the above is quadratic it follows that in this instance there are at most two modes irrespective of the dimension or the weights.
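For the one-dimensional, equal-weight, homoscedastic case the Mahalanobis distance reduces to |μ2 − μ1|/σ, and the mixture is known to be bimodal exactly when that distance exceeds 2. A grid-based mode count illustrates this (the separations 1.5 and 4.0 are illustrative choices on either side of the threshold):

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def count_modes(d, n=4000):
    """Count strict local maxima of the equal-weight mixture of N(0,1) and N(d,1)."""
    lo, hi = -4.0, d + 4.0
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    ys = [0.5 * normal_pdf(x, 0.0, 1.0) + 0.5 * normal_pdf(x, d, 1.0) for x in xs]
    return sum(1 for i in range(1, n) if ys[i] > ys[i - 1] and ys[i] > ys[i + 1])

# With equal weights and unit variances the Mahalanobis distance between the
# means is simply d: unimodal for d <= 2, bimodal for d > 2.
unimodal = count_modes(1.5)
bimodal  = count_modes(4.0)
```

For d just above 2 the two modes sit close to the midpoint rather than at the component means, consistent with the observation above that modes need not coincide with component means.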

Applications

Mixture densities express complex densities in terms of simpler densities (the mixture components). They are used both because they provide a good model for certain data sets, where different subsets of the data exhibit different characteristics and can best be modeled separately, and because they can be more mathematically tractable: the individual mixture components are often easier to study than the overall mixture density.

Mixture densities can be used to model a statistical population with subpopulations, where the mixture components are the densities on the subpopulations, and the weights are the proportions of each subpopulation in the overall population.

Mixture densities can also be used to model experimental error or contamination: one assumes that most of the samples measure the desired phenomenon, while a smaller fraction come from a different, contaminating distribution.

Parametric statistics that assume no error often fail on such mixture densities; for example, statistics that assume normality often fail disastrously in the presence of even a few outliers. Instead, one uses robust statistics.
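A small simulation illustrates the point: in a contamination mixture where a few observations come from a shifted, erroneous component (the 5% weight and N(50, 1) contaminant are purely illustrative choices), the sample mean is dragged toward the contamination while the median, a simple robust statistic, stays near the center of the clean component.

```python
import random
import statistics

random.seed(1)

def sample():
    """Contamination mixture: 95% from N(0,1), 5% from a gross-error component N(50,1)."""
    if random.random() < 0.95:
        return random.gauss(0.0, 1.0)
    return random.gauss(50.0, 1.0)

data = [sample() for _ in range(20000)]

# Mixture-moment formula predicts E[X] = 0.95*0 + 0.05*50 = 2.5, so the mean
# is pulled well away from the clean component's center at 0.
m_mean = statistics.mean(data)

# The median of the mixture lies near 0, since 95% of the mass is there.
m_median = statistics.median(data)
```

The predicted mean of 2.5 follows directly from the mixture-moment relation E[X] = ∑i wi μi given earlier in this article.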
The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.