Divergence (statistics)
In statistics and information geometry, a divergence or a contrast function is a function that establishes the “distance” of one probability distribution to another on a statistical manifold. The divergence is a weaker notion than that of distance in mathematics; in particular, a divergence need not be symmetric (that is, in general the divergence from p to q is not equal to the divergence from q to p), and it need not satisfy the triangle inequality.

Definition

Suppose S is a space of all probability distributions with common support. Then a divergence on S is a function D(· || ·): S × S → R satisfying
  1. D(p || q) ≥ 0 for all p, q ∈ S,
  2. D(p || q) = 0 if and only if p = q,
  3. The matrix g^(D) (see definition in the “Geometrical properties” section) is strictly positive-definite everywhere on S.


The dual divergence D* is defined as

    D*(p || q) = D(q || p).
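As a concrete illustration (a minimal sketch using the Kullback-Leibler divergence, which appears among the examples later in this article), both the asymmetry of a divergence and the definition of the dual can be checked numerically on discrete distributions:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions
    given as lists of probabilities with common (full) support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def kl_dual(p, q):
    """Dual divergence: D*(p || q) = D(q || p)."""
    return kl(q, p)

p = [0.5, 0.5]
q = [0.9, 0.1]
# kl(p, q) and kl(q, p) differ, illustrating that a divergence
# need not be symmetric; kl(p, p) is zero, as axiom 2 requires.
```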

Geometrical properties

Many properties of divergences can be derived if we restrict S to be a statistical manifold, meaning that it can be parametrized with a finite-dimensional coordinate system θ, so that for a distribution p ∈ S we can write p = p(θ).

For a pair of points p, q ∈ S with coordinates θ_p and θ_q, denote the partial derivatives of D(p || q) as

    D((∂_i)_p || q) = (∂/∂θ_p^i) D(p || q),
    D((∂_i ∂_j)_p || (∂_k)_q) = (∂/∂θ_p^i)(∂/∂θ_p^j)(∂/∂θ_q^k) D(p || q), etc.
Now we restrict these functions to the diagonal p = q, and denote

    D[∂_i, ·] : p ↦ D((∂_i)_p || p),
    D[∂_i, ∂_j] : p ↦ D((∂_i)_p || (∂_j)_p), etc.

By definition, the function D(p || q) is minimized at p = q, and therefore

    D[∂_i, ·] = D[·, ∂_i] = 0,
    D[∂_i ∂_j, ·] = D[·, ∂_i ∂_j] = −D[∂_i, ∂_j] ≡ g^(D)_ij,

where the matrix g^(D) is positive semi-definite and defines a unique Riemannian metric on the manifold S.
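As a sanity check of this construction (a numeric sketch, not part of the original article): for the Kullback-Leibler divergence on the Bernoulli family p(x; θ), the induced metric recovers the Fisher information 1/(θ(1 − θ)). A finite-difference estimate of −∂²D/∂θ_p ∂θ_q on the diagonal confirms this:

```python
import math

def kl_bernoulli(tp, tq):
    """KL divergence D(p || q) between Bernoulli(tp) and Bernoulli(tq)."""
    return tp * math.log(tp / tq) + (1 - tp) * math.log((1 - tp) / (1 - tq))

def induced_metric(theta, h=1e-4):
    """g^(D)(theta) = -d^2 D / (d theta_p d theta_q) evaluated on the
    diagonal theta_p = theta_q = theta, via a central finite difference."""
    f = kl_bernoulli
    mixed = (f(theta + h, theta + h) - f(theta + h, theta - h)
             - f(theta - h, theta + h) + f(theta - h, theta - h)) / (4 * h * h)
    return -mixed

theta = 0.3
fisher = 1 / (theta * (1 - theta))  # Fisher information of Bernoulli(theta)
# induced_metric(theta) approximates fisher
```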

Divergence D(· || ·) also defines a unique torsion-free affine connection ∇^(D) with coefficients

    Γ^(D)_ij,k = −D[∂_i ∂_j, ∂_k],
and the dual to this connection ∇* is generated by the dual divergence D*.

Thus, a divergence D(· || ·) generates on a statistical manifold a unique dualistic structure (g^(D), ∇^(D), ∇^(D*)). The converse is also true: every torsion-free dualistic structure on a statistical manifold is induced from some globally defined divergence function (which however need not be unique).

For example, when D is an f-divergence for some function ƒ(·), then it generates the metric g^(Df) = c·g and the connection ∇^(Df) = ∇^(α), where g is the canonical Fisher information metric, ∇^(α) is the α-connection, c = ƒ′′(1), and α = 3 + 2ƒ′′′(1)/ƒ′′(1).

Examples

The largest and most frequently used class of divergences consists of the so-called f-divergences; however, other types of divergence functions are also encountered in the literature.

f-divergences

This family of divergences is generated through functions f(u), convex on u > 0 and such that f(1) = 0. Then an f-divergence is defined as

    D_f(p || q) = ∫ p(x) f( q(x)/p(x) ) dx
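On a finite support the integral becomes a sum, D_f(p || q) = Σ_i p_i f(q_i/p_i). A minimal sketch under that convention (the helper name f_divergence is ours), showing that f(u) = −ln u recovers the Kullback-Leibler divergence:

```python
import math

def f_divergence(f, p, q):
    """D_f(p || q) = sum_i p_i * f(q_i / p_i) for discrete
    distributions p, q with common (full) support."""
    return sum(pi * f(qi / pi) for pi, qi in zip(p, q))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

# f(u) = -ln(u) recovers the Kullback-Leibler divergence:
kl_via_f = f_divergence(lambda u: -math.log(u), p, q)
kl_direct = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```
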
Kullback-Leibler divergence: D_KL(p || q) = ∫ p(x) ln( p(x)/q(x) ) dx
squared Hellinger distance: H²(p, q) = 2 ∫ ( √p(x) − √q(x) )² dx
Jeffreys divergence: ∫ ( p(x) − q(x) ) ln( p(x)/q(x) ) dx
Chernoff's α-divergence: ( 4/(1 − α²) ) ( 1 − ∫ p(x)^((1−α)/2) q(x)^((1+α)/2) dx )
exponential divergence: ∫ p(x) ( ln( p(x)/q(x) ) )² dx
Kagan's divergence: (1/2) ∫ ( p(x) − q(x) )² / p(x) dx
(α,β)-product divergence: ( 4/((1 − α²)(1 − β²)) ) ∫ ( 1 − ( q(x)/p(x) )^((1−α)/2) )( 1 − ( q(x)/p(x) )^((1−β)/2) ) p(x) dx
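Two of the divergences listed above can be cross-checked numerically (a small sketch on discrete distributions with common support): the Jeffreys divergence equals the symmetrized Kullback-Leibler divergence, and the squared Hellinger distance, unusually for a divergence, is symmetric in its arguments:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def jeffreys(p, q):
    """Jeffreys divergence: sum of (p - q) * ln(p / q)."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def hellinger_sq(p, q):
    """Squared Hellinger distance: 2 * sum of (sqrt(p) - sqrt(q))^2."""
    return 2 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
# jeffreys(p, q) matches kl(p, q) + kl(q, p) up to rounding,
# and hellinger_sq(p, q) equals hellinger_sq(q, p).
```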


The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.