Wasserstein metric
Encyclopedia
In mathematics
Mathematics
Mathematics is the study of quantity, space, structure, and change. Mathematicians seek out patterns and formulate new conjectures. Mathematicians resolve the truth or falsity of conjectures by mathematical proofs, which are arguments sufficient to convince other mathematicians of their validity...

, the Wasserstein (or Vasershtein) metric is a distance function
Metric (mathematics)
In mathematics, a metric or distance function is a function which defines a distance between elements of a set. A set with a metric is called a metric space. A metric induces a topology on a set but not all topologies can be generated by a metric...

 defined between probability distribution
Probability measure
In mathematics, a probability measure is a real-valued function defined on a set of events in a probability space that satisfies measure properties such as countable additivity...

s on a given metric space
Metric space
In mathematics, a metric space is a set where a notion of distance between elements of the set is defined.The metric space which most closely corresponds to our intuitive understanding of space is the 3-dimensional Euclidean space...

 M.

Intuitively, if each distribution is viewed as a unit amount of "dirt" piled on M, the metric is the minimum "cost" of turning one pile into the other, which is assumed to be the amount of dirt that needs to be moved times the distance it has to be moved. Because of this analogy, the metric is known in computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

 as the earth mover's distance
Earth Mover's Distance
In computer science, the earth mover's distance is a measure of the distance between two probability distributions over a region D. In mathematics, this is known as the Wasserstein metric...

.

The name "Wasserstein/Vasershtein distance" was coined by R.L. Dobrushin in 1970, after the Russia
Russia
Russia or , officially known as both Russia and the Russian Federation , is a country in northern Eurasia. It is a federal semi-presidential republic, comprising 83 federal subjects...

n mathematician
Mathematician
A mathematician is a person whose primary area of study is the field of mathematics. Mathematicians are concerned with quantity, structure, space, and change....

 Leonid Nasonovich Vasershtein who introduced the concept in 1969. Most English
English language
English is a West Germanic language that arose in the Anglo-Saxon kingdoms of England and spread into what was to become south-east Scotland under the influence of the Anglian medieval kingdom of Northumbria...

-language publications use the German
German language
German is a West Germanic language, related to and classified alongside English and Dutch. With an estimated 90 – 98 million native speakers, German is one of the world's major languages and is the most widely-spoken first language in the European Union....

 spelling "Wasserstein" (attributed to the name "Vasershtein" being of Germanic
Germanic languages
The Germanic languages constitute a sub-branch of the Indo-European language family. The common ancestor of all of the languages in this branch is called Proto-Germanic , which was spoken in approximately the mid-1st millennium BC in Iron Age northern Europe...

 origin).

Definition

Let (M, d) be a metric space for which every probability measure on M is a Radon measure
Radon measure
In mathematics , a Radon measure, named after Johann Radon, is a measure on the σ-algebra of Borel sets of a Hausdorff topological space X that is locally finite and inner regular.-Motivation:...

 (a so-called Radon space
Radon space
In mathematics, a Radon space, named after Johann Radon, is a separable metric space such that every Borel probability measure on M is inner regular. Since a probability measure is globally finite, and hence a locally finite measure, every probability measure on a Radon space is also a Radon...

). For p ≥ 1, let Pp(M) denote the collection of all probability measures μ on M with finite pth moment: for some x0 in M,


Then the pth Wasserstein distance between two probability measures μ and ν in Pp(M) is defined as


where Γ(μ, ν) denotes the collection of all measures on M × M with marginals
Conditional probability
In probability theory, the "conditional probability of A given B" is the probability of A if B is known to occur. It is commonly notated P, and sometimes P_B. P can be visualised as the probability of event A when the sample space is restricted to event B...

 μ and ν on the first and second factors respectively. (The set Γ(μ, ν) is also called the set of all couplings of μ and ν.)

The above distance is usually denoted Wp(μ, ν) (typically among authors who prefer the "Wasserstein" spelling) or ℓp(μ, ν) (typically among authors who prefer the "Vasershtein" spelling). The remainder of this article will use the Wp notation.

The Wasserstein metric may be equivalently defined by


where E[Z] denotes the expected value
Expected value
In probability theory, the expected value of a random variable is the weighted average of all possible values that this random variable can take on...

 of a random variable
Random variable
In probability and statistics, a random variable or stochastic variable is, roughly speaking, a variable whose value results from a measurement on some type of random process. Formally, it is a function from a probability space, typically to the real numbers, which is measurable functionmeasurable...

 Z and the infimum
Infimum
In mathematics, the infimum of a subset S of some partially ordered set T is the greatest element of T that is less than or equal to all elements of S. Consequently the term greatest lower bound is also commonly used...

 is taken over all simultaneous distributions of the random variables X and Y with marginals μ and ν respectively.

Applications

The Wasserstein metric is a natural way to compare the probability distributions of two variables X and Y, where one variable is derived from the other by small, non-uniform perturbations (random or deterministic).

In computer science, for example, the metric W1 is widely used to compare discrete distributions, e.g. the color histogram
Color histogram
In image processing and photography, a color histogram is a representation of the distribution of colors in an image. For digital images, a color histogram represents the number of pixels that have colors in each of a fixed list of color ranges, that span the image's color space, the set of all...

s of two digital images.

Metric structure

It can be shown that Wp satisfies all the axiom
Axiom
In traditional logic, an axiom or postulate is a proposition that is not proven or demonstrated but considered either to be self-evident or to define and delimit the realm of analysis. In other words, an axiom is a logical statement that is assumed to be true...

s of a metric
Metric (mathematics)
In mathematics, a metric or distance function is a function which defines a distance between elements of a set. A set with a metric is called a metric space. A metric induces a topology on a set but not all topologies can be generated by a metric...

 on Pp(M). Furthermore, convergence with respect to Wp is equivalent to the usual weak convergence of measures plus convergence of pth moments.

Dual representation of W1

The following dual representation of W1 is a special case of the duality theorem of Kantorovich
Leonid Kantorovich
Leonid Vitaliyevich Kantorovich was a Soviet mathematician and economist, known for his theory and development of techniques for the optimal allocation of resources...

 and Rubinstein (1958): when μ and ν have bounded
Bounded set
In mathematical analysis and related areas of mathematics, a set is called bounded, if it is, in a certain sense, of finite size. Conversely, a set which is not bounded is called unbounded...

 support
Support (measure theory)
In mathematics, the support of a measure μ on a measurable topological space is a precise notion of where in the space X the measure "lives"...

,


where Lip(f) denotes the minimal Lipschitz constant
Lipschitz continuity
In mathematical analysis, Lipschitz continuity, named after Rudolf Lipschitz, is a strong form of uniform continuity for functions. Intuitively, a Lipschitz continuous function is limited in how fast it can change: for every pair of points on the graph of this function, the absolute value of the...

 for f.

Compare this with the definition of the Radon metric
Radon measure
In mathematics , a Radon measure, named after Johann Radon, is a measure on the σ-algebra of Borel sets of a Hausdorff topological space X that is locally finite and inner regular.-Motivation:...

:


If the metric d is bounded by some constant C, then


and so convergence in the Radon metric (also known as strong convergence) implies convergence in the Wasserstein metric, but not vice versa.

Separability and completeness

For any p ≥ 1, the metric space (Pp(M), Wp) is separable, and is complete
Complete space
In mathematical analysis, a metric space M is called complete if every Cauchy sequence of points in M has a limit that is also in M or, alternatively, if every Cauchy sequence in M converges in M....

if (M, d) is separable and complete.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK