Uniform convergence (combinatorics)
For a class of predicates $H$ defined on a set $X$ and a set of samples $x = (x_1, x_2, \dots, x_m)$, where $x_i \in X$, the empirical frequency of $h \in H$ on $x$ is $\widehat{Q}_x(h) = \frac{1}{m}\left|\{i : 1 \le i \le m, h(x_i) = 1\}\right|$. The Uniform Convergence Theorem states, roughly, that if $H$ is "simple" and we draw samples independently (with replacement) from $X$ according to a distribution $P$, then with high probability every empirical frequency will be close to its expectation, where the expectation is given by $Q_P(h) = P\{y \in X : h(y) = 1\}$. Here "simple" means that the Vapnik–Chervonenkis dimension of the class $H$ is small relative to the size of the sample.

In other words, a sufficiently simple collection of functions behaves roughly the same on a small random sample as it does on the distribution as a whole.
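
As a concrete illustration of these definitions, the sketch below (an illustrative example of our own; the predicate, the distribution, and all names are choices made here, not part of the theorem) draws i.i.d. samples and compares the empirical frequency of a threshold predicate with its expectation:

```python
import random

def h(y, theta=0.3):
    """An example predicate on [0, 1]: h(y) = 1 iff y <= theta."""
    return 1 if y <= theta else 0

def empirical_frequency(h, sample):
    """The empirical frequency of h on x: (1/m) * |{i : h(x_i) = 1}|."""
    return sum(h(x) for x in sample) / len(sample)

random.seed(0)
m = 10_000
sample = [random.random() for _ in range(m)]  # m i.i.d. draws from P = Uniform[0, 1]

# Under P = Uniform[0, 1] the expectation is Q_P(h) = P{y : h(y) = 1} = 0.3 exactly,
# and on a large sample the empirical frequency should land close to it.
print(empirical_frequency(h, sample))
```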

Uniform convergence theorem statement

If $H$ is a set of $\{0,1\}$-valued functions defined on a set $X$ and $P$ is a probability distribution on $X$, then for $\varepsilon > 0$ and $m$ a positive integer, we have

$P^m\{|Q_P(h) - \widehat{Q}_x(h)| \ge \varepsilon \text{ for some } h \in H\} \le 4\Pi_H(2m)e^{-\varepsilon^2 m/8},$

where, for any $x \in X^m$,

$Q_P(h) = P\{y \in X : h(y) = 1\}$, $\quad \widehat{Q}_x(h) = \frac{1}{m}\left|\{i : 1 \le i \le m, h(x_i) = 1\}\right|$,

and $|x| = m$. $P^m$ indicates that the probability is taken over $x$ consisting of $m$ i.i.d. draws from the distribution $P$.


$\Pi_H$ is defined as follows: for any $\{0,1\}$-valued functions $H$ over $X$ and any $D \subseteq X$,

$\Pi_H(D) = \{h|_D : h \in H\}$,

the set of distinct restrictions of the functions in $H$ to $D$. For any natural number $m$, the shattering number $\Pi_H(m)$ is defined as

$\Pi_H(m) = \max\{|\Pi_H(D)| : |D| = m, D \subseteq X\}$.
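
For intuition, the shattering number can be computed by brute force on a small finite domain; the sketch below (illustrative helper names and example class, not from the original text) does exactly that for a class of threshold functions:

```python
from itertools import combinations

def restrictions(H, D):
    """Pi_H(D): the set of distinct behavior vectors of the functions in H on D."""
    return {tuple(h(x) for x in D) for h in H}

def shattering_number(H, X, m):
    """Pi_H(m): the maximum of |Pi_H(D)| over all subsets D of X with |D| = m."""
    return max(len(restrictions(H, D)) for D in combinations(X, m))

# Example class: threshold functions h_t(x) = 1 iff x >= t, on a 6-point domain.
X = list(range(6))
H = [lambda x, t=t: 1 if x >= t else 0 for t in range(7)]

for m in range(1, 5):
    print(m, shattering_number(H, X, m))  # thresholds give Pi_H(m) = m + 1
```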


From the point of view of learning theory one can consider $H$ to be the concept/hypothesis class defined over the instance set $X$. Before getting into the details of the proof of the theorem we will state Sauer's lemma, which we will need in the proof.

Sauer's lemma

Sauer's lemma relates the shattering number $\Pi_H(m)$ to the VC dimension.


Lemma: $\Pi_H(m) \le \left(\frac{em}{d}\right)^d$, where $d$ is the VC dimension of the concept class $H$.

Corollary: $\Pi_H(m) = O(m^d)$, i.e., the shattering number grows only polynomially in $m$.
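
As a quick numerical sanity check (our own illustration, using the threshold class above, which has VC dimension $d = 1$), the exact shattering number always stays below Sauer's bound:

```python
import math

d = 1  # VC dimension of threshold functions on a line
for m in [1, 2, 5, 10, 100]:
    exact = m + 1                     # shattering number Pi_H(m) of thresholds
    sauer = (math.e * m / d) ** d     # Sauer's lemma bound (em/d)^d
    print(m, exact, round(sauer, 2))  # exact <= sauer for every m
```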

Proof of uniform convergence theorem

Before we get into the details of the proof of the Uniform Convergence Theorem, we will present a high-level overview of the proof.
  1. Symmetrization: We transform the problem of analyzing $|Q_P(h) - \widehat{Q}_x(h)| \ge \varepsilon$ into the problem of analyzing $|\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \varepsilon/2$, where $r$ and $s$ are i.i.d. samples of size $m$ drawn according to the distribution $P$. One can view $r$ as the original randomly drawn sample of length $m$, while $s$ may be thought of as the testing sample which is used to estimate $Q_P(h)$.
  2. Permutation: Since $r$ and $s$ are picked identically and independently, swapping elements between them will not change the probability distribution on $r$ and $s$. So, we will try to bound the probability of $|\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \varepsilon/2$ for some $h \in H$ by considering the effect of a specific collection of permutations of the joint sample $x = r||s$, where $r||s$ means the concatenation of $r$ and $s$. Specifically, we consider permutations $\sigma(x)$ which swap $x_i$ and $x_{m+i}$ for $i$ in some subset of $\{1, 2, \dots, m\}$; a small sketch of these swap permutations appears after this list.
  3. Reduction to a finite class: We can now restrict the function class $H$ to a fixed joint sample, and hence, if $H$ has finite VC dimension, the problem reduces to one involving a finite function class.
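
The following sketch shows these swap permutations concretely (the function name and the choice to sample a random $\sigma$ rather than enumerate all of them are our own illustrative choices):

```python
import random

def apply_random_swap(x, m, rng):
    """Apply a uniformly random swap permutation sigma to x = r||s: independently
    for each i in {0, ..., m-1}, swap x[i] with x[m+i] with probability 1/2."""
    x = list(x)
    for i in range(m):
        if rng.random() < 0.5:
            x[i], x[m + i] = x[m + i], x[i]
    return x

rng = random.Random(0)
r, s = ["r1", "r2", "r3"], ["s1", "s2", "s3"]  # a joint sample x = r||s with m = 3
print(apply_random_swap(r + s, 3, rng))  # one of the 2^3 = 8 possible swap outcomes
```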


We present the technical details of the proof.

Symmetrization

Lemma: Let $V = \{x \in X^m : |Q_P(h) - \widehat{Q}_x(h)| \ge \varepsilon \text{ for some } h \in H\}$ and

$R = \{(r, s) \in X^m \times X^m : |\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \frac{\varepsilon}{2} \text{ for some } h \in H\}$.

Then for $m \ge \frac{2}{\varepsilon^2}$, we have $P^m(V) \le 2P^{2m}(R)$.

Proof:
By the triangle inequality, if $|Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon$ and $|Q_P(h) - \widehat{Q}_s(h)| \le \frac{\varepsilon}{2}$, then $|\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \frac{\varepsilon}{2}$.

Therefore,

$P^{2m}(R) \ge P^{2m}\{(r, s) : \exists h \in H,\ |Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon \text{ and } |Q_P(h) - \widehat{Q}_s(h)| \le \frac{\varepsilon}{2}\}$

$= \int_V P^m\{s : \exists h \in H,\ |Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon \text{ and } |Q_P(h) - \widehat{Q}_s(h)| \le \frac{\varepsilon}{2}\}\,dP^m(r)$ [since $r$ and $s$ are independent].

Now for $r \in V$ fix an $h \in H$ such that $|Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon$. For this $h$, we shall show that

$P^m\left\{\left|Q_P(h) - \widehat{Q}_s(h)\right| \le \frac{\varepsilon}{2}\right\} \ge \frac{1}{2}$.

Thus for any $r \in V$ the inner probability is at least $\frac{1}{2}$, and hence $P^{2m}(R) \ge \frac{1}{2}P^m(V)$. This completes the first (symmetrization) step of the high-level plan.

Notice that $m\widehat{Q}_s(h)$ is a binomial random variable with expectation $mQ_P(h)$ and variance $mQ_P(h)(1 - Q_P(h))$. By Chebyshev's inequality we get

$P^m\left\{\left|Q_P(h) - \widehat{Q}_s(h)\right| > \frac{\varepsilon}{2}\right\} \le \frac{mQ_P(h)(1 - Q_P(h))}{(m\varepsilon/2)^2} \le \frac{1}{\varepsilon^2 m} \le \frac{1}{2}$

for the stated bound on $m$. Here we use the fact that $x(1 - x) \le \frac{1}{4}$ for all $x$.
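
This Chebyshev step can be checked by simulation; the sketch below (with illustrative parameter choices of our own) estimates the probability that the empirical frequency on the fresh sample $s$ lands within $\varepsilon/2$ of $Q_P(h)$ when $m \ge 2/\varepsilon^2$:

```python
import random

random.seed(1)
eps, q = 0.2, 0.3          # epsilon and Q_P(h)
m = int(2 / eps**2)        # the smallest m allowed by the lemma, here m = 50

trials = 20_000
good = 0
for _ in range(trials):
    # m * Q_hat_s(h) is Binomial(m, q): count successes among m i.i.d. draws.
    q_hat = sum(random.random() < q for _ in range(m)) / m
    good += abs(q - q_hat) <= eps / 2

print(good / trials)  # at least 1/2, as the lemma requires (much larger here)
```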

Permutations

Let $\Gamma_m$ be the set of all permutations of $\{1, 2, 3, \dots, 2m\}$ that swap $i$ and $m + i$ for $i$ in some subset of $\{1, 2, \dots, m\}$.

Lemma: Let $R$ be any subset of $X^{2m}$ and $P$ any probability distribution on $X$. Then,

$P^{2m}(R) = E[\Pr[\sigma(x) \in R]] \le \max_{x \in X^{2m}}(\Pr[\sigma(x) \in R]),$

where the expectation is over $x$ chosen according to $P^{2m}$, and the probability is over $\sigma$ chosen uniformly from $\Gamma_m$.

Proof:
For any $\sigma \in \Gamma_m$,

$P^{2m}(R) = P^{2m}\{x : \sigma(x) \in R\}$

[since coordinate permutations preserve the product distribution $P^{2m}$]. Therefore,

$P^{2m}(R) = \frac{1}{|\Gamma_m|}\sum_{\sigma \in \Gamma_m} P^{2m}\{x : \sigma(x) \in R\}$

$= \frac{1}{|\Gamma_m|}\sum_{\sigma \in \Gamma_m} \int_{X^{2m}} 1_{\sigma(x) \in R}\,dP^{2m}(x)$ [because $\Gamma_m$ is finite]

$= \int_{X^{2m}} \Pr[\sigma(x) \in R]\,dP^{2m}(x)$ [the expectation over $\sigma$ chosen uniformly from $\Gamma_m$]

$\le \max_{x \in X^{2m}}(\Pr[\sigma(x) \in R])$.

The maximum is guaranteed to exist, since the probability under a random permutation can take only a finite set of values.
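
The invariance fact used in the first line, namely that applying a coordinate permutation to an i.i.d. sample does not change its distribution, can also be checked empirically. The sketch below (a toy event $R$ of our own choosing) estimates $P^{2m}(R)$ and $P^{2m}\{x : \sigma(x) \in R\}$ for a random $\sigma \in \Gamma_m$ and finds them close:

```python
import random

def in_R(x, m, eps=0.2):
    """A toy event R: the two half-sample means differ by at least eps/2."""
    r, s = x[:m], x[m:]
    return abs(sum(r) / m - sum(s) / m) >= eps / 2

rng = random.Random(2)
m, trials = 20, 50_000
plain = swapped = 0
for _ in range(trials):
    x = [rng.random() for _ in range(2 * m)]   # 2m i.i.d. draws from Uniform[0, 1]
    plain += in_R(x, m)
    for i in range(m):                         # apply a random sigma from Gamma_m
        if rng.random() < 0.5:
            x[i], x[m + i] = x[m + i], x[i]
    swapped += in_R(x, m)

print(plain / trials, swapped / trials)  # the two estimates nearly agree
```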

Reduction to a finite class

Lemma: For $R$ as defined in the symmetrization lemma,

$\max_{x \in X^{2m}}(\Pr[\sigma(x) \in R]) \le 2\Pi_H(2m)e^{-\varepsilon^2 m/8}$.


Proof:
Let us write $x = (x_1, x_2, \dots, x_{2m})$ and let $t = |\Pi_H(x)|$, which is at most $\Pi_H(2m)$. This means there are functions $h_1, h_2, \dots, h_t \in H$ such that for any $h \in H$ there is an $i$ between $1$ and $t$ with $h_i(x_k) = h(x_k)$ for all $1 \le k \le 2m$.

We see that $\sigma(x) \in R$ iff some $h \in H$ satisfies

$\left|\frac{1}{m}\left(\left|\{1 \le i \le m : h(x_{\sigma(i)}) = 1\}\right| - \left|\{m + 1 \le i \le 2m : h(x_{\sigma(i)}) = 1\}\right|\right)\right| \ge \frac{\varepsilon}{2}$.

Hence we define $w_i^j = 1$ if $h_j(x_i) = 1$ and $w_i^j = 0$ otherwise, for $1 \le i \le 2m$ and $1 \le j \le t$.

We then have that $\sigma(x) \in R$ iff some $j \in \{1, \dots, t\}$ satisfies $\left|\frac{1}{m}\left(\sum_{i=1}^m w_{\sigma(i)}^j - \sum_{i=1}^m w_{\sigma(m+i)}^j\right)\right| \ge \frac{\varepsilon}{2}$. By the union bound we get

$\Pr[\sigma(x) \in R] \le t \cdot \max_{1 \le j \le t}\left(\Pr\left[\left|\frac{1}{m}\left(\sum_{i=1}^m w_{\sigma(i)}^j - \sum_{i=1}^m w_{\sigma(m+i)}^j\right)\right| \ge \frac{\varepsilon}{2}\right]\right) \le \Pi_H(2m) \cdot \max_{1 \le j \le t}\left(\Pr\left[\left|\frac{1}{m}\left(\sum_{i=1}^m w_{\sigma(i)}^j - \sum_{i=1}^m w_{\sigma(m+i)}^j\right)\right| \ge \frac{\varepsilon}{2}\right]\right)$.

Since the distribution over the permutations $\sigma$ is uniform, for each $i$, $w_{\sigma(i)}^j - w_{\sigma(m+i)}^j$ equals $\pm\left|w_i^j - w_{m+i}^j\right|$ with equal probability.

Thus,

$\Pr\left[\left|\frac{1}{m}\sum_{i=1}^m\left(w_{\sigma(i)}^j - w_{\sigma(m+i)}^j\right)\right| \ge \frac{\varepsilon}{2}\right] = \Pr\left[\left|\frac{1}{m}\sum_{i=1}^m\left|w_i^j - w_{m+i}^j\right|\beta_i\right| \ge \frac{\varepsilon}{2}\right],$

where the probability on the right is over the independent random signs $\beta_i$, each equal to $+1$ or $-1$ with equal probability. By Hoeffding's inequality, this is at most $2e^{-m\varepsilon^2/8}$.
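
The Hoeffding step can likewise be checked numerically; the sketch below (illustrative values of $m$ and $\varepsilon$, and the worst case where every $|w_i^j - w_{m+i}^j| = 1$) compares the empirical tail probability of the random-sign sum against the bound $2e^{-m\varepsilon^2/8}$:

```python
import math
import random

rng = random.Random(3)
m, eps = 100, 0.4
c = [1] * m  # the coefficients |w_i - w_{m+i}|; all ones is the worst case

trials = 100_000
tail = 0
for _ in range(trials):
    signs = [rng.choice((-1, 1)) for _ in range(m)]  # the random signs beta_i
    s = sum(b * ci for b, ci in zip(signs, c)) / m
    tail += abs(s) >= eps / 2

print(tail / trials, 2 * math.exp(-m * eps**2 / 8))  # empirical tail <= bound
```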



Finally, combining all three parts of the proof we get the Uniform Convergence Theorem: for $m \ge \frac{2}{\varepsilon^2}$,

$P^m(V) \le 2P^{2m}(R) \le 2\max_{x \in X^{2m}}(\Pr[\sigma(x) \in R]) \le 4\Pi_H(2m)e^{-\varepsilon^2 m/8}$.

(For $m < \frac{2}{\varepsilon^2}$ the right-hand side exceeds $1$, so the bound holds trivially.)
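
As a closing illustration (our own numbers, using the crude polynomial bound $\Pi_H(2m) \le (2m)^d$, which follows from Sauer's lemma when $d \ge 3$), one can solve for a sample size $m$ that pushes the failure probability below a target $\delta$:

```python
import math

def failure_bound(m, d, eps):
    """The theorem's bound 4 * Pi_H(2m) * exp(-eps^2 * m / 8), with the shattering
    number replaced by the polynomial bound (2m)^d (valid for d >= 3)."""
    return 4 * (2 * m) ** d * math.exp(-(eps ** 2) * m / 8)

d, eps, delta = 5, 0.1, 0.01
m = 1
while failure_bound(m, d, eps) > delta:
    m *= 2
print(m, failure_bound(m, d, eps))  # first power of two driving the bound below delta
```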