Margin classifier
In machine learning, a margin classifier is a classifier which is able to give an associated distance from the decision boundary for each example. For instance, if a linear classifier (e.g. a perceptron or linear discriminant analysis) is used, the distance (typically Euclidean distance, though others may be used) of an example from the separating hyperplane is the margin of that example.
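
As a concrete illustration, the signed Euclidean distance of each example from a linear decision boundary can be computed directly. The following sketch uses assumed toy hyperplane parameters and data; a positive value indicates a correctly classified example:

```python
import numpy as np

def margins(w, b, X, y):
    """Signed Euclidean distance of each example from the hyperplane w.x + b = 0,
    multiplied by the label so correctly classified points get positive margin."""
    return y * (X @ w + b) / np.linalg.norm(w)

# Hyperplane x1 + x2 - 1 = 0 (assumed toy parameters) and two labeled points.
w = np.array([1.0, 1.0])
b = -1.0
X = np.array([[2.0, 2.0], [0.0, 0.0]])
y = np.array([1, -1])

print(margins(w, b, X, y))  # both values positive: both points classified correctly
```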

The notion of margin is important in several machine learning classification algorithms, as it can be used to bound the generalization error of the classifier. These bounds are frequently shown using the VC dimension. Of particular prominence is the generalization error bound on boosting algorithms and support vector machines.

Support vector machine definition of margin

See support vector machines and maximum-margin hyperplane for details.

Boosting definition of margin

The margin for an iterative boosting algorithm given a set of examples with two classes can be defined as follows. The classifier is given an example pair $(x, y)$, where $x \in X$ is a point in a domain space and $y \in \{-1, +1\}$ is the label of the example. At each iteration $j$, the iterative boosting algorithm selects a classifier $h_j \in C$, where $C$ is a space of possible classifiers that predict real values. This hypothesis is then weighted by $\alpha_j \in \mathbb{R}$ as selected by the boosting algorithm. At iteration $t$, the margin of an example $x$ can thus be defined as

$$\frac{y \sum_{j=1}^{t} \alpha_j h_j(x)}{\sum_{j=1}^{t} |\alpha_j|}.$$

By this definition, the margin is positive if the example is labeled correctly and negative if the example is labeled incorrectly.
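
This normalized margin can be computed directly from the hypothesis weights and outputs. The sketch below uses made-up values for the weights $\alpha_j$ and the predictions $h_j(x)$:

```python
import numpy as np

def boosting_margin(alphas, hypothesis_outputs, y):
    """Normalized boosting margin of one example after t iterations.

    alphas: weights alpha_1..alpha_t chosen by the booster
    hypothesis_outputs: h_1(x)..h_t(x), real-valued predictions for this example
    y: the example's label, +1 or -1
    """
    alphas = np.asarray(alphas, dtype=float)
    h = np.asarray(hypothesis_outputs, dtype=float)
    return y * np.dot(alphas, h) / np.abs(alphas).sum()

# Toy run (assumed values): three weak hypotheses, two of which vote correctly.
print(boosting_margin([0.5, 0.3, 0.2], [1, 1, -1], y=1))  # (0.5 + 0.3 - 0.2) / 1.0 = 0.6
```

Flipping the label to -1 flips the sign of the margin, matching the definition above.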

This definition may be modified and is not the only way to define margin for boosting algorithms. However, there are reasons why this definition may be appealing (see the Schapire et al. paper for details ).

Examples of margin-based algorithms

Many classifiers can give an associated margin for each example. However, only some classifiers utilize information of the margin while learning from a data set.

Many boosting algorithms rely on the notion of a margin to assign weights to examples. If a convex loss is utilized (as in AdaBoost, LogitBoost, and all members of the AnyBoost family of algorithms), then an example with a higher margin will receive less (or equal) weight than an example with a lower margin. This leads the boosting algorithm to focus weight on low-margin examples. In nonconvex algorithms (e.g. BrownBoost), the margin still dictates the weighting of an example, though the weighting is non-monotone with respect to margin. There exist boosting algorithms that provably maximize the minimum margin.
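
For instance, AdaBoost's exponential loss assigns each example a weight proportional to the exponential of the negative of its unnormalized margin, so weight decreases monotonically as margin grows. A minimal sketch, with assumed margin values:

```python
import numpy as np

def adaboost_weights(unnormalized_margins):
    """Exponential-loss weights, w_i proportional to exp(-y_i f(x_i)),
    normalized to sum to one. Higher-margin examples get (weakly) smaller weight."""
    w = np.exp(-np.asarray(unnormalized_margins, dtype=float))
    return w / w.sum()

m = np.array([2.0, 0.5, -1.0])   # assumed unnormalized margins y_i * f(x_i)
w = adaboost_weights(m)
print(w)  # the misclassified (negative-margin) example carries most of the weight
```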

Support vector machines provably maximize the margin of the separating hyperplane. Support vector machines that are trained using noisy data (where no perfect separation of the data exists in the given space) maximize the soft margin. More discussion of this can be found in the support vector machine article.
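
To illustrate on a toy example (assumed data and hand-chosen hyperplanes): among hyperplanes that separate the data, the maximum-margin one maximizes the smallest distance to any training point:

```python
import numpy as np

def min_margin(w, b, X, y):
    """Smallest signed distance of any example from the hyperplane w.x + b = 0."""
    return np.min(y * (X @ w + b) / np.linalg.norm(w))

# Two points, one per class (assumed toy data). The maximum-margin hyperplane
# is the perpendicular bisector of the segment joining them.
X = np.array([[1.0, 0.0], [3.0, 0.0]])
y = np.array([-1, 1])
w_max, b_max = np.array([1.0, 0.0]), -2.0      # bisector x1 = 2
w_other, b_other = np.array([1.0, 0.0]), -1.5  # also separates, but sits closer to one point

print(min_margin(w_max, b_max, X, y))      # 1.0
print(min_margin(w_other, b_other, X, y))  # 0.5
```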

The voted-perceptron algorithm is a margin-maximizing algorithm based on an iterative application of the classic perceptron algorithm.
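
A minimal sketch of the voted-perceptron idea, following Freund and Schapire's description, on assumed toy data: every intermediate weight vector is kept along with the number of rounds it survived, and prediction is a survival-weighted vote over those vectors.

```python
import numpy as np

def train_voted_perceptron(X, y, epochs=10):
    """Run the perceptron update, recording each weight vector with its survival count."""
    v = np.zeros(X.shape[1])
    c = 1
    survivors = []
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * (v @ x_i) <= 0:           # mistake: retire old vector, update
                survivors.append((v.copy(), c))
                v = v + y_i * x_i
                c = 1
            else:                              # correct: current vector survives a round
                c += 1
    survivors.append((v, c))
    return survivors

def predict_voted(survivors, x):
    """Survival-weighted majority vote of all recorded perceptrons."""
    vote = sum(c * np.sign(v @ x) for v, c in survivors)
    return 1 if vote >= 0 else -1

# Assumed linearly separable toy data (no bias term).
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
model = train_voted_perceptron(X, y)
print(predict_voted(model, np.array([3.0, 3.0])))
```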

Generalization error bounds

One theoretical motivation behind margin classifiers is that their generalization error may be bound by parameters of the algorithm and a margin term. An example of such a bound is for the AdaBoost algorithm. Let $S$ be a set of $m$ examples sampled independently at random from a distribution $D$. Assume the VC-dimension of the underlying base classifier is $d$ and $m \ge d \ge 1$. Then with probability $1 - \delta$ we have the bound

$$P_D\left(\frac{y\,f(x)}{\sum_j |\alpha_j|} \le 0\right) \le P_S\left(\frac{y\,f(x)}{\sum_j |\alpha_j|} \le \theta\right) + O\left(\frac{1}{\sqrt{m}} \sqrt{\frac{d \log^2(m/d)}{\theta^2} + \log\frac{1}{\delta}}\right)$$

for all $\theta > 0$, where $f(x) = \sum_j \alpha_j h_j(x)$ is the weighted combination of base classifiers.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 