ID3 algorithm
In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross Quinlan used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm.

Algorithm

The ID3 algorithm can be summarized as follows:
  1. Compute the entropy of every unused attribute with respect to the test samples.
  2. Choose the attribute for which entropy is minimum (or, equivalently, for which information gain is maximum).
  3. Make a node containing that attribute, and recurse on the subsets of samples produced by the split.


The algorithm is as follows:

ID3 (Examples, Target_Attribute, Attributes)
  • Create a root node Root for the tree.
  • If all examples are positive, return the single-node tree Root, with label = +.
  • If all examples are negative, return the single-node tree Root, with label = -.
  • If the set of predicting attributes is empty, return the single-node tree Root, with label = the most common value of the target attribute in the examples.
  • Otherwise begin:
    • A = the attribute that best classifies the examples.
    • Decision tree attribute for Root = A.
    • For each possible value, v_i, of A:
      • Add a new tree branch below Root, corresponding to the test A = v_i.
      • Let Examples(v_i) be the subset of examples that have the value v_i for A.
      • If Examples(v_i) is empty:
        • Then below this new branch add a leaf node with label = the most common target value in the examples.
      • Else below this new branch add the subtree ID3 (Examples(v_i), Target_Attribute, Attributes – {A}).
  • End.
  • Return Root.
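
A compact, runnable Python sketch of this pseudocode follows. It is an illustrative implementation under assumptions the pseudocode leaves open: examples are dictionaries mapping attribute names to values, the target is one such key, and "best classifies" is taken to mean highest information gain (defined in the next section). All identifiers and the toy dataset are invented for this example.

from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    # Shannon entropy, in bits, of a list of target values.
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gain(examples, attribute, target):
    # Information gain from splitting the examples on the given attribute.
    by_value = defaultdict(list)
    for ex in examples:
        by_value[ex[attribute]].append(ex[target])
    labels = [ex[target] for ex in examples]
    remainder = sum(len(s) / len(examples) * entropy(s) for s in by_value.values())
    return entropy(labels) - remainder

def id3(examples, target, attributes):
    # Returns either a class label (a leaf) or a dict {attribute: {value: subtree}}.
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:   # all examples share one label
        return labels[0]
    if not attributes:          # no attributes left: use the majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a, target))
    tree = {best: {}}
    # This sketch branches only on values that occur in the examples, so the
    # pseudocode's "Examples(v_i) is empty" case never arises here.
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        tree[best][value] = id3(subset, target, attributes - {best})
    return tree

data = [  # toy dataset, invented for illustration
    {"Outlook": "sunny",    "Windy": "no",  "Play": "no"},
    {"Outlook": "sunny",    "Windy": "yes", "Play": "no"},
    {"Outlook": "rainy",    "Windy": "no",  "Play": "yes"},
    {"Outlook": "rainy",    "Windy": "yes", "Play": "no"},
    {"Outlook": "overcast", "Windy": "no",  "Play": "yes"},
]
print(id3(data, "Play", {"Outlook", "Windy"}))
# e.g. {'Outlook': {'sunny': 'no', 'overcast': 'yes',
#                   'rainy': {'Windy': {'no': 'yes', 'yes': 'no'}}}}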

The ID3 metrics

The algorithm is based on Occam's razor: it prefers smaller decision trees (simpler theories) over larger ones. However, it does not always produce the smallest tree, and is therefore a heuristic. Occam's razor is formalized using the concept of information entropy:

Entropy

$H(S) = -\sum_{i=1}^{n} p_i \log_2 p_i$

Where:
  • $H(S)$ is the information entropy of the set $S$;
  • $n$ is the number of different values of the attribute in $S$ (entropy is computed for one chosen attribute);
  • $p_i$ is the frequency (proportion) of the value $v_i$ in the set $S$;
  • $\log_2$ is the binary logarithm.


An entropy of 0 identifies a perfectly classified set.

Entropy is used to determine which node to split next in the algorithm: the higher the entropy, the higher the potential to improve the classification at that node.
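
For illustration, here is a minimal Python sketch of this entropy computation; the function name and the toy label lists are invented for this example.

from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy, in bits, of a list of class labels.
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

print(entropy(["+", "+", "+", "+"]))       # 0.0  (perfectly classified set)
print(entropy(["+", "+", "-", "-"]))       # 1.0  (maximally mixed, two classes)
print(entropy(["+", "+", "+", "+", "-"]))  # ~0.72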

Gain

Information gain $G(S, A)$ is computed to estimate the gain produced by a split over an attribute $A$:

$G(S, A) = H(S) - \sum_{i=1}^{m} p(A_i) \, H(S_{A_i})$

Where:
  • $G(S, A)$ is the gain of the set $S$ after a split over the attribute $A$;
  • $H(S)$ is the information entropy of the set $S$;
  • $m$ is the number of different values of the attribute $A$ in $S$;
  • $p(A_i)$ is the frequency (proportion) of the items possessing $A_i$ as the value for $A$ in $S$;
  • $A_i$ is the $i$-th possible value of $A$;
  • $S_{A_i}$ is the subset of $S$ containing all items where the value of $A$ is $A_i$.


Gain quantifies the reduction in entropy achieved by splitting over an attribute: the higher the gain, the better the split.
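
As a worked illustration, the sketch below computes the gain of a single split in Python; it repeats the entropy helper from the earlier sketch so that it runs on its own, and the rows and attribute names are invented.

from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gain(examples, attribute, target):
    # G(S, A) = H(S) - sum over values A_i of p(A_i) * H(S_{A_i})
    by_value = defaultdict(list)
    for ex in examples:
        by_value[ex[attribute]].append(ex[target])
    labels = [ex[target] for ex in examples]
    remainder = sum(len(s) / len(examples) * entropy(s) for s in by_value.values())
    return entropy(labels) - remainder

data = [  # toy rows, invented for illustration
    {"Windy": "no",  "Play": "yes"},
    {"Windy": "no",  "Play": "yes"},
    {"Windy": "yes", "Play": "no"},
    {"Windy": "yes", "Play": "yes"},
]
print(gain(data, "Windy", "Play"))  # ~0.311 bits of entropy removed by the split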
