In the given data table, the categorical (class) attribute is Profit, which takes the discrete values UP and DOWN.
P = number of rows with Profit = UP = 5
Q = number of rows with Profit = DOWN = 5
The first step is to calculate Info(Profit), the entropy of the Profit attribute:
$$Info(Profit) = I(P, Q) = -\sum_{i=1}^{c} p_i \log_2(p_i)$$
$$Info(Profit) = I(P, Q) = -\frac{5}{10} \log_2 \frac{5}{10} - \frac{5}{10} \log_2 \frac{5}{10} = 1 \text{ bit}$$
Hence, the entropy of the class attribute Profit is Info(Profit) = 1 bit; this is the I(P, Q) term used in the Information Gain formula.
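The calculation above can be reproduced with a small Python helper (a sketch, not part of the original text; the function name `entropy` is ours):

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Profit has 5 UP and 5 DOWN values
print(entropy([5, 5]))  # 1.0
```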
Now, calculate the entropy of each remaining attribute (Age, Competition, and Type) with respect to the class attribute Profit.
1] Entropy of attribute Age with respect to Profit:
Entropy (Age, Profit) = 0.4 bits
2] Entropy of attribute Competition with respect to Profit:
Entropy (Competition, Profit) = 0.8754 bits
3] Entropy of attribute Type with respect to Profit:
Entropy (Type, Profit) = 1 bit
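Each E(A) above is the weighted average of the class entropy within each value of attribute A. A minimal Python sketch, using the Age split the article describes (Old: all DOWN, New: all UP, Mid: 2 UP / 2 DOWN, which implies 3 Old, 4 Mid, and 3 New rows):

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def conditional_entropy(partitions):
    """E(A): entropy of each attribute-value partition, weighted by its size.
    Each partition is the [UP, DOWN] class-count pair for one attribute value."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * entropy(p) for p in partitions)

# Age partitions the 10 rows into Old (0 UP, 3 DOWN), Mid (2 UP, 2 DOWN), New (3 UP, 0 DOWN)
print(conditional_entropy([[0, 3], [2, 2], [3, 0]]))  # 0.4
```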
Now, calculate the Information Gain for every attribute
$$\text{Information Gain}(A) = I(P, Q) - E(A)$$
$\text{Information Gain}_{Age} = I(P, Q) - E(Age, Profit) = 1 - 0.4 = 0.6 \text{ bits}$
$\text{Information Gain}_{Competition} = I(P, Q) - E(Competition, Profit) = 1 - 0.8754 = 0.1246 \text{ bits}$
$\text{Information Gain}_{Type} = I(P, Q) - E(Type, Profit) = 1 - 1 = 0 \text{ bits}$
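The three subtractions can be checked with a few lines of Python (a sketch using the entropy values just computed):

```python
# Information Gain of each attribute = dataset entropy I(P, Q) minus E(A).
info_profit = 1.0  # I(P, Q) for Profit, computed above
entropies = {"Age": 0.4, "Competition": 0.8754, "Type": 1.0}
gains = {a: info_profit - e for a, e in entropies.items()}

# The attribute with the largest Information Gain becomes the root node.
best = max(gains, key=gains.get)
print(best)  # Age
```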
General Rules to create Decision Tree:
- Select the attribute with the largest Information Gain as the main decision node, i.e. the root node.
- A branch with entropy 0 becomes a leaf node.
- A branch with entropy greater than 0 needs further splitting.
- This process is repeated recursively on the non-leaf branches until all the data is classified.
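The rules above amount to the classic ID3 algorithm. A minimal recursive sketch in Python; note that the ten-row table here is a hypothetical reconstruction consistent with the counts the article gives (3 Old all DOWN, 3 New all UP, 4 Mid split by Competition), not the original data table:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attributes):
    """Recursively build a nested-dict decision tree, following the rules above."""
    if entropy(labels) == 0:                       # pure branch -> leaf node
        return labels[0]
    if not attributes:                             # nothing left to split on
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                                   # Information Gain of attribute a
        e = 0.0
        for v in {r[a] for r in rows}:
            sub = [lab for r, lab in zip(rows, labels) if r[a] == v]
            e += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - e

    best = max(attributes, key=gain)               # largest Information Gain wins
    rest = [a for a in attributes if a != best]
    return {best: {
        v: id3([r for r in rows if r[best] == v],
               [lab for r, lab in zip(rows, labels) if r[best] == v],
               rest)
        for v in {r[best] for r in rows}
    }}

# Hypothetical table consistent with the counts given in the text.
ages    = ["Old", "Old", "Old", "New", "New", "New", "Mid", "Mid", "Mid", "Mid"]
comps   = ["Yes", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "No", "No"]
profits = ["DOWN", "DOWN", "DOWN", "UP", "UP", "UP", "DOWN", "DOWN", "UP", "UP"]
rows = [{"Age": a, "Competition": c} for a, c in zip(ages, comps)]

tree = id3(rows, profits, ["Age", "Competition"])
print(tree)
```

On this data the recursion picks Age at the root, turns the pure Old and New branches into leaves, and splits the mixed Mid branch on Competition.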
Decision Tree:
- Based on these rules, the Age attribute has the largest Information Gain (0.6 bits) among all the attributes, so it is selected as the DECISION NODE, i.e. the ROOT NODE.
- The Age attribute takes three values: Old, Mid, and New. For the value OLD, all Profit values are DOWN, so the entropy of the Old branch is 0. Hence it is a leaf node with the decision value DOWN.
- Similarly, for the value NEW, all Profit values are UP, so the entropy of the New branch is 0. Hence it is a leaf node with the decision value UP.
- The Mid branch contains 2 UP and 2 DOWN values of the Profit attribute, so its entropy is 1 bit. Hence, Mid needs further splitting.
- Which attribute becomes the child node of Mid? Compare the Information Gains of the remaining attributes: the Type attribute has an Information Gain of 0, while the Competition attribute has an Information Gain of 0.1246 bits, which is greater.
- Hence, the Competition attribute becomes the CHILD node of Mid.
- The Competition attribute takes two values, Yes and No. From the given data table it is clear that when Competition is present (Yes), the majority of Profit values are DOWN, and when Competition is absent (No), the majority are UP.
Based on all these properties, the decision tree for the given data looks as shown below:
Generated Rules from Decision Tree:
A decision tree can easily be converted into a set of rules by tracing each path from the root node to a leaf node, one by one:
Rule 1 = IF (Age == New) THEN Profit = UP
Rule 2 = IF (Age == Mid) AND (Competition == Yes) THEN Profit = DOWN
Rule 3 = IF (Age == Mid) AND (Competition == No) THEN Profit = UP
Rule 4 = IF (Age == Old) THEN Profit = DOWN
These are the Decision Rules generated from the above Decision tree.
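The four rules map directly onto nested conditionals. A minimal sketch (the function name `predict_profit` is ours, not from the text):

```python
def predict_profit(age, competition):
    """Apply the four decision rules read off the tree, top-down."""
    if age == "New":                 # Rule 1
        return "UP"
    if age == "Old":                 # Rule 4
        return "DOWN"
    # age == "Mid": decide on Competition (Rules 2 and 3)
    return "DOWN" if competition == "Yes" else "UP"

print(predict_profit("Mid", "No"))  # UP
```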