In the given data table, the class label is Outcome, which takes two discrete values: Nothing and Responded.
P = number of samples with Outcome = Nothing = 5
Q = number of samples with Outcome = Responded = 9
The first step is to calculate the expected information (entropy) of the class label Outcome:
$$I(P, Q) = -\frac{P}{P+Q}\log_2\frac{P}{P+Q} - \frac{Q}{P+Q}\log_2\frac{Q}{P+Q}$$
Hence, the expected information of the Outcome attribute = Info(Outcome) = I(5, 9) = 0.94028 bits.
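As a quick sanity check, this value can be reproduced in a few lines of Python (a minimal sketch; the function name `info` is just illustrative):

```python
from math import log2

def info(p, q):
    """Expected information I(P, Q) of a two-class split, in bits."""
    total = p + q
    return -sum(c / total * log2(c / total) for c in (p, q) if c)

print(round(info(5, 9), 5))  # -> 0.94029 (the 0.94028 above is truncated)
```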
Next, calculate the entropy of each attribute (District, House Type, Income, and Previous Customer) with respect to the class label Outcome.
1] Entropy of attribute District with respect to Outcome:
Entropy (Outcome, District) = 0.6935 bits
2] Entropy of attribute House Type with respect to Outcome:
Entropy (Outcome, House Type) = 0.89028 bits
3] Entropy of attribute Income with respect to Outcome:
Entropy (Outcome, Income) = 0.7884 bits
4] Entropy of attribute Previous Customer with respect to Outcome:
Entropy (Outcome, Previous Customer) = 0.8921 bits
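The same helper extends to these weighted attribute entropies. A minimal sketch for District, where the per-value (Nothing, Responded) counts are my reading of the data table (an assumption, but it reproduces the 0.6935 figure above):

```python
from math import log2

def info(p, q):
    """Expected information I(P, Q) of a two-class split, in bits."""
    total = p + q
    return -sum(c / total * log2(c / total) for c in (p, q) if c)

def attribute_entropy(groups):
    """E(Outcome, A): entropy of each value group, weighted by group size."""
    total = sum(p + q for p, q in groups)
    return sum((p + q) / total * info(p, q) for p, q in groups)

# Assumed (Nothing, Responded) counts per District value from the data table
district = [(3, 2),   # Suburban
            (0, 4),   # Rural: pure, all Responded
            (2, 3)]   # Urban
print(round(attribute_entropy(district), 4))  # -> 0.6935
```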
Now, calculate the Information Gain for every attribute:
$$\text{Information Gain}(A) = I(P, Q) - E(A)$$
$\text{Information Gain}_{\text{District}} = I(P, Q) - E(\text{Outcome, District})$
$= 0.94028 - 0.6935 = 0.24678$ bits
$\text{Information Gain}_{\text{House Type}} = I(P, Q) - E(\text{Outcome, House Type})$
$= 0.94028 - 0.89028 = 0.05$ bits
$\text{Information Gain}_{\text{Income}} = I(P, Q) - E(\text{Outcome, Income})$
$= 0.94028 - 0.7884 = 0.15188$ bits
$\text{Information Gain}_{\text{Previous Customer}} = I(P, Q) - E(\text{Outcome, Previous Customer})$
$= 0.94028 - 0.8921 = 0.04818$ bits
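Collecting the four gains and taking the argmax gives the splitting attribute directly; a short sketch using the values computed above:

```python
info_outcome = 0.94028
attr_entropy = {"District": 0.6935, "House Type": 0.89028,
                "Income": 0.7884, "Previous Customer": 0.8921}

gain = {a: round(info_outcome - e, 5) for a, e in attr_entropy.items()}
print(gain)                      # District: 0.24678, House Type: 0.05, ...
print(max(gain, key=gain.get))   # -> 'District', the attribute to split on
```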
General rules to create the Decision Tree:
- Select the attribute with the largest Information Gain as the main decision node, i.e., the Root Node.
- A branch with entropy 0 is a Leaf Node.
- A branch with entropy greater than 0 needs further splitting.
- This process is repeated recursively on the non-leaf branches until all the data is classified (see the sketch after this list).
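These four rules are essentially the ID3 algorithm. A minimal recursive sketch, assuming the data is a list of dicts keyed by attribute name (all function names here are illustrative, not from a library):

```python
from collections import Counter
from math import log2

def entropy(rows, target):
    """Entropy (in bits) of the target label over a list of row dicts."""
    total = len(rows)
    counts = Counter(r[target] for r in rows)
    return -sum(c / total * log2(c / total) for c in counts.values())

def gain(rows, attr, target):
    """Information gain of splitting rows on attr."""
    total, rem = len(rows), 0.0
    for v in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == v]
        rem += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - rem

def id3(rows, attrs, target):
    """Recursively build the tree as nested dicts; leaves are label values."""
    labels = {r[target] for r in rows}
    if len(labels) == 1:                 # entropy 0 -> leaf node
        return labels.pop()
    if not attrs:                        # no attributes left -> majority label
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a, target))
    return {best: {v: id3([r for r in rows if r[best] == v],
                          [a for a in attrs if a != best], target)
                   for v in {r[best] for r in rows}}}
```

Calling `id3(rows, ["District", "House Type", "Income", "Previous Customer"], "Outcome")` on the table would return the tree as nested dicts, with the class labels at the leaves.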
Decision Tree:
- Based on these rules, District has the largest Information Gain of all the attributes, so it is selected as the DECISION NODE, or ROOT NODE.
- The District attribute contains three values: Suburban, Rural, and Urban.
- For the Rural value, all Outcome values are Responded, so the Rural branch has entropy 0. Hence, it is a Leaf Node with the decisive value Responded.
- The Suburban and Urban branches have entropies greater than 0 and therefore require further splitting.
To decide which attribute becomes the child node for Suburban and for Urban, we repeat the above procedure on each branch.
Let's look at the Suburban branch first:
- Entropy(Suburban) = 0.9709 bits
- Information Gain(Suburban, House Type) = 0.571 bits
- Information Gain(Suburban, Income) = 0.971 bits
- Information Gain(Suburban, Previous Customer) = 0.02 bits
Here the Income attribute has the highest information gain, so it is selected as the child (decision) node for the Suburban branch.
Now the Urban branch:
- Entropy(Urban) = 0.9709 bits
- Information Gain(Urban, House Type) = 0.02 bits
- Information Gain(Urban, Income) = 0.02 bits
- Information Gain(Urban, Previous Customer) = 0.971 bits
Here the Previous Customer attribute has the highest information gain, so it is selected as the child (decision) node for the Urban branch.
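In both branches the winning gain equals the full branch entropy (0.971), which means those splits are perfect: every child node is pure. A quick check for the Suburban/Income case, assuming Income splits the five Suburban rows into a pure Low group (2 Responded) and a pure High group (3 Nothing), which is my reading of the data table:

```python
from math import log2

def info(p, q):
    """Expected information I(P, Q) of a two-class split, in bits."""
    total = p + q
    return -sum(c / total * log2(c / total) for c in (p, q) if c)

branch = info(3, 2)                        # Entropy(Suburban) = 0.9710 bits
# Assumed counts: Low -> (0 Nothing, 2 Responded), High -> (3, 0); both pure
children = (2 / 5) * info(0, 2) + (3 / 5) * info(3, 0)
print(round(branch - children, 3))         # -> 0.971, the stated gain
```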
- We have now found Income and Previous Customer as decision nodes.
- At this point every branch reaches a pure class label value, so there is no longer a need to split the decision tree further.
- Under the Suburban branch, the Income attribute separates the classes completely: when Income is LOW the Outcome is Responded, and when Income is HIGH the Outcome is Nothing.
- Under the Urban branch, the Previous Customer attribute separates the classes completely: when its value is NO the Outcome is Responded, and when its value is YES the Outcome is Nothing.
Based on all these properties, the decision tree for the given data looks as shown below:
Generated Rules from Decision Tree:
A decision tree can easily be converted to a set of rules by tracing each path from the root node to a leaf node, as follows:
Rule 1 = IF (District == Rural) THEN Outcome = Responded
Rule 2 = IF (District == Suburban) AND (Income == High) THEN Outcome = Nothing
Rule 3 = IF (District == Suburban) AND (Income == Low) THEN Outcome = Responded
Rule 4 = IF (District == Urban) AND (Previous Customer == No) THEN Outcome = Responded
Rule 5 = IF (District == Urban) AND (Previous Customer == Yes) THEN Outcome = Nothing
These are the decision rules generated from the above Decision Tree.
Hence, we have constructed a Decision Tree classifier for the given sample dataset.
Now, find the class label for a new sample where District is Rural, House Type is Semi-detached, Income is Low, and Previous Customer is No.
Rule 1, generated from the decision tree classifier above, clearly states that when the value of the District attribute is Rural, the Outcome is always Responded.
Therefore, the new sample (Rural, Semi-detached, Low, No) is classified with the class label Responded, as the sketch below also confirms.
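The five rules translate directly into code; a minimal sketch (the function name is illustrative) that also classifies the new sample:

```python
def classify(district, house_type, income, previous_customer):
    """Apply Rules 1-5 from the decision tree above."""
    if district == "Rural":                                   # Rule 1
        return "Responded"
    if district == "Suburban":                                # Rules 2 and 3
        return "Nothing" if income == "High" else "Responded"
    if district == "Urban":                                   # Rules 4 and 5
        return "Responded" if previous_customer == "No" else "Nothing"

# House Type never appears in the tree, so its value is ignored
print(classify("Rural", "Semi-detached", "Low", "No"))  # -> Responded
```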