In the given data table, the class label is Outcome, which takes two discrete values: Nothing and Responded.
P = number of samples with Outcome = Nothing = 5
Q = number of samples with Outcome = Responded = 9
The first step is to calculate the expected information (entropy) of the class label Outcome:
$$I(P, Q) = -\frac{P}{P+Q}\log_2\frac{P}{P+Q} - \frac{Q}{P+Q}\log_2\frac{Q}{P+Q}$$
Hence, the expected information of the Outcome attribute = Info(Outcome) = I(5, 9) = 0.94028 bits.
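As a quick sanity check, this value can be reproduced in a few lines of Python (a minimal sketch; the function name `info` is just illustrative):

```python
from math import log2

def info(p, q):
    """Expected information I(P, Q) of a two-class split, in bits."""
    total = p + q
    return -sum(c / total * log2(c / total) for c in (p, q) if c)

print(round(info(5, 9), 5))  # -> 0.94029 (the 0.94028 above is truncated)
```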
Next, calculate the entropy of each attribute (District, House Type, Income, and Previous Customer) with respect to the class label Outcome.
1] Entropy of attribute District with respect to Outcome:
Entropy (Outcome, District) = 0.6935 bits
2] Entropy of attribute House Type with respect to Outcome:
Entropy (Outcome, House Type) = 0.89028 bits
3] Entropy of attribute Income with respect to Outcome:
Entropy (Outcome, Income) = 0.7884 bits
4] Entropy of attribute Previous Customer with respect to Outcome:
Entropy (Outcome, Previous Customer) = 0.8921 bits
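The same helper extends to these weighted attribute entropies. A minimal sketch for District, where the per-value (Nothing, Responded) counts are my reading of the data table (an assumption, but it reproduces the 0.6935 figure above):

```python
from math import log2

def info(p, q):
    """Expected information I(P, Q) of a two-class split, in bits."""
    total = p + q
    return -sum(c / total * log2(c / total) for c in (p, q) if c)

def attribute_entropy(groups):
    """E(Outcome, A): entropy of each value group, weighted by group size."""
    total = sum(p + q for p, q in groups)
    return sum((p + q) / total * info(p, q) for p, q in groups)

# Assumed (Nothing, Responded) counts per District value from the data table
district = [(3, 2),   # Suburban
            (0, 4),   # Rural: pure, all Responded
            (2, 3)]   # Urban
print(round(attribute_entropy(district), 4))  # -> 0.6935
```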
Now, calculate the Information Gain for every attribute:
$$\text{Information Gain}(A) = I(P, Q) - E(A)$$
$\text{Information Gain}_{\text{District}} = I(P, Q) - E(\text{Outcome, District})$
$= 0.94028 - 0.6935 = 0.24678$ bits
$\text{Information Gain}_{\text{House Type}} = I(P, Q) - E(\text{Outcome, House Type})$
$= 0.94028 - 0.89028 = 0.05$ bits
$\text{Information Gain}_{\text{Income}} = I(P, Q) - E(\text{Outcome, Income})$
$= 0.94028 - 0.7884 = 0.15188$ bits
$\text{Information Gain}_{\text{Previous Customer}} = I(P, Q) - E(\text{Outcome, Previous Customer})$
$= 0.94028 - 0.8921 = 0.04818$ bits
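Collecting the four gains and taking the argmax gives the splitting attribute directly; a short sketch using the values computed above:

```python
info_outcome = 0.94028
attr_entropy = {"District": 0.6935, "House Type": 0.89028,
                "Income": 0.7884, "Previous Customer": 0.8921}

gain = {a: round(info_outcome - e, 5) for a, e in attr_entropy.items()}
print(gain)                      # District: 0.24678, House Type: 0.05, ...
print(max(gain, key=gain.get))   # -> 'District', the attribute to split on
```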
General rules to create the Decision Tree:
- Select the attribute with the largest Information Gain as the main decision node, i.e., the Root Node.
- A branch with entropy 0 is a Leaf Node.
- A branch with entropy greater than 0 needs further splitting.
- This process is repeated recursively on the non-leaf branches until all the data is classified (see the sketch after this list).
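These four rules are essentially the ID3 algorithm. A minimal recursive sketch, assuming the data is a list of dicts keyed by attribute name (all function names here are illustrative, not from a library):

```python
from collections import Counter
from math import log2

def entropy(rows, target):
    """Entropy (in bits) of the target label over a list of row dicts."""
    total = len(rows)
    counts = Counter(r[target] for r in rows)
    return -sum(c / total * log2(c / total) for c in counts.values())

def gain(rows, attr, target):
    """Information gain of splitting rows on attr."""
    total, rem = len(rows), 0.0
    for v in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == v]
        rem += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - rem

def id3(rows, attrs, target):
    """Recursively build the tree as nested dicts; leaves are label values."""
    labels = {r[target] for r in rows}
    if len(labels) == 1:                 # entropy 0 -> leaf node
        return labels.pop()
    if not attrs:                        # no attributes left -> majority label
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a, target))
    return {best: {v: id3([r for r in rows if r[best] == v],
                          [a for a in attrs if a != best], target)
                   for v in {r[best] for r in rows}}}
```

Calling `id3(rows, ["District", "House Type", "Income", "Previous Customer"], "Outcome")` on the table would return the tree as nested dicts, with the class labels at the leaves.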
Decision Tree:
- Based on these rules, District has the largest Information Gain of all the attributes, so it is selected as the DECISION NODE, or ROOT NODE.
- The District attribute contains three values: Suburban, Rural, and Urban.
- For the Rural value, all Outcome values are Responded, so the Rural branch has entropy 0. Hence, it is a Leaf Node with the decisive value Responded.
- The Suburban and Urban branches have entropies greater than 0 and therefore require further splitting.
To decide which attribute becomes the child node for Suburban and for Urban, we repeat the above procedure on each branch.
Let's look at the Suburban branch first:
- Entropy(Suburban) = 0.9709 bits
- Information Gain(Suburban, House Type) = 0.571 bits
- Information Gain(Suburban, Income) = 0.971 bits
- Information Gain(Suburban, Previous Customer) = 0.02 bits
Here the Income attribute has the highest information gain, so it is selected as the child (decision) node for the Suburban branch.
Now the Urban branch:
- Entropy(Urban) = 0.9709 bits
- Information Gain(Urban, House Type) = 0.02 bits
- Information Gain(Urban, Income) = 0.02 bits
- Information Gain(Urban, Previous Customer) = 0.971 bits
Here the Previous Customer attribute has the highest information gain, so it is selected as the child (decision) node for the Urban branch.
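In both branches the winning gain equals the full branch entropy (0.971), which means those splits are perfect: every child node is pure. A quick check for the Suburban/Income case, assuming Income splits the five Suburban rows into a pure Low group (2 Responded) and a pure High group (3 Nothing), which is my reading of the data table:

```python
from math import log2

def info(p, q):
    """Expected information I(P, Q) of a two-class split, in bits."""
    total = p + q
    return -sum(c / total * log2(c / total) for c in (p, q) if c)

branch = info(3, 2)                        # Entropy(Suburban) = 0.9710 bits
# Assumed counts: Low -> (0 Nothing, 2 Responded), High -> (3, 0); both pure
children = (2 / 5) * info(0, 2) + (3 / 5) * info(3, 0)
print(round(branch - children, 3))         # -> 0.971, the stated gain
```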
- We have now found Income and Previous Customer as decision nodes.
- At this point every branch reaches a pure class label value, so there is no longer a need to split the decision tree further.
- Under the Suburban branch, the Income attribute separates the classes completely: when Income is LOW the Outcome is Responded, and when Income is HIGH the Outcome is Nothing.
- Under the Urban branch, the Previous Customer attribute separates the classes completely: when its value is NO the Outcome is Responded, and when its value is YES the Outcome is Nothing.
Based on all these properties, the decision tree for the given data looks as shown below:
Generated Rules from Decision Tree:
A decision tree can easily be converted to a set of rules by tracing each path from the root node to a leaf node, as follows:
Rule 1 = IF (District == Rural) THEN Outcome = Responded
Rule 2 = IF (District == Suburban) AND (Income == High) THEN Outcome = Nothing
Rule 3 = IF (District == Suburban) AND (Income == Low) THEN Outcome = Responded
Rule 4 = IF (District == Urban) AND (Previous Customer == No) THEN Outcome = Responded
Rule 5 = IF (District == Urban) AND (Previous Customer == Yes) THEN Outcome = Nothing
These are the decision rules generated from the above Decision Tree.
Hence, we have constructed a Decision Tree classifier for the given sample dataset.
Now, find the class label for a new sample where District is Rural, House Type is Semi-detached, Income is Low, and Previous Customer is No.
Rule 1, generated from the decision tree classifier above, clearly states that when the value of the District attribute is Rural, the Outcome is always Responded.
Therefore, the new sample (Rural, Semi-detached, Low, No) is classified with the class label Responded, as the sketch below also confirms.
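The five rules translate directly into code; a minimal sketch (the function name is illustrative) that also classifies the new sample:

```python
def classify(district, house_type, income, previous_customer):
    """Apply Rules 1-5 from the decision tree above."""
    if district == "Rural":                                   # Rule 1
        return "Responded"
    if district == "Suburban":                                # Rules 2 and 3
        return "Nothing" if income == "High" else "Responded"
    if district == "Urban":                                   # Rules 4 and 5
        return "Responded" if previous_customer == "No" else "Nothing"

# House Type never appears in the tree, so its value is ignored
print(classify("Rural", "Semi-detached", "Low", "No"))  # -> Responded
```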