The table below shows a sample dataset of whether a customer responds to a survey or not. 'Outcome' is the class label.
| District | House Type | Income | Previous Customers | Outcome |
|----------|---------------|--------|--------------------|-----------|
| Suburban | Detached | High | No | Nothing |
| Suburban | Detached | High | Yes | Nothing |
| Rural | Detached | High | No | Responded |
| Urban | Semi-Detached | High | No | Responded |
| Urban | Semi-Detached | Low | No | Responded |
| Urban | Semi-Detached | Low | Yes | Nothing |
| Rural | Semi-Detached | Low | Yes | Responded |
| Suburban | Terrace | High | No | Nothing |
| Suburban | Semi-Detached | Low | No | Responded |
| Urban | Terrace | Low | No | Responded |
| Suburban | Terrace | Low | Yes | Responded |
| Rural | Terrace | High | Yes | Responded |
| Rural | Detached | Low | No | Responded |
| Urban | Terrace | High | Yes | Nothing |

Construct a Decision Tree Classifier for the dataset. For a new example (Rural, Semi-Detached, Low, No), what will be the predicted class label?

1 Answer

In the given data table, the class label is Outcome, which takes the discrete values Nothing and Responded.

P = count(Outcome = Nothing) = 5

Q = count(Outcome = Responded) = 9

The first step is to calculate the entropy of the class label Outcome, I(P, Q).

Hence, the entropy of the dataset with respect to Outcome is I(P, Q) = Info(Outcome) = 0.94028 bits.
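
For reference, this figure comes from the standard two-class entropy formula applied to the 5 Nothing and 9 Responded records:

$$I(P, Q) = -\frac{5}{14}\log_2\frac{5}{14} - \frac{9}{14}\log_2\frac{9}{14} \approx 0.94028 \text{ bits}$$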


Now, calculate the entropy for every attribute (District, House Type, Income, and Previous Customer) concerning the class label attribute Outcome.

1] Entropy for attribute District concerning Outcome.

| District | Nothing (p) | Responded (q) | I(p, q) |
|----------|-------------|---------------|---------|
| Suburban | 3 | 2 | 0.971 |
| Rural | 0 | 4 | 0 |
| Urban | 2 | 3 | 0.971 |

Entropy (Outcome, District) = 0.6935 bits
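
This is the branch-size-weighted average of the per-value entropies in the table above; the other three attribute entropies below are computed the same way:

$$E(\text{Outcome, District}) = \frac{5}{14}\, I(3, 2) + \frac{4}{14}\, I(0, 4) + \frac{5}{14}\, I(2, 3) \approx 0.6935 \text{ bits}$$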

2] Entropy for attribute House Type concerning Outcome.

| House Type | Nothing (p) | Responded (q) | I(p, q) |
|---------------|-------------|---------------|---------|
| Detached | 2 | 2 | 1.0 |
| Semi-Detached | 1 | 4 | 0.722 |
| Terrace | 2 | 3 | 0.971 |

Entropy (Outcome, House Type) = 0.89028 bits

3] Entropy for attribute Income concerning Outcome.

| Income | Nothing (p) | Responded (q) | I(p, q) |
|--------|-------------|---------------|---------|
| High | 4 | 3 | 0.985 |
| Low | 1 | 6 | 0.592 |

Entropy (Outcome, Income) = 0.7884 bits

4] Entropy for attribute Previous Customer concerning Outcome.

| Previous Customer | Nothing (p) | Responded (q) | I(p, q) |
|-------------------|-------------|---------------|---------|
| No | 2 | 6 | 0.811 |
| Yes | 3 | 3 | 1.0 |

Entropy (Outcome, Previous Customer) = 0.8921 bits
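
As a cross-check, these four conditional entropies can be reproduced directly from the table with a short Python sketch (the `data` list, `ATTRS` mapping, and helper names are my own illustration, not part of the original answer):

```python
from collections import Counter
from math import log2

# Each record: (District, House Type, Income, Previous Customer, Outcome)
data = [
    ("Suburban", "Detached",      "High", "No",  "Nothing"),
    ("Suburban", "Detached",      "High", "Yes", "Nothing"),
    ("Rural",    "Detached",      "High", "No",  "Responded"),
    ("Urban",    "Semi-Detached", "High", "No",  "Responded"),
    ("Urban",    "Semi-Detached", "Low",  "No",  "Responded"),
    ("Urban",    "Semi-Detached", "Low",  "Yes", "Nothing"),
    ("Rural",    "Semi-Detached", "Low",  "Yes", "Responded"),
    ("Suburban", "Terrace",       "High", "No",  "Nothing"),
    ("Suburban", "Semi-Detached", "Low",  "No",  "Responded"),
    ("Urban",    "Terrace",       "Low",  "No",  "Responded"),
    ("Suburban", "Terrace",       "Low",  "Yes", "Responded"),
    ("Rural",    "Terrace",       "High", "Yes", "Responded"),
    ("Rural",    "Detached",      "Low",  "No",  "Responded"),
    ("Urban",    "Terrace",       "High", "Yes", "Nothing"),
]
ATTRS = {"District": 0, "House Type": 1, "Income": 2, "Previous Customer": 3}

def entropy(rows):
    """I(p, q): entropy of the Outcome labels (last field) in the given rows."""
    total = len(rows)
    counts = Counter(r[-1] for r in rows)
    return -sum(c / total * log2(c / total) for c in counts.values())

def cond_entropy(rows, attr):
    """E(Outcome, attr): entropy after splitting on attr, weighted by branch size."""
    i, total, result = ATTRS[attr], len(rows), 0.0
    for value in set(r[i] for r in rows):
        branch = [r for r in rows if r[i] == value]
        result += len(branch) / total * entropy(branch)
    return result

print(f"I(P, Q) = {entropy(data):.5f} bits")          # 0.94029
for attr in ATTRS:
    print(f"E(Outcome, {attr}) = {cond_entropy(data, attr):.4f} bits")
# District 0.6935, House Type 0.8903, Income 0.7885, Previous Customer 0.8922
# (matching the figures above up to rounding)
```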


Now, calculate the Information Gain for every attribute

$$\text{Information Gain}(A) = I(P, Q) - E(A)$$

$\text{Information Gain}_{\text{District}} = I(P, Q) - E(\text{Outcome, District})$

= 0.94028 - 0.6935 = 0.24678 bits

$\text{Information Gain}_{\text{House Type}} = I(P, Q) - E(\text{Outcome, House Type})$

= 0.94028 - 0.89028 = 0.05 bits

$\text{Information Gain}_{\text{Income}} = I(P, Q) - E(\text{Outcome, Income})$

= 0.94028 - 0.7884 = 0.15188 bits

$\text{Information Gain}_{\text{Previous Customer}} = I(P, Q) - E(\text{Outcome, Previous Customer})$

= 0.94028 - 0.8921 = 0.04818 bits


General Rules to create Decision Tree:

  • Select the attribute with the largest Information Gain as the decision node, i.e., the root node.
  • A branch with entropy 0 is considered a leaf node.
  • A branch with entropy greater than 0 needs further splitting.
  • This process is repeated recursively on the non-leaf branches until all the data is classified (a minimal sketch of this recursion is shown below).
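
These rules amount to the ID3 procedure. The following is a minimal sketch of that recursion, reusing the `data`, `ATTRS`, `entropy`, and `cond_entropy` definitions from the earlier snippet; the dict-based tree representation is an illustration, not part of the original answer:

```python
def info_gain(rows, attr):
    """Information Gain(A) = I(P, Q) - E(A) on the given subset of rows."""
    return entropy(rows) - cond_entropy(rows, attr)

def build_tree(rows, attrs):
    labels = set(r[-1] for r in rows)
    if len(labels) == 1:                      # entropy 0 -> leaf node
        return labels.pop()
    if not attrs:                             # no attributes left -> majority class
        return Counter(r[-1] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a))   # largest gain -> decision node
    i = ATTRS[best]
    branches = {}
    for value in set(r[i] for r in rows):
        subset = [r for r in rows if r[i] == value]
        branches[value] = build_tree(subset, [a for a in attrs if a != best])
    return {best: branches}

tree = build_tree(data, list(ATTRS))
# {'District': {'Rural': 'Responded',
#               'Suburban': {'Income': {'High': 'Nothing', 'Low': 'Responded'}},
#               'Urban': {'Previous Customer': {'No': 'Responded', 'Yes': 'Nothing'}}}}
```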

Decision Tree:

  • Based on these rules, District has the largest Information Gain among all the attributes, so it is selected as the DECISION NODE or ROOT NODE.
  • The District attribute has three values: Suburban, Rural, and Urban.
  • For the Rural value, all Outcome values are Responded, which means the Rural branch has entropy 0. Hence, it is a leaf node with the decisive value Responded.
  • The Suburban and Urban branches have entropy greater than 0 and therefore require further splitting.
  • To decide which attribute becomes the child node for Suburban and for Urban, the above procedure is repeated on each of those subsets.

  • Let's look at the Suburban branch first:

    • Entropy(Suburban) = 0.9709 bits
    • Information Gain(Suburban, House Type) = 0.571 bits
    • Information Gain(Suburban, Income) = 0.971 bits
    • Information Gain(Suburban, Previous Customer) = 0.02 bits
  • So, here the Income attribute has the largest information gain, hence it is selected as the child (decision) node for the Suburban branch.

  • Let's look at the Urban branch:

    • Entropy(Urban) = 0.9709 bits
    • Information Gain(Urban, House Type) = 0.02 bits
    • Information Gain(Urban, Income) = 0.02 bits
    • Information Gain(Urban, Previous Customer) = 0.971 bits
  • So, here the Previous Customer attribute has the largest information gain, hence it is selected as the child (decision) node for the Urban branch (both of these sub-splits are verified in the short snippet after this list).

  • We have now found Income and Previous Customer as the decision nodes for the Suburban and Urban branches respectively.
  • At this point every branch ends in a single class label value, so there is no further need to split the tree on any attribute.
  • Within the Suburban branch, every record with Income = Low has Outcome = Responded, and every record with Income = High has Outcome = Nothing, so both Income branches are leaf nodes.
  • Within the Urban branch, every record with Previous Customer = No has Outcome = Responded, and every record with Previous Customer = Yes has Outcome = Nothing, so both of these branches are leaf nodes as well.
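
The sub-split figures quoted above can be verified by running the same information-gain calculation on the Suburban and Urban subsets, again assuming the `data`, `entropy`, and `info_gain` definitions from the sketches above:

```python
for district in ("Suburban", "Urban"):
    subset = [r for r in data if r[0] == district]
    print(f"Entropy({district}) = {entropy(subset):.4f} bits")      # 0.9710 for both
    for attr in ("House Type", "Income", "Previous Customer"):
        print(f"  Gain({district}, {attr}) = {info_gain(subset, attr):.3f} bits")
# Suburban: House Type 0.571, Income 0.971, Previous Customer 0.020
# Urban:    House Type 0.020, Income 0.020, Previous Customer 0.971
```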

Based on all these properties, the decision tree for the given data looks as follows:

  • District = Rural → Responded
  • District = Suburban → split on Income
    • Income = High → Nothing
    • Income = Low → Responded
  • District = Urban → split on Previous Customer
    • Previous Customer = No → Responded
    • Previous Customer = Yes → Nothing


Generated Rules from Decision Tree:

A decision tree can easily be converted to a group of rules by mapping each path from the root node to a leaf node, such as:

Rule 1 = IF (District == Rural) THEN Outcome = Responded

Rule 2 = IF (District == Suburban) AND (Income == High) THEN Outcome = Nothing

Rule 3 = IF (District == Suburban) AND (Income == Low) THEN Outcome = Responded

Rule 4 = IF (District == Urban) AND (Previous Customer == No) THEN Outcome = Responded

Rule 5 = IF (District == Urban) AND (Previous Customer == Yes) THEN Outcome = Nothing

These are the Decision Rules generated from the above Decision tree.
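
These rules translate directly into a small rule-based classifier; the sketch below is one possible rendering (the function and argument names are my own, not part of the original answer):

```python
def predict(district, house_type, income, previous_customer):
    """Apply the five decision rules derived from the tree above."""
    # house_type is accepted for completeness but never used:
    # the tree does not split on House Type.
    if district == "Rural":                                   # Rule 1
        return "Responded"
    if district == "Suburban":                                # Rules 2 and 3
        return "Nothing" if income == "High" else "Responded"
    return "Responded" if previous_customer == "No" else "Nothing"   # Rules 4 and 5

print(predict("Rural", "Semi-Detached", "Low", "No"))   # -> Responded
```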


Hence, we have constructed a Decision Tree Classifier for the given sample dataset.

Now find out the class label for the new sample example, where District is Rural, House Type is Semi-Detached, Income is Low, and Previous Customer is No.

Rule 1, generated from the above decision tree classifier, clearly states that when the value of the District attribute is Rural, the Outcome is always Responded.

Therefore, the new sample example (Rural, Semi-Detached, Low, No) will be classified with the class label Responded.
