written 6.8 years ago by | modified 2.6 years ago by |
Subject: Data Mining And Business Intelligence
Topic: Clustering
Difficulty: High
written 6.8 years ago by | modified 2.6 years ago by |
Subject: Data Mining And Business Intelligence
Topic: Clustering
Difficulty: High
written 2.6 years ago by | • modified 2.6 years ago |
Let's solve the given problem using the Euclidean Distance.
$$ Euclidean\ Distance [(x,y),(a,b)] = \sqrt{ (x - a)^2 + (y - b)^2} $$
Step 1 - Plot the given dataset in graph format for better understanding.
Step 2 - Create a Distance Matrix by using the Euclidean Distance formula.
$$ Distance\ (P1,P2) = \sqrt {(0.40 - 0.22)^2 + (0.53 - 0.38)^2}$$
$$ = \sqrt {(0.18)^2 + (0.15)^2} $$
$$ = \sqrt {0.0324 + 0.0225} $$
$$ = \sqrt {0.0549} $$
$$ = 0.23 $$
Similarly, find out the distance for the (P1,P3), (P1,P4), (P1,P5),(P1,P6)
Distance (P1,P3) = 0.22
Distance (P1,P4) = 0.37
Distance (P1,P5) = 0.34
Distance (P1,P6) = 0.23
$$ Distance\ (P2,P3) = \sqrt {(0.22 - 0.35)^2 + (0.38 - 0.32)^2}$$
$$ = \sqrt {(-0.13)^2 + (0.06)^2} $$
$$ = \sqrt {0.0169 + 0.0036} $$
$$ = \sqrt {0.0205} $$
$$ = 0.15 $$
Similarly, find out the distance for the (P2,P4), (P2,P5),(P2,P6)
Distance (P2,P4) = 0.20
Distance (P2,P5) = 0.14
Distance (P2,P6) = 0.25
$$ Distance\ (P3,P4) = \sqrt {(0.35 - 0.26)^2 + (0.32 - 0.19)^2}$$
$$ = \sqrt {(0.09)^2 + (0.13)^2} $$
$$ = \sqrt {0.0081 + 0.0169} $$
$$ = \sqrt {0.025} $$
$$ = 0.15 $$
Similarly, find out the distance for the (P3,P5),(P3,P6)
Distance (P3,P5) = 0.28
Distance (P3,P6) = 0.11
$$ Distance\ (P4,P5) = \sqrt {(0.26 - 0.08)^2 + (0.19 - 0.41)^2}$$
$$ = \sqrt {(0.18)^2 + (-0.22)^2} $$
$$ = \sqrt {0.0324 + 0.0484} $$
$$ = \sqrt {0.0808} $$
$$ = 0.29 $$
Similarly, find out the distance for the (P4,P6)
Distance (P4,P6) = 0.22
$$ Distance\ (P5,P6) = \sqrt {(0.08 - 0.45)^2 + (0.41 - 0.30)^2}$$
$$ = \sqrt {(- 0.37)^2 + (0.11)^2} $$
$$ = \sqrt {0.1369 + 0.0121} $$
$$ = \sqrt {0.149} $$
$$ = 0.39 $$
Based on the above-computed values Distance Matrix can be formed as follows:
. | P1 | P2 | P3 | P4 | P5 | P6 |
---|---|---|---|---|---|---|
P1 | 0 | |||||
P2 | 0.23 | 0 | ||||
P3 | 0.22 | 0.15 | 0 | |||
P4 | 0.37 | 0.20 | 0.15 | 0 | ||
P5 | 0.34 | 0.14 | 0.28 | 0.29 | 0 | |
P6 | 0.23 | 0.25 | 0.11 | 0.22 | 0.39 | 0 |
Step 3 - Find the minimum value element from the distance matrix.
The minimum value element is (P3,P6) with a value is 0.11 - 1st Cluster (P3,P6)
Step 4 - Now, Recalculate or update the distance matrix for the cluster (P3,P6)
$$ Min[dist(point 1, point 2)]$$ $$ Min[dist((P3,P6),P1)] $$ $$ Min[dist((P3,P1),(P6,P1))]$$ $$ Min[dist(0.22,0.23)] $$ $$ = 0.22 $$
Similarly, perform for P2, P4, and P5.
Therefore, the updated Distance Matrix looks as follows:
. | P1 | P2 | P3,P6 | P4 | P5 |
---|---|---|---|---|---|
P1 | 0 | ||||
P2 | 0.23 | 0 | |||
P3,P6 | 0.22 | 0.15 | 0 | ||
P4 | 0.37 | 0.20 | 0.15 | 0 | |
P5 | 0.34 | 0.14 | 0.28 | 0.29 | 0 |
Step 5 - Repeat steps 3 and 4.
The minimum value element is (P2,P5) with a value is 0.14 - 2nd Cluster (P2,P5)
Recalculate or update the distance matrix for the cluster (P2,P5)
$$ Min[dist(point 1, point 2)]$$ $$ Min[dist((P2,P5),P1)] $$ $$ Min[dist((P2,P1),(P5,P1))]$$ $$ Min[dist(0.23,0.34)] $$ $$ = 0.23 $$
Similarly, perform for (P3,P6), and P4.
Therefore, the updated Distance Matrix looks as follows:
. | P1 | P2,P5 | P3,P6 | P4 |
---|---|---|---|---|
P1 | 0 | |||
P2,P5 | 0.23 | 0 | ||
P3,P6 | 0.22 | 0.15 | 0 | |
P4 | 0.37 | 0.20 | 0.15 | 0 |
Step 6 - Repeat steps 3 & 4.
The minimum value element is (P2,P5,P3,P6) with a value is 0.15 - 3rd Cluster (P2,P5,P3,P6)
Here 2 values are the same then the first element is chosen as the minimum value element
Recalculate or update the distance matrix for the cluster (P2,P5,P3,P6)
$$ Min[dist(point 1, point 2)]$$ $$ Min[dist((P2,P5,P3,P6),P1)] $$ $$ Min[dist((P2,P5),P1),((P3,P6),P1))]$$ $$ Min[dist(0.23,0.22)] $$ $$ = 0.22 $$
Similarly, perform for P4.
Therefore, the updated Distance Matrix looks as follows:
. | P1 | P2,P5,P3,P6 | P4 |
---|---|---|---|
P1 | 0 | ||
P2,P5,P3,P6 | 0.22 | 0 | |
P4 | 0.37 | 0.15 | 0 |
Step 7 - Repeat steps 3 & 4.
The minimum value element is (P2,P5,P3,P6,P4) with a value is 0.15 - 4th Cluster (P2,P5,P3,P6,P4)
Recalculate or update the distance matrix for the cluster (P2,P5,P3,P6,P4)
$$ Min[dist(point 1, point 2)]$$ $$ Min[dist((P2,P5,P3,P6,P4),P1)] $$ $$ Min[dist((P2,P5,P3,P6),P1),(P4,P1)]$$ $$ Min[dist(0.22,0.37)] $$ $$ = 0.22 $$
Therefore, the updated Distance Matrix looks as follows:
. | P1 | P2,P5,P3,P6,P4 |
---|---|---|
P1 | 0 | |
P2,P5,P3,P6,P4 | 0.22 | 0 |
Step 8 - Repeat the step 3 & 4.
The minimum value element is (P2,P5,P3,P6,P4,P1) with a value is 0.22 - 5th Cluster (P2,P5,P3,P6,P4,P1)
Recalculate or update the distance matrix for the cluster (P2,P5,P3,P6,P4,P1)
In this step, only 1 value is remaining so it is by default cluster.
Step 9 - Finally, draw the Dendrogram.