0
3.3kviews
Assume that the database D is given by the table below. Follow single link technique to find clusters in D. Use Euclidean distance measure.

Subject: Data Mining And Business Intelligence

Topic: Clustering

Difficulty: High


enter image description here

1 Answer
1
251views

Single Linkage Technique

  • Single linkage technique comes under Hierarchical Agglomerative Clustering (HAC).
  • It follows bottom-up approaches.
  • This technique starts with every single data point as a single cluster.
  • Then similar clusters are successively merged until all clusters have merged into one cluster and the result is represented in Dendrogram.

Let's solve the given problem using the Euclidean Distance.

$$ Euclidean\ Distance [(x,y),(a,b)] = \sqrt{ (x - a)^2 + (y - b)^2} $$


Step 1 - Plot the given dataset in graph format for better understanding.

Scatter Plot Graph


Step 2 - Create a Distance Matrix by using the Euclidean Distance formula.

$$ Distance\ (P1,P2) = \sqrt {(0.40 - 0.22)^2 + (0.53 - 0.38)^2}$$

$$ = \sqrt {(0.18)^2 + (0.15)^2} $$

$$ = \sqrt {0.0324 + 0.0225} $$

$$ = \sqrt {0.0549} $$

$$ = 0.23 $$

Similarly, find out the distance for the (P1,P3), (P1,P4), (P1,P5),(P1,P6)

Distance (P1,P3) = 0.22

Distance (P1,P4) = 0.37

Distance (P1,P5) = 0.34

Distance (P1,P6) = 0.23

$$ Distance\ (P2,P3) = \sqrt {(0.22 - 0.35)^2 + (0.38 - 0.32)^2}$$

$$ = \sqrt {(-0.13)^2 + (0.06)^2} $$

$$ = \sqrt {0.0169 + 0.0036} $$

$$ = \sqrt {0.0205} $$

$$ = 0.15 $$

Similarly, find out the distance for the (P2,P4), (P2,P5),(P2,P6)

Distance (P2,P4) = 0.20

Distance (P2,P5) = 0.14

Distance (P2,P6) = 0.25

$$ Distance\ (P3,P4) = \sqrt {(0.35 - 0.26)^2 + (0.32 - 0.19)^2}$$

$$ = \sqrt {(0.09)^2 + (0.13)^2} $$

$$ = \sqrt {0.0081 + 0.0169} $$

$$ = \sqrt {0.025} $$

$$ = 0.15 $$

Similarly, find out the distance for the (P3,P5),(P3,P6)

Distance (P3,P5) = 0.28

Distance (P3,P6) = 0.11

$$ Distance\ (P4,P5) = \sqrt {(0.26 - 0.08)^2 + (0.19 - 0.41)^2}$$

$$ = \sqrt {(0.18)^2 + (-0.22)^2} $$

$$ = \sqrt {0.0324 + 0.0484} $$

$$ = \sqrt {0.0808} $$

$$ = 0.29 $$

Similarly, find out the distance for the (P4,P6)

Distance (P4,P6) = 0.22

$$ Distance\ (P5,P6) = \sqrt {(0.08 - 0.45)^2 + (0.41 - 0.30)^2}$$

$$ = \sqrt {(- 0.37)^2 + (0.11)^2} $$

$$ = \sqrt {0.1369 + 0.0121} $$

$$ = \sqrt {0.149} $$

$$ = 0.39 $$

Based on the above-computed values Distance Matrix can be formed as follows:

. P1 P2 P3 P4 P5 P6
P1 0
P2 0.23 0
P3 0.22 0.15 0
P4 0.37 0.20 0.15 0
P5 0.34 0.14 0.28 0.29 0
P6 0.23 0.25 0.11 0.22 0.39 0

Step 3 - Find the minimum value element from the distance matrix.

The minimum value element is (P3,P6) with a value is 0.11 - 1st Cluster (P3,P6)


Step 4 - Now, Recalculate or update the distance matrix for the cluster (P3,P6)

$$ Min[dist(point 1, point 2)]$$ $$ Min[dist((P3,P6),P1)] $$ $$ Min[dist((P3,P1),(P6,P1))]$$ $$ Min[dist(0.22,0.23)] $$ $$ = 0.22 $$

Similarly, perform for P2, P4, and P5.

Therefore, the updated Distance Matrix looks as follows:

. P1 P2 P3,P6 P4 P5
P1 0
P2 0.23 0
P3,P6 0.22 0.15 0
P4 0.37 0.20 0.15 0
P5 0.34 0.14 0.28 0.29 0

Step 5 - Repeat steps 3 and 4.

The minimum value element is (P2,P5) with a value is 0.14 - 2nd Cluster (P2,P5)

Recalculate or update the distance matrix for the cluster (P2,P5)

$$ Min[dist(point 1, point 2)]$$ $$ Min[dist((P2,P5),P1)] $$ $$ Min[dist((P2,P1),(P5,P1))]$$ $$ Min[dist(0.23,0.34)] $$ $$ = 0.23 $$

Similarly, perform for (P3,P6), and P4.

Therefore, the updated Distance Matrix looks as follows:

. P1 P2,P5 P3,P6 P4
P1 0
P2,P5 0.23 0
P3,P6 0.22 0.15 0
P4 0.37 0.20 0.15 0

Step 6 - Repeat steps 3 & 4.

The minimum value element is (P2,P5,P3,P6) with a value is 0.15 - 3rd Cluster (P2,P5,P3,P6)

Here 2 values are the same then the first element is chosen as the minimum value element

Recalculate or update the distance matrix for the cluster (P2,P5,P3,P6)

$$ Min[dist(point 1, point 2)]$$ $$ Min[dist((P2,P5,P3,P6),P1)] $$ $$ Min[dist((P2,P5),P1),((P3,P6),P1))]$$ $$ Min[dist(0.23,0.22)] $$ $$ = 0.22 $$

Similarly, perform for P4.

Therefore, the updated Distance Matrix looks as follows:

. P1 P2,P5,P3,P6 P4
P1 0
P2,P5,P3,P6 0.22 0
P4 0.37 0.15 0

Step 7 - Repeat steps 3 & 4.

The minimum value element is (P2,P5,P3,P6,P4) with a value is 0.15 - 4th Cluster (P2,P5,P3,P6,P4)

Recalculate or update the distance matrix for the cluster (P2,P5,P3,P6,P4)

$$ Min[dist(point 1, point 2)]$$ $$ Min[dist((P2,P5,P3,P6,P4),P1)] $$ $$ Min[dist((P2,P5,P3,P6),P1),(P4,P1)]$$ $$ Min[dist(0.22,0.37)] $$ $$ = 0.22 $$

Therefore, the updated Distance Matrix looks as follows:

. P1 P2,P5,P3,P6,P4
P1 0
P2,P5,P3,P6,P4 0.22 0

Step 8 - Repeat the step 3 & 4.

The minimum value element is (P2,P5,P3,P6,P4,P1) with a value is 0.22 - 5th Cluster (P2,P5,P3,P6,P4,P1)

Recalculate or update the distance matrix for the cluster (P2,P5,P3,P6,P4,P1)

In this step, only 1 value is remaining so it is by default cluster.


Step 9 - Finally, draw the Dendrogram.

Dendrogram

Please log in to add an answer.