0
4.0kviews
Use K-means to cluster the following data set into 3 clusters.

Subject: Data Mining And Business Intelligence

Topic: Clustering

Difficulty: High


enter image description here

1 Answer
1
139views

K-means Clustering Problem

The Given Dataset = {(20, 9), (21, 9), (15, 7), (22, 17), (20, 8), (25, 12), (26, 14), (18, 9)}

Number of Clusters = K = 3


Iteration - 1

Step 1 -

  • Randomly select any 3 data points as cluster centers because the number of clusters is 3.
  • Select cluster centers in such a way that they are as farther as possible from each other.
  • So here we choose 3 random initial cluster centers as

$$C1 = (15, 7), C2 = (22, 17), C3 = (25, 12)$$

Step 2 -

  • Calculate the distance between each data point and each cluster center.
  • The distance may be calculated either by using the Distance Function or by using the Euclidean distance formula.
  • Here, we calculate the distance by using the Distance Function between two points a = (x1, y1) and b = (x2, y2) as follows:

$$P(a,b)=|x2–x1|+|y2–y1|$$

Step 3 -

  • Calculate the distance between each data point and each cluster center.
  • A data point is assigned to that cluster whose center is nearest to that data point.
Given Points Distance from center (15, 7) of Cluster - 1 Distance from center (22, 17) of Cluster - 2 Distance from center (25, 12) of Cluster - 3 Point belongs to Cluster
(20, 9) |20 - 15| + |9 - 7| = 7 |22 - 20| + |17 - 9| = 10 |25 - 20| + |12 - 9| = 8 C1
(21, 9) |21 - 15| + |9 - 7| = 8 |22 - 21| + |17 - 9| = 9 |25 - 21| + |12 - 9| = 7 C3
(15, 7) |15 - 15| + |7 - 7| = 0 |22 - 15| + |17 - 7| = 17 |25 - 15| + |12 - 7| = 15 C1
(22, 17) |22 - 15| + |17 - 7| = 17 |22 - 22| + |17 - 17| = 0 |25 - 22| + |17 - 12| = 8 C2
(20, 8) |20 - 15| + |8 - 7| = 6 |22 - 20| + |17 - 8| = 11 |25 - 20| + |12 - 8| = 9 C1
(25, 12) |25 - 15| + |12 - 7| = 15 |25 - 22| + |17 - 12| = 8 |25 - 25| + |12 - 12| = 0 C3
(26, 14) |26 - 15| + |14 - 7| = 18 |26 - 22| + |17 - 14| = 7 |26 - 15| + |14 - 12| = 3 C3
(18, 9) |18 - 15| + |9 - 7| = 5 |22 - 18| + |17 - 9| = 12 |25 - 18| + |12 - 9| = 10 C1
  • Therefore, three clusters K1, K2, and K3 can be formed as follows:

$$ K1 = \{(20, 9), (15, 7), (20, 8), (18, 9)\} $$ $$ K2 = \{(22, 17)\} $$ $$ K3 = \{(21,9), (25, 12), (26, 14)\} $$

Step 4 -

  • Re-compute the center of newly formed clusters.
  • The center of a cluster is computed by taking the mean of all the data points contained in that cluster.

1] Center for Cluster-1 (K1) -

$$ X = \frac {(20 + 15 + 20 + 18)}{4} = 18.25 $$

$$ Y = \frac {(9 + 7 + 8 + 9)}{4} = 8.25 $$

Therefore, new Clsuter Center C1 = (18.25, 8.25)

2] Center for Cluster-2 (K2) -

Cluster-2 (K2) has only 1 data point.

Therefore, new Clsuter Center C2 = (22, 17)

3] Center for Cluster-3 (K3) -

$$ X = \frac {(21 + 25 + 26)}{3} = 24 $$

$$ Y = \frac {(9 + 12 + 14)}{3} = 11.67 $$

Therefore, new Clsuter Center C3 = (24, 11.67)

This is the completion of Iteration 1.


Iteration - 2

Again Repeat steps 3 and 4 same as performed in Iteration - 1.

Therefore,

Given Points Distance from center (18.25, 8.25) of Cluster - 1 Distance from center (22, 17) of Cluster - 2 Distance from center (24, 11.67) of Cluster - 3 Point belongs to Cluster
(20, 9) |20 - 18.25| + |9 - 8.25| = 2.5 |22 - 20| + |17 - 9| = 10 |24 - 20| + |11.67 - 9| = 6.67 C1
(21, 9) |21 - 18.25| + |9 - 8.25| = 3.5 |22 - 21| + |17 - 9| = 9 |24 - 21| + |11.67 - 9| = 5.67 C1
(15, 7) |18.25 - 15| + |8.25 - 7| = 4.5 |22 - 15| + |17 - 7| = 17 |24 - 15| + |11.67 - 7| = 13.67 C1
(22, 17) |22 - 18.25| + |17 - 8.25| = 12.5 |22 - 22| + |17 - 17| = 0 |24 - 22| + |17 - 11.67| = 7.33 C2
(20, 8) |20 - 18.25| + |8.25 - 8| = 2 |22 - 20| + |17 - 8| = 11 |24 - 20| + |11.67 - 8| = 7.67 C1
(25, 12) |25 - 18.25| + |12 - 8.25| = 10.5 |25 - 22| + |17 - 12| = 8 |25 - 24| + |12 - 11.67| = 1.33 C3
(26, 14) |26 - 18.25| + |14 - 8.25| = 13.5 |26 - 22| + |17 - 14| = 7 |26 - 24| + |14 - 11.67| = 4.44 C3
(18, 9) |18.25 - 18| + |9 - 8.25| = 1 |22 - 18| + |17 - 9| = 12 |24 - 18| + |11.67 - 9| = 8.67 C1
  • Therefore, three clusters K1, K2, and K3 can be formed as follows:

$$ K1 = \{(20, 9), (15, 7), (20, 8), (18, 9), (21,9)\} $$ $$ K2 = \{(22, 17)\} $$ $$ K3 = \{(25, 12), (26, 14)\} $$

  • Re-compute the center of newly formed clusters.
  • The center of a cluster is computed by taking the mean of all the data points contained in that cluster.

1] Center for Cluster-1 (K1) -

$$ X = \frac {(20 + 15 + 20 + 18 + 21)}{5} = 18.8 $$

$$ Y = \frac {(9 + 7 + 8 + 9 + 9)}{5} = 8.4 $$

Therefore, new Clsuter Center C1 = (18.8, 8.4)

2] Center for Cluster-2 (K2) -

Cluster-2 (K2) has only 1 data point.

Therefore, new Clsuter Center C2 = (22, 17)

3] Center for Cluster-3 (K3) -

$$ X = \frac {(25 + 26)}{2} = 25.5 $$

$$ Y = \frac {(12 + 14)}{2} = 13 $$

Therefore, new Clsuter Center C3 = (25.5, 13)

This is the completion of Iteration 2.


Iteration - 3

Again Repeat steps 3 and 4 same as performed in Iteration - 1.

Therefore,

Given Points Distance from center (18.8, 8.4) of Cluster - 1 Distance from center (22, 17) of Cluster - 2 Distance from center (25.5, 13) of Cluster - 3 Point belongs to Cluster
(20, 9) |20 - 18.8| + |9 - 8.4| = 1.8 |22 - 20| + |17 - 9| = 10 |25.5 - 20| + |13 - 9| = 9.5 C1
(21, 9) |21 - 18.8| + |9 - 8.4| = 2.8 |22 - 21| + |17 - 9| = 9 |25.5 - 21| + |13 - 9| = 8.5 C1
(15, 7) |18.8 - 15| + |8.4 - 7| = 5.2 |22 - 15| + |17 - 7| = 17 |25.5 - 15| + |13 - 7| = 16.5 C1
(22, 17) |22 - 18.8| + |17 - 8.4| = 11.8 |22 - 22| + |17 - 17| = 0 |25.5 - 22| + |17 - 13| = 7.5 C2
(20, 8) |20 - 18.8| + |8.4 - 8| = 1.6 |22 - 20| + |17 - 8| = 11 |25.5 - 20| + |13 - 8| = 10.5 C1
(25, 12) |25 - 18.8| + |12 - 8.4| = 9.8 |25 - 22| + |17 - 12| = 8 |25.5 - 25| + |13 - 12| = 1.5 C3
(26, 14) |26 - 18.8| + |14 - 8.4| = 12.8 |26 - 22| + |17 - 14| = 7 |26 - 25.5| + |14 - 13| = 1.5 C3
(18, 9) |18.8 - 18| + |9 - 8.4| = 1.4 |22 - 18| + |17 - 9| = 12 |25.5 - 18| + |13 - 9| = 11.5 C1
  • Therefore, three clusters K1, K2, and K3 can be formed as follows:

$$ K1 = \{(20, 9), (15, 7), (20, 8), (18, 9), (21,9)\} $$ $$ K2 = \{(22, 17)\} $$ $$ K3 = \{(25, 12), (26, 14)\} $$

  • Re-compute the center of newly formed clusters.
  • The center of a cluster is computed by taking the mean of all the data points contained in that cluster.

1] Center for Cluster-1 (K1) -

$$ X = \frac {(20 + 15 + 20 + 18 + 21)}{5} = 18.8 $$

$$ Y = \frac {(9 + 7 + 8 + 9 + 9)}{5} = 8.4 $$

Therefore, new Clsuter Center C1 = (18.8, 8.4)

2] Center for Cluster-2 (K2) -

Cluster-2 (K2) has only 1 data point.

Therefore, new Clsuter Center C2 = (22, 17)

3] Center for Cluster-3 (K3) -

$$ X = \frac {(25 + 26)}{2} = 25.5 $$

$$ Y = \frac {(12 + 14)}{2} = 13 $$

Therefore, new Clsuter Center C3 = (25.5, 13)

This is the completion of Iteration 3.


  • Here we stopped after the 3 - Iterations because

    • The Center of newly formed clusters does not change
    • Data points remain present in the same clusters.
  • After 3 - Iterations we get the 3 - Clusters with their Center Points are as follows:

$$ K1 = \{(20, 9), (15, 7), (20, 8), (18, 9), (21,9)\}\ with\ Center\ C1 = (18.8, 8.4) $$ $$ K2 = \{(22, 17)\}\ with\ Center\ C2 = (22, 17) $$ $$ K3 = \{(25, 12), (26, 14)\} \ with\ Center\ C3 = (25.5, 13) $$


Important Point -

  • Here, we see Cluster K2 holds only one data point.
  • Such, 1 data point clusters occur quite frequently in k-means because of dirty data.
  • That indicates K-means is highly sensitive to noise.
  • K-means minimizes squared errors by assigning outlier points to their own cluster and gives "optimal" results for the squared errors objective.
  • So this is often the correct result.
  • This makes k-means sensitive to the noises.
Please log in to add an answer.