K-means Clustering Problem
The Given Dataset = (2, 5), (2.5, 3), (3.5, 4), (5, 7.9), (8, 11.3), (11, 12), (12, 19), (13, 5), (7, 3), (11, 7), (15,15.1), (1, 2), (2, 20), (10, 10), (13, 1.1), (7, 9), (30, 42), (18, 21), (55, 39), (32, 68), (30,30), (50, 50.1)
Number of Clusters = K = 4
Iteration - 1
Step 1 -
- Randomly select any 4 data points as cluster centers, because the number of clusters is 4.
- Where possible, select cluster centers that are as far as possible from each other.
So here we choose 4 random initial cluster centers: C1 = (2, 5), C2 = (11, 12), C3 = (18, 21), and C4 = (30, 30).
Step 2 -
- Calculate the distance between each data point and each cluster center.
- The distance may be calculated either with the Distance Function given below (the sum of absolute coordinate differences, also known as the Manhattan or city-block distance) or with the Euclidean distance formula.
Here, we calculate the distance between two points a = (x1, y1) and b = (x2, y2) by using the Distance Function:
$$P(a, b) = |x_2 - x_1| + |y_2 - y_1|$$
Now calculate the distance of each point from each of the 4 cluster centers, using the distance function above.
The following calculations show the distance between the first data point of the dataset, (2, 5), and each of the 4 cluster centers:
1] Calculating Distance Between a = (2, 5) and C1 = (2, 5)
P(a, C1) = |x2 - x1| + |y2 - y1| = |2 - 2| + |5 - 5| = 0 + 0 = 0
2] Calculating Distance Between a = (2, 5) and C2 = (11, 12)
P(a, C2) = |x2 - x1| + |y2 - y1| = |11 - 2| + |12 - 5| = 9 + 7 = 16
3] Calculating Distance Between a = (2, 5) and C3 = (18, 21)
P(a, C3) = |x2 - x1| + |y2 - y1| = |18 - 2| + |21 - 5| = 16 + 16 = 32
4] Calculating Distance Between a = (2, 5) and C4 = (30, 30)
P(a, C4) = |x2 - x1| + |y2 - y1| = |30 - 2| + |30 - 5| = 28 + 25 = 53
Similarly, calculate the distance between every other data point and each of the 4 cluster centers.
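The four hand calculations above can be checked with a few lines of Python. This is only a minimal sketch: the function name `manhattan` is chosen here for illustration and does not come from the original text.

```python
def manhattan(a, b):
    """Distance Function from Step 2: |x2 - x1| + |y2 - y1| for 2-D points."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

# Initial cluster centers chosen in Step 1.
centers = [(2, 5), (11, 12), (18, 21), (30, 30)]

# Distances from the first data point to each of the 4 centers.
point = (2, 5)
distances = [manhattan(point, c) for c in centers]
print(distances)  # [0, 16, 32, 53]
```

The same function can be applied to any other data point to fill in the remaining distances.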
Step 3 -
- The table below shows all the calculated distances.
- After the calculation, we also decide which cluster each data point belongs to.
- A data point belongs to the cluster whose center is nearest to it.
| Given Points | Distance from center (2, 5) of Cluster - 1 | Distance from center (11, 12) of Cluster - 2 | Distance from center (18, 21) of Cluster - 3 | Distance from center (30, 30) of Cluster - 4 | Point belongs to Cluster |
|---|---|---|---|---|---|
| (2, 5) | 0 | 16 | 32 | 53 | C1 |
| (2.5, 3) | 2.5 | 17.5 | 33.5 | 54.5 | C1 |
| (3.5, 4) | 2.5 | 15.5 | 31.5 | 52.5 | C1 |
| (5, 7.9) | 5.9 | 10.1 | 26.1 | 47.1 | C1 |
| (8, 11.3) | 12.3 | 3.7 | 19.7 | 40.7 | C2 |
| (11, 12) | 16 | 0 | 16 | 37 | C2 |
| (12, 19) | 24 | 8 | 8 | 29 | C3 |
| (13, 5) | 11 | 9 | 21 | 42 | C2 |
| (7, 3) | 7 | 13 | 29 | 50 | C1 |
| (11, 7) | 11 | 5 | 21 | 42 | C2 |
| (15, 15.1) | 23.1 | 7.1 | 8.9 | 29.9 | C2 |
| (1, 2) | 4 | 20 | 36 | 57 | C1 |
| (2, 20) | 15 | 17 | 17 | 38 | C1 |
| (10, 10) | 13 | 3 | 19 | 40 | C2 |
| (13, 1.1) | 14.9 | 12.9 | 24.9 | 45.9 | C2 |
| (7, 9) | 9 | 7 | 23 | 44 | C2 |
| (30, 42) | 65 | 49 | 33 | 12 | C4 |
| (18, 21) | 32 | 16 | 0 | 21 | C3 |
| (55, 39) | 87 | 71 | 55 | 34 | C4 |
| (32, 68) | 93 | 77 | 61 | 40 | C4 |
| (30, 30) | 53 | 37 | 21 | 0 | C4 |
| (50, 50.1) | 93.1 | 77.1 | 61.1 | 40.1 | C4 |
Step 4 -
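The point (12, 19) is equally distant (8) from the centers of Cluster 2 and Cluster 3; in this example it is assigned to Cluster 3.
From the above table, we can form the 4 clusters as follows: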
Cluster - 1:
The First cluster contains the following 7 data points - (2, 5), (2.5, 3), (3.5, 4), (5, 7.9), (7, 3), (1, 2), (2, 20)
Cluster - 2:
The Second cluster contains the following 8 data points - (8, 11.3), (11, 12), (13, 5), (11, 7), (15, 15.1), (10, 10), (13, 1.1), (7, 9)
Cluster - 3:
The Third cluster contains the following 2 data points - (12, 19), (18, 21)
Cluster - 4:
The Fourth cluster contains the following 5 data points - (30, 42), (55, 39), (32, 68), (30, 30), (50, 50.1)
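The distance table from Step 3 and the grouping from Step 4 can be reproduced programmatically. The sketch below is illustrative only: the helper names `manhattan` and `nearest` are made up here, and the tie-breaking rule (a tie goes to the higher-numbered cluster) is an assumption chosen so that (12, 19), which is equally distant from the centers of Cluster 2 and Cluster 3, lands in Cluster 3 as in the walkthrough. The original text does not state a tie-breaking rule.

```python
points = [
    (2, 5), (2.5, 3), (3.5, 4), (5, 7.9), (8, 11.3), (11, 12),
    (12, 19), (13, 5), (7, 3), (11, 7), (15, 15.1), (1, 2),
    (2, 20), (10, 10), (13, 1.1), (7, 9), (30, 42), (18, 21),
    (55, 39), (32, 68), (30, 30), (50, 50.1),
]
centers = [(2, 5), (11, 12), (18, 21), (30, 30)]

def manhattan(a, b):
    """Distance Function from Step 2: |x2 - x1| + |y2 - y1|."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

def nearest(point, centers):
    """Return the 0-based index of the nearest center.

    Assumption (not from the original text): on a tie, prefer the
    higher-numbered cluster, which matches the table for (12, 19).
    """
    dists = [manhattan(point, c) for c in centers]
    return min(range(len(centers)), key=lambda j: (dists[j], -j))

labels = [nearest(p, centers) for p in points]

# Print one table row per point: the four distances and the chosen cluster.
for p, j in zip(points, labels):
    row = [round(manhattan(p, c), 1) for c in centers]
    print(p, row, f"C{j + 1}")

# Group the points by cluster, as in Step 4.
clusters = {j: [p for p, lab in zip(points, labels) if lab == j]
            for j in range(len(centers))}
```

With Python's default `min`, a tie would instead go to the first (lower-numbered) cluster, so (12, 19) would be placed in Cluster 2; the explicit key makes the choice visible.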
Step 5 -
Now,
- Re-compute the new centers of 4 clusters.
- The new cluster center is computed by taking the mean of all the data points contained in that cluster.
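Written as a formula, the update looks as follows. The notation is introduced here only for illustration: $S_j$ denotes the set of data points currently assigned to cluster $j$, and $n_j$ is the number of points in that set.

$$C_j = \left( \frac{1}{n_j} \sum_{(x, y) \in S_j} x,\ \frac{1}{n_j} \sum_{(x, y) \in S_j} y \right)$$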
For Center of Cluster - 1
X = (2 + 2.5 + 3.5 + 5 + 7 + 1 + 2) / 7 = 23 / 7 = 3.286
Y = (5 + 3 + 4 + 7.9 + 3 + 2 + 20) / 7 = 44.9 / 7 = 6.414
Therefore, C1 = (3.286, 6.414)
For Center of Cluster - 2
X = (8 + 11 + 13 + 11 + 15 + 10 + 13 + 7) / 8 = 11
Y = (11.3 + 12 + 5 + 7 + 15.1 + 10 + 1.1 + 9) / 8 = 8.8125
Therefore, C2 = (11, 8.8125)
For Center of Cluster - 3
X = (12 + 18) / 2 = 15
Y = (19 + 21) / 2 = 20
Therefore, C3 = (15, 20)
For Center of Cluster - 4
X = (30 + 55 + 32 + 30 + 50) / 5 = 39.4
Y = (42 + 39 + 68 + 30 + 50.1) / 5 = 45.82
Therefore, C4 = (39.4, 45.82)
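The four mean computations above can be double-checked with a short script. This is a minimal sketch; the helper name `mean_center` is hypothetical, and the cluster memberships are copied from Step 4.

```python
# Cluster memberships after the first assignment (from Step 4).
clusters = {
    "C1": [(2, 5), (2.5, 3), (3.5, 4), (5, 7.9), (7, 3), (1, 2), (2, 20)],
    "C2": [(8, 11.3), (11, 12), (13, 5), (11, 7), (15, 15.1),
           (10, 10), (13, 1.1), (7, 9)],
    "C3": [(12, 19), (18, 21)],
    "C4": [(30, 42), (55, 39), (32, 68), (30, 30), (50, 50.1)],
}

def mean_center(members):
    """Return the coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(members)
    return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)

new_centers = {name: mean_center(members) for name, members in clusters.items()}

# Print each new center rounded to 4 decimal places.
for name, (cx, cy) in new_centers.items():
    print(name, round(cx, 4), round(cy, 4))
```

Up to floating-point rounding, the printed values agree with the hand-computed centers for the 4 clusters.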
This completes Iteration 1.
Iteration - 2
Again, repeat Steps 2 to 5 as performed in Iteration - 1.
- Calculate the distance between each data point and each of the new centers of the 4 clusters.
- The distance is calculated by using the Distance Function.
- After the calculation, also decide which cluster each data point belongs to.
- A data point belongs to the cluster whose center is nearest to it.
- Re-compute the new centers of 4 clusters.
- The new cluster center is computed by taking the mean of all the data points contained in that cluster.
All these steps are shown in the figure below:
Iteration stops when any of the following conditions is fulfilled:
- The centers of the newly formed clusters do not change
- Data points remain in the same clusters
- The maximum number of iterations is reached
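The three stopping conditions above can be expressed as one small check. This is a sketch under stated assumptions: the function name `should_stop`, its parameters, and the tolerance `tol` used to compare floating-point centers are all introduced here for illustration and are not part of the original text.

```python
def should_stop(old_centers, new_centers, old_labels, new_labels,
                iteration, max_iterations, tol=1e-9):
    """Return True if any of the three stopping conditions holds."""
    # Condition 1: no center moved (within a small tolerance).
    centers_unchanged = all(
        abs(ox - nx) <= tol and abs(oy - ny) <= tol
        for (ox, oy), (nx, ny) in zip(old_centers, new_centers)
    )
    # Condition 2: every point stayed in the same cluster.
    labels_unchanged = old_labels == new_labels
    # Condition 3: the iteration budget is used up.
    budget_exhausted = iteration >= max_iterations
    return centers_unchanged or labels_unchanged or budget_exhausted
```

For example, `should_stop(c_old, c_new, l_old, l_new, iteration=2, max_iterations=2)` returns `True` regardless of the other arguments, because the iteration budget is exhausted.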
Here we stop after 2 iterations because the maximum number of iterations has been reached.
After 2 iterations, we get the following 4 clusters with their center points:
k1 = { (2, 5), (2.5, 3), (3.5, 4), (5, 7.9), (7, 3), (1, 2) } and C1 = (3.5, 4.15)
k2 = { (8, 11.3), (11, 12), (13, 5), (11, 7), (10, 10), (13, 1.1), (7, 9) } and C2 = (10.43, 7.914)
k3 = { (12, 19), (15, 15.1), (2, 20), (18, 21), (30, 30) } and C3 = (15.4, 21.02)
k4 = { (30, 42), (55, 39), (32, 68), (50, 50.1) } and C4 = (41.75, 49.775)