Perform K-means clustering on the given data where K=2. {(1,2), (2,3), (3,4), (4,5), (5,6)}


written 2.8 years ago by

binitamayekar ★ 6.6k

• modified 2.8 years ago

K-means Clustering Problem

The Given Dataset = {(1,2), (2,3), (3,4), (4,5), (5,6)}

Number of Clusters = K = 2

Iteration - 1

Step 1 -

Select any 2 data points as the initial cluster centers, because the number of clusters is 2.
It helps to choose centers that are as far from each other as possible.

Here we choose C1 = (1, 2) and C2 = (4, 5) as the 2 initial cluster centers.

Step 2 -

Calculate the distance between each data point and each cluster center.
The distance can be calculated with either the Manhattan (city-block) distance function or the Euclidean distance formula.

Here, we use the Manhattan distance between two points a = (x1, y1) and b = (x2, y2), defined as follows:

$$\rho(a, b) = |x_2 - x_1| + |y_2 - y_1|$$

Now, calculate the distance of each point from each of the centers of the 2 clusters. The distance is calculated by using the above-given distance function formula.

The following explanation shows the calculation of distance between the first data point of the given dataset (1, 2) with the centers of the cluster:

1] Calculating Distance Between a = (1, 2) and C1 = (1, 2)

ρ(a, C1) = |x2 - x1| + |y2 - y1| = |1 - 1| + |2 - 2| = 0 + 0 = 0

2] Calculating Distance Between a = (1, 2) and C2 = (4, 5)

ρ(a, C2) = |x2 - x1| + |y2 - y1| = |4 - 1| + |5 - 2| = 3 + 3 = 6

Similarly, calculate the distance from every other data point to both cluster centers.
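The two hand calculations above can be reproduced with a short Python sketch. The function names `manhattan` and `euclidean` are illustrative choices, not part of the original answer; `euclidean` is included only to show the alternative formula mentioned in Step 2.

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences (the distance function used in this answer)."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

def euclidean(a, b):
    """Straight-line distance, the alternative mentioned in Step 2."""
    return math.hypot(b[0] - a[0], b[1] - a[1])

a = (1, 2)
c1, c2 = (1, 2), (4, 5)

d1 = manhattan(a, c1)
d2 = manhattan(a, c2)
print(d1, d2)  # 0 6

e2 = euclidean(a, c2)  # sqrt(3^2 + 3^2) = sqrt(18)
```

All five points and every center used in this example lie on the line y = x + 1, so the Euclidean distance is always the Manhattan distance divided by √2. Either formula therefore gives the same cluster assignments for this dataset.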

Step 3 -

The table below shows the distance calculations for all data points.
From these distances, we decide which cluster each data point belongs to.
A data point belongs to the cluster whose center is nearest to it.

| Given Point | Distance from the center (1, 2) of Cluster - 1 | Distance from the center (4, 5) of Cluster - 2 | Point belongs to Cluster |
|---|---|---|---|
| (1,2) | \|1 - 1\| + \|2 - 2\| = 0 + 0 = 0 | \|4 - 1\| + \|5 - 2\| = 3 + 3 = 6 | C1 |
| (2,3) | \|2 - 1\| + \|3 - 2\| = 1 + 1 = 2 | \|4 - 2\| + \|5 - 3\| = 2 + 2 = 4 | C1 |
| (3,4) | \|3 - 1\| + \|4 - 2\| = 2 + 2 = 4 | \|4 - 3\| + \|5 - 4\| = 1 + 1 = 2 | C2 |
| (4,5) | \|4 - 1\| + \|5 - 2\| = 3 + 3 = 6 | \|4 - 4\| + \|5 - 5\| = 0 + 0 = 0 | C2 |
| (5,6) | \|5 - 1\| + \|6 - 2\| = 4 + 4 = 8 | \|5 - 4\| + \|6 - 5\| = 1 + 1 = 2 | C2 |
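The assignment table can be regenerated with a sketch like the following. The helper name `assign` is hypothetical, and sending ties to the lower-numbered cluster is an assumption; no ties occur in this dataset.

```python
points = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
centers = [(1, 2), (4, 5)]  # C1 and C2 chosen in Step 1

def manhattan(a, b):
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

def assign(points, centers):
    """For each point, return the index of the nearest center (ties go to the lower index)."""
    labels = []
    for p in points:
        dists = [manhattan(p, c) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels

labels = assign(points, centers)
for p, label in zip(points, labels):
    dists = [manhattan(p, c) for c in centers]
    print(p, dists, f"C{label + 1}")
# (1, 2) [0, 6] C1
# (2, 3) [2, 4] C1
# (3, 4) [4, 2] C2
# (4, 5) [6, 0] C2
# (5, 6) [8, 2] C2
```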

Step 4 -

From the above table, we can form the 2 clusters as follows:

Cluster - 1:

The first cluster contains the following 2 data points - {(1,2), (2,3)}

Cluster - 2:

The second cluster contains the following 3 data points - {(3,4), (4,5), (5,6)}

Step 5 -

Now re-compute the centers of the 2 clusters.
The new center of a cluster is the mean of all the data points contained in that cluster.

For Center of Cluster - 1

X = (1 + 2) / 2 = 3 / 2 = 1.5

Y = (2 + 3) / 2 = 5 / 2 = 2.5

Therefore, C1 = (1.5, 2.5)

For Center of Cluster - 2

X = (3 + 4 + 5) / 3 = 12 / 3 = 4

Y = (4 + 5 + 6) / 3 = 15 / 3 = 5

Therefore, C2 = (4, 5)
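The update step above amounts to a coordinate-wise mean, which can be sketched as follows. The function name `centroid` is an illustrative choice, and the sketch assumes every cluster contains at least one point (an empty cluster would cause a division by zero).

```python
def centroid(cluster):
    """Mean of the x-coordinates and mean of the y-coordinates of a cluster."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)

cluster1 = [(1, 2), (2, 3)]
cluster2 = [(3, 4), (4, 5), (5, 6)]

new_c1 = centroid(cluster1)
new_c2 = centroid(cluster2)
print(new_c1, new_c2)  # (1.5, 2.5) (4.0, 5.0)
```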

This is the completion of Iteration 1.

Iteration - 2

Repeat steps 2 to 5 exactly as in Iteration - 1.

Calculate the distance from every data point to both of the new cluster centers.
The distance is calculated with the same distance function as before.
After the calculation, decide which cluster each data point belongs to.
A data point belongs to the cluster whose center is nearest to it.
Re-compute the centers of the 2 clusters.
The new center of a cluster is the mean of all the data points contained in that cluster.

| Given Point | Distance from the center (1.5, 2.5) of Cluster - 1 | Distance from the center (4, 5) of Cluster - 2 | Point belongs to Cluster |
|---|---|---|---|
| (1,2) | \|1.5 - 1\| + \|2.5 - 2\| = 0.5 + 0.5 = 1 | \|4 - 1\| + \|5 - 2\| = 3 + 3 = 6 | C1 |
| (2,3) | \|2 - 1.5\| + \|3 - 2.5\| = 0.5 + 0.5 = 1 | \|4 - 2\| + \|5 - 3\| = 2 + 2 = 4 | C1 |
| (3,4) | \|3 - 1.5\| + \|4 - 2.5\| = 1.5 + 1.5 = 3 | \|4 - 3\| + \|5 - 4\| = 1 + 1 = 2 | C2 |
| (4,5) | \|4 - 1.5\| + \|5 - 2.5\| = 2.5 + 2.5 = 5 | \|4 - 4\| + \|5 - 5\| = 0 + 0 = 0 | C2 |
| (5,6) | \|5 - 1.5\| + \|6 - 2.5\| = 3.5 + 3.5 = 7 | \|5 - 4\| + \|6 - 5\| = 1 + 1 = 2 | C2 |

From the above table, we again get the same 2 clusters as follows:

Cluster - 1:

The first cluster contains the following 2 data points - {(1,2), (2,3)}

Cluster - 2:

The second cluster contains the following 3 data points - {(3,4), (4,5), (5,6)}

Now re-compute the centers of the 2 clusters again.
The new center of a cluster is the mean of all the data points contained in that cluster.

For Center of Cluster - 1

X = (1 + 2) / 2 = 3 / 2 = 1.5

Y = (2 + 3) / 2 = 5 / 2 = 2.5

Therefore, C1 = (1.5, 2.5)

For Center of Cluster - 2

X = (3 + 4 + 5) / 3 = 12 / 3 = 4

Y = (4 + 5 + 6) / 3 = 15 / 3 = 5

Therefore, C2 = (4, 5)

This is the completion of Iteration 2.

The iterations stop when any of the following conditions is fulfilled:

The centers of the newly formed clusters do not change
The data points remain in the same clusters
The maximum number of iterations is reached
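The three stopping conditions can be put together in one loop, sketched below. The function name `kmeans`, the `max_iter` default of 10, and the exact-equality check on centers are assumptions made for illustration. A practical implementation would usually compare centers within a tolerance and handle empty clusters.

```python
def manhattan(a, b):
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

def centroid(cluster):
    # Assumes the cluster is non-empty
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)

def kmeans(points, centers, max_iter=10):
    """Repeat assignment and update until one of the stopping conditions holds."""
    centers = list(centers)
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: index of the nearest center for every point
        new_labels = [
            min(range(len(centers)), key=lambda k: manhattan(p, centers[k]))
            for p in points
        ]
        # Update step: mean of the points assigned to each cluster
        new_centers = [
            centroid([p for p, l in zip(points, new_labels) if l == k])
            for k in range(len(centers))
        ]
        # Conditions 1 and 2: centers unchanged, or memberships unchanged
        if new_centers == centers or new_labels == labels:
            return new_labels, new_centers, iteration
        labels, centers = new_labels, new_centers
    # Condition 3: iteration limit reached
    return labels, centers, max_iter

points = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
labels, final_centers, iterations = kmeans(points, [(1, 2), (4, 5)])
print(labels, final_centers, iterations)
# [0, 0, 1, 1, 1] [(1.5, 2.5), (4.0, 5.0)] 2
```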

Here we stop after 2 iterations because the centers of the newly formed clusters do not change and the data points remain in the same clusters.

After 2 iterations we get the following 2 clusters and their center points:

k1 = {(1,2), (2,3)} and C1 = (1.5, 2.5)

k2 = {(3,4), (4,5), (5,6)} and C2 = (4, 5)
