Explain K Means clustering algorithm? Apply K Means algorithms for the following data set with two clusters. Data set = {1, 2, 6, 7, 8, 10, 15, 17, 20}

499views

written 2.6 years ago by

binitamayekar ★ 6.6k

K Means Clustering

K-Means clustering is an unsupervised iterative clustering technique.
It partitions the given data set into k predefined distinct clusters.
A cluster is defined as a collection of data points exhibiting certain similarities.

K Means Clustering

It partitions the data set in such a way that -
- Each data point belongs to a cluster with the nearest mean.
- Data points belonging to one cluster have a high degree of similarity.
- Data points belonging to different clusters have a high degree of dissimilarity.

Algorithm for K-Means Clustering -

Step 1 -

Choose the number of clusters K.

Step 2 -

Randomly select any K data points as cluster centers.
Select cluster centers in such a way that they are as farther as possible from each other.

Step 3 -

Calculate the distance between each data point and each cluster center.
The distance may be calculated either by using the given Distance Function or by using Euclidean Distance.

Step 4 -

Assign each data point to some cluster.
A data point is assigned to that cluster whose center is nearest to that data point.

Step 5 -

Re-compute the center of newly formed clusters.
The center of a cluster is computed by taking the mean of all the data points contained in that cluster.

Step 6 -

Keep repeating the procedure from Step 3 to Step 5 until any of the following stopping criteria is met:
- Center of newly formed clusters does not change.
- Data points remain present in the same cluster.
- A maximum number of iterations are reached.

The given Data set = {1, 2, 6, 7, 8, 10, 15, 17, 20}

K = 2

Iteration - 1

Step 1 -

Randomly select cluster centers in such a way that they are as farther as possible from each other.
Therefore,

$$m1 = 6, m2 = 17$$

Step 2 -

Calculate the distance between each data point and each cluster center.
A data point is assigned to that cluster whose center is nearest to that data point.

Data Points	Distance from center m1 = 6 for Cluster 1	Distance from center m2 = 17 for Cluster 2	Data Points belongs to cluster
1	\|6 - 1\| = 5	\|17 - 1\| = 16	K1
2	\|6 - 2\| = 4	\|17 - 2\| = 15	K1
6	\|6 - 6\| = 0	\|17 - 6\| = 11	K1
7	\|7 - 6\| = 1	\|17 - 7\| = 10	K1
8	\|8 - 6\| = 2	\|17 - 8\| = 9	K1
10	\|10 - 6\| = 4	\|17 - 10\| = 7	K1
15	\|15 - 6\| = 9	\|17 - 15\| = 2	K2
17	\|17 - 6\| = 11	\|17 - 17\| = 0	K2
20	\|20 - 6\| = 14	\|20 - 17\| = 3	K2

Therefore, two clusters K1 and K2 can be formed as follows:

$$ K1 = \{1, 2, 6, 7, 8, 10\}$$

$$K2 = \{15, 17, 20\}$$

Step 3 -

Re-compute the center of newly formed clusters.
The center of a cluster is computed by taking the mean of all the data points contained in that cluster.
Therefore,

$$m1 = \frac {(1 + 2 + 6 + 7 + 8 + 10)}{6} = \frac {34}{6} = 5.67$$

$$m2 = \frac{(15 + 17 + 20)}{3} = \frac{52}{3} = 17.34$$

Iteration - 2

Step 4 -

Repeat the procedures of Step 2 and Step 3.
Therefore,

Data Points	Distance from center m1 = 5.67 for Cluster 1	Distance from center m2 = 17.34 for Cluster 2	Data Points belongs to cluster
1	\|5.67 - 1\| = 4.67	\|17.34 - 1\| = 16.34	K1
2	\|5.67 - 2\| = 3.67	\|17.34 - 2\| = 15.34	K1
6	\|6 - 5.67\| = 0.33	\|17.34 - 6\| = 11.34	K1
7	\|7 - 5.67\| = 1.33	\|17.34 - 7\| = 10.34	K1
8	\|8 - 5.67\| = 2.33	\|17.34 - 8\| = 9.34	K1
10	\|10 - 5.67\| = 4.33	\|17.34 - 10\| = 7.34	K1
15	\|15 - 5.67\| = 9.33	\|17.34 - 15\| = 2.34	K2
17	\|17 - 5.67\| = 11.33	\|17.34 - 17\| = 0.34	K2
20	\|20 - 5.67\| = 14.33	\|20 - 17.34\| = 2.66	K2

Therefore, we again get the two similar clusters K1 and K2 as follows:

$$ K1 = \{1, 2, 6, 7, 8, 10\}$$

$$K2 = \{15, 17, 20\}$$

Center of these newly formed clusters is also the same as the previous one.

$$m1 = \frac {(1 + 2 + 6 + 7 + 8 + 10)}{6} = \frac {34}{6} = 5.67$$

$$m2 = \frac{(15 + 17 + 20)}{3} = \frac{52}{3} = 17.34$$

Here we stopped after the 2 - Iterations because
- The Center of newly formed clusters does not change
- Data points remain present in the same clusters.
After 2 - Iterations we get the 2 - Clusters with their Center Points are as follows:

$$ K1 = \{1, 2, 6, 7, 8, 10\} with\ center\ m1 = 5.67$$

$$K2 = \{15, 17, 20\} with\ center\ m2 = 17.34$$

ADD COMMENT EDIT

Data Points	Distance from center m1 = 6 for Cluster 1	Distance from center m2 = 17 for Cluster 2	Data Points belongs to cluster
1	\|6 - 1\| = 5	\|17 - 1\| = 16	K1
2	\|6 - 2\| = 4	\|17 - 2\| = 15	K1
6	\|6 - 6\| = 0	\|17 - 6\| = 11	K1
7	\|7 - 6\| = 1	\|17 - 7\| = 10	K1
8	\|8 - 6\| = 2	\|17 - 8\| = 9	K1
10	\|10 - 6\| = 4	\|17 - 10\| = 7	K1
15	\|15 - 6\| = 9	\|17 - 15\| = 2	K2
17	\|17 - 6\| = 11	\|17 - 17\| = 0	K2
20	\|20 - 6\| = 14	\|20 - 17\| = 3	K2

Data Points	Distance from center m1 = 5.67 for Cluster 1	Distance from center m2 = 17.34 for Cluster 2	Data Points belongs to cluster
1	\|5.67 - 1\| = 4.67	\|17.34 - 1\| = 16.34	K1
2	\|5.67 - 2\| = 3.67	\|17.34 - 2\| = 15.34	K1
6	\|6 - 5.67\| = 0.33	\|17.34 - 6\| = 11.34	K1
7	\|7 - 5.67\| = 1.33	\|17.34 - 7\| = 10.34	K1
8	\|8 - 5.67\| = 2.33	\|17.34 - 8\| = 9.34	K1
10	\|10 - 5.67\| = 4.33	\|17.34 - 10\| = 7.34	K1
15	\|15 - 5.67\| = 9.33	\|17.34 - 15\| = 2.34	K2
17	\|17 - 5.67\| = 11.33	\|17.34 - 17\| = 0.34	K2
20	\|20 - 5.67\| = 14.33	\|20 - 17.34\| = 2.66	K2