written 2.6 years ago by |
K Means Clustering
- K-Means clustering is an unsupervised iterative clustering technique.
- It partitions the given data set into k predefined distinct clusters.
- A cluster is defined as a collection of data points exhibiting certain similarities.
- It partitions the data set in such a way that -
- Each data point belongs to a cluster with the nearest mean.
- Data points belonging to one cluster have a high degree of similarity.
- Data points belonging to different clusters have a high degree of dissimilarity.
Algorithm for K-Means Clustering -
Step 1 -
- Choose the number of clusters K.
Step 2 -
- Randomly select any K data points as cluster centers.
- Select cluster centers in such a way that they are as farther as possible from each other.
Step 3 -
- Calculate the distance between each data point and each cluster center.
- The distance may be calculated either by using the given Distance Function or by using Euclidean Distance.
Step 4 -
- Assign each data point to some cluster.
- A data point is assigned to that cluster whose center is nearest to that data point.
Step 5 -
- Re-compute the center of newly formed clusters.
- The center of a cluster is computed by taking the mean of all the data points contained in that cluster.
Step 6 -
- Keep repeating the procedure from Step 3 to Step 5 until any of the following stopping criteria is met:
- Center of newly formed clusters does not change.
- Data points remain present in the same cluster.
- A maximum number of iterations are reached.
The given Data set = {1, 2, 6, 7, 8, 10, 15, 17, 20}
K = 2
Iteration - 1
Step 1 -
- Randomly select cluster centers in such a way that they are as farther as possible from each other.
- Therefore,
$$m1 = 6, m2 = 17$$
Step 2 -
- Calculate the distance between each data point and each cluster center.
- A data point is assigned to that cluster whose center is nearest to that data point.
Data Points | Distance from center m1 = 6 for Cluster 1 | Distance from center m2 = 17 for Cluster 2 | Data Points belongs to cluster |
---|---|---|---|
1 | |6 - 1| = 5 | |17 - 1| = 16 | K1 |
2 | |6 - 2| = 4 | |17 - 2| = 15 | K1 |
6 | |6 - 6| = 0 | |17 - 6| = 11 | K1 |
7 | |7 - 6| = 1 | |17 - 7| = 10 | K1 |
8 | |8 - 6| = 2 | |17 - 8| = 9 | K1 |
10 | |10 - 6| = 4 | |17 - 10| = 7 | K1 |
15 | |15 - 6| = 9 | |17 - 15| = 2 | K2 |
17 | |17 - 6| = 11 | |17 - 17| = 0 | K2 |
20 | |20 - 6| = 14 | |20 - 17| = 3 | K2 |
- Therefore, two clusters K1 and K2 can be formed as follows:
$$ K1 = \{1, 2, 6, 7, 8, 10\}$$
$$K2 = \{15, 17, 20\}$$
Step 3 -
- Re-compute the center of newly formed clusters.
- The center of a cluster is computed by taking the mean of all the data points contained in that cluster.
- Therefore,
$$m1 = \frac {(1 + 2 + 6 + 7 + 8 + 10)}{6} = \frac {34}{6} = 5.67$$
$$m2 = \frac{(15 + 17 + 20)}{3} = \frac{52}{3} = 17.34$$
Iteration - 2
Step 4 -
- Repeat the procedures of Step 2 and Step 3.
- Therefore,
Data Points | Distance from center m1 = 5.67 for Cluster 1 | Distance from center m2 = 17.34 for Cluster 2 | Data Points belongs to cluster |
---|---|---|---|
1 | |5.67 - 1| = 4.67 | |17.34 - 1| = 16.34 | K1 |
2 | |5.67 - 2| = 3.67 | |17.34 - 2| = 15.34 | K1 |
6 | |6 - 5.67| = 0.33 | |17.34 - 6| = 11.34 | K1 |
7 | |7 - 5.67| = 1.33 | |17.34 - 7| = 10.34 | K1 |
8 | |8 - 5.67| = 2.33 | |17.34 - 8| = 9.34 | K1 |
10 | |10 - 5.67| = 4.33 | |17.34 - 10| = 7.34 | K1 |
15 | |15 - 5.67| = 9.33 | |17.34 - 15| = 2.34 | K2 |
17 | |17 - 5.67| = 11.33 | |17.34 - 17| = 0.34 | K2 |
20 | |20 - 5.67| = 14.33 | |20 - 17.34| = 2.66 | K2 |
- Therefore, we again get the two similar clusters K1 and K2 as follows:
$$ K1 = \{1, 2, 6, 7, 8, 10\}$$
$$K2 = \{15, 17, 20\}$$
- Center of these newly formed clusters is also the same as the previous one.
$$m1 = \frac {(1 + 2 + 6 + 7 + 8 + 10)}{6} = \frac {34}{6} = 5.67$$
$$m2 = \frac{(15 + 17 + 20)}{3} = \frac{52}{3} = 17.34$$
Here we stopped after the 2 - Iterations because
- The Center of newly formed clusters does not change
- Data points remain present in the same clusters.
After 2 - Iterations we get the 2 - Clusters with their Center Points are as follows:
$$ K1 = \{1, 2, 6, 7, 8, 10\} with\ center\ m1 = 5.67$$
$$K2 = \{15, 17, 20\} with\ center\ m2 = 17.34$$