What is hierarchical clustering? Explain any two techniques for finding distance between the clusters in hierarchical clustering
1 Answer

Clustering

  • Clustering is an unsupervised Machine Learning technique that searches for similarity and relationship patterns among data samples and then groups similar samples into clusters based on a similarity measure.
  • Clustering methods are further classified into various types, such as Density-based, Hierarchical, Partitioning, and Grid-based.

Hierarchical Clustering

  • This type of clustering groups together the unlabeled data points having similar characteristics.
  • Hierarchical clustering initially treats every data point as a separate cluster.
  • It then repeatedly identifies the two clusters that are closest together and merges these two most comparable clusters into one.
  • This process continues until all the clusters have been merged.
  • Hence, this method creates a hierarchical decomposition of the given set of data objects.
  • Based on how the hierarchical decomposition is formed, this clustering is further classified into two types:
    • Agglomerative Approach
    • Divisive Approach
  • Hierarchical clustering typically works by sequentially merging similar clusters. This is known as agglomerative hierarchical clustering (see the sketch after this list).
  • In theory, it can also be done by initially grouping all the observations into one cluster, and then successively splitting these clusters. This is known as divisive hierarchical clustering.
  • Divisive clustering is rarely done in practice.
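
A minimal sketch of the merge-based hierarchy using SciPy (an assumption; the text does not name a library). The `method` argument selects how the distance between two clusters is measured, e.g. `'single'` (minimum pairwise distance between points of the two clusters) or `'complete'` (maximum pairwise distance); the data values are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D data: two loose groups of points (hypothetical example values).
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [5.0, 5.2], [5.1, 4.8], [4.9, 5.0]])

# Build the merge hierarchy. 'single' measures cluster distance as the
# minimum pairwise point distance; 'complete' would use the maximum.
Z = linkage(X, method='single', metric='euclidean')

# Cut the hierarchy to obtain two flat clusters.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)   # e.g. [1 1 1 2 2 2]
```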

Agglomerative Approach

  • This approach is also known as the Bottom-Up Approach.
  • This approach starts with each object forming a separate group.
  • It keeps on merging the objects or groups that are close to one another.
  • It keeps on doing so until all of the groups are merged into one or until the termination condition holds.
  • Algorithm for Agglomerative Hierarchical Clustering (a short sketch follows this list):
    • Step 1 - Consider every data point as an individual cluster.
    • Step 2 - Calculate the similarity of each cluster with all the other clusters (the proximity matrix).
    • Step 3 - Merge the clusters which are most similar or closest to each other.
    • Step 4 - Recalculate the proximity matrix for the new set of clusters.
    • Step 5 - Repeat Steps 3 and 4 until only a single cluster remains.
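
The steps above can be illustrated with a short from-scratch sketch, assuming Euclidean distance and single-linkage (the distance between two clusters is taken as the smallest pairwise point distance); neither choice is prescribed by the text.

```python
import numpy as np

def agglomerative(X, n_clusters=1):
    # Step 1: every data point starts as its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(X))]

    def cluster_dist(a, b):
        # Single-linkage proximity: smallest pairwise distance (an assumption).
        return min(np.linalg.norm(X[i] - X[j]) for i in a for j in b)

    # Steps 2-5: repeatedly find and merge the two closest clusters,
    # then recompute proximities, until the stopping condition holds.
    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: cluster_dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

X = np.array([[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.9]])
print(agglomerative(X, n_clusters=2))   # e.g. [[0, 1], [2, 3]]
```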

Divisive Approach

  • This approach is also known as the Top-Down Approach.
  • This approach starts with all of the objects in the same cluster.
  • In each successive iteration, a cluster is split up into smaller clusters.
  • This is done until each object is in its own cluster or the termination condition holds (a short sketch follows this list).
  • This method is rigid, i.e., once a merging or splitting is done, it can never be undone.
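
A minimal sketch of the top-down idea, assuming each split is performed by a 2-means bisection of the largest remaining cluster; the text does not specify a splitting rule, so this choice and the function names are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive(X, max_clusters=2):
    # Start with all points in a single cluster (stored as index lists).
    clusters = [list(range(len(X)))]

    # Repeatedly split the largest cluster into two smaller clusters
    # until the termination condition (desired cluster count) holds.
    while len(clusters) < max_clusters:
        idx = max(range(len(clusters)), key=lambda k: len(clusters[k]))
        members = clusters.pop(idx)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[members])
        clusters.append([m for m, l in zip(members, labels) if l == 0])
        clusters.append([m for m, l in zip(members, labels) if l == 1])
    return clusters

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.1], [4.8, 5.0]])
print(divisive(X, max_clusters=2))   # e.g. [[0, 1], [2, 3]]
```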