Explain initialization of cluster tree in GRGPF algorithm.

207views

written 2.7 years ago by

Krishna_Agrawal • 2.7k

Initialization of cluster tree in GRGPF algorithm :-

The clusters are organized into a tree, and the nodes of the tree may be very large, perhaps disk blocks or pages, as in the case of a B-tree, which the cluster-representing tree resembles.
Each leaf of the tree holds as many cluster representations as can fit.
A cluster representation has a size that does not depend on the number of points in the cluster.
An interior node of the cluster tree holds a sample of the clustroids of the clusters represented by each of its subtrees, along with pointers to the roots of those subtrees.
The samples are of fixed size, so the number of children that an interior node may have is independent of its level.
As we go up the tree, the probability that a given cluster's clustroid is part of the sample diminishes.
We initialize the cluster tree by taking a main-memory sample of the dataset and clustering it hierarchically.
The result of this clustering is a tree T, but T is not exactly the tree used by the GRGPF Algorithm. Rather, we select from T certain of its nodes that represent clusters of approximately some desired size n.
These are the initial clusters for the GRGPF Algorithm, and we place their representations at the leaf of the cluster-representing tree. We then group clusters with a common ancestor in T into interior nodes of the cluster-representing tree. In some cases, rebalancing of the cluster-representing tree will be necessary.

ADD COMMENT EDIT