written 2.6 years ago by | • modified 2.6 years ago |
Explain initialization of cluster tree in GRGPF algorithm.
written 2.6 years ago by | • modified 2.6 years ago |
Explain initialization of cluster tree in GRGPF algorithm.
written 2.6 years ago by |
Initialization of cluster tree in GRGPF algorithm :-
The clusters are organized into a tree, and the nodes of the tree may be very large, perhaps disk blocks or pages, as in the case of a B-tree, which the cluster-representing tree resembles.
Each leaf of the tree holds as many cluster representations as can fit.
A cluster representation has a size that does not depend on the number of points in the cluster.
An interior node of the cluster tree holds a sample of the clustroids of the clusters represented by each of its subtrees, along with pointers to the roots of those subtrees.
The samples are of fixed size, so the number of children that an interior node may have is independent of its level.
As we go up the tree, the probability that a given cluster's clustroid is part of the sample diminishes.
We initialize the cluster tree by taking a main-memory sample of the dataset and clustering it hierarchically.
The result of this clustering is a tree T, but T is not exactly the tree used by the GRGPF Algorithm. Rather, we select from T certain of its nodes that represent clusters of approximately some desired size n.
These are the initial clusters for the GRGPF Algorithm, and we place their representations at the leaf of the cluster-representing tree. We then group clusters with a common ancestor in T into interior nodes of the cluster-representing tree. In some cases, rebalancing of the cluster-representing tree will be necessary.