written 5.8 years ago by | • modified 5.7 years ago |
Subject: Big Data Analytics
Difficulty: Medium
Marks: 10M
written 5.7 years ago by |
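The PCY (Park-Chen-Yu) algorithm exploits the observation that much of main memory may be unused during the first pass of Apriori.
In the first pass, in addition to counting individual items, a hash function is applied to every pair of items in a basket so that the pair hashes to a bucket.
We hash each pair and add 1 to the count of its bucket.
At the end of the first pass, each bucket holds a count: the total number of occurrences of all pairs that hash to it.
If the count of a bucket is >= the support threshold, the bucket is a frequent bucket. A pair can be frequent only if it hashes to a frequent bucket.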
We can then define the set of candidate pairs C2 as those pairs { i , j } such that
1) i and j are frequent items.
2) { i , j } hashes to a frequent bucket.
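The two passes described above can be sketched in Python as follows. This is only a minimal illustration, not a reference implementation: the function names `bucket_of` and `pcy_frequent_pairs`, the integer item IDs, and the particular hash function are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    # Hypothetical hash function for a pair of integer item IDs.
    return (pair[0] * 31 + pair[1]) % num_buckets


def pcy_frequent_pairs(baskets, support, num_buckets):
    # Pass 1: count single items and hash every pair into a bucket.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1

    # Between passes: keep the frequent items and one flag per bucket.
    frequent_items = {i for i, c in item_counts.items() if c >= support}
    frequent_bucket = [c >= support for c in bucket_counts]

    # Pass 2: count only candidate pairs, i.e. pairs whose items are both
    # frequent (condition 1) and that hash to a frequent bucket (condition 2).
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(i for i in set(basket) if i in frequent_items)
        for pair in combinations(items, 2):
            if frequent_bucket[bucket_of(pair, num_buckets)]:
                pair_counts[pair] += 1

    # A candidate pair is frequent only if its own count reaches the threshold.
    return {p: c for p, c in pair_counts.items() if c >= support}
```

For the baskets [[1, 2, 3], [1, 2], [1, 2, 4], [3, 4], [1, 3]] with a support threshold of 3, the only frequent pair is (1, 2), which occurs 3 times. The number of buckets affects how many candidate pairs are counted in the second pass, not the final answer.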
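The Multistage algorithm refines PCY by using several successive hash tables, each with a different hash function, to further reduce the number of candidate pairs.
However, each additional hash table costs one more pass over the data, so Multistage requires more passes than plain PCY.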
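A three-pass Multistage variant with two hash tables might look like the sketch below. It assumes integer item IDs; the names `h1`, `h2` and `multistage_frequent_pairs` and both hash functions are hypothetical choices for illustration. The second pass rehashes only the pairs that already satisfy the PCY conditions, and the third pass counts a pair only if it also falls in a frequent bucket of the second table.

```python
from collections import Counter
from itertools import combinations


def h1(pair, n):
    # Hypothetical first hash function for a pair of integer item IDs.
    return (pair[0] * 31 + pair[1]) % n


def h2(pair, n):
    # Hypothetical second hash function, chosen to differ from h1.
    return (pair[0] * 17 + pair[1] * 13) % n


def multistage_frequent_pairs(baskets, support, num_buckets):
    # Pass 1 (same as PCY): count items and hash all pairs with h1.
    item_counts = Counter()
    table1 = [0] * num_buckets
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        for pair in combinations(items, 2):
            table1[h1(pair, num_buckets)] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= support}
    bitmap1 = [c >= support for c in table1]

    def pcy_candidates(basket):
        # Pairs of frequent items that hash to a frequent bucket under h1.
        items = sorted(i for i in set(basket) if i in frequent_items)
        return [p for p in combinations(items, 2)
                if bitmap1[h1(p, num_buckets)]]

    # Pass 2: rehash only the PCY candidates into a second table with h2.
    table2 = [0] * num_buckets
    for basket in baskets:
        for pair in pcy_candidates(basket):
            table2[h2(pair, num_buckets)] += 1
    bitmap2 = [c >= support for c in table2]

    # Pass 3: count pairs that are also in a frequent bucket of table 2.
    pair_counts = Counter()
    for basket in baskets:
        for pair in pcy_candidates(basket):
            if bitmap2[h2(pair, num_buckets)]:
                pair_counts[pair] += 1

    return {p: c for p, c in pair_counts.items() if c >= support}
```

On the baskets [[1, 2, 3], [1, 2], [1, 2, 4], [3, 4], [1, 3]] with a support threshold of 3, this also returns only the pair (1, 2) with count 3. The extra hash table changes how many candidates reach the final counting pass, at the price of one more scan of the data.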