Suppose a data stream consists of the integers 1,3,2,1,2,3,4,3,1,2,3,1. Let the Hash function being used is h(x) = (6x+1) mod 5;

88views

written 5.8 years ago by

teamques10 ★ 69k

• modified 5.8 years ago

PCY algorithm exploits the observation that there may be much unused space in main memory on the first pass of apraisy.
In first pass only a bash function is applied on pair of item so that they bashes to a bucket.
we hash each pair and add 1 to the bucket.
At the end of 1st pass each bucket has count.
If count of bucket is > = support then the bucket is frequent bucket.
we can define a candidate pair C2 to be those pair { i , j } such that

1) i and j are frequent items.

2) { i , j } hashes to a frequent bucket.

enter image description here

Multistage uses several successive hash tables to reduce further the number of candidate pairs.
But requires more passes than normal PCY.

enter image description here

Document data store are used for storing, retrieving and managing document oriented information.
Hierarchical data structures can be directly stored in document database.
Document data store uses a key structure.
Document path is used like a key to success the leaf values of a document tree structure.

- For eg. Employee [id = '2300'] / Address / street / text ( )

- eg of document store: Mongo DB, Couch DB, Couch Base.

In column based store, data is stoned into columns and these columns are logically grouped into column families.
splitting the data column size helps in faster retrieval of data
example : Big Table, H Base, Hyper Table.

ADD COMMENT EDIT