written 5.7 years ago by |
Overall approach (FLOW):
Scale-divide key, value data into shards.
Build Patricia tree per shard and store all key, values for latter.
Prepare trees to have place hold (short) value for each key.
Flatten each Patricia tree to a disk friendly and byte alighted format fit for random access.
Recalculated file addresses in each Patricia tree to be able to store the actual values.
Create final Patricia tree with values on disk.
Step 1 : scale
Map (key, value):
{#} eg use simple hash
Shard – key = shard function (key, value)
Out value = (key, value)
Emit shard key, out value.
Step 2 : Build Patricia tree
Reduce in it ()
{#} called once per reduces before it start.
Self.patricia = patricia ()
Self. Temp key value store = temp key value store
Reduces (shard key, list of key-value pairs) :
For (key, valve) in list of key valve pairs:
Self.tempkeyvalvestore[key]=valve
Step 3 : prepare trees
Reduce_final () :
{#} called once per reduces after all reduce for key, valve in self. tempkeystore:
Self.patricia.add(key,key)