In the map phase, the task tracker performs the computation on data local to its node and generates output.
This output is called the intermediate result and is stored on temporary local storage.
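As a rough illustration of the map phase, the sketch below runs a word-count map function over one local input split in plain Python. The function name `map_words` and the sample lines are assumptions made for this example; it does not use the Hadoop API.

```python
def map_words(line):
    """Emit one (word, 1) intermediate pair per word in the line."""
    for word in line.lower().split():
        yield (word, 1)


# Hypothetical local input split held by one node.
local_split = ["the cat sat", "the dog sat"]

# Intermediate results; a real map task would keep these on local disk,
# not in HDFS.
intermediate = [pair for line in local_split for pair in map_words(line)]
print(intermediate)
```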
After the map phase is over, all the intermediate values for a given intermediate key are combined together into a list.
The list is given to a reducer.
There may be a single reducer or multiple reducers.
All values associated with a particular intermediate key are guaranteed to go to the same reducer.
The intermediate keys, and their value lists, are passed to the reducer in sorted key order.
This step is known as 'shuffle and sort'.
The reducer outputs zero or more final key/value pairs.
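The shuffle and sort step described above can be sketched in plain Python as follows. The helper names `partition` and `shuffle_and_sort` are hypothetical, and the CRC32-based partitioning is only a stand-in chosen for this example so that a given key always maps to the same reducer index; it is not Hadoop's own code.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick a reducer index; the same key always gets the same index."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(pairs, num_reducers):
    """Group values by key for each reducer, then sort each reducer's keys."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    # Each reducer receives its (key, value list) pairs in sorted key order.
    return [sorted(bucket.items()) for bucket in buckets]


# Hypothetical intermediate pairs produced by the map tasks.
intermediate = [("the", 1), ("cat", 1), ("sat", 1),
                ("the", 1), ("dog", 1), ("sat", 1)]

per_reducer = shuffle_and_sort(intermediate, num_reducers=2)
for index, groups in enumerate(per_reducer):
    print(index, groups)
```

With a single reducer, every key lands in the same sorted list; with more reducers, the keys are split between them, but all values for one key still end up together.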
These are written to HDFS.
The reducer usually emits a single key/value pair for each input key.
The job tracker starts a reduce task on any one of the nodes and instructs it to fetch the intermediate data from the completed map tasks.
The reduce task performs the final computation, and the output is written to HDFS.
The client reads the output from the file, and the job completes.
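A minimal sketch of such a reducer, assuming a word-count job, is shown below. The name `reduce_counts` and the sample grouped input are made up for this example, and the result is printed instead of being written to HDFS.

```python
def reduce_counts(key, values):
    """Emit a single (key, total) pair for one key and its value list."""
    yield (key, sum(values))


# Hypothetical reducer input: keys in sorted order, each with its value list.
grouped = [("cat", [1]), ("dog", [1]), ("sat", [1, 1]), ("the", [1, 1])]

final = [pair
         for key, values in grouped
         for pair in reduce_counts(key, values)]
print(final)  # [('cat', 1), ('dog', 1), ('sat', 2), ('the', 2)]
```

Writing the reducer as a generator also allows it to emit zero pairs or several pairs for a key, which matches the general case.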