written 8.5 years ago by |
One iteration of the PageRank algorithm involves taking an estimated PageRank vector v and computing the next estimate v ′ by
Where,
β is a constant slightly less than 1
e is a vector of all 1’s
n is the number of nodes in the graph that transition matrix M represents
If n is small enough that each Map task can store the full vector v in main memory and also have room in main memory for the result vector v ′ , then there is little more here than a matrix–vector multiplication. The additional steps are to multiply each component of Mv by constant β and to add (1 − β)/n to each component.
However, it is likely, given the size of the Web today, that v is much too large to fit in main memory. So, we break M into vertical stripes and break v into corresponding horizontal stripes. This allow us to execute the Map Reduce process efficiently, with no more of v at any one Map task than can conveniently fit in main memory.