written 7.0 years ago by | • modified 5.6 years ago |
Suppose we have an $n×n$ matrix M, whose element in row i and column j will be denoted mij . Suppose we also have a vector v of length n, whose jth element is vj . Then the matrix-vector product is the vector x of length n, whose ith element xi is given by $xi = \sum_{j=1}^n mij \times vj$. If n = 100, we do not want to use a DFS or MapReduce for this calculation. But this sort of calculation is at the heart of the ranking of Web pages that goes on at search engines, and there, n is in the tens of billions.3 Let us first assume that n is large, but not so large that vector v cannot fit in main memory and thus be available to every Map task. The matrix M and the vector v each will be stored in a file of the DFS. We assume that the row-column coordinates of each matrix element will be discoverable, either from its position in the file, or because it is stored with explicit coordinates, as a triple (i, j,mij). We also assume the position of element vj in the vector v will be discoverable in the analogous way.
The Map Function: The Map function is written to apply to one element of M. However, if v is not already read into main memory at the compute node executing a Map task, then v is first read, in its entirety, and subsequently will be available to all applications of the Map function performed at this Map task. Each Map task will operate on a chunk of the matrix M. From each matrix element mij it produces the key-value pair (i,mij* vj). Thus, all terms of the sum that make up the component xi of the matrix-vector product will get the same key, i.
The Reduce Function: The Reduce function simply sums all the values associated with a given key i. The result will be a pair (i, xi).
we can divide the matrix into vertical stripes of equal width and divide the vector into an equal number of horizontal stripes, of the same height. Our goal is to use enough stripes so that the portion of the vector in one stripe can fit conveniently into main memory at a compute node. Figure suggests what the partition looks like if the matrix and vector are each divided into five stripes.
Map : for input mij
Emit (i, ps = ∑ mij * vj)
Reduce : Compute Xi = ∑ ps
written 2.7 years ago by |
Matrix Multiplication
- MapReduce is a technique in which a huge program is subdivided into small tasks and run parallelly to make computation faster, save time, and mostly used in distributed systems. It has 2 important parts:
- Mapper: It takes raw data input and organizes into key, value pairs. For example, In a dictionary, you search for the word “Data” and its associated meaning is “facts and statistics collected together for reference or analysis”. Here the Key is Data and the Value associated with is facts and statistics collected together for reference or analysis.
- Reducer: It is responsible for processing data in parallel and produce final output.
Matrix-Vector Multiplication by Map-Reduce
Therefore computing the mapper for Matrix A:
Matrix-Vector Multiplication by Map-Reduce
Matrix-Vector Multiplication by Map-Reduce
Matrix-Vector Multiplication by Map-Reduce
Matrix-Vector Multiplication by Map-Reduce