written 7.1 years ago by | • modified 2.9 years ago |
Mumbai University > Computer Engineering > Sem 8 > parallel and distributed systems
Marks: 10M
written 7.1 years ago by | • modified 7.1 years ago |
A single instruction is applied to multiple data items, performing the same operation on each of them.
A master instruction works on a vector of operands.
A number of processors run the same instruction in the same clock cycle, in strict lockstep.
It is a form of data-level parallelism: one instruction stream operates on many data elements at once.
The communication network allows parallel, synchronous communication between several processing elements (PEs) and memory modules.
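The lockstep idea above can be sketched in a few lines of Python. This is only an illustration: the function name `simd_step` is hypothetical, and the loop merely simulates what the PEs would do in the same clock cycle.

```python
from typing import Callable, List

def simd_step(op: Callable[[float, float], float],
              a: List[float], b: List[float]) -> List[float]:
    """Apply one instruction (op) to every operand pair, one pair per PE."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    # Conceptually every PE executes op in the same cycle;
    # the comprehension only simulates that sequentially.
    return [op(x, y) for x, y in zip(a, b)]

result = simd_step(lambda x, y: x + y,
                   [1.0, 2.0, 3.0, 4.0],
                   [10.0, 20.0, 30.0, 40.0])
print(result)  # [11.0, 22.0, 33.0, 44.0]
```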
The following two SIMD architectures represent fundamentally different approaches to parallel processing:
Data Communication based on message passing paradigm:
Here the memory is part of each PE, so the PEs exchange data by passing it through the interconnection network.
Shared memory between processors:
Here the memories are not local to the PEs; data is read and routed by an alignment network that sits between the PEs and the memory modules.
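A rough way to contrast the two organizations is sketched below. Both function names are hypothetical, and modelling the alignment network as a cyclic shift is an assumption made only to keep the example small; real alignment networks support other permutations.

```python
from typing import List

def message_passing_read(local_mem: List[List[int]], src_pe: int, addr: int) -> int:
    """Distributed memory: the word lives in src_pe's own memory and has to
    travel over the interconnection network to the PE that needs it."""
    return local_mem[src_pe][addr]  # stands in for a send/receive pair

def aligned_read(modules: List[List[int]], addr: int, shift: int) -> List[int]:
    """Shared memory: PE i reads address addr from module (i + shift) mod N.
    The cyclic shift is an assumed model of the alignment network."""
    n = len(modules)
    return [modules[(pe + shift) % n][addr] for pe in range(n)]

# Four PEs (or four memory modules), two words each.
memory = [[10, 11], [20, 21], [30, 31], [40, 41]]

from_neighbor = message_passing_read(memory, src_pe=1, addr=0)
aligned = aligned_read(memory, addr=1, shift=1)
print(from_neighbor)  # 20
print(aligned)        # [21, 31, 41, 11]
```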
SIMD Parallel Process:
During the execution of a program, it is often necessary to mask off a PE so that it does no processing, which is equivalent to having some autonomous control within a PE.
Each PE has a mask bit that can be set during the processing of an instruction.
When the mask bit of a PE is set, the PE treats the instruction received from the control unit (CU) as a no-operation.
The PE executes the instruction when its mask bit is reset.
Each PE has one or more index registers whose contents are added to the global address supplied by the CU instruction.
The arithmetic logic unit has a few general purpose registers and pointer registers to support data and address manipulation.
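Masking and indexed addressing can be illustrated together. The sketch below is a toy model, not a real instruction set: the `PE` class, its field names and the `pload` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PE:
    memory: List[int]
    index: int = 0        # index register, added to the CU's global address
    masked: bool = False  # mask bit: when set, the instruction is a no-op
    acc: int = 0          # one general purpose register

def pload(pes: List[PE], global_addr: int) -> None:
    """The CU issues one load; each unmasked PE reads memory[global_addr + index]."""
    for pe in pes:
        if pe.masked:
            continue  # mask bit set: treat the instruction as no operation
        pe.acc = pe.memory[global_addr + pe.index]

pes = [PE(memory=[0, 1, 2, 3], index=i) for i in range(3)]
pes[1].masked = True          # mask off the middle PE
pload(pes, global_addr=1)

accs = [pe.acc for pe in pes]
print(accs)  # [1, 0, 3]
```

The same global address (1) yields a different effective address in each PE because of the index register, and the masked PE keeps its old register value.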
SIMD mesh connected architecture:
Here we consider a SIMD machine whose PEs are interconnected as a mesh.
Each node of such a machine has four ports: a top port, a left port, a right port and a bottom port.
The instruction set belongs to the CU; the PEs execute those instructions that are prefixed with P, which indicates that they are to be executed on the PEs in parallel.
Each PE also has four bidirectional ports for communication to four neighbors.
CU to PEs communication:
Two kinds of transfer are involved: one distributes data from the CU to all PEs, and the other moves data between an individual PE and the CU.
The instruction BROADCAST R broadcasts data to all processing elements: every PE receives the data and stores it in its register R. The data comes from the D register of the CU.
Routing instructions are also used, such as WRAPTB (wrap end-around connection, top-bottom), WRAPLR (wrap end-around connection, left-right), UNWRAPTB (unwrap top-bottom) and UNWRAPLR (unwrap left-right).
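The effect of broadcasting and of the wrap setting can be sketched on a small grid of register values. The function names and the `fill` value used when wrapping is off are assumptions for illustration; the source does not say what an edge PE receives in the unwrapped case.

```python
from typing import List

Grid = List[List[int]]

def broadcast(rows: int, cols: int, d: int) -> Grid:
    """BROADCAST R: every PE stores the CU's D-register value in its register R."""
    return [[d for _ in range(cols)] for _ in range(rows)]

def shift_right(r: Grid, wrap_lr: bool, fill: int = 0) -> Grid:
    """Every PE passes R to its right neighbor.
    With left-right wrapping on (as after WRAPLR) the last column feeds the
    first; with it off (as after UNWRAPLR) the first column gets `fill`."""
    out = []
    for row in r:
        first = row[-1] if wrap_lr else fill
        out.append([first] + row[:-1])
    return out

grid = [[1, 2, 3], [4, 5, 6]]
wrapped = shift_right(grid, wrap_lr=True)
unwrapped = shift_right(grid, wrap_lr=False)
everyone = broadcast(2, 3, d=7)

print(wrapped)    # [[3, 1, 2], [6, 4, 5]]
print(unwrapped)  # [[0, 1, 2], [0, 4, 5]]
print(everyone)   # [[7, 7, 7], [7, 7, 7]]
```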
PE computing
Numerous instructions are available for computation on the processing elements.
Examples include PFADD R1,R2 (parallel float addition), PFSUB (parallel float subtraction), etc.
PE port to PE GPR instructions: PMOV R,LP (parallel move into register R from the left port), PMOV R,TP (parallel move into register R from the top port).
PE GPR to PE port instructions:
PMOV RP,R (parallel move into the right port from register R), etc.
Instructions for PE to PE transfers: PIN LP (transfer the data in parallel to the left port), PIN RP (the same for the right port), etc.
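A short simulation shows how port moves and a parallel add might combine on one row of PEs. Several details here are assumptions rather than facts from the instruction set: that a PE's left port sees the value its left neighbor placed on its right port, that the leftmost PE reads 0.0 when there is no wrap-around, and that PFADD R1,R2 leaves the sum in R1. The Python function names are hypothetical.

```python
from typing import Dict, List

Row = List[Dict[str, float]]

def pmov_to_right_port(pes: Row, reg: str) -> None:
    """PMOV RP,R: every PE places register R on its right port."""
    for pe in pes:
        pe["RP"] = pe[reg]

def pmov_from_left_port(pes: Row, reg: str) -> None:
    """PMOV R,LP: every PE loads R from its left port.
    Assumed wiring: PE i's left port sees PE i-1's right port; no wrap."""
    incoming = [0.0] + [pe["RP"] for pe in pes[:-1]]
    for pe, value in zip(pes, incoming):
        pe[reg] = value

def pfadd(pes: Row, dst: str, src: str) -> None:
    """PFADD dst,src: every PE adds src into dst (assumed destination)."""
    for pe in pes:
        pe[dst] = pe[dst] + pe[src]

row = [{"R1": v, "R2": 0.0, "RP": 0.0} for v in (1.0, 2.0, 3.0)]
pmov_to_right_port(row, "R1")   # each PE exposes R1 to its right neighbor
pmov_from_left_port(row, "R2")  # each PE picks up its left neighbor's value
pfadd(row, "R1", "R2")          # R1 = own value + left neighbor's value

sums = [pe["R1"] for pe in row]
print(sums)  # [1.0, 3.0, 5.0]
```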
Example of a mesh connected architecture:
ILLIAC-IV
The ILLIAC-IV project was started in 1966 at the University of Illinois.
A system with 256 processors controlled by a CP was envisioned.
The set of processors was divided into four quadrants of 64 processors.
The PE array is arranged as an 8x8 torus.
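For an 8x8 torus as described above, the four neighbors of any PE can be computed with modular arithmetic. The sketch assumes the 64 PEs are numbered 0 to 63 in row-major order; that numbering and the function name are illustrative choices, not taken from the source.

```python
from typing import Dict

SIDE = 8  # 8x8 array, 64 PEs in one quadrant

def torus_neighbors(pe: int, side: int = SIDE) -> Dict[str, int]:
    """Return the top, bottom, left and right neighbors of a PE on a torus,
    assuming row-major numbering of the PEs."""
    row, col = divmod(pe, side)
    return {
        "top": ((row - 1) % side) * side + col,
        "bottom": ((row + 1) % side) * side + col,
        "left": row * side + (col - 1) % side,
        "right": row * side + (col + 1) % side,
    }

corner = torus_neighbors(0)
print(corner)  # {'top': 56, 'bottom': 8, 'left': 7, 'right': 1}
```

Even the corner PE has four neighbors, because the end-around connections join the first and last rows and the first and last columns.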