written 5.6 years ago by |
Multiprocessor system is divided into following basic architectures:
Symmetric Multiprocessor System (SMP)
UMA (Uniform Memory Access)
NUMA (Non-Uniform Memory Access)
1. Symmetric Multiprocessor System (SMP)
In this architecture, two or more processors are connected to the same memory. It has full access to input and output devices with the same rights. Only one instance of the operating system runs all processors equally. No one processor will treat as a special. Most of the multiprocessors use SMP architecture. SMP structure is given below.
SMP Architecture
SMP has a tightly coupled system because a number of homogeneous processors running independently of each other. That means each processor running different programs and uses different data sets. Above figure shows the pool of processors each one having own cache and sharing the common main memory as well as common i/o devices. When CPU wants to read the memory, it first checks the bus is idle or not. If the bus is idle, it puts the address of the bus it wants then it activates certain signals and waits for memory to put the required word on the bus. But if the bus is busy, the CPU has to wait.
To solve this problem, the cache is used with a processor. Due to this many reads can be possible. There is much less bus traffic and the system can support more CPUs.
2. Uniform Memory Access (UMA)
In this type of architecture, all processors share the common (Uniform) centralized primary memory. Each CPU has the same memory access time. This system also called as shared memory multiprocessor (SMM). In the figure below each processor has a cache at one or more level. And also shares a common memory as well as input output systems.
UMA architecture
There are three types of UMA
I) Using a cross based switch
II) Using a multistage interconnection network
III) Using bus based symmetric multiprocessor
I) Using a cross based switch
In Normal structure, we can extend the size upto only 16 CPU limits. But sometimes we need to extend the limit. So, we required a different kind of interconnection networks. One of them is simple circuit crossbar which connects 'n' CPUs to 'k' memories. It is used mostly in telephone switches. Each intersection has a cross point which has a switch for closing and opening purpose. It is one of the non-blocking networks. E.g. Sun Enterprise 1000 uses this technology. It consists of a single cabinet with up to 64 CPUs. The crossbar switch is packaged on a circuit board with eight plugs in slots on each side.
UMA cross based switch
In the above figure, there are the cross-connection switching of multiple CPUs and memories. Each cross point has a switch which is either opened or closed (in (b) and (c)). The drawback of this architecture is 'n' number of CPUs and 'k' number of memory required n2 switches. i.e. 1000 CPU needs 1000000 switches. But the main advantage is any CPU can access any available memory with less blocking
II) Using multistage interconnection network
To go beyond the Sun Enterprise of limit 1000, we need powerful interconnection network. 2 x 2 switches can be used to build up the large network. Example of this technology is the Omega network. In this system, the wired pattern is shuffle perfectly. Each memory has given labels which used to find the route in the network. The Omega network is a blocking network. Requests come in a sequence but cannot be served concurrently. Conflicts can arise in using a connection or a switch, in accessing a memory block or in answering a CPU request. Many techniques are used to minimize conflicts.
Multistage interconnection network
In the above figure, 8 CPUs are connected to 8 memories, using 12 switches laid out in three stages. In generally n CPUs and n memories require log2n stages and n/2 switch per stage, giving a total of (n/2) log2n.
III) Using bus based symmetric multiprocessor
Bus based symmetric multiprocessor
The simplest multiprocessors system consists of a single bus. Two or more CPUs and one or more memory modules all use the same bus for communication. If the bus is busy when a CPU wants to access memory, it must wait. Adding more CPUs results in more waiting. This can be mitigated to some degree by including processor cache support.
3. NUMA (Non-Uniform Memory Access)
If we want to improve the limit of the number of CPUs, UMA is not the best option. Because it accesses the memory uniformly. To solve this problem we have another technique called Non-Uniform Memory Access (NUMA). They share the single address space (local memory) through all the CPU for improving the result. But provides faster local access than remote access.
UMA programs can run without change in NUMA machines but performance may be slow. Memory access time depends on the memory location which is relative to a processor. The processor can access it’s own local memory fastly than it's non-local memory. NUMA is used in a symmetric multiprocessing system.
SMP is obviously tightly coupled and shares everything in multiple processors by using a single operating system. NUMA has three key properties.
i. Access to remote memory is possible.
ii. Accessing remote memory is slower than local memory.
iii. Remote access time is not hidden by the caching.
The following figure shows the structure of NUMA. In this CPU and memory is connected to MMU (Memory management unit) via local bus and local memory connected to the system bus via local bus.
NUMA Structure
It is used in SMP. Multiprocessor system without NUMA create the problem of cache missed and only one processor can access the computer memory at a time. To solve this problem, separate memory is provided for each processor.
There are 2 types of NUMA
i. NC-NUMA:
When the time to accessing a remote is not hidden, the system is called as NC-NUMA (Non-Caching-Non-Uniform Memory Access). In this type of NUMA, processors have no local cache. Cache coherence problem is not present in NC-NUMA. Each memory item is in a single location. Remote memory access is difficult so this system helps to the software's those are relocated memory pages from one block to another just for improving the performance. A page scanner demon activates every few seconds, examines statistics on memory usage, and moves pages from one block to another. The Figure is the NC-NUMA type structure.
ii. CC- NUMA:
When a coherent cache is present, the system is called as CC-NUMA (Cache Coherent-Non Uniform Memory Access). CC-NUMA uses the directory based protocol rather than snooping. The basic idea is to manage each node in the system with a directory for its RAM blocks. A database tells that in which cache is located a RAM block, and what is its state.