Design issues in Multiprocessor Architecture

Designing an operating system that performs well is more difficult for shared-memory multiprocessors than for uniprocessors, because multiprocessors raise additional issues that must be considered at design time.

For example, the overhead of cache consistency requires careful attention to the placement of data in order to reduce the number of cache misses. Similarly, in large systems the distribution of memory across the system requires tracking where data is placed so that memory-access locality can be improved. The following are some of the issues that arise when designing multiprocessor architectures and operating systems.

Multiple processors

The most important difference between shared-memory multiprocessors (SMMPs) and uniprocessors is the number of processors. Although uniprocessor system software already deals with concurrency, multiprocessors introduce true parallelism, which brings additional complications that can affect both the correctness and the performance of uniprocessor synchronization strategies.

Synchronization techniques that are common on uniprocessors, such as disabling interrupts in the kernel or relying on the non-preemptability of a server thread, are not directly applicable to multiprocessors. They can be made safe by allowing only a single processor into the kernel at a time (or a single process to execute in a server at a time), but this serializes all requests and is unacceptable for performance reasons.
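
Instead, multiprocessor kernels protect shared data with explicit locks. The following is a minimal sketch of a test-and-set spinlock built on C11 atomics; it is illustrative only (real kernel locks add backoff, preemption control, and fairness), and the names spinlock_t, spin_lock, and spin_unlock are invented for this sketch.

    #include <stdatomic.h>

    /* A minimal test-and-set spinlock built on C11 atomics. */
    typedef struct {
        atomic_flag locked;
    } spinlock_t;

    #define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

    static void spin_lock(spinlock_t *l)
    {
        /* Atomically set the flag; keep looping while it was already
         * set, i.e. while another processor holds the lock. */
        while (atomic_flag_test_and_set_explicit(&l->locked,
                                                 memory_order_acquire))
            ; /* spin */
    }

    static void spin_unlock(spinlock_t *l)
    {
        atomic_flag_clear_explicit(&l->locked, memory_order_release);
    }

With one such lock per kernel data structure, processors serialize only when they touch the same structure, rather than serializing on the kernel as a whole.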

Cache coherency

If a thread on one processor is operating on the same data as a thread on a second processor, each processor will have its own copy of the data in its cache. The system must ensure that a change made by either processor is reflected accurately in all copies. This is called maintaining cache coherency, and it is performed by the system hardware.

Cache coherency holds when each processor in a multiprocessor system has the same value of a data item in its own cache as the value in system memory. If one processor updates the data value, an inconsistent view of memory can result. This situation is transparent to the software, but maintaining coherency can affect the software’s performance. In some cases, the data value in a processor’s cache or in system memory may not be updated until the next time the data is accessed.

So, at a given point in time, the value of a data item may differ between a processor’s cache and system memory. The system maintains cache coherency by detecting when a data item is about to be accessed and performing the operations needed to bring the data item to the correct value before allowing the access to complete.

Snooping

Snooping is a technique for keeping track of the cache subsystem and each of its cache lines. The cache subsystem monitors all transactions that take place on the system bus and detects when a read or write occurs on an address that is in its cache. When it “snoops” a read on the system bus to an address in its cache, it changes the state of that cache line to “shared”. If it snoops a write to that address, it changes the state of the cache line to “invalid”.

Because it is snooping the system bus, the cache subsystem knows whether it has the only copy of a data item in its cache. If such a data item is updated by its own CPU, the cache subsystem changes the state of the cache line from “exclusive” to “modified”. If it snoops an access to that data item by another processor, it can stall that access, update the data item in system memory, and then allow the other processor’s access to proceed. It also changes the state of the cache line holding the data item to “shared”.
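
The four states above (“modified”, “exclusive”, “shared”, “invalid”) are the states of the MESI protocol. The toy model below sketches, in C, the per-line transitions just described. It is a simplification for illustration only: the event names are invented, and real coherency is implemented entirely in hardware.

    #include <stdio.h>

    /* Toy model of the MESI states for a single cache line. */
    typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } line_state;

    /* Our own CPU writes the line: we hold the only up-to-date copy. */
    static line_state on_local_write(line_state s)
    {
        (void)s;               /* simplified: every write ends MODIFIED */
        return MODIFIED;
    }

    /* Another CPU's read is snooped: write back if needed, then share. */
    static line_state on_snoop_read(line_state s)
    {
        if (s == MODIFIED || s == EXCLUSIVE)
            return SHARED;     /* MODIFIED also writes back to memory */
        return s;
    }

    /* Another CPU's write is snooped: our copy becomes stale. */
    static line_state on_snoop_write(line_state s)
    {
        (void)s;
        return INVALID;
    }

    int main(void)
    {
        line_state s = EXCLUSIVE;        /* we loaded the only copy */
        s = on_local_write(s);           /* EXCLUSIVE -> MODIFIED   */
        s = on_snoop_read(s);            /* MODIFIED  -> SHARED     */
        s = on_snoop_write(s);           /* SHARED    -> INVALID    */
        printf("final state: %d\n", s);  /* prints 0 (INVALID)      */
        return 0;
    }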

False Sharing

The choice of a programming model addresses cases where the designer knows that data is shared between threads. False sharing occurs when separate data items that are accessed by separate threads are allocated to the same cache line.

Since a data access causes an entire cache line to be read into the cache from system memory, if one data item in the cache line is shared, all of the data items in that cache line are treated as shared by the cache subsystem. Two data items could be updated in unrelated transactions by two threads running on different cores, but if the two items are in the same cache line, the cache subsystem must update system memory to maintain cache coherency, setting up a condition where the cache line “ping-pongs” between the cores.

An array of structures can be used to organize data that will be accessed by multiple threads, with each thread accessing one structure from the array following the domain-decomposition model. Under this model no data is shared between threads, and the system avoids the performance impact of maintaining consistency between the caches of each thread’s core, unless the structures used by the threads fall in the same cache line.

If the cache line size is 64 bytes and each structure in the array is 32 bytes, two structures occupy one cache line. If the two threads accessing those two structures are running on different cores, an update to one structure forces the entire cache line to be written to system memory and also invalidates the cache line in the second core’s cache.

The next time the second structure is accessed, it must be read from system memory. If a sequence of updates is made to the structures, the performance of the system can degrade seriously.

One technique to avoid false sharing is to align data items on cache-line boundaries using compiler alignment directives. However, overuse of this technique can result in cache lines that are only partially used. True sharing, by contrast, refers to cases where the sharing of data between threads is intended by the software designer.
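
A minimal sketch of this technique in C11 with POSIX threads follows, assuming a 64-byte cache line (the size is an assumption; verify it on the target CPU). Aligning each per-thread counter to its own line means the two threads’ hot updates never invalidate each other’s cache lines.

    #include <stdalign.h>
    #include <pthread.h>

    #define CACHE_LINE 64                /* assumed size; check your CPU */

    /* One cache line per counter: no false sharing between threads. */
    struct padded_counter {
        alignas(CACHE_LINE) long count;
    };

    static struct padded_counter counters[2];

    static void *worker(void *arg)
    {
        struct padded_counter *c = arg;
        for (long i = 0; i < 100000000L; i++)
            c->count++;                  /* hot, uncontended update */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[2];
        for (int i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, worker, &counters[i]);
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);
        return 0;
    }

Removing the alignas qualifier places both counters in the same cache line, which typically makes the loops measurably slower; this is an easy way to demonstrate the cost of false sharing.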

Processor affinity

Processor affinity, also called CPU pinning, binds threads or processes to a particular CPU or core so that they are scheduled only there, letting each CPU keep its assigned work (and that work’s cached data) local. The technique applies when the system has multiple processors or multiple cores.

In the figure below, processes and interrupts are all mapped to a single CPU, which means there is no affinity at all. On the other side of the diagram, two CPUs share the processes to manage the workload; this is the case of processor affinity, where a process or interrupt is assigned only to its designated CPU or core.


Fig: Processor Affinity
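
On Linux, affinity can be set programmatically with the sched_setaffinity system call. A minimal sketch follows (Linux-specific; it assumes the machine has a CPU 0):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);                /* allow only CPU 0 */

        /* pid 0 means "the calling process". */
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("process pinned to CPU 0\n");
        return 0;
    }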

Programming Models

There are two programming models used in software design for assigning work to threads: functional decomposition and data (domain) decomposition. Functional decomposition divides the software’s work by function into small units, with each thread given a particular function to perform; a single thread may still carry out multiple operations. In data decomposition, the data set is decomposed into parts and each thread operates on its own part independently, so the threads work on separate data in parallel.
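
A minimal sketch of data decomposition in C with POSIX threads: the array is split into contiguous slices and each thread sums only its own slice, so no data is shared between the workers. The array size and thread count are arbitrary choices for illustration.

    #include <pthread.h>
    #include <stdio.h>

    #define N        8000000             /* illustrative sizes;      */
    #define NTHREADS 4                   /* NTHREADS divides N evenly */

    static int  data[N];
    static long partial[NTHREADS];

    /* Each thread sums only its own slice. The partial[] slots are
     * adjacent, but each is written once at the end, so false sharing
     * here is negligible. */
    static void *sum_slice(void *arg)
    {
        long id = (long)arg;
        long lo = id * (N / NTHREADS), hi = lo + N / NTHREADS;
        long s  = 0;
        for (long i = lo; i < hi; i++)
            s += data[i];
        partial[id] = s;
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        for (long i = 0; i < N; i++)
            data[i] = 1;
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, sum_slice, (void *)i);
        long total = 0;
        for (long i = 0; i < NTHREADS; i++) {
            pthread_join(t[i], NULL);
            total += partial[i];
        }
        printf("total = %ld\n", total);  /* prints 8000000 */
        return 0;
    }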
