written 8.2 years ago by | modified 2.9 years ago by |
What is the cache coherence problem, and how can it be solved? Please explain.
written 8.2 years ago by | modified 8.1 years ago by |
For higher performance in a multiprocessor system, each processor will usually have its own cache. Cache coherence refers to the problem of keeping the data in these caches consistent. The main problem is dealing with writes by a processor.
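The problem can be sketched in a few lines of Python. This is only an illustrative model: the `PrivateCache` class and the `memory` dictionary are hypothetical names, and there is deliberately no coherence mechanism. Two processors cache the same location, one of them writes, and the other keeps reading its stale copy even though main memory has been updated.

```python
# Illustrative model only: `PrivateCache` and `memory` are hypothetical names.
class PrivateCache:
    """A per-processor cache with no coherence mechanism at all."""

    def __init__(self, memory):
        self.memory = memory   # shared main memory (a plain dict)
        self.lines = {}        # locally cached copies: address -> value

    def read(self, addr):
        # On a miss, fetch from memory; on a hit, trust the local copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Update the local copy and memory, but notify no other cache.
        self.lines[addr] = value
        self.memory[addr] = value


memory = {"x": 0}
p0, p1 = PrivateCache(memory), PrivateCache(memory)

p0.read("x")            # P0 caches x = 0
p1.read("x")            # P1 caches x = 0
p0.write("x", 1)        # P0 updates its own copy and memory

stale = p1.read("x")    # P1 hits in its own cache and still sees 0
print(memory["x"], stale)
```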
There are two general strategies for dealing with writes to a cache:
Write-through - all data written to the cache is also written to memory at the same time.
Write-back - when data is written to a cache, a dirty bit is set for the affected block. The modified block is written to memory only when the block is replaced.
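The difference between the two write policies can be sketched as follows. The `Cache` class, its `policy` argument and the `evict` method are assumptions made for illustration, and each block is reduced to a single value.

```python
# Illustrative model only: class and method names are hypothetical.
class Cache:
    def __init__(self, memory, policy):
        if policy not in ("write-through", "write-back"):
            raise ValueError(policy)
        self.memory = memory
        self.policy = policy
        self.lines = {}     # address -> cached value
        self.dirty = set()  # addresses whose dirty bit is set

    def write(self, addr, value):
        self.lines[addr] = value
        if self.policy == "write-through":
            self.memory[addr] = value   # memory is updated on every write
        else:
            self.dirty.add(addr)        # set the dirty bit, defer the update

    def evict(self, addr):
        # Replacement: a dirty block is written to memory before it is dropped.
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


wt_mem, wb_mem = {"x": 0}, {"x": 0}
wt, wb = Cache(wt_mem, "write-through"), Cache(wb_mem, "write-back")

wt.write("x", 5)
wb.write("x", 5)
after_write = (wt_mem["x"], wb_mem["x"])   # write-back memory is still old

wb.evict("x")
after_evict = wb_mem["x"]                  # memory now holds the new value
```

With write-back, memory can hold an old value for as long as the block stays in the cache, so other processors cannot rely on memory alone to see the latest data.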
Software solutions:
In the software approach, the detection of potential cache coherence problems is moved from run time to compile time, and the design complexity is moved from hardware to software.
On the other hand, compile-time software approaches generally have to make conservative decisions, leading to inefficient cache utilization.
Compiler-based cache coherence mechanisms perform an analysis on the code to determine which data items may become unsafe for caching, and they mark those items accordingly. The operating system or hardware then prevents the items marked as non-cacheable from being cached.
The simplest approach is to prevent any shared data variables from being cached. This is too conservative, because a shared data structure may be exclusively used during some periods and may be effectively read-only during other periods.
It is only during periods when at least one process may update the variable and at least one other process may access the variable that cache coherence is an issue. More efficient approaches analyze the code to determine safe periods for shared variables. The compiler then inserts instructions into the generated code to enforce cache coherence during the critical periods.
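The condition that makes a period critical can be written as a small predicate. This is a sketch only: a real compiler would derive the access sets by static analysis, whereas the hypothetical `coherence_needed` function below simply takes them as given.

```python
def coherence_needed(accesses):
    """accesses: iterable of (process_id, mode) pairs, mode is "r" or "w".

    Returns True when at least one process may write the variable and at
    least one *other* process may read or write it in the same period.
    """
    accesses = list(accesses)
    writers = {pid for pid, mode in accesses if mode == "w"}
    everyone = {pid for pid, _ in accesses}
    # A period is critical if some writer shares it with any other process.
    return any(everyone - {w} for w in writers)


exclusive_use = [("P0", "r"), ("P0", "w")]   # used by one process only
read_only = [("P0", "r"), ("P1", "r")]       # shared, but nobody writes
critical = [("P0", "w"), ("P1", "r")]        # a writer plus another reader
```

Only the third period would need compiler-inserted coherence instructions; in the first two the variable can be cached safely.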
Hardware solutions:
Hardware solutions provide dynamic recognition at run time of potential inconsistency conditions. Because the problem is only dealt with when it actually arises, there is more effective use of caches, leading to improved performance over a software approach.
Hardware schemes can be divided into two categories: directory protocols and snoopy protocols.
Directory protocols:
Directory protocols collect and maintain information about where copies of lines reside. Typically, there is a centralized controller that is part of the main memory controller, and a directory that is stored in main memory.
The directory contains global state information about the contents of the various local caches.
When an individual cache controller makes a request, the centralized controller checks and issues necessary commands for data transfer between memory and caches or between caches themselves.
It is also responsible for keeping the state information up to date; therefore, every local action that can affect the global state of a line must be reported to the central controller.
The controller maintains information about which processors have a copy of which lines.
Before a processor can write to a local copy of a line, it must request exclusive access to the line from the controller.
Before granting this exclusive access, the controller sends a message to all processors with a cached copy of this line, forcing each processor to invalidate its copy.
After receiving acknowledgement back from each such processor, the controller grants exclusive access to the requesting processor.
When another processor tries to read a line that is exclusively granted to another processor, it will send a miss notification to the controller.
The controller then issues a command to the processor holding that line that requires the processor to do a write back to main memory.
Directory schemes suffer from the drawbacks of a central bottleneck and the overhead of communication between the various cache controllers and the central controller.
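The sequence described above (request exclusive access, invalidate the other copies, write back on a later read miss) can be sketched as a toy simulation. All names here (`Directory`, `Cache`, `read_miss`, `request_exclusive`) are hypothetical, each line holds a single value, and acknowledgements are modelled as ordinary function returns.

```python
# Illustrative model only: all class and method names are hypothetical.
class Directory:
    """Central controller: tracks which caches hold each line."""

    def __init__(self, memory):
        self.memory = memory
        self.sharers = {}   # address -> set of caches holding a copy
        self.owner = {}     # address -> cache with exclusive access
        self.log = []       # commands issued by the controller

    def read_miss(self, cache, addr):
        owner = self.owner.pop(addr, None)
        if owner is not None:
            # Force the exclusive holder to write back before serving the read.
            self.log.append(("write-back", owner.name, addr))
            self.memory[addr] = owner.lines[addr]
        self.sharers.setdefault(addr, set()).add(cache)
        return self.memory[addr]

    def request_exclusive(self, cache, addr):
        # Invalidate every other cached copy, then grant exclusive access.
        for other in self.sharers.get(addr, set()) - {cache}:
            self.log.append(("invalidate", other.name, addr))
            other.lines.pop(addr, None)   # returning here stands in for the ack
        self.sharers[addr] = {cache}
        self.owner[addr] = cache


class Cache:
    def __init__(self, name, directory):
        self.name = name
        self.directory = directory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:        # miss: notify the controller
            self.lines[addr] = self.directory.read_miss(self, addr)
        return self.lines[addr]

    def write(self, addr, value):
        if self.directory.owner.get(addr) is not self:
            self.directory.request_exclusive(self, addr)
        self.lines[addr] = value          # local write; memory is updated later


memory = {"x": 0}
directory = Directory(memory)
p0, p1 = Cache("P0", directory), Cache("P1", directory)

p0.read("x")
p1.read("x")
p0.write("x", 7)          # P1's copy is invalidated, P0 becomes exclusive
value = p1.read("x")      # miss: controller makes P0 write back first
```

Every miss and every first write goes through the single `Directory` object, which is where the central bottleneck mentioned above comes from.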
Snoopy Protocols:
Snoopy protocols distribute the responsibility for maintaining cache coherence among all of the cache controllers in a multiprocessor system.
A cache must recognize when a line that it holds is shared with other caches.
When an update action is performed on a shared cache line, it must be announced to all other caches by a broadcast mechanism.
Each cache controller is able to “snoop” on the network to observe these broadcast notifications and react accordingly.
Snoopy protocols are ideally suited to a bus-based multiprocessor, because the shared bus provides a simple means for broadcasting and snooping.
Two basic approaches to the snoopy protocol have been explored: write-invalidate and write-update (write-broadcast).
With a write-invalidate protocol, there can be multiple readers but only one writer at a time.
Initially, a line may be shared among several caches for reading purposes.
When one of the caches wants to perform a write to the line, it first issues a notice that invalidates that line in the other caches, making the line exclusive to the writing cache. Once the line is exclusive, the owning processor can make local writes until some other processor requires the same line.
With a write-update protocol, there can be multiple writers as well as multiple readers. When a processor wishes to update a shared line, the word to be updated is distributed to all others, and caches containing that line can update it.
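The two snoopy approaches can be compared in a small simulation. This is a simplified sketch under stated assumptions: the `Bus` and `SnoopyCache` classes are hypothetical, every write is broadcast (a real cache would broadcast only for shared lines), and memory is written through on each write to keep the code short, so the exclusive local-write phase of write-invalidate is not modelled.

```python
# Illustrative model only: all class and method names are hypothetical.
class Bus:
    """Shared bus: every attached cache sees every broadcast."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def broadcast(self, sender, addr, value):
        for cache in self.caches:
            if cache is not sender:
                cache.snoop(addr, value)


class SnoopyCache:
    def __init__(self, bus, policy):
        if policy not in ("invalidate", "update"):
            raise ValueError(policy)
        self.bus = bus
        self.policy = policy
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.memory[addr] = value    # simplification: write-through
        self.bus.broadcast(self, addr, value)

    def snoop(self, addr, value):
        if addr not in self.lines:
            return                       # not holding the line: ignore
        if self.policy == "invalidate":
            del self.lines[addr]         # drop the copy; next read will miss
        else:
            self.lines[addr] = value     # refresh the copy in place


def run(policy):
    bus = Bus({"x": 0})
    p0, p1 = SnoopyCache(bus, policy), SnoopyCache(bus, policy)
    p0.read("x")
    p1.read("x")
    p0.write("x", 9)
    still_cached = "x" in p1.lines       # did P1 keep its copy after the write?
    return still_cached, p1.read("x")
```

In this model both policies let P1 read the new value; the difference is that under write-invalidate P1 loses its copy and must miss to refetch it, while under write-update the copy is refreshed in place at the cost of sending the data on every write.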