A file-caching scheme for a distributed file system contributes to its scalability and reliability, since data located on remote servers can be cached on client nodes. Every distributed file system uses some form of file caching.
The key decisions in designing a file-caching scheme are the following:
1. Cache Location
Cache location is the place where the cached data is stored. There can be three possible cache locations:
i. Server's main memory:
A cache located in the server's main memory eliminates the disk access cost on a cache hit, which increases performance compared to no caching.
The reasons for locating the cache in the server's main memory are that it is easy to implement, it is completely transparent to the clients, and it is easy to keep the cached data consistent with the master copy, since both reside on the same node.
ii. Client's disk:
A cache located on the client's disk eliminates the network access cost but requires a disk access on a cache hit. This is slower than having the cache in the server's main memory, and having the cache in the server's main memory is also simpler.
Advantages:
- Reliability: cached data survives a client crash and can be recovered from the disk.
- A much larger cache capacity than main memory can provide.
Disadvantages:
- Does not work for diskless workstations.
- A disk access is still needed on every cache hit, which is slower than a main-memory cache.
iii. Client's main memory:
A cache located in a client's main memory eliminates both the network access cost and the disk access cost on a cache hit. However, a client's disk cache is preferred over it when a large cache size and increased reliability of cached data are desired (a small sketch of such a cache is given after the advantages below).
Advantages:
- Maximum performance gain, since both the network access cost and the disk access cost are eliminated on a cache hit.
- Works for diskless workstations.
- Reduces the load on the server, which contributes to scalability.
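To make the cache-hit behaviour concrete, here is a minimal Python sketch of a client main-memory cache, assuming a plain dictionary and a fetch_from_server callable as stand-ins for the real file server; the names are illustrative, not any particular DFS API.

```python
# Minimal sketch of a client main-memory cache (illustrative only).
# "fetch_from_server" stands in for the RPC a real DFS client would issue.

class ClientMemoryCache:
    def __init__(self, fetch_from_server):
        self._fetch = fetch_from_server   # callable simulating a remote read
        self._cache = {}                  # (path, block_no) -> bytes

    def read_block(self, path, block_no):
        key = (path, block_no)
        if key in self._cache:              # cache hit: no network or disk access
            return self._cache[key]
        data = self._fetch(path, block_no)  # cache miss: go to the server
        self._cache[key] = data
        return data

# Usage: a dictionary stands in for the blocks stored on the server.
server_blocks = {("/shared/report.txt", 0): b"hello"}
cache = ClientMemoryCache(lambda p, b: server_blocks[(p, b)])
cache.read_block("/shared/report.txt", 0)   # miss: fetched from the "server"
cache.read_block("/shared/report.txt", 0)   # hit: served from local memory
```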
2. Modification Propagation
When caches are located on client nodes, a file's data may simultaneously be cached on multiple nodes. The caches can become inconsistent when the file data is changed by one of the clients and the corresponding data cached at the other nodes is not updated or discarded.
The modification propagation scheme used has a critical effect on the system's performance and reliability.
The techniques used include:
i. Write-through scheme
When a cache entry is modified, the new value is immediately sent to the server so that the master copy of the file is updated (a short sketch is given after the advantage/disadvantage list below).
Advantage:
- High reliability: the master copy on the server is always up to date, so little data is lost if a client crashes.
Disadvantage:
- Poor write performance, since every write access must wait for the master copy on the server to be updated.
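As a rough illustration of write-through (not any specific system's implementation), the sketch below pushes every write to a dictionary that stands in for the server's master copy; the immediate remote update on every write is what causes the poor write performance noted above.

```python
# Minimal write-through sketch; the dict "server" stands in for the master copy.

class WriteThroughCache:
    def __init__(self, server_store):
        self._server = server_store   # stand-in for the server's master copy
        self._cache = {}              # local cache: path -> bytes

    def write(self, path, data):
        self._cache[path] = data      # update the cached copy
        self._server[path] = data     # immediately update the master copy
                                      # (this per-write round trip is what
                                      # hurts write performance)

    def read(self, path):
        if path not in self._cache:   # miss: fetch from the master copy
            self._cache[path] = self._server[path]
        return self._cache[path]

server = {}
c = WriteThroughCache(server)
c.write("/shared/a.txt", b"v1")
assert server["/shared/a.txt"] == b"v1"   # master copy is never stale
```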
ii. Delayed-write scheme
The delayed-write scheme is used to reduce the network traffic caused by writes. When an entry is modified, the new value is written only to the cache, and all updated cache entries are sent to the server at a later time (a sketch combining two of the variants below is given after the advantage/disadvantage list).
There are three commonly used delayed-write approaches:
- Write on ejection from cache: Modified data in the cache is sent to the server only when the cache-replacement policy decides to eject it from the client's cache. This can give good performance, but it creates a reliability problem, since some server data may remain outdated for a long time.
- Periodic write: The cache is scanned periodically, and any cached data that has been modified since the last scan is sent to the server.
- Write on close: Modifications to cached data are sent to the server when the client closes the file. This does not help much in reducing network traffic for files that are open for very short periods or are rarely modified.
Advantages:
- Write accesses complete more quickly, resulting in a performance gain.
Disadvantage:
- Reliability can be a problem, since modifications that have not yet been sent to the server are lost if the client crashes.
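The sketch below combines two of the delayed-write variants described above, periodic write (flush()) and write on close (close()), again using a dictionary as a stand-in for the server; the class and method names are illustrative.

```python
# Minimal delayed-write sketch; the dict "server" stands in for the master copy.

class DelayedWriteCache:
    def __init__(self, server_store):
        self._server = server_store
        self._cache = {}      # path -> bytes
        self._dirty = set()   # paths modified since the last propagation

    def write(self, path, data):
        self._cache[path] = data      # write only to the local cache
        self._dirty.add(path)         # master copy is now temporarily stale

    def flush(self):
        """Periodic write: send everything modified since the last scan."""
        for path in self._dirty:
            self._server[path] = self._cache[path]
        self._dirty.clear()

    def close(self, path):
        """Write on close: propagate this file's modifications, if any."""
        if path in self._dirty:
            self._server[path] = self._cache[path]
            self._dirty.discard(path)

server = {}
c = DelayedWriteCache(server)
c.write("/shared/a.txt", b"v2")
assert "/shared/a.txt" not in server      # server copy is briefly out of date
c.close("/shared/a.txt")
assert server["/shared/a.txt"] == b"v2"   # propagated when the file is closed
```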
3. Cache Validation Schemes
The modification propagation policy only specifies when the master copy of a file on the server node is updated after a cache entry is modified. It says nothing about when the file data residing in the caches of other nodes is updated. File data may simultaneously reside in the caches of multiple nodes, and a client's cache entry becomes stale as soon as some other client modifies the corresponding data in the master copy of the file on the server. It therefore becomes necessary to verify whether the data cached at a client node is consistent with the master copy; if not, the cached data must be invalidated and the updated version fetched again from the server.
There are two approaches to verify the validity of cached data:
i. Client-initiated approach
The client contacts the server and checks whether its locally cached data is consistent with the master copy.
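A minimal sketch of the client-initiated approach, assuming the server can report a per-file version number (an illustrative choice; a real system might compare timestamps instead): before using its cached copy, the client asks the server for the current version and re-fetches the data if the cached version is stale.

```python
# Minimal client-initiated validation sketch. The version scheme and all
# class/method names are illustrative assumptions.

class ValidatingClientCache:
    def __init__(self, server):
        self._server = server           # object exposing version() and read()
        self._cache = {}                # path -> (version, data)

    def read(self, path):
        current = self._server.version(path)    # ask the server for validity
        if path in self._cache:
            cached_version, data = self._cache[path]
            if cached_version == current:        # still consistent: use cache
                return data
        data = self._server.read(path)           # stale or missing: re-fetch
        self._cache[path] = (current, data)
        return data

class FakeServer:
    """Stand-in for the file server holding the master copies."""
    def __init__(self):
        self._files = {}    # path -> (version, data)

    def write(self, path, data):
        version = self._files.get(path, (0, b""))[0] + 1
        self._files[path] = (version, data)

    def version(self, path):
        return self._files[path][0]

    def read(self, path):
        return self._files[path][1]

srv = FakeServer()
srv.write("/shared/a.txt", b"v1")
client = ValidatingClientCache(srv)
assert client.read("/shared/a.txt") == b"v1"
srv.write("/shared/a.txt", b"v2")             # another client modifies the file
assert client.read("/shared/a.txt") == b"v2"  # validation detects the stale copy
```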
ii. Server-initiated approach