A good distributed file system (DFS) should have the following features:
1. Transparency
Transparency refers to hiding system details from the user. The following types of transparency are desirable:
i. Structure transparency: Multiple file servers are used to provide better performance, scalability, and reliability. The multiplicity of file servers should be transparent to the clients of a distributed file system. Clients should not need to know the number or locations of file servers or storage devices; instead, the system should look like a conventional file system offered by a centralized, time-sharing operating system.
ii. Access transparency: Local and remote files should be accessible in the same way. The file system should automatically locate an accessed file and transport it to the client's site.
iii. Naming transparency: The name of a file should not reveal its location, and the name must not change when the file moves from one node to another.
iv. Replication transparency: Where files are replicated on multiple nodes, the existence of multiple copies and their locations should be hidden from clients.
2. User mobility
The user should not be forced to work on a specific node but should have the flexibility to work on different nodes at different times. This can be achieved by automatically bringing the user's environment to the node where the user logs in.
3. Performance
Performance is measured as the average amount of time needed to satisfy client requests, which includes CPU time plus the time for accessing secondary storage along with network access time. Explicit file placement decisions should not be needed to increase the performance of a distributed file system.
4. Simplicity and ease of use
The user interface to the file system should be simple, and the number of commands should be as small as possible. A DFS should also be able to support the whole range of applications.
5. Scalability
A good DFS should cope with an increase in the number of nodes without any disruption of service. Scalability also requires the system to withstand high service load, accommodate growth of the user community, and allow integration of new resources.
6. High availability
A distributed file system should continue to function even in partial failures such as a link failure, a node failure, or a storage device crash. Replicating files at multiple servers can help achieve availability.
7. High reliability
The probability of loss of stored data should be minimized. The system should automatically generate backup copies of critical files so that data can be recovered in the event of loss.
8. Data integrity
Concurrent access requests from multiple users competing to access the same file must be properly synchronized by some form of concurrency control mechanism. A file system can also provide atomic transactions to users to help ensure data integrity.
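One simple form of concurrency control is a per-file lock that serializes competing writers. A minimal Python sketch (all class and method names here are illustrative, not part of any real DFS API):

```python
import threading

# Hypothetical file server that serializes concurrent writes to the
# same file with a per-file lock (one simple concurrency control).
class FileServer:
    def __init__(self):
        self.files = {}                  # filename -> contents
        self.locks = {}                  # filename -> per-file lock
        self.table_lock = threading.Lock()

    def _lock_for(self, name):
        # Create the per-file lock atomically on first use.
        with self.table_lock:
            return self.locks.setdefault(name, threading.Lock())

    def append(self, name, data):
        # Writers competing for the same file are serialized here,
        # so concurrent read-modify-write appends cannot be lost.
        with self._lock_for(name):
            self.files[name] = self.files.get(name, "") + data

server = FileServer()
threads = [threading.Thread(target=server.append, args=("log", "x"))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(server.files["log"])  # all 10 appends preserved: "xxxxxxxxxx"
```

Without the lock, two clients could read the same old contents and overwrite each other's append; the lock makes each read-modify-write atomic.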
9. Security
A distributed file system must secure data so that its users can be confident of their privacy. The file system should implement mechanisms to protect the data stored within it.
10. Heterogeneity
A distributed file system should allow various types of workstations to participate in file sharing. A DFS should also be designed so that a new type of workstation or storage medium can be integrated easily.
Modification propagation in file caching schemes
When caches are located on client nodes, a file's data may simultaneously be cached on multiple nodes. Caches can become inconsistent when the file data is changed by one of the clients and the corresponding data cached at other nodes is not updated or discarded.
There are two design issues involved:
1. When to propagate modifications made to cached data to the corresponding file server.
2. How to verify the validity of cached data.
The modification propagation scheme used has a critical effect on the system’s performance and reliability. Techniques used include:
a. Write-through scheme
When a cache entry is modified, the new value is immediately sent to the server for updating the master copy of the file.
Advantages:
- High degree of reliability and suitability for UNIX-like semantics.
- The risk of updated data being lost when a client crashes is very low, since every modification is immediately propagated to the server holding the master copy.
Disadvantages:
- This scheme is suitable only where the ratio of read-to-write accesses is fairly large; it does not reduce network traffic for writes.
- Every write access has to wait until the data is written to the master copy on the server. Hence data caching benefits only read accesses, because the server is involved in all write accesses.
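The write-through behavior can be sketched in a few lines of Python (the `Server`/`WriteThroughClient` classes are illustrative stand-ins, not a real DFS interface):

```python
# Hypothetical sketch: in a write-through cache, every write updates
# the client cache AND is immediately pushed to the server's master
# copy, so the server is involved in every write access.
class Server:
    def __init__(self):
        self.master = {}        # master copies of file blocks

class WriteThroughClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}         # locally cached blocks

    def read(self, block):
        # Reads are served from the local cache when possible.
        if block not in self.cache:
            self.cache[block] = self.server.master.get(block)
        return self.cache[block]

    def write(self, block, value):
        self.cache[block] = value
        # Write-through: the master copy is updated before the write
        # completes, so a client crash loses almost nothing.
        self.server.master[block] = value

server = Server()
client = WriteThroughClient(server)
client.write("b1", "new data")
print(server.master["b1"])      # "new data" -- already on the server
```

Note that `write` touches the server on every call, which is exactly why the scheme is reliable but does nothing to reduce write traffic.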
b. Delayed-write scheme
To reduce network traffic for writes, the delayed-write scheme is used. In this scheme, the new data value is written only to the cache, and all updated cache entries are sent to the server at a later time.
There are three commonly used delayed-write approaches:
i. Write on ejection from cache
Modified data in the cache is sent to the server only when the cache-replacement policy decides to eject it from the client's cache. This can result in good performance, but there can be a reliability problem, since some server data may remain outdated for a long time.
ii.Periodic write
The cache is scanned periodically and any cached data that has been modified since the last scan is sent to the server.
iii. Write on close
Modifications to cached data are sent to the server when the client closes the file. This does not help much in reducing network traffic for files that are open for very short periods or are rarely modified.
Advantages
- Write accesses complete more quickly because the new value is written only to the client's cache. This results in a performance gain.
- Modified data may be deleted before it is time to send it to the server (e.g., temporary data). Since such modifications need not be propagated to the server at all, this results in a major performance gain.
- Gathering all file updates and sending them together to the server is more efficient than sending each update separately.
Disadvantage
- Reliability can be a problem since modifications not yet sent to the server from a client’s cache will be lost if the client crashes.
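The delayed-write idea, using the write-on-close variant, can be sketched as follows (again with illustrative class names; a real DFS client would also handle crashes and cache replacement):

```python
# Hypothetical sketch: a delayed-write cache keeps dirty blocks
# locally and flushes them to the server in one batch on close.
class Server:
    def __init__(self):
        self.master = {}        # master copies of file blocks

class DelayedWriteClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}         # locally cached blocks
        self.dirty = set()      # blocks modified but not yet sent

    def write(self, block, value):
        # Fast: touches only the local cache, no network traffic.
        self.cache[block] = value
        self.dirty.add(block)

    def discard(self, block):
        # Temporary data deleted before the flush never reaches
        # the server at all -- a major performance gain.
        self.cache.pop(block, None)
        self.dirty.discard(block)

    def close(self):
        # Write on close: one batched update instead of many small
        # ones. If the client crashes before this, dirty data is lost.
        for block in self.dirty:
            self.server.master[block] = self.cache[block]
        self.dirty.clear()

server = Server()
client = DelayedWriteClient(server)
client.write("b1", "v1")
client.write("tmp", "scratch")
client.discard("tmp")               # never propagated to the server
assert "b1" not in server.master    # not yet flushed
client.close()
print(server.master)                # {'b1': 'v1'}
```

The reliability disadvantage is visible in the sketch: anything still in `dirty` when the client crashes is lost, since the server never saw it.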