Distributed File Systems
A good distributed file system should have the following features:
1. Transparency
Transparency refers to hiding details from a user. The following types of transparency are desirable.
i. Structure transparency: Multiple file servers are used to provide better performance, scalability, and reliability. The multiplicity of file servers should be transparent to the clients of a distributed file system. Clients should not need to know the number or locations of the file servers or storage devices. Instead, the system should look like a conventional file system offered by a centralized, time-sharing operating system.
ii. Access transparency: Local and remote files should be accessible in the same way. The file system should automatically locate an accessed file and transport it to the client’s site.
iii. Naming transparency: The name of a file should not reveal the location of the file, and the name should not have to change when the file moves from one node to another.
iv. Replication transparency: When files are replicated on multiple nodes, the existence of the copies and their locations should be hidden from the clients.
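As a rough illustration of naming and replication transparency, the sketch below keeps a mapping from location-independent file names to the nodes that hold copies. The class `NameService`, its methods, the file path, and the node names are all hypothetical and chosen only for this example; they do not describe any particular DFS.

```python
class NameService:
    """Hypothetical mapping from file names to the nodes holding replicas."""

    def __init__(self):
        self._locations = {}  # file name -> list of node identifiers

    def register(self, name, nodes):
        self._locations[name] = list(nodes)

    def migrate(self, name, old_node, new_node):
        # Moving a copy changes only the internal mapping.
        # The file name that clients use stays the same.
        nodes = self._locations[name]
        nodes[nodes.index(old_node)] = new_node

    def resolve(self, name):
        # Meant to be called by the file system layer, not by applications,
        # so applications never handle node identifiers themselves.
        return tuple(self._locations[name])


names = NameService()
names.register("/projects/report.txt", ["node-a", "node-b"])
names.migrate("/projects/report.txt", "node-a", "node-c")
print(names.resolve("/projects/report.txt"))
```

The file keeps the name /projects/report.txt before and after the migration, and nothing in the name hints at which nodes, or how many, store it.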
2. User mobility
The user should not be forced to work on a specific node but should have the flexibility to work on different nodes at different times. This can be achieved by automatically bringing the user's environment to the node where the user logs in.
3. Performance
Performance is measured as the average amount of time needed to satisfy client requests. In a distributed file system this includes CPU time, the time for accessing secondary storage, and network access time. Users should not have to make explicit file placement decisions to get good performance from a distributed file system.
4. Simplicity and ease of use
The user interface to the file system should be simple, and the number of commands should be as small as possible. A DFS should be able to support the whole range of applications.
5. Scalability
A good DFS should cope with an increase in the number of nodes without any disruption of service. Scalability also means that the system can withstand a high service load and accommodate growth in the number of users and the integration of new resources.
6. High availability
A distributed file system should continue to function in the face of partial failures such as a link failure, a node failure, or a storage device crash. Replicating files on multiple servers can help achieve availability.
7. High reliability
The probability of losing stored data should be minimized. The system should automatically generate backup copies of critical files so that they can be recovered in the event of loss.
8. Data integrity
Concurrent access requests from multiple users competing for the same file must be properly synchronized by some form of concurrency control mechanism. A file system can also offer atomic transactions to users to protect data integrity.
9. Security
A distributed file system must secure data so that its users can be confident of their privacy. The file system should implement mechanisms to protect the data stored in it against unauthorized access.
10. Heterogeneity
A distributed file system should allow various types of workstations to participate in sharing files. A DFS should be designed so that a new type of workstation or storage medium can be integrated easily.
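The point made under high availability, that replicating files on multiple servers helps the system survive partial failures, can be sketched as a read that falls back to another replica. This is a minimal in-memory sketch: `Replica`, `ReplicaUnavailable`, and `read_with_failover` are hypothetical names, and a real DFS would also have to keep the replicas consistent, which is not shown here.

```python
class ReplicaUnavailable(Exception):
    """Raised when a replica (or every replica) cannot be reached."""


class Replica:
    """Hypothetical stand-in for one file server holding copies of files."""

    def __init__(self, name, files, up=True):
        self.name = name
        self.files = dict(files)
        self.up = up  # False simulates a crashed node or broken link

    def read(self, path):
        if not self.up:
            raise ReplicaUnavailable(self.name)
        return self.files[path]


def read_with_failover(replicas, path):
    # Try each replica in turn and return the first successful read.
    for replica in replicas:
        try:
            return replica.read(path)
        except ReplicaUnavailable:
            continue
    raise ReplicaUnavailable("no replica reachable for " + path)


data = {"/docs/a.txt": b"hello"}
replicas = [Replica("s1", data, up=False), Replica("s2", data)]
print(read_with_failover(replicas, "/docs/a.txt"))
```

Here the first server is down, yet the read still succeeds because the second server holds a copy. The caller never learns which server answered, which also fits replication transparency.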
File-sharing semantics
Multiple users may access a shared file simultaneously. An important design issue for any file system is to define when modifications made to file data by one user become visible to other users. This is defined by the file-sharing semantics used by the file system.
1. UNIX semantics
An absolute time ordering is enforced on all operations, which ensures that every read operation on a file sees the effects of all previous write operations performed on that file. In particular, writes to an open file become visible immediately to other users who have the file open at the same time.
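On a single machine with a local file system, the behaviour described for UNIX semantics can be observed directly with two descriptors on the same file. The sketch below uses only Python's standard `os` and `tempfile` modules. It demonstrates the semantics locally and does not attempt to show how a distributed file system would reproduce them across nodes.

```python
import os
import tempfile

# Create a scratch file; mkstemp returns an open read/write descriptor.
writer_fd, path = tempfile.mkstemp()
# Open a second, independent descriptor on the same file for reading.
reader_fd = os.open(path, os.O_RDONLY)
try:
    os.write(writer_fd, b"first")
    # The reader sees the write at once, without the writer closing the file.
    seen_after_first = os.read(reader_fd, 100)

    os.write(writer_fd, b" second")
    # The reader's offset is now past "first", so it gets only the new bytes.
    seen_after_second = os.read(reader_fd, 100)
finally:
    os.close(reader_fd)
    os.close(writer_fd)
    os.remove(path)

print(seen_after_first, seen_after_second)
```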
2. Session semantics
A session is a series of file accesses made between the open and close operations on a file. Changes made to the file are initially visible only to the client process that opened the session and are invisible to other remote processes that have the same file open simultaneously. The changes become visible to remote processes only after the session is closed.
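Session semantics can be sketched with an in-memory server that hands each session a private copy of the file on open and publishes the copy on close. `SessionFileServer` and `Session` are hypothetical names. The sketch also assumes, as common descriptions of session semantics do, that sessions already open keep the contents they started with and only sessions opened later see the published changes.

```python
class SessionFileServer:
    """Hypothetical in-memory stand-in for a file server."""

    def __init__(self):
        self._files = {}  # file name -> committed contents

    def open(self, name):
        # Each session starts from the contents committed at open time.
        return Session(self, name, self._files.get(name, b""))

    def _commit(self, name, data):
        self._files[name] = data


class Session:
    def __init__(self, server, name, snapshot):
        self._server = server
        self._name = name
        self._data = snapshot  # private copy for this session

    def read(self):
        return self._data

    def write(self, data):
        self._data = data  # visible only inside this session

    def close(self):
        # Closing the session publishes the changes.
        self._server._commit(self._name, self._data)


server = SessionFileServer()
writer = server.open("notes.txt")
reader = server.open("notes.txt")

writer.write(b"draft 1")
before_close = reader.read()                   # writer's change not visible
writer.close()
still_open = reader.read()                     # assumption: old snapshot kept
after_close = server.open("notes.txt").read()  # new session sees the change
print(before_close, still_open, after_close)
```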
3. Immutable shared-files semantics
This is based on the immutable file model, in which a file cannot be modified once it has been created. Changes are handled by creating a new, updated version of the file. The semantics allow files to be shared only in read-only mode. With this approach, shared files cannot be changed at all.
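The immutable file model can be sketched as a store in which every update creates a new version and existing versions are never modified. `ImmutableFileStore`, its methods, and the integer version numbers are assumptions made for this example.

```python
class ImmutableFileStore:
    """Hypothetical store where each update creates a new file version."""

    def __init__(self):
        self._versions = {}  # file name -> list of contents, oldest first

    def create_version(self, name, data):
        # Never overwrite: append a new version and return its number.
        versions = self._versions.setdefault(name, [])
        versions.append(bytes(data))
        return len(versions)

    def read(self, name, version=None):
        # Readers can safely share any version because it never changes.
        versions = self._versions[name]
        if version is None:
            version = len(versions)  # default to the latest version
        return versions[version - 1]


store = ImmutableFileStore()
v1 = store.create_version("config", b"a=1")
v2 = store.create_version("config", b"a=2")
print(store.read("config", v1), store.read("config"))
```

A reader holding version 1 keeps seeing the same bytes after version 2 is created, so no synchronization between readers and writers is needed.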
4. Transaction-like semantics
This is based on the use of a transaction mechanism, which ensures that partial changes made to shared data by a transaction are not visible to other concurrently executing transactions until the transaction ends.
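The key property of transaction-like semantics, that partial changes stay hidden until the transaction ends, can be sketched by buffering writes privately and publishing them in one step on commit. `TransactionalFile` and `Transaction` are hypothetical names. The sketch deliberately omits conflict detection between concurrent transactions, which a real transaction mechanism would need.

```python
class TransactionalFile:
    """Hypothetical file whose contents change only when a transaction commits."""

    def __init__(self, data=b""):
        self._committed = bytes(data)

    def read(self):
        return self._committed

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, target):
        self._target = target
        # Work on a private copy of the committed contents.
        self._pending = bytearray(target.read())

    def append(self, data):
        self._pending += data  # partial change, not yet visible to others

    def commit(self):
        # Publish all buffered changes in a single step.
        self._target._committed = bytes(self._pending)

    def abort(self):
        # Throw away the buffered changes.
        self._pending = bytearray(self._target.read())


account = TransactionalFile(b"balance=100\n")
tx = account.begin()
tx.append(b"debit=30\n")
during = account.read()   # other readers still see the old contents
tx.append(b"balance=70\n")
tx.commit()
after = account.read()    # both appends become visible together
print(during, after)
```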