Written 3.2 years ago.
Key Points on CAP Theorem
The three aspects of the CAP theorem are Consistency, Availability, and Partition tolerance. Let’s first discuss each of them separately, and then we will put the pieces together.
Consistency
According to this property, all connected nodes of the distributed system see the same value at the same time: once a write completes, every later read, on any node, returns that value (or an error). Suppose there are multiple steps inside a transaction and, due to some malfunction, a middle operation fails. If some of the connected nodes could read the half-updated value, clients would get different answers depending on which node they asked, and the data would be inconsistent and misleading. So such a partial update must never become visible: a transaction cannot be executed partially. It is always ‘all or none’, and if something goes wrong during the execution of a transaction, the whole transaction is rolled back. (Strictly speaking, this all-or-none guarantee is the atomicity of ACID; CAP consistency is the requirement that every node returns the same, most recent value.)
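To make the all-or-none idea concrete, here is a minimal single-process sketch. It assumes a toy model in which each node's copy of the data is a plain dictionary, and a step whose value is None stands in for a corrupted operation; the names Replica and apply_transaction are hypothetical. Real systems coordinate this across machines with protocols such as two-phase commit, which this sketch does not attempt to model.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node's copy of the data in a toy, in-memory model (hypothetical)."""
    name: str
    data: dict = field(default_factory=dict)


def apply_transaction(replicas, steps):
    """Apply every (key, value) step on every replica, or roll all of them back.

    Assumption for illustration: a step whose value is None is 'corrupted'.
    """
    snapshots = [dict(r.data) for r in replicas]  # state before the transaction
    try:
        for replica in replicas:
            for key, value in steps:
                if value is None:
                    raise ValueError(f"corrupted step for key {key!r}")
                replica.data[key] = value
    except ValueError:
        # All or none: restore every replica, including those already updated.
        for replica, snapshot in zip(replicas, snapshots):
            replica.data = snapshot
        return False
    return True


nodes = [Replica("n1", {"balance": 100}), Replica("n2", {"balance": 100})]

ok = apply_transaction(nodes, [("balance", 70), ("audit", "debit 30")])
print(ok, [n.data for n in nodes])

ok = apply_transaction(nodes, [("balance", 0), ("audit", None)])
print(ok, [n.data for n in nodes])
```

The second call fails on its corrupted step after n1 has already been given balance 0, and the rollback restores both nodes, so they end up agreeing on the values from the first transaction.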
Availability
According to this property, the connected or distributed system should remain operational at all times: every client request receives a response, whether or not a particular node is available. In practice, how much availability is required depends on the traffic requirements. The key point is that every functioning node must return a response to every read and write request within a reasonable amount of time, although that response is not guaranteed to contain the most recent write.
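The availability requirement can be sketched as a client that keeps trying nodes until one answers. This is a hypothetical in-memory model (Node and available_read are illustrative names, not a real API): a down node raises an error, and the request is passed to the next node instead of failing.

```python
class Node:
    """A toy node that may be down and holds its own, possibly stale, copy."""

    def __init__(self, name, value, up=True):
        self.name = name
        self.value = value
        self.up = up

    def handle_read(self):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.value


def available_read(nodes):
    """Return (node name, value) from the first node that responds."""
    for node in nodes:
        try:
            return node.name, node.handle_read()
        except ConnectionError:
            continue  # try the next node instead of failing the request
    raise RuntimeError("no functioning node left")


cluster = [Node("a", "v2", up=False), Node("b", "v2"), Node("c", "v1")]
print(available_read(cluster))  # ('b', 'v2')

cluster[1].up = False
print(available_read(cluster))  # ('c', 'v1'): a response, but not the latest write
```

While any node is up, the request gets an answer; the second read shows that the answer may be stale, which is exactly where availability pulls against consistency.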
Partition tolerance
According to the partition tolerance property, if part of the network fails so that some nodes can no longer communicate with the others (a network partition), the entire distributed system should not go down. A partition-tolerant system should keep working during such a partial outage and recover quickly from it. In practical scenarios, partition tolerance cannot be optional, because network failures do happen and must be handled. So adhering to the CAP theorem always becomes a choice between high consistency and high availability.
We cannot maintain all three properties of the CAP theorem simultaneously. Theoretically, we can maintain only two of them at a time: CA, CP, or AP.
Consistency and Availability (CA): These systems offer high consistency and very little downtime, but partition tolerance is not enforced. For example, in such a distributed RDBMS, network issues can bring down the entire system.
Consistency and Partition tolerance (CP): These systems provide high consistency and partition tolerance, but there is a risk of some data being unavailable. Example: MongoDB.
Availability and Partition tolerance (AP): These systems provide high availability and partition tolerance, but there is a risk of reading inconsistent data. Example: Cassandra.
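The practical difference between CP and AP systems shows up when a node is cut off from its peers. The following hypothetical sketch (ReplicaNode and Mode are illustrative names; this is not how MongoDB or Cassandra are actually implemented) has a CP node refuse a read during a partition, while an AP node answers with whatever value it holds locally. CA is not modeled, because it assumes partitions do not occur.

```python
from enum import Enum


class Mode(Enum):
    CP = "consistency + partition tolerance"
    AP = "availability + partition tolerance"


class ReplicaNode:
    """Toy replica that knows whether it can currently reach its peers."""

    def __init__(self, mode, local_value):
        self.mode = mode
        self.local_value = local_value
        self.partitioned = False  # True when cut off from the rest of the cluster

    def read(self):
        if self.partitioned and self.mode is Mode.CP:
            # CP: refuse rather than risk returning a value peers have overwritten.
            raise TimeoutError("cannot confirm latest value during partition")
        # AP (or no partition): always answer, possibly with stale data.
        return self.local_value


cp_node, ap_node = ReplicaNode(Mode.CP, "v1"), ReplicaNode(Mode.AP, "v1")
cp_node.partitioned = ap_node.partitioned = True

print(ap_node.read())  # v1 (answered, but possibly stale)
try:
    cp_node.read()
except TimeoutError as err:
    print("CP node:", err)
```

The CP node sacrifices availability (it returns an error) to avoid a possibly inconsistent answer; the AP node makes the opposite choice.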
How Is the CAP Theorem Different from ACID Properties?
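The CAP theorem asserts that any distributed system that uses data from different locations can have at most two of the three desirable CAP properties. The NoSQL movement has used the CAP theorem as an argument against traditional ACID (atomicity, consistency, isolation, and durability) databases, which prioritize consistency and partition tolerance at the cost of potentially low availability. Note that the "C" means different things in the two acronyms: in ACID, consistency means that every transaction takes the database from one valid state to another, preserving its integrity constraints, while in CAP it means that all nodes return the same, most recent value.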
Recently, Brewer has revisited the CAP theorem, pointing out that the CAP properties are more or less continuous rather than all-or-nothing, and that they can be optimized by weighing them against each other. In practice, therefore, an application area can have both relatively high availability and sufficient data consistency, despite the presence of network partitions.
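One widely used way to move along this continuum is quorum replication, in the style of Amazon's Dynamo and the tunable consistency levels of Cassandra: with N replicas, a write must be acknowledged by W of them and a read must contact R of them. When R + W > N, every read quorum overlaps every write quorum, so in the simple model (ignoring concurrent writes and failures during a write) a read reaches at least one replica holding the latest write. Lowering R or W lets operations succeed with more replicas unreachable, at the cost of that guarantee. The sketch below only does this arithmetic; quorum_profile is a hypothetical helper, and it illustrates the trade-off rather than the optimization method discussed here.

```python
def quorum_profile(n, r, w):
    """Describe one point on the consistency/availability spectrum.

    n: replicas per item, r: replicas a read must contact,
    w: replicas that must acknowledge a write.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("need 1 <= r, w <= n")
    return {
        "overlapping": r + w > n,         # every read quorum meets every write quorum
        "reads_survive_failures": n - r,  # replicas that may be down while reads succeed
        "writes_survive_failures": n - w, # replicas that may be down while writes succeed
    }


for r, w in [(3, 3), (2, 2), (1, 1)]:
    print((r, w), quorum_profile(3, r, w))
```

With N = 3, R = W = 2 keeps the overlap while tolerating one unreachable replica for both reads and writes; R = W = 1 tolerates two, but gives up the overlap and with it the guarantee of reading the latest write.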
The overall objective of this paper is to improve CAP optimization methods by using optimization techniques outside of those preferred in the CAP optimization literature. The main contribution is to use relaxed ACID properties in the CAP optimization process. This may be viewed as a bridge between the CAP theorem and traditional ACID theory.
Traditional ACID properties are weakened, but not completely dropped, in order to optimize the CAP properties. From a user's point of view, such systems should thus function as if both the traditional ACID properties and all the CAP properties were implemented.
This optimization is especially important in mobile integrated databases, where disconnections are normal and frequent. It is also important in distributed databases such as EHR (Electronic Health Records) systems, where many different hospital locations are involved, since the risk of disconnections increases with the number of participating locations. We use distributed integrated EHR databases as an example where our optimization method may contribute.