The CAP theorem states that, of the three guarantees C, A and P, a distributed system can provide only two at any point in time.
– C stands for Consistency,
– A stands for Availability and
– P stands for Partition Tolerance.
1. History of CAP Theorem
The CAP theorem, also known as Brewer’s theorem after computer scientist Eric Brewer, states that it is impossible for a distributed computer system to simultaneously provide all three guarantees (C, A, P).
According to Eric Brewer, the theorem first appeared in 1998. In 2012 Brewer clarified some of his positions, including why the often-used “two out of three” formulation can be misleading.
2. Basics of CAP Theorem
I will try to explain in simple terms what each of the three terms means.
2.1 Consistency
All clients have the same view of the data, even immediately after an update or a delete.
2.2 Availability
Each client can always read and write.
A distributed system can achieve availability, but at a cost. How much cost is acceptable is a business trade-off, and that decision determines how available you make your system.
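To make the cost question concrete, it helps to translate an availability target into the downtime it allows. The sketch below is plain arithmetic; the function name `allowed_downtime_hours_per_year` and the sample targets are my own choices for illustration, not part of the theorem.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def allowed_downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in hours, for a target availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR


# Hypothetical targets, chosen only to show how fast the budget shrinks.
for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours_per_year(target)
    print(f"{target}% availability allows {hours:.2f} hours of downtime per year")
```

Each extra "nine" cuts the downtime budget by a factor of ten, which is usually where the cost discussion with the business starts.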
2.3 Partition tolerance
The system keeps working even when the network is physically partitioned.
In other words, the system continues to operate as expected even when nodes fail. Here, a partitioned network means the cluster has been split into groups, for example one group of working nodes and another group of failed or unreachable nodes.
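One way to picture a partition is to look at which nodes can still reach each other. The following is a minimal sketch, assuming the cluster is described by a list of node names and a list of links that still work; `partition_groups` and the node names are hypothetical and exist only for this illustration.

```python
from collections import deque


def partition_groups(nodes, links):
    """Group nodes that can still reach each other over the working links."""
    neighbours = {node: set() for node in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    groups, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for peer in neighbours[node]:
                if peer not in seen:
                    seen.add(peer)
                    queue.append(peer)
        groups.append(group)
    return groups


# Hypothetical five-node cluster in which the link between n3 and n4 has failed.
nodes = ["n1", "n2", "n3", "n4", "n5"]
working_links = [("n1", "n2"), ("n2", "n3"), ("n4", "n5")]
print(partition_groups(nodes, working_links))
```

If the function returns more than one group, the network is partitioned, and a partition-tolerant system has to decide how each group should behave.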
3. Triangular View of CAP Theorem
The CAP theorem is often drawn as an equilateral triangle with C, A and P at its corners. Each side of the triangle pairs two of the guarantees, and this view explains a lot at a high level:
- CA (Consistency and Availability): a single-site cluster in which all nodes are always in contact; when a partition occurs, the system blocks. Choose C and A at the expense of P (Partition Tolerance). Typical applications: banking and finance applications, and systems that must have transactions, e.g. those backed by an RDBMS.
- AP (Availability and Partition Tolerance): the system is still available under partitioning, but some of the data returned may be inaccurate. Choose A and P at the expense of C (Consistency). A typical use case: return the most recent version of the data you have, even though it could be stale. In this state the system also accepts writes, which are processed later, once the partition is resolved. Availability is also a compelling option when the system needs to continue to function in spite of external errors. Typical applications: shopping carts, any consumer-facing system, news publishing CMSs (e.g. the Times of India news site), etc.
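- CP (Consistency and Partition Tolerance): some data may not be accessible, but the rest is still consistent/accurate. Choose C and P at the expense of A (Availability), based on the requirement analysis. Note that Cassandra, Amazon’s DynamoDB, Voldemort, Riak and SimpleDB, which are sometimes listed here, are usually classified as AP stores by default, although several of them offer tunable or optional strong consistency; HBase and MongoDB are more commonly cited as CP examples.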
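The difference between CP and AP behaviour during a partition can be sketched with a toy model of two replicas holding one value. Everything here is an assumption made for illustration: the `Replica` class, the `make_pair` and `partition` helpers, and the version-based replay on healing are not how any particular database works, and conflicting writes on both sides of the partition are not handled.

```python
class Replica:
    """One copy of a single value in a two-replica toy cluster."""

    def __init__(self, name, mode):
        self.name = name
        self.mode = mode            # "CP" or "AP"
        self.value, self.version = "v0", 0
        self.peer = None
        self.connected = True       # False while partitioned from the peer
        self.pending = []           # AP only: writes waiting for the peer

    def write(self, value):
        if not self.connected and self.mode == "CP":
            # CP: refuse the write rather than let the copies diverge.
            raise RuntimeError(f"{self.name} is unavailable during the partition")
        self.version += 1
        self.value = value
        if self.connected:
            self.peer.value, self.peer.version = value, self.version
        else:
            # AP: accept locally and replay once the partition is resolved.
            self.pending.append((self.version, value))

    def read(self):
        if not self.connected and self.mode == "CP":
            raise RuntimeError(f"{self.name} is unavailable during the partition")
        return self.value           # under AP this may be stale

    def heal(self):
        """Reconnect and push pending writes to the peer (simplified replay)."""
        self.connected = self.peer.connected = True
        for version, value in self.pending:
            if version > self.peer.version:
                self.peer.value, self.peer.version = value, version
        self.pending.clear()


def make_pair(mode):
    a, b = Replica("a", mode), Replica("b", mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a, b):
    a.connected = b.connected = False


a, b = make_pair("AP")
a.write("v1")
partition(a, b)
a.write("v2")                       # accepted locally, queued for later
print("AP read on b during partition:", b.read())
a.heal()
print("AP read on b after healing:", b.read())

c, d = make_pair("CP")
partition(c, d)
try:
    c.write("v1")
except RuntimeError as error:
    print("CP write during partition:", error)
```

The AP pair keeps answering but serves the older value on one side until the partition heals; the CP pair stays consistent by refusing requests it cannot replicate.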
4. When to Opt for What?
In a nutshell, the following key points can guide the choice. Deciding can be difficult, but most of the time the first of the three guarantees is clear from the requirements; it is the second one that must be chosen carefully, drawing on your experience.
- Choose Consistency over Availability when your business requirements dictate atomic reads and writes.
- Choose Availability over Consistency when the business requirements allow some flexibility, so that data can be synchronized after an acceptable delay (for example, once a failed network is restored).
- Choosing between Consistency and Availability is a software trade-off.
- For any set of business requirements, one of the three (C, A or P) will be the clear primary choice; the second should be chosen carefully.
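The first two rules above can be written down as a small decision helper. This is only a sketch of the rule of thumb; the `Requirements` fields and the `preferred_guarantee` function are names I made up, and a real decision will involve far more inputs than two booleans.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    needs_atomic_reads_writes: bool   # hypothetical flag for illustration
    tolerates_sync_delay: bool        # hypothetical flag for illustration


def preferred_guarantee(req: Requirements) -> str:
    """Apply the rule of thumb: atomic reads/writes favour consistency."""
    if req.needs_atomic_reads_writes:
        return "consistency"
    if req.tolerates_sync_delay:
        return "availability"
    return "undecided"  # the requirements need more analysis


print(preferred_guarantee(Requirements(needs_atomic_reads_writes=True, tolerates_sync_delay=False)))
print(preferred_guarantee(Requirements(needs_atomic_reads_writes=False, tolerates_sync_delay=True)))
```

I check the atomicity requirement first because it is the stricter of the two: if the business needs atomic reads and writes, tolerance for delay elsewhere does not remove that need.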
As an architect, the control is in your hands: you choose what the system does when it faces a network partition. Network failure is inevitable; don’t assume the network will not fail, assume that it will, and that the only question is when. Network outages can be short or long, and they can take different forms.
Building distributed systems provides many advantages, but it also adds complexity. Understanding the trade-offs available to you when facing network errors, and choosing the right path, is crucial to the success of your application.
In a nutshell, first analyse the requirements of your application, its database and its non-functional requirements (NFRs), and ask yourself questions such as:
- Do you require linear scalability?
- What type of data do you have?
- What extent of data loss is acceptable?
- No downtime?
- No data loss?
- Performance under workload (whether read-heavy, write-heavy or read-write-heavy)?
- How much consistency do you require?
- Which philosophy do you choose: ACID or BASE?
- Any other requirements, etc.