Learn the caveats a relational developer faces in Apache Cassandra. Key points: normalization, focus on query, physical topology of the cluster, joins, and more.
1. Caveats for Relational Developer in Apache Cassandra
Most of us come from a relational background, so we tend to think in relational terms at first. Apache Cassandra is a well-known column-oriented database in the NoSQL big data category. Let us discuss its primary caveats for the relational developer one by one. If you have a background in relational databases, you have to step out of that mindset and think from a different perspective, because RDBMS best practices can be bad practices in Cassandra. If you are new to Apache Cassandra, you can visit What is Apache Cassandra?
In the software world, we come from different backgrounds, with different roles and responsibilities in a company: hard-core developers, project managers, architects, product managers, operations teams, DevOps teams, IT infrastructure, business, and process. Each of them thinks from a different perspective. For instance, as an architect, you have to think about the different NFRs (Non-Functional Requirements) for your business use case. The following points should help with that.
2. Key points of caveats
The following are the primary caveats for a relational developer in Apache Cassandra:
- No Normalization
- De-normalization
- Focus on Query
- Physical Topology of Cluster
- Collection (Set, List, Map)
- No Foreign key & no Referential Integrity
- No Join
- No Sequence
- Trade-off in ACID
- Column oriented storage engine
- Schema-less & Data Type
Apache Cassandra is not designed for a normalized data structure. You can normalize your data, but you pay for it at read time, and it is an anti-pattern in Cassandra. In other words, normalization is a best practice in any RDBMS but a bad practice in Cassandra.
Apache Cassandra encourages denormalization: you duplicate data across tables so that each query can be answered from a single table in its storage engine.
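To make the difference between the two styles concrete, here is a minimal sketch that uses plain Python dictionaries as stand-ins for tables. The table names (`users`, `orders_normalized`, `orders_by_user`) and their fields are hypothetical and only illustrate the idea; this is not Cassandra driver code.

```python
# Plain dicts stand in for tables; all names and fields are hypothetical.

# Normalized (relational style): the order refers to the user by id only,
# so showing an order with the customer's name needs a second lookup (a join).
users = {"u1": {"name": "Alice", "city": "Pune"}}
orders_normalized = {"o1": {"user_id": "u1", "item": "book"}}

# Denormalized (Cassandra style): the user's name is copied into each order
# row at write time, so a single read answers the query.
orders_by_user = {
    "u1": [{"order_id": "o1", "item": "book", "user_name": "Alice"}],
}


def orders_with_names(user_id):
    """Return order rows for one user without touching the users table."""
    return orders_by_user.get(user_id, [])


rows = orders_with_names("u1")
print(rows)
```

The price of the denormalized layout is duplicated data: if the user's name changes, every copy has to be rewritten by the application.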
In the relational world, you first identify the business domain model and start storing data, and only later do you work out how to pull the data. In Apache Cassandra, you have to reverse your thinking: first decide what data you want to get and what the query for it would be, then decide how to store the data based on that query. This determines how you choose the primary key (partition key and clustering key) and the data types so that data is distributed well and the application can scale out.
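The query-first approach can be sketched in a few lines of Python. The example assumes a hypothetical query, "latest readings for one sensor", and models a table whose partition key is `sensor_id` and whose clustering key is a timestamp. The names and the in-memory layout are illustrative assumptions, not how Cassandra is implemented.

```python
from bisect import insort
from collections import defaultdict

# Query to support (decided first): "latest readings for one sensor".
# That query fixes the key design: partition by sensor_id, cluster by timestamp.
readings_by_sensor = defaultdict(list)  # partition key -> rows kept in sorted order


def insert_reading(sensor_id, ts, value):
    # insort keeps each partition ordered by (ts, value),
    # similar in spirit to ordering rows by a clustering column.
    insort(readings_by_sensor[sensor_id], (ts, value))


def latest_readings(sensor_id, limit):
    # Single-partition read that returns the newest rows first.
    return readings_by_sensor[sensor_id][-limit:][::-1]


insert_reading("s1", 30, 21.5)
insert_reading("s1", 10, 20.0)
insert_reading("s1", 20, 20.7)
insert_reading("s2", 15, 18.2)

latest = latest_readings("s1", 2)
print(latest)
```

Because the query was chosen first, the read touches one partition only and needs no sorting at read time.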
The physical topology is different from the RDBMS world. Cassandra has a peer-to-peer architecture, which means there is no primary and no secondary, no master and no slave; all the nodes in the cluster are treated equally.
In an RDBMS, we keep the data straightforward, based on the data types available for that database. Apache Cassandra supports collection data types: Set (which keeps unique values), List (an ordered list of values of any type), and Map (keys and values), where key and value can be of any type. Set, Map, and List here are similar to the types of the same name in Java. Cassandra is developed in Java, so these collection types are derived from the Java language, and almost all languages have equivalents.
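As a rough illustration of how the three collection types behave, here are their closest Python analogues. The row and its field names are hypothetical; in CQL the matching column types would be declared as `set<text>`, `list<text>`, and `map<text, text>`.

```python
# Python analogues of the three collection types; the row is hypothetical.
user_row = {
    "emails": set(),  # set: unique values only
    "logins": [],     # list: ordered, duplicates allowed
    "prefs": {},      # map: key -> value
}

user_row["emails"].add("a@example.com")
user_row["emails"].add("a@example.com")  # duplicate is ignored

user_row["logins"].append("2024-01-01")
user_row["logins"].append("2024-01-01")  # duplicate is kept

user_row["prefs"]["theme"] = "dark"
user_row["prefs"]["theme"] = "light"     # same key: value is overwritten

print(user_row)
```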
Foreign keys and referential integrity checks are among the beauties of an RDBMS, but they do not exist in Apache Cassandra. Cassandra does not support any form of foreign key or referential integrity check.
Most NoSQL databases do not support joins, so that the system can scale out. Apache Cassandra likewise does not support any type of join at the database level. If joins are not supported, how can you relate different entities? This basic question should come to mind, along with the workaround. One option is to perform the join at the application level rather than the database level, but remember that you pay a performance cost for doing so.
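An application-level join might look like the sketch below. Two dictionaries stand in for two tables, and the function issues one lookup per table and merges the results in code. All table and field names are hypothetical.

```python
# Two lookups joined in application code; tables and fields are hypothetical.
authors_by_id = {
    "a1": {"name": "Alice"},
    "a2": {"name": "Bob"},
}
posts_by_author = {
    "a1": [
        {"post_id": "p1", "title": "Intro"},
        {"post_id": "p2", "title": "Keys"},
    ],
}


def posts_with_author_name(author_id):
    author = authors_by_id.get(author_id)        # first query
    if author is None:
        return []
    posts = posts_by_author.get(author_id, [])   # second query
    # Merge the two results in the application instead of in the database.
    return [{**post, "author_name": author["name"]} for post in posts]


joined = posts_with_author_name("a1")
print(joined)
```

The performance cost mentioned above shows up here as the extra round trip: every joined read needs two queries instead of one.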
Apache Cassandra does not support sequences like an RDBMS does. It does support a counter data type to increment a counter value for your business needs, such as the number of page views of a specific URL in your application.
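The page-view use case can be pictured with Python's `collections.Counter`. This is only an in-memory analogy with hypothetical URLs; a counter tracks how many times something happened and is not a substitute for a sequence that hands out unique IDs.

```python
from collections import Counter

# In-memory stand-in for a table with one counter value per URL.
page_views = Counter()


def record_view(url):
    # Increment the counter for this URL by one.
    page_views[url] += 1


for url in ["/home", "/about", "/home", "/home"]:
    record_view(url)

print(page_views["/home"], page_views["/about"])
```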
According to the CAP theorem, Apache Cassandra favors A (Availability) and P (Partition Tolerance) and offers eventual consistency. There is a trade-off between ACID and BASE, and you have to choose according to your business need.
Cassandra stores data in a Map structure. It does not store data in a completely row-oriented fashion like an RDBMS.
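One common way to picture this is a map of maps: a row key points to a set of column name and value pairs, and different rows may carry different columns. The sketch below is a simplified mental model with hypothetical names, not Cassandra's actual on-disk format.

```python
# row key -> {column name -> value}; rows may hold different columns.
table = {}


def put(row_key, column, value):
    # Create the row on first write, then set or overwrite the column.
    table.setdefault(row_key, {})[column] = value


def get_row(row_key):
    # Return the row's columns sorted by column name.
    return sorted(table.get(row_key, {}).items())


put("u1", "name", "Alice")
put("u1", "city", "Pune")
put("u2", "name", "Bob")  # u2 has no "city" column at all

print(get_row("u1"))
print(get_row("u2"))
```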
Apache Cassandra can store all types of data: structured, semi-structured, and unstructured. Cassandra is not completely schema-less, but because of its flexible schema and its support for complex data types, it is often described as schema-less.
3. Conclusion
In a nutshell, to understand the caveats for a relational developer, consider the following things that Apache Cassandra provides and is well known for, and weigh them against your business use case:
- Scale-out (horizontal scaling)
- Fault tolerance
- High availability
- Distributed database
- Massive storage, in the petabyte to exabyte range
- Very fast writes and optimized reads
- Eventual consistency
If you want to dive deeper into Cassandra, you can visit Key Concepts Data Structures and Algorithms In Cassandra.
4. Reference
Apache Cassandra official Site
I hope you enjoyed this post about the caveats for a relational developer in Apache Cassandra. You can visit the Apache Cassandra tutorial for more blog posts.
Your suggestions or comments are welcome to improve this post. Happy Learning! 🙂