Apache Cassandra is a well-known column-oriented database in the NoSQL big data space. Let us discuss, one by one, its primary caveats for the relational developer: if you come from a relational background, you have to step out of it and think from a different perspective. If you are new to Apache Cassandra, you can visit my previous post, "What is Apache Cassandra?". RDBMS best practices can be bad practices in Cassandra.
In the software world, we come from different backgrounds, with different roles and responsibilities in a company: hard-core developer, project manager, architect, product manager, operations team, DevOps team, IT infrastructure, business, process. Each of them thinks from a different perspective. As an architect, you have to think about the different NFRs (non-functional requirements) for your business use case. The following points will certainly help with your business needs.
The following are the primary caveats for the relational developer in Apache Cassandra:
- No normalization
- Focus on the query
- Physical topology of the cluster
- Collections (Set, List, Map)
- No foreign keys and no referential integrity
- No joins
- No sequences
- Trade-off in ACID
- Column-oriented storage engine
- Schema-less and data types
Apache Cassandra is not designed for normalized data structures; if you normalize anyway, you pay for it, because normalization is an anti-pattern in Cassandra. Normalization is a best practice in any RDBMS but a bad practice in Cassandra.
Apache Cassandra not only allows but encourages you to denormalize your data structure before storing it in its storage engine.
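To make denormalization concrete, here is a minimal Python sketch. It is a toy in-memory model, not Cassandra's API, and the table names `orders_by_customer` and `orders_by_product` are made up for illustration. The same order is written twice, once per query it has to serve.

```python
from collections import defaultdict

# Two hypothetical query tables, each keyed by what one query looks up.
orders_by_customer = defaultdict(list)
orders_by_product = defaultdict(list)


def record_order(order_id, customer, product, quantity):
    """Write the same order into both tables instead of normalizing it."""
    row = {"order_id": order_id, "customer": customer,
           "product": product, "quantity": quantity}
    orders_by_customer[customer].append(dict(row))
    orders_by_product[product].append(dict(row))


record_order(1, "alice", "keyboard", 2)
record_order(2, "alice", "mouse", 1)
record_order(3, "bob", "keyboard", 1)

# Each query is a single lookup; no join is needed.
alice_orders = [r["order_id"] for r in orders_by_customer["alice"]]
keyboard_buyers = [r["customer"] for r in orders_by_product["keyboard"]]
```

The price is duplicated data and an extra write per order; the gain is that every read touches exactly one table.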
In the relational world, you first identify the business domain model, start storing data, and only later work out how to pull the data back. In Apache Cassandra you have to reverse your thinking: first think about what data you want to get and what the query for it would be, and then, based on that query, think about how to store the data and how to choose the primary key (partition key plus clustering key) and the data types, in order to distribute the data and scale out the application.
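The query-first idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the query is taken to be "page views of one URL within a time range", the table name `page_views_by_url` is hypothetical, and timestamps are plain integers. The URL plays the role of the partition key and the timestamp the role of the clustering key.

```python
import bisect
from collections import defaultdict

# partition key (url) -> rows kept sorted by clustering key (viewed_at)
page_views_by_url = defaultdict(list)


def insert_view(url, viewed_at, visitor):
    """Store a row under its partition key, ordered by the clustering key."""
    bisect.insort(page_views_by_url[url], (viewed_at, visitor))


def views_between(url, start, end):
    """Range query inside one partition: start <= viewed_at < end."""
    rows = page_views_by_url[url]
    lo = bisect.bisect_left(rows, (start,))
    hi = bisect.bisect_left(rows, (end,))
    return rows[lo:hi]


insert_view("/home", 30, "carol")
insert_view("/home", 10, "alice")
insert_view("/home", 20, "bob")
insert_view("/about", 15, "dave")

home_views = views_between("/home", 10, 30)
```

Because the rows are stored in the order the query needs, the read is a slice of one partition rather than a scan or a sort at query time.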
The physical topology is different from the RDBMS world. Cassandra has a peer-to-peer architecture: there is no primary and no secondary, no master and no slave, and all the nodes in the cluster are treated equally.
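One way to picture a cluster of equal peers is a hash ring, where each node owns part of the key space and no node is special. The Python sketch below is a toy under stated assumptions: the `Ring` class and the node names are hypothetical, each node gets a single token, and SHA-256 stands in for the partitioner.

```python
import bisect
import hashlib


class Ring:
    """Toy hash ring; every node is an equal peer."""

    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = replication_factor
        # One token per node, derived from its name (an assumption of this sketch).
        self.tokens = sorted((self._token(n), n) for n in nodes)

    @staticmethod
    def _token(value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def replicas(self, partition_key):
        """Walk clockwise from the key's token and pick the next nodes."""
        start = bisect.bisect_right(self.tokens, (self._token(partition_key), ""))
        count = min(self.replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c"])
owners = ring.replicas("user:42")
```

Any node can compute `replicas` for any key, which is why a client can talk to whichever peer it likes.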
In an RDBMS, we keep the data straightforward, based on the data types available in that RDBMS. Apache Cassandra also supports complex data types, i.e. collections:
- Set (which keeps unique values)
- List (a list of values of any type)
- Map (keys and values), where the key and the value can be of any type

Here, Set, List, and Map are the same kinds of data types as in Java. Cassandra is developed in Java, so these complex data types are derived from the Java language, although almost all languages have such types.
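Since almost every language has these types, the behaviour of the three collections can be shown with Python's built-in `set`, `list`, and `dict`. This is a plain in-memory sketch of one row of a hypothetical `users` table, not Cassandra's API; the column names are made up.

```python
# One row of a hypothetical "users" table with collection-typed columns.
user = {
    "user_id": 42,
    "emails": {"a@example.com"},          # set: unique values
    "recent_logins": ["2024-01-01"],      # list: ordered, duplicates allowed
    "preferences": {"theme": "dark"},     # map: key -> value
}

user["emails"].add("a@example.com")          # duplicate is ignored by the set
user["emails"].add("b@example.com")
user["recent_logins"].append("2024-01-01")   # the list keeps the duplicate
user["preferences"]["language"] = "en"       # add or overwrite one map entry
```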
Foreign keys and referential integrity checks are among the beauties of an RDBMS, but not of Apache Cassandra. Cassandra does not support any form of foreign key or referential integrity check.
Most NoSQL databases do not support joins, so that the system can scale out. Apache Cassandra likewise does not support any type of join at the database level. If joins are not supported, how can you relate different entities? This is a very basic question that should come to mind, along with what the workarounds are. One option is to join at the application level rather than the database level, but remember that if you do so, you pay a performance cost.
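An application-level join might look like the Python sketch below. The `users` and `orders` data and the helper name `orders_with_user_names` are hypothetical; in a real application each lookup would be a separate query to the database, which is where the performance cost comes from.

```python
users = {1: {"name": "alice"}, 2: {"name": "bob"}}
orders = [
    {"order_id": 10, "user_id": 1, "total": 25.0},
    {"order_id": 11, "user_id": 2, "total": 40.0},
    {"order_id": 12, "user_id": 1, "total": 5.0},
]


def orders_with_user_names(orders, users):
    """Join in application code: one extra lookup per order row."""
    joined = []
    for order in orders:
        user = users.get(order["user_id"], {})
        joined.append({**order, "user_name": user.get("name")})
    return joined


joined_rows = orders_with_user_names(orders, users)
```

Note that nothing stops an order from pointing at a user that does not exist; here such a row simply gets `None` as its `user_name`.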
Apache Cassandra does not support sequences as an RDBMS does. It supports a counter data type to increment a counter value for your business needs, such as the number of page views of a specific URL in your application.
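The page-view example can be sketched with Python's `collections.Counter`. This only illustrates the idea of an increment-only value per key; the function name `record_page_view` is made up, and nothing here is Cassandra's API.

```python
from collections import Counter

# url -> number of page views
page_views = Counter()


def record_page_view(url, by=1):
    """Increment the counter for one URL; it counts, it does not issue IDs."""
    page_views[url] += by


record_page_view("/home")
record_page_view("/home")
record_page_view("/pricing", by=3)
```

Unlike a sequence, such a counter tells you how many times something happened; it is not a source of unique row identifiers.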
According to the CAP theorem, Apache Cassandra favors A (availability) and P (partition tolerance), and offers eventual consistency. This is a trade-off between ACID and BASE that you have to choose according to your business needs.
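The trade-off can be reasoned about with a little arithmetic. A common rule of thumb for replicated stores is that a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written is greater than the replication factor. The Python sketch below only encodes that rule; the function names are hypothetical.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor, write_acks, read_acks):
    """True when every read replica set must overlap every write replica set."""
    return read_acks + write_acks > replication_factor


# With 3 replicas: quorum writes plus quorum reads overlap...
strong = read_sees_latest_write(3, quorum(3), quorum(3))
# ...while single-replica writes and reads may miss each other for a while.
eventual = read_sees_latest_write(3, 1, 1)
```

Waiting for fewer replicas gives lower latency and higher availability, at the cost of reads that may briefly return stale data.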
Cassandra stores data in a Map-like structure. It does not store data in a completely row-oriented fashion like an RDBMS.
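This storage model is often described as a map of sorted maps: a row key points to a set of columns kept in sorted order. The Python sketch below is a simplified picture of that idea, not the actual storage engine; the helpers `put` and `get_row` are made up for illustration.

```python
# row key -> {column name -> value}; rows need not share the same columns.
table = {}


def put(row_key, column, value):
    """Set one column of one row, creating the row if needed."""
    table.setdefault(row_key, {})[column] = value


def get_row(row_key):
    """Return the row's columns sorted by column name."""
    return sorted(table.get(row_key, {}).items())


put("user:1", "name", "alice")
put("user:1", "email", "a@example.com")
put("user:2", "name", "bob")

row_one = get_row("user:1")
row_two = get_row("user:2")
```

Here `user:2` has no `email` column at all, rather than an empty one, which is the main difference from a fixed row layout.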
Apache Cassandra supports all types of data: structured, semi-structured, and unstructured. Cassandra is not completely schema-less, but because of its flexible schema and its support for complex data types, you can loosely call it schema-less.
In a nutshell, you have to think about the following things for your business use case, which Apache Cassandra provides and is well known for:
- Scale-out (horizontal scaling)
- High availability
- Distributed database
- Massive storage capacity (petabyte scale and beyond)
- Very fast writes and optimized reads
- Eventual consistency
Your suggestions are welcome to improve this article 🙂