Learn the key concepts, data structures, and algorithms in Cassandra. Some of these concepts come from Amazon’s Dynamo and others from Google’s Bigtable, which is why Cassandra is best known for its rich feature set in the NoSQL space.
1. Key Concepts, Data Structures, and Algorithms in Cassandra
For any NoSQL database, its key concepts are the basic building blocks and the backbone of the database engine. Apache Cassandra has a rich set of features because it takes some of its concepts from Amazon’s Dynamo database and others from Google’s Bigtable. Apache Cassandra is an amalgamation of Bigtable and Dynamo, which is why it is often regarded as one of the strongest NoSQL databases in the column-oriented family.
2. Key Concepts of Apache Cassandra
The following are some of the key concepts used in Apache Cassandra.
- Gossip protocol:
The Gossip protocol works like real-world gossip: a node, say “A”, tells a few of its peers in the cluster what it knows about the state of another node, say “B”. Those nodes tell a few other nodes about node “B”, and over time all the nodes learn about node “B” and its peers. Cassandra is a peer-to-peer system with no single point of failure, and the cluster topology information is communicated via the Gossip protocol.
- Partitioners:
A partitioner determines how data is distributed across the nodes in the cluster. The Murmur3Partitioner is the default partitioner. You can also create your own partitioner by implementing the org.apache.cassandra.dht.IPartitioner interface and placing it on Cassandra’s classpath.
- Replication Strategies:
The first replica is always the node that claims the range in which the token falls; the remaining replicas are placed according to the replication strategy chosen when the keyspace is created.
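To make the partitioner and replica placement ideas above more concrete, here is a minimal Python sketch of a token ring. It is not Cassandra’s implementation: MD5 from the standard library stands in for Murmur3, the four node names and their tokens are made up, and the function names are hypothetical. The "take the next nodes clockwise" placement is an assumption that only mirrors what a simple, topology-unaware replication strategy would do.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a partition key to a 64-bit token (MD5 is a stand-in for Murmur3)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for_token(token, ring, replication_factor):
    """Return the names of the nodes that hold the given token.

    ring is a list of (token, node_name) pairs sorted by token. Each node
    claims the range that ends at its own token, so the first replica is the
    first node whose token is >= the given token, wrapping around the ring.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))
    # Remaining replicas: the next nodes clockwise. This is an assumption
    # that ignores racks and datacenters.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def replicas_for_key(key, ring, replication_factor):
    """Hash the key, then look up its replicas on the ring."""
    return replicas_for_token(token_for(key), ring, replication_factor)


# Hypothetical four-node cluster with evenly spaced tokens.
RING = sorted([
    (0, "node-a"),
    (2**62, "node-b"),
    (2**63, "node-c"),
    (3 * 2**62, "node-d"),
])

if __name__ == "__main__":
    print(replicas_for_key("user:42", RING, 3))
```

With a replication factor of 3, every key lands on three consecutive nodes of the ring. A rack-aware or datacenter-aware strategy would change only how the nodes after the first replica are picked.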
3. Key Data Structures and Algorithms in Apache Cassandra
The following are some of the key data structures and algorithms used in Apache Cassandra.
- memtable:
The memtable is a memory-resident data structure. After a write is recorded in the commit log, the value is written to the memtable.
- SSTable:
SSTable is short for Sorted String Table. When the number of objects stored in the memtable reaches a threshold, the contents of the memtable are flushed to disk in a file called an SSTable. The SSTable is a concept borrowed from Google’s Bigtable. Once a memtable is flushed to disk as an SSTable, it is immutable and cannot be changed by the application.
- Tombstone:
A tombstone is Cassandra’s version of a soft delete. It is a deletion marker that is required to suppress older data in SSTables until compaction can run.
- Bloom Filters:
Bloom filters are very fast, probabilistic data structures for checking whether an element is a member of a set. They are probabilistic because it is possible to get a false positive from a Bloom filter, but never a false negative. Bloom filters are also used in other distributed database and caching technologies, such as Apache Hadoop, Google’s Bigtable, and Squid Proxy Cache.
- Snitches:
A snitch determines which datacenters and racks nodes belong to. This helps when replicating data to other nodes.
- Commit Log:
When you perform a write operation, it is immediately written to a commit log. The commit log is a crash-recovery mechanism that supports Cassandra’s durability goals.
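The commit log, memtable, SSTable, and tombstone entries above fit together into a single write path, which can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions, not Cassandra’s storage engine: the class name TinyStore, its methods, and the flush_threshold parameter are all hypothetical; "disk" is just a Python list; the commit log is cleared in one go after a flush; and compaction drops tombstones immediately, whereas a real system would keep them for a grace period.

```python
TOMBSTONE = object()  # sentinel value that marks a key as deleted


class TinyStore:
    """Toy model of the write path: commit log -> memtable -> SSTable."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of unflushed mutations
        self.memtable = {}     # in-memory writes not yet flushed
        self.sstables = []     # immutable flushed tables, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # record for crash recovery first
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        # A delete is just a write of a tombstone marker.
        self.write(key, TOMBSTONE)

    def flush(self):
        if not self.memtable:
            return
        # Sort by key and freeze into a tuple: the SSTable is immutable.
        rows = tuple(sorted(self.memtable.items(), key=lambda kv: kv[0]))
        self.sstables.append(rows)
        self.memtable = {}
        self.commit_log.clear()  # simplification: flushed data needs no replay

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables newest first.
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for rows in reversed(self.sstables):
            for k, v in rows:
                if k == key:
                    return None if v is TOMBSTONE else v
        return None

    def compact(self):
        # Merge all SSTables (newer values overwrite older ones) and drop
        # tombstoned keys, which is when deleted data is physically removed.
        merged = {}
        for rows in self.sstables:
            merged.update(rows)
        live = tuple(sorted(
            ((k, v) for k, v in merged.items() if v is not TOMBSTONE),
            key=lambda kv: kv[0],
        ))
        self.sstables = [live] if live else []


if __name__ == "__main__":
    store = TinyStore(flush_threshold=2)
    store.write("a", 1)
    store.write("b", 2)   # reaches the threshold and triggers a flush
    store.delete("a")     # tombstone hides the older value in the SSTable
    print(store.read("a"), store.read("b"))
```

Note how the delete never touches the existing SSTable. The tombstone sits in newer data and suppresses the old value on reads until compaction rewrites the tables without it.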
I hope you enjoyed this post on the key concepts, data structures, and algorithms in Cassandra. You can visit the Apache Cassandra tutorial for more blog posts.
Your suggestions or comments are welcome to improve this post.