What is Apache Cassandra?

Apache Cassandra is a fault-tolerant, massively scalable NoSQL database.

Cassandra in 50 Words

“Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, row-oriented database that bases its distribution design on Amazon’s Dynamo and its data model on Google’s Bigtable. Created at Facebook, it is now used at some of the most popular sites on the Web.”

I picked the 50 words above from Cassandra: The Definitive Guide, 2nd Edition (O’Reilly Media). I highly recommend reading this book if you really want to go in depth on Apache Cassandra.

If you want to learn Apache Cassandra quickly, in a few days, you can go for Apache Cassandra Essential. I’m the lead reviewer of this book, and I know that once you pick it up you will love learning Cassandra quickly.

Key Features of Cassandra

  1. High Availability
  2. NO SPOF (Single Point of Failure)
  3. Horizontal Scaling (Linear Scalability / Scale Out)
  4. Peer-to-peer Architecture (no primary/secondary)
  5. Eventual Consistency
  6. Tunable trade-off between consistency and latency
  7. Minimum Administration

Fig: Apache Cassandra High Level Features
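
The tunable trade-off between consistency and latency (feature 6) can be illustrated with the usual quorum arithmetic: with N replicas of each row, a read that touches R replicas and a write acknowledged by W replicas must overlap on at least one replica whenever R + W > N. The snippet below is only a sketch of that arithmetic; the function names are hypothetical and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority for the given replication factor."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor, for illustration only
    q = quorum(rf)
    # Majority reads and majority writes: overlap is guaranteed,
    # but each request waits for more replicas to answer.
    print(is_strongly_consistent(q, q, rf))
    # Single-replica reads and writes: lowest latency,
    # but a read may miss the latest write (eventual consistency).
    print(is_strongly_consistent(1, 1, rf))
```

Waiting for fewer replicas lowers latency, while waiting for more raises the consistency guarantee, which is the trade-off the feature list refers to.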

Cassandra Operations per Sec

Fig: Apache Cassandra ops/sec

Who Is Using Cassandra?

Many companies worldwide use Apache Cassandra, among them Netflix, Twitter, Cisco, Rackspace, Constant Contact, and Reddit. The largest known Cassandra cluster holds over 300 TB of data on more than 400 machines.

Architecture

Shared Nothing Architecture
Cassandra has a shared-nothing architecture: it has no central controller and no notion of master/slave. All of its nodes are the same, which makes it a peer-to-peer architecture.
Shared-nothing architecture was more recently popularized by Google, whose systems, such as the Bigtable database and its MapReduce implementation, do not share state and are therefore capable of near-infinite scaling.

For more details, see: http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf

Cassandra is distributed, which means that it is capable of running on multiple machines.
The fact that Cassandra is decentralized means that there is no single point of failure. All of the nodes in a Cassandra cluster function exactly the same. This is sometimes referred to as “server symmetry.” Because they are all doing the same thing, no node has a special coordinating role that the others depend on.

In short, because Cassandra is distributed and decentralized, there is NO SPOF (single point of failure), which supports high availability.
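
To make the "no SPOF" point concrete, here is a toy model of a cluster of identical peers in which every key is stored on several nodes. It is a simplified sketch, not Cassandra's actual replica placement: the `Cluster` class, its method names, the node names, and the modulo-based placement are all assumptions made for illustration.

```python
import hashlib


class Cluster:
    """Toy model of symmetric peers: no node has a special role."""

    def __init__(self, nodes, replication_factor=3):
        self.nodes = sorted(nodes)
        self.rf = replication_factor
        self.down = set()  # names of nodes currently unavailable

    def replicas(self, key):
        # Pick a starting node from the key's hash, then take the next
        # rf - 1 peers in order (illustrative placement only).
        start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.rf)]

    def can_serve(self, key, required=1):
        # A request succeeds if at least `required` replicas are alive.
        alive = [n for n in self.replicas(key) if n not in self.down]
        return len(alive) >= required


if __name__ == "__main__":
    cluster = Cluster(["node1", "node2", "node3", "node4"])
    cluster.down.add("node2")  # simulate one failed peer
    print(cluster.replicas("user:42"))
    print(cluster.can_serve("user:42"))
```

Because each key lives on three of the four peers in this sketch, losing any single node still leaves at least two live replicas for every key.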

Elastic Scalability
Scalability is an architectural feature of a system that can continue serving a greater number of requests with little degradation in performance. Vertical scaling (simply adding more hardware capacity and memory to your existing machine) is the easiest way to achieve this. Horizontal scaling means adding more machines that hold all or part of the data, so that no single machine has to bear the whole burden of serving requests.
Elastic scalability refers to a special property of horizontal scalability: the cluster can seamlessly scale up and scale back down.
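
One reason a Dynamo-style design can grow and shrink smoothly is consistent hashing: when a node joins the ring, only the keys that fall into its new token ranges change owner. The sketch below illustrates the idea under stated assumptions. The `Ring` class and `token` helper are hypothetical names, and MD5 is used only because it ships with Python; it is not meant to reproduce Cassandra's own partitioner.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (illustrative hash choice)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    """Minimal consistent-hash ring with several tokens per node."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.tokens = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node claims `vnodes` positions spread around the ring.
        for i in range(self.vnodes):
            bisect.insort(self.tokens, (token(f"{node}-{i}"), node))

    def owner(self, key):
        # The owner is the node holding the first token at or after the
        # key's token, wrapping around at the end of the ring.
        idx = bisect.bisect(self.tokens, (token(key), ""))
        return self.tokens[idx % len(self.tokens)][1]


if __name__ == "__main__":
    ring = Ring(["node1", "node2", "node3"])
    keys = [f"user{i}" for i in range(10_000)]
    before = {k: ring.owner(k) for k in keys}
    ring.add("node4")  # scale out by one machine
    moved = sum(1 for k in keys if ring.owner(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved to the new node")
```

In this sketch every key that moves lands on the new node and the rest stay put, which is what lets a cluster add capacity without reshuffling all of its data.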

High Availability and Fault Tolerance
The availability of a system is measured according to its ability to fulfill requests.
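
As a rough illustration of measuring availability by fulfilled requests, the sketch below computes the fraction of requests served and converts an availability figure into the downtime it allows per year. The function names and the sample numbers are made up for the example.

```python
HOURS_PER_YEAR = 365 * 24


def availability(fulfilled: int, total: int) -> float:
    """Fraction of requests that were fulfilled."""
    if total <= 0:
        raise ValueError("total must be positive")
    return fulfilled / total


def downtime_hours_per_year(avail: float) -> float:
    """Hours per year a system may be unavailable at the given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    a = availability(fulfilled=999, total=1000)  # sample numbers
    print(a, downtime_hours_per_year(a))
```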

History of Apache Cassandra

Cassandra was born from the marriage of Google’s Bigtable and Amazon’s Dynamo papers. Development started at Facebook in 2006, in the Java language.

Fig: Apache Cassandra Born from Bigtable and Dynamo

Reference

  1. Apache Cassandra
  2. Apache Cassandra Essential

Your comments or suggestions are welcome to improve this post. cheers 🙂

