What is Cassandra?

Ebru Dalkır
4 min read · Nov 23, 2022

Apache Cassandra is an open-source NoSQL database with a scalable, distributed, fault-tolerant, and decentralized architecture, designed for storing large amounts of data. With Cassandra, that data can be managed across multiple data centers.

Typical use cases include Internet of Things applications, messaging applications, social media applications, and recommendation systems.

Briefly, the key terms used in Apache Cassandra are as follows.

  • Node: The place where the data is stored (server or virtual server)
  • DataCenter: A collection of related nodes
  • Cluster: A collection of one or more data centers
  • Replication: Keeping copies of the data stored on one node on other nodes. A node that holds a copy of the data is called a replica.

Apache Cassandra’s key features:

Hybrid: On-premises, private cloud, and public cloud deployments can be integrated without interruption.

Fault Tolerance: Cassandra keeps copies of the data on multiple nodes. This is called replication. Because of it, a problem with one node does not cause data loss, and the system is more resilient to failures. When a node has a problem, read/write requests are forwarded to other nodes. Replication also lets several replicas serve concurrent requests, which increases performance.
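
To make the failover idea concrete, here is a minimal Python sketch. It is a toy model, not Cassandra's actual implementation: the `Node` class and the `write_to_replicas` and `read_with_failover` helpers are hypothetical names introduced for illustration, and the sketch assumes that a node that is down simply refuses requests.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy node: a name, an up/down flag, and a local key-value store."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


def write_to_replicas(replicas, key, value):
    """Store the value on every replica that is currently up."""
    for node in replicas:
        if node.up:
            node.data[key] = value


def read_with_failover(replicas, key):
    """Return the value from the first reachable replica that has the key."""
    for node in replicas:
        if node.up and key in node.data:
            return node.data[key]
    raise RuntimeError("no replica available for key: " + key)


replicas = [Node("node1"), Node("node2"), Node("node3")]
write_to_replicas(replicas, "sensor-42", "21.5C")
replicas[0].up = False  # simulate a failure of the first replica
value = read_with_failover(replicas, "sensor-42")
print(value)
```

Even though the first replica is down, the read is still answered by another node that holds a copy of the data.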

Decentralized architecture: Unlike a master-slave architecture, Cassandra has no master node that manages and organizes the other nodes. All nodes are equal and perform similar tasks. Every node in the cluster accepts read/write requests, regardless of which node actually stores the data. All nodes communicate using the gossip protocol, which is a peer-to-peer communication protocol.
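
The idea behind gossip can be sketched in a few lines of Python. This is a simplified simulation under stated assumptions, not Cassandra's real gossip implementation: each node starts out knowing only its own heartbeat, and in every round each node picks one random peer and the two merge what they know. The function names, the `node1`..`node5` names, and the random seed are all hypothetical.

```python
import random


def gossip_round(views, rng):
    """Each node picks one random peer and the two merge their views.

    views maps node name -> {node name: heartbeat version}.
    Merging keeps the highest heartbeat version seen for each node.
    """
    names = list(views)
    for name in names:
        peer = rng.choice([n for n in names if n != name])
        merged = dict(views[name])
        for node, version in views[peer].items():
            merged[node] = max(merged.get(node, 0), version)
        views[name] = dict(merged)
        views[peer] = dict(merged)


def rounds_until_converged(views, rng, max_rounds=100):
    """Run gossip rounds until every node knows about every other node."""
    for round_number in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(len(view) == len(views) for view in views.values()):
            return round_number
    return None


# Initially each node only knows its own heartbeat.
views = {f"node{i}": {f"node{i}": 1} for i in range(1, 6)}
rounds = rounds_until_converged(views, random.Random(7))
print("converged after rounds:", rounds)
```

No node plays a coordinating role here: cluster state spreads purely through pairwise exchanges between peers.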

Consistency: Data is consistent when a read request always returns the most recently written value, so that all clients see the same value for the same query. In Cassandra, the consistency level is tunable and can be chosen per request.
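
A common rule of thumb for reading the last written value is that the number of replicas acknowledging a write (W) plus the number of replicas consulted on a read (R) must exceed the replication factor (RF), so that every read overlaps with the latest write. The Python sketch below only illustrates this arithmetic; the function names are hypothetical and this is not a Cassandra API.

```python
def quorum(replication_factor):
    """Number of replicas in a quorum: a majority of the replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(write_replicas, read_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return write_replicas + read_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q)                                  # 2
print(is_strongly_consistent(q, q, rf))   # True
print(is_strongly_consistent(1, 1, rf))   # False
```

With a replication factor of 3, reading and writing at quorum (2 replicas each) guarantees overlap, while reading and writing a single replica does not.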

No single point of failure: In Cassandra, all nodes perform similar tasks and there is no master that organizes the other nodes, so a problem with one node does not bring down the whole system. In master-slave architectures, by contrast, a problem with the master node can cause the whole system to fail.

Availability: The database stays accessible to clients for reads and writes, even when some nodes are down.

Distributed: Data is automatically distributed over multiple nodes in the cluster.

Scalable: Cassandra is a horizontally scalable database, and new nodes can be added easily thanks to its ring architecture. Adding a node does not require changing the configuration of the entire cluster or restarting other nodes.
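
The ring architecture can be illustrated with a small consistent-hashing sketch in Python. It rests on several assumptions: the `Ring` class and `token` function are hypothetical, SHA-256 is used only as a stand-in hash (it is not Cassandra's partitioner), and each node gets 8 virtual-node tokens to keep the example small. The point is to show that adding a node moves only part of the data, and only onto the new node.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (an integer token)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, vnodes=8):
        self.vnodes = vnodes
        self.tokens = []  # sorted list of (token, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place several virtual-node tokens for this node on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self.tokens, (token(f"{node}-{i}"), node))

    def owner(self, key):
        """Owner = node of the first token at or after the key's token, wrapping around."""
        index = bisect.bisect_left(self.tokens, (token(key), ""))
        return self.tokens[index % len(self.tokens)][1]


keys = [f"user-{i}" for i in range(1000)]
ring = Ring(["node1", "node2", "node3"])
before = {key: ring.owner(key) for key in keys}

ring.add_node("node4")  # scale out: the existing nodes are not reconfigured
after = {key: ring.owner(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved")
print("all moved keys now belong to node4:",
      all(after[key] == "node4" for key in moved))
```

Keys that did not fall into the new node's token ranges stay exactly where they were, which is why the rest of the cluster can keep running while a node joins.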

Partition Tolerance: Even if the connection between some of the nodes in the cluster is broken, the system continues to work.

Let’s look at how to install Cassandra on CentOS.

Cassandra Setup on CentOS

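# Update installed packages, then install Java 8 (required by Cassandra 3.11) and Python (used by cqlsh)
$ sudo yum update
$ sudo yum install -y java-1.8.0-openjdk
$ java -version
$ sudo yum install -y python
# Add the Apache Cassandra 3.11 yum repository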
$ cat <<EOF | sudo tee -a /etc/yum.repos.d/cassandra311x.repo

[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS
EOF

$ sudo yum install cassandra -y
$ sudo systemctl daemon-reload
$ sudo service cassandra start

-------Starting cassandra (via systemctl): - OK

$ sudo chkconfig cassandra on

- In "/etc/cassandra/conf/cassandra.yaml", replace localhost with your server IP in the following settings, then restart Cassandra:

seeds: "<your-server-ip>"
listen_address: "<your-server-ip>"
rpc_address: "<your-server-ip>"


- You can use the cqlsh shell to interact with Apache Cassandra:
$ cqlsh <your-server-ip>

Create Cassandra Cluster with 3 Nodes

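# On each node: stop Cassandra and clear the system keyspace data so the node can start under the new cluster name
$ sudo service cassandra stop
$ sudo rm -rf /var/lib/cassandra/data/system/*
$ sudo vi /etc/cassandra/conf/cassandra.yaml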

cluster_name: 'JC Cluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "<your-server-ip>,<your-server-ip2>"
listen_address: <your-server-ip>
rpc_address: <your-server-ip>
endpoint_snitch: GossipingPropertyFileSnitch

$ sudo systemctl restart cassandra
# Once every node has been restarted, each one should be listed with status UN (Up/Normal)
$ nodetool status

Cassandra offers CQL (Cassandra Query Language), a query language similar to SQL.

All keyspaces can be displayed by running the DESCRIBE KEYSPACES command in cqlsh.

The replication strategy is specified when creating the keyspace.

The replication factor specified in SimpleStrategy is applied to the entire cluster. In NetworkTopologyStrategy, the replication factor is specified separately for each data center.
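
As a rough illustration of the two strategies, the Python sketch below builds the corresponding CREATE KEYSPACE statements as strings that could then be pasted into cqlsh. The helper functions, the keyspace names (`demo_simple`, `demo_multi_dc`), and the data center names (`dc1`, `dc2`) are hypothetical; real data center names must match the ones shown by nodetool status.

```python
def cql_literal(value):
    """Quote strings for CQL; leave numbers as they are."""
    return f"'{value}'" if isinstance(value, str) else str(value)


def create_keyspace_cql(keyspace, strategy, **options):
    """Build a CREATE KEYSPACE statement for the given replication strategy."""
    settings = {"class": strategy, **options}
    body = ", ".join(f"'{name}': {cql_literal(value)}"
                     for name, value in settings.items())
    return f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {{{body}}};"


# SimpleStrategy: one replication factor for the entire cluster.
simple = create_keyspace_cql("demo_simple", "SimpleStrategy",
                             replication_factor=3)

# NetworkTopologyStrategy: a separate replication factor per data center.
network = create_keyspace_cql("demo_multi_dc", "NetworkTopologyStrategy",
                              dc1=3, dc2=2)

print(simple)
print(network)
```

The first statement keeps three copies of each row across the whole cluster, while the second keeps three copies in dc1 and two in dc2.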

I hope you found this article useful. Happy reading!
