Run Cassandra on EC2 Instance


In this post, we document how to install Cassandra and run it as a server on an EC2 instance.

Install Cassandra

First, we need to follow the instructions on the Cassandra website to install the software. We choose to install Cassandra via APT.

The JDK on my machine is:

openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)

The Cassandra version we will install is 3.11.11.
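Cassandra 3.11 targets Java 8, so it can help to confirm the major version before installing. Below is a minimal sketch, assuming bash; the helper name java_major_version is made up for this post and is not part of Cassandra or the JDK.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Java major version found in the first
# line of `java -version` output (passed in as $1).
java_major_version() {
  local line="$1" ver
  # Extract the quoted version string, e.g. 1.8.0_292 or 11.0.11
  ver=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    "")  return 1 ;;                                     # no version found
    1.*) ver=${ver#1.}; printf '%s\n' "${ver%%.*}" ;;    # 1.8.0_292 -> 8
    *)   printf '%s\n' "${ver%%[.+-]*}" ;;               # 11.0.11   -> 11
  esac
}

# Usage (java prints its version on stderr):
#   java_major_version "$(java -version 2>&1 | head -n 1)"
```

For the JDK shown above, the helper prints 8.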

Here are the commands to execute:

Add the Cassandra repository to the apt source list:

echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list

Add the repository keys and perform an apt update:

curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
sudo apt-get update

We may see errors in the output complaining about a missing public key. To fix the issue, we can use the following command:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys [missing-key-value]

Install Cassandra:

sudo apt-get install cassandra

Update the Permissions

It's recommended not to run Cassandra as root, so we should not start it as the root user or with sudo. When Cassandra is installed, its files are owned by the cassandra user; however, the default login user on an EC2 instance is ec2-user or ubuntu. If we start Cassandra directly as that user, we will see an access denied error.

To fix this issue, we need to grant our user appropriate permissions. One way to do this is to change the owner of the /var/lib/cassandra and /var/log/cassandra directories:

sudo chown -R [user]:[user] /var/lib/cassandra
sudo chown -R [user]:[user] /var/log/cassandra
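To confirm the ownership change took effect, a small check like the one below can be run as the login user. It is only a sketch, assuming bash; check_owned is a made-up helper name, and it looks at the top-level directory only, not at every file underneath.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the current user owns each directory.
# [ -O path ] is true when the path exists and is owned by the effective user.
check_owned() {
  local dir status=0
  for dir in "$@"; do
    if [ -O "$dir" ]; then
      echo "ok: $dir"
    else
      echo "not owned by $(id -un): $dir"
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   check_owned /var/lib/cassandra /var/log/cassandra
```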

At this point, we should be able to start Cassandra and run cqlsh locally.

How to allow remote connection?

There are three parts:

First, we need to make sure the following ports are open on our machine. For more information, please check the official documentation.

To open the ports, we could use the following command:

sudo ufw allow [port-number]

Second, we need to update the inbound and outbound rules in the AWS Security Group to allow traffic on these ports. Keep in mind that opening these ports to the public is a security risk, so we should only allow access that is truly necessary.

The last step is to configure Cassandra. The configuration file is /etc/cassandra/cassandra.yaml. Recall that an EC2 instance has both a public IP and a private IP, and most of the time they are different.

We need to change the following items in the cassandra.yaml file

(To be honest, not sure why this setup works.)
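Before editing the file, it can be useful to see how the address-related settings are currently set. The sketch below assumes bash and that the relevant keys are listen_address, broadcast_address, rpc_address and broadcast_rpc_address; these keys exist in cassandra.yaml, but treating them as the ones to change is an assumption. The function name is made up, and it only prints values without changing anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the address-related settings (active or
# commented out) from a cassandra.yaml file. Read-only.
show_address_settings() {
  local file="${1:-/etc/cassandra/cassandra.yaml}"
  grep -E '^[[:space:]]*#?[[:space:]]*(listen_address|broadcast_address|rpc_address|broadcast_rpc_address):' "$file"
}

# Usage:
#   show_address_settings                     # default path from this post
#   show_address_settings /path/to/cassandra.yaml
```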

Now we should be able to connect to Cassandra from a different machine by running:

cqlsh ec2_public_ip_address
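If cqlsh cannot connect, it may help to first check whether the port is reachable at all from the client machine. cqlsh uses port 9042 by default. The sketch below assumes bash (for its /dev/tcp feature) and the coreutils timeout command; port_open is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST PORT can be
# opened within 3 seconds. Uses bash's /dev/tcp pseudo-device.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage (replace the placeholder with the instance's public IP):
#   port_open ec2_public_ip_address 9042 && echo reachable || echo blocked
```

A "blocked" result points at the firewall or Security Group rules, or at the address Cassandra is listening on, rather than at cqlsh itself.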

Other Common Questions

How to specify java version for Cassandra?

We first need to find where Java is installed. For example, on a Mac, Java is usually installed in /Library/Java/JavaVirtualMachines:

Ryan@Mac $ls -1 /Library/Java/JavaVirtualMachines
adoptopenjdk-11.jdk
jdk-10.0.2.jdk
jdk1.8.0_60.jdk

As we can see, there are three different Java versions installed.

Now we need to specify the Java version in the Cassandra configuration. The file in question is bin/cassandra.in.sh. Set the JAVA_HOME variable to the Java home directory. For instance, the configuration below instructs Cassandra to use Java 8:

JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home
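A typo in this path is easy to miss, so a quick sanity check can save some time. This is a minimal sketch, assuming bash; valid_java_home is a made-up helper name that only checks for an executable bin/java under the given directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if DIR looks like a Java home,
# i.e. it contains an executable bin/java.
valid_java_home() {
  [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Usage:
#   valid_java_home "$JAVA_HOME" && echo "JAVA_HOME looks fine" || echo "check JAVA_HOME"
```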

Why does Cassandra not start, with no output or error messages?

We can quickly check whether there is any running Cassandra process on the machine:

ps aux | grep java

If there is no Java program running and nothing happens when we start Cassandra, it may indicate that the machine is unable to start the Cassandra application. Note that Cassandra is designed to handle heavy workloads and is meant to run on a reasonably powerful machine.

If we run it on a t2.micro instance, we only have 1 vCPU and 1 GB of memory:

ubuntu@:/etc/cassandra$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          1
On-line CPU(s) list:             0
Thread(s) per core:              1


ubuntu@:/etc/cassandra$ free -m
              total        used        free      shared  buff/cache   available
Mem:            978         475         236           1         266         351
Swap:             0           0           0

In such cases, we may need to change the Cassandra JVM configuration so that it does not consume all system resources. The JVM configuration file is /etc/cassandra/jvm.options.
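To see why a small instance struggles, it helps to estimate the heap Cassandra picks when none is set explicitly. The sketch below reproduces my reading of the default calculation in cassandra-env.sh for 3.11, max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)); treat the formula as an assumption and check it against your own copy of the script. The function name is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the default max heap (in MB) for a machine
# with TOTAL_MB of RAM, using max(min(RAM/2, 1024), min(RAM/4, 8192)).
default_max_heap_mb() {
  local total_mb="$1"
  local half=$(( total_mb / 2 ))
  local quarter=$(( half / 2 ))
  if [ "$half" -gt 1024 ]; then half=1024; fi
  if [ "$quarter" -gt 8192 ]; then quarter=8192; fi
  if [ "$half" -gt "$quarter" ]; then
    echo "$half"
  else
    echo "$quarter"
  fi
}

# Usage with the total from `free -m` above:
#   default_max_heap_mb 978    # prints 489
```

With 978 MB of total memory and only 351 MB available, a 489 MB heap does not fit. Setting smaller -Xms and -Xmx values in jvm.options is one way to cap it; the right values depend on the workload.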

----- END -----