In this post, we document how to install Cassandra and run it as a server on an EC2 instance.
Install Cassandra
First, we need to follow the instructions on the Cassandra website to install the software. We choose to install Cassandra via APT.
The JDK on my machine is:
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)
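Since Cassandra 3.11 is generally run on Java 8, it can help to confirm the major Java version before installing. The following is a minimal sketch; the function name java_major_version is hypothetical, and it assumes the version string is quoted the way it is in the output above.

```shell
# Print the major Java version from the first line of `java -version` output.
# Handles both the legacy "1.8.0_292" scheme and the newer "11.0.11" scheme.
java_major_version() {
  version_line="$1"
  # Pull out the text between the double quotes, e.g. 1.8.0_292
  version="$(printf '%s\n' "$version_line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')"
  case "$version" in
    1.*) printf '%s\n' "$version" | cut -d. -f2 ;;  # legacy scheme: second field
    *)   printf '%s\n' "$version" | cut -d. -f1 ;;  # newer scheme: first field
  esac
}

# Usage on a machine with Java installed (java -version writes to stderr):
# java_major_version "$(java -version 2>&1 | head -n 1)"
```

For the JDK shown above, this should print 8.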
The Cassandra version we will install is 3.11.11.
Here are the commands to execute:
Add the Cassandra repository to the apt source list:
echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
Add the repository keys, then perform an apt update:
curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
sudo apt-get update
We may see errors in the output complaining about a missing public key. To fix the issue, we can use the following command:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys [missing-key-value]
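The missing key value is reported in the apt output after the token NO_PUBKEY. As a rough sketch, the helper below (missing_apt_keys is a hypothetical name) reads the update output on stdin and prints each missing key ID once, assuming apt reports them in the usual NO_PUBKEY form.

```shell
# Read `apt-get update` output on stdin and print each missing key ID once.
missing_apt_keys() {
  grep -o 'NO_PUBKEY [0-9A-Fa-f]*' | awk '{print $2}' | sort -u
}

# Usage: capture the update output, then fetch each reported key.
# sudo apt-get update 2>&1 | missing_apt_keys | while read -r key; do
#   sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys "$key"
# done
```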
Install cassandra:
sudo apt-get install cassandra
Update the Permissions
It's recommended not to run Cassandra as root, which means we should not start it as the root user or with sudo. When Cassandra is installed, its files are owned by the cassandra user; however, the default user when we log in to an EC2 instance is ec2-user or ubuntu. If we start Cassandra directly as that user, we will see an access denied error.
To fix this, we need to grant the user appropriate permissions. One way to do this is to change the owner of the /var/lib/cassandra and /var/log/cassandra directories.
sudo chown -R [user]:[user] /var/lib/cassandra
sudo chown -R [user]:[user] /var/log/cassandra
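To confirm that the ownership change covered everything, one option is to list whatever is still owned by someone else. This is only a sketch; list_not_owned_by is a hypothetical helper built on the standard find -user test.

```shell
# List anything under the given directories that is NOT owned by the given user.
# Empty output means the ownership change covered everything.
list_not_owned_by() {
  owner="$1"
  shift
  find "$@" ! -user "$owner"
}

# Usage, with the login user as the intended owner:
# list_not_owned_by "$(id -un)" /var/lib/cassandra /var/log/cassandra
```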
At this point, we should be able to start Cassandra and run cqlsh locally.
How to allow remote connections?
There are three parts:
- Server level configuration
- AWS configuration
- Cassandra configuration
First, we need to make sure the following ports are open on our machine. For more information, please check the official documentation.
- 7000
- 7001
- 7199
- 9042
- 9160
- 9142
To open the ports, we could use the following command:
sudo ufw allow [port-number]
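Rather than typing the command once per port, the list can be looped over. The sketch below only prints the commands so they can be reviewed first; the variable CASSANDRA_PORTS and the function print_ufw_commands are hypothetical names, and the list should be trimmed to the ports the deployment actually uses.

```shell
# Ports listed above; trim to what the deployment actually uses.
CASSANDRA_PORTS="7000 7001 7199 9042 9160 9142"

# Print one ufw command per port so the list can be reviewed first.
print_ufw_commands() {
  for port in $CASSANDRA_PORTS; do
    printf 'sudo ufw allow %s\n' "$port"
  done
}

print_ufw_commands
# To apply them after review: print_ufw_commands | sh
```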
Second, we need to update the inbound and outbound rules in the AWS Security Group to allow traffic on these ports. Keep in mind that opening these ports to the public is a security risk, so we should only allow access that is truly necessary.
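If the rules are managed from the command line instead of the console, the AWS CLI has an authorize-security-group-ingress command. The sketch below prints such a command for a single port restricted to a single source address; the group ID sg-EXAMPLE and the address 203.0.113.10/32 are placeholders, and print_sg_ingress_command is a hypothetical helper.

```shell
# Print an AWS CLI command that opens one TCP port to one source CIDR.
# group_id and cidr are placeholders supplied by the caller.
print_sg_ingress_command() {
  group_id="$1"
  port="$2"
  cidr="$3"
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --cidr %s\n' \
    "$group_id" "$port" "$cidr"
}

# Example: allow CQL clients (9042) from a single address only.
print_sg_ingress_command "sg-EXAMPLE" 9042 "203.0.113.10/32"
```

Restricting the CIDR to known client addresses keeps the exposure in line with the warning above.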
The last step is to configure Cassandra. The configuration file is /etc/cassandra/cassandra.yaml. Recall that an EC2 instance has a public IP and a private IP, and most of the time they are different.
We need to change the following items in the cassandra.yaml file:
- set listen_address to ec2_private_ip_address
- set rpc_address to ec2_private_ip_address
- set broadcast_address to ec2_public_ip_address
- set seeds to "ec2_public_ip_address"
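(To be honest, I am not sure exactly why this setup works. A plausible explanation is that the public IP of an EC2 instance is mapped to it by AWS rather than assigned to the network interface, so Cassandra can only bind to the private IP, while broadcast_address is the address it advertises to other nodes.)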
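The four changes above can also be scripted. The following is a sketch, not a definitive tool: apply_cassandra_addresses is a hypothetical helper, and it assumes the file still has the layout of the packaged default (uncommented listen_address and rpc_address lines, a commented broadcast_address line, and a "- seeds:" entry). It is safer to try it on a copy of cassandra.yaml first.

```shell
# Rewrite the four address settings in a cassandra.yaml file in place.
# Arguments: path to the file, private IP, public IP.
apply_cassandra_addresses() {
  conf_file="$1"
  private_ip="$2"
  public_ip="$3"
  tmp_file="$(mktemp)"
  sed -E \
    -e "s|^listen_address:.*|listen_address: ${private_ip}|" \
    -e "s|^rpc_address:.*|rpc_address: ${private_ip}|" \
    -e "s|^#? ?broadcast_address:.*|broadcast_address: ${public_ip}|" \
    -e "s|^([[:space:]]*- seeds:).*|\1 \"${public_ip}\"|" \
    "$conf_file" > "$tmp_file" || return 1
  # Write back with cat so the original file keeps its owner and mode.
  cat "$tmp_file" > "$conf_file"
  rm -f "$tmp_file"
}

# Usage on a copy, with placeholder addresses:
# cp /etc/cassandra/cassandra.yaml ./cassandra.yaml.copy
# apply_cassandra_addresses ./cassandra.yaml.copy 10.0.0.5 203.0.113.7
```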
Now we should be able to connect to cassandra from a different machine by running
cqlsh ec2_public_ip_address
Other Common Questions
How to specify the Java version for Cassandra?
We first need to find where Java is installed. For example, on a Mac, Java is usually installed in /Library/Java/JavaVirtualMachines:
Ryan@Mac $ ls -1 /Library/Java/JavaVirtualMachines
adoptopenjdk-11.jdk
jdk-10.0.2.jdk
jdk1.8.0_60.jdk
As we can see, there are three different Java versions installed.
Now we need to specify the Java version in the Cassandra configuration. The file in question is bin/cassandra.in.sh. Set the JAVA_HOME variable to the Java home directory. For instance, the configuration below instructs Cassandra to use Java 8:
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home
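A mistyped JAVA_HOME is easy to miss, so a quick sanity check before starting Cassandra can save some time. This is a small sketch with a hypothetical helper name, java_home_is_valid; it only checks that the directory contains an executable bin/java.

```shell
# Succeeds only if the directory contains an executable bin/java.
java_home_is_valid() {
  [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Usage with the path from the example above:
# java_home_is_valid /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home && echo "ok"
```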
Why does Cassandra not start, with no output or error messages?
We can quickly check whether any Cassandra process is running on the machine:
ps aux | grep java
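Grepping for java also matches unrelated JVMs and sometimes the grep process itself. A slightly narrower check, sketched below, looks for Cassandra's main class, CassandraDaemon, in the process listing. The helper name count_cassandra_processes is hypothetical, and the bracket in the pattern is the usual trick to keep grep from matching its own command line.

```shell
# Read a process listing on stdin and count the lines that run Cassandra's
# main class. The [C] bracket keeps this grep from matching itself.
count_cassandra_processes() {
  grep -c '[C]assandraDaemon'
}

# Usage:
# ps aux | count_cassandra_processes
```

A result of 0 suggests that Cassandra never started or has already exited.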
If there is no Java program running and nothing happens when we start Cassandra, it may indicate that the machine is unable to start the Cassandra application. Note that Cassandra is designed to handle heavy workloads, and we are expected to run it on a reasonably powerful machine.
If we run it on a t2.micro instance, we only have 1 vCPU and 1 GB of memory:
ubuntu@:/etc/cassandra$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bits virtual
CPU(s):              1
On-line CPU(s) list: 0
Thread(s) per core:  1
ubuntu@:/etc/cassandra$ free -m
              total        used        free      shared  buff/cache   available
Mem:            978         475         236           1         266         351
Swap:             0           0           0
In such cases, we may need to change the Cassandra JVM configuration so that it does not consume all system resources. The JVM configuration file is /etc/cassandra/jvm.options.
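One common adjustment is to cap the heap with the standard JVM flags -Xms and -Xmx. The sketch below prints a pair of such lines sized to a quarter of total memory; the one-quarter ratio and the helper name heap_flags_for are assumptions for illustration only, not a Cassandra recommendation, and the right value depends on the workload.

```shell
# Print -Xms/-Xmx lines sized to a fraction of total memory (in MB).
# The 1/4 ratio is an illustrative assumption, not a Cassandra recommendation.
heap_flags_for() {
  total_mb="$1"
  heap_mb=$((total_mb / 4))
  printf '%s\n' "-Xms${heap_mb}M" "-Xmx${heap_mb}M"
}

# With the 978 MB reported by free -m above:
heap_flags_for 978
```

For 978 MB this prints -Xms244M and -Xmx244M, which could then be placed in the jvm.options file.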