In this post we will walk through setting up a production-ready three-node Cassandra cluster with Munin monitoring on Amazon EC2 in under 30 minutes. We will also walk through running the sample Cassandra stress scripts with a basic load against the three-node cluster. This post builds on a previous post about how to set up and maintain an EC2 virtual instance with our supplied unattended install scripts. If you want to know more about how our unattended install scripts work, please review that post.
Set Up the First Node
Step 1 - Create the Instance
We are going to set up the first box in the Cassandra cluster using the
supplied scripts in the download section. For testing we have included a
basic storage-conf.xml file, which you will want to replace with your own.
Make sure the <ListenAddress>localhost</ListenAddress> setting in your
storage-conf.xml is changed to <ListenAddress></ListenAddress>. This makes
Cassandra use the output of hostname to configure the node. You must also
change <ThriftAddress>localhost</ThriftAddress> to
<ThriftAddress>0.0.0.0</ThriftAddress> so that Thrift listens on all
interfaces of this node.
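You may prefer to script these two edits instead of making them by hand. A minimal Python 3 sketch using the standard-library ElementTree module could look like the following. It assumes the Cassandra 0.6-style storage-conf.xml layout, in which ListenAddress and ThriftAddress each appear once. The function name prepare_node_config is hypothetical and is not part of the supplied scripts. ElementTree drops XML comments and the XML declaration when it rewrites the text, so keep a copy of the original file.

```python
import xml.etree.ElementTree as ET


def prepare_node_config(xml_text: str) -> str:
    """Return storage-conf.xml text with ListenAddress blanked and Thrift on 0.0.0.0."""
    root = ET.fromstring(xml_text)
    listen = root.find(".//ListenAddress")
    thrift = root.find(".//ThriftAddress")
    if listen is None or thrift is None:
        raise ValueError("ListenAddress or ThriftAddress element not found")
    # An empty ListenAddress lets Cassandra derive the address from the hostname.
    listen.text = ""
    # 0.0.0.0 makes the Thrift server accept clients on every interface.
    thrift.text = "0.0.0.0"
    # short_empty_elements=False keeps the <ListenAddress></ListenAddress> form.
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)
```

To use it, read your local storage-conf.xml into a string, pass it through prepare_node_config, and write the result back out.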
From a local command prompt issue the following command to launch and
configure an Amazon EC2 virtual instance.
Notice that when the script completes, you are logged into the remote instance. For a detailed description of what the setup_box.sh script does, please see my previous post, located here.
Step 2 - Modify storage-conf.xml
On the First Node (the remote machine), run the hostname command to get the machine's host name.
Edit the LOCAL copy of the supplied storage-conf.xml and place that host
name in the seed section. The seed line should then read something like
<Seed>domU-33-92-42-0B-22-21</Seed>, with your own host name in place of
the example. This will be the config to use for all new nodes added to the
cluster. For a more robust production environment, you might map an elastic
IP address to the seed node. That removes the need to use an internal private
Amazon host name.
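The seed change can be scripted the same way. The hypothetical set_seed helper below is again a Python 3 sketch, not part of the download. It replaces whatever Seed entries are inside the Seeds element with the single host name you pass in. It assumes the file nests Seed elements inside a Seeds element, as the stock Cassandra 0.6 storage-conf.xml does.

```python
import xml.etree.ElementTree as ET


def set_seed(xml_text: str, seed_host: str) -> str:
    """Return storage-conf.xml text whose <Seeds> holds only seed_host."""
    root = ET.fromstring(xml_text)
    seeds = root.find(".//Seeds")
    if seeds is None:
        raise ValueError("no <Seeds> element found")
    # Drop the existing entries (for example the default 127.0.0.1 seed).
    for old in seeds.findall("Seed"):
        seeds.remove(old)
    # Add the First Node's host name as the only seed.
    ET.SubElement(seeds, "Seed").text = seed_host
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)
```

Pass it the host name you got from running hostname on the First Node. Every node you launch afterwards then shares the same seed.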
Step 3 - Start Cassandra
Start Cassandra by running the following command
Later, you can kill the process if needed by issuing the command
kill $(cat pid.txt).
You can verify that Cassandra is up and running by issuing the following command
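As an extra, rougher check, the sketch below only tests whether something is accepting TCP connections on the Thrift port. It assumes the default ThriftPort of 9160 from storage-conf.xml. thrift_port_open is a hypothetical helper, and a successful result does not prove that the node has joined the ring.

```python
import socket


def thrift_port_open(host: str = "localhost", port: int = 9160, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection resolves the host and performs a full TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable: treat all as "not reachable".
        return False
```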
Step 4 - Install Munin Server (Optional Step)
We will set up the First Node to monitor the entire cluster. In a true production environment we would recommend running the Munin server on a separate box rather than on an actual Cassandra node; we are running it here for convenience. To install the Munin server, run the following command on the First Node.
Verify that the Munin server is working correctly by navigating to http://public_dns/munin. You should see the localdomain node along with a Cassandra hyperlink to the right. To find the publicly accessible host name, run ec2-describe-instances from a local command prompt and scan for the node's internal host name to find the matching public name. You can also use the AWS Management Console. Notice that the setup script from Step 1 also installed the Munin plugins for monitoring the Cassandra node. You can find more information about how to monitor a Cassandra node here. For now the script configures only the basics, which you can modify later to collect more metrics.
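You may want to check the Munin page from a script rather than a browser. A small Python 3 sketch could fetch the page and look for the Cassandra link. The URL you pass would be your own http://public_dns/munin/ address. munin_page_lists is a hypothetical helper. Matching on the text "Cassandra" is an assumption based on the page described above.

```python
import urllib.request


def munin_page_lists(url: str, needle: str = "Cassandra", timeout: float = 10.0) -> bool:
    """Fetch the Munin overview page and report whether `needle` appears in it."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError and HTTPError are both OSError subclasses.
        return False
    return needle in body
```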
Set Up the Second Node
If you are adding a node to an existing cluster that already contains data,
you should modify the local storage-conf.xml file to set the bootstrap
property <AutoBootstrap>true</AutoBootstrap>. This is only necessary if the
cluster already has data in it; otherwise the default is fine.
Step 1 - Create the Instance
From a local command prompt, issue the following command using the storage-conf.xml you modified in Step 2 of the First Node setup.
Step 2 - Start Cassandra
Start Cassandra by running the following command
You can verify that Cassandra is up and running and has successfully joined the cluster by issuing the following command
Notice that both machines are in the cluster, as shown by the output of the ring command.
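To check the ring output programmatically, the hypothetical up_nodes helper below pulls out the addresses of nodes reported as Up. It assumes the nodetool ring layout of Cassandra 0.6, where each node line starts with the address followed by a Status column. Verify this against your own output before relying on it.

```python
from typing import List


def up_nodes(ring_output: str) -> List[str]:
    """Return the addresses of nodes whose Status column reads "Up"."""
    nodes = []
    for line in ring_output.splitlines():
        fields = line.split()
        # Node lines start with the address followed by the status; the header
        # line ("Address Status ...") and any bare token line are skipped.
        if len(fields) >= 2 and fields[1] == "Up":
            nodes.append(fields[0])
    return nodes
```

At this point in the walkthrough, you would expect two Up entries.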
Step 3 - Install Munin Node Monitoring (Optional Step)
In this step we will show you how to configure the Second Node to report its monitoring data to the Munin server, which in our case is the First Node. Remember that in our example we used the First Node as the Munin server, but this may be different in your environment. First you need to modify the remote instance to allow other machines to connect to this Munin node. Run the following command to edit the Munin node configuration
After the line allow
^127\.0\.0\.1$, add a new allow line that lets anyone in your Amazon
security group see this Munin node.
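munin-node matches each allow line as a regular expression against the connecting IP address, which is why the dots in ^127\.0\.0\.1$ are escaped. You may want to admit only the Munin server instead of the whole security group. In that case, a hypothetical Python 3 helper like this one builds a line that matches exactly one IPv4 address, such as the First Node's private IP. This is a narrower alternative, not the line the supplied scripts use.

```python
import ipaddress


def munin_allow_line(ip: str) -> str:
    """Build a munin-node.conf allow line matching exactly one IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    # Escape each dot so the regex matches a literal "." rather than any character.
    return "allow ^" + "\\.".join(str(addr).split(".")) + "$"
```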
You will need to restart the Munin node with the following command
You will also need to modify the Munin server so that it monitors the newly created Cassandra node. Recall from our example above that we installed the Munin server on the First Node in our Cassandra cluster. On the First Node remote instance, edit the Munin config file by running the following command
Under the entry
Add a new entry pointing to your newly created Cassandra node (you can run hostname on the Second Node to get its address). It should look something like the following
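For reference, munin.conf host sections use a bracketed [group;host] header followed by indented directives. The address and use_node_name yes lines mirror the default local entry in munin.conf. The hypothetical munin_host_entry helper below generates one such section per Cassandra node. The group name cassandra is an arbitrary choice for illustration.

```python
def munin_host_entry(hostname: str, group: str = "cassandra") -> str:
    """Return a munin.conf host section for one Cassandra node."""
    # Brackets, semicolons and newlines would break the section header.
    if not hostname or any(ch in hostname for ch in "[];\n"):
        raise ValueError(f"unsuitable host name: {hostname!r}")
    return (
        f"[{group};{hostname}]\n"
        f"    address {hostname}\n"
        "    use_node_name yes\n"
    )
```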
For clarity you should change
The easiest thing to do is wait for the Munin server to refresh the charts;
you should then see something like the screenshot below. You may need to wait
a couple of minutes to see the changes.
If you click on the day link you should see something like
Set Up the Third Node
For the Third Node I repeated the steps from the Second Node. Once the First Node is set up and running, you can repeat the Second Node steps as many times as necessary to spin up more nodes. This process can also be done in parallel, meaning I could add a Fourth Node and a Fifth Node at the same time. You are ready to rock! Connect a client to any node and start inserting and reading data in your Cassandra cluster.
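Each new node only needs the First Node's seed configuration, so launches are independent and can overlap. A minimal Python 3 sketch of that idea runs a caller-supplied launch function for several node names at once. launch_in_parallel is hypothetical. The launch argument stands in for whatever wraps your own setup_box.sh invocation, whose arguments are not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def launch_in_parallel(names: List[str], launch: Callable[[str], str]) -> Dict[str, str]:
    """Run launch(name) for every name concurrently and map names to results."""
    if not names:
        return {}
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # map() preserves input order, so zip pairs each name with its own result;
        # an exception raised by any launch is re-raised here.
        return dict(zip(names, pool.map(launch, names)))
```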
Set Up and Run the Basic Stress Test
To see some more interesting results, let’s run the Cassandra stress tool against the three-node cluster. For simplicity I’ll run the stress script on the Third Node in the cluster. WARNING: For an accurate stress test you should run several client instances on non-Cassandra boxes. Running the stress script on the Third Node will skew the results and isn’t a true measure of performance, but we want to demonstrate how it can be done without needing to spin up more instances. Go ahead and log in to the Third Node remote instance.
Step 1 - Install the Thrift Compiler
First we need to download, build, and install the Thrift compiler, because we need its Python bindings. Run the following commands.
You also need to configure Python to see the newly built Thrift bindings. Run the following commands.
Note that I am not a Python expert; I basically copied these commands from the blog post located here. If the install was successful, typing thrift on the command line should print its help text.
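Besides running thrift with no arguments, a short Python 3 check can confirm both pieces at once. It reports whether a thrift executable is on the PATH and whether the thrift Python package is importable by the interpreter you will use for the stress script. thrift_install_status is a hypothetical helper.

```python
import importlib.util
import shutil
from typing import Dict


def thrift_install_status() -> Dict[str, bool]:
    """Report whether the thrift compiler and Python bindings are visible."""
    return {
        # shutil.which searches the directories listed in PATH.
        "compiler_on_path": shutil.which("thrift") is not None,
        # find_spec looks the package up on sys.path without importing it.
        "python_bindings": importlib.util.find_spec("thrift") is not None,
    }
```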
Next, download the Cassandra source to the client machine for testing by running the following commands on the Third Node.
Run the ring command to get the nodes’ IP addresses to supply to the stress script. Now let’s run the stress tool, inserting 2 million keys.
Now let’s randomly read 2 million rows of data.
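If you want to script both runs, a hypothetical stress_command helper can assemble the argument list. It assumes the Python stress script shipped in contrib/py_stress of the Cassandra 0.6 source, with -d for a comma-separated node list, -o for the operation, and -n for the number of keys. Confirm these options with python stress.py --help for your version before using it.

```python
from typing import List


def stress_command(nodes: List[str], operation: str, num_keys: int = 2_000_000) -> List[str]:
    """Build an argument list for the contrib py_stress script (options assumed)."""
    if operation not in ("insert", "read"):
        raise ValueError(f"unsupported operation: {operation!r}")
    if not nodes:
        raise ValueError("need at least one node address")
    return [
        "python", "stress.py",
        "-d", ",".join(nodes),  # assumed: comma-separated list of Cassandra nodes
        "-o", operation,        # assumed: insert or read
        "-n", str(num_keys),    # assumed: number of keys
    ]
```

You could then pass the result to subprocess.run(..., cwd="contrib/py_stress", check=True) from the Cassandra source directory, first with "insert" and then with "read".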
Backing Up your Cluster
The install scripts configure the system with some basic backup and snapshot scripts. Look at the files created in ~/cron, and run crontab -l to see which scripts are scheduled. The snapshot script runs every day at 1 am, and the script that copies snapshots to the EBS drive runs every day at 3 am. You should also create another script that uses AWS to snapshot your EBS drives for backup to Amazon S3. You can run these commands manually if needed.
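To see how the two cron jobs line up, the hypothetical next_daily_run helper computes the next firing of a daily job at a given hour. It assumes the jobs run on the hour (the post only says 1 am and 3 am). It works in naive local time and ignores daylight-saving transitions. The point it illustrates is that the copy to EBS always runs two hours after that night's snapshot.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Next time a daily cron job at hour:minute fires after `now` (naive local time)."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate
```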
You can then navigate to /var/lib/cassandra (which is actually mapped to the ephemeral drive at /mnt/cassandra), inspect the data folder, and see the snapshot directories with their files.
You can now back up the snapshot directories to the EBS drive by running the following command
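Here is a rough Python 3 illustration of the copy step, offered only as an illustration and not as the supplied backup script. It assumes snapshots live under <keyspace>/snapshots/<name> inside the data directory (/var/lib/cassandra/data on these instances) and copies them under /backupvol/cassandra. It follows the copy-then-delete order described in this section: new copies are written first, and only then are backed-up snapshots that no longer exist on the ephemeral drive pruned. copy_snapshots is hypothetical, and the pruning rule is an assumption.

```python
import shutil
from pathlib import Path
from typing import List


def copy_snapshots(data_dir: Path, backup_dir: Path) -> List[Path]:
    """Mirror <keyspace>/snapshots/<name> directories from data_dir into backup_dir."""
    copied = []
    for snap in sorted(data_dir.glob("*/snapshots/*")):
        if not snap.is_dir():
            continue
        rel = snap.relative_to(data_dir)
        dest = backup_dir / rel
        if not dest.exists():
            # copytree creates missing parent directories under backup_dir.
            shutil.copytree(snap, dest)
        copied.append(rel)
    # Only after every copy succeeded, remove backups whose source snapshot is gone.
    keep = set(copied)
    for old in list(backup_dir.glob("*/snapshots/*")):
        if old.is_dir() and old.relative_to(backup_dir) not in keep:
            shutil.rmtree(old)
    return copied
```

On an instance this would be called as copy_snapshots(Path("/var/lib/cassandra/data"), Path("/backupvol/cassandra")).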
Navigate to /backupvol/cassandra on the EBS drive and you will see that the snapshot directories have been copied over to the non-ephemeral storage drive. Because the cron job takes nightly snapshots, your ephemeral drives must have 3x the disk space of your expected data to handle the worst case. The data itself takes 1x, the snapshotted data brings that to 2x, and a compaction at the most inopportune time can push it to 3x. For saving snapshots to the EBS drive, the EBS drive must be 2x the data size, since the script copies the new data before deleting the older snapshots for consistency. If you are taking regular AWS EBS snapshots (don’t confuse these with Cassandra snapshots), you can modify the script to remove the directory before copying. You will then need only 1x the disk space, since the previous data will already have been backed up to Amazon S3.
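The sizing rules above reduce to simple multiples of your expected data size. The minimal sketch below uses the post's worst-case factors: 3x on ephemeral storage, and 2x on EBS, or 1x if you also take AWS EBS snapshots. disk_requirements is a hypothetical helper.

```python
from typing import Dict


def disk_requirements(data_gb: float, aws_ebs_snapshots: bool = False) -> Dict[str, float]:
    """Worst-case disk sizes (GB) for a node holding data_gb of Cassandra data."""
    if data_gb < 0:
        raise ValueError("data size cannot be negative")
    return {
        # live data + nightly snapshot + room for an ill-timed compaction
        "ephemeral_gb": 3 * data_gb,
        # new copy written before old snapshots are deleted, unless S3 already holds them
        "ebs_gb": (1 if aws_ebs_snapshots else 2) * data_gb,
    }
```

For 100 GB of expected data this gives 300 GB of ephemeral space and 200 GB of EBS space. With AWS EBS snapshots, the EBS requirement drops to 100 GB.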
I hope people can use these scripts as a starting point for setting up Cassandra clusters. Please feel free to modify and reuse these scripts as you see fit. In a future post I’ll describe how the cassandra.sh script works and what it does.
Amazon EC2 remote install script - ec2-cassandra-setup.zip