Description Eindex is a module for sharding column based indexes. With a similar schemeof how Cassandra shards row keys we decided to shard column keys. With this scheme you can still use the Cassandra Random Partitioner and get range queries for keys. Our goal is to support 100’s of millions of keys across a Cassandra cluster.
EID is a service for generating unique ID numbers at high scale with some simple guarantees (based on the work from https://github.com/twitter/snowflake). The service can be in-memory or run as a REST-ful web service using Jetty.
The tests were performed over several days in Sept 2010 with Cassandra 0.6.2 utilizing the supplied contrib/py_stress tests.
This will be a short post describing how we configure a raid level 0 drive on an EC2 instance using the EBS drives. For a lot of our functionality we typically use the ephemeral drives and periodically backup content using the EBS drives and snapshots. We mainly use raided EBS drives to get the maximum performance out of an Amazon EC2 small instances. For example we have seen nearly double the performance out of our Cassandra cluster on small instances using raided EBS drives.
In this post we will walk through setting up a production ready 3 node Cassandra cluster with Munin monitoring running on Amazon EC2 in under 30 minutes. We will also walk through getting the sample Cassandra stress scripts running with a basic load on the 3 node cluster. This post builds on a previous post about how to setup and maintain an EC2 virtual instance with our supplied unattended install scripts. If you wish to know more about how our unattended install scripts works please review my previous post.
After maintaining several version of my own private AMI’s and, realizing what a pain maintenance was, I decided to find a better solution. There is a lot of great information on the net if your google-fu is good, but I decided to compile all the information I use into a couple scripts and describe each step in detail so others could understand, modify and use the scripts. The overriding goal is to allow the flexibility of launching and configuring remote Amazon EC2 instances in an non-interactive manner.
I have added JMX support to the Simple Java Performance Counters. For a detailed description on the Java Performance Counters please check out my previous post located.
Lets look at creating and using a simple thread-safe Java in-memory cache. It would be nice to have a cache that can expire items from the cache based on a time to live as well as keep the most recently used items. Luckily the apache common collections has a LRUMap, which, removes the least used entries from a fixed sized map. Great, one piece of the puzzle is complete. For the expiration of items we can timestamp the last access and in a separate thread remove the items when the time to live limit is reached. This is nice for reducing memory pressure for applications that have long idle time in between accessing the cached objects. There is also some debate weather the cache items should return a cloned object or the original. I prefer to keep it simple and fast by returning the original object. So the onus is on the user of the cache to understand modifying the underlying object will modify the object in the cache as well. Notice this is also an in-memory cache so objects are not serialized to disk.
There doesn’t seem to be a whole lot of open source options for Java performance counters. Since I found it frustrating and rolled my own I decided to share my work so others could just ditto it. The overarching principal is Simplicity or more importantly KISS. I wanted something fast, simple, easy to use, fast, simple and thread-safe (did I mention fast and simple). After working with windows C++ performance counters (yuck!) talk about warts and .NET performance counters (nice band-aid, but still didn’t cover the warts) I opted for a simple under-engineered design. Before we dive into some samples lets briefly explain the included Java performance counters.
When building and configuring Amazon EC2 instances I find myself needing to install the Sun Java 6 runtime and/or the JDK unattended. This is sometimes referred to as non-interactive or headless install. The script below is what I typically use to install Java on my Ubuntu 9.10 instances running on Amazon EC2.