Cassandra AWS System Memory Guidelines

March 15, 2017

                                                                           

System Memory Guidelines for Cassandra AWS

Basic guidelines for AWS Cassandra

Do not use less than 8GB of memory for the JVM. The more RAM the better. Use G1GC. Writes are first stored in memory (in a memtable) and then flushed to disk sequentially as an SSTable. The larger the SSTable, the less scanning is needed at read time to determine, using a bloom filter, whether a key is in an SSTable. In the EC2 world this equates to an m4.xlarge (16GB of memory), because you also need some memory for the OS, specifically the IO buffers. The i2.xlarge and d2.xlarge are the smallest in their families and exceed the minimum memory requirement (and then some).

Java heap usage for Cassandra

Cassandra maintains these components in Java heap memory:

  • Bloom filters
  • Partition summary
  • Partition key cache
  • Compression offsets
  • SSTable index summary

Some of these components use more Java heap as you increase the Java heap size.

Cassandra uses OS memory: Leave enough memory for the Linux OS - System RAM

Cassandra uses memory in 4 ways:

  • Java heap
  • offheap memory
  • OS page cache
  • OS TCP/IP stack I/O cache

The more memory the better. If the memory is available, Cassandra and the Linux OS can use it. In the Java heap, Cassandra can use memory for the key cache, which can speed up queries. For smaller tables that are read often, you can use the row cache. If the cache hit rate is high, there is less read IO. In the NoSQL world, Cassandra gets high marks for writes and lower marks for reads, so reads, use case permitting, could benefit from caches.

For a read-heavy system in EC2, it could make sense to go to 60 GB to 120 GB of RAM (e.g., m4.4xlarge, i3.2xlarge). Above this range in EC2, you have to worry about NUMA concerns; see the NUMA Cassandra AWS guidelines.

Cassandra relies heavily on the Linux OS page cache to cache data stored on EBS and local instance volumes. Every read that gets an OS cache hit is served from RAM instead of the EC2 volume, which takes the IOPS or throughput of EBS out of the equation. This means you must leave memory for the Linux OS, including some space for Linux IO buffers. You are not running a stateless servlet engine; you are running a stateful NoSQL database. As a starting point, total system memory should be 2x to 6x the size of the JVM heap, with the remainder left to the OS.

Cassandra uses off-heap memory as follows:

  • Page cache
  • Bloom filter
  • Compression offset maps
  • Row caches

Think of off-heap memory as OS memory that is not managed by the JVM garbage collectors.

Since Cassandra uses OS/off-heap memory, you need quite a bit more OS memory than you allocate to the JVM for Cassandra to be effective.

Starting breakdowns of JVM heap size vs OS RAM for Cassandra

Taking our recommendation of at least 8 vCPUs per Cassandra EC2 instance and applying the 2x to 6x ratio of Linux system memory to Cassandra JVM memory gives us this table.

Cassandra JVM size vs. Linux OS memory for AWS

EC2 Instance Type   Instance Size (GB)   JVM Size Range (GB)   Linux OS Memory Range (GB)
c4.2xlarge          15                   5                     10
m4.2xlarge          32                   5 to 16               16 to 27
m4.4xlarge          64                   10 to 32              32 to 54
m4.10xlarge         160                  27 to 80              80 to 133
i3.2xlarge          61                   10 to 30              31 to 51
i3.4xlarge          122                  20 to 60              62 to 102
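
The ranges in this table can be approximated with a few lines of arithmetic. The sketch below is not the authors' method; it assumes the heap runs from 1/6 to 1/2 of instance RAM, rounded down to whole GB, with the OS getting whatever is left. The function name `memory_split` is made up for illustration.

```python
def memory_split(instance_ram_gb):
    """Return ((jvm_min, jvm_max), (os_min, os_max)) in whole GB.

    Assumption: system RAM is 2x to 6x the JVM heap, so the heap runs
    from RAM/6 up to RAM/2, rounded down to whole GB.
    """
    jvm_min = instance_ram_gb // 6
    jvm_max = instance_ram_gb // 2
    os_min = instance_ram_gb - jvm_max  # largest heap leaves the least for the OS
    os_max = instance_ram_gb - jvm_min  # smallest heap leaves the most for the OS
    return (jvm_min, jvm_max), (os_min, os_max)


for name, ram in [("m4.2xlarge", 32), ("m4.4xlarge", 64), ("i3.2xlarge", 61)]:
    (jvm_lo, jvm_hi), (os_lo, os_hi) = memory_split(ram)
    print(f"{name}: JVM {jvm_lo} to {jvm_hi} GB, OS {os_lo} to {os_hi} GB")
```

For these three instance types the output matches the table rows. The m4.10xlarge and i3.4xlarge rows in the table are rounded slightly differently, so this sketch lands within 1 GB of them rather than exactly on them.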

JVM Garbage Collector for Cassandra on AWS

Because Cassandra has some long-lived objects on the heap, the choice of GC comes down to CMS and G1GC. Choose G1GC.

The general rule is not to use G1GC if your heap is under 5GB; some say 8GB (Oracle says under 1GB).

Notice that in the table above, there is no JVM configuration with a heap smaller than 5GB.

Do not use the CMS garbage collector if your JVM heap is over 16 GB. CMS is deprecated as of JDK 9. You could make a case for CMS with a heap size between 5 GB and 8 GB.
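
The rules of thumb above can be written down as a small decision function. This is only a sketch of the guidance in this article; the thresholds are the article's recommendations, not limits enforced by the JVM, and the function name `suggest_gc` is hypothetical.

```python
def suggest_gc(heap_gb):
    """Map a heap size in GB to the collector this article recommends.

    Thresholds (5 GB and 8 GB) are the article's rules of thumb.
    """
    if heap_gb < 5:
        return "heap below the 5GB minimum used in the table"
    if heap_gb <= 8:
        return "G1GC preferred; CMS is arguable"
    return "G1GC"


for heap in (4, 6, 12):
    print(f"{heap} GB heap: {suggest_gc(heap)}")
```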

G1GC under 8GB

The promise of G1 on smaller systems vs. CMS is more robust performance across a range of workloads without manual tuning. G1GC probably won’t perform as well in terms of ops/s, etc. By using G1GC with a heap under 8GB, you trade some speed to avoid the CMS pain that starts once cascading IO and heap pressure move through the system. There are benchmarks that clearly show G1 beating CMS at 8GB.

Cloudurable provides Cassandra training, Cassandra consulting, Cassandra support and helps setting up Cassandra clusters in AWS.

If you want to take CMS out of the picture altogether, use this table as a guide.

Cassandra JVM size vs. Linux OS memory no CMS for AWS

EC2 Instance Type   Instance Size (GB)   JVM Size Range (GB)   Linux OS Memory Range (GB)
m4.2xlarge          32                   8 to 16               16 to 24
m4.4xlarge          64                   10 to 32              32 to 54
m4.10xlarge         160                  27 to 80              80 to 133
i3.2xlarge          61                   10 to 30              31 to 51
i3.4xlarge          122                  20 to 60              62 to 102

Note that the Apache Cassandra on AWS: Guidelines and Best Practices has a mistake. It says the max heap size you should use for Cassandra is 8GB, and it says the DataStax Documentation says this. The DataStax documentation says use between 14GB and 64GB of heap. 8GB is only for older computers.

To set heap size for m4.2xlarge

-Xms12G
-Xmx12G

-Xms sets the minimum heap size and -Xmx sets the maximum heap size.
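
The example above pins the heap by giving the minimum and maximum the same value. As a minimal sketch, assuming you want the same fixed-size pattern for other heap sizes, a helper (the name `heap_flags` is made up here) could render the two flags:

```python
def heap_flags(heap_gb):
    """Return JVM flags that fix the heap at heap_gb (minimum == maximum)."""
    if heap_gb <= 0:
        raise ValueError("heap size must be positive")
    return [f"-Xms{heap_gb}G", f"-Xmx{heap_gb}G"]


print("\n".join(heap_flags(12)))
```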

You should prefer an easily tuned and stable setting over one that has the issues that CMS does.

We hope you find this information on system memory requirements for Cassandra running in AWS useful. Cloudurable provides Cassandra consulting and Kafka consulting to get you set up fast in AWS with CloudFormation and CloudWatch. We specialize in AWS DevOps automation for Cassandra and Kafka. Check out our Cassandra training and Kafka training as well.

Guide for using G1GC with Cassandra JVM

-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:InitiatingHeapOccupancyPercent=25

-XX:G1RSetUpdatingPauseTimePercent=5 sets the target percentage of pause time (default 10) that G1GC spends updating RSets during a GC evacuation pause. RSets are Remembered Sets: per-region entries that let G1GC track references into a heap region from outside it, so that G1GC does not have to scan the whole heap for references into a region. Read the tips for tuning G1GC. By decreasing G1RSetUpdatingPauseTimePercent, the JVM spends less time updating the RSets during the stop-the-world (STW) GC pause, and more of the RSet updating is done in the refinement threads.

InitiatingHeapOccupancyPercent defaults to 45% of your total Java heap. Lowering the value starts the marking cycle earlier. It is another way to start GC earlier to avoid STW pauses.

If you want to maximize throughput and are less concerned with pauses, here is another way to configure G1GC.

Guide for using G1GC with Cassandra JVM

-XX:+UseG1GC
-XX:MaxGCPauseMillis=1000
-XX:InitiatingHeapOccupancyPercent=60

Just make sure your Cassandra timeouts are more than 1000ms.
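
One way to check this is to compare the pause target against the request timeouts you have configured. The sketch below is illustrative only: the setting names are assumed to match the timeout keys in your version of cassandra.yaml, the values are made up, and `timeouts_at_risk` is a hypothetical helper.

```python
def timeouts_at_risk(max_gc_pause_ms, timeouts_ms):
    """Return the names of timeouts that are not longer than the GC pause target."""
    return sorted(
        name for name, value in timeouts_ms.items() if value <= max_gc_pause_ms
    )


# Illustrative values only; read the real ones from your cassandra.yaml.
timeouts = {
    "read_request_timeout_in_ms": 5000,
    "write_request_timeout_in_ms": 2000,
    "range_request_timeout_in_ms": 1000,
}
print(timeouts_at_risk(1000, timeouts))
```

Any name this prints is a timeout that a single pause at the target length could exceed, so either raise that timeout or lower MaxGCPauseMillis.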

G1GC will self-adjust using ergonomics and runtime statistics. The only setting that you really need to set is -XX:MaxGCPauseMillis.

GC settings common to both CMS and G1GC

Guide setting for both G1GC and CMS

-XX:ParallelGCThreads={#vCPU / 2 }
-XX:ConcGCThreads={#vCPU / 2 }
-XX:+ParallelRefProcEnabled
-XX:+AlwaysPreTouch                   # allocate and zero (force fault) heap memory on startup
-XX:+UseTLAB                          # thread local allocation blocks
-XX:+ResizeTLAB                       # auto-optimize TLAB size
-XX:-UseBiasedLocking                 # disable biased locking for cassandra

Increasing ParallelGCThreads and ConcGCThreads is useful for any parallel garbage collector.
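
The {#vCPU / 2} placeholders in the listing above have to be replaced with real numbers. A minimal sketch, assuming integer division with a floor of one thread (the helper name `gc_thread_flags` is made up for illustration):

```python
import os


def gc_thread_flags(vcpus=None):
    """Render the two GC thread flags as half the vCPU count (at least 1)."""
    if vcpus is None:
        # Fall back to the CPU count of the machine running this script.
        vcpus = os.cpu_count() or 1
    threads = max(1, vcpus // 2)
    return [f"-XX:ParallelGCThreads={threads}", f"-XX:ConcGCThreads={threads}"]


print("\n".join(gc_thread_flags(8)))
```

On an 8 vCPU instance such as the m4.2xlarge this gives 4 threads for each flag.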

Turning on ParallelRefProcEnabled helps collect reference objects (e.g., WeakReference) in parallel, which is faster when there are a lot of them.

Reference processing isn’t usually a big deal for Cassandra, but in some workloads it does start to show up in the GC logs. Since we pretty much always want all the parallel stuff offered by the JVM, go ahead and enable parallel reference processing to bring down your p99.9’s.
Al Tobey Writes Blog: Cassandra tuning guide

Use -XX:+AlwaysPreTouch to allocate and zero heap memory on startup by force-faulting every page. This ensures all memory is faulted and zeroed on startup, prevents soft faults later, and makes hugepage allocation more effective. Use -XX:+UseTLAB to enable thread-local allocation blocks, and -XX:+ResizeTLAB to let the JVM auto-optimize the TLAB size. Then we disable biased locking for Cassandra with -XX:-UseBiasedLocking. Biased locking was introduced in Hotspot 1.5 to reduce locking overhead in systems whose locks are mostly uncontended (single-writer locks). Cassandra has contended locks in frequently used areas, which makes this optimization a net loss when Cassandra is under load.

CMS

Don’t use CMS. If you really must have a JVM heap between 5GB and 8GB, and an easy-to-configure, reliable system does not win out over raw speed, then use this as a guide.

Remember, you can take CMS out of the equation by using m4.2xlarge and i3.2xlarge as the smallest EC2 instances you deploy to. If for some reason you need to go smaller than 8GB, try using G1GC anyway.

Do not use CMS for anything above 8GB JVM heap. It does have STW pauses.

Guide to setting up CMS for Cassandra running in AWS

-XX:+UseConcMarkSweepGC
-XX:ParGCCardsPerStrideChunk=4096
-XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=16
-XX:+CMSScavengeBeforeRemark
-XX:CMSMaxAbortablePrecleanTime=60000
-XX:CMSWaitDuration=30000
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly
-Xmn{1/4 to 1/3 the size of the heap}

The Parallel copy collector (ParNew) is responsible for young collection in CMS.
Use ParGCCardsPerStrideChunk (default 256) to increase granularity of tasks distributed between worker threads.

CMSScavengeBeforeRemark triggers a young GC (STW) before the CMS Remark (STW) phase runs, which reduces the duration of the Remark phase.

Use CMSWaitDuration so that once CMS detects it should start a new cycle, it will wait this long for a Young GC cycle to occur. This reduces the duration of the Initial-Mark (STW) CMS phase.

SurvivorRatio=N divides the young generation into N+2 segments: N segments for Eden and 1 segment for each of the two survivor spaces.
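
To make the N+2 split concrete, here is a small worked example. It is a sketch of the arithmetic only; the JVM aligns region sizes internally, so real Eden and survivor sizes can differ slightly. The function name `young_gen_layout` is made up.

```python
def young_gen_layout(young_mb, survivor_ratio):
    """Split the young generation into Eden and two survivor spaces.

    With SurvivorRatio=N the young generation is N + 2 equal segments:
    N for Eden and one for each of the two survivor spaces.
    """
    segment = young_mb / (survivor_ratio + 2)
    return {"eden_mb": segment * survivor_ratio, "survivor_mb_each": segment}


# SurvivorRatio=2 with a 2 GB young generation (-Xmn2G)
print(young_gen_layout(2048, 2))
# prints {'eden_mb': 1024.0, 'survivor_mb_each': 512.0}
```

With SurvivorRatio=2, half of the young generation is Eden and each survivor space gets a quarter, which gives objects more room to age before promotion.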

MaxTenuringThreshold defines the number of young GCs an object survives before it gets promoted into the old generation. If this is too low, objects are promoted early, which increases pressure on CMS.

CMS usually uses heuristic rules to decide when to trigger garbage collection, which makes it less predictable in production. UseCMSInitiatingOccupancyOnly tells CMS to use only the configured occupancy threshold, so that CMS GC starts in advance and avoids a full, stop-the-world GC. CMSInitiatingOccupancyFraction sets that trigger level, i.e., the Cassandra JVM should use less than 70% of the old generation.

Note that -Xmn sets the size of the young generation. Depending on how you have compaction workers set up, you want this to be 1/4 to 1/3 the size of your total heap.
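
The -Xmn line in the flag list above gives the young generation as 1/4 to 1/3 of the heap. A minimal sketch of that arithmetic, rounding down to whole MB (the helper name `young_gen_range_mb` is hypothetical):

```python
def young_gen_range_mb(heap_gb):
    """Return (low, high) young generation sizes in MB: 1/4 to 1/3 of the heap."""
    heap_mb = heap_gb * 1024
    return heap_mb // 4, heap_mb // 3


low, high = young_gen_range_mb(8)
print(f"-Xmn{low}M to -Xmn{high}M")
# prints -Xmn2048M to -Xmn2730M
```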

Reduce garbage collection pressure

Cassandra uses native libraries to allocate memory if available.

Ensure JNA and jemalloc are installed on the Linux machine (Amazon AMI)

yum install -y jna
yum install -y jemalloc

If you are creating an Amazon AMI image for Cassandra, then you want to install both of these.

In cassandra.yaml set memtable_allocation_type to offheap_objects if JNA and jemalloc are installed and heap_buffers if not.
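
If you bake AMIs with a script, the rule above can be expressed as a tiny helper. This is only a sketch: how you detect that JNA and jemalloc are installed is left to you, and the function name is made up for illustration.

```python
def memtable_allocation_type(jna_installed, jemalloc_installed):
    """Pick the cassandra.yaml value following the rule in the text above."""
    if jna_installed and jemalloc_installed:
        return "offheap_objects"
    return "heap_buffers"


# Line to place in cassandra.yaml when both libraries are present
print(f"memtable_allocation_type: {memtable_allocation_type(True, True)}")
```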

Changing memtable_heap_space_in_mb and memtable_offheap_space_in_mb in cassandra.yaml can reduce the amount of Java heap space that Cassandra uses.

jemalloc is a general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support. jemalloc first came into use as the FreeBSD libc allocator in 2005, and since then it has found its way into numerous applications that rely on its predictable behavior. –jemalloc jemalloc.net/

Java Native Access (JNA) is a community-developed library that provides Java programs easy access to native shared libraries without using the Java Native Interface. JNA’s design aims to provide native access in a natural way with a minimum of effort. No boilerplate or generated glue code is required. –Java Native Access – Wikipedia https://en.wikipedia.org/wiki/Java_Native_Access

About Cloudurable™

Cloudurable™: streamline DevOps/DBA for Cassandra running on AWS. Cloudurable™ provides AMIs, CloudWatch monitoring, CloudFormation templates and monitoring tools to support Cassandra in production running in EC2. We also teach advanced Cassandra courses that show developers and DevOps/DBAs how to develop, support and deploy Cassandra to production in AWS EC2. We also provide Cassandra consulting and Cassandra training.

Authors

Written by R. Hightower and JP Azar.
