Cassandra AWS System Memory Guidelines 2025: Optimizing for Modern Hardware and Workloads

January 9, 2025

                                                                           

What’s New in 2025

The Cassandra memory landscape has evolved significantly:

  1. Modern JVMs - ZGC and Shenandoah in recent LTS JDKs (17 and 21) target sub-millisecond pause times; confirm which JDK your Cassandra release supports
  2. AWS Graviton3 - ARM-based processors with DDR5 memory provide 50% more memory bandwidth than Graviton2
  3. Larger heap sizes - Modern GCs handle 100GB+ heaps efficiently
  4. Container deployments - Memory management in Kubernetes requires different approaches
  5. Persistent memory - Intel Optane and similar technologies blur the line between RAM and storage
  6. Tiered storage - Hot data in memory, warm in NVMe, cold in S3
  7. Vector search workloads - New memory requirements for AI/ML applications

Cloudurable provides Cassandra training, Cassandra consulting, and Cassandra support, and helps set up Cassandra clusters in AWS.

Modern Guidelines for AWS Cassandra (2025)

Minimum Requirements Have Increased

Do not give the JVM less than 16GB of heap in production. The sweet spot for most workloads is 24-48GB, and with modern GCs like ZGC and Shenandoah you can use heaps up to 128GB efficiently. In EC2 terms, this means starting with m7g.xlarge (16GB of instance memory) for development and m7g.2xlarge (32GB) or larger for production, since the instance must also hold off-heap structures and the OS page cache.

Use Case             Instance Type  Memory  vCPUs  Why
-------------------  -------------  ------  -----  ---------------------------
Development          m7g.xlarge     16 GB   4      Graviton3, good for testing
Small Production     m7g.2xlarge    32 GB   8      Balanced compute/memory
Standard Production  r7g.2xlarge    64 GB   8      Memory-optimized Graviton3
High Performance     r7g.4xlarge    128 GB  16     Large heap with ZGC
Vector Search        r7g.8xlarge    256 GB  32     AI/ML workloads
Extreme Performance  x2gd.4xlarge   256 GB  16     NVMe + high memory
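
The table can be read as a lookup from target heap size to instance type. The following is a minimal sketch, not an official sizing tool: the function name is hypothetical, and the 75% ceiling on heap as a share of instance memory is an assumption borrowed from the heap ranges used elsewhere in this guide.

```python
from typing import List, Tuple

# (instance type, memory in GB) for the Graviton3 rows of the table above.
INSTANCES: List[Tuple[str, int]] = [
    ("m7g.xlarge", 16),
    ("m7g.2xlarge", 32),
    ("r7g.2xlarge", 64),
    ("r7g.4xlarge", 128),
    ("r7g.8xlarge", 256),
]


def smallest_instance_for_heap(heap_gb: float, max_heap_fraction: float = 0.75) -> str:
    """Return the smallest listed instance that can hold the heap while
    leaving (1 - max_heap_fraction) of memory for off-heap data and the OS."""
    for name, memory_gb in INSTANCES:
        if heap_gb <= memory_gb * max_heap_fraction:
            return name
    raise ValueError(f"no listed instance fits a {heap_gb} GB heap")


if __name__ == "__main__":
    for heap in (12, 24, 48, 96):
        print(f"{heap} GB heap -> {smallest_instance_for_heap(heap)}")
```

Lowering max_heap_fraction leaves more room for the page cache, which usually matters more for read-heavy workloads.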

Modern JVM Garbage Collectors

ZGC - The Game Changer

ZGC (Z Garbage Collector) has matured significantly and is now production-ready for Cassandra:

# ZGC Configuration for Cassandra 5.0+
-XX:+UseZGC
-XX:+UseLargePages
-XX:+AlwaysPreTouch
-XX:ZCollectionInterval=30
-XX:ZAllocationSpikeTolerance=5
-Xms48G
-Xmx48G

Shenandoah GC

Another low-pause option for modern deployments:

# Shenandoah Configuration
-XX:+UseShenandoahGC
-XX:+UseLargePages
-XX:+AlwaysPreTouch
-XX:ShenandoahGCHeuristics=adaptive
-Xms32G
-Xmx32G

G1GC - Still Relevant

G1GC remains a solid choice, especially with improvements in Java 21:

# Modern G1GC Configuration (thread counts assume a 16-vCPU instance such as r7g.4xlarge)
-XX:+UseG1GC
-XX:MaxGCPauseMillis=300
-XX:G1HeapRegionSize=32M
-XX:InitiatingHeapOccupancyPercent=45
-XX:ParallelGCThreads=16
-XX:ConcGCThreads=4
-Xms32G
-Xmx32G

Memory Allocation in Container Environments

Kubernetes Memory Considerations

When running Cassandra in Kubernetes, memory management becomes more complex:

# k8s-cassandra-memory.yaml (fragment: metadata, image, and volumes omitted)
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: cassandra
    resources:
      requests:
        memory: "64Gi"
        cpu: "8"
      limits:
        memory: "64Gi"
        cpu: "8"
    env:
    - name: JVM_OPTS
      value: >-
        -XX:+UseZGC
        -XX:MaxRAMPercentage=75.0
        -XX:InitialRAMPercentage=75.0
        -XX:+AlwaysPreTouch        

Container Memory Breakdown

For a 64GB container:

  • JVM Heap: 48GB (75%)
  • Off-heap: 8GB (12.5%)
  • OS/Container overhead: 8GB (12.5%)
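
The same split can be computed for any container size. This is a minimal sketch that assumes the 75% / 12.5% / 12.5% proportions above carry over to other limits; the function name and defaults are illustrative.

```python
from typing import Dict


def container_memory_split(container_gb: float,
                           heap_fraction: float = 0.75,
                           offheap_fraction: float = 0.125) -> Dict[str, float]:
    """Split a container memory limit into heap, off-heap, and overhead.

    Whatever is left after heap and off-heap is treated as
    OS/container overhead.
    """
    heap = container_gb * heap_fraction
    offheap = container_gb * offheap_fraction
    overhead = container_gb - heap - offheap
    if overhead < 0:
        raise ValueError("heap and off-heap fractions exceed 100%")
    return {"heap_gb": heap, "offheap_gb": offheap, "overhead_gb": overhead}


if __name__ == "__main__":
    print(container_memory_split(64))
```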

Modern Cassandra Memory Architecture

Java Heap Usage (2025)

Cassandra 5.0 maintains these components in heap:

  • Memtables (can be moved off-heap)
  • Request coordination metadata
  • Compaction metadata
  • Secondary index data
  • Materialized view metadata
  • Vector embeddings (new in 5.0)

Off-Heap Memory Evolution

Modern Cassandra uses off-heap memory more efficiently:

  • Bloom filters
  • Compression metadata
  • Index summaries
  • Row cache (when enabled)
  • Native memory for zero-copy operations
  • Vector index structures

Cassandra 5.0 with Graviton3

AWS Graviton3 processors offer significant advantages:

Graviton3 Optimization

# Graviton3-specific JVM flags (48G heap + 24G direct memory needs more than 72 GB of RAM, e.g. r7g.4xlarge)
-XX:+UseZGC
-XX:+UseLargePages
-XX:+UseTransparentHugePages
-XX:+UseNUMA
-XX:+AlwaysPreTouch
-XX:MaxDirectMemorySize=24G
-Xms48G
-Xmx48G

# ARM-specific optimizations (HotSpot normally enables these on its own when the CPU supports them)
-XX:+UseCRC32
-XX:+UseAES
-XX:+UseSHA

Memory Bandwidth Benefits

Graviton3’s DDR5 support provides:

  • 50% more memory bandwidth than Graviton2
  • Better performance for memory-intensive operations
  • Improved compaction throughput
  • Faster streaming and repairs

Modern Memory Sizing Guidelines

2025 JVM Size vs. System Memory

EC2 Instance Type  Instance Memory  JVM Heap Range  Off-heap  OS/Buffer Cache
-----------------  ---------------  --------------  --------  ---------------
m7g.xlarge         16 GB            8-12 GB         2 GB      2-6 GB
m7g.2xlarge        32 GB            16-24 GB        4 GB      4-12 GB
r7g.2xlarge        64 GB            32-48 GB        8 GB      8-24 GB
r7g.4xlarge        128 GB           64-96 GB        16 GB     16-48 GB
r7g.8xlarge        256 GB           128-192 GB      32 GB     32-96 GB
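
The rows follow a regular pattern: heap takes 50-75% of instance memory, off-heap takes 12.5%, and the remainder goes to the OS and buffer cache. The sketch below reproduces a row from that pattern. The percentages are inferred from the table rather than taken from an official formula, and the function name is hypothetical.

```python
from typing import Dict


def sizing_row(memory_gb: int) -> Dict[str, int]:
    """Derive heap, off-heap, and OS/buffer-cache ranges (in GB)
    from total instance memory, using the proportions in the table."""
    heap_min = memory_gb // 2        # 50% of instance memory
    heap_max = memory_gb * 3 // 4    # 75% of instance memory
    offheap = memory_gb // 8         # 12.5% of instance memory
    return {
        "heap_min_gb": heap_min,
        "heap_max_gb": heap_max,
        "offheap_gb": offheap,
        # The OS gets the least when the heap is at its maximum, and vice versa.
        "os_min_gb": memory_gb - heap_max - offheap,
        "os_max_gb": memory_gb - heap_min - offheap,
    }


if __name__ == "__main__":
    for memory in (16, 32, 64, 128, 256):
        print(memory, sizing_row(memory))
```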

Vector Search and AI Workloads

Cassandra 5.0 introduces vector search capabilities, requiring different memory patterns:

Vector Index Memory Requirements

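# cassandra.yaml for vector workloads
# NOTE: the two vector_* option names are illustrative; confirm they exist in your distribution
vector_memory_pool_size_mb: 8192
vector_index_cache_size_mb: 4096
memtable_allocation_type: offheap_objects
memtable_offheap_space: 16384MiB  # named memtable_offheap_space_in_mb before Cassandra 4.1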

Sizing for Vector Workloads

  • Base memory requirements +
  • (number of vectors × dimension × 4 bytes) +
  • 20% overhead for indexes

Example: 10M vectors × 768 dimensions × 4 bytes ≈ 30.7GB of raw vector data, or about 36.9GB of additional memory once the 20% index overhead is included.
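
The formula is easy to check with a few lines of arithmetic. This is a minimal sketch assuming 4-byte (float32) components and decimal gigabytes; the function name and parameters are illustrative, and real index overhead depends on the index implementation.

```python
from typing import Tuple


def vector_memory_bytes(num_vectors: int,
                        dimensions: int,
                        bytes_per_component: int = 4,
                        index_overhead_percent: int = 20) -> Tuple[int, int]:
    """Return (raw_bytes, total_bytes) for a set of dense vectors.

    raw_bytes covers the vector components only; total_bytes adds the
    index overhead as a percentage of the raw size.
    """
    raw = num_vectors * dimensions * bytes_per_component
    total = raw + raw * index_overhead_percent // 100
    return raw, total


if __name__ == "__main__":
    raw, total = vector_memory_bytes(10_000_000, 768)
    print(f"raw vectors: {raw / 1e9:.1f} GB")
    print(f"with index overhead: {total / 1e9:.1f} GB")
```

This estimate comes on top of the base memory requirements for the node.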

Persistent Memory Support

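Persistent memory such as Intel Optane sits between RAM and NVMe in both latency and cost. The settings below are illustrative; confirm that your Cassandra distribution supports them before use: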

# cassandra.yaml for persistent memory
persistent_memory_directories:
  - /mnt/pmem0
  - /mnt/pmem1
persistent_memory_size_gb: 512
use_persistent_memory_for_commit_log: true

Monitoring and Tuning

Modern Monitoring Stack

# prometheus-jmx-config.yaml
lowercaseOutputName: true
rules:
- pattern: ".*"

# Metric families to watch in 2025 (exact names depend on your exporter rules):
#   cassandra_jvm_gc_zgc_*
#   cassandra_jvm_memory_*
#   cassandra_vector_index_*
#   cassandra_persistent_memory_*

Auto-tuning with Machine Learning

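Stock Apache Cassandra 5.0 does not document ML-based auto-tuning, so treat the settings below as an illustration of what such options could look like, and confirm support in your distribution before relying on them: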

# cassandra.yaml
enable_ml_tuning: true
ml_tuning_metrics_interval_ms: 30000
ml_tuning_adjustment_threshold: 0.1

Cost Optimization Strategies

1. Use Spot Instances for Non-Seed Nodes

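// CDK example (TypeScript, aws-cdk-lib v2)
// Assumes this code runs inside a Stack and eksCluster is an existing eks.Cluster.
import { InstanceType } from 'aws-cdk-lib/aws-ec2';
import { CapacityType, Nodegroup, TaintEffect } from 'aws-cdk-lib/aws-eks';

const spotNodeGroup = new Nodegroup(this, 'SpotNodes', {
  cluster: eksCluster,
  // Offer more than one instance type so Spot has capacity to choose from
  instanceTypes: [
    new InstanceType('r7g.2xlarge'),
    new InstanceType('r6g.2xlarge'),
  ],
  capacityType: CapacityType.SPOT,
  // Taint the nodes so only pods that tolerate Spot interruptions land here
  taints: [{
    key: 'spot',
    value: 'true',
    effect: TaintEffect.NO_SCHEDULE,
  }],
});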

2. Tiered Storage Architecture

# cassandra.yaml for tiered storage (illustrative schema; confirm support in your distribution)
tiered_storage:
  enabled: true
  tiers:
    - name: hot
      path: /mnt/nvme
      capacity: 1TB
      min_sstable_age_days: 0
    - name: warm
      path: /mnt/ebs
      capacity: 10TB
      min_sstable_age_days: 7
    - name: cold
      type: s3
      bucket: cassandra-cold-data
      min_sstable_age_days: 30
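
To make the age thresholds concrete, the sketch below shows how an SSTable would be routed under one plausible reading of the configuration: each SSTable belongs to the oldest tier whose min_sstable_age_days it has reached. That interpretation is an assumption, and the code only illustrates the policy; it is not Cassandra code.

```python
from typing import List, Tuple

# (tier name, min_sstable_age_days), taken from the configuration above.
TIERS: List[Tuple[str, int]] = [("hot", 0), ("warm", 7), ("cold", 30)]


def tier_for_age(age_days: int) -> str:
    """Return the tier for an SSTable of the given age in days.

    Tiers are listed from youngest to oldest threshold; the last tier
    whose threshold has been reached wins.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    chosen = TIERS[0][0]
    for name, min_age_days in TIERS:
        if age_days >= min_age_days:
            chosen = name
    return chosen


if __name__ == "__main__":
    for age in (0, 6, 7, 29, 30, 365):
        print(f"{age:>3} days -> {tier_for_age(age)}")
```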

3. Memory Oversubscription for Dev/Test

For non-production environments:

# Allow controlled memory oversubscription
# (ZGC only returns memory to the OS when -Xms is set lower than -Xmx)
-XX:+UseZGC
-XX:+ZUncommit
-XX:ZUncommitDelay=300
-XX:MinHeapFreeRatio=10
-XX:MaxHeapFreeRatio=20

Best Practices for 2025

1. Start with ZGC or Shenandoah

Modern GCs eliminate most tuning headaches:

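# Production template (expand in a shell script such as cassandra-env.sh)
# HEAP_SIZE_GB is a plain integer, e.g. HEAP_SIZE_GB=48, so the arithmetic below works
-XX:+UseZGC
-XX:+UseLargePages
-XX:+AlwaysPreTouch
-Xms${HEAP_SIZE_GB}G
-Xmx${HEAP_SIZE_GB}G
-XX:MaxDirectMemorySize=$((HEAP_SIZE_GB / 2))G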
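
If the flags are generated by tooling rather than by shell expansion, the template can be rendered as follows. This is a minimal sketch: the function name is hypothetical, heap size is given as a whole number of gigabytes, and direct memory is set to half the heap as in the template.

```python
from typing import List


def zgc_flags(heap_gb: int) -> List[str]:
    """Render the production template for a heap of heap_gb gigabytes."""
    if heap_gb <= 0:
        raise ValueError("heap_gb must be a positive integer")
    return [
        "-XX:+UseZGC",
        "-XX:+UseLargePages",
        "-XX:+AlwaysPreTouch",
        f"-Xms{heap_gb}G",  # equal min and max heap avoids resizing
        f"-Xmx{heap_gb}G",
        f"-XX:MaxDirectMemorySize={heap_gb // 2}G",
    ]


if __name__ == "__main__":
    print("\n".join(zgc_flags(48)))
```

Heap plus direct memory must still fit in the instance or container: a 48G heap with 24G of direct memory needs more than 72GB before any OS overhead is counted.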

2. Enable Continuous Profiling

# cassandra-env.sh additions
# -XX:StartFlightRecording is enough on JDK 11+; the older -XX:+FlightRecorder switch is no longer needed
JVM_OPTS="$JVM_OPTS -XX:StartFlightRecording=settings=profile,maxsize=100M,maxage=1h"

3. Use Native Libraries

Ensure modern native libraries are installed:

# Amazon Linux 2023
sudo dnf install -y jemalloc libaio numactl-libs
export LD_PRELOAD=/usr/lib64/libjemalloc.so.2

4. NUMA Awareness

For instances with NUMA architecture:

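# cassandra-env.sh
# Interleave policy for launchers that wrap the JVM in numactl
NUMACTL_ARGS="--interleave=all"
if command -v numactl >/dev/null 2>&1; then
    # Enable the JVM's own NUMA support only where numactl is installed
    JVM_OPTS="$JVM_OPTS -XX:+UseNUMA -XX:+UseNUMAInterleaving"
fi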

Troubleshooting Memory Issues

Common Problems in 2025

  1. Vector OOM: Insufficient memory for vector indexes

    # Increase vector memory pool (illustrative option name; confirm it exists in your distribution)
    vector_memory_pool_size_mb: 16384
    
  2. Container Memory Limits: Kubernetes OOMKilled

    # Ensure proper resource allocation
    resources:
      requests:
        memory: "64Gi"
      limits:
        memory: "64Gi"  # Equal to requests (Guaranteed QoS); heap plus off-heap must still fit below it
    
  3. Persistent Memory Fragmentation

    # Regular defragmentation (illustrative command; confirm your distribution provides it)
    nodetool persistent_memory_defragment
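
For the OOMKilled case in item 2, a quick budget check helps before deploying. The sketch below compares heap plus direct memory plus a fixed allowance against the container limit. The function names and the 2 GiB allowance for metaspace, thread stacks, and other native memory are assumptions, and only the binary suffixes Ki, Mi, Gi, and Ti are parsed.

```python
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(quantity: str) -> int:
    """Convert a Kubernetes-style memory quantity such as '64Gi' to bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # no suffix: already a byte count


def fits_in_limit(limit: str, heap_gib: int, direct_gib: int,
                  other_native_gib: int = 2) -> bool:
    """Return True if heap + direct memory + allowance fits under the limit."""
    needed_bytes = (heap_gib + direct_gib + other_native_gib) * 2**30
    return needed_bytes <= parse_quantity(limit)


if __name__ == "__main__":
    print(fits_in_limit("64Gi", heap_gib=48, direct_gib=8))
    print(fits_in_limit("64Gi", heap_gib=48, direct_gib=24))
```

When the check fails, either raise the limit or lower the heap or direct memory size; making requests equal to limits does not help on its own.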
    

Future Considerations

As we look beyond 2025:

  • CXL Memory: Compute Express Link will enable memory pooling
  • Quantum-resistant encryption: Higher memory overhead for post-quantum cryptography
  • Edge deployments: Smaller memory footprints for edge computing
  • Serverless Cassandra: Memory-on-demand pricing models

Conclusion

Memory management for Cassandra in 2025 has become both simpler (thanks to modern GCs) and more complex (due to new workload types). The key takeaways:

  1. Start with larger heaps (16GB minimum for production, 24-48GB for most workloads)
  2. Use modern GCs (ZGC or Shenandoah)
  3. Leverage Graviton3 for better price/performance
  4. Plan for vector search memory requirements
  5. Implement tiered storage for cost optimization
  6. Monitor and adjust based on actual workload

The days of spending weeks tuning the CMS garbage collector are behind us. With modern collectors, Cassandra deployments can keep GC pauses consistently below a millisecond with minimal tuning effort.

About Cloudurable™

Cloudurable™ specializes in modern data platform deployments on AWS. We provide expert consulting, training, and support for Cassandra, Kafka, and cloud-native architectures. Our team stays current with the latest developments to help you optimize your data infrastructure.

Feedback

We hope you found this updated guide helpful. Please provide feedback.
