AWS Cassandra 2025: Cassandra 5.0, NUMA, and Graviton4 Performance Guide

January 9, 2025

                                                                           

What’s New in 2025

Key Updates and Changes

  • Graviton4 Processors: Up to 40% faster than Graviton3 for databases, up to 192 cores per instance (two 96-core sockets) at 2.8 GHz
  • NUMA Evolution: Two-socket NUMA memory clustering on Graviton4 for improved performance
  • Cassandra 5.0: Enhanced NUMA awareness with improved memory management
  • Instance Types: New R8g, X8g, C8g, M8g, I8g instances with better NUMA support
  • Performance Gains: 30-40% improvement over x86 for Cassandra workloads

Major Architecture Changes

  • Single NUMA Domain: Graviton3 maintains single NUMA domain simplicity
  • Dual NUMA Support: Graviton4 introduces two-socket NUMA clustering
  • Memory Bandwidth: Improved memory controller performance across generations
  • Core Density: Up to 192 physical cores per instance (r8g.48xlarge)
  • ARM Optimization: Better Java performance on ARM architecture

AWS Cassandra 2025 and NUMA Architecture

In 2025, AWS has significantly evolved its NUMA (Non-Uniform Memory Access) support with Graviton4 processors. Understanding NUMA is crucial for optimizing Cassandra 5.0 performance on modern EC2 instances.

NUMA-Enabled Instance Types (2025)

Graviton4 Instances (Two-Socket NUMA)

  • r8g.48xlarge - 192 vCPUs, 1.5 TB memory
  • x8g.48xlarge - 192 vCPUs, 3 TB memory
  • c8g.48xlarge - 192 vCPUs, 384 GB memory
  • m8g.48xlarge - 192 vCPUs, 768 GB memory

Graviton3 Instances (Single NUMA Domain)

  • c7g.16xlarge - 64 vCPUs, 128 GB memory
  • m7g.16xlarge - 64 vCPUs, 256 GB memory
  • r7g.16xlarge - 64 vCPUs, 512 GB memory

Legacy Intel/AMD Instances

  • i3.8xlarge, c4.8xlarge, m4.10xlarge and above still support NUMA


Understanding NUMA in 2025

NUMA (Non-Uniform Memory Access) architecture has evolved significantly with Graviton4:

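Traditional NUMA: Each CPU socket has its own memory controller. Memory access is faster when the CPU and the memory sit on the same socket; a remote access has to cross the inter-socket link and costs noticeably more latency than a local one, with the exact penalty depending on the hardware.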

Graviton3 NUMA: All vCPUs are physical cores in a single NUMA domain running at 2.6 GHz. This simplifies memory management but limits scalability.

Graviton4 NUMA: Two-socket configuration with up to 192 cores at 2.8 GHz and, on r8g.48xlarge, 1.5 TB of memory. This brings traditional NUMA benefits, and the need for NUMA-aware tuning, to ARM architecture.
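To make the cost of remote access concrete, a rough model helps: average memory latency is the locality-weighted mix of local and remote latency. The sketch below is illustrative only. The function name and the 100 ns and 180 ns figures are hypothetical placeholders, not measurements of any Graviton instance.

```python
def effective_latency_ns(local_fraction, local_ns, remote_ns):
    """Average memory latency for a given share of local accesses."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies: 100 ns local, 180 ns remote.
for share in (0.50, 0.95, 1.00):
    latency = effective_latency_ns(share, 100, 180)
    print(f"{share:.0%} local -> {latency:.0f} ns")
```

Under these assumed numbers, a process that spreads its accesses evenly across two sockets pays far more per access than one that keeps nearly all of them local, which is the effect NUMA binding tries to capture.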

Cassandra 5.0 NUMA Optimization

Cassandra 5.0 includes enhanced NUMA awareness and improved memory management:

JVM Configuration for NUMA

Traditional Approach (Legacy)

# Comment out numactl --interleave in bin/cassandra
# Add to cassandra-env.sh
JVM_OPTS="$JVM_OPTS -XX:+UseNUMA"

# For Graviton4 instances with dual NUMA
# Note: G1 honors -XX:+UseNUMA only on JDK 14 and later
JVM_OPTS="$JVM_OPTS -XX:+UseNUMA"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
# Transparent huge pages is a regular product flag; no experimental unlock is needed
JVM_OPTS="$JVM_OPTS -XX:+UseTransparentHugePages"

# For single NUMA domain (Graviton3)
# -XX:+UseNUMA is harmless here, but with one node it has nothing to optimize
JVM_OPTS="$JVM_OPTS -XX:+UseNUMA"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"

Instance-Specific NUMA Tuning

Graviton4 (r8g.48xlarge)

# Check NUMA topology
numactl --hardware

# Bind Cassandra to specific NUMA nodes
numactl --cpunodebind=0 --membind=0 /path/to/cassandra

# Or use interleave for balanced performance
numactl --interleave=all /path/to/cassandra
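Before choosing between binding and interleaving, it helps to know which CPUs belong to which node. The following is a minimal sketch that parses the text printed by numactl --hardware. The function name and the sample text are assumptions for illustration; the sample follows the layout numactl prints but was not captured from a real instance.

```python
import re


def parse_numa_cpus(numactl_output):
    """Map NUMA node ID -> list of CPU IDs from `numactl --hardware` text."""
    nodes = {}
    for line in numactl_output.splitlines():
        match = re.match(r"node (\d+) cpus:(.*)", line)
        if match:
            cpu_ids = [int(c) for c in match.group(2).split()]
            nodes[int(match.group(1))] = cpu_ids
    return nodes


# Illustrative sample in the format numactl prints; not from a real host.
SAMPLE = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 16000 MB
node 1 cpus: 4 5 6 7
node 1 size: 16000 MB
"""

topology = parse_numa_cpus(SAMPLE)
print(topology)
```

On a live host, the same function can be fed the output of subprocess.run(["numactl", "--hardware"], capture_output=True, text=True).stdout.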

Graviton3 (Single NUMA)

# Single NUMA domain - use standard configuration
# No special NUMA binding needed
# The flag below is harmless, but with one node it has nothing to optimize
JVM_OPTS="$JVM_OPTS -XX:+UseNUMA"
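Whether NUMA flags matter at all depends on how many nodes the instance exposes. Here is a small sketch, assuming a Linux host that publishes its nodes under /sys/devices/system/node. Both helper names are hypothetical, and the flag selection simply mirrors the settings in this guide, leaving out -XX:+UseNUMA where there is only one node for it to act on.

```python
import os
import re


def count_numa_nodes(sysfs_dir="/sys/devices/system/node"):
    """Count the nodeN directories that Linux exposes under sysfs."""
    try:
        entries = os.listdir(sysfs_dir)
    except FileNotFoundError:
        return 1  # no NUMA information exposed; treat as a single node
    return max(1, sum(1 for e in entries if re.fullmatch(r"node\d+", e)))


def numa_jvm_flags(node_count):
    """Flags from this guide; -XX:+UseNUMA is only added with 2+ nodes."""
    flags = ["-XX:+UseG1GC"]
    if node_count > 1:
        flags.append("-XX:+UseNUMA")
    return flags


print(numa_jvm_flags(count_numa_nodes()))
```

A check like this can run in a bootstrap script so that one AMI works on both single-domain and dual-socket instance types.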

Memory Management Improvements

Cassandra 5.0 with proper NUMA configuration shows significant improvements:

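# cassandra-env.sh - heap sizing for large instances
# (heap settings are not cassandra.yaml keys)
MAX_HEAP_SIZE="32G"
HEAP_NEWSIZE="8G"

# cassandra.yaml - NUMA optimizations
native_transport_max_threads: 128
native_transport_max_frame_size_in_mb: 256

# Listed as new in Cassandra 5.0 - NUMA-aware allocator
# Confirm this key exists in your version's cassandra.yaml first;
# Cassandra refuses to start on unknown keys
off_heap_memory_allocator: numa_aware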

Performance Benchmarks (2025)

Graviton4 vs x86 Performance

Throughput Improvements:

  • Graviton4: 40% faster than x86 for database workloads
  • Graviton3: 30% faster than x86 for Apache Cassandra
  • Memory Bandwidth: 2x improvement on Graviton4

Latency Improvements:

  • P99 Latency: 25% lower on Graviton4
  • Memory Access: 15% faster local memory access
  • Cross-NUMA: 10% faster inter-socket communication

Real-World Performance Data

# Cassandra 5.0 on r8g.48xlarge
# Write throughput: 500K ops/sec
# Read throughput: 800K ops/sec  
# P99 latency: 2.5ms (vs 3.2ms on x86)

# NUMA-aware configuration results
# Memory locality: 95% local access
# CPU utilization: 85% (vs 78% without NUMA)
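The relative gain implied by these figures is easy to check. This minimal sketch uses a hypothetical helper name and computes the percentage reduction from the two P99 values quoted above.

```python
def percent_lower(new_value, baseline):
    """Relative reduction of new_value versus baseline, in percent."""
    return (baseline - new_value) / baseline * 100.0


# Figures quoted above: 2.5 ms on r8g.48xlarge vs 3.2 ms on x86.
print(f"P99 reduction: {percent_lower(2.5, 3.2):.1f}%")
```

This works out to roughly 22%, slightly below the 25% P99 figure listed under Latency Improvements, so treat the headline numbers as approximate and benchmark your own workload.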

CPU Pinning Strategies

When to Use CPU Pinning

Recommended for:

  • Mixed workloads on large instances
  • Co-location with Spark, Solr, or other JVMs
  • Latency-sensitive applications
  • Graviton4 instances with dual NUMA

Not Recommended for:

  • Single-application deployments
  • Graviton3 single NUMA instances
  • Small to medium instances

CPU Pinning Configuration

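# Graviton4 - numactl cannot re-bind a running process, so either
# start Cassandra under numactl (see the binding commands above) or
# pin the running JVM's threads to the first node's CPUs with taskset.
# CPUs 0-95 are assumed to belong to node 0; confirm with numactl --hardware
PID=$(pgrep -f "org.apache.cassandra.service.CassandraDaemon")
taskset -a -cp 0-95 "$PID"

# Move pages already allocated on node 1 over to node 0
migratepages "$PID" 1 0

# Check CPU affinity
taskset -cp "$PID"

# Isolate CPU cores for Cassandra: add isolcpus to the kernel command line
# Note: the scheduler does not load-balance threads across isolated CPUs
sudo sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 isolcpus=0-95"/' /etc/default/grub
sudo update-grub   # Debian/Ubuntu; use grub2-mkconfig -o /boot/grub2/grub.cfg on RHEL-family systems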
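Pinning tools expect a compact CPU list such as 0-95 rather than one ID per core. The sketch below collapses a set of CPU IDs into that form, which is handy when the IDs for a node are discovered at runtime instead of hard-coded. The function name is hypothetical.

```python
def format_cpu_ranges(cpus):
    """Collapse CPU IDs into a compact list, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    ranges = []
    start = prev = None
    for cpu in sorted(set(cpus)):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


print(format_cpu_ranges(range(96)))     # 0-95
print(format_cpu_ranges([0, 1, 2, 5]))  # 0-2,5
```

The resulting string can be passed to taskset with its -c option or written into an isolcpus= kernel parameter.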

Monitoring NUMA Performance

Key Metrics to Monitor

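# Check NUMA statistics
numastat -c cassandra

# Monitor memory usage per NUMA node
watch -n 1 "numastat -p $(pgrep -f CassandraDaemon | head -n 1)"

# CPU utilization per core (map cores to NUMA nodes with numactl --hardware)
sar -P ALL 1 5

# Memory bandwidth monitoring (Intel PCM, Intel x86 instances only; not available on Graviton)
pcm-memory.x -- sleep 5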
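Linux also exposes per-node allocation counters in /sys/devices/system/node/nodeN/numastat, which can be turned into a rough locality figure. The sketch below parses that file's "name value" lines and computes the share of allocations on a node that came from tasks running on that node. The function names and sample counters are illustrative assumptions, and these are system-wide allocation counts rather than per-process access counts.

```python
def parse_node_numastat(text):
    """Parse 'name value' lines from a sysfs nodeN/numastat file."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def local_allocation_ratio(counters):
    """Share of page allocations on this node made by tasks running on it."""
    local = counters.get("local_node", 0)
    other = counters.get("other_node", 0)
    total = local + other
    return local / total if total else None


# Illustrative counter values, not measurements.
SAMPLE = """\
numa_hit 9500
numa_miss 120
numa_foreign 80
interleave_hit 10
local_node 9500
other_node 500
"""

ratio = local_allocation_ratio(parse_node_numastat(SAMPLE))
print(f"local allocations: {ratio:.1%}")
```

Because the counters are cumulative since boot, sampling them twice and comparing the deltas gives a better picture of current behavior than a single reading.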

CloudWatch Custom Metrics

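import subprocess
import urllib.request

import boto3

IMDS = 'http://169.254.169.254/latest'


def get_instance_id():
    # IMDSv2: fetch a session token, then read the instance ID
    req = urllib.request.Request(
        f'{IMDS}/api/token', method='PUT',
        headers={'X-aws-ec2-metadata-token-ttl-seconds': '60'})
    token = urllib.request.urlopen(req, timeout=2).read().decode()
    req = urllib.request.Request(
        f'{IMDS}/meta-data/instance-id',
        headers={'X-aws-ec2-metadata-token': token})
    return urllib.request.urlopen(req, timeout=2).read().decode()


def publish_numa_metrics():
    cloudwatch = boto3.client('cloudwatch')
    instance_id = get_instance_id()

    # numastat -c prints a compact per-node table with values in MB
    result = subprocess.run(['numastat', '-c', 'cassandra'],
                            capture_output=True, text=True, check=True)

    # The "Total" row holds one value per NUMA node, then a grand total.
    # Node IDs are assumed to be contiguous, starting at 0.
    for line in result.stdout.splitlines():
        fields = line.split()
        if not fields or fields[0] != 'Total':
            continue
        metric_data = [
            {
                'MetricName': 'MemoryUsage',
                'Dimensions': [
                    {'Name': 'NumaNode', 'Value': str(node_id)},
                    {'Name': 'InstanceId', 'Value': instance_id},
                ],
                'Value': float(value),
                'Unit': 'Megabytes',
            }
            for node_id, value in enumerate(fields[1:-1])
        ]
        cloudwatch.put_metric_data(Namespace='Cassandra/NUMA',
                                   MetricData=metric_data)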

Instance Selection Guide (2025)

Graviton4 Instances

R8g Family (Memory Optimized)

  • Best for: Large datasets, analytics workloads
  • NUMA: Dual-socket configuration
  • Recommendation: Use for clusters with large partition sizes

C8g Family (Compute Optimized)

  • Best for: High-throughput workloads
  • NUMA: Optimized for compute-intensive operations
  • Recommendation: Use for write-heavy workloads

I8g Family (Storage Optimized)

  • Best for: High IOPS requirements
  • NUMA: Optimized for storage throughput
  • Recommendation: Use with local NVMe storage

Graviton3 vs Graviton4 Decision Matrix

| Workload Type | Recommended | Reason |
|---|---|---|
| Small clusters | Graviton3 | Single NUMA sufficient |
| Large clusters | Graviton4 | Dual NUMA beneficial |
| Memory-intensive | Graviton4 | Better memory bandwidth |
| Cost-sensitive | Graviton3 | Lower cost per vCPU |
| Performance-critical | Graviton4 | 40% performance gain |

Cassandra 5.0 Configuration Examples

Graviton4 Configuration

# cassandra.yaml for r8g.48xlarge
cluster_name: 'Graviton4Cluster'
num_tokens: 16
# initial_token is left unset when num_tokens (vnodes) is used

# NUMA-optimized settings
concurrent_reads: 128
concurrent_writes: 128
concurrent_counter_writes: 128

# Memory settings for dual NUMA
# (Cassandra 4.1+ names; older releases used memtable_heap_space_in_mb
# and memtable_offheap_space_in_mb)
memtable_allocation_type: heap_buffers
memtable_heap_space: 8192MiB
memtable_offheap_space: 8192MiB

# Thread pool settings
# rpc_max_threads was a Thrift setting; Thrift was removed in Cassandra 4.0
native_transport_max_threads: 192

JVM Settings for NUMA

# jvm-server.options for Graviton4
# The experimental G1 flags below need this unlock placed before them
-XX:+UnlockExperimentalVMOptions
-XX:+UseG1GC
-XX:+UseNUMA
-XX:MaxGCPauseMillis=200
-XX:G1HeapRegionSize=32m
-XX:G1NewSizePercent=20
-XX:G1MaxNewSizePercent=40
-XX:G1MixedGCCountTarget=8
-XX:G1OldCSetRegionThresholdPercent=20

# Memory page optimizations
-XX:+UseTransparentHugePages
-XX:+AlwaysPreTouch
-XX:+UseLargePages
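The region size setting interacts with heap size. On JDK 11 and 17, the versions Cassandra 5.0 supports, G1 region sizes are powers of two from 1 MB to 32 MB, and any object of at least half a region is allocated as a humongous object. The sketch below works out the region count and humongous threshold for a given heap. The function name is hypothetical, and the 32 GiB heap is the size used earlier in this guide.

```python
def g1_region_stats(heap_gib, region_mib):
    """Return (region count, humongous threshold in MiB) for a G1 heap."""
    if region_mib not in (1, 2, 4, 8, 16, 32):
        raise ValueError("region size must be a power of two from 1 to 32 MiB")
    region_count = heap_gib * 1024 // region_mib
    humongous_threshold_mib = region_mib / 2
    return region_count, humongous_threshold_mib


regions, humongous_mib = g1_region_stats(32, 32)
print(f"regions={regions} humongous_threshold={humongous_mib} MiB")
```

With 32 MB regions, a 32 GiB heap has 1024 regions and only allocations of 16 MB or more are treated as humongous, which matters for clusters with large partitions.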

Best Practices for 2025

NUMA Configuration

  1. Use numactl --hardware to understand your instance topology
  2. Enable JVM NUMA support with -XX:+UseNUMA
  3. Monitor NUMA statistics with numastat
  4. Consider CPU pinning for mixed workloads
  5. Test different configurations under your specific workload

Performance Optimization

  1. Choose instance types based on NUMA requirements
  2. Configure heap sizes appropriately for NUMA nodes
  3. Use local storage when possible (NVMe SSD)
  4. Monitor memory locality to ensure optimal performance
  5. Benchmark different configurations before production

Monitoring and Troubleshooting

  1. Track NUMA memory usage with custom metrics
  2. Monitor cross-NUMA traffic for performance issues
  3. Use profiling tools to identify memory hotspots
  4. Set up alerts for NUMA imbalances
  5. Document configurations for team knowledge sharing
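An alert for NUMA imbalance needs a number to alert on. One simple option, sketched below, is the spread between the fullest and emptiest node as a fraction of total usage. The function name, the sample values, and the 0.30 threshold are all hypothetical starting points rather than recommendations from any tool.

```python
def numa_imbalance(per_node_mb):
    """Spread between fullest and emptiest node as a fraction of total usage."""
    total = sum(per_node_mb)
    if total == 0 or len(per_node_mb) < 2:
        return 0.0
    return (max(per_node_mb) - min(per_node_mb)) / total


IMBALANCE_THRESHOLD = 0.30  # hypothetical starting point; tune per workload

usage_mb = [42000, 18000]  # illustrative per-node values, e.g. from numastat
score = numa_imbalance(usage_mb)
print(f"imbalance={score:.2f} alert={score > IMBALANCE_THRESHOLD}")
```

A process deliberately bound to one node will score close to 1.0, so a check like this only makes sense for interleaved or unbound deployments.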

Conclusion

NUMA optimization for Cassandra 5.0 on AWS has become more sophisticated with Graviton4 processors. The dual-socket NUMA configuration provides significant performance benefits for large-scale deployments, while Graviton3’s single NUMA domain offers simplicity for smaller clusters.

Key recommendations for 2025:

  • Use Graviton4 instances for performance-critical workloads
  • Implement proper NUMA configuration for your instance type
  • Monitor NUMA statistics and optimize accordingly
  • Consider CPU pinning for mixed workloads
  • Test configurations thoroughly before production deployment

More info about Cassandra and AWS

Amazon provides comprehensive guidance for running Cassandra on AWS. The AWS Database Blog covers best practices, while the AWS Big Data Blog provides EC2-specific recommendations.

Instaclustr’s performance testing shows real-world Graviton performance benefits.

Cloudurable provides Cassandra training, Cassandra consulting, Cassandra support and helps setting up Cassandra clusters in AWS.

About Cloudurable™

Cloudurable™: streamline DevOps/DBA for Cassandra running on AWS. Cloudurable™ provides AMIs, CloudWatch Monitoring, CloudFormation templates and monitoring tools to support Cassandra in production running in EC2.

We also teach advanced Cassandra courses that show developers and DevOps/DBA staff how to develop, support, and deploy Cassandra to production on AWS EC2. We also provide Cassandra consulting and Cassandra training.

More info about Cloudurable

Please take some time to read the Advantage of using Cloudurable™.


Authors

Written by R. Hightower and JP Azar.

Feedback


We hope you enjoyed this article. Please provide [feedback](https://cloudurable.com/contact/index.html).