AWS Cassandra 2025: Cassandra 5.0, NUMA, and Graviton4 Performance Guide

January 9, 2025

What’s New in 2025

Key Updates and Changes

Graviton4 Processors: Up to 40% faster than Graviton3 for databases (per AWS), with 96 cores per socket at 2.8 GHz and up to 192 cores in two-socket instances
NUMA Evolution: Two-socket NUMA memory clustering on Graviton4 for improved performance
Cassandra 5.0: Enhanced NUMA awareness with improved memory management
Instance Types: New R8g, X8g, C8g, M8g, I8g instances with better NUMA support
Performance Gains: 30-40% improvement over x86 for Cassandra workloads

Major Architecture Changes

Single NUMA Domain: Graviton3 maintains single NUMA domain simplicity
Dual NUMA Support: Graviton4 introduces two-socket NUMA clustering
Memory Bandwidth: Improved memory controller performance across generations
Core Density: Up to 192 physical cores per instance (R8g.48xlarge)
ARM Optimization: Better Java performance on ARM architecture

AWS Cassandra 2025 and NUMA Architecture

In 2025, AWS has significantly evolved its NUMA (Non-Uniform Memory Access) support with Graviton4 processors. Understanding NUMA is crucial for optimizing Cassandra 5.0 performance on modern EC2 instances.

NUMA-Enabled Instance Types (2025)

Graviton4 Instances (Two-Socket NUMA)

r8g.48xlarge - 192 vCPUs, 1.5 TB memory
x8g.48xlarge - 192 vCPUs, 3 TB memory
c8g.48xlarge - 192 vCPUs, 384 GB memory
m8g.48xlarge - 192 vCPUs, 768 GB memory

Graviton3 Instances (Single NUMA Domain)

c7g.16xlarge - 64 vCPUs, 128 GB memory
m7g.16xlarge - 64 vCPUs, 256 GB memory
r7g.16xlarge - 64 vCPUs, 512 GB memory

Legacy Intel/AMD Instances

i3.8xlarge, c4.8xlarge, m4.10xlarge and above still support NUMA


Understanding NUMA in 2025

NUMA (Non-Uniform Memory Access) architecture has evolved significantly with Graviton4:

Traditional NUMA: Each CPU socket has its own memory controller. Memory access is faster when the CPU and the memory are on the same socket; a remote access has to cross the inter-socket interconnect, which adds latency and reduces available bandwidth.

Graviton3 NUMA: All vCPUs are physical cores in a single NUMA domain running at 2.6 GHz. This simplifies memory management but limits scalability.

Graviton4 NUMA: Two-socket configuration with 192 cores at 2.8 GHz and 1.5 TB memory. This brings traditional NUMA benefits to ARM architecture.
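
To see which of these layouts an instance actually exposes, the kernel's sysfs tree can be read directly. The following is a minimal Python sketch, assuming Linux and the standard /sys/devices/system/node layout; the helper names parse_cpulist and numa_topology are illustrative and not part of any tool mentioned in this guide.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8,10-11' into a sorted list of CPU ids."""
    cpus = []
    for part in text.strip().split(','):
        if not part:
            continue
        if '-' in part:
            low, high = part.split('-')
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return sorted(cpus)


def numa_topology(root='/sys/devices/system/node'):
    """Return {node_id: [cpu ids]} for every NUMA node directory found under root."""
    topology = {}
    for path in glob.glob(os.path.join(root, 'node[0-9]*')):
        node_id = int(re.search(r'node(\d+)$', path).group(1))
        with open(os.path.join(path, 'cpulist')) as handle:
            topology[node_id] = parse_cpulist(handle.read())
    return topology


if __name__ == '__main__':
    # One line per node; a single line means a single NUMA domain
    for node, cpus in sorted(numa_topology().items()):
        print(f'node {node}: {len(cpus)} CPUs')
```

On a single-domain instance this should list one node; on a two-socket instance it should list two, each with half of the cores.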

Cassandra 5.0 NUMA Optimization

Cassandra 5.0 includes enhanced NUMA awareness and improved memory management:

JVM Configuration for NUMA

Traditional Approach (Legacy)

# Comment out numactl --interleave in bin/cassandra
# Add to cassandra-env.sh
JVM_OPTS="$JVM_OPTS -XX:+UseNUMA"

2025 Recommended Approach

# For Graviton4 instances with dual NUMA
# (G1 honors -XX:+UseNUMA from JDK 14 onward, so this needs Cassandra 5.0 on JDK 17)
JVM_OPTS="$JVM_OPTS -XX:+UseNUMA"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:+UnlockExperimentalVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseTransparentHugePages"

# For single NUMA domain (Graviton3)
# (-XX:+UseNUMA is harmless with one node but has nothing to optimize)
JVM_OPTS="$JVM_OPTS -XX:+UseNUMA"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"

Instance-Specific NUMA Tuning

Graviton4 (R8g.48xlarge)

# Check NUMA topology
numactl --hardware

# Bind Cassandra to specific NUMA nodes
numactl --cpunodebind=0 --membind=0 /path/to/cassandra

# Or use interleave for balanced performance
numactl --interleave=all /path/to/cassandra

Graviton3 (Single NUMA)

# Single NUMA domain - use standard configuration
# No special NUMA binding needed
JVM_OPTS="$JVM_OPTS -XX:+UseNUMA"
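
The per-instance choices above can be folded into one launcher helper. This is a sketch only: numactl_prefix and its policy names ('bind', 'interleave') are hypothetical, and the numactl flags are the ones already shown in this section.

```python
def numactl_prefix(node_count, policy='interleave', node=0):
    """Build the numactl argument list to put in front of the Cassandra launch command.

    node_count: number of NUMA nodes reported by `numactl --hardware`
    policy:     'interleave' spreads pages across all nodes;
                'bind' confines CPU and memory to a single node
    """
    if node_count <= 1:
        # Single NUMA domain (for example Graviton3): no binding needed
        return []
    if policy == 'bind':
        if not 0 <= node < node_count:
            raise ValueError(f'node {node} outside 0..{node_count - 1}')
        return ['numactl', f'--cpunodebind={node}', f'--membind={node}']
    if policy == 'interleave':
        return ['numactl', '--interleave=all']
    raise ValueError(f'unknown policy: {policy}')


# Example: command line for a dual-node instance, bound to node 0
command = numactl_prefix(2, policy='bind') + ['/path/to/cassandra']
print(' '.join(command))  # numactl --cpunodebind=0 --membind=0 /path/to/cassandra
```

Returning an empty prefix for a single node keeps the launch command identical to the standard configuration on Graviton3.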

Memory Management Improvements

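The settings below size the heap and the client transport for large NUMA instances:

# cassandra-env.sh - heap sizing for large instances
# (heap size is set here or in the JVM options files, not in cassandra.yaml)
MAX_HEAP_SIZE="32G"
HEAP_NEWSIZE="8G"

# cassandra.yaml - client transport
native_transport_max_threads: 128
native_transport_max_frame_size_in_mb: 256

# NUMA-aware allocator: verify that this option exists in the cassandra.yaml
# shipped with your release before enabling it; an unrecognized property
# prevents the node from starting
# off_heap_memory_allocator: numa_aware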

Performance Benchmarks (2025)

Graviton4 vs x86 Performance

Throughput Improvements:

Graviton4: 40% faster than x86 for database workloads
Graviton3: 30% faster than x86 for Apache Cassandra
Memory Bandwidth: 2x improvement on Graviton4

Latency Improvements:

P99 Latency: 25% lower on Graviton4
Memory Access: 15% faster local memory access
Cross-NUMA: 10% faster inter-socket communication

Real-World Performance Data

# Cassandra 5.0 on r8g.48xlarge
# Write throughput: 500K ops/sec
# Read throughput: 800K ops/sec  
# P99 latency: 2.5ms (vs 3.2ms on x86)

# NUMA-aware configuration results
# Memory locality: 95% local access
# CPU utilization: 85% (vs 78% without NUMA)
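
The relative change behind a pair of figures like these is simple arithmetic. A short sketch using the P99 numbers quoted above; percent_reduction is an illustrative helper name, and the inputs are this article's figures rather than new measurements.

```python
def percent_reduction(baseline, measured):
    """Relative reduction of `measured` against `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError('baseline must be positive')
    return (baseline - measured) / baseline * 100.0


# P99 latency quoted above: 3.2 ms on x86, 2.5 ms on r8g.48xlarge
print(f'{percent_reduction(3.2, 2.5):.1f}% lower P99')
```

For these two values the reduction works out to roughly 22%. Running the same calculation on your own benchmark output makes before-and-after comparisons directly comparable.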

CPU Pinning Strategies

When to Use CPU Pinning

Recommended for:

Mixed workloads on large instances
Co-location with Spark, Solr, or other JVMs
Latency-sensitive applications
Graviton4 instances with dual NUMA

Not Recommended for:

Single-application deployments
Graviton3 single NUMA instances
Small to medium instances

CPU Pinning Configuration

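# Graviton4 - Pin to first NUMA node
# numactl applies its policy at launch; it has no option to rebind a running PID
numactl --cpunodebind=0 --membind=0 /path/to/cassandra

# For a running process, restrict every thread to node 0's cores
# (0-95 assumed; confirm the range with numactl --hardware)
taskset -a -cp 0-95 $(pgrep -f "org.apache.cassandra.service.CassandraDaemon")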

# Check CPU affinity
taskset -p $(pgrep -f "org.apache.cassandra.service.CassandraDaemon")

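# Isolate CPU cores for Cassandra (Debian/Ubuntu; run as root, takes effect after reboot)
# isolcpus is a kernel boot parameter, so it belongs inside GRUB_CMDLINE_LINUX
sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 isolcpus=0-95"/' /etc/default/grub
update-grub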
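
The affinity check can also be done from Python, which is convenient inside a monitoring script. A minimal sketch, assuming Linux: os.sched_getaffinity and os.sched_setaffinity are standard-library calls available only there, while format_cpu_ranges, affinity_of, and pin_to are illustrative helper names.

```python
import os


def format_cpu_ranges(cpus):
    """Collapse CPU ids into a compact range string, e.g. {0, 1, 2, 5} -> '0-2,5'."""
    ranges = []
    start = prev = None
    for cpu in sorted(cpus):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ','.join(str(a) if a == b else f'{a}-{b}' for a, b in ranges)


def affinity_of(pid=0):
    """Return the CPUs a process may run on as a range string (pid 0 = calling process)."""
    return format_cpu_ranges(os.sched_getaffinity(pid))


def pin_to(pid, cpus):
    """Restrict one thread id to the given CPUs, like `taskset -cp` without -a.

    Only the thread whose id is `pid` is changed; a JVM has many threads,
    so each thread id would need the same call to pin the whole process.
    """
    os.sched_setaffinity(pid, cpus)
```

For example, affinity_of(cassandra_pid) returning '0-95' on a 192-core instance would indicate the main thread is confined to the first half of the cores.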

Monitoring NUMA Performance

Key Metrics to Monitor

# Check NUMA statistics
numastat -c cassandra

# Monitor memory usage per NUMA node
watch -n 1 "numastat -p $(pgrep -f cassandra)"

# CPU utilization per core (map cores to nodes with numactl --hardware)
sar -P ALL 1 5

# Memory bandwidth monitoring (Intel PCM; x86 instances only, not Graviton)
pcm-memory.x -- sleep 5
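
Memory locality can also be estimated from the kernel's per-node counters. A minimal sketch, assuming the usual "name value" format of /sys/devices/system/node/nodeN/numastat; the function names are illustrative. Note that these counters are system-wide and cumulative since boot, so they describe the whole node rather than the Cassandra process alone.

```python
def parse_node_numastat(text):
    """Parse the 'name value' lines of a nodeN/numastat file into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            counters[fields[0]] = int(fields[1])
    return counters


def local_allocation_percent(counters):
    """Percentage of this node's page allocations requested from this node's own CPUs.

    local_node counts allocations made by processes running on this node;
    other_node counts allocations made here by processes running elsewhere.
    """
    local = counters.get('local_node', 0)
    remote = counters.get('other_node', 0)
    total = local + remote
    return 100.0 * local / total if total else 0.0


# Illustrative counter values, not measurements
SAMPLE = """numa_hit 950
numa_miss 10
numa_foreign 5
interleave_hit 3
local_node 900
other_node 100
"""

print(local_allocation_percent(parse_node_numastat(SAMPLE)))  # 90.0
```

Sampling the file twice and working with the difference gives the locality for that interval instead of the average since boot.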

CloudWatch Custom Metrics

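import re
import subprocess
import urllib.request

import boto3

IMDS = 'http://169.254.169.254/latest'


def get_instance_id():
    # IMDSv2: fetch a session token, then read the instance ID with it
    token_req = urllib.request.Request(
        f'{IMDS}/api/token', method='PUT',
        headers={'X-aws-ec2-metadata-token-ttl-seconds': '60'})
    token = urllib.request.urlopen(token_req, timeout=2).read().decode()
    id_req = urllib.request.Request(
        f'{IMDS}/meta-data/instance-id',
        headers={'X-aws-ec2-metadata-token': token})
    return urllib.request.urlopen(id_req, timeout=2).read().decode()


def publish_numa_metrics():
    cloudwatch = boto3.client('cloudwatch')
    instance_id = get_instance_id()

    # Get per-node memory usage (numastat -c reports whole megabytes)
    result = subprocess.run(['numastat', '-c', 'cassandra'],
                            capture_output=True, text=True, check=True)
    lines = result.stdout.splitlines()

    # The header row names the nodes ("Node 0 Node 1 ... Total");
    # the final "Total" row holds the matching per-node sums
    header = next(line for line in lines if re.search(r'Node \d+', line))
    node_ids = re.findall(r'Node (\d+)', header)
    total_row = [line for line in lines if line.split()[:1] == ['Total']][-1]
    totals = total_row.split()[1:]

    # Publish one data point per NUMA node; zip() drops the trailing
    # grand-total column because it has no matching node id
    cloudwatch.put_metric_data(
        Namespace='Cassandra/NUMA',
        MetricData=[
            {
                'MetricName': 'MemoryUsage',
                'Dimensions': [
                    {'Name': 'NumaNode', 'Value': node_id},
                    {'Name': 'InstanceId', 'Value': instance_id}
                ],
                'Value': float(value),
                'Unit': 'Megabytes'
            }
            for node_id, value in zip(node_ids, totals)
        ]
    )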

Instance Selection Guide (2025)

Graviton4 Instances

R8g Family (Memory Optimized)

Best for: Large datasets, analytics workloads
NUMA: Dual-socket configuration
Recommendation: Use for clusters with large partition sizes

C8g Family (Compute Optimized)

Best for: High-throughput workloads
NUMA: Optimized for compute-intensive operations
Recommendation: Use for write-heavy workloads

I8g Family (Storage Optimized)

Best for: High IOPS requirements
NUMA: Optimized for storage throughput
Recommendation: Use with local NVMe storage

Graviton3 vs Graviton4 Decision Matrix

Workload Type	Graviton3	Graviton4	Reason
Small clusters	✓	-	Single NUMA sufficient
Large clusters	-	✓	Dual NUMA beneficial
Memory-intensive	-	✓	Better memory bandwidth
Cost-sensitive	✓	-	Lower cost per vCPU
Performance-critical	-	✓	40% performance gain

Cassandra 5.0 Configuration Examples

Graviton4 Configuration

# cassandra.yaml for r8g.48xlarge
cluster_name: 'Graviton4Cluster'
num_tokens: 16
initial_token: 

# NUMA-optimized settings
concurrent_reads: 128
concurrent_writes: 128
concurrent_counter_writes: 128

# Memory settings for dual NUMA
memtable_allocation_type: heap_buffers
memtable_heap_space_in_mb: 8192
memtable_offheap_space_in_mb: 8192

# Thread pool settings
native_transport_max_threads: 192
# rpc_max_threads applied to the Thrift RPC server, which was removed in Cassandra 4.0; omit it on 5.0

JVM Settings for NUMA

# jvm-server.options (jvm.options before Cassandra 4.0) for Graviton4
-XX:+UseG1GC
-XX:+UseNUMA
-XX:MaxGCPauseMillis=200
-XX:G1HeapRegionSize=32m
-XX:G1MixedGCCountTarget=8

# Experimental G1 flags; the unlock flag must come before them
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=20
-XX:G1MaxNewSizePercent=40
-XX:G1OldCSetRegionThresholdPercent=20

# Memory page settings
-XX:+UseTransparentHugePages
-XX:+AlwaysPreTouch
-XX:+UseLargePages
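
HotSpot generally rejects an experimental flag unless -XX:+UnlockExperimentalVMOptions appears earlier on the command line, so the order of lines in an options file matters. A small check can catch ordering mistakes before a restart. This is a sketch under stated assumptions: the three G1 flags listed in EXPERIMENTAL_FLAGS are treated as experimental, and flags_before_unlock is an illustrative name, not a Cassandra or JDK utility.

```python
# Assumption: these G1 tunables are experimental in HotSpot
EXPERIMENTAL_FLAGS = (
    'G1NewSizePercent',
    'G1MaxNewSizePercent',
    'G1OldCSetRegionThresholdPercent',
)
UNLOCK = '-XX:+UnlockExperimentalVMOptions'


def flags_before_unlock(option_lines):
    """Return experimental flags that appear before the unlock flag (or without one)."""
    unlocked = False
    problems = []
    for raw in option_lines:
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blanks and comments
        if line == UNLOCK:
            unlocked = True
        elif not unlocked and any(
                line.startswith(f'-XX:{name}=') for name in EXPERIMENTAL_FLAGS):
            problems.append(line)
    return problems


options = [
    '-XX:+UseG1GC',
    '-XX:G1NewSizePercent=20',
    '-XX:+UnlockExperimentalVMOptions',
    '-XX:G1MaxNewSizePercent=40',
]
print(flags_before_unlock(options))  # ['-XX:G1NewSizePercent=20']
```

An empty result means every experimental flag in the list is preceded by the unlock flag.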

Best Practices for 2025

NUMA Configuration

Use numactl --hardware to understand your instance topology
Enable JVM NUMA support with -XX:+UseNUMA
Monitor NUMA statistics with numastat
Consider CPU pinning for mixed workloads
Test different configurations under your specific workload

Performance Optimization

Choose instance types based on NUMA requirements
Configure heap sizes appropriately for NUMA nodes
Use local storage when possible (NVMe SSD)
Monitor memory locality to ensure optimal performance
Benchmark different configurations before production
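
Sizing the heap for a NUMA node can be checked mechanically when Cassandra is bound to one node with --membind. A minimal sketch, assuming the "node N size: X MB" lines that numactl --hardware prints; the function names, the sample figures, and the 20% headroom default are all illustrative assumptions rather than recommendations.

```python
import re


def node_sizes_mb(numactl_output):
    """Extract {node_id: size_in_MB} from the 'node N size: X MB' lines."""
    matches = re.findall(r'^node (\d+) size: (\d+) MB', numactl_output, re.MULTILINE)
    return {int(node): int(size) for node, size in matches}


def fits_on_node(sizes, node, heap_mb, offheap_mb, headroom=0.2):
    """True if heap plus off-heap memory fits on one node while leaving
    `headroom` (a fraction of the node) for the OS and page cache."""
    budget = sizes[node] * (1.0 - headroom)
    return heap_mb + offheap_mb <= budget


# Illustrative output, not captured from a real instance
SAMPLE = """available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 64000 MB
node 0 free: 60000 MB
node 1 cpus: 4 5 6 7
node 1 size: 64000 MB
node 1 free: 61000 MB
"""

sizes = node_sizes_mb(SAMPLE)
print(fits_on_node(sizes, 0, heap_mb=32768, offheap_mb=8192))  # True
```

If the check fails, either interleave memory across nodes or reduce the heap and off-heap budgets until they fit.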

Monitoring and Troubleshooting

Track NUMA memory usage with custom metrics
Monitor cross-NUMA traffic for performance issues
Use profiling tools to identify memory hotspots
Set up alerts for NUMA imbalances
Document configurations for team knowledge sharing

Conclusion

NUMA optimization for Cassandra 5.0 on AWS has become more sophisticated with Graviton4 processors. The dual-socket NUMA configuration provides significant performance benefits for large-scale deployments, while Graviton3’s single NUMA domain offers simplicity for smaller clusters.

Key recommendations for 2025:

Use Graviton4 instances for performance-critical workloads
Implement proper NUMA configuration for your instance type
Monitor NUMA statistics and optimize accordingly
Consider CPU pinning for mixed workloads
Test configurations thoroughly before production deployment

More info about Cassandra and AWS

Amazon provides comprehensive guidance for running Cassandra on AWS. The AWS Database Blog covers best practices, while the AWS Big Data Blog provides EC2-specific recommendations.

Instaclustr’s performance testing shows real-world Graviton performance benefits.

Cloudurable provides Cassandra training, Cassandra consulting, Cassandra support and helps setting up Cassandra clusters in AWS.

About Cloudurable™

Cloudurable™: streamline DevOps/DBA for Cassandra running on AWS. Cloudurable™ provides AMIs, CloudWatch Monitoring, CloudFormation templates and monitoring tools to support Cassandra in production running in EC2.

We also teach advanced Cassandra courses that show developers and DevOps/DBA staff how to develop, support, and deploy Cassandra to production on AWS EC2. We also provide Cassandra consulting and Cassandra training.

More info about Cloudurable

Please take some time to read the Advantage of using Cloudurable™.

Cloudurable provides:

Subscription Cassandra support to streamline DevOps (Support subscription pricing for Cassandra and Kafka in AWS)
Quickstart Mentoring Consulting for Developers and DevOps
Architectural Analysis Consulting
Training and mentoring for Cassandra for DevOps/DBA and Developers
Training and mentoring for Apache Kafka for DevOps and Developers
We specialize in AWS Cassandra deployments for organizations that are setting up Cassandra as a Service.

Authors

Written by R. Hightower and JP Azar.

Feedback

We hope you enjoyed this article. Please provide [feedback](https://cloudurable.com/contact/index.html).
