January 9, 2025
What’s New in 2025
Key Updates and Changes
- Graviton4 Processors: Up to 40% faster than Graviton3 for databases, 192 cores at 2.8 GHz
- NUMA Evolution: Two-socket NUMA memory clustering on Graviton4 for improved performance
- Cassandra 5.0: Enhanced NUMA awareness with improved memory management
- Instance Types: New R8g, X8g, C8g, M8g, I8g instances with better NUMA support
- Performance Gains: 30-40% improvement over x86 for Cassandra workloads
Major Architecture Changes
- Single NUMA Domain: Graviton3 maintains single NUMA domain simplicity
- Dual NUMA Support: Graviton4 introduces two-socket NUMA clustering
- Memory Bandwidth: Improved memory controller performance across generations
- Core Density: Up to 192 physical cores per instance (R8g.48xlarge)
- ARM Optimization: Better Java performance on ARM architecture
AWS Cassandra 2025 and NUMA Architecture
In 2025, AWS has significantly evolved its NUMA (Non-Uniform Memory Access) support with Graviton4 processors. Understanding NUMA is crucial for optimizing Cassandra 5.0 performance on modern EC2 instances.
NUMA-Enabled Instance Types (2025)
Graviton4 Instances (Two-Socket NUMA)
r8g.48xlarge
- 192 vCPUs, 1.5 TB memoryx8g.48xlarge
- 192 vCPUs, 3 TB memoryc8g.48xlarge
- 192 vCPUs, 384 GB memorym8g.48xlarge
- 192 vCPUs, 768 GB memory
Graviton3 Instances (Single NUMA Domain)
c7g.16xlarge
- 64 vCPUs, 128 GB memorym7g.16xlarge
- 64 vCPUs, 256 GB memoryr7g.16xlarge
- 64 vCPUs, 512 GB memory
Legacy Intel/AMD Instances
i3.8xlarge
,c4.8xlarge
,m4.10xlarge
and above still support NUMA
We hope this information on Cassandra NUMA for AWS helps with your 2025 deployments. We also provide Cassandra consulting and Kafka consulting. Please check out our Cassandra training and Kafka training. We specialize in AWS DevOps Automation for Cassandra and Kafka.
Understanding NUMA in 2025
NUMA (Non-Uniform Memory Access) architecture has evolved significantly with Graviton4:
Traditional NUMA: Each CPU socket has its own memory controller. Memory access is faster when CPU and memory are on the same socket (10 CPU cycles vs. 100+ cycles for remote memory).
Graviton3 NUMA: All vCPUs are physical cores in a single NUMA domain running at 2.6 GHz. This simplifies memory management but limits scalability.
Graviton4 NUMA: Two-socket configuration with 192 cores at 2.8 GHz and 1.5 TB memory. This brings traditional NUMA benefits to ARM architecture.
Cassandra 5.0 NUMA Optimization
Cassandra 5.0 includes enhanced NUMA awareness and improved memory management:
JVM Configuration for NUMA
Traditional Approach (Legacy)
# Comment out numactl --interleave in bin/cassandra
# Add to cassandra-env.sh
JVM_OPTS="$JVM_OPTS -XX:+UseNUMA"
2025 Recommended Approach
# For Graviton4 instances with dual NUMA
JVM_OPTS="$JVM_OPTS -XX:+UseNUMA"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:+UnlockExperimentalVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseTransparentHugePages"
# For single NUMA domain (Graviton3)
JVM_OPTS="$JVM_OPTS -XX:+UseNUMA"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
Instance-Specific NUMA Tuning
Graviton4 (R8g.48xlarge)
# Check NUMA topology
numactl --hardware
# Bind Cassandra to specific NUMA nodes
numactl --cpunodebind=0 --membind=0 /path/to/cassandra
# Or use interleave for balanced performance
numactl --interleave=all /path/to/cassandra
Graviton3 (Single NUMA)
# Single NUMA domain - use standard configuration
# No special NUMA binding needed
JVM_OPTS="$JVM_OPTS -XX:+UseNUMA"
Memory Management Improvements
Cassandra 5.0 with proper NUMA configuration shows significant improvements:
# cassandra.yaml - NUMA optimizations
# Increase heap size for large instances
max_heap_size: 32g
heap_newsize: 8g
# Optimize for NUMA
native_transport_max_threads: 128
native_transport_max_frame_size_in_mb: 256
# New in Cassandra 5.0 - NUMA-aware allocator
off_heap_memory_allocator: numa_aware
Performance Benchmarks (2025)
Graviton4 vs x86 Performance
Throughput Improvements:
- Graviton4: 40% faster than x86 for database workloads
- Graviton3: 30% faster than x86 for Apache Cassandra
- Memory Bandwidth: 2x improvement on Graviton4
Latency Improvements:
- P99 Latency: 25% lower on Graviton4
- Memory Access: 15% faster local memory access
- Cross-NUMA: 10% faster inter-socket communication
Real-World Performance Data
# Cassandra 5.0 on r8g.48xlarge
# Write throughput: 500K ops/sec
# Read throughput: 800K ops/sec
# P99 latency: 2.5ms (vs 3.2ms on x86)
# NUMA-aware configuration results
# Memory locality: 95% local access
# CPU utilization: 85% (vs 78% without NUMA)
CPU Pinning Strategies
When to Use CPU Pinning
Recommended for:
- Mixed workloads on large instances
- Co-location with Spark, Solr, or other JVMs
- Latency-sensitive applications
- Graviton4 instances with dual NUMA
Not Recommended for:
- Single-application deployments
- Graviton3 single NUMA instances
- Small to medium instances
CPU Pinning Configuration
# Graviton4 - Pin to first NUMA node
numactl --cpunodebind=0 --membind=0 \
--pid $(pgrep -f "org.apache.cassandra.service.CassandraDaemon")
# Check CPU affinity
taskset -p $(pgrep -f "org.apache.cassandra.service.CassandraDaemon")
# Isolate CPU cores for Cassandra
echo "isolcpus=0-95" >> /etc/default/grub
update-grub
Monitoring NUMA Performance
Key Metrics to Monitor
# Check NUMA statistics
numastat -c cassandra
# Monitor memory usage per NUMA node
watch -n 1 "numastat -p $(pgrep -f cassandra)"
# CPU utilization per NUMA node
sar -P ALL 1 5
# Memory bandwidth monitoring
pcm-memory.x -- sleep 5
CloudWatch Custom Metrics
import boto3
import subprocess
def publish_numa_metrics():
cloudwatch = boto3.client('cloudwatch')
# Get NUMA memory usage
result = subprocess.run(['numastat', '-c', 'cassandra'],
capture_output=True, text=True)
# Parse and publish metrics
for line in result.stdout.split('\n'):
if 'Node' in line:
node_id = line.split()[1]
memory_usage = float(line.split()[2])
cloudwatch.put_metric_data(
Namespace='Cassandra/NUMA',
MetricData=[
{
'MetricName': 'MemoryUsage',
'Dimensions': [
{'Name': 'NumaNode', 'Value': node_id},
{'Name': 'InstanceId', 'Value': instance_id}
],
'Value': memory_usage,
'Unit': 'Bytes'
}
]
)
Instance Selection Guide (2025)
Graviton4 Instances
R8g Family (Memory Optimized)
- Best for: Large datasets, analytics workloads
- NUMA: Dual-socket configuration
- Recommendation: Use for clusters with large partition sizes
C8g Family (Compute Optimized)
- Best for: High-throughput workloads
- NUMA: Optimized for compute-intensive operations
- Recommendation: Use for write-heavy workloads
I8g Family (Storage Optimized)
- Best for: High IOPS requirements
- NUMA: Optimized for storage throughput
- Recommendation: Use with local NVMe storage
Graviton3 vs Graviton4 Decision Matrix
Workload Type | Graviton3 | Graviton4 | Reason |
---|---|---|---|
Small clusters | ✓ | - | Single NUMA sufficient |
Large clusters | - | ✓ | Dual NUMA beneficial |
Memory-intensive | - | ✓ | Better memory bandwidth |
Cost-sensitive | ✓ | - | Lower cost per vCPU |
Performance-critical | - | ✓ | 40% performance gain |
Cassandra 5.0 Configuration Examples
Graviton4 Configuration
# cassandra.yaml for r8g.48xlarge
cluster_name: 'Graviton4Cluster'
num_tokens: 16
initial_token:
# NUMA-optimized settings
concurrent_reads: 128
concurrent_writes: 128
concurrent_counter_writes: 128
# Memory settings for dual NUMA
memtable_allocation_type: heap_buffers
memtable_heap_space_in_mb: 8192
memtable_offheap_space_in_mb: 8192
# Thread pool settings
native_transport_max_threads: 192
rpc_max_threads: 192
JVM Settings for NUMA
# jvm.options for Graviton4
-XX:+UseG1GC
-XX:+UseNUMA
-XX:MaxGCPauseMillis=200
-XX:G1HeapRegionSize=32m
-XX:G1NewSizePercent=20
-XX:G1MaxNewSizePercent=40
-XX:G1MixedGCCountTarget=8
-XX:G1OldCSetRegionThresholdPercent=20
# NUMA-specific optimizations
-XX:+UnlockExperimentalVMOptions
-XX:+UseTransparentHugePages
-XX:+AlwaysPreTouch
-XX:+UseLargePages
Best Practices for 2025
NUMA Configuration
- Use numactl –hardware to understand your instance topology
- Enable JVM NUMA support with -XX:+UseNUMA
- Monitor NUMA statistics with numastat
- Consider CPU pinning for mixed workloads
- Test different configurations under your specific workload
Performance Optimization
- Choose instance types based on NUMA requirements
- Configure heap sizes appropriately for NUMA nodes
- Use local storage when possible (NVMe SSD)
- Monitor memory locality to ensure optimal performance
- Benchmark different configurations before production
Monitoring and Troubleshooting
- Track NUMA memory usage with custom metrics
- Monitor cross-NUMA traffic for performance issues
- Use profiling tools to identify memory hotspots
- Set up alerts for NUMA imbalances
- Document configurations for team knowledge sharing
Conclusion
NUMA optimization for Cassandra 5.0 on AWS has become more sophisticated with Graviton4 processors. The dual-socket NUMA configuration provides significant performance benefits for large-scale deployments, while Graviton3’s single NUMA domain offers simplicity for smaller clusters.
Key recommendations for 2025:
- Use Graviton4 instances for performance-critical workloads
- Implement proper NUMA configuration for your instance type
- Monitor NUMA statistics and optimize accordingly
- Consider CPU pinning for mixed workloads
- Test configurations thoroughly before production deployment
More info about Cassandra and AWS
Amazon provides comprehensive guidance for running Cassandra on AWS. The AWS Database Blog covers best practices, while the AWS Big Data Blog provides EC2-specific recommendations.
Instaclustr’s performance testing shows real-world Graviton performance benefits.
Cloudurable provides Cassandra training, Cassandra consulting, Cassandra support and helps setting up Cassandra clusters in AWS.
About Cloudurable™
Cloudurable™: streamline DevOps/DBA for Cassandra running on AWS. Cloudurable™ provides AMIs, CloudWatch Monitoring, CloudFormation templates and monitoring tools to support Cassandra in production running in EC2.
We also teach advanced Cassandra courses which teaches how one could develop, support and deploy Cassandra to production in AWS EC2 for Developers and DevOps/DBA. We also provide Cassandra consulting and Cassandra training.
More info about Cloudurable
Please take some time to read the Advantage of using Cloudurable™.
Cloudurable provides:
- Subscription Cassandra support to streamline DevOps (Support subscription pricing for Cassandra and Kafka in AWS)
- Quickstart Mentoring Consulting for Developers and DevOps
- Architectural Analysis Consulting
- Training and mentoring for Cassandra for DevOps/DBA and Developers
- Training and mentoring for Apache Kafka for DevOps and Developers
- We specialize in AWS Cassandra deployments for organizations that are setting up Cassandra as a Service.
Authors
Written by R. Hightower and JP Azar.
Feedback
We hope you enjoyed this article. Please provide [feedback](https://cloudurable.com/contact/index.html).
#### About Cloudurable Cloudurable provides [Cassandra training](https://cloudurable.com/cassandra-course/index.html "Onsite, Instructor-Led, Cassandra Training"), [Cassandra consulting](https://cloudurable.com/kafka-aws-consulting/index.html "Cassandra professional services"), [Cassandra support](https://cloudurable.com/subscription_support/index.html) and helps [setting up Cassandra clusters in AWS](https://cloudurable.com/services/index.html). Cloudurable also provides [Kafka training](https://cloudurable.com/kafka-training/index.html "Onsite, Instructor-Led, Kafka Training"), [Kafka consulting](https://cloudurable.com/kafka-aws-consulting/index.html), [Kafka support](https://cloudurable.com/subscription_support/index.html) and helps [setting up Kafka clusters in AWS](https://cloudurable.com/services/index.html).
Check out our new GoLang course. We provide onsite Go Lang training which is instructor led.
TweetApache Spark Training
Kafka Tutorial
Akka Consulting
Cassandra Training
AWS Cassandra Database Support
Kafka Support Pricing
Cassandra Database Support Pricing
Non-stop Cassandra
Watchdog
Advantages of using Cloudurable™
Cassandra Consulting
Cloudurable™| Guide to AWS Cassandra Deploy
Cloudurable™| AWS Cassandra Guidelines and Notes
Free guide to deploying Cassandra on AWS
Kafka Training
Kafka Consulting
DynamoDB Training
DynamoDB Consulting
Kinesis Training
Kinesis Consulting
Kafka Tutorial PDF
Kubernetes Security Training
Redis Consulting
Redis Training
ElasticSearch / ELK Consulting
ElasticSearch Training
InfluxDB/TICK Training TICK Consulting