January 9, 2025
What’s New in 2025
Key Updates and Changes
- EBS GP3 Volumes: 20% cost savings over GP2 with independent IOPS/throughput scaling
- I4g Instances: Graviton2-powered with up to 15 TB of local NVMe and up to 15% better compute performance than comparable storage-optimized instances
- Im4gn/Is4gen: 45-60% lower cost per TB than I4i, for capacity-heavy nodes
- Unified Compaction: Cassandra 5.0 reduces storage overhead and improves I/O patterns
- EBS Optimization: Enhanced throughput up to 80 Gbps on latest instance types
Storage Performance Improvements
- GP3 Baseline: 3,000 IOPS and 125 MiB/s regardless of volume size
- GP3 Maximum: Up to 16,000 IOPS and 1,000 MiB/s (4x the GP2 maximum throughput of 250 MiB/s)
- NVMe Performance: I4g delivers up to 7.6 million IOPS per instance
- EBS Elastic Volumes: Live migration between volume types without downtime
- Storage Classes: New archive and deep archive tiers for long-term retention
Cassandra 5.0 AWS Storage Requirements
Cassandra 5.0 performs extensive sequential disk I/O for commit logs and SSTable writes, while requiring random I/O for read operations. The enhanced Unified Compaction strategy in Cassandra 5.0 provides more predictable I/O patterns and reduced storage overhead.
Cassandra Storage Architecture
Cassandra writes to five key areas:
- Commit logs - Sequential writes, high throughput required
- SSTables - Large sequential writes during flush and compaction
- Index files - Random access patterns for key lookups
- Bloom filters - Small files, frequent access
- Vector indexes - New in 5.0, CPU and storage intensive
EBS GP3 vs GP2: The Clear Winner for 2025
GP3 Advantages for Cassandra
- Cost Efficiency: 20% lower price per GB than GP2
- Performance Baseline: 3,000 IOPS and 125 MiB/s regardless of size
- Independent Scaling: Scale IOPS and throughput separately from capacity
- Predictable Performance: No burst credits like GP2
GP3 Configuration Examples
# High IOPS configuration for read-heavy workloads
aws ec2 create-volume \
--size 1000 \
--volume-type gp3 \
--iops 16000 \
--throughput 1000 \
--availability-zone us-west-2a
# Balanced configuration for general Cassandra workloads
aws ec2 create-volume \
--size 500 \
--volume-type gp3 \
--iops 8000 \
--throughput 500 \
--availability-zone us-west-2a
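Before provisioning, it can help to sanity-check a proposed GP3 configuration. The sketch below is a minimal example, not an official tool: the function name `validate_gp3` is hypothetical, and the limits (3,000-16,000 IOPS, 125-1,000 MiB/s, at most 500 IOPS per GiB, at most 0.25 MiB/s per provisioned IOPS) are assumptions taken from the gp3 limits documented at the time of writing, so verify them against current AWS documentation.

```python
# Assumed gp3 limits at the time of writing; verify against current AWS docs
GP3_MIN_IOPS, GP3_MAX_IOPS = 3000, 16000
GP3_MIN_THROUGHPUT, GP3_MAX_THROUGHPUT = 125, 1000  # MiB/s
GP3_MAX_IOPS_PER_GIB = 500
GP3_MAX_THROUGHPUT_PER_IOPS = 0.25  # MiB/s per provisioned IOPS


def validate_gp3(size_gib, iops, throughput):
    """Return a list of problems with a proposed gp3 volume (empty if none)."""
    problems = []
    if not GP3_MIN_IOPS <= iops <= GP3_MAX_IOPS:
        problems.append(f"iops {iops} outside {GP3_MIN_IOPS}-{GP3_MAX_IOPS}")
    if not GP3_MIN_THROUGHPUT <= throughput <= GP3_MAX_THROUGHPUT:
        problems.append(
            f"throughput {throughput} outside "
            f"{GP3_MIN_THROUGHPUT}-{GP3_MAX_THROUGHPUT} MiB/s"
        )
    if iops > size_gib * GP3_MAX_IOPS_PER_GIB:
        problems.append(f"iops {iops} exceeds {GP3_MAX_IOPS_PER_GIB} per GiB")
    if throughput > iops * GP3_MAX_THROUGHPUT_PER_IOPS:
        problems.append(f"throughput {throughput} needs more provisioned IOPS")
    return problems


# The two configurations from the CLI examples above
print(validate_gp3(1000, 16000, 1000))
print(validate_gp3(500, 8000, 500))
# A deliberately undersized volume for the requested IOPS
print(validate_gp3(10, 8000, 500))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a single pass.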
GP2 vs GP3 Comparison
Feature | GP2 | GP3 |
---|---|---|
Baseline IOPS | 3 IOPS/GB (min 100) | 3,000 IOPS |
Max IOPS | 16,000 | 16,000 |
Baseline Throughput | 128-250 MiB/s | 125 MiB/s |
Max Throughput | 250 MiB/s | 1,000 MiB/s |
Performance Scaling | Tied to volume size | Independent |
Burst Credits | Yes | No |
Price | Higher | 20% lower |
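The baseline IOPS row is easier to see with numbers. The sketch below assumes the GP2 rule from the table (3 IOPS per GB, minimum 100, capped at 16,000) and the flat GP3 baseline of 3,000 IOPS; the function names are illustrative only, and GP2 burst credits are deliberately ignored.

```python
def gp2_baseline_iops(size_gb):
    """GP2 baseline: 3 IOPS per GB, at least 100, at most 16,000."""
    return min(16000, max(100, 3 * size_gb))


def gp3_baseline_iops(size_gb):
    """GP3 baseline is 3,000 IOPS regardless of volume size."""
    return 3000


for size_gb in (100, 500, 1000, 5334):
    print(size_gb, gp2_baseline_iops(size_gb), gp3_baseline_iops(size_gb))
```

Under these assumptions, a GP2 volume smaller than 1,000 GB has a lower sustained baseline than any GP3 volume, which is why small GP2 volumes depend on burst credits.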
I4g Instances: Graviton2-Powered Storage Champions
I4g Instance Family Benefits
- Performance: Up to 15% better compute performance than similar storage instances
- Storage: Up to 15 TB of local NVMe storage using AWS Nitro SSDs
- Architecture: Graviton2 ARM processors with optimized memory bandwidth
- Cost: Better price-performance for storage-optimized workloads
I4g Instance Specifications
# i4g.large
vCPUs: 2
Memory: 16 GiB
Storage: 1 x 468 GB NVMe SSD
Network: Up to 10 Gbps
# i4g.xlarge
vCPUs: 4
Memory: 32 GiB
Storage: 1 x 937 GB NVMe SSD
Network: Up to 10 Gbps
# i4g.2xlarge
vCPUs: 8
Memory: 64 GiB
Storage: 1 x 1,875 GB NVMe SSD
Network: Up to 12 Gbps
# i4g.4xlarge
vCPUs: 16
Memory: 128 GiB
Storage: 1 x 3,750 GB NVMe SSD
Network: Up to 25 Gbps
Cost-Optimized Alternatives
- Im4gn Instances: 45% lower cost per TB than I4i
- Is4gen Instances: 60% lower cost per TB than I4i
- Trade-off: Lower compute performance for significant storage cost savings
EBS vs Instance Storage Decision Matrix (2025)
Choose EBS GP3 When:
- Flexibility needed: Snapshots, volume resizing, reattaching a volume to a replacement instance in the same AZ
- Cost optimization: High IOPS without proportional storage capacity
- Operational simplicity: Managed durability and availability
- Dynamic scaling: Ability to adjust performance without instance changes
Choose NVMe Instance Storage When:
- Maximum performance: Highest IOPS and lowest latency required
- Predictable workload: Consistent high I/O requirements
- Cost at scale: Large storage capacity with high performance needs
- Replaceable data: Data loss on instance stop, termination, or hardware failure is acceptable because replicas can rebuild the node
Storage Configuration Best Practices
Separate Volumes for Different Workloads
# Commit log volume (high sequential writes)
aws ec2 create-volume \
--size 100 \
--volume-type gp3 \
--iops 4000 \
--throughput 250 \
--availability-zone us-west-2a
# Data volume (mixed I/O patterns)
aws ec2 create-volume \
--size 1000 \
--volume-type gp3 \
--iops 12000 \
--throughput 750 \
--availability-zone us-west-2a
File System Optimization
XFS Configuration (Recommended):
# Create XFS with optimal settings for Cassandra
sudo mkfs.xfs -f -K -d agcount=64 /dev/nvme1n1
# Mount with performance options
# (nobarrier is omitted: XFS removed that option in Linux 4.19, and the mount fails if it is passed)
sudo mount -o noatime,logbufs=8,logbsize=32k,largeio,inode64,swalloc /dev/nvme1n1 /cassandra/data
EXT4 Alternative:
# Create EXT4 with large inodes
sudo mkfs.ext4 -F -O extent,uninit_bg,dir_index -E lazy_itable_init=1 -I 256 /dev/nvme1n1
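# Mount with performance options
# (data=writeback and barrier=0 trade crash safety for speed; nobh is dropped because current kernels ignore it)
sudo mount -o noatime,data=writeback,barrier=0 /dev/nvme1n1 /cassandra/data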
Cassandra 5.0 Storage Configuration
Unified Compaction Strategy Configuration
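# cassandra.yaml - Optimized for storage efficiency
# Cassandra 5.0 sets the cluster-wide default under default_compaction;
# per-table overrides go in the table's CQL compaction map.
# Check option names against the 5.0 UnifiedCompactionStrategy documentation.
default_compaction:
  class_name: UnifiedCompactionStrategy
  parameters:
    scaling_parameters: T4
    target_sstable_size: 1GiB
    base_shard_count: 8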
# Data directory configuration
data_file_directories:
- /cassandra/data1
- /cassandra/data2
- /cassandra/data3
- /cassandra/data4
# Commit log on separate volume
commitlog_directory: /cassandra/commitlog
Memory Table Configuration
# Optimized for GP3 performance (Cassandra 4.1+ setting names with explicit units)
memtable_allocation_type: offheap_objects
memtable_heap_space: 2048MiB
memtable_offheap_space: 4096MiB
memtable_flush_writers: 8 # Match vCPU count
Performance Monitoring and Optimization
EBS Performance Monitoring
# Monitor EBS metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/EBS \
--metric-name VolumeReadOps \
--dimensions Name=VolumeId,Value=vol-1234567890abcdef0 \
--start-time 2025-01-09T00:00:00Z \
--end-time 2025-01-09T01:00:00Z \
--period 300 \
--statistics Sum
# Check for EBS optimization
aws ec2 describe-instance-attribute \
--instance-id i-1234567890abcdef0 \
--attribute ebsOptimized
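The `VolumeReadOps` statistic above is a Sum per period, not a rate. A minimal sketch of the conversion to average IOPS follows; it assumes datapoints shaped like the `Datapoints` entries the CLI returns (`Timestamp` and `Sum`), and the sample values are made up for illustration.

```python
def average_iops(datapoints, period_seconds):
    """Map each datapoint timestamp to average operations per second."""
    return {dp["Timestamp"]: dp["Sum"] / period_seconds for dp in datapoints}


# Hypothetical datapoints for a 300-second period
sample = [
    {"Timestamp": "2025-01-09T00:00:00Z", "Sum": 900000.0},
    {"Timestamp": "2025-01-09T00:05:00Z", "Sum": 450000.0},
]

for timestamp, iops in average_iops(sample, 300).items():
    print(timestamp, iops)
```

Comparing these averages with the provisioned IOPS of the volume shows how close the workload runs to its limit.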
Storage Performance Testing
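# --direct=1 bypasses the page cache so the volume itself is measured;
# --time_based keeps each job running for the full runtime
# Test sequential write performance (commit log)
fio --name=seq-write --directory=/cassandra/commitlog --rw=write --bs=64k --size=10G --numjobs=1 --direct=1 --ioengine=libaio --time_based --runtime=300
# Test random read performance (SSTable reads)
fio --name=rand-read --directory=/cassandra/data --rw=randread --bs=4k --size=10G --numjobs=8 --direct=1 --ioengine=libaio --time_based --runtime=300 --group_reporting
# Test mixed workload
fio --name=mixed --directory=/cassandra/data --rw=randrw --rwmixread=70 --bs=4k --size=10G --numjobs=4 --direct=1 --ioengine=libaio --time_based --runtime=300 --group_reporting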
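To compare runs, the fio results can be summarized programmatically. The sketch below assumes fio was run with `--output-format=json` and reads only the per-job `read`/`write` `iops` and `bw` (KiB/s) fields; the helper name `summarize_fio` and the sample report are hypothetical.

```python
import json


def summarize_fio(report):
    """Sum IOPS and bandwidth (MiB/s) across all jobs in a fio JSON report."""
    total = {
        "read_iops": 0.0,
        "write_iops": 0.0,
        "read_mib_s": 0.0,
        "write_mib_s": 0.0,
    }
    for job in report["jobs"]:
        for direction in ("read", "write"):
            stats = job[direction]
            total[f"{direction}_iops"] += stats["iops"]
            total[f"{direction}_mib_s"] += stats["bw"] / 1024  # bw is in KiB/s
    return total


# Hypothetical, heavily trimmed report for a 70/30 mixed job
sample = json.loads(
    '{"jobs": [{"jobname": "mixed",'
    ' "read": {"iops": 7000.0, "bw": 28000},'
    ' "write": {"iops": 3000.0, "bw": 12000}}]}'
)
print(summarize_fio(sample))
```

In practice the report would come from a file, for example `json.load(open("fio-result.json"))`, where the file name is whatever was passed to fio's `--output` option.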
Encryption and Security
EBS Encryption with KMS
# Create encrypted GP3 volume
aws ec2 create-volume \
--size 1000 \
--volume-type gp3 \
--iops 10000 \
--throughput 500 \
--encrypted \
--kms-key-id arn:aws:kms:us-west-2:123456789012:key/12345678-1234-1234-1234-123456789012 \
--availability-zone us-west-2a
Performance Impact of Encryption
- EBS Encryption: Negligible performance impact (encryption runs on the EC2 host hardware)
- Instance Store Encryption: Use dm-crypt with minimal overhead
- Application-level Encryption: Avoid due to 20-30% CPU overhead
Cost Optimization Strategies
GP3 Cost Optimization
# Calculate GP3 vs GP2 costs
def calculate_storage_costs(volume_size_gb, iops_required, throughput_mb):
    # GP2 pricing (must size for IOPS)
    gp2_size_for_iops = max(volume_size_gb, iops_required / 3)
    gp2_cost = gp2_size_for_iops * 0.10  # $0.10 per GB

    # GP3 pricing (independent sizing)
    gp3_base_cost = volume_size_gb * 0.08  # $0.08 per GB
    extra_iops = max(0, iops_required - 3000)
    extra_throughput = max(0, throughput_mb - 125)
    gp3_iops_cost = extra_iops * 0.005  # $0.005 per IOPS
    gp3_throughput_cost = extra_throughput * 0.040  # $0.040 per MB/s
    gp3_total_cost = gp3_base_cost + gp3_iops_cost + gp3_throughput_cost

    savings = gp2_cost - gp3_total_cost
    return gp2_cost, gp3_total_cost, savings

# Example: 500GB volume with 8000 IOPS requirement
gp2_cost, gp3_cost, savings = calculate_storage_costs(500, 8000, 400)
print(f"GP2 cost: ${gp2_cost:.2f}")
print(f"GP3 cost: ${gp3_cost:.2f}")
print(f"Monthly savings: ${savings:.2f}")
Instance Selection for Cost Efficiency
# Small clusters (< 1TB per node)
Instance: i4g.2xlarge
Storage: 1 x 1.875 TB NVMe
Use case: High performance, moderate capacity
# Medium clusters (1-4TB per node)
Instance: m6i.4xlarge + GP3 volumes
Storage: Multiple GP3 volumes
Use case: Flexible capacity, good performance
# Large clusters (> 4TB per node)
Instance: i4g.8xlarge
Storage: 2 x 3.75 TB NVMe (7.5 TB total)
Use case: Maximum performance and capacity
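The three tiers above can be expressed as a simple lookup. This is only a sketch of the rule of thumb in this section, with a hypothetical function name; real sizing should also weigh compute, memory, and compaction headroom.

```python
def suggest_node_storage(tb_per_node):
    """Map planned data per node (TB) to the tiers listed above."""
    if tb_per_node < 1:
        return "i4g.2xlarge"
    if tb_per_node <= 4:
        return "m6i.4xlarge + GP3 volumes"
    return "i4g.8xlarge"


for tb in (0.5, 2, 6):
    print(tb, suggest_node_storage(tb))
```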
Advanced Storage Patterns
JBOD Configuration for Maximum Performance
# Multiple data directories for parallel I/O
data_file_directories:
- /cassandra/data1 # GP3 volume 1
- /cassandra/data2 # GP3 volume 2
- /cassandra/data3 # GP3 volume 3
- /cassandra/data4 # GP3 volume 4
# Each volume independently optimized
# Volume 1-4: 500GB each, 4000 IOPS, 250 MB/s
# Total: 2TB, 16000 IOPS, 1000 MB/s
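The totals in the comment above are simple sums, but the instance's own EBS limits can cap what the node actually sees. The sketch below adds up per-volume settings and optionally applies instance-level caps; the cap values in the second call are hypothetical placeholders, not figures for any specific instance type.

```python
def jbod_aggregate(volumes, instance_max_iops=None, instance_max_throughput=None):
    """Sum per-volume capacity and performance, capped by instance limits if given."""
    size_gb = sum(v["size_gb"] for v in volumes)
    iops = sum(v["iops"] for v in volumes)
    throughput = sum(v["throughput"] for v in volumes)
    if instance_max_iops is not None:
        iops = min(iops, instance_max_iops)
    if instance_max_throughput is not None:
        throughput = min(throughput, instance_max_throughput)
    return {"size_gb": size_gb, "iops": iops, "throughput": throughput}


# Four identical volumes, as in the JBOD layout above
volumes = [{"size_gb": 500, "iops": 4000, "throughput": 250} for _ in range(4)]

print(jbod_aggregate(volumes))
# Hypothetical instance-level EBS caps
print(jbod_aggregate(volumes, instance_max_iops=12000, instance_max_throughput=600))
```

If the capped numbers are lower than the summed ones, the extra provisioned IOPS and throughput are paid for but unreachable on that instance size.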
Tiered Storage Strategy
# Hot data: GP3 with high IOPS
aws ec2 create-volume --size 500 --volume-type gp3 --iops 16000 --throughput 1000 --availability-zone us-west-2a
# Warm data: GP3 standard performance
aws ec2 create-volume --size 2000 --volume-type gp3 --iops 3000 --throughput 125 --availability-zone us-west-2a
# Cold data: SC1 for archival
aws ec2 create-volume --size 10000 --volume-type sc1 --availability-zone us-west-2a
Migration Strategies
GP2 to GP3 Migration
# Live migration without downtime
aws ec2 modify-volume \
--volume-id vol-1234567890abcdef0 \
--volume-type gp3 \
--iops 8000 \
--throughput 500
# Monitor migration progress
aws ec2 describe-volumes-modifications \
--volume-ids vol-1234567890abcdef0
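When the progress check is scripted, the response can be reduced to a state and a percentage. The sketch below assumes a response shaped like the output of `describe-volumes-modifications` (a `VolumesModifications` list with `VolumeId`, `ModificationState`, and `Progress`); the helper names and the sample response are hypothetical.

```python
def modification_status(response, volume_id):
    """Return (state, progress_percent) for volume_id, or (None, None) if absent."""
    for mod in response.get("VolumesModifications", []):
        if mod.get("VolumeId") == volume_id:
            return mod.get("ModificationState"), mod.get("Progress")
    return None, None


def is_finished(state):
    """A modification stops changing once it is completed or failed."""
    return state in ("completed", "failed")


# Hypothetical response, trimmed to the fields used here
sample_response = {
    "VolumesModifications": [
        {
            "VolumeId": "vol-1234567890abcdef0",
            "ModificationState": "optimizing",
            "Progress": 40,
            "TargetVolumeType": "gp3",
            "TargetIops": 8000,
            "TargetThroughput": 500,
        }
    ]
}

state, progress = modification_status(sample_response, "vol-1234567890abcdef0")
print(state, progress, is_finished(state))
```

A polling loop would call the API, pass the response through `modification_status`, and stop once `is_finished` returns true.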
Instance Store to EBS Migration
# 1. Take a Cassandra snapshot (SSTable hard links) on the instance-store node
nodetool snapshot
# 2. Copy the data directory, including the snapshot, to S3
aws s3 sync /cassandra/data s3://backup-bucket/snapshots/
# 3. Restore onto the EBS volume of the new instance
aws s3 sync s3://backup-bucket/snapshots/ /cassandra/data/
Storage Scaling Strategies
Read Performance Optimization
Multi-pronged approach for scaling Cassandra read speeds:
- Horizontal scaling: Add more nodes to distribute load
- Instance store upgrade: Move to I4g instances for maximum IOPS
- EBS optimization: Use GP3 with provisioned IOPS (up to 16,000)
- JBOD configuration: Multiple volumes per node for parallel I/O
- Cache optimization: Increase key cache and row cache sizes
- Query optimization: Improve partition key design and query patterns
- Materialized views: Create optimized read paths for specific queries (still marked experimental in Cassandra and disabled by default, so test carefully)
Write Performance Optimization
# Cassandra 5.0 write optimization
commitlog_sync: periodic
commitlog_sync_period: 10000ms
commitlog_segment_size: 32MiB
# Memtable configuration for high writes
memtable_allocation_type: offheap_objects
memtable_flush_writers: 8 # Match instance vCPU count
Monitoring and Alerting
CloudWatch Custom Metrics
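import time

import boto3
import psutil


def publish_storage_metrics(data_path='/cassandra/data', interval_s=60):
    cloudwatch = boto3.client('cloudwatch')

    # Disk utilization
    disk_usage = psutil.disk_usage(data_path)
    utilization = (disk_usage.used / disk_usage.total) * 100

    # I/O metrics: disk_io_counters() returns cumulative counts since boot,
    # so sample twice and divide the difference by the interval to get a rate
    before = psutil.disk_io_counters()
    time.sleep(interval_s)
    after = psutil.disk_io_counters()
    read_iops = (after.read_count - before.read_count) / interval_s
    write_iops = (after.write_count - before.write_count) / interval_s

    cloudwatch.put_metric_data(
        Namespace='Cassandra/Storage',
        MetricData=[
            {'MetricName': 'DiskUtilization', 'Value': utilization, 'Unit': 'Percent'},
            {'MetricName': 'ReadIOPS', 'Value': read_iops, 'Unit': 'Count/Second'},
            {'MetricName': 'WriteIOPS', 'Value': write_iops, 'Unit': 'Count/Second'},
        ]
    )


if __name__ == '__main__':
    publish_storage_metrics()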
Performance Thresholds
# CloudWatch Alarms
DiskUtilization: > 80%
ReadLatency: > 10ms P99
WriteLatency: > 5ms P99
IOWait: > 20%
QueueDepth: > 32
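The thresholds above can be checked in code before wiring up real alarms. The sketch below is a minimal local check, not a CloudWatch alarm definition: the metric key names and the sample readings are hypothetical, and only the threshold values come from the list above.

```python
# Threshold values from the list above; key names are illustrative
THRESHOLDS = {
    "DiskUtilization": 80,    # percent
    "ReadLatencyP99Ms": 10,   # milliseconds
    "WriteLatencyP99Ms": 5,   # milliseconds
    "IOWait": 20,             # percent
    "QueueDepth": 32,
}


def breached(metrics):
    """Return the sorted names of metrics that exceed their threshold."""
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if name in metrics and metrics[name] > limit
    )


# Hypothetical readings from one node
reading = {"DiskUtilization": 85, "ReadLatencyP99Ms": 4, "QueueDepth": 32}
print(breached(reading))
```

Metrics missing from a reading are skipped rather than treated as zero, so a gap in collection does not hide behind a healthy-looking result.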
Best Practices Summary
Storage Type Selection
- Default choice: EBS GP3 for flexibility and cost efficiency
- High performance: I4g instances for maximum IOPS
- Cost optimization: Im4gn/Is4gen for large capacity needs
- Hybrid approach: Combine GP3 and instance store strategically
Configuration Optimization
- Separate volumes: Isolate commit logs from data volumes
- File system: Use XFS with optimized mount options
- JBOD: Multiple volumes for parallel I/O when possible
- Encryption: Use EBS encryption for data at rest security
Monitoring Requirements
- Disk utilization: Monitor space usage and growth trends
- I/O metrics: Track IOPS, throughput, and latency
- Compaction: Monitor compaction queue and duration
- Performance: Regular load testing and optimization
The combination of GP3 volumes and I4g instances provides the best price-performance for Cassandra 5.0 workloads in 2025, offering significant cost savings while maintaining high performance for demanding database applications.
Cloudurable provides Cassandra training, Cassandra consulting, Cassandra support and helps setting up Cassandra clusters in AWS.
More info about Cassandra and AWS
Read more about Cassandra AWS with this slide deck.
Amazon provides comprehensive guidance for running Cassandra with EBS.
About Cloudurable™
Cloudurable™: streamline DevOps/DBA for Cassandra running on AWS. Cloudurable™ provides AMIs, CloudWatch Monitoring, CloudFormation templates and monitoring tools to support Cassandra in production running in EC2.
We also teach advanced Cassandra courses that show developers and DevOps/DBA staff how to develop, support, and deploy Cassandra to production in AWS EC2. We also provide Cassandra consulting and Cassandra training.
More info about Cloudurable
Please take some time to read the Advantage of using Cloudurable™.
Cloudurable provides:
- Subscription Cassandra support to streamline DevOps (Support subscription pricing for Cassandra and Kafka in AWS)
- Quickstart Mentoring Consulting for Developers and DevOps
- Architectural Analysis Consulting
- Training and mentoring for Cassandra for DevOps/DBA and Developers
- Training and mentoring for Kafka for DevOps and Developers
- We specialize in AWS Cassandra deployments for organizations that are setting up Cassandra as a Service.
Authors
Written by R. Hightower and JP Azar.