Cassandra 5.0 AWS Storage Requirements: GP3, I4g Instances, and Performance Optimization

January 9, 2025

                                                                           

What’s New in 2025

Key Updates and Changes

  • EBS GP3 Volumes: 20% cost savings over GP2 with independent IOPS/throughput scaling
  • I4g Instances: Graviton2-powered with up to 15 TB of NVMe storage and up to 15% better compute performance than other storage-optimized instances
  • Im4gn/Is4gen: 45-60% lower cost per TB than I4i
  • Unified Compaction: Cassandra 5.0 reduces storage overhead and improves I/O patterns
  • EBS Optimization: Enhanced throughput up to 80 Gbps on latest instance types

Storage Performance Improvements

  • GP3 Baseline: 3,000 IOPS and 125 MiB/s regardless of volume size
  • GP3 Maximum: Up to 16,000 IOPS and 1,000 MiB/s (4x GP2's 250 MiB/s maximum throughput)
  • NVMe Performance: I4g delivers up to 7.6 million IOPS per instance
  • EBS Elastic Volumes: Live migration between volume types without downtime
  • Archive Tiers: EBS Snapshots Archive and S3 Glacier Deep Archive for long-term backup retention

Cassandra 5.0 AWS Storage Requirements

Cassandra 5.0 performs heavy sequential disk I/O for commit logs and SSTable writes, and random I/O for reads. The Unified Compaction Strategy (UCS), new in Cassandra 5.0, provides more predictable I/O patterns and lower storage overhead.

Cassandra Storage Architecture

Cassandra's on-disk footprint spans five key areas:

  • Commit logs - Sequential appends, high throughput required
  • SSTables - Large sequential writes during flush and compaction
  • Index files - Random access patterns for key lookups
  • Bloom filters - Small per-SSTable files, loaded into off-heap memory so lookups rarely touch disk
  • Vector indexes - New in 5.0 (built on Storage-Attached Indexing), CPU and storage intensive

EBS GP3 vs GP2: The Clear Winner for 2025

GP3 Advantages for Cassandra

  • Cost efficiency: 20% lower price per GB than GP2
  • Performance baseline: 3,000 IOPS and 125 MiB/s regardless of size
  • Independent scaling: Provision IOPS and throughput separately from capacity
  • Predictable performance: No burst credits, unlike GP2

GP3 Configuration Examples

# High IOPS configuration for read-heavy workloads
aws ec2 create-volume \
  --size 1000 \
  --volume-type gp3 \
  --iops 16000 \
  --throughput 1000 \
  --availability-zone us-west-2a

# Balanced configuration for general Cassandra workloads
aws ec2 create-volume \
  --size 500 \
  --volume-type gp3 \
  --iops 8000 \
  --throughput 500 \
  --availability-zone us-west-2a
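Before running create-volume, it can help to check a requested combination against the GP3 limits this article uses (3,000-16,000 IOPS, 125-1,000 MiB/s). The sketch below also assumes AWS's published ratio limits at the time of writing: at most 500 IOPS per GiB and 0.25 MiB/s of throughput per provisioned IOPS, with sizes of 1-16,384 GiB. `check_gp3` is a hypothetical helper, and since AWS has raised GP3 limits before, treat the constants as assumptions to verify.

```python
# Hypothetical helper: validate a GP3 request against the limits assumed in this article
GP3_MIN_IOPS, GP3_MAX_IOPS = 3_000, 16_000
GP3_MIN_MIBPS, GP3_MAX_MIBPS = 125, 1_000
GP3_MAX_IOPS_PER_GIB = 500      # assumed IOPS-to-size ratio limit
GP3_MAX_MIBPS_PER_IOPS = 0.25   # assumed throughput-to-IOPS ratio limit


def check_gp3(size_gib: int, iops: int, throughput_mibps: int) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if not 1 <= size_gib <= 16_384:
        problems.append(f"size {size_gib} GiB outside 1-16,384 GiB")
    if not GP3_MIN_IOPS <= iops <= GP3_MAX_IOPS:
        problems.append(f"iops {iops} outside {GP3_MIN_IOPS}-{GP3_MAX_IOPS}")
    if not GP3_MIN_MIBPS <= throughput_mibps <= GP3_MAX_MIBPS:
        problems.append(f"throughput {throughput_mibps} MiB/s outside {GP3_MIN_MIBPS}-{GP3_MAX_MIBPS}")
    if iops > size_gib * GP3_MAX_IOPS_PER_GIB:
        problems.append(f"iops {iops} exceeds {GP3_MAX_IOPS_PER_GIB} IOPS/GiB at {size_gib} GiB")
    if throughput_mibps > iops * GP3_MAX_MIBPS_PER_IOPS:
        problems.append(f"throughput {throughput_mibps} MiB/s exceeds 0.25 MiB/s per IOPS at {iops} IOPS")
    return problems


# The two create-volume examples above both pass
print(check_gp3(1000, 16000, 1000))  # []
print(check_gp3(500, 8000, 500))     # []
# 1,000 MiB/s needs at least 4,000 provisioned IOPS under the ratio limit
print(check_gp3(500, 3000, 1000))
```

The ratio check is the one most often missed: high throughput on GP3 also requires enough provisioned IOPS.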

GP2 vs GP3 Comparison

| Feature | GP2 | GP3 |
|---|---|---|
| Baseline IOPS | 3 IOPS/GB (min 100) | 3,000 IOPS |
| Max IOPS | 16,000 | 16,000 |
| Baseline Throughput | 128-250 MiB/s | 125 MiB/s |
| Max Throughput | 250 MiB/s | 1,000 MiB/s |
| Performance Scaling | Tied to volume size | Independent |
| Burst Credits | Yes | No |
| Price | Higher | 20% lower |
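The "tied to volume size" row is easiest to see with the numbers worked out. This sketch applies the table's GP2 rule (3 IOPS per GB, minimum 100, capped at 16,000) and computes how large a GP2 volume must be to reach a given IOPS target, whereas GP3 gets 3,000 IOPS at any size. The function names are illustrative.

```python
import math

GP2_IOPS_PER_GB = 3
GP2_MIN_IOPS, GP2_MAX_IOPS = 100, 16_000


def gp2_baseline_iops(size_gb: int) -> int:
    """Baseline (non-burst) IOPS for a GP2 volume of the given size."""
    return min(max(GP2_IOPS_PER_GB * size_gb, GP2_MIN_IOPS), GP2_MAX_IOPS)


def gp2_size_for_iops(target_iops: int) -> int:
    """Smallest GP2 size (GB) whose baseline IOPS meets target_iops."""
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("GP2 cannot exceed 16,000 IOPS")
    if target_iops <= GP2_MIN_IOPS:
        return 1  # every GP2 volume gets at least 100 IOPS
    return math.ceil(target_iops / GP2_IOPS_PER_GB)


for size in (20, 500, 1000, 6000):
    print(f"{size} GB -> {gp2_baseline_iops(size)} IOPS")
# 20 GB -> 100 IOPS, 500 GB -> 1500 IOPS, 1000 GB -> 3000 IOPS, 6000 GB -> 16000 IOPS

print(gp2_size_for_iops(8000))   # 2667 GB just to reach 8,000 IOPS
print(gp2_size_for_iops(16000))  # 5334 GB for the GP2 maximum
```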

I4g Instances: Graviton2-Powered Storage Champions

I4g Instance Family Benefits

  • Performance: Up to 15% better compute performance than other storage-optimized instances
  • Storage: Up to 15 TB of local NVMe storage (i4g.16xlarge, 4 x 3,750 GB) on AWS Nitro SSDs
  • Architecture: Graviton2 ARM processors with optimized memory bandwidth
  • Cost: Better price-performance for storage-optimized workloads

I4g Instance Specifications

# i4g.large
vCPUs: 2
Memory: 16 GiB
Storage: 1 x 468 GB NVMe SSD
Network: Up to 10 Gbps

# i4g.xlarge  
vCPUs: 4
Memory: 32 GiB
Storage: 1 x 937 GB NVMe SSD
Network: Up to 10 Gbps

# i4g.2xlarge
vCPUs: 8
Memory: 64 GiB
Storage: 1 x 1,875 GB NVMe SSD
Network: Up to 12 Gbps

# i4g.4xlarge
vCPUs: 16
Memory: 128 GiB
Storage: 1 x 3,750 GB NVMe SSD
Network: Up to 25 Gbps
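One way to use the specifications above is to pick the smallest I4g size whose local NVMe still leaves headroom for compaction. This sketch encodes the four sizes listed plus i4g.8xlarge (2 x 3,750 GB) and assumes a maximum fill ratio of 50%, a common rule of thumb for compaction space; the right ratio depends on your compaction strategy, so `max_fill` and the helper name `pick_i4g` are assumptions.

```python
from typing import Optional

# Local NVMe capacity (GB) per I4g size, from the specifications above
I4G_NVME_GB = {
    "i4g.large": 468,
    "i4g.xlarge": 937,
    "i4g.2xlarge": 1_875,
    "i4g.4xlarge": 3_750,
    "i4g.8xlarge": 7_500,  # 2 x 3,750 GB
}


def pick_i4g(data_gb: float, max_fill: float = 0.5) -> Optional[str]:
    """Smallest I4g size that holds data_gb while staying under max_fill.

    max_fill is an assumed headroom rule, not an AWS or Cassandra limit.
    """
    for name, capacity in sorted(I4G_NVME_GB.items(), key=lambda kv: kv[1]):
        if data_gb <= capacity * max_fill:
            return name
    return None  # needs a larger instance or more nodes


print(pick_i4g(800))   # i4g.2xlarge
print(pick_i4g(3000))  # i4g.8xlarge
print(pick_i4g(5000))  # None
```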

Cost-Optimized Alternatives

  • Im4gn instances: 45% lower cost per TB than I4i
  • Is4gen instances: 60% lower cost per TB than I4i
  • Trade-off: Lower compute performance for significant storage cost savings

EBS vs Instance Storage Decision Matrix (2025)

Choose EBS GP3 When:

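  • Flexibility needed: Snapshots, volume resizing, and reattaching volumes to a replacement instance in the same AZ (EBS volumes are AZ-scoped; use snapshots to move data across AZs)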
  • Cost optimization: High IOPS without proportional storage capacity
  • Operational simplicity: Managed durability and availability
  • Dynamic scaling: Ability to adjust performance without instance changes

Choose NVMe Instance Storage When:

  • Maximum performance: Highest IOPS and lowest latency required
  • Predictable workload: Consistent high I/O requirements
  • Cost at scale: Large storage capacity with high performance needs
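  • Replication-backed durability: Instance-store data does not survive instance stop or termination, so Cassandra replication across AZs and regular backups must provide durability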

Storage Configuration Best Practices

Separate Volumes for Different Workloads

# Commit log volume (high sequential writes)
aws ec2 create-volume \
  --size 100 \
  --volume-type gp3 \
  --iops 4000 \
  --throughput 250 \
  --availability-zone us-west-2a

# Data volume (mixed I/O patterns)
aws ec2 create-volume \
  --size 1000 \
  --volume-type gp3 \
  --iops 12000 \
  --throughput 750 \
  --availability-zone us-west-2a

File System Optimization

XFS Configuration (Recommended):

# Create XFS with optimal settings for Cassandra
sudo mkfs.xfs -f -K -d agcount=64 /dev/nvme1n1

# Mount with performance options (nobarrier was removed from XFS in Linux 4.19, so mounts that pass it fail)
sudo mkdir -p /cassandra/data
sudo mount -o noatime,logbufs=8,logbsize=32k,largeio,inode64,swalloc /dev/nvme1n1 /cassandra/data

EXT4 Alternative:

# Create EXT4 with large inodes
sudo mkfs.ext4 -F -O extent,uninit_bg,dir_index -E lazy_itable_init=1 -I 256 /dev/nvme1n1

# Mount with performance options (barrier=0 risks journal corruption on power loss; nobh is ignored by current kernels)
sudo mount -o noatime,data=writeback /dev/nvme1n1 /cassandra/data
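Mount options that the kernel rejects or silently ignores are easy to miss, so it is worth confirming what actually took effect. This sketch parses text in /proc/mounts format and reports which expected options are not active on a mount point. The helper names are hypothetical, and the sample line is constructed for illustration; on a live host you would pass the contents of /proc/mounts.

```python
from typing import Optional


def mount_options(proc_mounts_text: str, mount_point: str) -> Optional[set]:
    """Return the option set for mount_point from text in /proc/mounts format."""
    found = None
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            found = set(fields[3].split(","))  # last entry wins if mounted twice
    return found


def missing_options(proc_mounts_text: str, mount_point: str, expected: set) -> set:
    """Options in `expected` that are not active on mount_point."""
    active = mount_options(proc_mounts_text, mount_point)
    if active is None:
        raise ValueError(f"{mount_point} is not mounted")
    return expected - active


sample = ("/dev/nvme1n1 /cassandra/data xfs "
          "rw,noatime,attr2,inode64,logbufs=8,logbsize=32k,largeio,swalloc,noquota 0 0\n")
print(missing_options(sample, "/cassandra/data", {"noatime", "largeio", "swalloc"}))  # set()

# On a live host:
# with open("/proc/mounts") as f:
#     print(missing_options(f.read(), "/cassandra/data", {"noatime", "largeio"}))
```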

Cassandra 5.0 Storage Configuration

Unified Compaction Strategy Configuration

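# cassandra.yaml (Cassandra 5.0) - default compaction for tables that don't set their own
default_compaction:
  class_name: UnifiedCompactionStrategy
  parameters:
    scaling_parameters: T4      # tiered behavior with fan-out 4 (similar to STCS)
    target_sstable_size: 1GiB
# Per table instead: ALTER TABLE ks.tbl WITH compaction = {'class': 'UnifiedCompactionStrategy', 'scaling_parameters': 'T4'};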

# Data directory configuration
data_file_directories:
  - /cassandra/data1
  - /cassandra/data2
  - /cassandra/data3
  - /cassandra/data4

# Commit log on separate volume
commitlog_directory: /cassandra/commitlog
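UCS is tuned mainly through its `scaling_parameters` option (for example `T4`). As we read the Cassandra 5.0 UCS documentation, `Tn` maps to a scaling parameter W = n - 2, `Ln` to W = 2 - n, and `N` to W = 0; the fan-out is then f = 2 + |W|, and the number of SSTables that triggers a compaction is f when W >= 0 and 2 when W < 0. The sketch decodes the notation under those definitions; `decode_scaling_parameter` is a hypothetical name, not a Cassandra API.

```python
def decode_scaling_parameter(param: str) -> tuple[int, int, int]:
    """Decode a UCS scaling parameter into (W, fanout, threshold)."""
    p = param.strip().upper()
    if p == "N":
        w = 0
    elif p.startswith("T"):
        w = int(p[1:]) - 2  # tiered
    elif p.startswith("L"):
        w = 2 - int(p[1:])  # leveled
    else:
        w = int(p)  # treat a bare integer as W directly
    fanout = 2 + abs(w)
    threshold = fanout if w >= 0 else 2
    return w, fanout, threshold


print(decode_scaling_parameter("T4"))   # (2, 4, 4): tiered, STCS-like
print(decode_scaling_parameter("L10"))  # (-8, 10, 2): leveled, LCS-like
print(decode_scaling_parameter("N"))    # (0, 2, 2)
```

Positive W favors write amplification savings (fewer, larger merges); negative W favors read performance by keeping fewer overlapping SSTables.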

Memory Table Configuration

# Memtables live in memory; these settings size the flushes that hit GP3 (Cassandra 5.0 config names)
memtable_allocation_type: offheap_objects
memtable_heap_space: 2048MiB
memtable_offheap_space: 4096MiB
memtable_flush_writers: 8  # Match vCPU count
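How often memtables flush follows from these settings. Per cassandra.yaml, memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1), and Cassandra flushes the largest memtable once non-flushing memtable usage crosses that fraction of the configured space. A worked example, assuming the 4096 MiB off-heap space and 8 flush writers shown above:

```python
def cleanup_threshold(flush_writers: int) -> float:
    """Default memtable_cleanup_threshold: 1 / (memtable_flush_writers + 1)."""
    return 1 / (flush_writers + 1)


def flush_trigger_mib(space_mib: float, flush_writers: int) -> float:
    """Non-flushing memtable size (MiB) at which the largest memtable is flushed."""
    return space_mib * cleanup_threshold(flush_writers)


print(round(flush_trigger_mib(4096, 8), 1))  # 455.1 MiB with 8 writers
print(round(flush_trigger_mib(4096, 2), 1))  # 1365.3 MiB with 2 writers
```

More flush writers mean smaller, more frequent flushes, which produce more SSTables for compaction to merge.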

Performance Monitoring and Optimization

EBS Performance Monitoring

# Monitor EBS metrics
aws cloudwatch get-metric-statistics \
  --namespace AWS/EBS \
  --metric-name VolumeReadOps \
  --dimensions Name=VolumeId,Value=vol-1234567890abcdef0 \
  --start-time 2025-01-09T00:00:00Z \
  --end-time 2025-01-09T01:00:00Z \
  --period 300 \
  --statistics Sum

# Check for EBS optimization
aws ec2 describe-instance-attribute \
  --instance-id i-1234567890abcdef0 \
  --attribute ebsOptimized
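VolumeReadOps is a count per period, so the Sum statistic must be divided by the period length in seconds to get IOPS (a 300-second Sum of 600,000 is 2,000 read IOPS). This boto3 sketch fetches the datapoints and does that conversion; `read_iops` and `iops_from_datapoints` are hypothetical helper names, and boto3 is imported inside the fetch function so the conversion can be exercised offline.

```python
from datetime import datetime, timedelta, timezone


def iops_from_datapoints(datapoints: list[dict], period_seconds: int) -> list[tuple]:
    """Convert CloudWatch Sum datapoints for VolumeReadOps/VolumeWriteOps to IOPS."""
    ordered = sorted(datapoints, key=lambda dp: dp["Timestamp"])
    return [(dp["Timestamp"], dp["Sum"] / period_seconds) for dp in ordered]


def read_iops(volume_id: str, hours: int = 1, period_seconds: int = 300) -> list[tuple]:
    """Fetch read IOPS for an EBS volume over the last `hours` (needs AWS credentials)."""
    import boto3  # imported here so iops_from_datapoints works without boto3

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EBS",
        MetricName="VolumeReadOps",
        Dimensions=[{"Name": "VolumeId", "Value": volume_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=period_seconds,
        Statistics=["Sum"],
    )
    return iops_from_datapoints(response["Datapoints"], period_seconds)


# Offline check of the conversion: 600,000 reads in 300 s is 2,000 IOPS
sample = [{"Timestamp": datetime(2025, 1, 9, tzinfo=timezone.utc), "Sum": 600_000.0}]
print(iops_from_datapoints(sample, 300))
```

CloudWatch returns datapoints unordered, which is why the helper sorts by timestamp.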

Storage Performance Testing

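# Test sequential write performance (commit log); --direct=1 bypasses the page cache
fio --name=seq-write --directory=/cassandra/commitlog --rw=write --bs=64k --size=10G --numjobs=1 --ioengine=libaio --direct=1 --iodepth=16 --time_based --runtime=300 --group_reporting

# Test random read performance (SSTable reads)
fio --name=rand-read --directory=/cassandra/data --rw=randread --bs=4k --size=10G --numjobs=8 --ioengine=libaio --direct=1 --iodepth=32 --time_based --runtime=300 --group_reporting

# Test mixed workload (70% reads)
fio --name=mixed --directory=/cassandra/data --rw=randrw --rwmixread=70 --bs=4k --size=10G --numjobs=4 --ioengine=libaio --direct=1 --iodepth=32 --time_based --runtime=300 --group_reporting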
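fio's human-readable output is hard to compare across volumes. Adding `--output-format=json` to a run produces a JSON report, and this sketch pulls IOPS and p99 completion latency out of it. It assumes fio 3.x field names (`jobs[].read.iops`, `clat_ns.percentile["99.000000"]`); check them against your fio version. `summarize_fio` is a hypothetical helper, and the example report is a trimmed, made-up shape rather than a measurement.

```python
import json


def summarize_fio(report_json: str) -> dict:
    """Summarize IOPS and p99 completion latency (ms) per job and direction."""
    report = json.loads(report_json)
    summary = {}
    for job in report["jobs"]:
        for direction in ("read", "write"):
            stats = job.get(direction, {})
            if stats.get("iops", 0) == 0:
                continue  # direction not exercised by this job
            p99_ns = stats.get("clat_ns", {}).get("percentile", {}).get("99.000000")
            summary[f'{job["jobname"]}:{direction}'] = {
                "iops": round(stats["iops"]),
                "p99_ms": None if p99_ns is None else p99_ns / 1e6,
            }
    return summary


# Trimmed example of the report shape (illustrative values, not measurements)
example = json.dumps({"jobs": [{
    "jobname": "mixed",
    "read": {"iops": 7000.4, "clat_ns": {"percentile": {"99.000000": 1_200_000}}},
    "write": {"iops": 3000.2, "clat_ns": {"percentile": {"99.000000": 2_500_000}}},
}]})
print(summarize_fio(example))
```

In practice you would write the report with `--output=mixed.json` and pass the file's contents to `summarize_fio`; with `--group_reporting`, fio merges the `numjobs` workers into one entry per job.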

Encryption and Security

EBS Encryption with KMS

# Create encrypted GP3 volume
aws ec2 create-volume \
  --size 1000 \
  --volume-type gp3 \
  --iops 10000 \
  --throughput 500 \
  --encrypted \
  --kms-key-id arn:aws:kms:us-west-2:123456789012:key/12345678-1234-1234-1234-123456789012 \
  --availability-zone us-west-2a

Performance Impact of Encryption

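  • EBS encryption: Minimal performance impact (encryption is handled by the EC2 host hardware, not the guest CPU)
  • Instance store encryption: NVMe instance store on Nitro-based instances such as I4g is already encrypted at rest in hardware; add dm-crypt, with minimal overhead, only if you need to manage the keys yourself
  • Application-level encryption: Avoid due to 20-30% CPU overhead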

Cost Optimization Strategies

GP3 Cost Optimization

# Calculate GP3 vs GP2 monthly costs (us-east-1 list prices; check current pricing for your region)
def calculate_storage_costs(volume_size_gb, iops_required, throughput_mb):
    # GP2: 3 IOPS per GB, so size for whichever is larger, capacity or IOPS.
    # GP2 tops out at 16,000 IOPS and 250 MiB/s; throughput is not priced separately.
    gp2_size_for_iops = max(volume_size_gb, iops_required / 3)
    gp2_cost = gp2_size_for_iops * 0.10  # $0.10 per GB-month

    # GP3: capacity, IOPS and throughput are billed independently
    gp3_base_cost = volume_size_gb * 0.08  # $0.08 per GB-month
    extra_iops = max(0, iops_required - 3000)  # 3,000 IOPS included
    extra_throughput = max(0, throughput_mb - 125)  # 125 MiB/s included

    gp3_iops_cost = extra_iops * 0.005  # $0.005 per provisioned IOPS-month
    gp3_throughput_cost = extra_throughput * 0.040  # $0.040 per MiB/s-month

    gp3_total_cost = gp3_base_cost + gp3_iops_cost + gp3_throughput_cost

    savings = gp2_cost - gp3_total_cost
    return gp2_cost, gp3_total_cost, savings

# Example: 500GB volume with 8000 IOPS and 400 MiB/s
# (GP2 cannot deliver 400 MiB/s; its cost here reflects IOPS-driven sizing only)
gp2_cost, gp3_cost, savings = calculate_storage_costs(500, 8000, 400)
print(f"GP2 cost: ${gp2_cost:.2f}")        # GP2 cost: $266.67
print(f"GP3 cost: ${gp3_cost:.2f}")        # GP3 cost: $76.00
print(f"Monthly savings: ${savings:.2f}")  # Monthly savings: $190.67

Instance Selection for Cost Efficiency

# Small clusters (< 1TB per node)
Instance: i4g.2xlarge
Storage: 1 x 1.875 TB NVMe
Use case: High performance, moderate capacity

# Medium clusters (1-4TB per node)  
Instance: m6i.4xlarge + GP3 volumes
Storage: Multiple GP3 volumes
Use case: Flexible capacity, good performance

# Large clusters (> 4TB per node)
Instance: i4g.8xlarge  
Storage: 2 x 3,750 GB NVMe (7.5 TB total)
Use case: Maximum performance and capacity

Advanced Storage Patterns

JBOD Configuration for Maximum Performance

# Multiple data directories for parallel I/O
data_file_directories:
  - /cassandra/data1  # GP3 volume 1
  - /cassandra/data2  # GP3 volume 2
  - /cassandra/data3  # GP3 volume 3
  - /cassandra/data4  # GP3 volume 4

# Each volume independently optimized
# Volume 1-4: 500GB each, 4000 IOPS, 250 MB/s
# Total: 2TB, 16000 IOPS, 1000 MB/s
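The totals in that comment are simple sums, but a node can only use them if the instance's own EBS bandwidth is at least as large, because every attached volume shares the instance's EBS connection. This sketch adds the volumes up and optionally caps the totals at instance limits you supply; `jbod_totals` is a hypothetical helper, and the cap in the example is a placeholder, so look up the real EBS limits for your instance type (converted to MiB/s).

```python
from typing import NamedTuple, Optional


class Gp3Volume(NamedTuple):
    size_gb: int
    iops: int
    throughput_mibps: int


def jbod_totals(volumes, instance_max_iops: Optional[int] = None,
                instance_max_mibps: Optional[int] = None) -> tuple:
    """Aggregate capacity, IOPS and throughput across JBOD data volumes."""
    size = sum(v.size_gb for v in volumes)
    iops = sum(v.iops for v in volumes)
    throughput = sum(v.throughput_mibps for v in volumes)
    # The instance's EBS limits cap what the volumes can deliver together
    if instance_max_iops is not None:
        iops = min(iops, instance_max_iops)
    if instance_max_mibps is not None:
        throughput = min(throughput, instance_max_mibps)
    return size, iops, throughput


data_volumes = [Gp3Volume(500, 4000, 250)] * 4
print(jbod_totals(data_volumes))                          # (2000, 16000, 1000)
print(jbod_totals(data_volumes, instance_max_mibps=600))  # (2000, 16000, 600), placeholder cap
```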

Tiered Storage Strategy

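# Note: Cassandra assigns data to data_file_directories by token range and does not move
# data between tiers on its own; each tier below would back separate nodes or backups.

# Hot data: GP3 with high IOPS
aws ec2 create-volume --size 500 --volume-type gp3 --iops 16000 --throughput 1000 --availability-zone us-west-2a

# Warm data: GP3 standard performance
aws ec2 create-volume --size 2000 --volume-type gp3 --iops 3000 --throughput 125 --availability-zone us-west-2a

# Cold data: SC1 (cold HDD) for archival, not for latency-sensitive reads
aws ec2 create-volume --size 10000 --volume-type sc1 --availability-zone us-west-2a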

Migration Strategies

GP2 to GP3 Migration

# Live migration without downtime
aws ec2 modify-volume \
  --volume-id vol-1234567890abcdef0 \
  --volume-type gp3 \
  --iops 8000 \
  --throughput 500

# Monitor migration progress
aws ec2 describe-volumes-modifications \
  --volume-ids vol-1234567890abcdef0
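An Elastic Volumes modification passes through the `modifying` and `optimizing` states before reaching `completed` (or `failed`), and the volume stays in use the whole time. This boto3 sketch polls `describe_volumes_modifications` until a terminal state; `wait_for_modification`, the poll interval, and the timeout are illustrative choices, and the EC2 client is passed in so the loop can be tested with a stub.

```python
import time


def wait_for_modification(ec2, volume_id: str, poll_seconds: float = 30,
                          timeout_seconds: float = 6 * 3600) -> str:
    """Poll an EBS volume modification until it completes or fails; return the final state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = ec2.describe_volumes_modifications(VolumeIds=[volume_id])
        mod = response["VolumesModifications"][0]  # most recent modification for the volume
        state = mod["ModificationState"]  # modifying | optimizing | completed | failed
        print(f"{volume_id}: {state} ({mod.get('Progress', 0)}%)")
        if state in ("completed", "failed"):
            return state
        if time.monotonic() > deadline:
            raise TimeoutError(f"{volume_id} still {state} after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Usage against AWS (requires credentials):
# import boto3
# wait_for_modification(boto3.client("ec2"), "vol-1234567890abcdef0")
```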

Instance Store to EBS Migration

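# 1. Take a Cassandra snapshot (hard links to the current SSTables; this is not an EBS snapshot)
nodetool snapshot -t migration

# 2. Copy only the snapshot directories to S3
aws s3 sync /cassandra/data s3://backup-bucket/snapshots/ --exclude "*" --include "*/snapshots/migration/*"

# 3. On the new EBS-backed node (configured with the old node's tokens), download and load each table
aws s3 sync s3://backup-bucket/snapshots/ /cassandra/restore/
nodetool import <keyspace> <table> /cassandra/restore/<keyspace>/<table>-<id>/snapshots/migration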
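Restoring a Cassandra snapshot onto a new node generally means loading each table's SSTables separately, for example with `nodetool import <keyspace> <table> <directory>` (available since Cassandra 4.0). Snapshots live under `<data_dir>/<keyspace>/<table>-<table id>/snapshots/<tag>/`, so keyspace and table names can be recovered from the path. This sketch walks a restored copy of that layout and builds one import command per table; the restore path and the `migration` tag are assumptions, and `snapshot_import_commands` is a hypothetical helper.

```python
from pathlib import Path


def snapshot_import_commands(restore_root: str, tag: str) -> list[str]:
    """Build one `nodetool import` command per table snapshot under restore_root.

    Expects the layout <root>/<keyspace>/<table>-<id>/snapshots/<tag>/.
    """
    commands = []
    for snap_dir in sorted(Path(restore_root).glob(f"*/*/snapshots/{tag}")):
        if not snap_dir.is_dir():
            continue
        table_dir = snap_dir.parent.parent
        keyspace = table_dir.parent.name
        if keyspace.startswith("system"):
            continue  # skip Cassandra's system keyspaces
        table = table_dir.name.rsplit("-", 1)[0]  # strip the table id suffix
        commands.append(f"nodetool import {keyspace} {table} {snap_dir}")
    return commands


if __name__ == "__main__":
    # Hypothetical restore location and snapshot tag
    for cmd in snapshot_import_commands("/cassandra/restore", "migration"):
        print(cmd)
```

Reviewing the printed commands before running them keeps the restore explicit, table by table.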

We hope this blog post on AWS storage requirements for Cassandra 5.0 running in EC2/AWS is helpful. Cloudurable specializes in AWS DevOps automation for Cassandra and Kafka, and provides Cassandra consulting and Kafka consulting to get you set up fast in AWS with CloudFormation and CloudWatch. Check out our AWS-centric Cassandra training and Kafka training.

Storage Scaling Strategies

Read Performance Optimization

A multi-pronged approach to scaling Cassandra read performance:

  1. Horizontal scaling: Add more nodes to distribute load
  2. Instance store upgrade: Move to I4g instances for maximum IOPS
  3. EBS optimization: Use GP3 with provisioned IOPS (up to 16,000)
  4. JBOD configuration: Multiple volumes per node for parallel I/O
  5. Cache optimization: Increase key cache and row cache sizes
  6. Query optimization: Improve partition key design and query patterns
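  7. Materialized views: Create optimized read paths for specific queries (materialized views are experimental and disabled by default since Cassandra 4.0; denormalized query tables are the safer option)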

Write Performance Optimization

# Cassandra 5.0 write optimization
commitlog_sync: periodic
commitlog_sync_period: 10000ms  # up to 10 s of acknowledged writes can be lost if a node crashes; replicas cover this
commitlog_segment_size: 32MiB

# Memtable configuration for high writes
memtable_allocation_type: offheap_objects
memtable_flush_writers: 8  # Match instance vCPU count
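Commit log volume size and commit log settings interact through commitlog_total_space. Per cassandra.yaml, its default is the smaller of 8192 MiB and one quarter of the commit log volume; once the commit log exceeds it, Cassandra flushes the memtables behind the oldest segments so they can be recycled. A worked example for the 100 GiB commit log volume used earlier with 32 MiB segments (the helper name is hypothetical):

```python
def default_commitlog_total_space_mib(volume_size_mib: int) -> int:
    """Default commitlog_total_space: min(8192 MiB, 1/4 of the commit log volume)."""
    return min(8192, volume_size_mib // 4)


volume_mib = 100 * 1024  # 100 GiB commit log volume
total_space = default_commitlog_total_space_mib(volume_mib)
segments = total_space // 32  # 32 MiB commit log segments

print(total_space, segments)  # 8192 256
```

Because GP3 performance does not depend on size, the commit log volume only needs to hold commitlog_total_space plus headroom; its IOPS and throughput are provisioned separately.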

Monitoring and Alerting

CloudWatch Custom Metrics

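import time

import boto3
import psutil

DATA_DIR = '/cassandra/data'
DATA_DEVICE = 'nvme1n1'  # block device backing DATA_DIR; adjust per instance


def publish_storage_metrics(sample_seconds=10):
    cloudwatch = boto3.client('cloudwatch')

    # Disk utilization
    disk_usage = psutil.disk_usage(DATA_DIR)
    utilization = (disk_usage.used / disk_usage.total) * 100

    # I/O metrics: disk_io_counters() values are cumulative since boot, so sample
    # twice and divide the difference by the interval to get operations per second
    before = psutil.disk_io_counters(perdisk=True)[DATA_DEVICE]
    time.sleep(sample_seconds)
    after = psutil.disk_io_counters(perdisk=True)[DATA_DEVICE]
    read_iops = (after.read_count - before.read_count) / sample_seconds
    write_iops = (after.write_count - before.write_count) / sample_seconds

    cloudwatch.put_metric_data(
        Namespace='Cassandra/Storage',
        MetricData=[
            {'MetricName': 'DiskUtilization', 'Value': utilization, 'Unit': 'Percent'},
            {'MetricName': 'ReadIOPS', 'Value': read_iops, 'Unit': 'Count/Second'},
            {'MetricName': 'WriteIOPS', 'Value': write_iops, 'Unit': 'Count/Second'},
        ],
    )


if __name__ == '__main__':
    # Run periodically, e.g. once a minute from cron or a systemd timer
    publish_storage_metrics()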

Performance Thresholds

# CloudWatch Alarms
DiskUtilization: > 80%
ReadLatency: > 10ms P99
WriteLatency: > 5ms P99  
IOWait: > 20%
QueueDepth: > 32
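Each threshold above becomes a CloudWatch alarm via `put_metric_alarm`. This sketch builds the keyword arguments for the disk-utilization alarm on the custom `Cassandra/Storage` metric; the alarm name, three 5-minute evaluation periods, treating missing data as breaching, and the SNS topic ARN are placeholder assumptions. Latency alarms would follow the same pattern against whatever metrics you export.

```python
def disk_alarm_params(sns_topic_arn: str, threshold_percent: float = 80.0) -> dict:
    """Keyword arguments for cloudwatch.put_metric_alarm (DiskUtilization > threshold)."""
    return {
        "AlarmName": "cassandra-disk-utilization-high",  # placeholder name
        "Namespace": "Cassandra/Storage",
        "MetricName": "DiskUtilization",
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 3,  # must breach for 15 minutes before alarming
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "breaching",  # a node that stops reporting is also a problem
        "AlarmActions": [sns_topic_arn],
    }


# Usage (requires AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(
#     **disk_alarm_params("arn:aws:sns:us-west-2:123456789012:cassandra-alerts"))
```

Keeping the parameters in a plain function makes the alarm definition easy to review and reuse across nodes.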

Best Practices Summary

Storage Type Selection

  1. Default choice: EBS GP3 for flexibility and cost efficiency
  2. High performance: I4g instances for maximum IOPS
  3. Cost optimization: Im4gn/Is4gen for large capacity needs
  4. Hybrid approach: Combine GP3 and instance store strategically

Configuration Optimization

  1. Separate volumes: Isolate commit logs from data volumes
  2. File system: Use XFS with optimized mount options
  3. JBOD: Multiple volumes for parallel I/O when possible
  4. Encryption: Use EBS encryption for data at rest security

Monitoring Requirements

  1. Disk utilization: Monitor space usage and growth trends
  2. I/O metrics: Track IOPS, throughput, and latency
  3. Compaction: Monitor compaction queue and duration
  4. Performance: Regular load testing and optimization

The combination of GP3 volumes and I4g instances provides the best price-performance for Cassandra 5.0 workloads in 2025, offering significant cost savings while maintaining high performance for demanding database applications.

Cloudurable provides Cassandra training, Cassandra consulting, Cassandra support and helps setting up Cassandra clusters in AWS.

More info about Cassandra and AWS

Read more about Cassandra AWS with this slide deck.

Amazon provides comprehensive guidance for running Cassandra with EBS.

About Cloudurable™

Cloudurable™: streamline DevOps/DBA for Cassandra running on AWS. Cloudurable™ provides AMIs, CloudWatch Monitoring, CloudFormation templates and monitoring tools to support Cassandra in production running in EC2.

We also teach advanced Cassandra courses that show developers and DevOps/DBA teams how to develop, support, and deploy Cassandra to production in AWS EC2, and we provide Cassandra consulting and Cassandra training.

More info about Cloudurable

Please take some time to read about the Advantages of using Cloudurable™.


Authors

Written by R. Hightower and JP Azar.


Feedback


We hope you enjoyed this article. Please provide [feedback](https://cloudurable.com/contact/index.html).
