Cloudurable

Cassandra Cluster Tutorial 5 (2025) - Modern AWS Cassandra Deployment with CDK, EKS, and Infrastructure as Code

This Cassandra tutorial is designed for developers and DevOps/SRE teams who want to deploy production-ready Cassandra clusters in AWS using modern practices and tools available in 2025.

What’s New in 2025

The landscape of deploying Cassandra on AWS has evolved significantly:

AWS CDK v2 has become the standard for infrastructure as code, offering type-safe infrastructure definitions
Kubernetes operators like K8ssandra provide production-ready Cassandra deployments
AWS Graviton3 processors offer 40% better price-performance for Cassandra workloads
Container-based deployments are now the norm, with EKS Anywhere for hybrid deployments
Service mesh integration with AWS App Mesh provides advanced traffic management
AWS Systems Manager replaces bastion hosts for secure access
GitOps workflows with AWS CodeCommit and FluxCD for infrastructure management

Cloudurable provides Cassandra training, Cassandra consulting, Cassandra support and helps setting up Cassandra clusters in AWS.


Cassandra 5.0 AWS CPU Requirements: Graviton4, ZGC, and Performance Optimization

What’s New in 2025

Key Updates and Changes

Cassandra 5.0: Enhanced CPU utilization with improved compaction and streaming
Graviton4 Processors: 40% better performance for database workloads
ZGC Integration: Low-latency garbage collection for improved response times
Instance Types: New I8g, R8g, C8g families optimized for Cassandra workloads
Compaction Improvements: Better concurrent compactor defaults and tuning

Major Performance Enhancements

Unified Compaction: Reduced CPU overhead in Cassandra 5.0
Vector Search: CPU-intensive operations requiring additional cores
Streaming Performance: Improved parallel processing for data migration
Memory Management: Better allocation strategies reducing CPU pressure
ARM Optimization: Native ARM64 support for Graviton processors

Cassandra 5.0 CPU Requirements in AWS Cloud

Cassandra 5.0 is highly concurrent and can utilize as many CPU cores as available when configured correctly. Understanding CPU requirements is crucial for optimal performance on AWS EC2 instances.

Cassandra 5.0 AWS Storage Requirements: GP3, I4g Instances, and Performance Optimization

What’s New in 2025

Key Updates and Changes

EBS GP3 Volumes: 20% cost savings over GP2 with independent IOPS/throughput scaling
I4g Instances: Graviton2-powered with up to 15TB NVMe, 15% better compute performance
I4i vs I4g: 45-60% lower cost per TB with Im4gn/Is4gen families
Unified Compaction: Cassandra 5.0 reduces storage overhead and improves I/O patterns
EBS Optimization: Enhanced throughput up to 80 Gbps on latest instance types

Storage Performance Improvements

GP3 Baseline: 3,000 IOPS and 125 MiB/s regardless of volume size
GP3 Maximum: Up to 16,000 IOPS and 1,000 MiB/s (4x faster than GP2 max)
NVMe Performance: I4g delivers up to 7.6 million IOPS per instance
EBS Elastic Volumes: Live migration between volume types without downtime
Storage Classes: New archive and deep archive tiers for long-term retention
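
To make the GP3 baseline figure above concrete, here is a minimal sketch comparing it with the GP2 baseline at a few volume sizes. The GP2 rule used here (3 IOPS per GiB, with a floor of 100 and a cap of 16,000) is not stated in this article and is an assumption brought in for illustration; GP2 burst credits are ignored, and the function names are hypothetical.

```python
# Sketch only: GP2 baseline rule (3 IOPS/GiB, min 100, max 16,000) is an
# assumption not taken from this article; burst credits are ignored.
GP3_BASELINE_IOPS = 3_000  # from the list above: independent of volume size


def gp2_baseline_iops(size_gib: int) -> int:
    """Return the assumed GP2 baseline IOPS for a volume of size_gib GiB."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return max(100, min(16_000, 3 * size_gib))


def gp3_baseline_advantage(size_gib: int) -> int:
    """Return how many more baseline IOPS GP3 gives than GP2 (may be negative)."""
    return GP3_BASELINE_IOPS - gp2_baseline_iops(size_gib)


for size in (100, 500, 1_000, 2_000):
    print(size, gp2_baseline_iops(size), GP3_BASELINE_IOPS)
```

Under these assumptions, volumes smaller than 1,000 GiB get a higher baseline from GP3, while larger GP2 volumes exceed the GP3 baseline unless extra GP3 IOPS are provisioned.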

Cassandra 5.0 AWS Storage Requirements

Cassandra 5.0 performs extensive sequential disk I/O for commit logs and SSTable writes, while requiring random I/O for read operations. The enhanced Unified Compaction strategy in Cassandra 5.0 provides more predictable I/O patterns and reduced storage overhead.

Cassandra 5.0 Cluster Setup 2025: Docker, Vagrant, and Cloud-Native DevOps

What’s New in 2025

Key Updates and Changes

Cassandra 5.0: Vector search, SAI indexes, unified compaction strategy
Container-First: Docker and Kubernetes have replaced most Vagrant workflows
Cloud-Native: Multi-cloud deployment with infrastructure as code
ARM Support: Native ARM64 support for Apple Silicon and AWS Graviton
Observability: Enhanced monitoring with OpenTelemetry and Prometheus

Major Platform Evolution

Docker Compose: Simplified multi-container orchestration
Kubernetes: Production-ready Cassandra operators
Testcontainers: Integration testing with ephemeral containers
Colima/Podman: Docker alternatives for development
GitOps: Infrastructure managed through Git workflows

The modern approach to Cassandra cluster development has evolved significantly since 2017. While Vagrant remains useful for certain scenarios, container-based development has become the standard for 2025.

Cassandra 5.0 Cluster Tutorial 2025: Ansible Automation for DevOps Tasks

What’s New in 2025

Key Updates and Changes

Cassandra 5.0: Storage Attached Indexes (SAI), Vector Search, Unified Compaction
Ansible 2.19: Event-driven automation, enhanced cloud integrations
VirtualBox Compatibility: Use 6.1.x with Vagrant 2.4.1 for stability
Security First: Ansible Vault and external secret managers now standard
Infrastructure as Code: Git-based workflows with Ansible Collections

Deprecated Features

Cassandra 3.x is end-of-life
Legacy Ansible inventory formats
Manual SSH key management (use automation)
Static inventories for cloud environments

Cassandra Tutorial: Setting up Ansible for our Cassandra Database Cluster for DevOps/DBA tasks

Ansible is a key DevOps/DBA tool for managing backups and rolling upgrades to the Cassandra cluster in AWS/EC2. Ansible uses ssh, so you do not have to install an agent to use it. In 2025, Ansible remains the preferred automation tool with improved event-driven capabilities.

Cassandra AWS System Memory Guidelines 2025: Optimizing for Modern Hardware and Workloads

System Memory Guidelines for Cassandra AWS - 2025 Edition

What’s New in 2025

The Cassandra memory landscape has evolved significantly:

Modern JVMs - Java 21 LTS with ZGC and Shenandoah GC offer sub-millisecond pause times
AWS Graviton3 - ARM-based processors with DDR5 memory provide 50% better memory bandwidth
Larger heap sizes - Modern GCs handle 100GB+ heaps efficiently
Container deployments - Memory management in Kubernetes requires different approaches
Persistent memory - Intel Optane and similar technologies blur the line between RAM and storage
Tiered storage - Hot data in memory, warm in NVMe, cold in S3
Vector search workloads - New memory requirements for AI/ML applications


Cloud DevOps 2025: Packer, Ansible, SSH and AWS/EC2

What’s New in 2025

Key Updates and Changes

New EC2 Instance Types: M7i, C7i, and R7i families now available with up to 15% better price-performance
Packer Updates: Version 1.11 with predictable plugin loading and HCP integration
Ansible Best Practices: Enhanced aws_ec2 plugin with improved security and performance features
EBS Volume Evolution: GP3 volumes now standard, offering 20% cost savings over GP2
HashiCorp Updates: Terraform AWS Provider 6.0 with multi-region support
Security Enhancements: AWS Verified Access for SSH/RDP, enhanced IAM with ECR Policy v2

Deprecated Features and Migration Notes

GP2 to GP3 Migration: GP2 volumes should be migrated to GP3 for cost savings
EC2 Dynamic Inventory: Old ec2.py script deprecated in favor of aws_ec2 plugin
Instance Types: Consider upgrading from M6i to M7i instances for better performance
Packer AWS Builder: Continue using amazon-ebs builder with updated authentication methods

Cloud DevOps: Using Packer, Ansible/SSH and AWS command line tools to create and manage EC2 Cassandra instances in AWS.

This article is useful for developers and DevOps/DBA staff who want to create AWS AMI images and manage those EC2 instances with Ansible. Although this article is part of a series about setting up the Cassandra Database images and doing DevOps/DBA with Cassandra clusters, the topics we cover apply to AWS DevOps in general - even if you don’t use Cassandra at all.

AWS Cassandra Cluster Tutorial 5: Setting up Cassandra Cluster in AWS/EC2

April 8, 2017

Cassandra Cluster Tutorial 5 - Cassandra AWS Cluster with CloudFormation, bastion host, Ansible and the aws-command line

This Cassandra tutorial is useful for developers and DevOps/DBA staff who want to launch a Cassandra cluster in AWS.

The cassandra-image project has been using Vagrant and Ansible to set up a Cassandra Cluster for local testing. Then we used Packer, Ansible and EC2. We used Packer to create AWS images in the last tutorial. In this tutorial, we will use CloudFormation to create a VPC, subnets, security groups and more to launch a Cassandra cluster in EC2 using the AWS AMI image we created with Packer in the last article. The next two tutorials after this one will set up Cassandra to work in multiple AZs and multiple regions using custom snitches for Cassandra.

Cassandra AWS System Memory Guidelines

March 15, 2017

System Memory Guidelines for Cassandra AWS

Basic guidelines for AWS Cassandra

Do not use less than 8GB of memory for the JVM. The more RAM the better. Use G1GC. Writes are first stored in memory and then written to disk sequentially as SSTables. The larger the SSTable, the less scanning needs to be done while reading and determining if a key is in an SSTable using a bloom filter. In the EC2 world this equates to an m4.xlarge (16GB of memory), because you also need some memory for the OS, specifically the IO buffers. The i2.xlarge and d2.xlarge are the smallest in their families and exceed the minimum memory requirement (and then some).

AWS Cassandra: Cassandra, NUMA and EC2

AWS Cassandra and NUMA

The i3.8xlarge, c4.8xlarge, m4.10xlarge, and above EC2 instance types use more than one CPU socket, which means NUMA controls are available.

A good read on this is Al Tobey’s blog post on Cassandra tuning.

The quickest way to tell if a machine is NUMA is to run “numactl --hardware”. - Al Tobey, blog post on Cassandra tuning

NUMA stands for Non-Uniform Memory Access. Modern x86 CPUs contain an integrated memory controller, so a multi-socket system has one memory controller per socket (two in a two-socket system). Each CPU gets a share of the memory. If one CPU socket needs memory that is attached to another CPU socket, the data must be transferred between sockets, which is more expensive than accessing memory local to that CPU. When a JVM thread only uses memory local to one CPU, things go fast; when it does not, access is slower (10 CPU cycles vs. 100, or some order of magnitude).
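
As a rough illustration of the numactl check, the sketch below counts NUMA nodes from the text that the command prints. It assumes the output begins with a line of the form "available: N nodes (...)"; the sample text is made up for illustration and was not captured from a real instance, and the function names are hypothetical.

```python
import re

# Illustrative sample only; not captured from a real EC2 instance.
SAMPLE_OUTPUT = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 32768 MB
node 1 cpus: 4 5 6 7
node 1 size: 32768 MB
"""


def count_numa_nodes(numactl_output: str) -> int:
    """Return N from the 'available: N nodes' line of numactl output."""
    match = re.search(r"^available:\s+(\d+)\s+nodes?", numactl_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'available: N nodes' line found")
    return int(match.group(1))


def is_numa(numactl_output: str) -> bool:
    """A machine is treated as NUMA when it reports more than one node."""
    return count_numa_nodes(numactl_output) > 1


print(count_numa_nodes(SAMPLE_OUTPUT), is_numa(SAMPLE_OUTPUT))
```

On a real host, the same functions could be fed the captured standard output of the numactl command instead of the sample string.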

Cassandra AWS CPU Guidelines

Cassandra CPU requirements in AWS Cloud

Cassandra is highly concurrent. Cassandra nodes can use as many CPU cores as are available if configured correctly.

What are vCPUs and ECUs?

An Amazon EC2 vCPU is a hyper thread, often referred to as a virtual core. Think of it as a physical thread of execution. It is able to run one thread at a time (which of course could be swapped out).
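
Since a vCPU is a hyper thread rather than a full core, a quick conversion can help when sizing nodes. This is a minimal sketch that assumes two hyper threads per physical core, which is an assumption and does not hold for every instance family; the function name is hypothetical.

```python
def physical_cores(vcpus: int, threads_per_core: int = 2) -> int:
    """Estimate physical cores from a vCPU count.

    threads_per_core=2 is an assumption (two hyper threads per core).
    """
    if vcpus <= 0 or threads_per_core <= 0:
        raise ValueError("vcpus and threads_per_core must be positive")
    if vcpus % threads_per_core != 0:
        raise ValueError("vcpus is not a multiple of threads_per_core")
    return vcpus // threads_per_core


# An m4.xlarge exposes 4 vCPUs, so under this assumption it has 2 cores.
print(physical_cores(4))
```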

An Amazon ECU (EC2 Compute Unit) is a legacy term that AWS used to use to describe relative CPU power, based on the processors in the earliest incarnations of EC2 (AWS defined one ECU as roughly a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor). 50 ECU would be like 50 of those chips from a bygone era. Ignore ECUs.

Cassandra AWS Storage Requirements

Cassandra AWS Storage Requirements

Cassandra does a lot of sequential disk I/O for the commit log and for writing out SSTables. You still need random I/O for read operations. The more read operations that are cache misses, the more IOPS your EBS volumes need.

Cassandra writes to four areas

commit logs
SSTable
an index file
a bloom filter
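
The bloom filter in the list above is what lets a read skip SSTables that cannot contain a key. The sketch below shows the general technique only. It is not Cassandra's implementation: the class name, sizes, and the use of SHA-256 are assumptions chosen to keep the example self-contained.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means the key was definitely never added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


sstable_filter = BloomFilter()
sstable_filter.add("user:42")
print(sstable_filter.might_contain("user:42"))
```

A negative answer is definitive, so the SSTable can be skipped without touching disk; a positive answer only means the SSTable has to be checked.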

Consider EC2 instance store instead of EBS for Cassandra

AWS provides two kinds of storage: EC2 instance local storage, called instance storage, which is not available with all EC2 instance types, and Elastic Block Store (EBS). Instance storage does not have to go over a SAN or the network; instead it uses the local hardware bus. Instance storage is right there on the server you are renting. The downside of EC2 instance storage is the expense, and it is not as flexible as EBS. Due to historic problems with EBS, instance storage used to be the only real option for running Cassandra in AWS. EBS has a reputation for degrading performance over time. Some of this has likely been fixed with enhanced EBS, but instance storage is more reliable.

What is Cassandra?

What is Cassandra?

Cassandra is a linearly scalable, open source NoSQL database. Cassandra uses a log-structured merge-tree (LSM tree), which makes Cassandra one of the best NoSQL options for high-throughput writes. Cassandra delivers continuous availability with operational simplicity. Unlike many other NoSQL solutions, Cassandra is a master-less, peer-to-peer, distributed clustered store. Each node knows about the cluster network topology via the gossip protocol.


Part 2 Setting up Ansible and ssh for Cassandra Database Cluster DevOps

March 11, 2017

Cassandra Cluster Tutorial 3: Part 2 of 2

Setting up Ansible and SSH for our Cassandra Database Cluster for DevOps/DBA Tasks

This tutorial series centers on how to perform DevOps/DBA tasks with the Cassandra Database. As we mentioned before, Ansible and ssh are essential DevOps/DBA tools for common DBA/DevOps tasks when working with Cassandra Clusters. Please read part 1 before reading part 2.

In part 1, we set up Ansible for our Cassandra Database Cluster to automate common DevOps/DBA tasks. As part of this setup, we created an ssh key and then set up our instances with this key so we could use ssh, scp, and most importantly ansible. We also created an ansible playbook to install keys on our Cassandra nodes from a bastion host that we set up with Vagrant.

Setting up Ansible/SSH for Cassandra Database Cluster DevOps Part 1

March 11, 2017

Cassandra Cluster Tutorial 3: Part 1 of 2

Setting up Ansible/SSH for our Cassandra Database Cluster for DevOps/DBA Tasks

Ansible and ssh are essential DevOps/DBA tools for common DBA/DevOps tasks like managing backups, rolling upgrades to the Cassandra cluster in AWS/EC2, and so much more. An excellent aspect of Ansible is that it uses ssh, so you do not have to install an agent to use Ansible.

This article series centers on how to perform DevOps/DBA tasks with the Cassandra Database. However, the use of Ansible for DevOps/DBA transcends its use with the Cassandra Database, so this article is good information for any DevOps/DBA or developer who needs to manage groups of instances, boxes, or hosts, whether they are on-prem bare-metal, dev boxes, or in the AWS cloud. You don’t need to be setting up Cassandra to get use out of this article.

Cloud DevOps: Packer, Ansible, SSH and AWS/EC2

March 3, 2017

Cloud DevOps: Using Packer, Ansible/SSH and AWS command line tools to create and manage EC2 Cassandra instances in AWS.

This article is useful for developers and DevOps/DBA staff who want to create AWS AMI images and manage those EC2 instances with Ansible. Although this article is part of a series about setting up the Cassandra Database images and doing DevOps/DBA with Cassandra clusters, the topics we cover apply to AWS DevOps in general - even if you don’t use Cassandra at all.

Cassandra Cluster Tutorial: Setting up Ansible for our Cassandra Database Cluster to do DevOps tasks