Cloudurable

AWS Cassandra 2025: Cassandra 5.0, NUMA, and Graviton4 Performance Guide

What’s New in 2025

Key Updates and Changes

  • Graviton4 Processors: Up to 40% faster than Graviton3 for databases, 192 cores at 2.8 GHz
  • NUMA Evolution: Two-socket NUMA memory clustering on Graviton4 for improved performance
  • Cassandra 5.0: Enhanced NUMA awareness with improved memory management
  • Instance Types: New R8g, X8g, C8g, M8g, I8g instances with better NUMA support
  • Performance Gains: 30-40% improvement over x86 for Cassandra workloads

Major Architecture Changes

  • Single NUMA Domain: Graviton3 maintains single NUMA domain simplicity
  • Dual NUMA Support: Graviton4 introduces two-socket NUMA clustering
  • Memory Bandwidth: Improved memory controller performance across generations
  • Core Density: Up to 192 physical cores per instance (R8g.48xlarge)
  • ARM Optimization: Better Java performance on ARM architecture

AWS Cassandra 2025 and NUMA Architecture

In 2025, AWS has significantly evolved its NUMA (Non-Uniform Memory Access) support with Graviton4 processors. Understanding NUMA is crucial for optimizing Cassandra 5.0 performance on modern EC2 instances.

Continue reading

AWS Cassandra Cluster Tutorial 5 (2025): Modern Cassandra Deployment with CDK, EKS, and Infrastructure as Code

Cassandra Cluster Tutorial 5 (2025) - Modern AWS Cassandra Deployment with CDK, EKS, and Infrastructure as Code

This Cassandra tutorial is designed for developers and DevOps/SRE teams who want to deploy production-ready Cassandra clusters in AWS using modern practices and tools available in 2025.

What’s New in 2025

The landscape of deploying Cassandra on AWS has evolved significantly:

  1. AWS CDK v2 has become the standard for infrastructure as code, offering type-safe infrastructure definitions
  2. Kubernetes operators like K8ssandra provide production-ready Cassandra deployments
  3. AWS Graviton3 processors offer 40% better price-performance for Cassandra workloads
  4. Container-based deployments are now the norm, with EKS Anywhere for hybrid deployments
  5. Service mesh integration with AWS App Mesh provides advanced traffic management
  6. AWS Systems Manager replaces bastion hosts for secure access
  7. GitOps workflows with AWS CodeCommit and FluxCD for infrastructure management

Cloudurable provides Cassandra training, Cassandra consulting, and Cassandra support, and helps set up Cassandra clusters in AWS.

Continue reading

Cassandra 5.0 AWS CPU Requirements: Graviton4, ZGC, and Performance Optimization

What’s New in 2025

Key Updates and Changes

  • Cassandra 5.0: Enhanced CPU utilization with improved compaction and streaming
  • Graviton4 Processors: 40% better performance for database workloads
  • ZGC Integration: Low-latency garbage collection for improved response times
  • Instance Types: New I8g, R8g, C8g families optimized for Cassandra workloads
  • Compaction Improvements: Better concurrent compactor defaults and tuning

Major Performance Enhancements

  • Unified Compaction: Reduced CPU overhead in Cassandra 5.0
  • Vector Search: CPU-intensive operations requiring additional cores
  • Streaming Performance: Improved parallel processing for data migration
  • Memory Management: Better allocation strategies reducing CPU pressure
  • ARM Optimization: Native ARM64 support for Graviton processors

Cassandra 5.0 CPU Requirements in AWS Cloud

Cassandra 5.0 is highly concurrent and can utilize as many CPU cores as available when configured correctly. Understanding CPU requirements is crucial for optimal performance on AWS EC2 instances.

Continue reading

Cassandra 5.0 AWS Storage Requirements: GP3, I4g Instances, and Performance Optimization

What’s New in 2025

Key Updates and Changes

  • EBS GP3 Volumes: 20% cost savings over GP2 with independent IOPS/throughput scaling
  • I4g Instances: Graviton2-powered with up to 15TB NVMe (the related Im4gn and Is4gen families reach 30TB), 15% better compute performance
  • I4i vs I4g: 45-60% lower cost per TB with Im4gn/Is4gen families
  • Unified Compaction: Cassandra 5.0 reduces storage overhead and improves I/O patterns
  • EBS Optimization: Enhanced throughput up to 80 Gbps on latest instance types

Storage Performance Improvements

  • GP3 Baseline: 3,000 IOPS and 125 MiB/s regardless of volume size
  • GP3 Maximum: Up to 16,000 IOPS and 1,000 MiB/s (4x the GP2 maximum throughput of 250 MiB/s)
  • NVMe Performance: I4g delivers up to 7.6 million IOPS per instance
  • EBS Elastic Volumes: Live migration between volume types without downtime
  • Storage Classes: New archive and deep archive tiers for long-term retention
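
The GP3 baseline and maximum figures above can be turned into a quick sizing check. The following is a minimal sketch, assuming only the four numbers quoted in the list; the function name `gp3_extra_provisioning` is hypothetical and is not part of any AWS SDK.

```python
# Figures taken from the list above; nothing else about gp3 is assumed.
GP3_BASELINE_IOPS = 3_000
GP3_BASELINE_MIBPS = 125
GP3_MAX_IOPS = 16_000
GP3_MAX_MIBPS = 1_000


def gp3_extra_provisioning(target_iops: int, target_mibps: int) -> tuple[int, int]:
    """Return (extra_iops, extra_mibps) needed above the gp3 baseline.

    Raises ValueError when a target exceeds the per-volume maximum quoted
    above, which is the point where striping volumes or using instance
    storage becomes the question instead.
    """
    if target_iops > GP3_MAX_IOPS or target_mibps > GP3_MAX_MIBPS:
        raise ValueError("target exceeds the single-volume gp3 maximum")
    extra_iops = max(0, target_iops - GP3_BASELINE_IOPS)
    extra_mibps = max(0, target_mibps - GP3_BASELINE_MIBPS)
    return extra_iops, extra_mibps


if __name__ == "__main__":
    print(gp3_extra_provisioning(2_000, 100))   # (0, 0): the baseline is enough
    print(gp3_extra_provisioning(10_000, 500))  # (7000, 375): provision the difference
```

A read-heavy node with many cache misses would land in the second case, where the extra IOPS and throughput are provisioned independently of volume size.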

Cassandra 5.0 AWS Storage Requirements

Cassandra 5.0 performs extensive sequential disk I/O for commit logs and SSTable writes, while requiring random I/O for read operations. The enhanced Unified Compaction strategy in Cassandra 5.0 provides more predictable I/O patterns and reduced storage overhead.

Continue reading

Cassandra 5.0 Cluster Setup 2025: Docker, Vagrant, and Cloud-Native DevOps

What’s New in 2025

Key Updates and Changes

  • Cassandra 5.0: Vector search, SAI indexes, unified compaction strategy
  • Container-First: Docker and Kubernetes have replaced most Vagrant workflows
  • Cloud-Native: Multi-cloud deployment with infrastructure as code
  • ARM Support: Native ARM64 support for Apple Silicon and AWS Graviton
  • Observability: Enhanced monitoring with OpenTelemetry and Prometheus

Major Platform Evolution

  • Docker Compose: Simplified multi-container orchestration
  • Kubernetes: Production-ready Cassandra operators
  • Testcontainers: Integration testing with ephemeral containers
  • Colima/Podman: Docker alternatives for development
  • GitOps: Infrastructure managed through Git workflows

The modern approach to Cassandra cluster development has evolved significantly since 2017. While Vagrant remains useful for certain scenarios, container-based development has become the standard for 2025.

Continue reading

Cassandra 5.0 Cluster Tutorial 2025: Ansible Automation for DevOps Tasks

What’s New in 2025

Key Updates and Changes

  • Cassandra 5.0: Storage Attached Indexes (SAI), Vector Search, Unified Compaction
  • Ansible 2.19: Event-driven automation, enhanced cloud integrations
  • VirtualBox Compatibility: Use 6.1.x with Vagrant 2.4.1 for stability
  • Security First: Ansible Vault and external secret managers now standard
  • Infrastructure as Code: Git-based workflows with Ansible Collections

Deprecated Features

  • Cassandra 3.x is end-of-life
  • Legacy Ansible inventory formats
  • Manual SSH key management (use automation)
  • Static inventories for cloud environments

Cassandra Tutorial: Setting up Ansible for our Cassandra Database Cluster for DevOps/DBA tasks

Ansible is a key DevOps/DBA tool for managing backups and rolling upgrades to the Cassandra cluster in AWS/EC2. Ansible uses ssh, so you do not have to install an agent to use it. In 2025, Ansible remains the preferred automation tool with improved event-driven capabilities.

Continue reading

Cassandra AWS System Memory Guidelines 2025: Optimizing for Modern Hardware and Workloads

System Memory Guidelines for Cassandra AWS - 2025 Edition

What’s New in 2025

The Cassandra memory landscape has evolved significantly:

  1. Modern JVMs - Java 21 LTS with ZGC and Shenandoah GC offer sub-millisecond pause times
  2. AWS Graviton3 - ARM-based processors with DDR5 memory provide 50% better memory bandwidth
  3. Larger heap sizes - Modern GCs handle 100GB+ heaps efficiently
  4. Container deployments - Memory management in Kubernetes requires different approaches
  5. Persistent memory - Intel Optane and similar technologies blur the line between RAM and storage
  6. Tiered storage - Hot data in memory, warm in NVMe, cold in S3
  7. Vector search workloads - New memory requirements for AI/ML applications

Cloudurable provides Cassandra training, Cassandra consulting, and Cassandra support, and helps set up Cassandra clusters in AWS.

Continue reading

Cloud DevOps 2025: Packer, Ansible, SSH and AWS/EC2

What’s New in 2025

Key Updates and Changes

  • New EC2 Instance Types: M7i, C7i, and R7i families now available with up to 15% better price-performance
  • Packer Updates: Version 1.11 with predictable plugin loading and HCP integration
  • Ansible Best Practices: Enhanced aws_ec2 plugin with improved security and performance features
  • EBS Volume Evolution: GP3 volumes now standard, offering 20% cost savings over GP2
  • HashiCorp Updates: Terraform AWS Provider 6.0 with multi-region support
  • Security Enhancements: AWS Verified Access for SSH/RDP, enhanced IAM with ECR Policy v2

Deprecated Features and Migration Notes

  • GP2 to GP3 Migration: GP2 volumes should be migrated to GP3 for cost savings
  • EC2 Dynamic Inventory: Old ec2.py script deprecated in favor of aws_ec2 plugin
  • Instance Types: Consider upgrading from M6i to M7i instances for better performance
  • Packer AWS Builder: Continue using amazon-ebs builder with updated authentication methods

Cloud DevOps: Using Packer, Ansible/SSH and AWS command line tools to create and manage EC2 Cassandra instances in AWS.

This article is useful for developers and DevOps/DBA staff who want to create AWS AMI images and manage those EC2 instances with Ansible. Although this article is part of a series about setting up the Cassandra Database images and doing DevOps/DBA with Cassandra clusters, the topics we cover apply to AWS DevOps in general - even if you don’t use Cassandra at all.

Continue reading

AWS Cassandra Cluster Tutorial 5: Setting up Cassandra Cluster in AWS/EC2

Cassandra Cluster Tutorial 5 - Cassandra AWS Cluster with CloudFormation, bastion host, Ansible and the aws-command line

This Cassandra tutorial is useful for developers and DevOps/DBA staff who want to launch a Cassandra cluster in AWS.

The cassandra-image project has been using Vagrant and Ansible to set up a Cassandra Cluster for local testing. Then we used Packer, Ansible and EC2. We used Packer to create AWS images in the last tutorial. In this tutorial, we will use CloudFormation to create a VPC, subnets, security groups and more to launch a Cassandra cluster in EC2 using the AWS AMI image we created with Packer in the last article. The next two tutorials after this one will set up Cassandra to work in multiple AZs and multiple regions using custom snitches for Cassandra.

Continue reading

Cassandra AWS System Memory Guidelines

System Memory Guidelines for Cassandra AWS

Basic guidelines for AWS Cassandra

Do not use less than 8GB of memory for the JVM. The more RAM the better. Use G1GC. Writes are first held in memory (in memtables) and then flushed to disk sequentially as SSTables. The larger the SSTable, the less scanning is needed when reading and using a bloom filter to determine whether a key is in an SSTable. In the EC2 world this equates to an m4.xlarge (16GB of memory), since you also need some memory for the OS, specifically the IO buffers. The i2.xlarge and d2.xlarge are the smallest in their families and exceed the minimum memory requirement (and then some).

Continue reading

AWS Cassandra: Cassandra, NUMA and EC2

AWS Cassandra and NUMA

The i3.8xlarge, c4.8xlarge, m4.10xlarge, and larger EC2 instance types use more than one CPU socket, which means NUMA controls are available.

A good read on this is Al Tobey’s blog post.

The quickest way to tell if a machine is NUMA is to run “numactl --hardware”. -Al Tobey blog post on Cassandra tuning

NUMA stands for Non-Uniform Memory Access. Modern x86 CPUs contain an integrated memory controller, so a two-socket system has two memory controllers and each CPU socket owns a share of the memory. If one socket needs memory attached to the other socket, the data has to be transferred between them, which is more expensive than accessing memory local to the socket. When a JVM thread only uses memory local to its own CPU, things go fast; when it does not, access is slower (10 CPU cycles vs. 100, or some order of magnitude).
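
As a rough programmatic counterpart to running numactl, the sketch below counts the memory nodes that Linux exposes under `/sys/devices/system/node`. It assumes a Linux host; the function names `numa_node_ids` and `is_numa` are hypothetical helpers, not part of any library.

```python
import re
from pathlib import Path

# Standard Linux sysfs location; each NUMA node appears as node0, node1, ...
SYSFS_NODE_DIR = Path("/sys/devices/system/node")


def numa_node_ids(node_dir: Path = SYSFS_NODE_DIR) -> list[int]:
    """Return the sorted NUMA node ids found under node_dir (empty if absent)."""
    if not node_dir.is_dir():
        return []
    ids = []
    for entry in node_dir.iterdir():
        match = re.fullmatch(r"node(\d+)", entry.name)
        if match and entry.is_dir():
            ids.append(int(match.group(1)))
    return sorted(ids)


def is_numa(node_dir: Path = SYSFS_NODE_DIR) -> bool:
    """True when more than one memory node is visible to the OS."""
    return len(numa_node_ids(node_dir)) > 1


if __name__ == "__main__":
    nodes = numa_node_ids()
    print(f"NUMA nodes: {nodes or 'not reported'}; multi-node: {is_numa()}")
```

On a single-socket instance this typically reports one node, in which case NUMA tuning for the JVM is unlikely to matter.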

Continue reading

Cassandra AWS CPU Guidelines

Cassandra CPU requirements in AWS Cloud

Cassandra is highly concurrent. Cassandra nodes can use as many CPU cores as are available if configured correctly.

What are vCPUs and ECUs?

An Amazon EC2 vCPU is a hyper thread, often referred to as a virtual core. Think of it as a physical thread of execution. It is able to run one thread at a time (which of course could be swapped out).

An Amazon ECU is some made up term that AWS used to use which was the power of the Intel Pentium chip that they used on the earliest incarnations of EC2. 50 ECU would be like 50 Pentium chips from a bygone era. Ignore ECUs.
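
Since a vCPU is described above as a hyper thread, it can help to translate an instance's vCPU count into cores before sizing thread pools. This is a minimal sketch: both function names are hypothetical, and the multiplier of 8 writes per core is a commonly quoted rule of thumb that is assumed here rather than taken from the text.

```python
def physical_cores(vcpus: int, threads_per_core: int = 2) -> int:
    """Estimate the physical cores behind an instance's vCPU count.

    threads_per_core=2 matches the hyper-threaded case described above;
    pass 1 for instance families where each vCPU is a full core.
    """
    if vcpus < 1 or threads_per_core < 1:
        raise ValueError("vcpus and threads_per_core must be positive")
    return max(1, vcpus // threads_per_core)


def suggested_concurrent_writes(cores: int, per_core: int = 8) -> int:
    """Starting point for concurrent_writes; per_core=8 is an assumption."""
    return cores * per_core


if __name__ == "__main__":
    cores = physical_cores(4)  # a 4-vCPU hyper-threaded instance
    print(cores, suggested_concurrent_writes(cores))
```

Treat the result as a starting value to benchmark against, not a setting to apply blindly.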

Continue reading

Cassandra AWS Storage Requirements

Cassandra does a lot of sequential disk I/O for the commit log and for writing out SSTables. You still need random I/O for read operations. The more read operations that are cache misses, the more IOPS your EBS volumes need.

Cassandra writes to four areas

  • commit logs
  • SSTable
  • an index file
  • a bloom filter
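
The four write areas listed above can be pictured with a toy, in-memory model. This is only an illustrative sketch of a log-structured write path; the class names `BloomFilter` and `ToyNode`, the dictionary layout, and the filter parameters are all hypothetical and do not mirror Cassandra's actual file formats.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.hashes = hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToyNode:
    """Illustrative write path only; names and layout are hypothetical."""

    def __init__(self) -> None:
        self.commit_log = []  # append-only, sequential writes
        self.memtable = {}    # in-memory buffer of recent writes
        self.sstables = []    # immutable flushed tables, oldest first

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # area 1: commit log
        self.memtable[key] = value

    def flush(self) -> None:
        rows = sorted(self.memtable.items())              # area 2: SSTable data
        index = {k: i for i, (k, _) in enumerate(rows)}   # area 3: index file
        bloom = BloomFilter()                             # area 4: bloom filter
        for key, _ in rows:
            bloom.add(key)
        self.sstables.append({"rows": rows, "index": index, "bloom": bloom})
        self.memtable.clear()
        self.commit_log.clear()  # flushed data no longer needs replay

    def read(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table first
            if not table["bloom"].might_contain(key):
                continue  # skip this table without consulting its index
            offset = table["index"].get(key)
            if offset is not None:
                return table["rows"][offset][1]
        return None
```

Writes only ever append or flush sequentially, while a read may have to probe several tables, which is why cache-missing reads are the part that needs random I/O.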

Consider EC2 instance store instead of EBS for Cassandra

AWS provides two kinds of storage: EC2 instance-local storage, called instance storage, which is not available with all EC2 instance types, and Elastic Block Store (EBS). Instance storage does not have to go over a SAN or intranet; instead it uses the local hardware bus. Instance storage is right there on the server you are renting. The downside of EC2 instance storage is the expense, and it is not as flexible as EBS. Due to historic problems with EBS, instance storage used to be the only real option for running Cassandra in AWS. EBS has a reputation for degrading performance over time. Some of this has likely been fixed with enhanced EBS, but instance storage is more reliable.

Continue reading

What is Cassandra?

Cassandra is a linearly scalable, open source NoSQL database. Cassandra uses a log-structured merge-tree (LSM tree) for storage, which makes it one of the best NoSQL options for high-throughput writes. Cassandra delivers continuous availability with operational simplicity. Unlike many other NoSQL solutions, Cassandra is a master-less, peer-to-peer, distributed clustered store. Each node knows about the cluster network topology via the gossip protocol.

Cloudurable provides Cassandra training, Cassandra consulting, and Cassandra support, and helps set up Cassandra clusters in AWS.

Continue reading

Part 2 Setting up Ansible and ssh for Cassandra Database Cluster DevOps

Cassandra Cluster Tutorial 3: Part 2 of 2

Setting up Ansible and SSH for our Cassandra Database Cluster for DevOps/DBA Tasks

This tutorial series centers on DevOps/DBA tasks with the Cassandra Database. As we mentioned before, Ansible and ssh are essential tools for common DevOps/DBA tasks when working with Cassandra clusters. Please read part 1 before reading part 2.

In part 1, we set up Ansible for our Cassandra Database Cluster to automate common DevOps/DBA tasks. As part of this setup, we created an ssh key and then set up our instances with this key so we could use ssh, scp, and most importantly ansible. We also created an ansible playbook to install keys on our Cassandra nodes from a bastion host that we set up with Vagrant.

Continue reading

Setting up Ansible/SSH for Cassandra Database Cluster DevOps Part 1

Cassandra Cluster Tutorial 3: Part 1 of 2

Setting up Ansible/SSH for our Cassandra Database Cluster for DevOps/DBA Tasks

Ansible and ssh are essential DevOps/DBA tools for common DBA/DevOps tasks like managing backups, rolling upgrades to the Cassandra cluster in AWS/EC2, and so much more. An excellent aspect of Ansible is that it uses ssh, so you do not have to install an agent to use Ansible.

This article series centers on DevOps/DBA tasks with the Cassandra Database. However, the use of Ansible for DevOps/DBA transcends its use with the Cassandra Database, so this article is good information for any DevOps/DBA or developer who needs to manage groups of instances, boxes, or hosts, whether they are on-prem bare-metal, dev boxes, or in the AWS cloud. You don’t need to be setting up Cassandra to get use out of this article.

Continue reading

Cloud DevOps: Packer, Ansible, SSH and AWS/EC2

Cloud DevOps: Using Packer, Ansible/SSH and AWS command line tools to create and manage EC2 Cassandra instances in AWS.

This article is useful for developers and DevOps/DBA staff who want to create AWS AMI images and manage those EC2 instances with Ansible. Although this article is part of a series about setting up the Cassandra Database images and doing DevOps/DBA with Cassandra clusters, the topics we cover apply to AWS DevOps in general - even if you don’t use Cassandra at all.

Continue reading

Cassandra Cluster Tutorial: Setting up Ansible for our Cassandra Database Cluster to do DevOps tasks

Cassandra Tutorial: Setting up Ansible for our Cassandra Database Cluster for DevOps/DBA tasks

Ansible is a key DevOps/DBA tool for managing backups and rolling upgrades to the Cassandra cluster in AWS/EC2. Ansible uses ssh, so you do not have to install an agent to use it.

This article series focuses on DevOps/DBA tasks with the Cassandra Database. The use of Ansible for DevOps/DBA goes beyond the Cassandra Database. This article helps any DevOps/DBA or Developer that needs to manage groups of instances, boxes, or hosts. These can be on-prem bare-metal, dev boxes, or in the Cloud. You don’t need to be setting up Cassandra to benefit from this article.
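
Because Ansible reaches hosts over plain ssh, the main thing it needs is a list of hosts grouped under a name. The sketch below generates a small INI-style inventory from a list of addresses. The function name `build_inventory`, the group name, the user, the key path, and the example addresses are all hypothetical placeholders; `ansible_user` and `ansible_ssh_private_key_file` are standard Ansible connection variables.

```python
def build_inventory(
    hosts: list[str],
    group: str = "cassandra_nodes",            # hypothetical group name
    user: str = "ansible",                     # hypothetical remote user
    key_file: str = "~/.ssh/cassandra_key",    # hypothetical key path
) -> str:
    """Render an INI-style Ansible inventory for one group of hosts."""
    if not hosts:
        raise ValueError("at least one host is required")
    lines = [f"[{group}]"]
    lines.extend(hosts)
    lines.append("")
    lines.append(f"[{group}:vars]")
    lines.append(f"ansible_user={user}")
    lines.append(f"ansible_ssh_private_key_file={key_file}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example private addresses; substitute the nodes in your own cluster.
    print(build_inventory(["192.168.50.4", "192.168.50.5", "192.168.50.6"]))
```

Saved as a file such as inventory.ini, output like this could be passed to a command along the lines of `ansible cassandra_nodes -i inventory.ini -m ping` to confirm that every node is reachable over ssh.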

Continue reading

Spark Tutorial: Introduction to BigData Analytics with Apache Spark Part 1

Introduction to BigData Analytics with Apache Spark Part 1

By Fadi Maalouli and R.H.

Spark Overview

Apache Spark, an open source cluster computing system, is growing fast. Apache Spark has a growing ecosystem of libraries and frameworks to enable advanced data analytics. Apache Spark’s rapid success is due to its power and ease of use. It is more productive and has a faster runtime than typical MapReduce-based BigData analytics. Apache Spark provides in-memory, distributed computing. It has APIs in Java, Scala, Python, and R. The Spark Ecosystem is shown below.

Continue reading

Spark Tutorial: Spark SQL from Java and Python with Cassandra

Analytics with Apache Spark Tutorial Part 2 : Spark SQL

Using Spark SQL from Python and Java

Combining Cassandra and Spark

By Fadi Maalouli and R.H.

Spark, a very powerful tool for real-time analytics, is very popular. In the first part of this series on Spark we introduced Spark. We covered Spark’s history, and explained RDDs (which are used to partition data in the Spark cluster). We also covered the Apache Spark Ecosystem.

Continue reading
