Cassandra 5.0 Cluster Setup 2025: Docker, Vagrant, and Cloud-Native DevOps

January 9, 2025

                                                                           

What’s New in 2025

Key Updates and Changes

  • Cassandra 5.0: Vector search, SAI indexes, unified compaction strategy
  • Container-First: Docker and Kubernetes have replaced most Vagrant workflows
  • Cloud-Native: Multi-cloud deployment with infrastructure as code
  • ARM Support: Native ARM64 support for Apple Silicon and AWS Graviton
  • Observability: Enhanced monitoring with OpenTelemetry and Prometheus

Major Platform Evolution

  • Docker Compose: Simplified multi-container orchestration
  • Kubernetes: Production-ready Cassandra operators
  • Testcontainers: Integration testing with ephemeral containers
  • Colima/Podman: Docker alternatives for development
  • GitOps: Infrastructure managed through Git workflows

The modern approach to Cassandra cluster development has evolved significantly since 2017. While Vagrant remains useful for certain scenarios, container-based development has become the standard for 2025.

Modern Cassandra Development Approaches

Docker Compose has largely replaced Vagrant for local development:

# docker-compose.yml
version: '3.8'

services:
  cassandra-node1:
    image: cassandra:5.0
    container_name: cassandra-node1
    environment:
      - CASSANDRA_SEEDS=cassandra-node1,cassandra-node2,cassandra-node3
      - CASSANDRA_CLUSTER_NAME=test-cluster
      - CASSANDRA_DC=datacenter1
      - CASSANDRA_RACK=rack1
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
    volumes:
      - cassandra-data1:/var/lib/cassandra
    ports:
      - "9042:9042"
    networks:
      - cassandra-network

  cassandra-node2:
    image: cassandra:5.0
    container_name: cassandra-node2
    environment:
      - CASSANDRA_SEEDS=cassandra-node1,cassandra-node2,cassandra-node3
      - CASSANDRA_CLUSTER_NAME=test-cluster
      - CASSANDRA_DC=datacenter1
      - CASSANDRA_RACK=rack2
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
    volumes:
      - cassandra-data2:/var/lib/cassandra
    networks:
      - cassandra-network
    depends_on:
      - cassandra-node1

  cassandra-node3:
    image: cassandra:5.0
    container_name: cassandra-node3
    environment:
      - CASSANDRA_SEEDS=cassandra-node1,cassandra-node2,cassandra-node3
      - CASSANDRA_CLUSTER_NAME=test-cluster
      - CASSANDRA_DC=datacenter1
      - CASSANDRA_RACK=rack3
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
    volumes:
      - cassandra-data3:/var/lib/cassandra
    networks:
      - cassandra-network
    depends_on:
      - cassandra-node1

networks:
  cassandra-network:
    driver: bridge

volumes:
  cassandra-data1:
  cassandra-data2:
  cassandra-data3:
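
The three service blocks above differ only in the node number and rack, so the same file can be generated rather than hand-written. The following is a minimal Python sketch, not part of the original setup: the function name `build_compose` and the choice of JSON output are assumptions made for illustration (Compose can read JSON because JSON is valid YAML). It mirrors the service names, environment variables, and volumes shown above.

```python
import json


def build_compose(node_count=3, image="cassandra:5.0", cluster_name="test-cluster"):
    """Build a Compose document (as a dict) that mirrors the hand-written file."""
    names = [f"cassandra-node{i}" for i in range(1, node_count + 1)]
    seeds = ",".join(names)
    services = {}
    volumes = {}
    for i, name in enumerate(names, start=1):
        service = {
            "image": image,
            "container_name": name,
            "environment": [
                f"CASSANDRA_SEEDS={seeds}",
                f"CASSANDRA_CLUSTER_NAME={cluster_name}",
                "CASSANDRA_DC=datacenter1",
                f"CASSANDRA_RACK=rack{i}",
                "CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch",
            ],
            "volumes": [f"cassandra-data{i}:/var/lib/cassandra"],
            "networks": ["cassandra-network"],
        }
        if i == 1:
            # Only the first node publishes the CQL port to the host
            service["ports"] = ["9042:9042"]
        else:
            # Later nodes wait for the first container to be created
            service["depends_on"] = ["cassandra-node1"]
        services[name] = service
        volumes[f"cassandra-data{i}"] = {}
    return {
        "services": services,
        "networks": {"cassandra-network": {"driver": "bridge"}},
        "volumes": volumes,
    }


def to_json(document):
    """Serialize the Compose document; the result can be saved as docker-compose.json."""
    return json.dumps(document, indent=2)
```

Saving `to_json(build_compose(3))` to a file and passing it with `docker-compose -f` should behave like the YAML version, and a fourth node becomes `build_compose(4)`.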

Do you like this article? Please check out our Cassandra training and Kafka training. We specialize in AWS DevOps Automation for Cassandra and Kafka.

Quick Start Commands

# Start the cluster
docker-compose up -d

# Check cluster status
docker exec cassandra-node1 nodetool status

# Access CQL shell
docker exec -it cassandra-node1 cqlsh

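# Add more nodes: --scale does not work here because every service sets a fixed
# container_name. Define another service (for example, a hypothetical
# cassandra-node4) in docker-compose.yml, then start it:
docker-compose up -d cassandra-node4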
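
When scripting around these commands, it helps to check the `nodetool status` output rather than read it by eye. The sketch below is an illustration only: the helper names are hypothetical, and it assumes the usual layout in which each node line starts with a two-letter code (U or D for up/down, then N, L, J, or M for the state) followed by the node address.

```python
import re
import subprocess

# Node lines start with a two-letter code: U/D (up/down) + N/L/J/M (state)
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)", re.MULTILINE)


def node_states(status_output):
    """Map each node address to its status code, e.g. {'172.18.0.2': 'UN'}."""
    return {address: code for code, address in NODE_LINE.findall(status_output)}


def cluster_ready(status_output, expected_nodes):
    """True when exactly expected_nodes nodes are listed and all are Up/Normal."""
    states = node_states(status_output)
    return len(states) == expected_nodes and all(code == "UN" for code in states.values())


def fetch_status(container="cassandra-node1"):
    """Run nodetool status inside the container and return its text output."""
    result = subprocess.run(
        ["docker", "exec", container, "nodetool", "status"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A script would call `cluster_ready(fetch_status(), 3)` in a loop until it returns True; header lines such as `Datacenter: datacenter1` are skipped because they do not match the two-letter pattern.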

Kubernetes Deployment (Production Ready)

For production-like local development, use Kubernetes:

# cassandra-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: cassandra
---
# cassandra-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: cassandra
  namespace: cassandra
spec:
  clusterIP: None
  selector:
    app: cassandra
  ports:
    - port: 9042
      name: cql
---
# cassandra-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  namespace: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      terminationGracePeriodSeconds: 1800
      containers:
      - name: cassandra
        image: cassandra:5.0
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 9042
          name: cql
        resources:
          limits:
            cpu: "2"
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 2Gi
        env:
        - name: CASSANDRA_SEEDS
          value: "cassandra-0.cassandra.cassandra.svc.cluster.local"
        - name: CASSANDRA_CLUSTER_NAME
          value: "test-cluster"
        - name: CASSANDRA_DC
          value: "datacenter1"
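        - name: CASSANDRA_RACK
          value: "rack1"
        # The official image only applies CASSANDRA_DC and CASSANDRA_RACK with this snitch
        - name: CASSANDRA_ENDPOINT_SNITCH
          value: "GossipingPropertyFileSnitch"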
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        volumeMounts:
        - name: cassandra-data
          mountPath: /var/lib/cassandra
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi

Deploy to Kubernetes

# Create namespace and deploy
kubectl apply -f cassandra-namespace.yaml
kubectl apply -f cassandra-service.yaml
kubectl apply -f cassandra-statefulset.yaml

# Check status
kubectl get pods -n cassandra
kubectl exec -it cassandra-0 -n cassandra -- nodetool status
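
The CASSANDRA_SEEDS value in the StatefulSet follows the stable DNS pattern that a headless Service gives StatefulSet pods: pod name, Service name, namespace, then `svc` and the cluster domain. As a small illustration (the helper names are hypothetical, and `cluster.local` is assumed as the cluster domain), the seed list for more than one seed pod can be derived like this:

```python
def pod_dns_name(statefulset, ordinal, service, namespace, cluster_domain="cluster.local"):
    """DNS name of one StatefulSet pod behind a headless Service."""
    return f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"


def seed_list(statefulset, service, namespace, seed_count=1):
    """Comma-separated seed hosts using the first seed_count pods (ordinals start at 0)."""
    return ",".join(
        pod_dns_name(statefulset, ordinal, service, namespace)
        for ordinal in range(seed_count)
    )
```

With the names used above, `seed_list("cassandra", "cassandra", "cassandra")` reproduces the single seed in the manifest, and `seed_count=2` would add `cassandra-1` as a second seed.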

Testcontainers for Integration Testing

Modern development uses Testcontainers for integration testing:

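// CassandraIntegrationTest.java
import static org.junit.jupiter.api.Assertions.assertNotNull;
import static org.junit.jupiter.api.Assertions.assertTrue;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;
import java.net.InetSocketAddress;
import org.junit.jupiter.api.MethodOrderer.OrderAnnotation;
import org.junit.jupiter.api.Order;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.TestMethodOrder;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.DynamicPropertyRegistry;
import org.springframework.test.context.DynamicPropertySource;
import org.testcontainers.containers.CassandraContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

@Testcontainers // required so JUnit starts and stops the @Container field
@SpringBootTest
@TestMethodOrder(OrderAnnotation.class)
class CassandraIntegrationTest {

    @Container
    static final CassandraContainer<?> cassandra = new CassandraContainer<>("cassandra:5.0")
            .withExposedPorts(9042)
            .withInitScript("init-schema.cql")
            .withReuse(true);

    // Spring Boot 3.x property names; Boot 2.x uses the spring.data.cassandra.* prefix
    @DynamicPropertySource
    static void configureProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.cassandra.contact-points", cassandra::getHost);
        registry.add("spring.cassandra.port", cassandra::getFirstMappedPort);
        registry.add("spring.cassandra.local-datacenter", () -> "datacenter1");
    }

    @Test
    @Order(1)
    void testConnection() {
        assertTrue(cassandra.isRunning());
        // Port 9042 is mapped to a random host port, so only check that a mapping exists
        assertTrue(cassandra.getFirstMappedPort() > 0);
    }

    @Test
    @Order(2)
    void testCRUDOperations() {
        // try-with-resources closes the session even if a query fails
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress(cassandra.getHost(), cassandra.getFirstMappedPort()))
                .withLocalDatacenter("datacenter1")
                .build()) {
            // Execute test queries
            Row row = session.execute("SELECT * FROM system.local").one();
            assertNotNull(row);
        }
    }
}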

Modern Vagrant Alternative (When Containers Won’t Work)

For scenarios requiring full VM isolation:

# Vagrantfile for 2025
Vagrant.configure("2") do |config|
  config.vm.box = "almalinux/9"
  
  # Use libvirt or VMware instead of VirtualBox
  config.vm.provider "libvirt" do |libvirt|
    libvirt.memory = 4096
    libvirt.cpus = 4
    libvirt.nested = true
  end
  
  # Modern provisioning with Ansible
  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "provision/cassandra.yml"
    ansible.extra_vars = {
      # Full release number (tarballs are published per patch release)
      cassandra_version: "5.0.2",
      cluster_name: "test-cluster",
      enable_vector_search: true
    }
  end
  
  # Define nodes with modern approach
  (1..3).each do |i|
    config.vm.define "cassandra-node#{i}" do |node|
      node.vm.network "private_network", ip: "192.168.50.#{i+10}"
      node.vm.hostname = "cassandra-node#{i}"
      
      # Use cloud-init for configuration
      node.vm.provision "shell", inline: <<-SHELL
        cloud-init clean
        /opt/cassandra/bin/cassandra-cloud \
          -cluster-name test-cluster \
          -client-address 192.168.50.#{i+10} \
          -cluster-address 192.168.50.#{i+10} \
          -cluster-seeds 192.168.50.11,192.168.50.12,192.168.50.13 \
          -enable-vector-search true
      SHELL
    end
  end
end

Cloud-Native Configuration Management

Modern Ansible Playbook

# provision/cassandra.yml
---
- name: Configure Cassandra 5.0 Cluster
  hosts: all
  become: yes
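  vars:
    # Full release number: tarballs are published per patch release (for example 5.0.2)
    cassandra_version: "5.0.2"
    cluster_name: "test-cluster"
    enable_vector_search: true

  tasks:
    - name: Install OpenJDK 17
      package:
        name: java-17-openjdk
        state: present

    - name: Create cassandra user (the systemd unit runs as this account)
      user:
        name: cassandra
        system: yes
        shell: /sbin/nologin

    - name: Create data and log directories
      file:
        path: "{{ item }}"
        state: directory
        owner: cassandra
        group: cassandra
      loop:
        - /var/lib/cassandra
        - /var/log/cassandra

    - name: Download Cassandra 5.0
      get_url:
        # archive.apache.org keeps every release; downloads.apache.org only hosts current ones
        url: "https://archive.apache.org/dist/cassandra/{{ cassandra_version }}/apache-cassandra-{{ cassandra_version }}-bin.tar.gz"
        dest: /tmp/cassandra.tar.gz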
        
    - name: Extract Cassandra
      unarchive:
        src: /tmp/cassandra.tar.gz
        dest: /opt/
        remote_src: yes
        creates: /opt/apache-cassandra-{{ cassandra_version }}
        
    - name: Create Cassandra symlink
      file:
        src: /opt/apache-cassandra-{{ cassandra_version }}
        dest: /opt/cassandra
        state: link
        
    - name: Configure Cassandra
      template:
        src: cassandra.yaml.j2
        dest: /opt/cassandra/conf/cassandra.yaml
        
    - name: Create systemd service
      template:
        src: cassandra.service.j2
        dest: /etc/systemd/system/cassandra.service
        
    - name: Enable and start Cassandra
      systemd:
        name: cassandra
        state: started
        enabled: yes
        daemon_reload: yes

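Modern systemd Service

# templates/cassandra.service.j2
[Unit]
Description=Apache Cassandra 5.0
After=network.target

[Service]
# Cassandra does not send sd_notify readiness messages, so run it as a simple
# foreground service instead of Type=notify
Type=simple
User=cassandra
Group=cassandra
# Keep logs out of /opt, which ProtectSystem=strict makes read-only
Environment=CASSANDRA_LOG_DIR=/var/log/cassandra
ExecStart=/opt/cassandra/bin/cassandra -f
KillMode=process
Restart=always
RestartSec=10
# The JVM exits with status 143 after SIGTERM; treat that as a clean stop
SuccessExitStatus=143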

# Security enhancements
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/cassandra /var/log/cassandra

# Resource limits
LimitNOFILE=100000
LimitMEMLOCK=infinity
LimitAS=infinity

[Install]
WantedBy=multi-user.target

Monitoring and Observability (2025)

Prometheus Integration

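# docker-compose.monitoring.yml
# Start this together with the main file so depends_on can resolve cassandra-node1:
#   docker-compose -f docker-compose.yml -f docker-compose.monitoring.yml up -d
version: '3.8'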

services:
  cassandra-exporter:
    image: instaclustr/cassandra-exporter:0.9.10
    environment:
      - CONFIG_FILE=/etc/cassandra-exporter/config.yml
    ports:
      - "9500:9500"
    depends_on:
      - cassandra-node1
      
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      # Default password for local use only; change it anywhere else
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-storage:/var/lib/grafana
      
volumes:
  grafana-storage:

OpenTelemetry Configuration

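# cassandra.yaml additions for 2025
diagnostic_events_enabled: true

# Full query logging has no boolean switch in cassandra.yaml. Configure it under
# full_query_logging_options, or enable it at runtime:
#   nodetool enablefullquerylog --path /var/log/cassandra/fql

# OpenTelemetry configuration
# These are JVM flags, not cassandra.yaml keys. Add them to conf/jvm-server.options
# (one per line) or pass them through the JVM_EXTRA_OPTS environment variable.
# Port 4317 is the OTLP gRPC port; 14250 is Jaeger's legacy collector port.
-javaagent:/opt/cassandra/agents/opentelemetry-javaagent.jar
-Dotel.service.name=cassandra
-Dotel.exporter.otlp.endpoint=http://jaeger:4317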

Security Best Practices (2025)

Container Security

# Dockerfile for secure Cassandra image
FROM cassandra:5.0

# The official image already defines a non-root cassandra user and group,
# so creating them again with groupadd/useradd would fail the build

# Security hardening
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Set secure permissions
RUN chown -R cassandra:cassandra /var/lib/cassandra /var/log/cassandra
RUN chmod 700 /var/lib/cassandra

USER cassandra
EXPOSE 9042 7000 7001 7199

HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD cqlsh -e "SELECT now() FROM system.local"

Network Security

# Network policies for Kubernetes
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cassandra-network-policy
  namespace: cassandra
spec:
  podSelector:
    matchLabels:
      app: cassandra
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: cassandra
    ports:
    - protocol: TCP
      port: 9042
    - protocol: TCP
      port: 7000
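  egress:
  - to:
    - podSelector:
        matchLabels:
          app: cassandra
    ports:
    - protocol: TCP
      port: 7000
  # Allow DNS lookups so the seed hostname can still be resolved
  - ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53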

Performance Testing and Validation

Modern Load Testing

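# Using cassandra-stress with modern parameters
# In the official image the tool lives under tools/bin, which is not on PATH
docker exec cassandra-node1 /opt/cassandra/tools/bin/cassandra-stress write \
  n=1000000 \
  -mode native cql3 \
  -rate threads=50 \
  -node cassandra-node1,cassandra-node2,cassandra-node3

# Vector search testing (Cassandra 5.0)
# vector-search.yaml is a user-supplied stress profile that must exist inside the container
docker exec cassandra-node1 /opt/cassandra/tools/bin/cassandra-stress user \
  profile=vector-search.yaml \
  n=100000 \
  -rate threads=10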

Automated Testing Pipeline

# .github/workflows/cassandra-test.yml
name: Cassandra Integration Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    
    - name: Set up JDK 17
      uses: actions/setup-java@v4
      with:
        java-version: '17'
        distribution: 'temurin'
        
    - name: Start Cassandra with Docker Compose
      run: |
        # GitHub-hosted runners ship Compose v2, invoked as "docker compose"
        docker compose up -d
        ./scripts/wait-for-cassandra.sh

    - name: Run integration tests
      run: |
        ./mvnw test -Dspring.profiles.active=integration

    - name: Collect logs
      if: failure()
      run: |
        docker compose logs > cassandra-logs.txt
                
    - name: Upload logs
      if: failure()
      uses: actions/upload-artifact@v4
      with:
        name: cassandra-logs
        path: cassandra-logs.txt
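
The workflow calls ./scripts/wait-for-cassandra.sh, which this article does not show. A minimal sketch of the same idea in Python follows, assuming that "ready" simply means the CQL port accepts TCP connections; the function name, timeout, and polling interval are illustrative choices, not the contents of the original script.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Return True once a TCP connection succeeds, False if the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example use in a CI step (exit code 0 on success, 1 on timeout):
#   import sys
#   sys.exit(0 if wait_for_port("localhost", 9042) else 1)
```

An open port only shows that the native transport is listening on one node. A stricter check would also confirm that every node reports UN in `nodetool status` before the tests start.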


Migration from Legacy Vagrant Setup

Step-by-Step Migration

# 1. Export existing Vagrant cluster data
vagrant ssh node0 -c "nodetool snapshot"
# Write the archive to the /vagrant synced folder (assumed enabled) so the host can see it,
# and archive relative to /var/lib/cassandra so it unpacks to the same layout
vagrant ssh node0 -c "sudo tar -czf /vagrant/cassandra-backup.tar.gz -C /var/lib/cassandra data"

# 2. Convert to Docker Compose
docker-compose up -d

# 3. Restore data to new cluster
docker cp cassandra-backup.tar.gz cassandra-node1:/tmp/
docker exec cassandra-node1 tar -xzf /tmp/cassandra-backup.tar.gz -C /var/lib/cassandra/

# 4. Restart cluster
docker-compose restart

Configuration Comparison

Feature           Legacy Vagrant     Modern Docker       Kubernetes
Startup Time      5-10 minutes       30-60 seconds       1-2 minutes
Resource Usage    High (full VMs)    Low (containers)    Medium (pods)
Networking        Complex NAT        Simple bridge       Service mesh
Persistence       VM disk            Volumes             PVCs
Scaling           Manual             Compose scale       Auto-scaling

Cloud Deployment Integration

AWS EKS Integration

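# cassandra-cluster-eks.yaml
# Note: HelmChart is a custom resource served by the k3s/RKE2 helm-controller.
# A stock EKS cluster does not include it; install that controller first, or
# pass the values below to "helm install" against the Bitnami chart instead.
apiVersion: helm.cattle.io/v1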
kind: HelmChart
metadata:
  name: cassandra
  namespace: cassandra
spec:
  chart: cassandra
  repo: https://charts.bitnami.com/bitnami
  valuesContent: |-
    cluster:
      name: test-cluster
      datacenter: us-west-2
      seedCount: 3
    replicaCount: 3
    resources:
      requests:
        memory: 4Gi
        cpu: 2
      limits:
        memory: 8Gi
        cpu: 4
    persistence:
      enabled: true
      storageClass: gp3
      size: 100Gi    

Multi-Cloud Deployment

# terraform/main.tf
module "cassandra_cluster" {
  source = "./modules/cassandra"
  
  providers = {
    aws    = aws.us-west-2
    # The key must match the provider's local name in the module (google by default)
    google = google.us-central1
  }
  
  cluster_name = "multi-cloud-cluster"
  replication_factor = 3
  
  aws_config = {
    instance_type = "r6g.2xlarge"
    availability_zones = ["us-west-2a", "us-west-2b", "us-west-2c"]
  }
  
  gcp_config = {
    machine_type = "n2-highmem-8"
    zones = ["us-central1-a", "us-central1-b", "us-central1-c"]
  }
}

Best Practices Summary

Development Environment Selection

  1. Small projects: Docker Compose for simplicity
  2. Microservices: Kubernetes for production parity
  3. Integration testing: Testcontainers for ephemeral clusters
  4. CI/CD: GitHub Actions with container-based testing
  5. Legacy systems: Modern Vagrant with updated provisioning

Performance Optimization

  1. Use SSD storage: Even in development environments
  2. Allocate sufficient memory: 4GB minimum per node
  3. Enable JVM tuning: G1GC with modern settings
  4. Monitor resource usage: Prometheus + Grafana
  5. Test with realistic data: Use cassandra-stress

Security Considerations

  1. Container security: Non-root users, minimal images
  2. Network isolation: Proper network policies
  3. Secrets management: Kubernetes secrets or Vault
  4. TLS everywhere: Client and internode encryption
  5. Regular updates: Keep base images current

The evolution from Vagrant to container-based development represents a significant improvement in developer productivity and operational consistency. While Vagrant remains useful for specific use cases, the modern approach emphasizes containers, orchestration, and cloud-native practices.

Cloudurable provides Cassandra training, Cassandra consulting, and Cassandra support, and helps set up Cassandra clusters in AWS.

About Cloudurable™

Cloudurable™ streamlines DevOps and DBA work for the Cassandra Database running on AWS. It provides AMIs, CloudWatch monitoring, CloudFormation templates, and monitoring tools to support the Cassandra Database in production on AWS.

We also teach advanced Cassandra Database courses that cover how to develop with, administer, support, and deploy Cassandra to production in AWS EC2.


Authors

Written by R. Hightower and JP Azar.

Feedback


We hope you enjoyed this article. Please provide [feedback](https://cloudurable.com/contact/index.html).
