Benefits of Using Cloudurable Subscription Support for Cassandra

For more details on the subscription support or pricing, please contact us, call (415) 758-1113, or write to info@cloudurable.com.

Benefits of Subscription Cassandra Support and Cassandra as a Service

The Cassandra database is a robust datastore, and running a performant, efficient cluster instance requires experience. Cloudurable™ gives you that experience. We have baked proven practices into our AMIs and Cassandra tools, which means you don’t have to guess whether it’s a good idea to integrate with AWS alerting (or anything else). We know it is, and we provide a Cassandra that integrates well and works out of the box with AWS.

What’s more, we provide a team that can help you quickly diagnose and troubleshoot your Cassandra issues. And since we provide templates and a standard setup, our engineers know right where to dive in and start looking.

We can also help you set up Cassandra on Amazon (AWS Cassandra) and provide a cloud data platform based on Cassandra. We can help you implement Cassandra as a Service for your organization.

This article provides a detailed look at the benefits and advantages. A higher-level overview of the advantages of using Cloudurable™ is also available.

Cassandra subscription support is just one of the services that Cloudurable™ provides. Our Subscription Cassandra Support pricing is straightforward and affordable, and it scales up as your needs increase, which makes it easy to get started with Cassandra running in EC2 without breaking your budget.

Cloudurable™ provides images (AMIs) and utilities to build images that have the following features to fully realize Amazon Cassandra deployments.

  • Integration with AWS metrics
  • Integration with AWS alerting
  • Easy deployment and resizing of EC2 instances without manual configuration changes
  • Utilities to aid in configuration and cluster formation/deployment
  • Installed AWS and Cassandra command line tools to assist in backing up Cassandra to S3 and Glacier, taking EBS volume snapshots, and other administrative tasks
  • Watchdogs configured to restart Cassandra if the Cassandra process dies

The Amazon Cassandra images contain the following software components:

  • Ergonomic cloud configuration™ of Cassandra based on the size of the EC2 image (adjusts for memory, CPU cores, disks attached).
  • systemd service daemon that sends operating system KPIs to Amazon CloudWatch
  • systemd journald log forwarder daemon to send operating system logs to Amazon CloudWatch
  • systemd logstash forwarder to send Cassandra logs to Amazon CloudWatch Logs
  • systemd Cassandra KPI daemon to send Cassandra KPIs to Amazon CloudWatch Metrics
  • Cassandra configured as a systemd service daemon with a watchdog (restarts Cassandra if it crashes)
  • Enterprise Level Support only: an advanced watchdog and health check system that can detect failures and remediate problems.
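
To make the idea of sizing configuration from the instance concrete, here is a minimal sketch of how heap settings can be derived from memory and CPU cores. It mirrors the heuristic found in the stock cassandra-env.sh script and is not Cloudurable's actual tooling; the function name suggest_heap_mb and the example instance size are assumptions made for illustration.

```python
def suggest_heap_mb(system_memory_mb, cpu_cores):
    """Suggest JVM heap sizes from instance memory and core count.

    Follows the heuristic in the stock cassandra-env.sh (an assumption here,
    not Cloudurable's tooling):
      max heap = max(min(RAM / 2, 1024 MB), min(RAM / 4, 8192 MB))
      new gen  = min(100 MB * cores, max heap / 4)
    """
    half = system_memory_mb // 2
    quarter = system_memory_mb // 4
    max_heap = max(min(half, 1024), min(quarter, 8192))
    new_gen = min(100 * cpu_cores, max_heap // 4)
    return {"max_heap_mb": max_heap, "heap_newsize_mb": new_gen}


if __name__ == "__main__":
    # Hypothetical instance with 16 GiB of RAM and 4 cores.
    print(suggest_heap_mb(16 * 1024, 4))
    # prints {'max_heap_mb': 4096, 'heap_newsize_mb': 400}
```

A real ergonomic configuration would also account for attached disks and other settings; this sketch covers only the memory and CPU part.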

The systemd daemons are all written by Cloudurable, and the source code is available. These daemons run as small systemd services with little overhead. The standard configuration uses systemd, which comes with all recent versions of Linux (Ubuntu, CentOS, Debian, etc.). The Enterprise support level launches the Cloudurable Cassandra health check service (enterprise watchdog), which also has little overhead. And since we are using systemd, all of the services shut down nicely, giving Cassandra a chance to do a clean shutdown.

AWS Cassandra as a Service: Cassandra metrics and log aggregation system

To get the full advantage of the AWS ecosystem, you need to aggregate critical KPIs and logs into AWS CloudWatch. AWS allows you to set up custom dashboards for KPIs and logs. More importantly, you can set up alerts and triggers that invoke AWS Lambda functions to help remediate or diagnose issues. Unlike many other metrics collection systems, the metrics and logs are not passive. Logs and metrics sent to AWS CloudWatch become actionable and reactive. This is crucial for your Amazon Cassandra deployments.

Because CloudWatch runs inside the AWS environment, you get the full benefit of the AWS ecosystem. AWS provides tools to extract logs and metrics into Elasticsearch / Kibana (ELK) for additional analysis of problems or logs. AWS also lets you query logs in real time and set up rules and triggers for alerting.
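
As a rough illustration of what sending a Cassandra KPI to CloudWatch involves, the sketch below assembles the request for the CloudWatch PutMetricData API. It only builds the payload and makes no network call. The namespace, metric names, and instance ID are hypothetical, and this is not the Cloudurable KPI daemon itself.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit, instance_id, timestamp=None):
    """Build one entry for the MetricData list of a PutMetricData request."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Timestamp": timestamp or datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


def build_put_metric_request(namespace, datums):
    """Assemble the keyword arguments for a PutMetricData call."""
    return {"Namespace": namespace, "MetricData": list(datums)}


if __name__ == "__main__":
    instance_id = "i-0123456789abcdef0"  # hypothetical EC2 instance ID
    request = build_put_metric_request(
        "Example/Cassandra",  # hypothetical namespace
        [
            build_metric_datum("PendingCompactions", 12, "Count", instance_id),
            build_metric_datum("ReadLatencyP99", 4.2, "Milliseconds", instance_id),
        ],
    )
    # With boto3 installed and AWS credentials configured, this could be sent as:
    #   boto3.client("cloudwatch").put_metric_data(**request)
    print(request["Namespace"], len(request["MetricData"]))
```

Once metrics arrive in CloudWatch under a namespace like this, alarms and dashboards can be defined against them.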

Watchdogs for non-stop Cassandra

For rock solid reliability and resilience, Cloudurable™ Cassandra EC2 instances are started and monitored by the systemd watchdog service. The systemd watchdog ensures that the Cassandra service is running and if the Cassandra process stops, the systemd system will restart Cassandra.

The key point is that the watchdog automatically restarts Cassandra if the JVM ever exits or crashes. If you want to stop Cassandra, you need to tell systemd to stop the service (sudo systemctl stop cassandra). The watchdog is the one built into the OS: systemd provides full supervisor (software) watchdog support for individual system services. Since we configure Cassandra as a systemd service, we get this watchdog support.
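
The restart-on-exit behavior can be sketched in a few lines of Python. This only illustrates what a supervisor does; systemd provides it natively through unit settings such as Restart=, so nothing like this needs to be written in practice. The function name supervise and its parameters are hypothetical, and the restart count is bounded only so the example terminates.

```python
import subprocess
import sys
import time


def supervise(command, max_restarts=3, delay_seconds=0.0):
    """Run command and restart it each time it exits, up to max_restarts times.

    Returns the exit codes observed, one per run.
    """
    exit_codes = []
    while True:
        # subprocess.call blocks until the child process exits.
        exit_codes.append(subprocess.call(command))
        if len(exit_codes) > max_restarts:
            return exit_codes
        time.sleep(delay_seconds)  # pause before restarting


if __name__ == "__main__":
    # A stand-in process that always "crashes" with exit code 1.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(crashing, max_restarts=2))  # prints [1, 1, 1]
```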

Since we forward the journald logs (journald is part of systemd) to AWS CloudWatch, any restarts get recorded automatically, and we can create alerts or remediation actions that handle them. Knowing that your instances are restarting due to bugs, anomalies, or other problems is a critical KPI.

Watchdogs are essential for optimizing the uptime of a Cassandra cluster. To learn more about why watchdogs make Cassandra more resilient, read the detailed description that walks through some sample scenarios.

Why you need a watchdog

Cassandra is likely at the heart of your microservices' operational data storage needs. You might have a Cassandra Keyspace per microservice, and you might have five or six microservices talking to six Keyspaces.

  • What if those microservices start getting 2 or 3x more traffic than normal?
  • What if there is a denial of service attack?
  • What if a bug gets past the load tests and regression tests of a new version of a microservice?
  • Or worse, what if there are catastrophic anomalies?

Templates and scripts

We have Cassandra CloudFormation templates, packer provisioning, utilities, Ansible playbooks, and example scripts to:

  • create VPCs,
  • set up VPC sharing,
  • encrypt EBS volumes with KMS,
  • back up Cassandra to S3,
  • set up JBODs for higher read throughput,
  • update Cassandra / perform rolling upgrades
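
As one example of what such a script might do, here is a minimal sketch of the command sequence for backing up a keyspace to S3: take a snapshot with nodetool, sync the snapshot files with the AWS CLI, then clear the snapshot. The function only builds the commands and does not run them. The function name, bucket, node ID, tag, and S3 key layout are assumptions for illustration, not the Cloudurable scripts.

```python
def build_backup_commands(keyspace, tag, data_dir, bucket, node_id):
    """Return the commands for a snapshot-based backup of one keyspace.

    Assumes the usual on-disk layout where snapshots live under
    <data_dir>/<keyspace>/<table>/snapshots/<tag>/.
    """
    snapshot = ["nodetool", "snapshot", "-t", tag, keyspace]
    upload = [
        "aws", "s3", "sync",
        f"{data_dir}/{keyspace}",
        f"s3://{bucket}/{node_id}/{tag}/{keyspace}",
        # Exclude everything, then include only files of this snapshot.
        "--exclude", "*",
        "--include", f"*/snapshots/{tag}/*",
    ]
    cleanup = ["nodetool", "clearsnapshot", "-t", tag, keyspace]
    return [snapshot, upload, cleanup]


if __name__ == "__main__":
    commands = build_backup_commands(
        keyspace="orders",                   # hypothetical keyspace
        tag="nightly",                       # hypothetical snapshot tag
        data_dir="/var/lib/cassandra/data",  # default package install path
        bucket="example-backup-bucket",      # hypothetical bucket
        node_id="node-1",                    # hypothetical node identifier
    )
    for command in commands:
        print(" ".join(command))
```

Each command could then be executed in order with subprocess.run(command, check=True), stopping at the first failure so a snapshot is never cleared before it has been uploaded.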

Our team can help your team set up Cassandra directly or simply provide guidance. Then we can continue to support you through our subscription support plans.

We keep a knowledge base of best practices, pitfalls, and the needs and wants of various customers. We use it to help you solve your Cassandra / AWS problems, and we bake what we learn back into the product offering to simplify debugging and maintaining Cassandra clusters running on AWS/EC2.

Enterprise Level Support

The Enterprise Watchdog is responsible for collecting reports on the reasons for Cassandra restarts and providing crash data for later analysis. The Enterprise Watchdog works with the Cassandra KPI collection to allow graceful restarts if anomalies or health problems are detected in the JVM or with the Cassandra KPIs.

Essentially, if you have Enterprise Level Support, then in addition to the systemd watchdog, Cloudurable™ has a second health check watchdog that continually monitors the health of the Cassandra server via its KPIs (key performance indicators) and heartbeat responses, and restarts the Cassandra server if it becomes unresponsive.
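
The kind of decision a health check watchdog has to make can be sketched as follows. This is a simplified illustration, not the Enterprise Watchdog's logic: the sample fields, the thresholds, and the rule of requiring several consecutive unhealthy samples are all assumptions.

```python
from collections import namedtuple

# One health reading: fraction of JVM heap in use, and seconds since the
# server last answered a heartbeat. Both fields are hypothetical.
HealthSample = namedtuple(
    "HealthSample", ["heap_used_fraction", "seconds_since_heartbeat"]
)


def is_unhealthy(sample, heap_limit=0.95, heartbeat_timeout_s=30.0):
    """Return True if a single sample breaches either threshold."""
    return (
        sample.heap_used_fraction >= heap_limit
        or sample.seconds_since_heartbeat >= heartbeat_timeout_s
    )


def should_restart(samples, consecutive=3):
    """Return True when the last `consecutive` samples are all unhealthy.

    Requiring several bad samples in a row avoids restarting on one blip.
    """
    recent = samples[-consecutive:]
    if len(recent) < consecutive:
        return False
    return all(is_unhealthy(sample) for sample in recent)


if __name__ == "__main__":
    ok = HealthSample(0.60, 1.0)
    bad = HealthSample(0.97, 45.0)
    print(should_restart([ok, bad, bad]))       # prints False
    print(should_restart([ok, bad, bad, bad]))  # prints True
```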

We ensure you're using the right tools for Cassandra running in AWS to support all of your DevOps needs.

The enterprise watchdog system is responsible for monitoring the health of the Cassandra server running on the EC2 instance, restarting it if necessary, and reporting on the error conditions on any restart.

For example, the Cloudurable Cassandra health check service (Enterprise watchdog system) can also determine when Cassandra is about to crash, then do the following:

  • stop the Cassandra instance from taking additional requests,
  • perform a heap dump,
  • perform a thread dump of the Cassandra system,
  • upload these critical debugging / root-cause analysis data points to an S3 bucket for later analysis
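
The steps above can be sketched as a sequence of standard JDK, Cassandra, and AWS CLI commands. The sketch below only builds the commands and does not execute them, and it is not the Cloudurable health check service. The function name, file names, bucket, and key prefix are hypothetical, and nodetool disablebinary is used here as one possible way to stop accepting client requests.

```python
def build_diagnostic_steps(pid, dump_dir, bucket, prefix):
    """Return (argv, stdout_file) pairs for capturing crash diagnostics.

    stdout_file is the path the caller should write the command's standard
    output to, or None when the command writes its own file or needs none.
    """
    heap_file = f"{dump_dir}/heap-{pid}.hprof"
    thread_file = f"{dump_dir}/threads-{pid}.txt"
    return [
        # Stop accepting native-protocol (CQL) client connections.
        (["nodetool", "disablebinary"], None),
        # Heap dump in binary format; jmap writes the file itself.
        (["jmap", f"-dump:format=b,file={heap_file}", str(pid)], None),
        # Thread dump; jstack prints to standard output.
        (["jstack", str(pid)], thread_file),
        # Upload everything in the dump directory to S3.
        (["aws", "s3", "cp", dump_dir, f"s3://{bucket}/{prefix}/", "--recursive"], None),
    ]


if __name__ == "__main__":
    steps = build_diagnostic_steps(
        pid=4242,                     # hypothetical Cassandra JVM process ID
        dump_dir="/tmp/cassandra-diag",
        bucket="example-diag-bucket",
        prefix="node-1",
    )
    for argv, stdout_file in steps:
        suffix = f" > {stdout_file}" if stdout_file else ""
        print(" ".join(argv) + suffix)
```

A caller would run each step with subprocess.run, redirecting standard output to stdout_file where one is given.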

The ideas behind the enterprise health check system come from years of supporting distributed systems. The system is locking up: can you do a thread dump? The system is running out of memory: can you do a heap dump? KPIs and a record of what the system was doing (heap dump, thread dump) are essential for diagnosing issues. Having this information is the difference between guessing and truly analyzing and solving problems. If your software applications and services are critical, then you need this.
