Cassandra Cluster Tutorial: Setting up Ansible for our Cassandra Database Cluster to do DevOps tasks

Cassandra Tutorial: Setting up Ansible for our Cassandra Database Cluster for DevOps/DBA tasks

Ansible is a key DevOps/DBA tool for managing backups and rolling upgrades to the Cassandra cluster in AWS/EC2. Ansible uses ssh, so you do not have to install an agent to use it.

This article series focuses on DevOps/DBA tasks with the Cassandra Database. The use of Ansible for DevOps/DBA goes beyond the Cassandra Database. This article helps any DevOps/DBA or Developer that needs to manage groups of instances, boxes, or hosts. These can be on-prem bare-metal, dev boxes, or in the Cloud. You don’t need to be setting up Cassandra to benefit from this article.

Continue reading

Spark Tutorial: Introduction to BigData Analytics with Apache Spark Part 1

Introduction to BigData Analytics with Apache Spark Part 1

By Fadi Maalouli and R.H.

Spark Overview

Apache Spark, an open source cluster computing system, is growing fast. Apache Spark has a growing ecosystem of libraries and frameworks to enable advanced data analytics. Apache Spark’s rapid success is due to its power and ease of use. It is more productive and has a faster runtime than typical MapReduce-based BigData analytics. Apache Spark provides in-memory, distributed computing. It has APIs in Java, Scala, Python, and R. The Spark Ecosystem is shown below.

Continue reading

Spark Tutorial: Spark SQL from Java and Python with Cassandra

Analytics with Apache Spark Tutorial Part 2 : Spark SQL

Using Spark SQL from Python and Java

Combining Cassandra and Spark

By Fadi Maalouli and R.H.

Spark, a very powerful tool for real-time analytics, is very popular. In the first part of this series on Spark we introduced Spark. We covered Spark’s history, and explained RDDs (which are used to partition data in the Spark cluster). We also covered the Apache Spark Ecosystem.

Continue reading

Spark Tutorial: Spark Streaming with Kafka and MLlib

In this part of Spark’s tutorial (part 3), we will introduce two important components of Spark’s Ecosystem: Spark Streaming and MLlib.

Spark Streaming

By Fadi Maalouli and R.H.

Spark Streaming is a real-time processing tool that has a high-level API, is fault tolerant, and is easy to integrate with SQL DataFrames and GraphX.

At a high level, Spark Streaming works by running receivers that receive data from sources such as S3, Cassandra, and Kafka. It divides this data into blocks and pushes the blocks into Spark. Spark then works with these blocks of data as RDDs, and from there you get your results. The following diagram demonstrates the process:

Continue reading

Setting up a Cassandra cluster with SSL for client and cluster transports for DevOps

Setting up client and cluster SSL transport for a Cassandra cluster

This article is a Cassandra tutorial on setting up SSL for Cassandra and CQL clients, as well as installing Cassandra with SSL configured on a series of Linux servers.

Cassandra allows you to secure the client transport (CQL) as well as the cluster transport (storage transport).

SSL/TLS has some overhead. This is especially true in the JVM world, which is not as performant at handling SSL/TLS unless you are using Netty/OpenSSL integration.

Continue reading

Setting up a Cassandra cluster with cassandra image and cassandra cloud project with Vagrant for DevOps

The cassandra-image project creates CentOS Cassandra Database images for Docker, VirtualBox/Vagrant and AWS/EC2 using best practices for Cassandra OS setup. It is nice to use Vagrant and/or Docker for local development. At this time it is hard to develop systemd services using Docker, and since we do a lot of that, we use Vagrant.

Vagrant is important for developers and DevOps, not to mention Cassandra DBAs.

The cassandra-image project packages systemd utilities

Continue reading

Systemd dependencies example

We use systemd units quite a bit. Getting dependencies correct can be tricky. We use systemd to start up Cloudurable Cassandra config scripts. We use systemd to start up Cassandra/Kafka, and to shut Cassandra/Kafka down nicely.

Since systemd is pervasive in all new mainstream Linux distributions, you can see that systemd is an important concept for DevOps.

We wrote this little example to try to understand how systemd dependencies work, and explain it to others.

Continue reading

High-Speed Microservices

Microservices Architecture | High-Speed Microservices

This article endeavors to explain high-speed microservices architecture. If you are unfamiliar with the term microservices, you may want to first read this blog post on microservices by Michael Brunton and, if you have more time on your hands, this one by James Lewis and Martin Fowler.

High-speed microservices is a philosophy and set of patterns for building services that can readily back mobile and web applications at scale. It uses a scale-up-and-out model, rather than just a scale-out model, to do more with less hardware. A scale-up-and-out model uses in-memory operational data, efficient queue hand-off, and async calls to handle more calls on a single node.

Continue reading

Microservices Architecture

Microservices

The term “Microservices Architecture” is now a popular trend. Unlike many trends, this one seems to have some momentum and is more about how people are developing services than about vendors commandeering and needlessly complicating something simple. For example, SOA started off as a rather simple set of concepts and became something vast and complex. Services are excellent. Web Services are good. SOA has a bad reputation and is associated with being overly complicated (WSDL, BPEL, WS-blah, etc.). Microservices is not SOA. In fact, in many ways it is directly the opposite. For example, SOA often embraces WSDL, which is a very strongly typed and rigid way to define a service endpoint. WSDL and XML schema take all of the X out of XML.

Continue reading

Notes on Cassandra OS setup and optimizations for deploying in EC2/AWS

Disk concerns

These are important concepts for developers and DevOps who are responsible for developing Cassandra based applications and services.

Cassandra writes to four areas

  • commit logs
  • SSTable
  • an index file
  • a bloom filter

The compaction process of SSTable data makes heavy use of the disk. LeveledCompactionStrategy may need 10 to 20% overhead. SizeTieredCompactionStrategy needs, in the worst case, 50% overhead to perform compaction. Keep this in mind while sizing disks. If you have a high-update use case, LeveledCompactionStrategy is the best solution if you want to limit the total disk size used at any point in time and to optimize reads, as a row will be spread across fewer (up to ten times fewer) SSTables. The trade-off is that LeveledCompactionStrategy requires more IO and processing time for compactions. If in doubt, use LeveledCompactionStrategy.
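
The overhead figures above can be turned into a rough disk-sizing calculation. The sketch below is illustrative only: the helper name `min_disk_gb` and the 500 GB data size are made up for the example, and it assumes that "overhead" means the fraction of the disk that must stay free for compaction, which is one reading of the numbers rather than a Cassandra-defined formula.

```python
# Rule-of-thumb free-space fractions taken from the text above.
# ASSUMPTION: "overhead" is read as the share of the disk kept free for compaction.
COMPACTION_FREE_FRACTION = {
    "LeveledCompactionStrategy": 0.20,     # upper end of the 10 to 20% range
    "SizeTieredCompactionStrategy": 0.50,  # worst case
}


def min_disk_gb(data_gb, strategy):
    """Hypothetical helper: smallest disk that holds data_gb of SSTable data
    while leaving the strategy's free-space fraction available."""
    free_fraction = COMPACTION_FREE_FRACTION[strategy]
    return data_gb / (1.0 - free_fraction)


if __name__ == "__main__":
    # 500 GB of data is an arbitrary example figure.
    for strategy in COMPACTION_FREE_FRACTION:
        print(f"{strategy}: {min_disk_gb(500, strategy):.0f} GB")
```

Under these assumptions the same data set needs a noticeably larger disk with SizeTieredCompactionStrategy than with LeveledCompactionStrategy, which is the sizing point the paragraph makes.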

Continue reading

AWS VPC

Understanding what AWS provides for setting up private networks, security groups and more is important for anyone who calls themselves DevOps.

AWS allows you to define a software defined network. You do this with Amazon Virtual Private Cloud (Amazon VPC). You can define subnets, ingress rules, security groups, NAT gateways, Internet gateways, and more.

Amazon VPC

A VPC is a virtual private cloud. You can create multiple Amazon VPCs within a region that spans multiple availability zones. A VPC is an isolated area to deploy instances.

Continue reading

Backup/Recovery with EBS

Understanding what AWS provides for backing up EBS volumes is an important concept for DevOps.

Data safety with EBS - Backup/Recovery (Snapshots)

Amazon EBS allows you to easily back up data. You do this by taking snapshots. Snapshots are point-in-time backups. You can periodically create a snapshot of the data written to an EBS volume. Snapshots provide incremental backups of your data: only the blocks that have changed since the last snapshot are saved in the new snapshot.
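
The incremental behavior can be pictured with a toy model. This is not the EBS API: the volume is modeled as a plain dictionary of block number to contents, and `take_snapshot` is a hypothetical function written only to show the "changed blocks only" idea.

```python
def take_snapshot(volume, last_state):
    """Toy model of an incremental snapshot (not the real EBS API).

    Returns the blocks that differ from the state captured by the previous
    snapshot, plus the new state to compare against next time.
    """
    changed = {
        block: data
        for block, data in volume.items()
        if last_state.get(block) != data
    }
    return changed, dict(volume)


# A hypothetical three-block volume.
volume = {0: b"commitlog", 1: b"sstable-1", 2: b"index"}

# First snapshot: nothing was saved before, so every block is stored.
snap1, state = take_snapshot(volume, {})

# One block changes between snapshots.
volume[1] = b"sstable-2"

# Second snapshot: only the changed block is stored.
snap2, state = take_snapshot(volume, state)

print(sorted(snap1))  # [0, 1, 2]
print(sorted(snap2))  # [1]
```

The first snapshot holds all three blocks, while the second holds only the one block that was rewritten.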

Continue reading

EC2 Compute

Understanding what AWS/EC2 provides for provisioning on-demand computing is essential for all DevOps.

Amazon Elastic Compute Cloud (Amazon EC2)

Amazon EC2 is AWS’s primary web service for providing resizable compute capacity in the cloud.

EC2 Compute

Compute is the computational power needed for your use case. Amazon EC2 allows you to add compute resources through its Web Service API. EC2 allows you to launch instances. An instance is a server, and you can install whatever software you need for your service or web application: NGINX, Apache httpd, Cassandra, Kafka, etc. When you launch a virtual server (an instance, in EC2 speak), you can use it just as you would a server in your datacenter. You pay for the compute power that you use. There are different instance types with various ranges of CPU, RAM, IO, and networking power. You pay for compute resources by the hour. You can use more instances, and you can reserve instances for longer periods of time for a price break.

Continue reading

Learn about Kafka Architecture

Learning about the Kafka Streaming Platform

Slideshare Kafka Architecture. PDF: Introduction to Kafka Architecture.

Kafka Training

Training for DevOps, Architects and Developers

This Kafka training course teaches the basics of the Apache Kafka distributed streaming platform. The Apache Kafka distributed streaming platform is one of the most powerful and widely used reliable streaming platforms. Kafka is fault tolerant, highly scalable, and used for log aggregation, stream processing, event sources and commit logs. Kafka is used by LinkedIn, Yahoo, Twitter, Square, Uber, Box, PayPal, Etsy and more to enable stream processing and online messaging, to facilitate in-memory computing by providing a distributed commit log, to collect data for big data, and so much more.

Continue reading

Most bang for your buck with AWS Elastic Block Store (EBS)

Getting the most bang for your buck with AWS Elastic Block Store (EBS)

Understanding what AWS/EC2 provides for provisioning on-demand storage is critical for DevOps. Companies waste a lot of money by over-provisioning AWS.

Amazon Elastic Block Store

Amazon Web Services (AWS) provides Amazon Elastic Block Store (Amazon EBS) for EC2 instance storage. EBS provides the virtual hard drives and SSDs for your servers running in the cloud. Amazon EBS volumes are automatically replicated, and it is easy to take snapshots of volumes to back them up in a known state. The replication happens within an availability zone (AZ).

Continue reading

Learn about Kafka Architecture with Java examples

Learning about the Kafka Streaming Platform with simple Java examples

This picks up where the last blog post left off. We added a multi-server version of the Kafka setup. Then we wrote a simple sample Java Kafka producer and a Java Kafka consumer.

Slideshare Kafka Architecture. PDF: Introduction to Kafka Architecture.

Continue reading

Reactive Microservices Architecture

Many disciplines of software development came to the same conclusion. They are building systems that react to modern demands on services. Reactive services live up to the Reactive Manifesto. Reactive microservices are built to be robust, resilient, flexible and written with modern hardware, virtualization, rich web clients and mobile clients in mind.

By the original definition of microservices, all microservices are reactive. A microservice that is not reactive is akin to a bird without wings or a fish that can’t swim.

Continue reading

                                                                           
