
What is Apache Kafka?

What is Kafka? Kafka's growth is exploding: more than one third of all Fortune 500 companies use Kafka. These companies include the top ten travel companies, 7 of the top ten banks, 8 of the top ten insurance companies, 9 of the top ten telecom companies, and many more. LinkedIn, Microsoft and Netflix process four-comma messages a day with Kafka (1,000,000,000,000). Kafka is used for real-time streams of data, to collect big data, to do real-time analysis, or both.

Continue reading

Kafka, Avro Serialization and the Schema Registry

Kafka Tutorial: Kafka, Avro Serialization and the Schema Registry. Confluent Schema Registry stores Avro schemas for Kafka producers and consumers. The Schema Registry provides a RESTful interface for managing Avro schemas, and it stores a versioned history of schemas. The Confluent Schema Registry also supports checking schema compatibility for Kafka. Cloudurable provides Kafka training, Kafka consulting, Kafka support and helps with setting up Kafka clusters in AWS.
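To make "checking schema compatibility" concrete, here is a minimal sketch of one backward-compatibility rule: a new schema version can still read old data only if every field it adds carries a default. This is a deliberate simplification and not the Schema Registry's actual algorithm (it ignores type changes, aliases and unions). The Employee schema and the function names are hypothetical.

```python
# Toy backward-compatibility check for Avro record schemas.
# Assumption: only the "added fields need a default" rule is modeled here.

def added_fields_without_default(old_schema: dict, new_schema: dict) -> list:
    """Return names of fields present only in new_schema that lack a default."""
    old_names = {field["name"] for field in old_schema["fields"]}
    return [
        field["name"]
        for field in new_schema["fields"]
        if field["name"] not in old_names and "default" not in field
    ]


def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """True if a reader using new_schema could fill in every added field."""
    return not added_fields_without_default(old_schema, new_schema)


# Hypothetical schema versions, written as the JSON structure Avro uses.
v1 = {
    "type": "record",
    "name": "Employee",
    "fields": [{"name": "firstName", "type": "string"}],
}
v2_with_default = {
    "type": "record",
    "name": "Employee",
    "fields": v1["fields"] + [{"name": "age", "type": "int", "default": -1}],
}
v2_without_default = {
    "type": "record",
    "name": "Employee",
    "fields": v1["fields"] + [{"name": "age", "type": "int"}],
}

print(is_backward_compatible(v1, v2_with_default))     # True
print(is_backward_compatible(v1, v2_without_default))  # False
```

In practice the registry performs this kind of check on the server side when a new schema version is registered, so producers and consumers do not need to implement it themselves.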

Continue reading

Understanding Apache Avro: Avro Introduction for Big Data and Data Streaming Architectures

Apache Avro™ is a data serialization system. Avro provides data structures, a binary data format, a container file format to store persistent data, and RPC capabilities. Avro does not require code generation to use and integrates well with JavaScript, Python, Ruby, C, C#, C++ and Java. Avro is used in the Hadoop ecosystem as well as by Kafka. Avro is similar to Thrift, Protocol Buffers, JSON, etc.
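Avro schemas are themselves written in JSON, which is part of why no generated classes are needed: a record can be handled as a generic structure and checked against the schema at runtime. The sketch below uses only the Python standard library, not an Avro library, and handles just three primitive types. The Employee schema, the com.example namespace and the conforms helper are hypothetical.

```python
import json

# A hypothetical Avro record schema, expressed in JSON as Avro schemas are.
EMPLOYEE_SCHEMA_JSON = """
{
  "type": "record",
  "name": "Employee",
  "namespace": "com.example",
  "fields": [
    {"name": "firstName", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "active", "type": "boolean", "default": true}
  ]
}
"""

# Toy mapping from a few Avro primitive type names to Python types.
PYTHON_TYPES = {"string": str, "int": int, "boolean": bool}


def conforms(record: dict, schema: dict) -> bool:
    """Check a plain dict against the schema's fields (primitive types only)."""
    for field in schema["fields"]:
        name = field["name"]
        if name not in record:
            if "default" in field:
                continue  # a missing field is fine when the schema has a default
            return False
        value = record[name]
        expected = PYTHON_TYPES[field["type"]]
        # bool is a subclass of int in Python, so reject it for "int" fields
        if expected is int and isinstance(value, bool):
            return False
        if not isinstance(value, expected):
            return False
    return True


schema = json.loads(EMPLOYEE_SCHEMA_JSON)
print(conforms({"firstName": "Ada", "age": 36}, schema))    # True
print(conforms({"firstName": "Ada", "age": "36"}, schema))  # False
```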

Continue reading

Kinesis vs. Kafka

Kinesis works with streaming data, such as stock prices, game data (scores from a game), social network data, geospatial data (like Uber data about where you are), and IoT sensors. Kafka works with streaming data too. Kinesis Streams is like Kafka Core. Kinesis Analytics is like Kafka Streams. A Kinesis Shard is like a Kafka Partition. They are similar and get used in similar use cases. Data is stored in Kinesis for 24 hours by default, and you can increase that up to 7 days.
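The shard and partition analogy can be illustrated with key-based routing. Kinesis hashes each record's partition key with MD5 into a 128-bit integer and sends the record to the shard whose hash key range contains that value. Kafka's default partitioner does something comparable for keyed records, using a murmur2 hash of the key modulo the partition count. The sketch below models only the Kinesis side and assumes the shards split the hash space evenly, which is a simplification: real ranges come from the stream's shard list. The function name is hypothetical.

```python
import hashlib


def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")  # 128-bit unsigned integer
    range_size = (2 ** 128) // num_shards
    # Clamp so the top of the hash space lands in the last shard
    # when 2**128 is not evenly divisible by num_shards.
    return min(hash_value // range_size, num_shards - 1)


# Records with the same key always land on the same shard,
# which is what preserves per-key ordering.
for key in ["user-1", "user-2", "user-1"]:
    print(key, "->", shard_for_key(key, 4))
```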

Continue reading

Kafka Broker Startup Scripts

Running a Kafka Broker

Starting brokers in Kafka is pretty straightforward; here are some simple quick start instructions. But as developers, we want to do at least a little more than just the basics. For instance, my first needs were to start multiple brokers on the same machine, and also to enable JMX. Out of the box, you can simply rely on the supplied [...] Each broker needs a unique id and a unique port.
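The "unique id and unique port" requirement can be sketched as a small generator for per-broker property fragments. The keys broker.id, listeners and log.dirs are standard Kafka broker settings, but the port scheme, the log directory layout and the function name are assumptions made for illustration, and a real server.properties needs more than these three lines. JMX is typically enabled separately by setting the JMX_PORT environment variable before launching each broker, again with a distinct value per broker.

```python
def broker_properties(broker_id: int,
                      base_port: int = 9092,
                      log_root: str = "/tmp/kafka-logs") -> str:
    """Build a properties fragment with a unique id, port and log dir."""
    port = base_port + broker_id  # assumed scheme: 9092, 9093, 9094, ...
    lines = [
        f"broker.id={broker_id}",
        f"listeners=PLAINTEXT://:{port}",
        f"log.dirs={log_root}-{broker_id}",  # brokers must not share a log dir
    ]
    return "\n".join(lines) + "\n"


# Print fragments for three brokers that could run on one machine.
for broker_id in range(3):
    print(f"# server-{broker_id}.properties (hypothetical file name)")
    print(broker_properties(broker_id))
```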

Continue reading

Kafka Tutorial with Examples

Kafka Tutorial for the Kafka streaming platform. It covers Kafka architecture with some small examples from the command line. Then we expand on this with a multi-server example. Lastly, we added some simple Java client examples for a Kafka Producer and a Kafka Consumer. We have started to expand on the Java examples to correlate with the design discussion of Kafka. We have also expanded on the Kafka design section and added references.

Continue reading

AWS Cassandra Cluster Tutorial 5: Setting up Cassandra Cluster in AWS/EC2

Cassandra Cluster Tutorial 5 - Cassandra AWS Cluster with CloudFormation, bastion host, Ansible and the AWS command line. This Cassandra tutorial is useful for developers and DevOps/DBA staff who want to launch a Cassandra cluster in AWS. The cassandra-image project has been using Vagrant and Ansible to set up a Cassandra cluster for local testing. Then we used Packer, Ansible and EC2. We used Packer to create AWS images in the last tutorial.

Continue reading

Configuring metricsd to setup a disk alarm

What is MetricsD? Metricsd is a golang program that gathers metrics from an AWS EC2 instance (node) and reports these metrics to places such as AWS / CloudWatch. Metrics collected include disk space, CPU activity, memory allocation, and Cassandra KPIs. MetricsD is most often run as a systemd process. The Disk Gatherer reports disk state information to AWS / CloudWatch, sets alarms in AWS / CloudWatch, or sends emails.
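The idea behind a disk alarm can be sketched in a few lines: measure how full a volume is and compare that against a threshold. This is only an illustration of the concept, not metricsd's implementation (metricsd is written in Go and reports to CloudWatch). The 90 percent threshold and the function names are assumptions.

```python
import shutil


def used_percent(total_bytes: int, used_bytes: int) -> float:
    """Percentage of the volume that is in use."""
    return 100.0 * used_bytes / total_bytes


def should_alarm(total_bytes: int, used_bytes: int,
                 threshold_percent: float = 90.0) -> bool:
    """True when usage is at or above the (assumed) alarm threshold."""
    return used_percent(total_bytes, used_bytes) >= threshold_percent


# Check the root volume of the machine this runs on.
usage = shutil.disk_usage("/")
print(f"used: {used_percent(usage.total, usage.used):.1f}%")
print("alarm:", should_alarm(usage.total, usage.used))
```

A real gatherer would run a check like this on a schedule and publish the value as a metric, leaving the alarm decision to the monitoring service.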

Continue reading

Cassandra AWS System Memory Guidelines

System Memory Guidelines for Cassandra AWS. Basic guidelines for AWS Cassandra: do not use less than 8GB of memory for the JVM; the more RAM the better; use G1GC. SSTables are first stored in memory and then written to disk sequentially. The larger the SSTable, the less scanning needs to be done while reading and determining if a key is in an SSTable using a bloom filter. In the EC2 world this equates to an m4.

Continue reading

AWS Cassandra: Cassandra, NUMA and EC2

AWS Cassandra and NUMA. The i3.8xlarge, c4.8xlarge, m4.10xlarge, and above EC2 instance types use more than 1 CPU, which means NUMA controls are available. A good read on this is Al Tobey's blog post on Cassandra tuning: "The quickest way to tell if a machine is NUMA is to run 'numactl --hardware'." NUMA stands for Non-Uniform Memory Access. Modern x86 CPUs contain an integrated memory controller.

Continue reading

