Kafka Training

Kafka Tutorial: Creating Advanced Kafka Producers in Java

Kafka Tutorial 13: Creating Advanced Kafka Producers in Java

In this tutorial, you are going to create advanced Kafka Producers.

Before you start

The prerequisites to this tutorial are Kafka Tutorial Part 11: Writing a Kafka Producer example in Java and Kafka Tutorial Part 12: Writing a Kafka Consumer example in Java.

This tutorial picks up right where those two tutorials left off. In the last two tutorials, we created simple Java examples of a Kafka producer and a Kafka consumer.
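
As a preview of what "advanced" means here, below is a minimal sketch of producer settings tuned for durability and batching. It is illustrative only; the bootstrap address and the specific values are assumptions, not taken from the tutorial.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // illustrative address
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ACKS_CONFIG, "all");        // wait for all in-sync replicas to acknowledge
props.put(ProducerConfig.RETRIES_CONFIG, 3);         // retry transient send failures
props.put(ProducerConfig.LINGER_MS_CONFIG, 5);       // wait a few ms so sends can batch
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);  // maximum batch size in bytes

KafkaProducer<String, String> producer = new KafkaProducer<>(props);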

Continue reading

Kafka Tutorial: Creating Advanced Kafka Consumers in Java

Kafka Tutorial 14: Creating Advanced Kafka Consumers in Java

In this tutorial, you are going to create advanced Kafka Consumers.

UNDER CONSTRUCTION.

Before you start

The prerequisites to this tutorial are the earlier tutorials in this series.

This tutorial picks up right where Kafka Tutorial 13: Creating Advanced Kafka Producers in Java left off. In the last tutorial, we created advanced Java producers; now we will do the same with consumers.
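
As a preview, one advanced consumer theme is taking control of offset commits instead of relying on auto-commit. A minimal sketch, assuming the my-example-topic topic from the earlier tutorials and a hypothetical process() method:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");  // we commit offsets ourselves

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-example-topic"));
ConsumerRecords<String, String> records = consumer.poll(1000);  // poll(long) in Kafka 0.10.x
for (ConsumerRecord<String, String> record : records) {
    process(record);  // hypothetical application logic
}
consumer.commitSync();  // commit only after the records are processed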

Continue reading

Kafka Architecture: Log Compaction


This post picks up from our series on Kafka architecture, which includes Kafka topics architecture, Kafka producer architecture, Kafka consumer architecture and Kafka ecosystem architecture.

This article is heavily inspired by the section of the Kafka documentation on log compaction design. You can think of it as the CliffsNotes on Kafka's log compaction design.

Kafka can delete older records based on the age or size of a log. Kafka also supports log compaction, which compacts records by key. Log compaction means that Kafka will keep the latest version of a record for each key and delete the older versions during compaction.
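
Compaction is enabled per topic via the cleanup.policy topic config. Below is a minimal sketch that creates a compacted topic with the Java AdminClient (available in Kafka 0.11 and later); the topic name, partition count, and replication factor are illustrative, and error handling is omitted.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

try (AdminClient admin = AdminClient.create(props)) {
    // keep only the latest record per key on this topic
    NewTopic compacted = new NewTopic("latest-user-state", 3, (short) 1)
            .configs(Collections.singletonMap(
                    TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT));
    admin.createTopics(Collections.singleton(compacted)).all().get();
}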

Continue reading

Kafka Architecture: Low Level

If you are not sure what Kafka is, see What is Kafka?.

Kafka Architecture: Low-Level Design

This post picks up from our series on Kafka architecture, which includes Kafka topics architecture, Kafka producer architecture, Kafka consumer architecture and Kafka ecosystem architecture.

This article is heavily inspired by the design section of the Kafka documentation. You can think of it as the CliffsNotes.


Kafka Design Motivation

LinkedIn engineering built Kafka to support real-time analytics. Kafka was designed to feed analytics systems that did real-time processing of streams. LinkedIn developed Kafka as a unified platform for real-time handling of streaming data feeds. The goal behind Kafka was to build a high-throughput streaming data platform that supports high-volume event streams like log aggregation and user activity.

Continue reading

Kafka Tutorial: Creating a Kafka Consumer in Java

Kafka Tutorial: Writing a Kafka Consumer in Java

In this tutorial, you are going to create a simple Kafka Consumer. This consumer consumes messages from the Kafka Producer you wrote in the last tutorial. This tutorial demonstrates how to process records from a Kafka topic with a Kafka Consumer.

This tutorial describes how Kafka Consumers in the same group divide up and share partitions while each consumer group appears to get its own copy of the same data.
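
The heart of that consumer is a subscribe-and-poll loop. Here is a condensed sketch; the group id, bootstrap address, and deserializer choices are illustrative, and the topic name comes from the producer tutorial:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-example-group");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-example-topic"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(1000);  // poll(long) in Kafka 0.10.x
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset=%d key=%s value=%s%n",
                record.offset(), record.key(), record.value());
    }
}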

Continue reading

Kafka Tutorial: Creating a Kafka Producer in Java

Kafka Tutorial: Writing a Kafka Producer in Java

In this tutorial, we are going to create a simple Java example that creates a Kafka producer. You create a new replicated Kafka topic called my-example-topic, then you create a Kafka producer that uses this topic to send records. You will send records with the Kafka producer, first synchronously and later asynchronously.
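
In a nutshell, the difference between the two send styles looks like this (a sketch; producer construction is omitted, and the key and value are illustrative):

import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// synchronous: block until the broker acknowledges the record
RecordMetadata metadata =
        producer.send(new ProducerRecord<>("my-example-topic", "key", "value")).get();

// asynchronous: return immediately and handle the result in a callback
producer.send(new ProducerRecord<>("my-example-topic", "key", "value"),
        (recordMetadata, exception) -> {
            if (exception != null) exception.printStackTrace();
        });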

Before you start

Prerequisites to this tutorial are Kafka from the command line and Kafka clustering and failover basics.

Continue reading

Kafka Tutorial: Kafka clusters, Kafka consumer failover, and Kafka broker failover

If you are not sure what Kafka is, start here “What is Kafka?”.

Getting started with Kafka cluster tutorial

Understanding Kafka Failover

This Kafka tutorial picks up right where the first Kafka tutorial from the command line left off. The first tutorial has instructions on how to run ZooKeeper and use Kafka utils.

In this tutorial, we are going to run many Kafka nodes on our development laptop, so you will need at least 16 GB of RAM on your local dev machine. You can run just two servers if you have less than 16 GB of memory. We are going to create a replicated topic. We then demonstrate consumer failover and broker failover. We also demonstrate load balancing Kafka consumers. We show how, with many groups, Kafka acts like a publish/subscribe system. But when we put all of our consumers in the same group, Kafka will share the message load across the consumers in that group (more like a queue than a topic in a traditional MOM sense).
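
For reference, this is the shape of a replicated topic; a sketch using the Java AdminClient (the tutorial itself uses the kafka-topics command-line tool, and the topic name and counts here are illustrative):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

try (AdminClient admin = AdminClient.create(props)) {
    // 13 partitions, replication factor 3: every partition has copies on three brokers,
    // so the topic keeps serving records when a broker fails
    NewTopic replicated = new NewTopic("my-failsafe-topic", 13, (short) 3);
    admin.createTopics(Collections.singleton(replicated)).all().get();
}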

Continue reading

Kafka Tutorial: Using Kafka from the command line

If you are not sure what Kafka is, start here “What is Kafka?”.

Getting started with Kafka tutorial

Let’s show a simple example using producers and consumers from the Kafka command line.

Download Kafka 0.10.2.x from the Kafka download page. Later versions will likely work, but this example was done with 0.10.2.x.

We assume that you have Java SDK 1.8.x installed.

We unzipped the Kafka download and put it in ~/kafka-training/, and then renamed the Kafka install folder to kafka. Please do the same.

Continue reading

Kafka Architecture: Consumers

Kafka Consumer Architecture - Consumer Groups and subscriptions

This article covers some lower level details of Kafka consumer architecture. It is a continuation of the Kafka Architecture, Kafka Topic Architecture, and Kafka Producer Architecture articles.

This article covers Kafka Consumer Architecture with a discussion of consumer groups, how record processing is shared among a consumer group, and failover for Kafka consumers.

Cloudurable provides Kafka training, Kafka consulting, Kafka support and helps set up Kafka clusters in AWS.

Continue reading

Kafka Architecture: Producers

Kafka Producer Architecture - Picking the partition of records

This article covers some lower level details of Kafka producer architecture. It is a continuation of the Kafka Architecture and Kafka Topic Architecture articles.

This article covers Kafka Producer Architecture with a discussion of how a partition is chosen, producer cadence, and partitioning strategies.

Kafka Producers

Kafka producers send records to topics. The records are sometimes referred to as messages.
The producer picks which partition to send a record to per topic. The producer can send records round-robin, or it could implement a priority system that sends records to certain partitions based on the priority of the record.
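
In code, the partition choice comes down to how the ProducerRecord is constructed; a small sketch (the topic, keys, and payloads are illustrative):

import org.apache.kafka.clients.producer.ProducerRecord;

// no key: the default partitioner spreads records across partitions
new ProducerRecord<String, String>("orders", null, "order-payload");

// with a key: records with the same key always land on the same partition
new ProducerRecord<String, String>("orders", "customer-42", "order-payload");

// explicit partition: e.g., reserve partition 0 for high-priority records
new ProducerRecord<String, String>("orders", 0, "customer-42", "urgent-order-payload");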

Continue reading

Kafka Topic Architecture

Kafka Topic Architecture - Replication, Failover and Parallel Processing

This article covers some lower level details of Kafka topic architecture. It is a continuation of the Kafka Architecture article.

This article covers Kafka Topic Architecture with a discussion of how partitions are used for failover and parallel processing.


Kafka Topics, Logs, Partitions

Recall that a Kafka topic is a named stream of records. Kafka stores topics in logs. A topic log is broken up into partitions. Kafka spreads a log's partitions across multiple servers or disks. Think of a topic as a category, stream name or feed.

Continue reading

Kafka vs. JMS

Kafka vs JMS, SQS, RabbitMQ Messaging

Is Kafka a queue or a publish and subscribe system? Yes. It can be both.

Kafka is like a queue for consumer groups, which we cover later. Basically, Kafka is a queue system per consumer group so it can do load balancing like JMS, RabbitMQ, etc.

Kafka is like topics in JMS, RabbitMQ, and other MOM systems for multiple consumer groups. Kafka has topics, producers publish to the topics, and the subscribers (consumer groups) read from the topics.
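
Which behavior you get is controlled by a single consumer setting, group.id; a sketch (the group names are illustrative):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

Properties props = new Properties();
// same group.id on every consumer: partitions are divided among them (queue-like)
props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors");

// a different group.id per application: each group gets its own full copy of the
// stream (topic-like, publish/subscribe), e.g. "order-auditors" for a second app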

Continue reading

Kafka Architecture

If you are not sure what Kafka is, see What is Kafka?.


Kafka consists of Records, Topics, Consumers, Producers, Brokers, Logs, Partitions, and Clusters. Records can have a key (optional), a value and a timestamp. Kafka Records are immutable. A Kafka Topic is a stream of records ("/orders", "/user-signups"). You can think of a Topic as a feed name. A Topic has a Log, which is the Topic's storage on disk. A Topic Log is broken up into partitions and segments. The Kafka Producer API is used to produce streams of data records. The Kafka Consumer API is used to consume a stream of records from Kafka. A Broker is a Kafka server that runs in a Kafka Cluster. Kafka Brokers form a cluster. The Kafka Cluster consists of many Kafka Brokers on many servers. The term Broker sometimes refers to more of a logical system, or to Kafka as a whole.

Continue reading

The Kafka Ecosystem - Kafka Core, Kafka Streams, Kafka Connect, Kafka REST Proxy, and the Schema Registry


The core of Kafka is the brokers, topics, logs, partitions, and cluster. The core also consists of related tools like MirrorMaker. That is Kafka as it exists in Apache.

The Kafka ecosystem consists of Kafka Core, Kafka Streams, Kafka Connect, Kafka REST Proxy, and the Schema Registry. Most of the additional pieces of the Kafka ecosystem come from Confluent and are not part of Apache.

Continue reading

What is Apache Kafka?

What is Kafka?

Kafka's growth is exploding: more than one third of all Fortune 500 companies use Kafka. These companies include the top ten travel companies, seven of the top ten banks, eight of the top ten insurance companies, nine of the top ten telecom companies, and more. LinkedIn, Microsoft and Netflix process four-comma messages a day with Kafka (more than 1,000,000,000,000). Kafka is used for real-time streams of data, to collect big data, or to do real-time analysis (or both). Kafka is used with in-memory microservices to provide durability, and it can be used to feed events to CEP (complex event processing) systems and IoT/IFTTT-style automation systems.

Continue reading

Kafka, Avro Serialization and the Schema Registry

Kafka Tutorial: Kafka, Avro Serialization and the Schema Registry

Confluent Schema Registry stores Avro schemas for Kafka producers and consumers. The Schema Registry provides a RESTful interface for managing Avro schemas and allows the storage of a versioned history of schemas. The Confluent Schema Registry also supports checking schema compatibility for Kafka.
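
To use the registry from a producer, you point the Confluent Avro serializer at it. A minimal sketch; the registry URL is illustrative, and the serializer class comes from Confluent's kafka-avro-serializer library, not from Apache Kafka:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
        "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
        "io.confluent.kafka.serializers.KafkaAvroSerializer");
// where the serializer registers and fetches Avro schemas
props.put("schema.registry.url", "http://localhost:8081");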

Cloudurable provides Kafka training, Kafka consulting, Kafka support and helps set up Kafka clusters in AWS.

Continue reading

Understanding Apache Avro: Avro Introduction for Big Data and Data Streaming Architectures

Avro Introduction for Big Data and Data Streaming Architectures

Apache Avro™ is a data serialization system. Avro provides data structures, a binary data format, a container file format to store persistent data, and RPC capabilities. Avro does not require code generation to use and integrates well with JavaScript, Python, Ruby, C, C#, C++ and Java. Avro gets used in the Hadoop ecosystem as well as by Kafka.

Avro is similar to Thrift, Protocol Buffers, JSON, etc. Because Avro stores names and types in the schema, it needs less encoding as part of the data, which reduces duplication. Avro supports the evolution of schemas.
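
Because the schema carries the field names and types, you can build records generically with no code generation. A minimal sketch using Avro's Java API; the Employee schema is a made-up example:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

String schemaJson = "{\"type\":\"record\",\"name\":\"Employee\","
        + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"age\",\"type\":\"int\"}]}";

Schema schema = new Schema.Parser().parse(schemaJson);    // names and types live in the schema
GenericRecord employee = new GenericData.Record(schema);  // no generated classes needed
employee.put("name", "Ada");
employee.put("age", 37);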

Continue reading

Kafka Broker Startup Scripts

Running a Kafka Broker

Starting brokers in Kafka is pretty straightforward; here are some simple quick-start instructions. But as developers, we want to do at least a little more than just the basics. For instance, my first needs were to start multiple brokers on the same machine and to enable JMX.

Out of the box, you can simply rely on the supplied server.properties. Each broker needs a unique id and a unique port. These are the corresponding properties from server.properties:
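
A sketch of those two settings (the values are illustrative; newer Kafka versions set the port via the listeners property instead of port):

# must be unique for each broker in the cluster
broker.id=0
# each broker on the same machine needs its own port
port=9092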

Continue reading

Kafka Tutorial with Examples

Kafka Tutorial

Kafka Tutorial for the Kafka streaming platform. It covers Kafka architecture with some small examples from the command line. Then we expand on this with a multi-server example. Lastly, we added some simple Java client examples for a Kafka Producer and a Kafka Consumer. We have started to expand the Java examples to correlate with the design discussion of Kafka. We have also expanded the Kafka design section and added references.

Continue reading

Learn about Kafka Architecture

Learning about the Kafka Streaming Platform

Slideshare Kafka Architecture. PDF: Introduction to Kafka Architecture.

Kafka Training

Training for DevOps, Architects and Developers

This Kafka training course teaches the basics of the Apache Kafka distributed streaming platform. Apache Kafka is one of the most powerful and widely used reliable streaming platforms. Kafka is fault-tolerant and highly scalable, and it is used for log aggregation, stream processing, event sourcing and commit logs. Kafka is used by LinkedIn, Yahoo, Twitter, Square, Uber, Box, PayPal, Etsy and more to enable stream processing and online messaging, to facilitate in-memory computing by providing a distributed commit log, to collect data for big data, and so much more.

Continue reading

