Kafka Training Detailed Course Outline
This Kafka training course teaches the basics of the Apache Kafka distributed streaming platform, one of the most powerful and widely used streaming platforms. Kafka is fault tolerant and highly scalable, and is used for log aggregation, stream processing, event sourcing, and commit logs. LinkedIn, Yahoo, Twitter, Square, Uber, Box, PayPal, Etsy, and many others use Kafka for stream processing, online messaging, in-memory computing backed by a distributed commit log, big data collection, and much more.
Course Outline - Kafka Training
Session 1: Kafka Introduction
- Architecture
- Overview of key concepts
- Overview of ZooKeeper
- Cluster, Nodes, Kafka Brokers
- Consumers, Producers, Logs, Partitions, Records, Keys
- Partitions for write throughput
- Partitions for Consumer parallelism (multi-threaded consumers)
- Replicas, Followers, Leaders
- How to scale writes
- Disaster recovery
- Performance profile of Kafka
- Consumer Groups, the “High Water Mark”, and what consumers can see
- Consumer load balancing and fail-over
- Working with Partitions for parallel processing and resiliency
- Brief Overview of Kafka Streams, Kafka Connectors, Kafka REST
Lab: Kafka Setup, single node, single ZooKeeper
- Create a topic
- Produce and consume messages from the command line
Lab: Set up Kafka multi-broker cluster
- Configure and set up three servers
- Create a topic with replication and partitions (see the AdminClient sketch after this lab)
- Produce and consume messages from the command line
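The labs above use the command-line tools. As a point of reference, roughly the same topic creation can be done programmatically; below is a minimal, illustrative Java sketch using the Kafka AdminClient, where the broker address, topic name, partition count, and replication factor are all placeholder values.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopicExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address

            try (AdminClient admin = AdminClient.create(props)) {
                // Hypothetical topic: 3 partitions, replication factor 1 (enough for a single-broker lab).
                NewTopic topic = new NewTopic("my-topic", 3, (short) 1);
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }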
Session 2: Writing Kafka Producers (Basics)
- Introduction to Producer Java API and basic configuration
Lab: Write Kafka Java Producer (a minimal producer sketch follows this lab)
- Create topic from command line
- View the topic's partition layout from the command line
- View log details
- Use ./kafka-replica-verification.sh to verify replication is correct
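A minimal sketch of the producer basics covered in this session, assuming a broker on localhost:9092 and an already-created topic named my-topic (both placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class SimpleProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 0; i < 10; i++) {
                    // The record key drives partition assignment: same key, same partition.
                    producer.send(new ProducerRecord<>("my-topic", "key-" + i, "value-" + i));
                }
                producer.flush();   // make sure buffered records are sent before closing
            }
        }
    }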
Session 3: Writing Kafka Consumers (Basics)
- Introduction to Consumer Java API and basic configuration
Lab: Write Java Consumer (a minimal consumer sketch follows this lab)
- View how far behind the consumer is from the command line
- Force a failover and verify that new leaders are chosen
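A minimal sketch of the consumer basics covered in this session: subscribe to a topic and poll in a loop. The broker address, group id, and topic name are placeholders.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class SimpleConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
            props.put("group.id", "my-consumer-group");         // placeholder consumer group
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singleton("my-topic"));
                while (true) {
                    // poll() returns a batch of records; the timeout bounds how long we block waiting
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                                record.partition(), record.offset(), record.key(), record.value());
                    }
                }
            }
        }
    }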
Session 4: Low-level Kafka Architecture
- Motivation: focus on high throughput
- Embrace file system / OS caches and how this impacts OS setup and usage
- File structure on disk and how data is written
- Kafka Producer load balancing details
- Producer Record batching by size and time
- Producer async commit and commit (flush, close)
- Push vs. pull and backpressure
- Compression via message batches (unified compression to the broker, on disk, and to the consumer)
- Consumer poll batching, long poll
- Consumer Trade-offs of requesting larger batches
- Consumer liveness and failover redux
- Managing consumer position: auto-commit, async commit and sync commit (see the configuration sketch after this session)
- Message delivery semantics: At most once, At least once, Exactly once
- Performance trade-offs of message delivery semantics
- Performance trade-offs of poll size
- Replication, Quorums, ISRs, committed records
- Failover and leadership election
- Log compaction by key
- Failure scenarios
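Many of the trade-offs in this session surface directly as client configuration. The sketch below lists a hypothetical set of producer and consumer properties touching batching, compression, acknowledgements, and offset commits; the specific values are illustrative, not recommendations.

    import java.util.Properties;

    public class TuningConfigSketch {

        // Producer-side settings covering batching, compression, and durability trade-offs.
        static Properties producerTuning() {
            Properties props = new Properties();
            props.put("batch.size", "65536");        // batch up to 64 KB of records per partition
            props.put("linger.ms", "10");            // wait up to 10 ms to fill a batch before sending
            props.put("compression.type", "snappy"); // whole record batches are compressed
            props.put("acks", "all");                // wait for the full ISR to acknowledge the write
            return props;
        }

        // Consumer-side settings covering poll batching and offset management.
        static Properties consumerTuning() {
            Properties props = new Properties();
            props.put("max.poll.records", "500");     // upper bound on records returned by one poll()
            props.put("fetch.min.bytes", "1024");     // trade a little latency for larger fetch batches
            props.put("enable.auto.commit", "false"); // manage consumer position with manual commits
            return props;
        }

        public static void main(String[] args) {
            System.out.println("producer tuning: " + producerTuning());
            System.out.println("consumer tuning: " + consumerTuning());
        }
    }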
Session 5: Writing Advanced Kafka Producers
- Using batching (time/size)
- Using compression
- Async producers and sync producers
- Commit and async commit
- Default partitioning (round robin when there is no key, partition by key hash when a key is present)
- Controlling which partition records are written to (custom partitioning)
- Message routing to a particular partition (use cases for this)
- Advanced Producer configuration
Lab 1: Write Kafka Advanced Producer
- Use message batching and compression
Lab 2: Use round-robin partitioning
Lab 3: Use a custom message routing scheme (see the partitioner sketch after this session)
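A sketch of the custom message routing idea from Lab 3: a hypothetical Partitioner that sends records whose key starts with "priority-" to partition 0 and hashes every other key across the partitions. The routing rule and class name are invented purely for illustration.

    import java.util.Arrays;
    import java.util.Map;
    import org.apache.kafka.clients.producer.Partitioner;
    import org.apache.kafka.common.Cluster;

    public class PriorityPartitioner implements Partitioner {

        @Override
        public int partition(String topic, Object key, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            int numPartitions = cluster.partitionsForTopic(topic).size();
            if (keyBytes == null) {
                return 0;   // keyless records: this sketch simply sends them to partition 0
            }
            // Hypothetical routing rule: "priority-" keys always land on partition 0.
            if (key instanceof String && ((String) key).startsWith("priority-")) {
                return 0;
            }
            // Everything else: a simple hash of the key bytes (Kafka's default partitioner uses murmur2).
            return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
        }

        @Override
        public void configure(Map<String, ?> configs) { }

        @Override
        public void close() { }
    }

The producer picks such a class up through its partitioner.class configuration property, e.g. props.put("partitioner.class", PriorityPartitioner.class.getName()).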
Session 6: Writing Advanced Kafka Consumers
- Adjusting poll read size
- Implementing at most once message semantics using Java API
- Implementing at least once message semantics using Java API
- Implementing as close as we can get to exactly once with the Java API
- Re-consume messages that are already consumed
- Using ConsumerRebalanceListener to start consuming from a certain offset (consumer.seek*)
- Assigning a consumer a specific partition (use cases for this)
Lab 1: Write Java Advanced Consumer
Lab 2: Adjusting poll read size
Lab 3: Implementing at most once message semantics using Java API
Lab 4: Implementing at least once message semantics using Java API (see the sketch after this session)
Lab 5: Implementing as close as we can get to exactly once with the Java API
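A sketch combining two ideas from this session: at-least-once processing by committing offsets only after records are processed, and a ConsumerRebalanceListener that seeks to an externally stored offset when partitions are assigned. Where those offsets are stored (database, file, etc.) is left out; lookupStoredOffset and process are hypothetical helpers, and the broker address, group id, and topic are placeholders.

    import java.time.Duration;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class AtLeastOnceConsumer {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder
            props.put("group.id", "at-least-once-group");       // placeholder
            props.put("enable.auto.commit", "false");           // we commit positions manually
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singleton("my-topic"), new ConsumerRebalanceListener() {
                    @Override
                    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                        // On assignment, rewind each partition to an externally stored offset.
                        for (TopicPartition tp : partitions) {
                            consumer.seek(tp, lookupStoredOffset(tp));
                        }
                    }
                    @Override
                    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                        // Offsets for revoked partitions could be saved here.
                    }
                });

                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        process(record);        // at least once: process first...
                    }
                    consumer.commitSync();      // ...then commit, so a crash re-delivers rather than skips
                }
            }
        }

        // Hypothetical helpers, not part of the Kafka API.
        static long lookupStoredOffset(TopicPartition tp) { return 0L; }
        static void process(ConsumerRecord<String, String> record) { }
    }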
Session 7: Schema Management in Kafka
- Avro overview
- Avro Schemas
- Flexible Schemas with JSON and defensive programming
- Using Kafka’s Schema Registry
- Topic Schema management
- Validation of schema
- Prevent producers that don’t align with the topic schema
Lab 1: Topic Schema management (an Avro sketch follows this session)
- Validation of schema
- Prevent Consumer from accepting unexpected schema / defensive programming
- Prevent producers from sending messages that don’t align with schema registry
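A small sketch of the Avro side of this session: parsing a schema with the Apache Avro library and building a record that conforms to it. The Employee schema is a made-up example, and the Schema Registry wiring the lab works through (a registry-aware serializer plus a schema.registry.url setting) is omitted here.

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;

    public class AvroSchemaSketch {

        // Hypothetical schema for this example: an Employee record with two fields.
        static final String EMPLOYEE_SCHEMA =
            "{"
            + "\"type\": \"record\","
            + "\"name\": \"Employee\","
            + "\"fields\": ["
            + "  {\"name\": \"firstName\", \"type\": \"string\"},"
            + "  {\"name\": \"age\", \"type\": \"int\"}"
            + "]}";

        public static void main(String[] args) {
            Schema schema = new Schema.Parser().parse(EMPLOYEE_SCHEMA);

            // Build a record against the schema; mismatches are caught when the record is
            // validated or serialized, which is the defensive-programming hook the session covers.
            GenericRecord employee = new GenericData.Record(schema);
            employee.put("firstName", "Jean-Luc");
            employee.put("age", 59);

            System.out.println(employee);
        }
    }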
Session 8: Kafka Security
- SSL for Encrypting transport and Authentication
- Setting up keys
- Using SSL for authentication instead of username/password
- Set up keystore for transport encryption
- Set up truststore for authentication
- Producer to server encryption
- Consumer to server encryption
- Kafka broker to Kafka broker encryption
- SASL for Authentication
- Overview of SASL
- Integrating SASL with Active Directory
- Securing ZooKeeper
Optional Lab: Setting up SSL for transport encryption from Consumer to Kafka broker (see the SSL client configuration sketch below)
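On the client side, the SSL material above largely comes down to a handful of properties. A sketch, where the keystore/truststore paths and passwords are placeholders for whatever the lab generates:

    import java.util.Properties;

    public class SslClientConfigSketch {

        static Properties sslClientProperties() {
            Properties props = new Properties();
            props.put("security.protocol", "SSL");   // encrypt client-to-broker traffic

            // Truststore: certificates this client trusts (e.g. the CA that signed the broker certs).
            props.put("ssl.truststore.location", "/var/private/ssl/kafka.client.truststore.jks");
            props.put("ssl.truststore.password", "changeit");

            // Keystore: only needed when the broker authenticates clients over SSL (mutual TLS).
            props.put("ssl.keystore.location", "/var/private/ssl/kafka.client.keystore.jks");
            props.put("ssl.keystore.password", "changeit");
            props.put("ssl.key.password", "changeit");

            return props;
        }

        public static void main(String[] args) {
            System.out.println(sslClientProperties());
        }
    }

Broker-to-broker encryption and the decision to require client authentication are configured separately on the broker side.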
Session 9: Kafka Disaster Recovery
- Mirror Maker, cluster replication to another DC/region
- Deploying partitions spread over racks or AZs
- Using Mirror Maker to set up mirroring from DC/region to DC/region
Optional Lab: Set up Mirror Maker running locally
Session 10: Kafka Cluster Admin and Ops
- OS config and hardware selection (EC2 instance type selection)
- Monitoring Kafka KPIs
- Monitoring Consumer Lag (consumer group inspection); see the lag sketch after this session
- Log Retention and Compaction
- Fault tolerant Cluster
- Growing your cluster
- Reassign partitions
- Broker configuration details
- Topic configuration details
- Producer configuration details
- Consumer configuration details
- ZooKeeper configuration details
- Tools for managing ZooKeeper
- Accessing JMX from command line
- Using the dump log segments tool from the command line
- Replaying a log (replay log producer)
- Re-consume messages that are already consumed
- Setting Consumer Group Offset from command line
- Kafka Migration Tool: migrate a broker from one version to another
- Mirror Maker: mirroring one Kafka cluster to another (one DC/region to another DC/region)
- Consumer Offset Checker: displays the Consumer Group, Topic, Partitions, Offset, logSize, and Owner for the specified set of Topics, Partitions and Consumer Group
Optional Lab: Kafka Admin
- Use the JMX tool to look at Kafka metrics
- Use the Offset Checker to set the offset for a particular consumer group
- Use the replay log producer to send messages from one topic to another
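Consumer lag, covered under monitoring in this session, can also be inspected programmatically rather than from the command line. A sketch, assuming an existing consumer group; the group id and broker address are placeholders:

    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class ConsumerLagSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
            props.put("key.deserializer", ByteArrayDeserializer.class.getName());
            props.put("value.deserializer", ByteArrayDeserializer.class.getName());

            try (AdminClient admin = AdminClient.create(props);
                 KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {

                // Committed positions of the group (placeholder group id).
                Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets("my-consumer-group")
                         .partitionsToOffsetAndMetadata().get();

                // Latest offsets on the brokers for the same partitions.
                Map<TopicPartition, Long> endOffsets = consumer.endOffsets(committed.keySet());

                // Lag per partition = log end offset minus committed offset.
                committed.forEach((tp, offset) -> System.out.printf("%s lag=%d%n",
                        tp, endOffsets.get(tp) - offset.offset()));
            }
        }
    }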
Optional Session 11: Kafka AWS
- Brief overview of VPC, EC2, CloudFormation
- Brief overview to CloudWatch and sending custom metrics from Kafka JMXTool
- AWS cluster networking (up to 10 GbE), placement groups
- Deploying Kafka to private subnet in AWS VPC
- Setting up NACL, routes and security groups
- Data at Rest Encryption using AWS KMS
- EBS considerations for Kafka (performance profiles, IOPs, JBOD)
- EBS KMS
- Local volume IOPs vs EBS
- Brief review of the AWS command-line tools
- Deploying Kafka to AWS to survive a single AZ failure
- Deploying Kafka to AWS using cluster mirroring across multiple regions
- Brief overview of AWS ELBs
- Using Kafka REST proxy behind an ELB
Optional Lab: Set up a Kafka cluster with ZooKeeper, Kafka Brokers, Consumers and Producers
- CloudFormation to setup subnet (or specify subnet)
- AWS command line to create machines
- Configuration setting and throttles for bringing up a clean Kafka instance
- Spin a Kafka Broker down
- Monitor that the Kafka Broker shuts down cleanly
- Create a new Kafka Broker (no data)
- Create a Kafka Broker from a snapshot of the log
- Spin up the second cluster
- Connect the second cluster to the first cluster with mirror maker
- Monitor lag from the first cluster to the second cluster
- Shut the first cluster down
- Connect producers and consumers to the second cluster
Optional Lab: Set up Kafka Mirror Maker spanning two AWS regions
Optional Lab: Set up rolling re-hydration of Kafka Broker nodes
Optional Session 12: Kafka REST Proxy
- Using the REST API to write a Producer
- Using the REST API to write a Consumer
Optional Lab: Writing REST Producer
Optional Lab: Writing REST Consumer
Optional Session 13: Kafka Connect
- Kafka Connect Basics
- Modes of Working: Standalone and Distributed
- Configuring Connectors
- Tracking Kafka Connector Offsets
Lab: Using Kafka Connect
Optional Session 14: Kafka Streams
- Overview of Kafka Streams
- Kafka Streams Fundamentals
- Kafka Streams Application
- Working with low-level Streams
- Working with the Kafka Streams DSL (see the sketch at the end of this outline)
Optional Lab: Low-level Streams
Optional Lab: Streams DSL
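A sketch of what the Streams DSL portion of this session looks like: read from one topic, transform each value, and write to another. The application id, broker address, and topic names are placeholders.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class UppercaseStreamSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo");     // placeholder app id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

            StreamsBuilder builder = new StreamsBuilder();
            // Read from one topic, transform each value, write the result to another topic.
            KStream<String, String> source = builder.stream("input-topic");
            source.mapValues(value -> value.toUpperCase()).to("output-topic");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();

            // Close the streams application cleanly on shutdown.
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }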