Kafka Architecture: Consumers - 2025 Edition

January 9, 2025


🚀 What’s New in This 2025 Update

Major Updates and Changes

  • KIP-848 Protocol - Revolutionary consumer rebalancing without global pauses
  • Elimination of Rebalance Downtime - Consumers continue processing during rebalances
  • Queue Semantics - Native point-to-point messaging (early access)
  • KRaft-Based Coordination - Simplified group management without ZooKeeper
  • Metadata Rebootstrap - Automatic recovery from metadata failures
  • Enhanced Scalability - Support for larger consumer groups

Deprecated Features

  • ZooKeeper-based coordination - Completely removed
  • Legacy rebalance protocols - Replaced by KIP-848
  • Pre-2.1 client protocols - No longer supported
  • Old consumer group management tools - Updated for KRaft

Ready to build resilient, high-performance consumer applications? Let’s explore how Kafka 4.0 revolutionizes consumer architecture.

Kafka Consumer Architecture - Consumer Groups and Subscriptions

mindmap
  root((Kafka Consumer Architecture))
    Consumer Groups
      Group Coordination
      Offset Management
      Load Balancing
      KIP-848 Protocol
    Processing Models
      At-least-once
      Exactly-once
      Queue Semantics
    Failover & Recovery
      Partition Reassignment
      Consumer Heartbeats
      Automatic Recovery
    Advanced Features
      Threading Models
      Batch Processing
      Metadata Bootstrap
      Performance Tuning

This deep dive into Kafka consumer architecture builds on Kafka Architecture, Kafka Topic Architecture, and Kafka Producer Architecture.

Modern Kafka consumers leverage the KIP-848 incremental rebalance protocol, scaling to massive workloads without sacrificing availability.

Cloudurable provides Kafka training, Kafka consulting, Kafka support and helps set up Kafka clusters in AWS.

Kafka Consumer Groups

Consumer groups transform Kafka from a simple messaging system into a distributed processing powerhouse. Group consumers by:

  • Business function - Orders processing, analytics, alerting
  • Processing type - Real-time, batch, streaming
  • Destination system - Databases, data lakes, microservices

Each consumer group maintains independent progress through topics, enabling multiple applications to process the same data stream at their own pace.

Core Concepts:

  • Unique Group ID - Identifies the consumer group
  • Subscription Model - Groups subscribe to one or more topics
  • Per-Partition Offsets - Each group tracks progress independently
  • Exclusive Consumption - Only one consumer per partition within a group
  • Load Balancing - Automatic distribution of partitions
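These concepts map directly to client configuration; a minimal setup for joining a group might look like the following (the broker address and group id below are placeholder values):

```properties
# All consumers sharing this id form one group and split the topic's partitions
group.id=orders-analytics
bootstrap.servers=broker1:9092

# How to deserialize record keys and values
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer

# Where a brand-new group starts when it has no committed offsets yet
auto.offset.reset=earliest
```

With only these settings, calling subscribe() on a topic enrolls the consumer in the group; partition assignment and rebalancing happen automatically.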

Kafka Consumer Groups Architecture

Kafka Architecture: Kafka Consumer Groups

flowchart TB
  subgraph Topic["Topic: Orders"]
    P0[Partition 0]
    P1[Partition 1]
    P2[Partition 2]
    P3[Partition 3]
  end
  
  subgraph CG1["Consumer Group: Analytics"]
    C1A[Consumer 1A]
    C2A[Consumer 2A]
    C1A --> P0
    C1A --> P1
    C2A --> P2
    C2A --> P3
  end
  
  subgraph CG2["Consumer Group: Billing"]
    C1B[Consumer 1B]
    C2B[Consumer 2B]
    C3B[Consumer 3B]
    C1B --> P0
    C2B --> P1
    C2B --> P2
    C3B --> P3
  end
  
  subgraph Offsets["Offset Tracking"]
    O1[Analytics: P0=1000, P1=1200...]
    O2[Billing: P0=800, P1=950...]
  end
  
  classDef partition fill:#e1bee7,stroke:#8e24aa,stroke-width:1px,color:#333333
  classDef consumer fill:#bbdefb,stroke:#1976d2,stroke-width:1px,color:#333333
  classDef offset fill:#fff9c4,stroke:#f9a825,stroke-width:1px,color:#333333
  
  class P0,P1,P2,P3 partition
  class C1A,C2A,C1B,C2B,C3B consumer
  class O1,O2 offset

Step-by-Step Explanation:

  • Topic has 4 partitions distributed across brokers
  • Analytics group uses 2 consumers processing 2 partitions each
  • Billing group uses 3 consumers with uneven distribution
  • Each group maintains independent offset tracking
  • Groups can process at different speeds

Revolutionary KIP-848: Zero-Downtime Rebalancing

The game-changing KIP-848 protocol eliminates the stop-the-world rebalancing that plagued earlier Kafka versions:

stateDiagram-v2
  [*] --> Stable: All partitions assigned
  
  state Traditional {
    StableOld --> Rebalancing: Member change
    Rebalancing --> StopProcessing: All consumers pause
    StopProcessing --> Reassign: New assignments
    Reassign --> Resume: All restart
    Resume --> StableOld
  }
  
  state "KIP-848 Protocol" as KIP848 {
    StableNew --> Incremental: Member change
    Incremental --> ContinueProcessing: Others keep working
    ContinueProcessing --> TargetedReassign: Only affected partitions
    TargetedReassign --> StableNew: Seamless transition
  }
  
  note right of Traditional
    Old: Full stop for any change
    Downtime proportional to group size
  end note
  
  note right of KIP848
    New: Incremental changes only
    Near-zero downtime
  end note
  
  classDef traditional fill:#ffcdd2,stroke:#e53935
  classDef modern fill:#c8e6c9,stroke:#43a047
  
  class Traditional traditional
  class KIP848 modern

Step-by-Step Explanation:

  • Traditional protocol stops all consumers for any membership change
  • KIP-848 allows unaffected consumers to continue processing
  • Only partitions that need reassignment are moved
  • Result: Massive reduction in rebalancing impact
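Opting in is a client-side configuration change. Assuming a Kafka 4.0 cluster with the new group coordinator enabled, the relevant consumer settings are roughly:

```properties
# Use the KIP-848 consumer rebalance protocol ("classic" selects the old one)
group.protocol=consumer

# Optional: choose a broker-side partition assignor
group.remote.assignor=uniform
```

Under KIP-848, assignment logic moves to the broker; session timeout and heartbeat interval become group coordinator settings rather than per-client configs.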

Consumer Load Sharing

Kafka automatically distributes partitions across consumers for optimal load balancing:

classDiagram
  class ConsumerGroup {
    +groupId: string
    +members: Consumer[]
    +coordinator: Broker
    +protocol: "KIP-848"
    +sessionTimeout: int
    +heartbeatInterval: int
    +assignPartitions(): void
    +rebalance(): void
  }
  
  class Consumer {
    +consumerId: string
    +assignedPartitions: Partition[]
    +lastHeartbeat: timestamp
    +commitOffset(partition, offset): void
    +poll(): Records[]
  }
  
  class Partition {
    +topicName: string
    +partitionId: int
    +currentOffset: long
    +highWatermark: long
    +leader: Broker
  }
  
  class GroupCoordinator {
    +managedGroups: ConsumerGroup[]
    +handleJoinGroup(): void
    +handleLeaveGroup(): void
    +handleHeartbeat(): void
    +triggerRebalance(): void
  }
  
  ConsumerGroup "1" *-- "many" Consumer
  Consumer "1" --> "many" Partition : assigned
  GroupCoordinator "1" o-- "many" ConsumerGroup

Step-by-Step Explanation:

  • ConsumerGroup manages member consumers
  • GroupCoordinator handles membership changes
  • Consumers get exclusive access to assigned partitions
  • Dynamic rebalancing maintains even distribution

Key behaviors:

  • New consumer joins → Partitions redistribute for balance
  • Consumer fails → Its partitions transfer to survivors
  • Dynamic membership → Kafka protocol handles changes automatically
  • Fair distribution → Each consumer gets approximately equal load
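To make "fair distribution" concrete, here is a simplified, illustrative range-style split (not Kafka's actual assignor code): partitions are divided as evenly as possible, with the first members absorbing the remainder.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a simplified range-style assignment sketch,
// not the implementation Kafka's assignors actually use.
public class RangeAssignSketch {
    // Returns, for each of numConsumers members, the partition ids it owns.
    public static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> result = new ArrayList<>();
        for (int c = 0; c < numConsumers; c++) result.add(new ArrayList<>());
        int base = numPartitions / numConsumers;   // minimum partitions per consumer
        int extra = numPartitions % numConsumers;  // first `extra` consumers get one more
        int p = 0;
        for (int c = 0; c < numConsumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            for (int i = 0; i < count; i++) result.get(c).add(p++);
        }
        return result;
    }

    public static void main(String[] args) {
        // 4 partitions, 3 consumers -> [[0, 1], [2], [3]]
        System.out.println(assign(4, 3));
    }
}
```

Note that with more consumers than partitions, the trailing members receive empty assignments, which is exactly the "idle hot standby" behavior described later.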

Kafka Consumer Failover

Modern Kafka ensures reliable message processing through sophisticated failover:

Offset Commit Strategies:

  • Auto-commit - Periodic automatic offset commits
  • Manual commit - Application controls when to commit
  • Transactional - Exactly-once semantics with transactions
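Selecting between the first two strategies is a configuration choice:

```properties
# Turn off periodic auto-commit; the application calls commitSync()/commitAsync()
enable.auto.commit=false

# If auto-commit stays enabled, this interval controls the commit cadence
# auto.commit.interval.ms=5000
```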

Failure Scenarios:

  1. Consumer crashes before commit:
    • Next consumer starts from last committed offset
    • Some messages reprocessed (at-least-once)
    • Make processing idempotent
  2. Consumer crashes after commit:
    • Offset already committed
    • No message loss or duplication
    • Clean failover
  3. Network partition:
    • Session timeout triggers reassignment
    • Healthy consumers take over partitions
    • Processing continues with minimal disruption
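Scenario 1 is why idempotent processing matters. A minimal sketch of the idea, using an in-memory seen-ID set as a stand-in for whatever dedup store (database key, cache) a real application would use:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: under at-least-once delivery a record may be seen
// twice after failover, so the side effect must run at most once per id.
public class IdempotentProcessor {
    private final Set<String> processedIds = new HashSet<>();
    private int sideEffects = 0; // counts how many times real work actually ran

    // Returns true if the record was processed, false if it was a duplicate.
    public boolean process(String recordId) {
        if (!processedIds.add(recordId)) {
            return false; // already handled: skip the side effect
        }
        sideEffects++;    // the "real" work (DB write, API call) happens here
        return true;
    }

    public int sideEffectCount() { return sideEffects; }

    public static void main(String[] args) {
        IdempotentProcessor p = new IdempotentProcessor();
        p.process("order-1");
        p.process("order-2");
        p.process("order-1"); // redelivered after a crash-before-commit
        System.out.println(p.sideEffectCount()); // prints 2
    }
}
```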

Advanced Offset Management

Kafka 4.0 stores offsets in the internal __consumer_offsets topic with enhancements:

flowchart LR
  subgraph Consumer["Consumer Application"]
    Process[Process Records]
    Commit[Commit Offset]
  end
  
  subgraph Kafka["Kafka Cluster"]
    OP[Offset Partition<br>Compacted Log]
    Coordinator[Group Coordinator]
  end
  
  Process -->|1. Process batch| Commit
  Commit -->|2. Send commit| Coordinator
  Coordinator -->|3. Write offset| OP
  Coordinator -->|4. Acknowledge| Commit
  
  style Consumer fill:#bbdefb,stroke:#1976d2,stroke-width:1px,color:#333333
  style Kafka fill:#e1bee7,stroke:#8e24aa,stroke-width:1px,color:#333333

Features:

  • Log compaction - Only latest offset per partition retained
  • Replication - Offset data replicated for durability
  • Fast recovery - Consumers resume from exact position
  • Group metadata - Stores consumer group configuration
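The compaction behavior can be modeled conceptually: commits are keyed by group and partition, and a later commit supersedes an earlier one, which is why the topic stays small. This is an illustrative model only, not the broker's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of a compacted offsets log: only the latest
// committed offset per (group, partition) key survives compaction.
public class OffsetCompactionSketch {
    private final Map<String, Long> latest = new HashMap<>();

    // Each commit overwrites the previous value for the same key,
    // just as compaction eventually discards older records for that key.
    public void commit(String group, int partition, long offset) {
        latest.put(group + "/" + partition, offset);
    }

    public long resumeFrom(String group, int partition) {
        return latest.getOrDefault(group + "/" + partition, 0L);
    }

    public static void main(String[] args) {
        OffsetCompactionSketch log = new OffsetCompactionSketch();
        log.commit("analytics", 0, 800);
        log.commit("analytics", 0, 1000); // supersedes the earlier commit
        System.out.println(log.resumeFrom("analytics", 0)); // prints 1000
    }
}
```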

Consumer Visibility: High Watermark

Understanding what consumers can see is crucial:

flowchart TB
  subgraph Partition["Topic Partition"]
    M1[Message 1<br>Offset: 100]
    M2[Message 2<br>Offset: 101]
    M3[Message 3<br>Offset: 102]
    HW[High Watermark<br>Offset: 102]
    M4[Message 4<br>Offset: 103<br>⚠️ Unreplicated]
    LEO[Log End Offset<br>Offset: 104]
  end
  
  subgraph Replicas["Replication Status"]
    R1[Replica 1: 103 ✓]
    R2[Replica 2: 102 ✓]
    R3[Replica 3: 102 ✓]
  end
  
  Consumer -->|Can read| M1
  Consumer -->|Can read| M2
  Consumer -->|Can read| M3
  Consumer -.->|Cannot read| M4
  
  classDef readable fill:#c8e6c9,stroke:#43a047,stroke-width:1px,color:#333333
  class M1,M2,M3 readable
  style M4 fill:#ffcdd2,stroke:#e53935,stroke-width:1px,color:#333333
  style HW fill:#fff9c4,stroke:#f9a825,stroke-width:2px,color:#333333

Key Points:

  • High Watermark = Last fully replicated offset
  • Consumers read up to High Watermark only
  • Guarantees data durability before consumption
  • Log End Offset = Where producers write next
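Lag against the high watermark is simple arithmetic per partition; a small illustrative helper (real deployments read these numbers from consumer metrics or kafka-consumer-groups.sh --describe):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative lag math: lag = high watermark minus the group's
// committed offset, computed per partition.
public class ConsumerLag {
    public static Map<Integer, Long> lag(Map<Integer, Long> highWatermarks,
                                         Map<Integer, Long> committed) {
        Map<Integer, Long> result = new HashMap<>();
        for (Map.Entry<Integer, Long> e : highWatermarks.entrySet()) {
            long c = committed.getOrDefault(e.getKey(), 0L); // no commit yet => 0
            result.put(e.getKey(), e.getValue() - c);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Long> hw = Map.of(0, 1200L, 1, 950L);
        Map<Integer, Long> committed = Map.of(0, 1000L, 1, 950L);
        System.out.println(lag(hw, committed)); // partition 0 is 200 behind
    }
}
```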

Consumer to Partition Cardinality

Optimal partition assignment ensures maximum parallelism:

Kafka Architecture: Consumer Group to Partitions

Kafka Architecture: Consumer Group Consumers to Partitions

Assignment Rules:

  1. One consumer per partition within a group
  2. Extra consumers remain idle as hot standbys
  3. Fewer consumers than partitions → Some handle multiple
  4. Automatic rebalancing maintains even distribution

Best Practices:

  • Set partition count ≥ max expected consumers
  • Use consistent partition counts across topics
  • Monitor consumer lag per partition
  • Plan for peak load scenarios

Multi-threaded Kafka Consumers

Threading Models

1. Thread per Consumer (Recommended)

// Each thread runs one consumer instance (KafkaConsumer is not thread-safe)
class ConsumerThread implements Runnable {
    private final KafkaConsumer<String, String> consumer;
    private volatile boolean closed = false;

    ConsumerThread(KafkaConsumer<String, String> consumer) {
        this.consumer = consumer;
    }

    public void run() {
        while (!closed) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            processRecords(records);
            consumer.commitSync(); // commit only after the batch is processed
        }
        consumer.close();
    }

    public void shutdown() {
        closed = true; // loop exits after the current poll cycle
    }
}

Benefits:

  • Simple offset management
  • Clear failure boundaries
  • Kafka handles partition assignment
  • Easy to scale horizontally

2. Consumer with Worker Threads

// One consumer feeds a pool of worker threads
class MultiThreadedConsumer {
    private final KafkaConsumer<String, String> consumer;
    private final ExecutorService executor;
    private volatile boolean closed = false;

    MultiThreadedConsumer(KafkaConsumer<String, String> consumer, int workers) {
        this.consumer = consumer;
        this.executor = Executors.newFixedThreadPool(workers);
    }

    public void run() {
        while (!closed) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));

            // Submit records to worker threads, keeping a handle on each task
            List<Future<?>> inFlight = new ArrayList<>();
            for (ConsumerRecord<String, String> record : records) {
                inFlight.add(executor.submit(() -> processRecord(record)));
            }

            // Complex part: every task must finish before offsets are safe to commit
            for (Future<?> task : inFlight) {
                try {
                    task.get();
                } catch (Exception e) {
                    throw new RuntimeException("Worker failed; batch must not be committed", e);
                }
            }
            consumer.commitSync();
        }
    }
}

Challenges:

  • Complex offset management
  • Ordering guarantees difficult
  • Error handling complexity
  • Use only for CPU-intensive processing

Native Queue Semantics (KIP-932)

Kafka 4.0 introduces early-access queue semantics for point-to-point messaging:

flowchart LR
  subgraph Traditional["Pub/Sub Model"]
    PT[Topic] --> CG1[Group 1]
    PT --> CG2[Group 2]
    PT --> CG3[Group 3]
  end
  
  subgraph Queue["Queue Semantics"]
    Q[Queue] --> C1[Consumer 1]
    Q -.->|Exclusive| C2[Consumer 2]
    Q -.->|Exclusive| C3[Consumer 3]
  end
  
  note1[All groups get all messages]
  note2[Each message to exactly one consumer]
  
  Traditional -.-> note1
  Queue -.-> note2
  
  style Traditional fill:#bbdefb,stroke:#1976d2,stroke-width:1px,color:#333333
  style Queue fill:#e1bee7,stroke:#8e24aa,stroke-width:1px,color:#333333
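Queue semantics arrive as "share groups" rather than classic consumer groups. Because KIP-932 is early access in Kafka 4.0, treat the property names below as provisional; in the early-access release, broker-side enablement looks roughly like this:

```properties
# Early access (subject to change): allow share groups alongside the other protocols
group.coordinator.rebalance.protocols=classic,consumer,share

# Share groups rely on APIs still marked unstable in the 4.0 early-access release
unstable.api.versions.enable=true
```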

Kafka Consumer Architecture Review

What is a consumer group?

A logical grouping of consumers that coordinate to consume a topic. Each group maintains independent offsets and represents a unique subscription to the topic’s data.

Does each consumer group have its own offset?

Yes. Every consumer group tracks its own offset for each partition, enabling multiple independent consumers of the same data.

When can a consumer see a record?

Only after the record is fully replicated to all in-sync replicas (ISRs) and the High Watermark advances past that offset.

What happens if there are more consumers than partitions?

Extra consumers remain idle as hot standbys, ready to take over if active consumers fail. Design partition count based on expected parallelism.

What happens with multiple consumer threads?

Each thread can manage exclusive partitions (recommended) or share partition processing (complex). Thread-per-consumer provides simplest operation and clearest failure boundaries.

Best Practices for 2025

  1. Embrace KIP-848 - Enjoy zero-downtime rebalancing
  2. Plan partition counts - Set based on max parallelism needs
  3. Use idempotent processing - Handle at-least-once delivery
  4. Monitor consumer lag - Track performance per partition
  5. Implement health checks - Detect stuck consumers early
  6. Tune session timeouts - Balance failure detection vs stability
  7. Consider queue semantics - Evaluate for point-to-point use cases
  8. Test failure scenarios - Verify failover behavior
  9. Use thread-per-consumer - Simplify offset management
  10. Leverage metrics - Monitor rebalance frequency and duration

About Cloudurable

Transform your Kafka deployment with expert guidance. Cloudurable provides Kafka training, Kafka consulting, Kafka support and helps set up Kafka clusters in AWS.

Check out our new GoLang course. We provide onsite, instructor-led Go Lang training.

                                                                           