January 9, 2025
🚀 What’s New in This 2025 Update
Major Updates and Changes
- Metadata Bootstrapping - KIP-1102 enables automatic metadata recovery
- Enhanced Protocol Resilience - Improved error handling and recovery
- Mandatory Modern Protocols - Requires broker 2.1+ for Java clients
- KRaft Performance Benefits - Reduced latency with ZooKeeper removal
- Strengthened Best Practices - Focus on idempotency and transactions
- Clear Upgrade Path - KIP-1124 migration guidance
Deprecated Features
- ❌ Pre-2.1 protocol versions - Old client protocols removed
- ❌ Legacy compatibility modes - Modern protocols required
- ❌ ZooKeeper-based metadata - Replaced by KRaft
Ready to build high-performance, resilient producers? Let’s master Kafka producer architecture in the modern era.
Kafka Producer Architecture - Mastering Partition Selection
mindmap
  root((Kafka Producer Architecture))
    Partitioning
      Default Strategies
      Custom Partitioners
      Key-Based Routing
      Round-Robin
    Performance
      Batching
      Compression
      Pipelining
      Async Operations
    Reliability
      Durability (acks)
      Idempotency
      Transactions
      Retries
    Advanced Features
      Metadata Bootstrap
      Error Recovery
      Custom Serializers
      Interceptors
Building on Kafka Architecture and Kafka Topic Architecture, this deep dive explores how producers efficiently deliver data at scale.
Modern Kafka producers provide powerful guarantees while maintaining blazing performance through intelligent batching, compression, and partition selection.
Kafka Producers: The Data Gateway
Kafka producers transform your applications into streaming powerhouses. Key responsibilities:
- Partition Selection - Determines data distribution
- Serialization - Converts objects to bytes
- Compression - Reduces network overhead
- Batching - Optimizes throughput
- Reliability - Ensures message delivery
The producer’s superpower? It picks the partition. This simple decision enables:
- Ordered processing - Same key → same partition
- Load distribution - Even spread across partitions
- Custom routing - Business logic-based placement
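The configuration snippets throughout this article assume a producer wired up roughly like the sketch below; the broker address, topic name, and payload are placeholders you would replace with your own.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "customer-123", "{\"total\": 42.50}"));
        } // close() flushes outstanding batches before releasing resources
    }
}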
Cloudurable provides Kafka training, Kafka consulting, Kafka support and helps set up Kafka clusters in AWS.
Producer Architecture Visualization
flowchart TB
subgraph Producer["Producer Application"]
A[Application Code]
S[Serializer]
P[Partitioner]
B[Record Accumulator]
C[Sender Thread]
end
subgraph Topic["Topic: Orders"]
P0[Partition 0<br>Leader: Broker 1]
P1[Partition 1<br>Leader: Broker 2]
P2[Partition 2<br>Leader: Broker 3]
end
A -->|1. Create Record| S
S -->|2. Serialize| P
P -->|3. Select Partition| B
B -->|4. Batch Records| C
C -->|5. Send Batch| P0
C -->|5. Send Batch| P1
C -->|5. Send Batch| P2
classDef producer fill:#bbdefb,stroke:#1976d2,stroke-width:1px,color:#333333
classDef partition fill:#e1bee7,stroke:#8e24aa,stroke-width:1px,color:#333333
class A,S,P,B,C producer
class P0,P1,P2 partition
Step-by-Step Explanation:
- Application creates a ProducerRecord with key, value, and optional headers
- Serializer converts key and value to byte arrays
- Partitioner selects target partition based on key or round-robin
- Record Accumulator batches records by partition
- Sender thread transmits batches to partition leaders
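Step 1 mentions optional headers. Building on the setup sketch above, here is a small example of attaching a header before sending; the header name and the orderJson value are made-up placeholders.
ProducerRecord<String, String> record =
    new ProducerRecord<>("orders", "customer-123", orderJson);
// Headers carry metadata (tracing ids, schema hints) without touching the key or value
record.headers().add("trace-id", "abc-123".getBytes(java.nio.charset.StandardCharsets.UTF_8));
producer.send(record);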
Partition Selection Strategies
1. Default Partitioner (Key-Based)
// Records with same key always go to same partition
producer.send(new ProducerRecord<>("orders",
"customer-123", // key determines partition
orderData)); // value
// target partition = murmur2(serialized key) % numPartitions
Benefits:
- Ordering guarantee per key
- Co-location of related data
- Predictable distribution
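You can mirror the default partitioner's hashing to predict where a key will land. This sketch reuses the producer from the setup above; Utils is an internal Kafka utility class, so treat it as illustrative rather than a public contract.
byte[] keyBytes = "customer-123".getBytes(java.nio.charset.StandardCharsets.UTF_8);
int numPartitions = producer.partitionsFor("orders").size();
// Same formula the default partitioner applies to the serialized key
int expected = org.apache.kafka.common.utils.Utils.toPositive(
        org.apache.kafka.common.utils.Utils.murmur2(keyBytes)) % numPartitions;
System.out.println("customer-123 -> partition " + expected);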
2. Sticky Partitioning (No Key)
// Keyless records use the sticky partitioner (Kafka 2.4+, KIP-480):
// the producer fills a batch for one partition, then rotates to the next
producer.send(new ProducerRecord<>("metrics",
null, // no key
metricData)); // sticky distribution, evens out over time
Benefits:
- Even load distribution over time
- Larger batches, higher throughput
- No key management required
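If you specifically want per-record round-robin rather than sticky behavior for keyless records, the client ships a RoundRobinPartitioner you can opt into:
// Opt into strict round-robin for keyless records
props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG,
"org.apache.kafka.clients.producer.RoundRobinPartitioner");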
3. Custom Partitioner
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.utils.Utils;

public class PriorityPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        // High-priority customers always land on partition 0
        if (isPriorityCustomer(key)) {
            return 0;
        }
        // Hash everyone else across the remaining partitions (1..n-1)
        return Utils.toPositive(Utils.murmur2(keyBytes)) % (partitions.size() - 1) + 1;
    }

    @Override public void configure(Map<String, ?> configs) { }
    @Override public void close() { }
    private boolean isPriorityCustomer(Object key) {   // placeholder business rule
        return key != null && key.toString().startsWith("vip-");
    }
}
Use Cases:
- Priority routing - VIP traffic isolation
- Geographic routing - Data locality
- Hot partition avoidance - Custom load balancing
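To activate a custom partitioner such as the PriorityPartitioner above, register it with the producer via partitioner.class:
props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, PriorityPartitioner.class.getName());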
Write Cadence and Consistency
stateDiagram-v2
[*] --> Creating: Create Record
Creating --> Serializing: Serialize
Serializing --> Partitioning: Select Partition
Partitioning --> Batching: Add to Batch
state Batching {
[*] --> Accumulating
Accumulating --> Ready: Batch Full or Timeout
}
Batching --> Sending: Send Batch
state Sending {
[*] --> InFlight
InFlight --> WaitingAcks: Sent to Leader
state WaitingAcks {
Acks0 --> Success: Fire & Forget
Acks1 --> Success: Leader Acknowledged
AcksAll --> Success: All ISRs Acknowledged
}
}
Sending --> [*]: Complete
classDef creating fill:#bbdefb,stroke:#1976d2,stroke-width:1px,color:#333333
classDef sending fill:#e1bee7,stroke:#8e24aa,stroke-width:1px,color:#333333
classDef acks fill:#c8e6c9,stroke:#43a047,stroke-width:1px,color:#333333
class Creating,Serializing,Partitioning creating
class Sending,InFlight,WaitingAcks sending
class Acks0,Acks1,AcksAll acks
Step-by-Step Explanation:
- Producer creates and serializes records
- Partitioner assigns records to partitions
- Batching accumulates records for efficiency
- Sender transmits batches to partition leaders
- Acknowledgment level determines durability
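One practical consequence of the batching stage: records sit in the accumulator until a batch fills or linger.ms expires, so flush or close the producer before shutting down to avoid losing buffered records.
// Block until every record handed to send() has completed
producer.flush();
// close() also flushes, then releases network and memory resources
producer.close();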
Durability Levels (acks)
acks=0 (Fire and Forget)
props.put(ProducerConfig.ACKS_CONFIG, "0");
- Fastest performance
- No durability guarantee
- Use case: Metrics, logs where loss is acceptable
acks=1 (Leader Acknowledgment)
props.put(ProducerConfig.ACKS_CONFIG, "1");
- Balanced performance/durability
- Leader confirms write
- Use case: Most applications
acks=all (Full Replication)
props.put(ProducerConfig.ACKS_CONFIG, "all");
// min.insync.replicas is a topic/broker setting, not a producer config;
// set it to 2 on the topic so acks=all waits for at least two in-sync replicas
- Strongest durability
- All ISRs confirm write
- Use case: Financial transactions, critical data
Advanced Producer Features
Idempotent Producers
Prevents duplicate messages during retries (enabled by default since Kafka 3.0):
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
// Requires (and modern client defaults already satisfy):
// - acks=all
// - retries > 0 (default: Integer.MAX_VALUE)
// - max.in.flight.requests.per.connection <= 5 (default: 5)
classDiagram
class IdempotentProducer {
+producerId: long
+epoch: short
+sequenceNumber: int
+send(record): Future
+initTransactions(): void
}
class ProducerRecord {
+topic: string
+partition: int
+key: K
+value: V
+headers: Headers
+timestamp: long
}
class RecordMetadata {
+offset: long
+partition: int
+timestamp: long
+serializedKeySize: int
+serializedValueSize: int
}
IdempotentProducer "1" --> "many" ProducerRecord : sends
ProducerRecord "1" --> "1" RecordMetadata : produces
Benefits:
- Exactly-once delivery per partition
- Automatic retry handling
- No application-level deduplication
Transactional Producers
Atomic writes across partitions:
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "order-processor-1");
producer.initTransactions();
try {
producer.beginTransaction();
// Send multiple records atomically
producer.send(new ProducerRecord<>("orders", order));
producer.send(new ProducerRecord<>("inventory", update));
producer.send(new ProducerRecord<>("notifications", alert));
producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
// Fatal errors: the producer cannot continue and must be closed
producer.close();
} catch (KafkaException e) {
// Recoverable errors: abort and retry the transaction
producer.abortTransaction();
}
Use Cases:
- Multi-topic updates - Order + Inventory + Payment
- Exactly-once streaming - Kafka Streams and consume-transform-produce pipelines (see the sketch after this list)
- Event sourcing - Atomic event sequences
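Transactions also power the consume-transform-produce loop behind exactly-once streaming: the consumed offsets are committed inside the same transaction as the produced records. A sketch, assuming a consumer configured with enable.auto.commit=false and a placeholder transform() step and output topic:
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
producer.beginTransaction();
for (ConsumerRecord<String, String> rec : records) {
    producer.send(new ProducerRecord<>("orders-enriched", rec.key(), transform(rec.value())));
    // Track the next offset to consume for each partition
    offsets.put(new TopicPartition(rec.topic(), rec.partition()),
        new OffsetAndMetadata(rec.offset() + 1));
}
// Commit the consumed offsets atomically with the produced records
producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
producer.commitTransaction();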
Compression Options
// ZSTD - Best compression ratio (recommended)
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");
// LZ4 - Fastest compression
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
// Snappy - Good balance
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
// GZIP - High compression, slower
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");
Batching Configuration
// Batch size in bytes (16 KB is the default)
props.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384");
// Wait up to 10 ms for a batch to fill before sending a partial batch
props.put(ProducerConfig.LINGER_MS_CONFIG, "10");
// Total memory for buffering unsent records (32 MB is the default)
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, "33554432");
Metadata Bootstrap and Resilience
KIP-1102 introduces automatic metadata recovery:
flowchart LR
subgraph Producer
M[Metadata Cache]
R[Request Handler]
E[Error Handler]
end
subgraph Cluster
B1[Broker 1]
B2[Broker 2]
B3[Broker 3]
end
M -->|Timeout| E
E -->|Rebootstrap| R
R -->|Fetch Metadata| B1
R -->|Fetch Metadata| B2
R -->|Fetch Metadata| B3
B1 -->|Update| M
style E fill:#ffcdd2,stroke:#e53935,stroke-width:1px,color:#333333
style M fill:#c8e6c9,stroke:#43a047,stroke-width:1px,color:#333333
Benefits:
- Automatic recovery from metadata failures
- Improved resilience during cluster changes
- No manual intervention required
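On clients that expose the rebootstrap controls from KIP-899 and KIP-1102, the behavior is driven by client properties. The exact keys and defaults are version-dependent, so treat the sketch below as an assumption to verify against your client's documentation:
// Assumed property names from KIP-899 / KIP-1102; verify for your client version
props.put("metadata.recovery.strategy", "rebootstrap"); // fall back to the bootstrap servers
props.put("metadata.recovery.rebootstrap.trigger.ms", "300000"); // rebootstrap after 5 minutes without usable metadata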
Performance Optimization
flowchart TB
subgraph "Performance Factors"
A[Batching<br>Amortize overhead]
B[Compression<br>Reduce network I/O]
C[Pipelining<br>Multiple in-flight]
D[Async Sends<br>Non-blocking]
end
subgraph "Metrics"
M1[Throughput<br>Records/sec]
M2[Latency<br>End-to-end time]
M3[CPU Usage<br>Compression cost]
M4[Network<br>Bandwidth usage]
end
A --> M1
B --> M4
C --> M1
D --> M2
B --> M3
classDef factor fill:#bbdefb,stroke:#1976d2,stroke-width:1px,color:#333333
classDef metric fill:#e1bee7,stroke:#8e24aa,stroke-width:1px,color:#333333
class A,B,C,D factor
class M1,M2,M3,M4 metric
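Pulling these levers together, a throughput-oriented starting point might look like this; the values are illustrative, not universal recommendations, and should be benchmarked against your own latency budget:
// Throughput-oriented starting point; tune against your own workload
props.put(ProducerConfig.BATCH_SIZE_CONFIG, "65536"); // 64 KB batches
props.put(ProducerConfig.LINGER_MS_CONFIG, "20"); // wait up to 20 ms to fill a batch
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd"); // cut network I/O
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5"); // pipelining, safe with idempotence
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, "67108864"); // 64 MB batching buffer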
Producer Architecture Review
Can producers write faster than consumers?
Yes. Producers and consumers operate independently. Kafka’s durability ensures data persists until consumed, regardless of consumption speed.
What’s the default partition strategy without a key?
Since Kafka 2.4 (KIP-480), keyless records use sticky partitioning: the producer fills a batch for one partition, then rotates to the next, which evens out load over time while keeping batches large. Older clients used pure round-robin.
What’s the default partition strategy with a key?
Records with the same key always route to the same partition: the default partitioner computes murmur2(serialized key) % numPartitions.
Who picks the partition?
The producer exclusively controls partition selection through its configured or custom partitioner.
What’s the recommended durability setting?
Use acks=all with min.insync.replicas=2 (a topic/broker setting) for critical data. This ensures a write is replicated to at least two in-sync replicas before it is acknowledged.
Best Practices for 2025
- Enable idempotence - Default for all producers
- Use transactions - For multi-topic atomic operations
- Choose ZSTD compression - Best compression ratio
- Tune batching - Balance latency vs throughput
- Implement custom partitioners - For special routing needs
- Monitor producer metrics - Track performance and errors
- Handle errors gracefully - Implement retry logic
- Use async sends - With proper callback handling (see the sketch after this list)
- Configure adequate memory - For batching buffers
- Test failure scenarios - Verify resilience
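For the async-send and error-handling items above, here is a minimal callback sketch; producer and orderJson come from the earlier setup, and the handling branches are illustrative.
producer.send(new ProducerRecord<>("orders", "customer-123", orderJson),
    (metadata, exception) -> {
        if (exception == null) {
            // Success: metadata carries the broker-assigned partition and offset
            System.out.printf("wrote to partition %d at offset %d%n",
                metadata.partition(), metadata.offset());
        } else if (exception instanceof org.apache.kafka.common.errors.RetriableException) {
            // Surfaced only after internal retries / delivery.timeout.ms are exhausted
            System.err.println("Retriable send failure: " + exception.getMessage());
        } else {
            // Non-retriable: route to a dead-letter topic, alert, or fail fast
            System.err.println("Permanent send failure: " + exception.getMessage());
        }
    });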
Next Steps
Continue exploring Kafka architecture with Kafka Consumer Architecture to understand how consumers process the data your producers send.
Related Content
- What is Kafka?
- Kafka Architecture
- Kafka Topic Architecture
- Kafka Consumer Architecture
- Kafka Producer Architecture
- Kafka Low Level Design
- Kafka and Schema Registry
- Kafka Ecosystem
- Kafka vs. JMS
- Kafka versus Kinesis
- Kafka Command Line Tutorial
- Kafka Failover Tutorial
- Kafka Producer Java Example
- Kafka Consumer Java Example
About Cloudurable
Master Kafka production deployments with expert guidance. Cloudurable provides Kafka training, Kafka consulting, Kafka support and helps set up Kafka clusters in AWS.
Check out our new GoLang course. We provide onsite, instructor-led Go Lang training.