November 6, 2024
This article appeared on LinkedIn on Feb 24th, 2018.
The Kafka Ecosystem - Kafka Core, Kafka Streams, Kafka Connect, Kafka REST Proxy, and the Schema Registry
Rick Hightower, Engineering Consultant focused on AI
February 24, 2018
The core of Kafka is the brokers, topics, logs, partitions, and cluster, along with related tools such as MirrorMaker. This is Kafka as it exists in the Apache project.
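As a rough mental model of these core pieces, the Python sketch below treats a topic as a fixed number of append-only partition logs. It routes each record to a partition by hashing its key and lets readers fetch from an offset. It is a toy, in-memory illustration, not how brokers actually store data. The `Topic` and `Record` classes are hypothetical, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner applies to record keys.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Record:
    key: str
    value: str
    offset: int


@dataclass
class Topic:
    """Toy model of a Kafka topic: N append-only partition logs (hypothetical class)."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self) -> None:
        # Each partition is an ordered, append-only list of records.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition, so per-key ordering is preserved.
        # Assumption: crc32 stands in for Kafka's murmur2-based partitioner.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def append(self, key: str, value: str) -> tuple:
        p = self.partition_for(key)
        log = self.partitions[p]
        # The offset is the record's position within its partition log.
        record = Record(key, value, offset=len(log))
        log.append(record)
        return p, record.offset

    def read(self, partition: int, from_offset: int = 0) -> list:
        # Consumers read sequentially, starting from an offset they track.
        return self.partitions[partition][from_offset:]


topic = Topic("video-events", num_partitions=3)
p1, o1 = topic.append("user-1", "watched:42")
p2, o2 = topic.append("user-1", "paused:42")
print([r.value for r in topic.read(p1)])  # ['watched:42', 'paused:42']
```

Because the partition is derived from the key, both events for `user-1` land in the same partition and keep their relative order. Kafka guarantees ordering within a partition, not across a whole topic.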
The Kafka ecosystem consists of Kafka Core, Kafka Streams, Kafka Connect, Kafka REST Proxy, and the Schema Registry. Kafka Streams and Kafka Connect ship with Apache Kafka, while the Kafka REST Proxy and the Schema Registry come from Confluent and are not part of the Apache project.
Kafka Streams is the Streams API for transforming, aggregating, and processing records from a stream and producing derivative streams. Kafka Connect is the connector API for creating reusable producers and consumers (e.g., a stream of changes from DynamoDB). The Kafka REST Proxy lets producers and consumers work with Kafka over REST (HTTP). The Schema Registry manages Avro schemas for Kafka records. Kafka MirrorMaker replicates data from one cluster to another.
Kafka Ecosystem: Diagram of Connect Source, Connect Sink, and Kafka Streams
A Kafka Connect source brings records into Kafka from an external system. A Kafka Connect sink delivers records from Kafka to an external system.
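Connectors are usually created by sending a JSON body with a `name` and a `config` map to the `POST /connectors` endpoint of a Kafka Connect worker. The sketch below only builds those bodies for the FileStream example connectors used in the Apache Kafka quickstart. The file names, topic name, and connector names are placeholders, and the helper `connector_request` is hypothetical.

```python
import json


def connector_request(name: str, config: dict) -> str:
    """Build (but do not send) the JSON body for POST /connectors (hypothetical helper)."""
    return json.dumps({"name": name, "config": config}, indent=2)


# Source: read lines from a local file and write each one as a record to a topic.
source_body = connector_request("local-file-source", {
    "connector.class": "FileStreamSource",
    "tasks.max": "1",
    "file": "test.txt",          # placeholder input file
    "topic": "connect-test",     # a source writes to a single topic
})

# Sink: read records from the topic and append them to another file.
sink_body = connector_request("local-file-sink", {
    "connector.class": "FileStreamSink",
    "tasks.max": "1",
    "file": "test.sink.txt",     # placeholder output file
    "topics": "connect-test",    # a sink takes a list of topics
})

print(source_body)
```

Each body would then be posted to a Connect worker, whose REST interface listens on port 8083 by default, for example with `curl -X POST -H "Content-Type: application/json" --data @source.json http://localhost:8083/connectors`. Neither connector contains custom producer or consumer code; reuse comes from configuration alone.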
Kafka Ecosystem: Kafka REST Proxy and Confluent Schema Registry
Kafka Streams - Kafka Streams for Stream Processing
The Kafka Streams API builds on core Kafka primitives and has a life of its own. Kafka Streams enables real-time processing of streams through stream processors. A stream processor takes continual streams of records from input topics, performs some processing, transformation, or aggregation on the input, and produces one or more output streams. For example, a video player application might take an input stream of events for videos watched and videos paused, output a stream of user preferences, and then generate new video recommendations based on recent user activity, or aggregate the activity of many users to see which new videos are hot. The Kafka Streams API solves hard problems such as handling out-of-order records, aggregating across multiple streams, joining data from multiple streams, supporting stateful computations, and more.
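Kafka Streams itself is a Java library. To show what a stateful stream processor does in a language-neutral way, the Python below takes the "hot videos" part of the example. It consumes a stream of events, keeps a per-video view count in a local state store (here a dict), and emits every updated count as an output record. This is similar in spirit to `groupByKey().count()` in the Kafka Streams DSL, but it is a plain-Python simulation, not that API. The event shape `(user_id, action, video_id)` is a hypothetical assumption.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape: (user_id, action, video_id)
Event = Tuple[str, str, str]


def hot_videos(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Stateful processor: count 'watched' events per video.

    Emits (video_id, new_count) every time a count changes, like a
    changelog stream for a table of counts.
    """
    state = defaultdict(int)  # local state store: video_id -> views
    for _user, action, video_id in events:
        if action != "watched":
            continue  # filter step: pauses are not views
        state[video_id] += 1  # aggregate step: update the running count
        yield video_id, state[video_id]


input_stream = [
    ("alice", "watched", "v1"),
    ("bob", "watched", "v1"),
    ("alice", "paused", "v1"),
    ("carol", "watched", "v2"),
]
for video_id, count in hot_videos(input_stream):
    print(video_id, count)
# v1 1
# v1 2
# v2 1
```

In a real deployment the input would come from a topic, the output would go to another topic, and the state store would be fault-tolerant. Those are the parts the Streams API handles for you.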
Kafka Ecosystem: Kafka Streams and Kafka Connect
Kafka Ecosystem Review
What is Kafka Streams?
Kafka Streams enables real-time processing of streams. It can aggregate across multiple streams, join data from multiple streams, perform stateful computations, and more. See also KSQL.
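Joining is one of the operations listed above. A stream-table join enriches each incoming event with the table's current value for the same key, and Kafka Streams exposes this as `KStream#join(KTable, ValueJoiner)`. The Python below only mimics the inner-join semantics in memory. The interleaved input list, the keys, and the values are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical input: ("table" | "stream", key, value), in arrival order
Update = Tuple[str, str, str]


def stream_table_join(
    updates: Iterable[Update],
    joiner: Callable[[str, str], str],
) -> Iterator[Tuple[str, str]]:
    """Inner stream-table join: each stream record is joined with the
    table's *current* value for its key; unmatched records are dropped."""
    table = {}
    for kind, key, value in updates:
        if kind == "table":
            table[key] = value  # table update: latest value per key wins
        elif key in table:
            yield key, joiner(value, table[key])
        # otherwise: no matching row yet, so the inner join emits nothing


arrivals = [
    ("stream", "alice", "watched:v1"),   # dropped: alice not in table yet
    ("table", "alice", "region=EU"),
    ("stream", "alice", "watched:v2"),
    ("table", "alice", "region=US"),
    ("stream", "alice", "watched:v3"),
]
for key, joined in stream_table_join(arrivals, lambda ev, prof: f"{ev}|{prof}"):
    print(key, joined)
# alice watched:v2|region=EU
# alice watched:v3|region=US
```

Note that the result depends on arrival order. An event is joined with whatever the table held at that moment, which is one reason ordering and time semantics matter in stream processing.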
What is Kafka Connect?
Kafka Connect is the connector API for creating reusable producers and consumers (e.g., a stream of changes from DynamoDB). Kafka Connect sources are sources of records, and Kafka Connect sinks are destinations for records.
What is the Schema Registry?
The Schema Registry stores and manages the Avro schemas used for Kafka records.
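To make "manages schemas" concrete, the sketch below shows two things. The first is the body used to register an Avro schema with `POST /subjects/<subject>/versions`, where the schema travels as a JSON string. The second is the framing that Confluent's serializers put in front of each Kafka message: a zero magic byte, then the 4-byte big-endian schema ID, then the Avro binary payload. The schema, the subject name `video-views-value`, and the schema ID `42` are hypothetical. The payload bytes are a hand-encoded Avro record in which each string is a zigzag-varint length followed by UTF-8 bytes.

```python
import json
import struct

MAGIC_BYTE = 0

# Hypothetical Avro record schema.
view_schema = {
    "type": "record",
    "name": "VideoView",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "video_id", "type": "string"},
    ],
}

# Body for POST /subjects/video-views-value/versions (hypothetical subject):
# the schema is sent as a JSON *string* inside the "schema" field.
register_body = json.dumps({"schema": json.dumps(view_schema)})


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an Avro-encoded payload with magic byte + 4-byte schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes) -> tuple:
    """Split a framed message back into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unknown magic byte {magic}")
    return schema_id, message[5:]


# Avro binary for {"user_id": "alice", "video_id": "v1"}:
# 0x0a = zigzag(5) then "alice", 0x04 = zigzag(2) then "v1".
payload = b"\x0aalice\x04v1"
msg = frame(42, payload)  # 42 is a hypothetical ID returned at registration
print(msg[:5].hex())  # 000000002a
```

Because only a small ID travels with each record, consumers look up the full schema in the registry (and typically cache it) instead of receiving it in every message.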
What is Kafka MirrorMaker?
Kafka MirrorMaker replicates data from one Kafka cluster to another.
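The original MirrorMaker works as a consumer attached to the source cluster, paired with a producer that writes to the target cluster. The toy loop below models each cluster as a dict of topic to records. It shows why the consumer's offsets are committed only after a batch has been copied, which gives at-least-once delivery. This is a sketch of the idea, not MirrorMaker's code, and `mirror_once` and its parameters are hypothetical.

```python
from typing import Dict, List

Cluster = Dict[str, List[str]]  # toy model: topic -> ordered records


def mirror_once(source: Cluster, target: Cluster, committed: Dict[str, int],
                batch_size: int = 100) -> int:
    """Copy new records for every source topic into the target cluster.

    `committed` holds the next offset to read per topic. It advances only
    after the batch has been written (at-least-once delivery).
    Returns the number of records copied.
    """
    copied = 0
    for topic, log in source.items():
        start = committed.get(topic, 0)
        batch = log[start:start + batch_size]       # "consume" from source
        target.setdefault(topic, []).extend(batch)  # "produce" to target
        committed[topic] = start + len(batch)       # "commit" source offsets
        copied += len(batch)
    return copied


source = {"orders": ["o1", "o2"], "clicks": ["c1"]}
target: Cluster = {}
offsets: Dict[str, int] = {}
print(mirror_once(source, target, offsets))  # 3
source["orders"].append("o3")
print(mirror_once(source, target, offsets))  # 1
print(target["orders"])  # ['o1', 'o2', 'o3']
```

If a crash happened between the "produce" and "commit" steps, the next run would copy that batch again. This is why mirrored data can contain duplicates unless extra measures are taken.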
When might you use Kafka REST Proxy?
The Kafka REST Proxy lets producers and consumers work with Kafka over REST (HTTP). You could use it to integrate existing code bases with Kafka without adding a native Kafka client.
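As an illustration of producing over HTTP, the sketch below builds (but does not send) a produce request in the style of the Confluent REST Proxy v2 API. It is a `POST /topics/<topic>` whose body holds a `records` array, with the `application/vnd.kafka.json.v2+json` content type for JSON-encoded values. The proxy address (the default port 8082 on localhost), the topic name, and the record contents are assumptions, and `produce_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumption: a REST Proxy running locally on its default port.
REST_PROXY = "http://localhost:8082"


def produce_request(topic: str, records: list) -> urllib.request.Request:
    """Build (but do not send) a REST Proxy v2 produce request (hypothetical helper)."""
    body = json.dumps({"records": records}).encode("utf-8")
    return urllib.request.Request(
        url=f"{REST_PROXY}/topics/{topic}",
        data=body,
        method="POST",
        headers={
            # Embedded JSON values, v2 API
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
    )


req = produce_request("video-events", [
    {"key": "alice", "value": {"action": "watched", "video": "v1"}},
])
print(req.get_method(), req.full_url)
# POST http://localhost:8082/topics/video-events
# Sending it against a live proxy would be: urllib.request.urlopen(req)
```

Only the standard library's HTTP tooling is needed, which is the point of the proxy for code bases that cannot easily take on a Kafka client dependency.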
Related content
- What is Kafka?
- Kafka Architecture
- Kafka Topic Architecture
- Kafka Consumer Architecture
- Kafka Producer Architecture
- Kafka Architecture and low-level design
- Kafka and Schema Registry
- Kafka and Avro
- Kafka Ecosystem
- Kafka vs. JMS
- Kafka versus Kinesis
- Kafka Tutorial: Using Kafka from the command line
- Kafka Tutorial: Kafka Broker Failover and Consumer Failover
- Kafka Tutorial
- Kafka Tutorial: Writing a Kafka Producer example in Java
- Kafka Tutorial: Writing a Kafka Consumer example in Java
- Kafka Architecture: Log Compaction
About the Author
Rick Hightower is a distinguished engineering consultant with a focus on Artificial Intelligence and data engineering technologies. With decades of experience in the tech industry, Rick has established himself as a thought leader in the fields of distributed systems, big data processing, and stream processing frameworks like Apache Kafka.
As a prolific writer and speaker, Rick has contributed numerous articles and tutorials on complex technical subjects, making them accessible to a wide audience of developers and engineers. His work on the Kafka ecosystem, as demonstrated in this article, showcases his deep understanding of modern data architectures and their practical applications.
Rick’s expertise extends beyond Kafka to encompass a broad range of technologies in the AI and data engineering space. He is known for his ability to bridge the gap between theoretical concepts and real-world implementations, helping organizations leverage cutting-edge technologies to solve complex business problems.
In addition to his consulting work, Rick is actively involved in the tech community, frequently participating in conferences, webinars, and workshops to share his knowledge and insights. His passion for technology and commitment to education have made him a valuable resource for both aspiring and seasoned professionals in the field.
Kafka Ecosystem Summary
Term | Definition | Component | Function |
---|---|---|---|
Kafka Core | The foundation of Apache Kafka | Brokers, topics, logs, partitions, cluster | Manages message storage and distribution |
Kafka Streams | API for stream processing | Stream processors | Transforms, aggregates, and processes data streams |
Kafka Connect | API for data integration | Connectors (Source and Sink) | Creates reusable data producers and consumers |
Kafka REST Proxy | HTTP interface for Kafka | REST API | Allows producers and consumers to interact via HTTP |
Schema Registry | Schema management tool | Avro schemas | Manages and stores schemas for Kafka records |
MirrorMaker | Replication tool | Replication mechanism | Replicates data between Kafka clusters |