Kinesis vs. Kafka

May 5, 2017


Kinesis works with streaming data, for example:

  • Stock prices
  • Game data (such as player scores)
  • Social network data
  • Geospatial data, such as Uber tracking where you are
  • IoT sensor data

Kafka works with streaming data too.

Kinesis Streams is like core Kafka, Kinesis Analytics is like Kafka Streams, and a Kinesis shard is like a Kafka partition.

They are similar and get used in similar use cases.

By default, Kinesis stores data for 24 hours, and you can increase that to up to 7 days.

By default, Kafka stores records for 7 days, and you can increase that until you run out of disk space. In fact, you can set retention by the size of the data or by time. You can even use log compaction, so that Kafka keeps only the latest record for each key in the log.
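
To make the retention and compaction rules concrete, here is a minimal in-memory sketch in Python. It only illustrates the semantics and is not how Kafka implements them: real brokers delete and compact whole log segments, leave the active segment alone, and handle deletion markers (tombstones). The `Record` class and both functions are hypothetical names. The 7-day default corresponds to the broker setting `log.retention.hours=168`, size-based retention to `log.retention.bytes`, and compaction is enabled per topic with `cleanup.policy=compact`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    offset: int       # position in the log
    key: str
    value: str
    timestamp: float  # seconds since the epoch
    size_bytes: int


SEVEN_DAYS = 7 * 24 * 3600  # Kafka's default retention (168 hours)


def apply_retention(log: List[Record], now: float,
                    retention_seconds: Optional[float] = None,
                    retention_bytes: Optional[int] = None) -> List[Record]:
    """Drop records older than retention_seconds, then drop the oldest
    records until the total size fits in retention_bytes.
    Simplified: real Kafka deletes whole segments, not single records."""
    kept = list(log)
    if retention_seconds is not None:
        kept = [r for r in kept if now - r.timestamp <= retention_seconds]
    if retention_bytes is not None:
        total = sum(r.size_bytes for r in kept)
        while kept and total > retention_bytes:
            total -= kept.pop(0).size_bytes  # oldest record goes first
    return kept


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record for each key, in original offset order."""
    latest = {}
    for r in log:
        latest[r.key] = r  # later offsets overwrite earlier ones
    return sorted(latest.values(), key=lambda r: r.offset)
```

Time-based and size-based retention both discard the oldest data first, while compaction discards superseded values regardless of age, which is why a compacted topic can serve as a "latest value per key" table.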

In Kinesis, data is stored in shards. In Kafka, data is stored in partitions.
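
Both systems decide where a record goes by hashing its key. As a sketch, assuming a stream whose shards split the hash key space evenly (as a freshly created stream does), Kinesis hashes the partition key with MD5 into a 128-bit integer and writes the record to the shard whose hash key range contains that integer. The helper names below are hypothetical.

```python
import hashlib
from typing import List, Tuple

MAX_HASH_KEY = 2 ** 128 - 1  # Kinesis hash keys are unsigned 128-bit integers


def even_shard_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Split the hash key space into equal, contiguous (start, end) ranges.
    Assumes an evenly split stream; resharding can produce uneven ranges."""
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = MAX_HASH_KEY if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains MD5(partition_key)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")  # 16 bytes -> 128-bit integer
    for shard_index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return shard_index
    raise ValueError("hash key falls outside every shard range")
```

Because the hash is deterministic, records with the same partition key always land on the same shard, which is what gives you per-key ordering. Kafka's default partitioner applies the same idea to keyed records, but hashes with murmur2 modulo the number of partitions.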

With Kinesis, data can be analyzed by AWS Lambda before it is sent to S3 or Redshift.
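
This transformation step is what Kinesis Firehose's data-transformation feature does with a Lambda function. Below is a minimal Python sketch of such a handler, assuming the Firehose transformation contract: each input record carries a `recordId` and base64-encoded `data`, and each output record must return the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and re-encoded `data`. The `score` field and the `high_score` flag are hypothetical, chosen to match the game data example earlier in this article.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode each Firehose record, tag high scores, and re-encode it."""
    output = []
    for record in event["records"]:
        record_id = record["recordId"]
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:  # covers bad base64 and invalid JSON
            output.append({"recordId": record_id,
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        # Hypothetical rule: drop records that have no numeric "score".
        if not isinstance(payload, dict) or not isinstance(payload.get("score"), (int, float)):
            output.append({"recordId": record_id,
                           "result": "Dropped",
                           "data": record["data"]})
            continue
        payload["high_score"] = payload["score"] >= 100  # hypothetical threshold
        # Newline-delimit records so they stay separable once written to S3.
        encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
        output.append({"recordId": record_id,
                       "result": "Ok",
                       "data": encoded.decode("ascii")})
    return {"records": output}
```

You can exercise the handler locally by passing a hand-built event dict and `None` for the context, since the handler never touches the context.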

With Kinesis you pay for what you use: you are billed per shard-hour (each shard provides a fixed amount of read and write capacity) plus per PUT payload unit.
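
Since capacity is bought per shard, sizing a stream is mostly arithmetic. The sketch below assumes the per-shard limits AWS documents for Kinesis Streams (writes up to 1 MB/s or 1,000 records/s, reads up to 2 MB/s); check the current limits before relying on them. The function name is hypothetical.

```python
import math

# Per-shard limits for Kinesis Streams (assumed; verify against current AWS docs)
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def shards_needed(write_mb_per_sec: float,
                  write_records_per_sec: float,
                  read_mb_per_sec: float) -> int:
    """Smallest shard count that satisfies every per-shard limit at once."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC),
        math.ceil(write_records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(read_mb_per_sec / READ_MB_PER_SEC),
    )


print(shards_needed(3.5, 2500, 6.0))  # 4: write bandwidth is the bottleneck
```

Whichever limit needs the most shards decides the count, so a stream of many tiny records can be record-bound even when its byte rate is low.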

Kinesis Analytics lets you run SQL-like queries on the data. Kafka Streams lets you perform functional aggregations and transformations.
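
Both tools are commonly used for windowed aggregations. As a plain-Python illustration of the idea (not either product's API), the sketch below counts events per key in fixed, non-overlapping (tumbling) time windows. In Kinesis Analytics you would express this as a SQL `GROUP BY` over a time window; in Kafka Streams, as a windowed count after `groupByKey()`. The function name and the `(timestamp, key)` event format are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(events: Iterable[Tuple[float, str]],
                           window_seconds: int) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start, key).

    Each event is a (timestamp_in_seconds, key) pair. A timestamp belongs to
    the window that starts at the nearest lower multiple of window_seconds.
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, key in events:
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "a"), (59, "a"), (60, "a"), (30, "b")]
print(tumbling_window_counts(events, 60))
```

A streaming engine does the same bookkeeping incrementally and also has to decide when a window is complete and how to treat late events, which this batch sketch ignores.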

Kafka is more flexible than Kinesis, but you have to manage your own clusters, which requires dedicated DevOps resources to keep them running.

Kinesis is sold as a service and does not require a DevOps team to keep it running, but you pay for that convenience. Depending on the use case, Kinesis can be faster to get started with and much easier than hiring staff to manage a Kafka cluster. The trade-offs are less flexibility and potentially higher cost. Which one fits better really depends on your use case and your volume of data.

We hope you enjoyed this article. Please provide feedback.


