Kafka

Kafka is an open-source event streaming and messaging platform, designed for distributed and fault-tolerant operation.

Architecture

flowchart TB
  Producer1
  Producer2
  subgraph Cluster
    subgraph Broker1
      subgraph TopicA
        direction LR
        TopicAPartition1
        TopicAPartition2
      end
      subgraph TopicB
        direction LR
        TopicBPartition1
        TopicBPartition2
      end
      subgraph TopicC
        direction LR
        TopicCPartition1
        TopicCPartition2
      end
    end
    subgraph Broker2
      subgraph TopicX
        direction LR
        TopicXPartition1
        TopicXPartition2
      end
      subgraph TopicY
        direction LR
        TopicYPartition1
        TopicYPartition2
      end
      subgraph TopicZ
        direction LR
        TopicZPartition1
        TopicZPartition2
      end
    end
  end
  subgraph ConsumerGroupA
    ConsumerGroupAConsumer1
    ConsumerGroupAConsumer2
  end
  subgraph ConsumerGroupB
    ConsumerGroupBConsumer1
    ConsumerGroupBConsumer2
  end
  Producer1-->Cluster
  Producer2-->Cluster
  ConsumerGroupA-->Cluster
  ConsumerGroupB-->Cluster

Concepts

  • Messages are the atomic units of data moving through the cluster. They're byte arrays; Kafka is agnostic to their format.
  • Topics categorise messages into segregated streams.
  • Partitions allow horizontal scalability by sharding topics. A message's partition can be chosen:
    • by the producer's default strategy (round-robin or sticky batching, depending on the client version), if no key, partition, or custom partitioning logic is specified;
    • explicitly, from a partition specified on each message;
    • by hashing a key specified on each message; or
    • by custom partitioning logic.
  • Brokers are the individual nodes which make up the Kafka cluster.
  • Producers produce messages which are published to the Kafka cluster.
  • Consumers read from partitions, in order. The last-read offset can be stored so that a consumer can resume where it left off after an interruption. It's possible to replay events by altering this offset value.
  • Consumer Groups are sets of consumers that share the work of reading a topic: each partition is assigned to one consumer in the group, so each consumer receives a proportion of the topic's messages. As consumers join and leave the group, the elected coordinator for the group reassigns partitions to consumers (rebalancing), creating a new generation of the group.
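
The partitioning and consumer-group ideas above can be sketched as a toy, in-memory model. This is not the Kafka client API: `ToyProducer`, `choose_partition`, and `assign_partitions` are hypothetical names invented for illustration, `zlib.crc32` stands in for the hash the real client uses, keyless messages are spread round-robin, and the assignment function is a simple round-robin approximation of what a rebalance produces.

```python
import zlib
from itertools import count
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical signature for custom partitioning logic: (key, num_partitions) -> partition
Partitioner = Callable[[Optional[bytes], int], int]


class ToyProducer:
    """Toy model of producer-side partition selection (not the real client)."""

    def __init__(self, num_partitions: int, partitioner: Optional[Partitioner] = None):
        self.num_partitions = num_partitions
        self.partitioner = partitioner
        self._round_robin = count()  # used only for keyless messages

    def choose_partition(self, key: Optional[bytes] = None,
                         partition: Optional[int] = None) -> int:
        if partition is not None:
            # A partition specified on the message wins outright.
            return partition
        if self.partitioner is not None:
            # Custom partitioning logic, if configured.
            return self.partitioner(key, self.num_partitions)
        if key is not None:
            # Same key -> same partition, which preserves per-key ordering.
            # crc32 is an assumption here, chosen only because it is in the stdlib.
            return zlib.crc32(key) % self.num_partitions
        # No key, partition, or custom logic: spread messages across partitions.
        return next(self._round_robin) % self.num_partitions


def assign_partitions(partitions: Iterable[int],
                      consumers: Iterable[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one consumer in the group (round-robin)."""
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


if __name__ == "__main__":
    producer = ToyProducer(num_partitions=4)
    print(producer.choose_partition(partition=3))
    print(producer.choose_partition(key=b"user-1"))
    print([producer.choose_partition() for _ in range(5)])

    # A "rebalance" is modelled as recomputing the assignment when membership changes.
    print(assign_partitions(range(4), ["c1", "c2"]))
    print(assign_partitions(range(4), ["c1", "c2", "c3"]))
```

Calling `assign_partitions` again after a consumer joins or leaves models a new generation of the group: every partition still has exactly one owner, but the owners may change.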
