Major Components of Apache Kafka
1. Producer
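- Applications that publish (write) messages to Kafka topics.
- Choose the target topic; the partition is either set explicitly or picked by the partitioner (for keyed messages, by hashing the key, so the same key always lands in the same partition).
- Can send data asynchronously (send and continue, handling the result later or in a callback) or synchronously (block until the broker acknowledges the write).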
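To illustrate keyed partitioning and async vs. sync sends, here is a minimal in-memory sketch in Python. It is not the Kafka client API: ToyProducer and choose_partition are hypothetical names, and CRC32 stands in for the murmur2 hash that Kafka's Java client applies to message keys. In the Java client, send() returns a Future and waiting on it makes the send synchronous; the sketch mimics that with a concurrent.futures.Future.

```python
import zlib
from concurrent.futures import Future, ThreadPoolExecutor


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Same key -> same partition, so per-key ordering is preserved.
    # Assumption: CRC32 replaces the murmur2 hash used by Kafka's Java client.
    return zlib.crc32(key) % num_partitions


class ToyProducer:
    """Hypothetical in-memory producer for a single topic."""

    def __init__(self, num_partitions: int) -> None:
        # One append-only list per partition.
        self.partitions: list[list[bytes]] = [[] for _ in range(num_partitions)]
        # A single worker thread serializes appends, so offsets stay consistent.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _append(self, key: bytes, value: bytes) -> tuple[int, int]:
        p = choose_partition(key, len(self.partitions))
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def send(self, key: bytes, value: bytes) -> Future:
        # Asynchronous: returns at once; the Future resolves to (partition, offset).
        return self._pool.submit(self._append, key, value)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


producer = ToyProducer(num_partitions=3)
future = producer.send(b"customer-42", b"order #1")  # fire and continue (async)
partition, offset = future.result()                   # wait for the result (sync)
print(f"written to partition {partition} at offset {offset}")
```

Calling result() immediately after send() is what turns the asynchronous call into a synchronous one; a real application would usually keep sending and check the futures (or callbacks) later.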
2. Consumer
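- Applications that subscribe to Kafka topics and read (consume) their messages.
- Belong to consumer groups → Kafka distributes a topic's partitions among the consumers in a group, so each partition is read by only one consumer of that group (parallel processing).
- Track offsets (positions in the partition log) and commit them, so a consumer knows which messages it has already processed and can resume after a restart.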
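Partition assignment within a group and offset tracking can both be sketched in a few lines of plain Python. This is a toy model, not the Kafka consumer API: range_assign, poll, and the committed dict are hypothetical. The assignment is modeled loosely on Kafka's range-style assignor, and the committed value follows Kafka's convention of storing the offset of the next record to read (the real client commits offsets to the internal __consumer_offsets topic).

```python
def range_assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give each partition to exactly one consumer in the group (range-style)."""
    if not consumers:
        return {}
    parts, members = sorted(partitions), sorted(consumers)
    per_member, extra = divmod(len(parts), len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` members take one additional partition each;
        # members beyond the partition count get an empty list (idle).
        count = per_member + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


# Hypothetical offset store: partition -> offset of the NEXT record to read.
committed: dict[int, int] = {}


def poll(log: list[str], partition: int, max_records: int = 2) -> list[str]:
    start = committed.get(partition, 0)          # resume from the last commit
    batch = log[start:start + max_records]
    # ... process the batch here ...
    committed[partition] = start + len(batch)    # commit only after processing
    return batch


print(range_assign([0, 1, 2], ["c1", "c2"]))  # {'c1': [0, 1], 'c2': [2]}
log = ["m0", "m1", "m2"]
print(poll(log, 0))  # ['m0', 'm1']
print(poll(log, 0))  # ['m2']
```

Committing after processing means a crash between the two steps causes the batch to be re-read (at-least-once behavior) rather than skipped.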
3. Topic
- A logical category (like a channel or stream) to which records are published.
- Example: "orders", "payments".
- Topics are split into partitions for scalability.
4. Partition
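- An ordered, append-only log within a topic; messages are stored in the sequence they were written.
- Each message gets an offset: a sequential ID that is unique only within its partition.
- Multiple partitions = parallelism + higher throughput; ordering is guaranteed only within a partition, not across the whole topic.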
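Because offsets are assigned per partition, two records in the same topic can share an offset; only the (topic, partition, offset) triple identifies a record uniquely. A minimal in-memory sketch, using hypothetical ToyTopic and RecordId names that are not part of any Kafka API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordId:
    # A record is uniquely identified by all three coordinates together.
    topic: str
    partition: int
    offset: int


class ToyTopic:
    """Hypothetical topic: a name plus N independent append-only logs."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.logs: list[list[str]] = [[] for _ in range(num_partitions)]

    def append(self, partition: int, value: str) -> RecordId:
        log = self.logs[partition]
        log.append(value)
        # Offsets are assigned per partition, starting at 0.
        return RecordId(self.name, partition, len(log) - 1)


orders = ToyTopic("orders", num_partitions=2)
print(orders.append(0, "o-1"))  # RecordId(topic='orders', partition=0, offset=0)
print(orders.append(1, "o-2"))  # RecordId(topic='orders', partition=1, offset=0)
print(orders.append(0, "o-3"))  # RecordId(topic='orders', partition=0, offset=1)
```

Within partition 0, "o-1" is guaranteed to come before "o-3", but no order is defined between "o-2" and the records of partition 0.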
5. Broker
- A Kafka server that stores topic partitions.
- Handles requests from producers (writes) and consumers (reads).
- A cluster usually has multiple brokers (for scalability and fault tolerance).
6. Cluster
- A group of brokers working together.
- A topic's partitions are distributed across the brokers, and each partition is replicated to several brokers for reliability.
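How partitions and their replicas might be spread over brokers can be sketched with a simple rotation. This is a simplified, hypothetical scheme (place_replicas is not a Kafka function); Kafka's actual assignment also randomizes the starting broker and can be rack-aware, but the goal is the same: no two replicas of a partition on the same broker, and leadership spread across brokers.

```python
def place_replicas(num_partitions: int, brokers: list[int],
                   replication_factor: int) -> dict[int, list[int]]:
    if replication_factor > len(brokers):
        # Kafka likewise rejects a replication factor larger than the broker count.
        raise ValueError("replication factor exceeds number of brokers")
    placement: dict[int, list[int]] = {}
    for p in range(num_partitions):
        # Rotate the starting broker per partition; the first replica in each
        # list acts as the preferred leader, so leadership is spread around.
        placement[p] = [brokers[(p + j) % len(brokers)] for j in range(replication_factor)]
    return placement


print(place_replicas(3, brokers=[1, 2, 3], replication_factor=2))
# {0: [1, 2], 1: [2, 3], 2: [3, 1]}
```

If broker 1 fails in this layout, partition 0 still has a copy on broker 2 and partition 2 still has a copy on broker 3.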
7. ZooKeeper (legacy; used before KRaft and removed in Kafka 4.0)
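- Stores cluster metadata (brokers, topics, configurations) and handles coordination, including electing the controller.
- 📌 Newer Kafka (KRaft mode, production-ready since Kafka 3.3) removes ZooKeeper → Kafka manages metadata internally in a Raft-based quorum of controller nodes.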
8. Controller
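- A special node that manages partition leadership and handles failover: a single broker in ZooKeeper mode, or the active member of the controller quorum in KRaft mode.
- If a broker fails, the controller elects a new leader for each affected partition from that partition's ISR.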
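The failover rule can be sketched as a small function. This is a simplified, hypothetical model (elect_leader is not a Kafka API), assuming the controller already knows each partition's assigned replica list, its ISR, and which brokers are alive; it follows the general rule of picking the first live in-sync replica in assignment order.

```python
from typing import Optional


def elect_leader(replicas: list[int], isr: set[int],
                 live_brokers: set[int]) -> Optional[int]:
    # Walk the assigned replicas in order; the first one that is both alive
    # and in the ISR becomes the new leader.
    for broker in replicas:
        if broker in isr and broker in live_brokers:
            return broker
    # No eligible replica: the partition stays offline unless "unclean"
    # election from out-of-sync replicas is allowed (which risks data loss).
    return None


# Broker 1 (the old leader) has failed; broker 2 is alive but lagging (not in ISR).
print(elect_leader(replicas=[1, 2, 3], isr={1, 3}, live_brokers={2, 3}))  # 3
```

Restricting candidates to the ISR is what makes the switch safe: an out-of-sync replica could be missing acknowledged writes. In Kafka, allowing such replicas to lead is controlled by unclean.leader.election.enable, which is off by default.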
9. Log (Commit Log / Message Store)
- Each partition is a commit log (stored on disk as segment files) to which records are appended.
- Data is immutable and retained for a configured period (7 days by default).
- Consumers read sequentially using offsets.
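A toy model of an append-only, offset-addressed log with time-based retention. All names are hypothetical, and timestamps are passed explicitly in seconds so the example is deterministic. One deliberate simplification: Kafka expires whole segment files, while this sketch drops individual records.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Hypothetical single-partition log with time-based retention."""
    retention_s: float
    start_offset: int = 0  # offset of the oldest retained record
    records: list[tuple[float, str]] = field(default_factory=list)  # (timestamp, value)

    def append(self, value: str, ts: float) -> int:
        # Append-only: records are never modified, only added at the end.
        self.records.append((ts, value))
        return self.start_offset + len(self.records) - 1  # offset of the new record

    def read(self, offset: int, max_records: int = 10) -> list[str]:
        i = offset - self.start_offset
        if i < 0:
            raise ValueError(f"offset {offset} was already deleted by retention")
        return [value for _, value in self.records[i:i + max_records]]

    def enforce_retention(self, now: float) -> None:
        # Drop expired records from the head; offsets are never reused.
        while self.records and now - self.records[0][0] > self.retention_s:
            self.records.pop(0)
            self.start_offset += 1


WEEK = 7 * 24 * 3600
log = ToyLog(retention_s=WEEK)
log.append("a", ts=0)
log.append("b", ts=100)
log.enforce_retention(now=WEEK + 50)  # "a" is older than 7 days, "b" is not
print(log.start_offset, log.read(1))  # 1 ['b']
```

Note that deleting "a" does not renumber "b": it keeps offset 1, which is why a consumer's stored offset remains valid across retention runs.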
10. Replication & ISR (In-Sync Replicas)
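- Kafka replicates each partition across several brokers for durability: one replica is the leader, the others are followers.
- ISR = the leader plus the followers that are caught up with it (within a configured lag time) → only ISR members are eligible to become leader, which makes failover safe.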
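In practice, "caught up" is judged by time rather than by zero lag: a follower stays in the ISR as long as it has fully caught up with the leader recently enough. A minimal sketch, assuming the leader records when each follower was last fully caught up; compute_isr is a hypothetical name, and max_lag_ms plays the role of the broker setting replica.lag.time.max.ms (the 30-second value here is illustrative).

```python
def compute_isr(leader: int, last_caught_up_ms: dict[int, int],
                now_ms: int, max_lag_ms: int = 30_000) -> set[int]:
    """Return the ISR: the leader plus followers that were fully caught up
    with the leader's log at some point within the last max_lag_ms."""
    in_sync = {leader}
    for follower, caught_up_at in last_caught_up_ms.items():
        if now_ms - caught_up_at <= max_lag_ms:
            in_sync.add(follower)
    return in_sync


# Follower 2 caught up 5 s ago; follower 3 has been behind for 45 s.
print(sorted(compute_isr(leader=1, last_caught_up_ms={2: 95_000, 3: 55_000}, now_ms=100_000)))
# [1, 2]
```

Using a time bound means a follower that is briefly a few messages behind during a burst of writes is not dropped, while one that stops fetching is.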
11. Kafka Connect
- A framework to integrate Kafka with external systems (DBs, Elasticsearch, S3, etc.) using source connectors (data into Kafka) and sink connectors (data out of Kafka).
12. Kafka Streams / ksqlDB
- Kafka Streams → a Java library for building real-time stream processing apps.
- ksqlDB → SQL-like interface for querying and transforming Kafka streams.
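To give a feel for what a stream-processing app computes, here is the logic of a per-key running count, the kind of table a Kafka Streams topology such as stream.groupByKey().count() maintains, written in plain Python. It illustrates the idea only: the real library is Java and handles state stores, fault tolerance, and repartitioning, none of which is modeled here; running_counts is a hypothetical name.

```python
from collections import Counter
from collections.abc import Iterable, Iterator


def running_counts(events: Iterable[tuple[str, str]]) -> Iterator[tuple[str, int]]:
    """For each (key, value) event, update per-key state and emit the new count,
    like the stream of updates to a continuously maintained table."""
    table: Counter[str] = Counter()
    for key, _value in events:
        table[key] += 1
        yield key, table[key]


orders = [("alice", "o-1"), ("bob", "o-2"), ("alice", "o-3")]
print(list(running_counts(orders)))  # [('alice', 1), ('bob', 1), ('alice', 2)]
```

Each incoming event produces an updated result immediately instead of waiting for a batch, which is the core difference between stream processing and periodic batch queries.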
🔹 Quick Diagram (Textual)
Producers ---> [ Topic: orders ]
                   | Partition-0 [Leader+Replicas]
                   | Partition-1 [Leader+Replicas]
                   | Partition-2 [Leader+Replicas]
Consumers <------- |
- Producers write messages into topics (partitions).
- Brokers store them.
- Consumers read them (via consumer groups).
- Replication + ISR ensure fault tolerance.
✅ In short:
Kafka’s major components are: Producers, Consumers, Topics, Partitions, Brokers, Cluster, Controller, ZooKeeper (legacy), Logs, Replication/ISR, Kafka Connect, and Kafka Streams/ksqlDB.