What is a Partition in Kafka?

  • A Partition is the basic unit of storage and parallelism in Kafka.

  • Each Kafka topic is split into one or more partitions.

  • A partition is essentially an append-only log (stored on disk as a series of segment files) where messages are kept in the order they arrive.

👉 Each message in a partition is identified by a sequential offset that is unique within that partition (like a line number).
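
To make the "append-only log with offsets" idea concrete, here is a minimal in-memory sketch. It is a toy model rather than how Kafka stores data, and the class name PartitionLog and its methods are hypothetical names chosen for illustration.

```python
class PartitionLog:
    """Toy in-memory model of one partition: an append-only list of records."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record to the end of the log and return its offset."""
        offset = len(self._records)  # next offset = current log length
        self._records.append(value)
        return offset

    def read(self, offset, max_records=10):
        """Return (offset, value) pairs starting at the given offset."""
        end = min(offset + max_records, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]


log = PartitionLog()
first = log.append("order-created")
second = log.append("order-paid")
print(first, second)  # offsets are handed out sequentially
print(log.read(1))    # read everything from offset 1 onwards
```

Existing records are never modified or reordered; a reader only needs to remember the next offset it wants.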


🔹 Why Partitions?

  1. Scalability → Multiple partitions allow a topic to be spread across brokers → higher throughput.

  2. Parallelism → Different consumers in a group can read different partitions in parallel; each partition is read by at most one consumer in the group.

  3. Ordering guarantee → Kafka guarantees ordering only within a single partition (not across partitions).
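
The parallelism point can be sketched as a simple assignment of partitions to the consumers in a group. This is a simplified round-robin illustration only; Kafka's real assignors are more involved, and assign_partitions is a hypothetical helper name.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers so each partition has exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]  # rotate through the consumers
        assignment[owner].append(partition)
    return assignment


# 3 partitions, 2 consumers: one consumer ends up with two partitions.
two_consumers = assign_partitions([0, 1, 2], ["c1", "c2"])
print(two_consumers)

# 3 partitions, 4 consumers: the extra consumer has nothing to read.
four_consumers = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
print(four_consumers)
```

The second call shows why the partition count caps the useful size of a consumer group: consumers beyond the number of partitions sit idle.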


🔹 Example

Topic orders with 3 partitions:

orders:
   Partition-0: [0,1,2,3,...]
   Partition-1: [0,1,2,3,...]
   Partition-2: [0,1,2,3,...]

  • Each partition is stored on a broker; different partitions can live on different brokers.

  • Messages inside Partition-0 have offsets 0,1,2… in order.

  • Consumers read from the specific partitions assigned to them and track their position in each one by offset.


🔹 Partition Assignment (How Kafka decides where a message goes)

When a producer sends a message, the partition is chosen in one of three ways:

  1. Keyed messages:

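    • Partition = hash(key) % numPartitions (the default Java partitioner uses a murmur2 hash of the serialized key bytes).

    • Ensures messages with the same key go to the same partition → preserves order for that key, as long as the number of partitions does not change.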

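  2. No key (round-robin or sticky):

    • Messages are spread evenly across partitions. Older clients rotate through partitions message by message; since Kafka 2.4 the default partitioner is "sticky": it fills a batch for one partition and then moves on to another, which still balances load over time.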

  3. Custom partitioner:

    • The producer can supply its own logic (in the Java client, by implementing the Partitioner interface and setting partitioner.class).
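
The first two rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions: zlib.crc32 stands in for the hash (Kafka's Java client uses murmur2, so real partition numbers will differ), unkeyed messages use plain round-robin, and SketchPartitioner is a hypothetical name.

```python
import itertools
import zlib


class SketchPartitioner:
    """Toy partitioner: hash keyed messages, rotate unkeyed ones."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        # Endless 0, 1, ..., n-1, 0, 1, ... cycle for messages without a key.
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition(self, key=None):
        if key is None:
            return next(self._round_robin)
        # Assumption: crc32 stands in for Kafka's murmur2 hash of the key bytes.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=3)

keyed_first = partitioner.partition("customer-42")
keyed_second = partitioner.partition("customer-42")
print(keyed_first == keyed_second)  # same key, same partition

unkeyed = [partitioner.partition() for _ in range(4)]
print(unkeyed)  # rotates through the partitions
```

Because the result depends on num_partitions, adding partitions to a topic later can send an existing key to a different partition.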


🔹 Replication of Partitions

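  • Each partition is replicated across brokers for fault tolerance; the number of copies is the topic's replication factor.

  • One replica = Leader (handles all writes and, by default, all reads).

  • Others = Followers (replicate data from the leader; followers that are caught up belong to the ISR, the in-sync replica set, together with the leader).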

Example:
Partition-0 (Replication Factor = 3):

  • Leader on Broker-1

  • Followers on Broker-2 and Broker-3
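
A rough sketch of what replication buys you: if the leader's broker fails, an in-sync follower can take over. This is a toy model, not Kafka's actual election logic (which is handled by the cluster controller); the class and method names are hypothetical.

```python
class ReplicatedPartition:
    """Toy model of one partition's replicas, leader, and in-sync set (ISR)."""

    def __init__(self, replicas):
        self.replicas = list(replicas)  # brokers holding a copy, in assignment order
        self.leader = self.replicas[0]  # first replica starts as leader
        self.isr = set(self.replicas)   # everyone starts in sync

    def fall_behind(self, broker):
        """A follower that lags too far is dropped from the ISR."""
        if broker != self.leader:
            self.isr.discard(broker)

    def fail(self, broker):
        """Remove a failed broker; if it was the leader, promote an in-sync replica."""
        self.isr.discard(broker)
        if broker == self.leader:
            candidates = [b for b in self.replicas if b in self.isr]
            self.leader = candidates[0] if candidates else None


partition0 = ReplicatedPartition(["broker-1", "broker-2", "broker-3"])
partition0.fall_behind("broker-2")  # broker-2 is lagging
partition0.fail("broker-1")         # the leader's broker goes down
print(partition0.leader)            # an in-sync follower takes over
```

Only replicas still in the ISR are considered here, which is why a lagging follower is skipped even though it appears earlier in the replica list.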


🔹 Benefits of Partitions

✅ Enable horizontal scaling of Kafka.
✅ Provide parallelism (consumers read different partitions).
✅ Ensure data durability with replication.
✅ Allow ordering guarantees per key.


🔹 In Short

A Partition in Kafka is:

  • An ordered, immutable log of messages within a topic.

  • Identified by sequential offsets.

  • The key to scalability, parallelism, and fault-tolerance in Kafka.

  • The only scope in which message ordering is guaranteed.
