How does Apache Kafka prevent duplicates?


πŸ”Ή 1. Why Duplicates Happen in Kafka

  • Producer retries: if a producer doesn’t get an ack (network issue, timeout), it may resend the same message.
  • Broker failures: if a leader fails after writing a message but before acknowledging it, the producer may resend.
  • Consumer retries: if a consumer crashes after processing a message but before committing its offset, it will reprocess that message on restart.

πŸ”Ή 2. Producer-Side Guarantees

a) Idempotent Producer (no duplicates on retries)

  • Kafka provides idempotent producers (enable.idempotence=true).
  • Each message gets a Producer ID (PID) and sequence number.
  • The broker detects sequence numbers it has already received and discards the duplicates.

πŸ‘‰ Prevents duplicates caused by retries.

props.put("enable.idempotence", "true");  // default true since Kafka 3.0
props.put("acks", "all");
props.put("retries", Integer.MAX_VALUE);

βœ… Guarantees exactly-once delivery within a single partition for a single producer session.
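
For context, a minimal end-to-end sketch; the bootstrap address, topic name, and record contents here are placeholders, not part of any fixed recipe:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // placeholder address
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());
props.put("enable.idempotence", "true");
props.put("acks", "all");
props.put("retries", Integer.MAX_VALUE);

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    // A retry after a lost ack is deduplicated by the broker
    // using the producer ID (PID) + sequence number.
    producer.send(new ProducerRecord<>("orders", "order-1", "created"));
}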


b) Transactions (Exactly-Once Semantics – EOS)

  • Needed for multi-partition writes and for producer+consumer (read-process-write) flows such as stream processing.
  • The producer groups its writes into a transaction: either all of them become visible or none are.
  • Set a stable transactional ID: props.put("transactional.id", "txn-producer-1");
  • Call initTransactions(), then beginTransaction(), send(), and finally commitTransaction() or abortTransaction(), as in the sketch below.
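
A minimal sketch of that flow, assuming illustrative topic names and serializer settings:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());
props.put("transactional.id", "txn-producer-1");   // also enables idempotence

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();                        // registers the transactional.id with the broker
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("orders", "order-1", "created"));
    producer.send(new ProducerRecord<>("payments", "order-1", "pending"));
    producer.commitTransaction();                   // both writes become visible atomically
} catch (ProducerFencedException e) {
    producer.close();                               // fatal: another producer took over this transactional.id
} catch (KafkaException e) {
    producer.abortTransaction();                    // neither write is visible to read_committed consumers
}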

βœ… Prevents duplicates in end-to-end pipelines.


πŸ”Ή 3. Broker-Side Settings

  • acks=all β†’ the leader waits for all in-sync replicas (ISR) before acknowledging, reducing β€œlost ack β†’ resend” situations caused by leader failures.
  • min.insync.replicas β†’ the minimum number of replicas that must confirm a write before it is acknowledged, ensuring durability before the ack.
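
A common pairing of these settings, assuming topics replicated across three brokers (the replication factor is an assumption here):

# broker or per-topic configuration
default.replication.factor=3     # assumed replication factor
min.insync.replicas=2            # at least 2 replicas must confirm each write

# producer configuration
acks=all                         # wait for all in-sync replicas before acknowledging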

πŸ”Ή 4. Consumer-Side Handling

Even with idempotence, consumers may reprocess messages if they:

  • Fail after processing but before offset commit.

Solutions:

  • Use idempotent operations in downstream systems (e.g., UPSERT instead of INSERT).
  • Use transactions (EOS) with Kafka consumers (isolation.level=read_committed).
  • Store processed message IDs (deduplication store) β€” useful for external DB sinks.
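
A minimal consumer sketch combining these ideas; upsertOrder() stands in for a hypothetical idempotent write into the downstream database:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "order-processor");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
props.put("enable.auto.commit", "false");            // commit manually, after processing
props.put("isolation.level", "read_committed");      // skip records from aborted transactions

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("orders"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        upsertOrder(record.key(), record.value());   // hypothetical UPSERT: reprocessing is harmless
    }
    consumer.commitSync();                           // offsets advance only after processing succeeded
}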

πŸ”Ή 5. Exactly-Once in Kafka Streams

  • Kafka Streams API supports exactly-once processing with transactions.
  • Config:
processing.guarantee=exactly_once_v2
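
A minimal Streams sketch with EOS enabled; the application ID and topic names are placeholders:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-normalizer");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> orders = builder.stream("orders");
orders.mapValues(v -> v.toUpperCase())               // read-process-write runs inside Kafka transactions
      .to("orders-normalized");

new KafkaStreams(builder.build(), props).start();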

πŸ”Ή In Short

Kafka prevents duplicates using:

  1. Idempotent Producer β†’ prevents resend duplicates.
  2. Transactions (EOS) β†’ atomic multi-partition + end-to-end deduplication.
  3. Broker acks + replication β†’ ensure writes are durable before they are acknowledged.
  4. Consumer design β†’ commit offsets after processing, or use EOS with read_committed.

πŸ‘‰ With the right configs, Kafka can achieve exactly-once semantics (EOS) across the whole pipeline.
