How does Apache Kafka prevent duplicates?
🔹 1. Why Duplicates Happen in Kafka
- Producer retries: if a producer doesn't receive an ack (network issue, timeout), it may resend the same message.
- Broker failures: if a leader fails after writing but before acking, clients may resend.
- Consumer retries: if a consumer crashes after processing but before committing its offset, it will reprocess the same message.
🔹 2. Producer-Side Guarantees
a) Idempotent Producer (no duplicates on retries)
- Kafka provides idempotent producers (enable.idempotence=true).
- Each message gets a Producer ID (PID) and sequence number.
- The broker detects when it has already seen a (PID, sequence number) pair and discards the duplicate.
👉 Prevents duplicates caused by retries.
props.put("enable.idempotence", "true"); // default true since Kafka 3.0
props.put("acks", "all");
props.put("retries", Integer.MAX_VALUE);
✅ Guarantees exactly-once delivery within a single partition, per producer session.
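To make this concrete, here is a minimal idempotent-producer sketch; the broker address, topic name, and record contents are illustrative assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class IdempotentProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("enable.idempotence", "true"); // broker dedupes by (PID, sequence number)
        props.put("acks", "all");                // required for idempotence
        props.put("retries", Integer.MAX_VALUE); // safe: retries can no longer create duplicates

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Even if this send is retried internally after a lost ack,
            // the broker discards the duplicate by sequence number.
            producer.send(new ProducerRecord<>("orders", "order-1", "created"));
        }
    }
}
```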
b) Transactions (Exactly-Once Semantics, EOS)
- For multi-partition or producer+consumer flows.
- Producer groups writes into a transaction.
- Either all writes succeed or none are visible.
- Works with read-process-write scenarios (like stream processing).
props.put("transactional.id", "txn-producer-1");
- Call initTransactions(), then beginTransaction() and send(), and finish with commitTransaction() or abortTransaction(), as in the sketch below.
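A minimal transactional flow might look like the following; the broker address, topic names, and payloads are assumptions for illustration:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.ProducerFencedException;

public class TransactionalProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("transactional.id", "txn-producer-1");  // also turns on idempotence

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions(); // register with the transaction coordinator
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("payments", "p-1", "debit"));
            producer.send(new ProducerRecord<>("ledger", "p-1", "entry")); // second topic, same txn
            producer.commitTransaction(); // both records become visible atomically
        } catch (ProducerFencedException e) {
            producer.close(); // another producer with the same transactional.id took over
        } catch (KafkaException e) {
            producer.abortTransaction(); // neither record becomes visible
        } finally {
            producer.close();
        }
    }
}
```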
✅ Prevents duplicates in end-to-end pipelines.
🔹 3. Broker-Side Settings
- acks=all → the broker acks only after all in-sync replicas (ISR) have the write → prevents "lost ack → resend duplicate" issues.
- min.insync.replicas → ensures enough replicas are in sync before the broker acks, guaranteeing durability.
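One way to apply the durability side is at topic-creation time; here is a sketch using Kafka's AdminClient, where the broker address, topic name, and replica counts are assumptions:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class DurableTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 3, min.insync.replicas=2:
            // with acks=all, a write is acked only once 2 in-sync replicas have it.
            NewTopic topic = new NewTopic("orders", 3, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```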
🔹 4. Consumer-Side Handling
Even with producer idempotence, a consumer may reprocess messages if it fails after processing but before committing its offset.
Solutions:
- Use idempotent operations in downstream systems (e.g., UPSERT instead of INSERT).
- Use transactions (EOS) with Kafka consumers (isolation.level=read_committed).
- Store processed message IDs (a deduplication store) → useful for external DB sinks. A sketch combining both follows.
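For illustration, here is a consumer sketch that reads only committed records, deduplicates by message key, and commits offsets only after processing; the broker address, topic, group id, and the in-memory set (standing in for a durable dedup store) are all assumptions:

```java
import java.time.Duration;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DedupConsumerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "orders-processor");        // assumed consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");          // commit only after processing succeeds
        props.put("isolation.level", "read_committed");    // skip records from aborted transactions

        // Stands in for a durable dedup store (e.g., a DB table keyed by message ID).
        Set<String> processedIds = new HashSet<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    if (!processedIds.add(record.key())) {
                        continue; // already processed: this is a redelivery, skip it
                    }
                    process(record); // hypothetical business logic
                }
                consumer.commitSync(); // offsets advance only after processing
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.printf("processing %s=%s%n", record.key(), record.value());
    }
}
```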
🔹 5. Exactly-Once in Kafka Streams
- Kafka Streams API supports exactly-once processing with transactions.
- Config: processing.guarantee=exactly_once_v2
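A minimal read-process-write topology with EOS enabled might look like this; the application id, broker address, and topic names are assumptions:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class EosStreamsDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-eos-app");     // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Consume, process, and produce are committed together in one transaction.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");
        orders.mapValues(v -> v.toUpperCase()).to("orders-processed");

        new KafkaStreams(builder.build(), props).start(); // runs until the JVM exits
    }
}
```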
🔹 In Short
Kafka prevents duplicates using:
- Idempotent Producer → prevents resend duplicates.
- Transactions (EOS) → atomic multi-partition writes + end-to-end deduplication.
- Broker acks + replication → ensures consistency.
- Consumer design → commit offsets after processing, or use EOS with read_committed.
👉 With the right configs, Kafka can achieve exactly-once semantics (EOS) across the whole pipeline.