How does Apache Kafka prevent duplicates?
Short answer: you can’t 100% prevent duplicates with a plain Kafka consumer.
Kafka guarantees at-least-once delivery by default. To avoid processing an event twice you must either (a) use Kafka's exactly-once semantics (EOS) when you produce results back to Kafka, or (b) make your consumer idempotent, so reprocessing is harmless. Here are the practical paths:
Option A — Kafka “Exactly-Once” (read → process → write to Kafka)
Use a transactional, idempotent producer and commit the consumer offsets in the same transaction as the output writes.
Key configs
- Producer: enable.idempotence=true, acks=all, and a stable transactional.id (unique per app instance).
- Consumer: isolation.level=read_committed, enable.auto.commit=false.
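As a sketch, those settings map onto the consumerProps and producerProps used in the flow below roughly like this (broker address, serializers, group id, and transactional.id are placeholder values):

// Producer: idempotent + transactional
Properties producerProps = new Properties();
producerProps.put("bootstrap.servers", "localhost:9092");
producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
producerProps.put("enable.idempotence", "true");
producerProps.put("acks", "all");
producerProps.put("transactional.id", "my-app-instance-1"); // stable and unique per app instance

// Consumer: only reads committed data; offsets are committed via the producer's transaction
Properties consumerProps = new Properties();
consumerProps.put("bootstrap.servers", "localhost:9092");
consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
consumerProps.put("group.id", "my-consumer-group");
consumerProps.put("isolation.level", "read_committed");
consumerProps.put("enable.auto.commit", "false");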
Flow (Java, high-level)
var consumer = new KafkaConsumer<String, String>(consumerProps);
var producer = new KafkaProducer<String, String>(producerProps);
consumer.subscribe(List.of("in-topic")); // source topic (name is illustrative)
producer.initTransactions();

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    if (records.isEmpty()) continue;

    producer.beginTransaction();
    try {
        for (ConsumerRecord<String, String> r : records) {
            // process r and write the result inside the transaction
            producer.send(new ProducerRecord<>("out-topic", r.key(), transform(r.value())));
        }
        // atomically commit the consumed offsets together with the writes;
        // currentOffsets must map each partition to (last processed offset + 1)
        Map<TopicPartition, OffsetAndMetadata> offsets = currentOffsets(records);
        producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
        producer.commitTransaction();
    } catch (Exception e) {
        producer.abortTransaction(); // safe rollback; neither the writes nor the offsets become visible
        // then reset the consumer to its last committed offsets so the batch is reprocessed
    }
}
This gives EOS for read→process→write pipelines. Downstream consumers must set isolation.level=read_committed to avoid seeing aborted writes. Alternatively, use Kafka Streams with processing.guarantee=exactly_once_v2 for the same guarantee with less code.
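A minimal Kafka Streams sketch under the same assumptions (placeholder application id, broker address, and topic names; transform stands in for your processing step):

Properties streamsProps = new Properties();
streamsProps.put(StreamsConfig.APPLICATION_ID_CONFIG, "eos-transform-app");   // placeholder
streamsProps.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
streamsProps.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

StreamsBuilder builder = new StreamsBuilder();
builder.stream("in-topic", Consumed.with(Serdes.String(), Serdes.String()))
       .mapValues(value -> transform(value))                                  // your processing step
       .to("out-topic", Produced.with(Serdes.String(), Serdes.String()));

KafkaStreams streams = new KafkaStreams(builder.build(), streamsProps);
streams.start(); // Streams manages the transactions, offset commits, and recovery internally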
Option B — Idempotent consumer (read → process to external system)
When you write to a DB/HTTP service/etc., you can’t use Kafka transactions across systems. Make the effect idempotent and commit offsets only after the effect is durably recorded.
Patterns that work
- Upsert by key in the DB (e.g., INSERT ... ON CONFLICT UPDATE), or enforce business keys with unique constraints.
- Keep a processed_events table keyed by eventId (or (topic, partition, offset)) and process in this order (see the sketch after this list):
  - begin DB tx
  - apply effect (idempotent)
  - mark processed id
  - commit DB tx
  - commitSync offsets
- Inbox/Outbox: write message to an “inbox” table and process from there transactionally.
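A minimal JDBC sketch of the processed_events pattern, a variant of the applyIdempotentEffect call used in the loop below that takes the Connection explicitly. It assumes PostgreSQL-style ON CONFLICT, a processed_events(event_id) table, a hypothetical account_balance table for the business effect, and that the event id travels as the record key:

// Hypothetical helper; all table and column names are illustrative.
void applyIdempotentEffect(Connection db, ConsumerRecord<String, String> r) throws SQLException {
    db.setAutoCommit(false);
    try {
        // 1. Record the event id; a conflict means this event was already processed.
        try (PreparedStatement dedup = db.prepareStatement(
                "INSERT INTO processed_events (event_id) VALUES (?) ON CONFLICT DO NOTHING")) {
            dedup.setString(1, r.key()); // assumes the event id is the record key
            if (dedup.executeUpdate() == 0) { // duplicate delivery: skip the effect
                db.rollback();
                return;
            }
        }
        // 2. Apply the business effect; written as an upsert so it is itself idempotent.
        try (PreparedStatement upsert = db.prepareStatement(
                "INSERT INTO account_balance (account_id, balance) VALUES (?, ?) " +
                "ON CONFLICT (account_id) DO UPDATE SET balance = EXCLUDED.balance")) {
            upsert.setString(1, r.key());
            upsert.setLong(2, Long.parseLong(r.value())); // assumes the value is a numeric balance
            upsert.executeUpdate();
        }
        db.commit(); // effect and processed marker become durable atomically
    } catch (SQLException e) {
        db.rollback();
        throw e; // offsets stay uncommitted, so the record will be retried
    }
}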
Minimum Java loop
props.put("enable.auto.commit", "false");

while (running) {
    var records = consumer.poll(Duration.ofMillis(1000));
    for (var r : records) {
        // idempotent effect, e.g., UPSERT by eventId or natural key
        applyIdempotentEffect(r);
    }
    consumer.commitSync(); // after effects are durable
}
Why duplicates happen (and how to reduce them)
- Crash after processing but before committing offsets → same messages re-delivered.
- Rebalances and retries can replay the last batch.
- Network glitches during commits.
Hygiene checklist
- enable.auto.commit=false and commit after processing (at-least-once).
- Keep poll batches small (e.g., via max.poll.records) so replays after a failure are small.
- Implement ConsumerRebalanceListener: in onPartitionsRevoked, finish/flush in-flight work and commit offsets for those partitions (see the sketch after this list).
- Prefer cooperative-sticky rebalancing (Kafka clients 2.4+): partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor.
- Use keys and make your handlers naturally idempotent.
- For producers: set retries high; with enable.idempotence=true, retries will not create duplicates on the topic.
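A sketch of that rebalance hook, assuming the processing loop maintains an offsetsToCommit map holding the next offset to consume per partition (the map, the topic name, and flushPendingWork are illustrative):

consumer.subscribe(List.of("in-topic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        flushPendingWork(partitions); // hypothetical: finish records already in flight
        Map<TopicPartition, OffsetAndMetadata> toCommit = new HashMap<>();
        for (TopicPartition tp : partitions) {
            OffsetAndMetadata next = offsetsToCommit.get(tp);
            if (next != null) toCommit.put(tp, next);
        }
        if (!toCommit.isEmpty()) consumer.commitSync(toCommit); // commit before losing the partitions
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // nothing special: processing resumes from the committed offsets
    }
});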
Quick decision guide
- You produce to Kafka as output? Use transactions (EOS) or Kafka Streams exactly_once_v2.
- You write to DB/HTTP? Make the operation idempotent and commit offsets after the durable effect (or use Inbox/Outbox).
- Pure read-only? Duplicates don’t matter; just be aware of replays.