How does Apache Kafka prevent duplicates?

Short answer: you can’t 100% prevent duplicates with a plain Kafka consumer.

Kafka delivers messages at-least-once by default. To avoid processing an event twice you must either (a) use Kafka’s exactly-once semantics (EOS) when you produce results back to Kafka, or (b) make your consumer idempotent, so that reprocessing is harmless. Here are the practical paths:


Option A — Kafka “Exactly-Once” (read → process → write to Kafka)

Use a transactional, idempotent producer and commit the consumer offsets in the same transaction as the output records.

Key configs

  • Producer: enable.idempotence=true, acks=all, set a stable transactional.id (unique per app instance).
  • Consumer: isolation.level=read_committed, enable.auto.commit=false.
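
A minimal sketch of those settings, assuming String keys/values (the bootstrap servers, group id, and transactional.id are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

Properties producerProps = new Properties();
producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
producerProps.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
producerProps.put(ProducerConfig.ACKS_CONFIG, "all");
producerProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "order-processor-1"); // stable and unique per instance

Properties consumerProps = new Properties();
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processor");
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
consumerProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");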

Flow (Java, high-level)

// Assumes consumerProps/producerProps as above; transform() and currentOffsets()
// are application helpers (currentOffsets returns last processed offset + 1 per partition).
var consumer = new KafkaConsumer<String, String>(consumerProps);
var producer = new KafkaProducer<String, String>(producerProps);
producer.initTransactions();

while (true) {
  ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
  if (records.isEmpty()) continue;

  producer.beginTransaction();
  try {
    for (ConsumerRecord<String, String> r : records) {
      // process r
      producer.send(new ProducerRecord<>("out-topic", r.key(), transform(r.value())));
    }
    // atomically commit the consumed offsets together with the produced records
    Map<TopicPartition, OffsetAndMetadata> offsets = currentOffsets(records);
    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
    producer.commitTransaction();
  } catch (Exception e) {
    producer.abortTransaction(); // safe rollback; offsets not committed
    // then seek the consumer back to the last committed offsets (or restart) so the batch is re-polled
  }
}

This gives EOS for read→process→write pipelines. Downstream consumers set isolation.level=read_committed to avoid seeing aborted writes.

Use Kafka Streams with processing.guarantee=exactly_once_v2 for the same guarantee with less code.
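
For example, a minimal Streams setup with that guarantee might look like this (application id, broker address, topic names, and the toy transformation are assumptions):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-processor-streams");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

StreamsBuilder builder = new StreamsBuilder();
builder.<String, String>stream("in-topic")
       .mapValues(v -> v.toUpperCase()) // stand-in for real processing
       .to("out-topic");

new KafkaStreams(builder.build(), props).start();

Streams then handles the transactional produce and the offset commits internally.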


Option B — Idempotent consumer (read → process to external system)

When you write to a DB/HTTP service/etc., you can’t use Kafka transactions across systems. Make the effect idempotent and commit offsets only after the effect is durably recorded.

Patterns that work

  • Upsert by key in the DB (e.g., INSERT ... ON CONFLICT DO UPDATE), or enforce business keys with unique constraints.
  • Keep a processed_events table keyed by eventId (or (topic, partition, offset)), and write:
    • begin DB tx
    • apply effect (idempotent)
    • mark processed id
    • commit DB tx
    • commitSync offsets
  • Inbox/Outbox: write the message to an “inbox” table and process it from there transactionally.

Minimal Java loop

props.put("enable.auto.commit", "false");

while (running) {
  var records = consumer.poll(Duration.ofMillis(1000));
  for (var r : records) {
    // idempotent effect, e.g., UPSERT by eventId or natural key
    applyIdempotentEffect(r);
  }
  consumer.commitSync(); // after effects are durable
}
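
A sketch of the processed_events pattern from the list above, assuming a JDBC DataSource and PostgreSQL-style SQL (table and column names are illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.sql.DataSource;
import org.apache.kafka.clients.consumer.ConsumerRecord;

void applyIdempotentEffect(DataSource ds, ConsumerRecord<String, String> r) throws Exception {
  // use (topic, partition, offset) or a business eventId from the payload as the dedup key
  String eventId = r.topic() + "-" + r.partition() + "-" + r.offset();
  try (Connection con = ds.getConnection()) {
    con.setAutoCommit(false);
    try {
      // 1) apply the effect idempotently (upsert by key)
      try (PreparedStatement ps = con.prepareStatement(
          "INSERT INTO orders (id, payload) VALUES (?, ?) " +
          "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload")) {
        ps.setString(1, r.key());
        ps.setString(2, r.value());
        ps.executeUpdate();
      }
      // 2) mark the event as processed; a redelivered event hits the primary key and changes nothing
      try (PreparedStatement ps = con.prepareStatement(
          "INSERT INTO processed_events (event_id) VALUES (?) ON CONFLICT DO NOTHING")) {
        ps.setString(1, eventId);
        ps.executeUpdate();
      }
      con.commit(); // 3) one DB transaction covers effect + marker
    } catch (Exception e) {
      con.rollback();
      throw e;
    }
  }
  // 4) the caller commits Kafka offsets (commitSync) only after this returns
}

If the effect itself can’t be made idempotent, check processed_events first inside the same DB transaction and skip the event when the marker already exists.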

Why duplicates happen (and how to reduce them)

  • Crash after processing but before committing offsets → same messages re-delivered.
  • Rebalances and retries can replay the last batch.
  • Network glitches during commits.

Hygiene checklist

  • enable.auto.commit=false and commit after processing (at-least-once).
  • Tune max.poll.records (and fetch sizes) so replayed batches stay small.
  • Implement ConsumerRebalanceListener: in onPartitionsRevoked, finish/flush in-flight work and commit offsets for those partitions (see the sketch after this list).
  • Prefer cooperative-sticky rebalancing (Kafka clients 2.4+): partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor.
  • Use keys and make your handlers naturally idempotent.
  • For producers: keep retries high and set enable.idempotence=true; the broker then de-duplicates retried sends, so duplicates don’t land on the topic.
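
A sketch of the rebalance-listener and assignor items, reusing the manual-commit consumer style from above (offset bookkeeping is simplified):

import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

Properties props = new Properties();
// ... bootstrap servers, deserializers, group id as before ...
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
          "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");

var consumer = new KafkaConsumer<String, String>(props);
// offsets of records whose effects are done but not yet committed, maintained by the poll loop
Map<TopicPartition, OffsetAndMetadata> pending = new HashMap<>();

consumer.subscribe(List.of("in-topic"), new ConsumerRebalanceListener() {
  @Override
  public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
    // finish/flush in-flight work for these partitions, then commit their offsets before they move away
    Map<TopicPartition, OffsetAndMetadata> toCommit = new HashMap<>();
    for (TopicPartition tp : partitions) {
      OffsetAndMetadata om = pending.remove(tp);
      if (om != null) toCommit.put(tp, om);
    }
    if (!toCommit.isEmpty()) consumer.commitSync(toCommit);
  }

  @Override
  public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
    // nothing special: consumption resumes from the committed offsets
  }
});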

Quick decision guide

  • You produce to Kafka as output? Use transactions (EOS) or Kafka Streams exactly_once_v2.
  • You write to DB/HTTP? Make the operation idempotent and commit offsets after the durable effect (or use Inbox/Outbox).
  • Pure read-only? Duplicates don’t matter; just be aware of replays.