Payment Gateway integration with Spring Boot

1) Goals & non‑negotiables

  • High TPS & low latency: sub‑200ms for your own APIs (excluding external gateway hops).
  • Exactly‑once effects: no double charges or double refunds.
  • Idempotency everywhere: API, workers, and webhook handlers.
  • Eventual consistency: order state syncs with payment state via events.
  • Security & compliance: PCI‑DSS scope minimized (tokenization; no PAN storage), HMAC‑verified webhooks, secrets rotation.

2) High‑level architecture

[Clients/Web/Mobile]
        |
      (TLS)
        |
   [API Gateway / Ingress]
        |
        +--> [AuthN/Z] ----+
                           |
                      [Payment API Service]  <-----> [Redis] (rate limit, locks, tokens)
                           |       \
                           |        \ (async)
                           |         -> [Kafka/Pulsar]  --->  [Payment Worker(s)]
                           |                                 /      \
                  [Postgres/MySQL] (OLTP + Outbox)         /        +--> [Notification Svc]
                           |                               /               
                           +------> [Gateway Connector(s): Stripe/Razorpay/PayU/etc.]
                                                   |
                                             [External Gateway]
                                                   |
                                              (Webhooks)
                                                   |
                                              [Webhook Ingest]
                                                   |
                                              [Kafka Topic: pg-events]
                                                   |
                                            [Payment Worker(s)]
                                                   |
                                            [Order Svc / Ledger Svc]  (via events)

Why this shape?

  • Sync edge: Create intent/authorize returns fast.
  • Async core: Captures, settlements, refunds, 3DS/UPI callbacks flow through Kafka + workers.
  • Outbox pattern ensures reliable “write‑then‑publish”.
  • Webhook Ingest is stateless and idempotent.

3) Core flows (state machine)

States: PENDING → REQUIRES_ACTION (optional) → AUTHORIZED → CAPTURED → SETTLED → REFUND_PENDING → REFUNDED. FAILED and EXPIRED are terminal states that can be reached from any state before capture.
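
The arrows above can be made explicit in code so that illegal moves fail loudly. A minimal sketch, assuming the transition table below: it is an illustration rather than something a particular gateway mandates, and a rejected refund would need an extra edge that is omitted here.

```java
import java.util.Map;
import java.util.Set;

/** Payment states from the list above, plus an explicit table of legal moves. */
enum PaymentStatus {
    PENDING, REQUIRES_ACTION, AUTHORIZED, CAPTURED, SETTLED,
    REFUND_PENDING, REFUNDED, FAILED, EXPIRED;

    // Assumed transition table; adjust it to what your gateway actually emits.
    private static final Map<PaymentStatus, Set<PaymentStatus>> ALLOWED = Map.of(
        PENDING,         Set.of(REQUIRES_ACTION, AUTHORIZED, CAPTURED, FAILED, EXPIRED),
        REQUIRES_ACTION, Set.of(AUTHORIZED, CAPTURED, FAILED, EXPIRED),
        AUTHORIZED,      Set.of(CAPTURED, FAILED, EXPIRED),
        CAPTURED,        Set.of(SETTLED, REFUND_PENDING),
        SETTLED,         Set.of(REFUND_PENDING),
        REFUND_PENDING,  Set.of(REFUNDED),
        REFUNDED,        Set.of(),
        FAILED,          Set.of(),
        EXPIRED,         Set.of());

    /** True if the state machine permits moving from this state to next. */
    boolean canMoveTo(PaymentStatus next) {
        return ALLOWED.get(this).contains(next);
    }

    /** Returns next, or throws if the move is not in the table. */
    PaymentStatus moveTo(PaymentStatus next) {
        if (!canMoveTo(next)) {
            throw new IllegalStateException(this + " -> " + next + " is not allowed");
        }
        return next;
    }

    /** Terminal states have no outgoing edges. */
    boolean isTerminal() {
        return ALLOWED.get(this).isEmpty();
    }
}
```

PENDING → CAPTURED is allowed directly because single-step methods such as UPI skip the separate authorize step.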

Authorize (card)

  1. Client calls POST /payments with amount, currency, orderId, and an Idempotency-Key header.
  2. Payment API commits a DB row (PENDING), calls the gateway to create a PaymentIntent (or order), then records the result together with an outbox event.
  3. If 3DS/OTP needed → returns REQUIRES_ACTION + clientSecret/redirect.
  4. Gateway callback/webhook finalizes: AUTHORIZED or FAILED. Worker captures (immediate or delayed).

UPI/NetBanking/Wallet

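  • Same flow, but typically PENDING → REQUIRES_ACTION, after which a webhook (with status polling as a fallback) moves the payment straight to CAPTURED, since these methods usually have no separate authorize step.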

Refund

  • POST /payments/{id}/refunds writes REFUND_PENDING, publishes event; worker calls gateway; webhook confirms REFUNDED.

4) Data model (minimal)

-- payments
CREATE TABLE payments (
  id                 BIGSERIAL PRIMARY KEY,
  order_id           VARCHAR(64) UNIQUE,
  idempotency_key    VARCHAR(64) UNIQUE NOT NULL,
  amount_cents       BIGINT NOT NULL,
  currency           CHAR(3) NOT NULL,
  status             VARCHAR(24) NOT NULL,
  gateway            VARCHAR(32) NOT NULL,
  gateway_payment_id VARCHAR(128),
  customer_id        VARCHAR(64),
  metadata           JSONB,
  version            INT NOT NULL DEFAULT 0,   -- optimistic locking
  created_at         TIMESTAMP NOT NULL,
  updated_at         TIMESTAMP NOT NULL
);

-- payment_events (immutable ledger)
CREATE TABLE payment_events (
  id         BIGSERIAL PRIMARY KEY,
  payment_id BIGINT NOT NULL,
  type       VARCHAR(32) NOT NULL,   -- CREATED, AUTHORIZED, CAPTURED, FAILED, REFUND_...
  payload    JSONB NOT NULL,
  created_at TIMESTAMP NOT NULL
);

-- outbox (for reliable publish)
CREATE TABLE outbox (
  id             BIGSERIAL PRIMARY KEY,
  aggregate_type VARCHAR(64),   -- Payment
  aggregate_id   BIGINT,
  event_type     VARCHAR(64),
  payload        JSONB,
  status         VARCHAR(16),   -- NEW, SENT, ERROR
  created_at     TIMESTAMP NOT NULL,
  last_error     TEXT
);

-- processed_webhooks (dedupe)
CREATE TABLE processed_webhooks (
  gateway_event_id VARCHAR(128) PRIMARY KEY,
  first_seen_at    TIMESTAMP
);

5) Idempotency & exactly‑once

  • Client‑supplied Idempotency-Key (UUID) → unique index; repeat calls return the first result.
  • Webhook dedupe: store gateway_event_id; ignore if seen.
  • Capture/Refund guarded by:
    • Redis lock lock:payment:{id}:capture
    • Version check (optimistic locking) on payments.version.
  • Outbox: one DB transaction commits both the payment change and the outbox record. A background publisher reads the outbox and publishes to Kafka; if it crashes, the record stays and is retried. Delivery is therefore at‑least‑once, so consumers must deduplicate.
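
The version check in the list above is what stops two workers from both applying the same capture confirmation. A minimal in-memory sketch: VersionedPaymentStore stands in for the payments table, and its updateIfVersionMatches method plays the role of UPDATE ... SET version = version + 1 WHERE id = ? AND version = ?. All class and method names here are hypothetical.

```java
/** Immutable snapshot of the columns the guard cares about. */
record PaymentRow(long id, String status, int version) {}

/** Stand-in for one row of the payments table (assumption: a single row, in memory). */
final class VersionedPaymentStore {
    private PaymentRow row;

    VersionedPaymentStore(PaymentRow initial) {
        this.row = initial;
    }

    synchronized PaymentRow read() {
        return row;
    }

    /** Writes only if nobody bumped the version since the caller read the row. */
    synchronized boolean updateIfVersionMatches(int expectedVersion, String newStatus) {
        if (row.version() != expectedVersion) {
            return false; // another writer won the race
        }
        row = new PaymentRow(row.id(), newStatus, row.version() + 1);
        return true;
    }
}

final class CaptureApplier {
    /**
     * Applies a "capture confirmed" signal at most once.
     * Returns true only for the call that actually performed the transition,
     * so ledger and outbox writes can be tied to that single winner.
     */
    static boolean applyCaptureConfirmed(VersionedPaymentStore store) {
        PaymentRow seen = store.read();
        if (!"AUTHORIZED".equals(seen.status())) {
            return false; // already captured, or not in a capturable state
        }
        return store.updateIfVersionMatches(seen.version(), "CAPTURED");
    }
}
```

A duplicate webhook or a second worker sees either a non-AUTHORIZED status or a stale version, and in both cases does nothing.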

6) Resilience & backpressure

  • Timeouts (e.g., 2–3s to gateway), retries with jitter, circuit breaker (Resilience4j).
  • Bulkheads: connector threadpools per gateway to isolate slowness.
  • Rate limits: Redis token bucket per merchant/customer/IP.
  • Fallback: if the outcome of a gateway call is uncertain (timeout), mark the payment PENDING_GATEWAY (a holding state outside the main flow in section 3) and rely on webhook + reconciliation to resolve it.
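
The token bucket mentioned above fits in a few lines. This is a single-process sketch with the clock passed in explicitly so it is easy to test; in the Redis version the tokens and lastRefillNanos fields would live under one key per merchant/customer/IP and be updated atomically (for example in a Lua script). The class name and parameters are illustrative assumptions.

```java
/** In-memory token bucket: up to `capacity` requests in a burst, refilled at a steady rate. */
final class TokenBucket {
    private final double capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;          // start full so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    /** Takes one token if available; callers typically map false to HTTP 429. */
    synchronized boolean tryAcquire(long nowNanos) {
        if (nowNanos > lastRefillNanos) {
            double elapsedSeconds = (nowNanos - lastRefillNanos) / 1e9;
            tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
            lastRefillNanos = nowNanos;
        }
        if (tokens < 1.0) {
            return false;
        }
        tokens -= 1.0;
        return true;
    }
}
```

In production code the caller would pass System.nanoTime() as nowNanos.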

7) Spring Boot implementation sketch

7.1 Contract & controller

import java.util.Map;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

// DTOs
record CreatePaymentRequest(
    String orderId,
    long amountCents,
    String currency,
    String gateway,
    Map<String, String> metadata
) {}

record CreatePaymentResponse(
    String paymentId,
    String status,          // PENDING | REQUIRES_ACTION | AUTHORIZED ...
    String clientSecret,    // for 3DS/UPI, if applicable
    String redirectUrl
) {}

@RestController
@RequestMapping("/payments")
class PaymentController {
  private final PaymentService service;

  // Constructor injection; the final field cannot be set any other way.
  PaymentController(PaymentService service) {
    this.service = service;
  }

  @PostMapping
  public ResponseEntity<CreatePaymentResponse> create(
      @RequestBody CreatePaymentRequest req,
      @RequestHeader("Idempotency-Key") String idemKey) {
    var res = service.createPayment(req, idemKey);
    return ResponseEntity.status(HttpStatus.ACCEPTED).body(res);
  }

  @PostMapping("/{id}/refunds")
  public ResponseEntity<Void> refund(@PathVariable String id,
      @RequestHeader("Idempotency-Key") String key) {
    service.initiateRefund(id, key);
    return ResponseEntity.accepted().build();
  }
}

7.2 Gateway strategy & connector

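import java.util.Map;
import org.springframework.stereotype.Service;

public interface GatewayClient {
  GatewayCreateResult createPaymentIntent(Payment p);
  GatewayCaptureResult capture(String gatewayPaymentId, long amount);
  GatewayRefundResult refund(String gatewayPaymentId, long amount);
  GatewayVerifyResult verifyWebhook(String signature, String payload); // throws if the signature is invalid
}

@Service
class GatewayRouter {
  private final Map<String, GatewayClient> clients; // "razorpay","stripe","payu"

  // Spring injects every GatewayClient bean, keyed by bean name.
  GatewayRouter(Map<String, GatewayClient> clients) { this.clients = clients; }

  public GatewayClient clientFor(String name) {
    var client = clients.get(name);
    if (client == null) throw new IllegalArgumentException("Unknown gateway: " + name);
    return client;
  }
}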

7.3 Service with outbox + optimistic locking

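import org.springframework.dao.DataIntegrityViolationException;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.transaction.support.TransactionTemplate;

@Service
class PaymentService {
  private final PaymentRepo repo;
  private final OutboxRepo outbox;
  private final GatewayRouter router;
  private final TransactionTemplate tx;

  PaymentService(PaymentRepo repo, OutboxRepo outbox, GatewayRouter router, TransactionTemplate tx) {
    this.repo = repo;
    this.outbox = outbox;
    this.router = router;
    this.tx = tx;
  }

  public CreatePaymentResponse createPayment(CreatePaymentRequest r, String idemKey) {
    var existing = repo.findByIdempotencyKey(idemKey);
    if (existing.isPresent()) return toResp(existing.get());

    // Tx 1: commit the PENDING row so it survives a gateway failure or timeout.
    // A concurrent duplicate request trips the unique index on idempotency_key.
    Payment p;
    try {
      p = tx.execute(s -> repo.save(Payment.newPending(r, idemKey)));
    } catch (DataIntegrityViolationException duplicate) {
      return toResp(repo.findByIdempotencyKey(idemKey).orElseThrow());
    }

    // The gateway call runs outside any DB transaction, so a slow gateway
    // does not hold a connection or row locks.
    var created = router.clientFor(r.gateway()).createPaymentIntent(p);

    // Tx 2: the state change and the outbox record commit atomically.
    return tx.execute(s -> {
      p.applyGatewayCreate(created); // sets status and gateway ids
      repo.save(p);
      outbox.save(OutboxEvent.paymentCreated(p));
      return toResp(p);
    });
  }

  @Transactional
  public void initiateRefund(String paymentId, String key) {
    var p = repo.lockById(paymentId); // SELECT ... FOR UPDATE
    // No-op when not refundable, which also makes a replay of the same request safe.
    if (p.isRefundable()) {
      p.markRefundPending(key);
      repo.save(p);
      outbox.save(OutboxEvent.refundRequested(p));
    }
  }
}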
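
The service above only writes outbox rows; something still has to relay them to Kafka. A minimal sketch of that relay, with an in-memory list standing in for the outbox table and a Consumer standing in for the Kafka producer (OutboxRecord and OutboxRelay are hypothetical names). It stops at the first failure so that later events are not published ahead of an earlier one.

```java
import java.util.List;
import java.util.function.Consumer;

/** One row of the outbox table; status is NEW, SENT, or ERROR as in the data model. */
final class OutboxRecord {
    final long id;
    final String eventType;
    final String payload;
    String status = "NEW";
    String lastError;

    OutboxRecord(long id, String eventType, String payload) {
        this.id = id;
        this.eventType = eventType;
        this.payload = payload;
    }
}

final class OutboxRelay {
    private final List<OutboxRecord> table;          // stand-in for the outbox table, ordered by id
    private final Consumer<OutboxRecord> publisher;  // stand-in for a Kafka producer send

    OutboxRelay(List<OutboxRecord> table, Consumer<OutboxRecord> publisher) {
        this.table = table;
        this.publisher = publisher;
    }

    /** Publishes NEW and ERROR rows in order; returns how many were sent in this pass. */
    int publishPending() {
        int sent = 0;
        for (OutboxRecord r : table) {
            if (r.status.equals("SENT")) {
                continue;
            }
            try {
                publisher.accept(r);
                r.status = "SENT";
                r.lastError = null;
                sent++;
            } catch (RuntimeException e) {
                r.status = "ERROR";
                r.lastError = e.getMessage();
                break; // keep ordering: retry this row on the next pass
            }
        }
        return sent;
    }
}
```

A crash between the publish and the status update leads to the same event being sent again on the next pass, which is why the workers need to be idempotent.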

7.4 Webhook handler (idempotent)

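import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/webhooks/gatewayX")
class WebhookController {
  private final GatewayClient gw;
  private final WebhookService svc;

  WebhookController(GatewayClient gw, WebhookService svc) {
    this.gw = gw;
    this.svc = svc;
  }

  @PostMapping
  public ResponseEntity<Void> handle(@RequestHeader("X-Signature") String sig,
                                     @RequestBody String payload) {
    gw.verifyWebhook(sig, payload);         // throws if invalid; checks the raw body before parsing
    var event = parseEvent(payload);
    if (svc.isDuplicate(event.id())) return ResponseEntity.ok().build();

    // Publish first, record second: if the publish fails, the event is not marked
    // as seen and the gateway's retry gets another chance. A crash between the two
    // steps yields a duplicate event, which the idempotent workers absorb.
    svc.enqueue(event);                     // publish to Kafka
    svc.record(event.id());                 // insert into processed_webhooks
    return ResponseEntity.ok().build();
  }
}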
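
What verifyWebhook does inside depends on the gateway, but the core is usually an HMAC over the raw request body compared in constant time. A minimal sketch using only the JDK, assuming a hex-encoded HMAC-SHA256 signature; header names, encodings, and whether a timestamp is mixed into the signed string all vary by gateway, so check the provider's documentation. WebhookSignatures is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class WebhookSignatures {
    private static final String ALGORITHM = "HmacSHA256";

    /** Hex-encoded HMAC-SHA256 of the payload under the shared webhook secret. */
    static String sign(String secret, String payload) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), ALGORITHM));
            byte[] digest = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    /** Compares in constant time so the check does not leak how many characters matched. */
    static boolean verify(String secret, String payload, String signatureHex) {
        if (signatureHex == null) {
            return false;
        }
        byte[] expected = sign(secret, payload).getBytes(StandardCharsets.UTF_8);
        byte[] actual = signatureHex.toLowerCase().getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual);
    }
}
```

The signature must be computed over the exact bytes received, which is why the controller takes the body as a String instead of a parsed object.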

7.5 Resilience4j (example)

resilience4j:
  circuitbreaker:
    instances:
      gatewayStripe:
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 50
        failureRateThreshold: 50
        waitDurationInOpenState: 30s
  retry:
    instances:
      gatewayStripe:
        maxAttempts: 3
        waitDuration: 200ms
        enableExponentialBackoff: true

8) Scaling for high TPS

  • Stateless APIs behind HPA. Keep synchronous path thin: validate → write row → call gateway create intent → return.
  • DB: primary for writes, read replicas for queries; partition by created_at or merchant_id at very high scale; tune connection pools (HikariCP).
  • Kafka: partition by payment_id to keep order; consumers scale horizontally.
  • Redis: cluster mode for rate limit and locks.
  • Cold paths async: settlement, invoice, email, ledger writes.
  • Batching where allowed: settlements/refunds if the gateway supports it.
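
Partitioning by payment_id keeps order because every event with the same key lands on the same partition, and a partition is consumed sequentially. The sketch below shows only that idea; it uses String.hashCode for brevity, whereas Kafka's default partitioner applies a murmur2 hash to the serialized key, so the actual partition numbers will differ. PartitionChooser is a hypothetical name.

```java
final class PartitionChooser {
    /** Maps a key to a partition in [0, partitions); the same key always maps to the same partition. */
    static int partitionFor(String key, int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), partitions);
    }
}
```

One consequence is that changing the partition count remaps keys, so it is worth sizing the topic generously up front.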

Quick capacity thumb‑rules

  • As a starting point before load testing: 1 CPU core usually sustains ~500–1500 light REST RPS; payment paths are heavier (DB writes, serialization, threads parked on gateway I/O) → plan ~300–600 RPS/core.
  • Keep DB writes < 5–8k TPS per primary without sharding; if you exceed, shard by merchant or region, and move events/ledger to append‑only storage first, then project to SQL (CQRS).
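
Turning the per-core figure into a core count is simple arithmetic, sketched here. The peak load of 3000 RPS and the 60% target utilization in the usage note are made-up inputs for illustration, not figures from any benchmark.

```java
final class CapacityPlan {
    /**
     * Cores needed to serve peakRps when each core sustains rpsPerCore at 100% load
     * and you want to run at targetUtilization (for example 0.6) to leave headroom.
     */
    static int coresNeeded(double peakRps, double rpsPerCore, double targetUtilization) {
        if (peakRps < 0 || rpsPerCore <= 0 || targetUtilization <= 0 || targetUtilization > 1) {
            throw new IllegalArgumentException("inputs out of range");
        }
        return (int) Math.ceil(peakRps / (rpsPerCore * targetUtilization));
    }
}
```

For example, coresNeeded(3000, 300, 0.6) gives 17: each core is budgeted 180 RPS, and 3000 / 180 rounds up to 17.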

9) Security & compliance

  • Don’t store PAN/CVV; use gateway tokens/payment methods.
  • HMAC‑verify all webhooks; rotate secrets; pin source IP ranges where supported.
  • Secrets in KMS/HashiCorp Vault; short‑lived tokens to the client.
  • PCI‑DSS scope: keep backends SAQ‑A/SAQ‑A‑EP by using hosted fields/redirects where possible.
  • PII: encrypt at rest (AES‑256), TLS 1.2+, mTLS inside cluster if feasible.

10) Ops, monitoring & testing

  • Golden signals: auth rate, capture rate, refund success, p50/p95/p99 latencies per gateway, error taxonomy (4xx/5xx/timeouts).
  • Business metrics: chargeback rate, authorization lift by BIN/issuer, decline reasons.
  • Trace with OpenTelemetry: propagate traceId across API → worker → connector.
  • Reconciliation job: periodic pull from gateway (last N hours/days) to repair stragglers.
  • Chaos & failure drills: simulate gateway slowness/outage; verify circuit breakers and fallback states.
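
The heart of the reconciliation job is a diff between local state and what the gateway reports. A minimal sketch, assuming both sides have already been loaded into maps of payment id to status for the chosen time window; fetching from the gateway and applying the repairs are left out, and Reconciler is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

final class Reconciler {
    /**
     * Returns paymentId -> gateway status for every payment known locally whose
     * status disagrees with the gateway. Payments the gateway reports but the
     * local store has never seen are skipped here; a real job should flag them.
     */
    static Map<String, String> findRepairs(Map<String, String> localStatus,
                                           Map<String, String> gatewayStatus) {
        Map<String, String> repairs = new TreeMap<>();
        for (Map.Entry<String, String> entry : gatewayStatus.entrySet()) {
            String local = localStatus.get(entry.getKey());
            if (local != null && !local.equals(entry.getValue())) {
                repairs.put(entry.getKey(), entry.getValue());
            }
        }
        return repairs;
    }
}
```

Each repair is best fed back through the same worker path as a webhook, so the state machine and dedupe rules apply to it as well.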

11) Minimal API contracts

  • POST /payments → 202 + {paymentId,status,clientSecret|redirectUrl}
  • GET /payments/{id} → current status (from OLTP or a read model cache)
  • POST /payments/{id}/capture (if using delayed capture)
  • POST /payments/{id}/refunds
  • Webhooks: /webhooks/{gateway}

12) India‑specific notes (if relevant)

  • UPI: expect PENDING → CAPTURED (the gateway reports SUCCESS) via webhook in seconds; handle user cancellations cleanly.
  • NetBanking often requires redirect/return URL + webhook; always trust webhook over front‑channel.

Copy‑paste checklist

  • Idempotency key + unique index
  • Outbox table + publisher
  • Webhook dedupe table
  • Redis locks for capture/refund
  • Circuit breaker + retry + timeouts
  • Reconciliation cron + dashboards
  • HMAC verification + secrets rotation
  • Read model (cache) for status lookups
  • Playbooks for gateway outage & rollback