Payment Gateway integration with Spring Boot
1) Goals & non‑negotiables
- High TPS & low latency: sub‑200ms for your own APIs (excluding external gateway hops).
- Exactly‑once effects: no double charges or double refunds.
- Idempotency everywhere: API, workers, and webhook handlers.
- Eventual consistency: order state syncs with payment state via events.
- Security & compliance: PCI‑DSS scope minimized (tokenization; no PAN storage), HMAC‑verified webhooks, secrets rotation.
2) High‑level architecture
            [Clients/Web/Mobile]
                     |
                   (TLS)
                     |
          [API Gateway / Ingress]
                     |
                     +--> [AuthN/Z] ----+
                     |
          [Payment API Service] <-----> [Redis] (rate limit, locks, tokens)
             |          \
             |           \ (async)
             |            -> [Kafka/Pulsar] ---> [Payment Worker(s)]
             |                                    /        \
[Postgres/MySQL] (OLTP + Outbox)                 /          +--> [Notification Svc]
             |                                  /
             +------> [Gateway Connector(s): Stripe/Razorpay/PayU/etc.]
                                  |
                          [External Gateway]
                                  |
                              (Webhooks)
                                  |
                           [Webhook Ingest]
                                  |
                       [Kafka Topic: pg-events]
                                  |
                         [Payment Worker(s)]
                                  |
                  [Order Svc / Ledger Svc] (via events)
Why this shape?
- Sync edge: Create intent/authorize returns fast.
- Async core: Captures, settlements, refunds, 3DS/UPI callbacks flow through Kafka + workers.
- Outbox pattern ensures reliable “write‑then‑publish”.
- Webhook Ingest is stateless and idempotent.
3) Core flows (state machine)
States: PENDING → REQUIRES_ACTION (optional) → AUTHORIZED → CAPTURED → SETTLED → REFUND_PENDING → REFUNDED. FAILED and EXPIRED are terminal states that a payment can enter from any state before CAPTURED.
Authorize (card)
- Client calls POST /payments with amount, currency, orderId, idempotencyKey.
- Payment API creates DB row (PENDING) + outbox event, calls gateway to create PaymentIntent (or order).
- If 3DS/OTP needed → returns REQUIRES_ACTION + clientSecret/redirect.
- Gateway callback/webhook finalizes: AUTHORIZED or FAILED. Worker captures (immediate or delayed).
UPI/NetBanking/Wallet
- Similar, but likely PENDING → REQUIRES_ACTION with polling or webhook to move to CAPTURED.
Refund
- POST /payments/{id}/refunds writes REFUND_PENDING, publishes event; worker calls gateway; webhook confirms REFUNDED.
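The state list at the top of this section can be encoded as a transition table so that workers and webhook handlers reject out-of-order updates. The sketch below is one possible encoding: the enum name PaymentStatus and the exact set of allowed edges are assumptions drawn from the flows above, and refund rejection is deliberately left out.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical transition table; the allowed edges are an assumption and
// should be adjusted to what each gateway can actually report.
enum PaymentStatus {
    PENDING, REQUIRES_ACTION, AUTHORIZED, CAPTURED, SETTLED,
    REFUND_PENDING, REFUNDED, FAILED, EXPIRED;

    private static final Map<PaymentStatus, Set<PaymentStatus>> ALLOWED = Map.of(
        PENDING,         EnumSet.of(REQUIRES_ACTION, AUTHORIZED, CAPTURED, FAILED, EXPIRED),
        REQUIRES_ACTION, EnumSet.of(AUTHORIZED, CAPTURED, FAILED, EXPIRED),
        AUTHORIZED,      EnumSet.of(CAPTURED, FAILED, EXPIRED),
        CAPTURED,        EnumSet.of(SETTLED, REFUND_PENDING),
        SETTLED,         EnumSet.of(REFUND_PENDING),
        REFUND_PENDING,  EnumSet.of(REFUNDED)
    );

    // States missing from the table (REFUNDED, FAILED, EXPIRED) are terminal.
    boolean canTransitionTo(PaymentStatus next) {
        return ALLOWED.getOrDefault(this, Set.of()).contains(next);
    }

    // Returns the new state, or throws if the edge is not in the table.
    PaymentStatus transitionTo(PaymentStatus next) {
        if (!canTransitionTo(next)) {
            throw new IllegalStateException(this + " -> " + next + " is not allowed");
        }
        return next;
    }
}
```

A late or replayed webhook (for example AUTHORIZED arriving after CAPTURED) then fails the check instead of overwriting newer state.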
4) Data model (minimal)
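-- PostgreSQL syntax (BIGSERIAL, JSONB). Timestamp types, defaults, and last_error TEXT are assumed.
-- payments
CREATE TABLE payments (
  id                 BIGSERIAL PRIMARY KEY,
  order_id           VARCHAR(64) UNIQUE,
  idempotency_key    VARCHAR(64) UNIQUE NOT NULL,
  amount_cents       BIGINT NOT NULL,
  currency           CHAR(3) NOT NULL,
  status             VARCHAR(24) NOT NULL,
  gateway            VARCHAR(32) NOT NULL,
  gateway_payment_id VARCHAR(128),
  customer_id        VARCHAR(64),
  metadata           JSONB,
  version            INT NOT NULL DEFAULT 0, -- optimistic-lock counter
  created_at         TIMESTAMP NOT NULL DEFAULT now(),
  updated_at         TIMESTAMP NOT NULL DEFAULT now()
);
-- payment_events (immutable ledger)
CREATE TABLE payment_events (
  id         BIGSERIAL PRIMARY KEY,
  payment_id BIGINT NOT NULL REFERENCES payments (id),
  type       VARCHAR(32) NOT NULL, -- CREATED, AUTHORIZED, CAPTURED, FAILED, REFUND_...
  payload    JSONB NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT now()
);
-- outbox (for reliable publish)
CREATE TABLE outbox (
  id             BIGSERIAL PRIMARY KEY,
  aggregate_type VARCHAR(64), -- Payment
  aggregate_id   BIGINT,
  event_type     VARCHAR(64),
  payload        JSONB,
  status         VARCHAR(16), -- NEW, SENT, ERROR
  created_at     TIMESTAMP NOT NULL DEFAULT now(),
  last_error     TEXT
);
-- processed_webhooks (dedupe)
CREATE TABLE processed_webhooks (
  gateway_event_id VARCHAR(128) PRIMARY KEY,
  first_seen_at    TIMESTAMP
);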
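The version column on payments exists to support optimistic locking: an update succeeds only if the row still carries the version the writer read. The following in-memory sketch imitates that compare-and-set. The class VersionedPaymentStore and its methods are hypothetical stand-ins for the table and the SQL shown in the comment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for the payments table; all names are hypothetical.
final class VersionedPaymentStore {
    record Row(String status, int version) {}

    private final Map<Long, Row> rows = new ConcurrentHashMap<>();

    void insert(long id, String status) {
        rows.put(id, new Row(status, 0));
    }

    Row get(long id) {
        return rows.get(id);
    }

    // Mirrors: UPDATE payments SET status = ?, version = version + 1
    //          WHERE id = ? AND version = ?
    // Returns false when another writer got there first (zero rows updated).
    boolean updateStatus(long id, int expectedVersion, String newStatus) {
        Row current = rows.get(id);
        if (current == null || current.version() != expectedVersion) {
            return false;
        }
        // Atomic compare-and-set: only replaces if the row is still 'current'.
        return rows.replace(id, current, new Row(newStatus, expectedVersion + 1));
    }
}
```

A caller that gets false should reload the row and decide whether its change still applies, rather than retrying blindly.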
5) Idempotency & exactly‑once
- Client‑supplied Idempotency-Key (UUID) → unique index; repeat calls return the first result.
- Webhook dedupe: store gateway_event_id; ignore if seen.
- Capture/Refund guarded by:
  - Redis lock lock:payment:{id}:capture
  - Version check (optimistic locking) on payments.version.
- Outbox: DB transaction commits both payment change and outbox record. A background publisher reads the outbox and publishes to Kafka. If the publisher crashes, the record stays and is retried, so delivery is at‑least‑once and consumers must be idempotent.
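The first bullet (repeat calls return the first result) can be illustrated with a small in-memory sketch. IdempotentExecutor is a hypothetical name, and the map stands in for the unique index on idempotency_key. A real service would persist the result in the database rather than in process memory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// In-memory stand-in for the unique index on idempotency_key (hypothetical).
final class IdempotentExecutor<R> {
    private final Map<String, R> results = new ConcurrentHashMap<>();

    // Runs the action at most once per key; repeat calls get the stored result.
    // Note: a null result is not stored, so the action should return non-null.
    R execute(String idempotencyKey, Supplier<R> action) {
        return results.computeIfAbsent(idempotencyKey, k -> action.get());
    }
}
```

ConcurrentHashMap.computeIfAbsent runs the supplier atomically per key, which mirrors what the unique index guarantees when two identical requests race.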
6) Resilience & backpressure
- Timeouts (e.g., 2–3s to gateway), retries with jitter, circuit breaker (Resilience4j).
- Bulkheads: connector threadpools per gateway to isolate slowness.
- Rate limits: Redis token bucket per merchant/customer/IP.
- Fallback: if gateway call uncertain (timeout), mark PENDING_GATEWAY and rely on webhook + reconciliation.
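Retries with jitter, as listed above, are often implemented as capped exponential backoff with a random delay. The sketch below uses the "full jitter" variant in plain Java. JitteredRetry and its parameters are illustrative assumptions, not Resilience4j's API. It should only wrap gateway calls that are safe to repeat, for example ones that carry an idempotency key.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: capped exponential backoff with full jitter.
final class JitteredRetry {
    // Upper bound for attempt n (1-based): base * 2^(n-1), capped at maxMillis.
    static long backoffCapMillis(int attempt, long baseMillis, long maxMillis) {
        long exp = baseMillis << Math.min(attempt - 1, 20);
        return Math.min(exp, maxMillis);
    }

    // Full jitter: a uniform random delay in [0, cap].
    static long jitteredDelayMillis(int attempt, long baseMillis, long maxMillis) {
        long cap = backoffCapMillis(attempt, baseMillis, maxMillis);
        return ThreadLocalRandom.current().nextLong(cap + 1);
    }

    // Calls op up to maxAttempts times, sleeping a jittered delay between tries.
    static <T> T call(Callable<T> op, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(jitteredDelayMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last; // all attempts failed; caller can mark PENDING_GATEWAY
    }
}
```

Randomizing the delay spreads out retries from many instances so that a recovering gateway is not hit by a synchronized burst.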
7) Spring Boot implementation sketch
7.1 Contract & controller
import java.util.Map;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

// DTOs
record CreatePaymentRequest(
    String orderId,
    long amountCents,
    String currency,
    String gateway,
    Map<String, String> metadata
) {}

record CreatePaymentResponse(
    String paymentId,
    String status,       // PENDING | REQUIRES_ACTION | AUTHORIZED ...
    String clientSecret, // for 3DS/UPI, if applicable
    String redirectUrl
) {}

@RestController
@RequestMapping("/payments")
class PaymentController {
    private final PaymentService service;

    PaymentController(PaymentService service) {
        this.service = service;
    }

    // Idempotency-Key is a required header; Spring answers 400 if it is missing.
    @PostMapping
    public ResponseEntity<CreatePaymentResponse> create(
            @RequestBody CreatePaymentRequest req,
            @RequestHeader("Idempotency-Key") String idemKey) {
        var res = service.createPayment(req, idemKey);
        return ResponseEntity.status(HttpStatus.ACCEPTED).body(res);
    }

    // 202: the refund is only queued here; a worker performs it asynchronously.
    @PostMapping("/{id}/refunds")
    public ResponseEntity<Void> refund(@PathVariable String id,
                                       @RequestHeader("Idempotency-Key") String key) {
        service.initiateRefund(id, key);
        return ResponseEntity.accepted().build();
    }
}
7.2 Gateway strategy & connector
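public interface GatewayClient {
    GatewayCreateResult createPaymentIntent(Payment p);
    GatewayCaptureResult capture(String gatewayPaymentId, long amount);
    GatewayRefundResult refund(String gatewayPaymentId, long amount);
    GatewayVerifyResult verifyWebhook(String signature, String payload);
}

@Service
class GatewayRouter {
    // Spring injects all GatewayClient beans keyed by bean name: "razorpay","stripe","payu"
    private final Map<String, GatewayClient> clients;

    GatewayRouter(Map<String, GatewayClient> clients) {
        this.clients = clients;
    }

    public GatewayClient clientFor(String name) {
        GatewayClient client = clients.get(name);
        if (client == null) throw new IllegalArgumentException("Unsupported gateway: " + name);
        return client;
    }
}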
7.3 Service with outbox + optimistic locking
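import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
@Transactional
class PaymentService {
    private final PaymentRepo repo;
    private final OutboxRepo outbox;
    private final GatewayRouter router;

    PaymentService(PaymentRepo repo, OutboxRepo outbox, GatewayRouter router) {
        this.repo = repo;
        this.outbox = outbox;
        this.router = router;
    }

    public CreatePaymentResponse createPayment(CreatePaymentRequest r, String idemKey) {
        // Replay: return the stored result. Concurrent first calls with the same
        // key are caught by the unique index on idempotency_key.
        var existing = repo.findByIdempotencyKey(idemKey);
        if (existing.isPresent()) return toResp(existing.get());

        var p = Payment.newPending(r, idemKey);
        repo.save(p);

        // Caution: this call runs inside the DB transaction, so a slow gateway
        // holds a connection open. Keep the gateway timeout tight.
        var gw = router.clientFor(r.gateway());
        var created = gw.createPaymentIntent(p);
        p.applyGatewayCreate(created); // sets status and gateway ids
        repo.save(p);
        outbox.save(OutboxEvent.paymentCreated(p)); // same transaction as the payment row
        return toResp(p);
    }

    public void initiateRefund(String paymentId, String key) {
        var p = repo.lockById(paymentId); // SELECT ... FOR UPDATE
        if (p.isRefundable()) {
            p.markRefundPending(key);
            repo.save(p);
            outbox.save(OutboxEvent.refundRequested(p));
        }
        // A payment that is not refundable falls through as a no-op.
    }
}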
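The outbox rows written by the service above still need a publisher. The sketch below shows one polling pass against an in-memory list. OutboxPublisher, OutboxRow, and the Consumer standing in for a Kafka producer are all hypothetical. Stopping at the first failure is an assumption made here to keep events in order.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical polling publisher; the list stands in for the outbox table
// and the Consumer stands in for a Kafka producer send.
final class OutboxPublisher {
    enum Status { NEW, SENT, ERROR }

    static final class OutboxRow {
        final long id;
        final String eventType;
        final String payload;
        Status status = Status.NEW;
        String lastError;

        OutboxRow(long id, String eventType, String payload) {
            this.id = id;
            this.eventType = eventType;
            this.payload = payload;
        }
    }

    private final List<OutboxRow> table;
    private final Consumer<OutboxRow> broker;

    OutboxPublisher(List<OutboxRow> table, Consumer<OutboxRow> broker) {
        this.table = table;
        this.broker = broker;
    }

    // One pass: publish every row not yet SENT, oldest first.
    // Returns how many rows were published in this pass.
    int publishPending() {
        int sent = 0;
        for (OutboxRow row : table) {
            if (row.status == Status.SENT) continue;
            try {
                broker.accept(row);
                row.status = Status.SENT;
                row.lastError = null;
                sent++;
            } catch (RuntimeException e) {
                row.status = Status.ERROR;
                row.lastError = e.getMessage();
                break; // stop so later events are not published out of order
            }
        }
        return sent;
    }
}
```

If the process dies after the broker accepts a row but before the row is marked SENT, the next pass publishes it again, which is why consumers need to dedupe.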
7.4 Webhook handler (idempotent)
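@RestController
@RequestMapping("/webhooks/gatewayX")
class WebhookController {
    private final GatewayClient gw;
    private final WebhookService svc;

    WebhookController(GatewayClient gw, WebhookService svc) {
        this.gw = gw;
        this.svc = svc;
    }

    @PostMapping
    public ResponseEntity<Void> handle(@RequestHeader("X-Signature") String sig,
                                       @RequestBody String payload) {
        gw.verifyWebhook(sig, payload); // throws if invalid; must see the raw, unparsed body
        var event = parseEvent(payload);
        // Acknowledge duplicates with 200 so the gateway stops retrying.
        if (svc.isDuplicate(event.id())) return ResponseEntity.ok().build();
        // Publish before recording: if the publish fails, the gateway's retry
        // is not mistaken for a duplicate. Workers dedupe any rare double publish.
        svc.enqueue(event);     // publish to Kafka
        svc.record(event.id()); // insert into processed_webhooks
        return ResponseEntity.ok().build();
    }
}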
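What verifyWebhook does is gateway-specific, but many gateways sign the raw request body with HMAC-SHA256 and send the digest in a header. A minimal sketch under that assumption follows. WebhookSignature is a hypothetical helper, hex encoding of the signature is assumed, and HexFormat requires Java 17 or later. The gateway's own documentation decides the real header name, encoding, and any timestamp prefix.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper; assumes a hex-encoded HMAC-SHA256 over the raw body.
final class WebhookSignature {
    private static final String ALGORITHM = "HmacSHA256";

    // Computes the lowercase hex HMAC-SHA256 of payload under secret.
    static String sign(String secret, String payload) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), ALGORITHM));
            byte[] digest = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    // Recomputes the signature and compares without an early exit on mismatch.
    static boolean isValid(String secret, String payload, String signatureHex) {
        if (signatureHex == null) {
            return false;
        }
        byte[] expected = sign(secret, payload).getBytes(StandardCharsets.UTF_8);
        byte[] actual = signatureHex.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual);
    }
}
```

MessageDigest.isEqual is used instead of String.equals so the comparison time does not reveal how many leading characters matched.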
7.5 Resilience4j (example)
resilience4j:
  circuitbreaker:
    instances:
      gatewayStripe:
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 50
        failureRateThreshold: 50
        waitDurationInOpenState: 30s
  retry:
    instances:
      gatewayStripe:
        maxAttempts: 3
        waitDuration: 200ms
        enableExponentialBackoff: true
8) Scaling for high TPS
- Stateless APIs behind HPA. Keep synchronous path thin: validate → write row → call gateway create intent → return.
- DB: primary for writes, read replicas for queries; partition by created_at or merchant_id at very high scale; tune connection pools (HikariCP).
- Kafka: partition by payment_id to keep order; consumers scale horizontally.
- Redis: cluster mode for rate limit and locks.
- Cold paths async: settlement, invoice, email, ledger writes.
- Batching where allowed: settlements/refunds if the gateway supports it.
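The rate limiting that Redis backs here was described in section 6 as a token bucket per merchant/customer/IP. The sketch below shows the algorithm in a single process. TokenBucket is a hypothetical class, and the clock is passed in to keep the behavior deterministic. A Redis-backed version would store the same two fields per key and update them atomically, for example in a Lua script.

```java
// Hypothetical single-process token bucket; one instance per limited key.
final class TokenBucket {
    private final long capacity;
    private final double refillPerMilli;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(long capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    // Refills for the elapsed time, then takes one token if available.
    synchronized boolean tryAcquire(long nowMillis) {
        long elapsed = Math.max(0, nowMillis - lastRefillMillis);
        tokens = Math.min(capacity, tokens + elapsed * refillPerMilli);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false; // caller would typically answer HTTP 429
    }
}
```

Capacity sets the burst size and the refill rate sets the sustained throughput, so the two can be tuned separately per merchant.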
Quick capacity thumb‑rules
- 1 CPU core usually sustains ~500–1500 light REST RPS; payment paths are heavier due to gateway I/O → plan ~300–600 RPS/core.
- Keep DB writes < 5–8k TPS per primary without sharding; if you exceed, shard by merchant or region, and move events/ledger to append‑only storage first, then project to SQL (CQRS).
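The first rule of thumb turns into a quick sizing calculation. The sketch below is only arithmetic. CapacityEstimate is a hypothetical helper, and the target utilization (headroom for spikes) is an assumption that does not come from the figures above.

```java
// Hypothetical sizing helper: cores = ceil(peakRps / (rpsPerCore * utilization)).
final class CapacityEstimate {
    static int coresNeeded(double peakRps, double rpsPerCore, double targetUtilization) {
        if (peakRps < 0 || rpsPerCore <= 0 || targetUtilization <= 0 || targetUtilization > 1) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        return (int) Math.ceil(peakRps / (rpsPerCore * targetUtilization));
    }
}
```

For example, a 3,000 RPS peak at the conservative 300 RPS/core and 60% target utilization gives 3000 / 180 ≈ 16.7, so 17 cores. At 600 RPS/core the same peak needs 9.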
9) Security & compliance
- Don’t store PAN/CVV; use gateway tokens/payment methods.
- HMAC‑verify all webhooks; rotate secrets; pin source IP ranges where supported.
- Secrets in KMS/HashiCorp Vault; short‑lived tokens to the client.
- PCI‑DSS scope: keep backends SAQ‑A/SAQ‑A‑EP by using hosted fields/redirects where possible.
- PII: encrypt at rest (AES‑256), TLS 1.2+, mTLS inside cluster if feasible.
10) Ops, monitoring & testing
- Golden signals: auth rate, capture rate, refund success, p50/p95/p99 latencies per gateway, error taxonomy (4xx/5xx/timeouts).
- Business metrics: chargeback rate, authorization lift by BIN/issuer, decline reasons.
- Trace with OpenTelemetry: propagate traceId across API → worker → connector.
- Reconciliation job: periodic pull from gateway (last N hours/days) to repair stragglers.
- Chaos & failure drills: simulate gateway slowness/outage; verify circuit breakers and fallback states.
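The core of the reconciliation job is a diff between local state and what the gateway reports. A minimal sketch, assuming both sides have been loaded into maps of payment id to status: Reconciler and findRepairs are hypothetical names, and treating the gateway as the source of truth for every mismatch is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reconciliation diff; both maps are paymentId -> status.
final class Reconciler {
    // Returns paymentId -> gateway status for payments known locally whose
    // status differs from the gateway's. Payments unknown locally are skipped
    // here and would need separate alerting.
    static Map<String, String> findRepairs(Map<String, String> local,
                                           Map<String, String> gateway) {
        Map<String, String> repairs = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : gateway.entrySet()) {
            String localStatus = local.get(e.getKey());
            if (localStatus != null && !localStatus.equals(e.getValue())) {
                repairs.put(e.getKey(), e.getValue());
            }
        }
        return repairs;
    }
}
```

Each repair would then go through the same event path as a webhook, so the worker's idempotency and state checks still apply.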
11) Minimal API contracts
- POST /payments → 202 + {paymentId,status,clientSecret|redirectUrl}
- GET /payments/{id} → current status (from OLTP or a read model cache)
- POST /payments/{id}/capture (if using delayed capture)
- POST /payments/{id}/refunds
- Webhooks: /webhooks/{gateway}
12) India‑specific notes (if relevant)
- UPI: expect PENDING → SUCCESS via webhook in seconds; handle user cancellations cleanly.
- NetBanking often requires redirect/return URL + webhook; always trust webhook over front‑channel.
Copy‑paste checklist
- Idempotency key + unique index
- Outbox table + publisher
- Webhook dedupe table
- Redis locks for capture/refund
- Circuit breaker + retry + timeouts
- Reconciliation cron + dashboards
- HMAC verification + secrets rotation
- Read model (cache) for status lookups
- Playbooks for gateway outage & rollback