Microservices Design Patterns: Essential Architecture and Design Guide

1) Decomposition Design Patterns

1.1 Decompose by Business Capability

Definition: Split services around high‑level business capabilities (e.g., Billing, Catalog, Shipping).
Problem: Monoliths couple unrelated features; teams step on each other; slow, risky releases.
Solution: Align service boundaries with capabilities owned by dedicated teams; each service has its own API, data, and lifecycle. Example: “Payments” service owns all payment logic and data.

1.2 Decompose by Subdomain

Definition: Use Domain-Driven Design to slice by domain subdomains (Core, Supporting, Generic) and bounded contexts.
Problem: Business language/logic varies across contexts; shared models cause ambiguity and tight coupling.
Solution: Create services per bounded context with explicit contracts/anti-corruption layers between them. Example: “Orders” vs “Inventory” as separate contexts.
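
For illustration, a minimal anti-corruption-layer sketch in Spring, assuming a hypothetical Inventory REST endpoint and DTO; the Orders context defines its own port and translates the external model at the boundary:

import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

// Port owned by the Orders bounded context
public interface StockAvailability {
  boolean canFulfil(String sku, int quantity);
}

// Adapter (anti-corruption layer) translating Inventory's model into Orders' terms
@Component
class InventoryRestAdapter implements StockAvailability {
  private final RestTemplate rest;

  InventoryRestAdapter(RestTemplate rest) { this.rest = rest; }

  @Override
  public boolean canFulfil(String sku, int quantity) {
    // Hypothetical Inventory endpoint and DTO
    InventoryLevelDto dto = rest.getForObject(
        "http://inventory-service/stock/{sku}", InventoryLevelDto.class, sku);
    return dto != null && dto.available() >= quantity;
  }
}

record InventoryLevelDto(String sku, int available) {}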

1.3 Strangler Pattern

Definition: Incrementally replace a legacy system by routing specific features to new services while the old system continues.
Problem: Big‑bang rewrites are risky/slow; migration halts delivery.
Solution: Add a routing façade; strangler services implement slices; traffic for those slices moves to new code until legacy can be retired.


🔹 The Problem

  • You have a large legacy application (monolith, old tech stack).
  • You want to migrate to a new system (microservices, cloud-native, modern frameworks).
  • A full rewrite is risky, time-consuming, and disruptive (could take years and break existing customers).

🔹 The Strangler Pattern – Definition

The Strangler Fig Pattern (inspired by the strangler fig tree that grows around a tree until it replaces it) means:
👉 Incrementally replace parts of a legacy system with new services, until the legacy is “strangled” and can be retired.


🔹 How it Works (Steps)

  1. Proxy / Facade in front of legacy
    1. Place an API Gateway, router, or façade in front of the legacy system.
    2. All client requests go through this entry point.
  2. Incremental Replacement
    1. Start building new services (microservices or modules).
    2. Route specific functionality (e.g., “user profile” or “catalog service”) to the new system.
    3. Other requests continue to go to the legacy.
  3. Iterative Migration
    1. Gradually replace more features of the legacy system with new ones.
    2. Keep routing through the gateway.
  4. Retire Legacy
    1. When all features are migrated, decommission the old system.

🔹 Example

Legacy E-commerce Monolith → migrate to microservices:

  • Step 1: Put API Gateway in front of monolith.
  • Step 2: Create a Product Catalog Service → gateway routes catalog requests to it, others still go to monolith.
  • Step 3: Create Order Service, then Payment Service, etc.
  • Step 4: Eventually, all requests go to microservices → monolith is retired.
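
As a sketch, the gateway routing for Step 2 might look like this with Spring Cloud Gateway (service names, ports, and paths are illustrative assumptions):

spring:
  cloud:
    gateway:
      routes:
        - id: catalog-service            # migrated slice goes to the new microservice
          uri: http://catalog-service:8080
          predicates:
            - Path=/catalog/**
        - id: legacy-monolith            # everything else still hits the monolith
          uri: http://legacy-monolith:8080
          predicates:
            - Path=/**

Each newly migrated slice gets its own route above the catch-all, until the monolith route receives no traffic and can be removed.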

🔹 Benefits

✅ Low risk → business continues while migration happens.
✅ Incremental delivery → value delivered sooner.
✅ Less downtime → no big-bang cutover.
✅ Test & validate each migrated piece independently.


🔹 Challenges

❌ Requires careful integration layer (API Gateway, router).
❌ Legacy and new must co-exist, which adds complexity.
❌ Data consistency can be tricky (may need replication or sync between old and new DBs).


🔹 In Short

The Strangler Pattern is about:

  • Wrapping a legacy system with a façade (gateway).
  • Gradually routing traffic to new components.
  • Replacing the legacy piece by piece until it’s fully retired.

👉 Think of it as “evolving, not replacing” your system.


2) Integration Patterns

2.1 API Gateway Pattern

Definition: A single entry point that fronts many services, handling routing, auth, rate limiting, and protocol translation.
Problem: Clients must call many services, manage auth, deal with versioning/latency.
Solution: Put a gateway in front; it exposes a simpler client API and forwards/aggregates to internal services.

2.2 Aggregator Pattern

Definition: One component composes data from multiple services and returns a unified response.
Problem: Client or upstream layer must orchestrate many calls, increasing latency and complexity.
Solution: Centralize orchestration in an aggregator (gateway, BFF, or dedicated service) to fan‑out, collect, and shape the response.
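
A minimal aggregator sketch using Spring's reactive WebClient; the catalog-service and review-service endpoints are hypothetical:

import java.util.Map;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@RestController
class ProductDetailsAggregator {

  private final WebClient web = WebClient.create();

  @GetMapping("/product-details/{id}")
  Mono<Map<String, String>> details(@PathVariable String id) {
    Mono<String> product = web.get().uri("http://catalog-service/products/{id}", id)
        .retrieve().bodyToMono(String.class);
    Mono<String> reviews = web.get().uri("http://review-service/reviews?productId={id}", id)
        .retrieve().bodyToMono(String.class);

    // Fan out in parallel, then shape one response for the client
    return Mono.zip(product, reviews)
        .map(t -> Map.of("product", t.getT1(), "reviews", t.getT2()));
  }
}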

2.3 Client-Side UI Composition

Definition: The UI composes a page from multiple service- or widget-specific endpoints (micro‑frontends).
Problem: A server aggregator becomes a bottleneck; teams can’t ship UI independently.
Solution: Split UI into independently deployable fragments that fetch their own data; compose at the client (or edge/CDN).


3) Database Patterns

3.1 Database per Service

Definition: Each service owns its data store and schema.
Problem: Shared databases create coupling; schema changes break unrelated services.
Solution: Strict ownership of data per service; inter-service communication via APIs/events. Tradeoff: handle cross-entity queries via composition or projections.

3.2 Shared Database per Service

Definition: Multiple services share a single physical database (often separate schemas/tables).
Problem: Easy to start but hard to evolve; tight coupling and unsafe cross-service joins.
Solution: If unavoidable (legacy/migration), enforce schema boundaries, read-only views, and change-control; plan migration to per-service DBs.

3.3 Command Query Responsibility Segregation (CQRS)

CQRS splits your application’s responsibilities into two separate models:

  • Command side (writes): handles intent to change state — create/update/delete.
  • Query side (reads): handles returning data — no state changes.

Instead of one domain model and one database doing everything, you separate write and read concerns so each can be optimized independently.


Why do this?

  • Scale reads and writes independently (most systems are read‑heavy).
  • Different models for different purposes: rich domain rules for writes; denormalized, fast views for reads.
  • Performance: queries become simpler and faster (pre‑joined, denormalized projections).
  • Flexibility: you can change read models without touching the write model.

Core ideas (at a glance)

  • Commands: intent (e.g., PlaceOrder, CancelOrder). They validate business rules and produce events, returning nothing beyond success/failure and identifiers.
  • Queries: return data (GetOrderDetails, ListCustomerOrders). Side‑effect free.
  • Events (often): facts that happened (OrderPlaced). Used to update read models asynchronously.
  • Eventual consistency (typical): after a command succeeds, read models may lag briefly while projections update.

Typical architecture (logical)

Clients
  |                 (Commands)                      (Queries)
  |                     |                               |
  v                     v                               v
+---------+       +------------+                   +-----------+
|  API    |-----> | Command    |      Events      |  Query    |
| Gateway |       | Handlers   | ---------------> | Handlers  |
+---------+       +------------+                   +-----------+
                     |   ^                             |
                     v   |                             v
                Domain Model                     Read Models / Views
                (Aggregates)                     (SQL/NoSQL/Caches)
                     |
                     v
               Write Store
             (RDBMS/NoSQL)

With Event Sourcing (common but optional)

  • Event Sourcing stores events as the source of truth (not the latest state).
  • The write model replays events to rebuild state; read models subscribe to events to build projections.
  • Benefits: audit trail, time‑travel; Trade‑offs: complexity, storage/ops.

When to use CQRS

Good fit

  • Complex domain rules on writes; very high read volumes.
  • Many different query shapes (dashboards, mobile cards, back‑office lists).
  • Need for auditability or event‑driven integrations.

Probably overkill

  • Simple CRUD apps with modest load.
  • Teams new to messaging/eventual consistency.

Pros & Cons

Pros

  • Independent scalability (e.g., many query replicas).
  • Faster, simpler queries via denormalized projections.
  • Clearer code: commands validate rules; queries don’t mutate.

Cons

  • Eventual consistency: reads can briefly lag writes.
  • More moving parts: messaging, projections, synchronization.
  • Ops complexity (monitoring DLQs, replays, versioning).

Practical example (Order service)

Command side (Spring):

@RestController
@RequestMapping("/orders")
class OrderCommandController {
  private final OrderCommandService service;

  OrderCommandController(OrderCommandService service) {
    this.service = service; // constructor injection
  }

  @PostMapping
  public ResponseEntity<?> place(@RequestBody PlaceOrderCommand cmd) {
    String orderId = service.placeOrder(cmd); // validate, persist, publish event
    return ResponseEntity.accepted().body(Map.of("orderId", orderId));
  }
}

record PlaceOrderCommand(String customerId, List<Item> items) {}

Event publication (simplified idea):

  • After placeOrder, publish OrderPlaced (Kafka/RabbitMQ).
  • A projection service consumes OrderPlaced and updates a read model (e.g., a denormalized orders_view table or Redis document).

Query side (Spring):

@RestController
@RequestMapping("/orders-view")
class OrderQueryController {
  private final OrderViewRepository repo; // points to read DB / cache

  OrderQueryController(OrderViewRepository repo) {
    this.repo = repo; // constructor injection
  }

  @GetMapping("/{id}")
  public OrderView get(@PathVariable String id) {
    return repo.findById(id).orElseThrow();
  }
}

Read model (denormalized) example

-- orders_view (read DB)
order_id | customer_name | total_amount | status | items_json | updated_at

Populated asynchronously by projection handlers listening to events.
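
For illustration, a projection handler might look roughly like this (the topic name, OrderPlaced event shape, and OrderView setters are assumptions, and a JSON deserializer for the event is assumed to be configured):

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
class OrderViewProjection {

  private final OrderViewRepository repo;   // repository over the read store

  OrderViewProjection(OrderViewRepository repo) {
    this.repo = repo;
  }

  @KafkaListener(topics = "order-events", groupId = "order-view-projection")
  void on(OrderPlaced event) {
    // Idempotent upsert keyed by orderId, so redeliveries/replays are safe
    OrderView view = repo.findById(event.orderId())
        .orElseGet(() -> new OrderView(event.orderId()));
    view.setCustomerName(event.customerName());
    view.setTotalAmount(event.totalAmount());
    view.setStatus("PLACED");
    repo.save(view);
  }
}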


Data stores (common combos)

  • Write side: relational (strong constraints) or event store.
  • Read side: anything that answers queries fast — Postgres views, Elastic, MongoDB, Redis, Cassandra — even multiple stores per query use case.

Messaging & consistency tips

  • Use a transactional outbox (or Kafka transactions) so events are published reliably together with the write; delivery is effectively at-least-once, which idempotent projections make safe (a minimal outbox sketch follows this list).
  • Embrace idempotent projections (upserts by key, version checks).
  • Monitor lag between write events and read projections.
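
A minimal transactional-outbox sketch, assuming hypothetical Order/OutboxEvent entities and repositories; the business write and the outbox row share one local transaction, and a separate relay publishes outbox rows to the broker:

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
class OrderCommandService {

  private final OrderRepository orders;   // write-store repository
  private final OutboxRepository outbox;  // outbox table in the SAME database

  OrderCommandService(OrderRepository orders, OutboxRepository outbox) {
    this.orders = orders;
    this.outbox = outbox;
  }

  @Transactional
  public String placeOrder(PlaceOrderCommand cmd) {
    Order order = orders.save(Order.from(cmd));                 // business change
    outbox.save(new OutboxEvent("OrderPlaced", order.getId(),   // event row, same local tx
        OrderEvents.toJson(order)));                            // toJson is an illustrative helper
    return order.getId();
  }
}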

Cheat sheet (for interviews)

  • CQRS = separate models for writes and reads.
  • Commands mutate, Queries read, often via events to update read models.
  • Benefits: scale, performance, clarity.
  • Trade‑offs: eventual consistency, complexity.
  • Works great with Event Sourcing, Saga, and message brokers (Kafka/RabbitMQ).

3.4 Saga Pattern

The Saga Pattern is a key design pattern in microservices to handle distributed transactions without a global lock.


🔹 The Problem

  • In a monolith, you can use ACID transactions across modules (all-or-nothing).
  • In microservices, each service has its own DB → you cannot use a single transaction across them.
  • Example: An Order Service needs to:
    • Create Order (Order DB)
    • Reserve Inventory (Inventory DB)
    • Process Payment (Payment DB)

👉 What if step 2 or 3 fails? How do you roll back?


🔹 Saga Pattern – Definition

A Saga is a sequence of local transactions.

  • Each local transaction updates its own database.
  • If a step fails, Saga executes compensating transactions to undo the previous work.

🔹 Two Saga Approaches

1. Choreography (Event-based)

  • Each service publishes an event after it finishes.
  • The next service listens and reacts.
  • If something fails, it publishes a compensating event.

✅ Simple, no central coordinator.
❌ Harder to manage when flow is complex (many services).

Example (Order → Inventory → Payment)

  • Order Service creates order → emits OrderCreated.
  • Inventory Service reserves stock → emits StockReserved.
  • Payment Service charges card → emits PaymentApproved.
  • If Payment fails → emits PaymentDeclined → Inventory Service listens and releases stock → Order Service cancels order.
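
A rough choreography sketch for the Inventory Service side; topic names, event types, and the InventoryService API are illustrative assumptions, and JSON deserialization of events is assumed to be configured:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
class InventoryEventHandler {

  private final InventoryService inventory;

  InventoryEventHandler(InventoryService inventory) {
    this.inventory = inventory;
  }

  @KafkaListener(topics = "order-events", groupId = "inventory")
  void onOrderCreated(OrderCreated event) {
    inventory.reserve(event.orderId(), event.items());  // local transaction, then emit StockReserved
  }

  @KafkaListener(topics = "payment-events", groupId = "inventory")
  void onPaymentDeclined(PaymentDeclined event) {
    inventory.release(event.orderId());                 // compensating transaction
  }
}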

2. Orchestration (Central Controller)

  • A Saga Orchestrator service tells each participant what to do.
  • It sends a command, waits for reply, then decides next step.

✅ Clear control, easier to debug.
❌ Orchestrator can become a central bottleneck.

Example (Orchestrator)

  • Orchestrator → Order Service: “Create Order”
  • Orchestrator → Inventory Service: “Reserve Stock”
  • Orchestrator → Payment Service: “Process Payment”
  • If Payment fails → Orchestrator sends “Cancel Order” and “Release Stock”.
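
A simplified orchestrator sketch; the service clients, request type, and exception are illustrative, and a real orchestrator would also persist saga state and handle retries/timeouts:

import org.springframework.stereotype.Service;

@Service
class OrderSagaOrchestrator {

  private final OrderClient orders;
  private final InventoryClient inventory;
  private final PaymentClient payments;

  OrderSagaOrchestrator(OrderClient orders, InventoryClient inventory, PaymentClient payments) {
    this.orders = orders;
    this.inventory = inventory;
    this.payments = payments;
  }

  public void placeOrder(OrderRequest request) {
    String orderId = orders.create(request);            // step 1: Order service local tx
    try {
      inventory.reserve(orderId, request.items());      // step 2: Inventory service
      payments.charge(orderId, request.payment());      // step 3: Payment service
    } catch (SagaStepFailedException e) {
      inventory.release(orderId);                       // compensate step 2
      orders.cancel(orderId);                           // compensate step 1
      throw e;
    }
  }
}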

🔹 Compensation Example

Suppose Payment fails:

  1. Order created ✅
  2. Stock reserved ✅
  3. Payment failed ❌

Compensating actions:

  • Cancel order
  • Release stock

🔹 Benefits of Saga Pattern

✅ Enables distributed transactions without 2-phase commit
✅ Increases resilience with eventual consistency
✅ Works well with microservices and messaging systems (Kafka, RabbitMQ)


🔹 Challenges

❌ Complex error handling
❌ Compensation logic must be carefully designed (not always possible)
❌ Debugging event chains can be hard


🔹 In Short

👉 The Saga Pattern = break a distributed transaction into a series of local transactions, with compensating transactions for rollback.

  • Choreography → event-driven, no central control.
  • Orchestration → central Saga coordinator.

4) Observability Patterns

4.1 Log Aggregation

Definition: Centralize logs from all services (structured JSON) into a searchable store.
Problem: Debugging distributed issues is impossible with siloed logs.
Solution: Standardize log format/correlation IDs; ship logs to a centralized system (e.g., ELK/OpenSearch); set retention and alerts.
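
One sketch, assuming an X-Correlation-Id header and a JSON log encoder: a servlet filter puts a correlation ID into the logging context so every aggregated log line from a request can be searched together.

import java.io.IOException;
import java.util.Optional;
import java.util.UUID;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.slf4j.MDC;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
class CorrelationIdFilter extends OncePerRequestFilter {

  @Override
  protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
                                  FilterChain chain) throws ServletException, IOException {
    String id = Optional.ofNullable(request.getHeader("X-Correlation-Id"))
        .orElseGet(() -> UUID.randomUUID().toString());
    MDC.put("correlationId", id);          // the JSON log encoder includes this field on every line
    try {
      chain.doFilter(request, response);
    } finally {
      MDC.remove("correlationId");
    }
  }
}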

4.2 Performance Metrics

Definition: Collect time‑series metrics (RED/USE/Golden signals) per service and infra.
Problem: You can’t detect regressions or capacity issues without quantitative signals.
Solution: Expose metrics endpoints; scrape/push to TSDB; build SLOs/alerts and dashboards.
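
A small Micrometer sketch (the metric name and CheckoutService/Cart types are illustrative); Spring Boot Actuator supplies the MeterRegistry and exposes the metrics endpoint for scraping:

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service
class CheckoutService {

  private final Timer checkoutTimer;

  CheckoutService(MeterRegistry registry) {
    this.checkoutTimer = Timer.builder("checkout.duration")
        .description("Time taken to complete a checkout")
        .register(registry);
  }

  public void checkout(Cart cart) {
    checkoutTimer.record(() -> process(cart));  // record latency of the critical path
  }

  private void process(Cart cart) {
    // business logic
  }
}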

4.3 Distributed Tracing

Definition: End‑to‑end traces of requests across services with spans and context propagation.
Problem: Hard to locate bottlenecks and failing hops in call chains.
Solution: Instrument with OpenTelemetry (trace IDs in logs/headers); visualize traces; sample wisely.
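
A manual OpenTelemetry span sketch; in practice most spans come from auto-instrumentation, and the Tracer wiring and span name here are assumptions:

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

class StockReservation {

  private final Tracer tracer;

  StockReservation(Tracer tracer) {
    this.tracer = tracer;
  }

  void reserve(String orderId) {
    Span span = tracer.spanBuilder("reserve-stock").startSpan();
    try (Scope scope = span.makeCurrent()) {   // propagates the trace context to child calls
      span.setAttribute("order.id", orderId);
      // ... call the inventory store ...
    } finally {
      span.end();
    }
  }
}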

4.4 Health Check

Definition: Endpoints that report service health (liveness/readiness/startup).
Problem: Orchestrators need to know when to start, stop, or route traffic; naive checks cause flapping.
Solution: Provide separate checks; readiness covers dependencies; liveness is lightweight; integrate with deployment/auto‑scaling.
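
With Spring Boot Actuator, separate liveness/readiness probes can be enabled roughly like this (including the db indicator in the readiness group is an illustrative choice):

management:
  endpoints:
    web:
      exposure:
        include: health
  endpoint:
    health:
      probes:
        enabled: true                    # exposes /actuator/health/liveness and /readiness
      group:
        readiness:
          include: readinessState,db     # readiness also checks the database dependency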


5) Cross‑Cutting Concern Patterns

5.1 Externalized Configuration

Definition: Store config outside the binary (files, env vars, config service) with versioning and secrets management.
Problem: Rebuilds/redeploys for simple config changes; secrets end up in code.
Solution: Twelve‑Factor style config, config servers, secret managers, dynamic reload with safety gates.
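
A small sketch of typed externalized configuration in Spring Boot; the payment prefix and fields are illustrative, and the values come from application.yml, environment variables, or a config server rather than the code:

import org.springframework.boot.context.properties.ConfigurationProperties;

// Enable with @ConfigurationPropertiesScan (or @EnableConfigurationProperties) on the application class
@ConfigurationProperties(prefix = "payment")
public record PaymentProperties(String apiUrl, int timeoutMs) {}

Example application.yml values (could equally come from the environment or a config server):

payment:
  api-url: https://payments.example.com
  timeout-ms: 2000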

5.2 Service Discovery

In microservices, instances come and go (autoscaling, crashes, rolling deploys). You can’t hardcode IPs.

Service Discovery lets clients find the current network locations (IP/port) of a service at runtime.

Core pieces:

  • Service Registry: source of truth (who is up, where?).
  • Service Registration: instances register/deregister themselves (or are registered by an agent).
  • Health Checks: keep only healthy instances discoverable.
  • Discovery Client / Load Balancer: picks an instance to call.

Two discovery models

1) Client‑side discovery

  • Flow: Client queries registry → picks an instance (client‑side load balancing) → calls it directly.
  • Pros: Simple, fewer hops; smart clients.
  • Cons: Each client needs discovery logic & balancing.
  • Tech: Eureka + Spring Cloud LoadBalancer, Consul, Zookeeper.

2) Server‑side discovery

  • Flow: Client calls a router/load balancer (e.g., API Gateway/Envoy/NGINX). The router consults registry and forwards.
  • Pros: Thin clients; centralized policies/traffic shaping.
  • Cons: Extra hop; router is critical infra.
  • Tech: Kubernetes Services (kube-proxy + DNS), Envoy + Consul, AWS ALB + Cloud Map.

Common implementations

A) Spring Cloud Netflix Eureka (client‑side)

Registry: Eureka Server
Clients: Spring Boot apps with Eureka Client + Spring Cloud LoadBalancer

Eureka Server (Spring Boot)
pom.xml:

<dependency>
  <groupId>org.springframework.cloud</groupId>
  <artifactId>spring-cloud-starter-netflix-eureka-server</artifactId>
</dependency>

Main application class:

@EnableEurekaServer
@SpringBootApplication
public class RegistryApp {
  public static void main(String[] args) {
    SpringApplication.run(RegistryApp.class, args);
  }
}

application.yml (server):

server:
  port: 8761
eureka:
  client:
    register-with-eureka: false
    fetch-registry: false

Service registering with Eureka

<dependency>
  <groupId>org.springframework.cloud</groupId>
  <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
</dependency>

application.yml (client):

spring.application.name: inventory-service
eureka.client.service-url.defaultZone: http://localhost:8761/eureka

Calling another service with client‑side load‑balancing

@Bean
@LoadBalanced
RestTemplate restTemplate() { return new RestTemplate(); }

// Use logical name (from spring.application.name)
String res = restTemplate.getForObject("http://order-service/api/orders/123", String.class);

B) Kubernetes (server‑side by design)

  • Pods register implicitly via the Kube API.
  • Service (ClusterIP/NodePort/LoadBalancer) provides a stable virtual IP + DNS name.
  • kube-dns/CoreDNS provides DNS names like order-service.default.svc.cluster.local.
  • kube-proxy load‑balances across pod endpoints.
  • No extra code needed—discovery is built‑in.

C) Consul

  • Acts as registry + health checks + KV.
  • Works with Envoy or Fabio for server‑side routing, or with client‑side libraries.

D) Service Mesh (sidecar pattern)

  • Istio / Linkerd / Consul Service Mesh: sidecar proxies (Envoy) handle discovery, mTLS, retries, circuit breaking.
  • App just calls a local proxy; mesh does the rest (policy‑driven).

Health checks & registration styles

  • Self‑registration: service registers itself (Eureka client).
  • 3rd‑party registration: an agent/sidecar (Consul agent, K8s controller) registers on behalf of the service.
  • Active health checks (HTTP/TCP) or passive (observed failures) keep the registry accurate.

When to use what

  • Kubernetes: default choice if you’re on K8s—Service + DNS is enough.
  • VM/Bare metal: Eureka/Consul/ZooKeeper + gateway/Envoy.
  • Polyglot + advanced traffic control: Service Mesh.

Pitfalls & best practices

  • Stale registry: ensure frequent health checks/TTL to prune dead instances quickly.
  • Backoff/retries: combine with circuit breakers and timeouts (Resilience4j/Envoy).
  • Service identity & TLS: use mTLS (mesh) or mutual TLS at gateway.
  • Blue/Green & canaries: favor server‑side discovery with a gateway/mesh for traffic shaping.
  • Name everything: stable logical names (e.g., order-service) across environments.

TL;DR

Service Discovery lets services find each other dynamically via a registry, health checks, and either client‑side or server‑side routing.

  • Spring world → Eureka + LoadBalancer.
  • Kubernetes → Service + DNS (built‑in).
  • Advanced control → Service Mesh (Istio/Linkerd/Consul) with sidecars.

5.3 Circuit Breakers


🔹 The Problem

In distributed systems (like microservices):

  • Service A calls service B over the network.
  • If B is down, slow, or unstable, repeated calls from A:
    • Waste resources (threads, CPU, DB connections).
    • Increase latency for users.
    • Can cause cascading failures (A fails, then C which depends on A, etc.).

We need a way to fail fast and prevent system meltdown.


🔹 The Circuit Breaker Pattern — Definition

The Circuit Breaker Pattern is a resilience pattern that monitors remote calls and moves between three states:

  • Closed → Calls flow normally (default).
  • Open → Calls are blocked immediately (fail fast).
  • Half-Open → Allows a few trial requests to see if recovery is possible.

👉 Think of it like an electrical circuit breaker: if too many failures happen, “trip the switch” to protect the system.


🔹 States of Circuit Breaker

  1. Closed (healthy):
    • All requests allowed.
    • Failures are counted.
    • If failure threshold is crossed → switch to Open.

  2. Open (tripped):
    • Requests fail immediately (fallback executed).
    • After a timeout, switch to Half-Open.

  3. Half-Open (test mode):
    • Allow limited trial requests.
    • If they succeed → back to Closed.
    • If they fail → back to Open.


🔹 Example Flow

Service A → Service B (down):

  • After 5 consecutive failures → Circuit trips to Open.
  • For 30 seconds, A does not call B at all, but immediately returns fallback.
  • After 30 sec → moves to Half-Open → allows 1 request.
    • If success → Closed (normal).
    • If fail → Open again.

🔹 Benefits

✅ Prevents cascading failures.
✅ Improves response time (fail fast instead of waiting for timeouts).
✅ Gives failing service time to recover.
✅ Allows fallback strategies (cache, default response, queued retry).


🔹 Implementation in Java (Spring Boot + Resilience4j)

Dependency (Maven):

<dependency>
  <groupId>io.github.resilience4j</groupId>
  <artifactId>resilience4j-spring-boot3</artifactId>
  <version>2.2.0</version>
</dependency>

Config (application.yml):

resilience4j.circuitbreaker:
  instances:
    myServiceCB:
      slidingWindowSize: 10
      failureRateThreshold: 50
      waitDurationInOpenState: 30s
      permittedNumberOfCallsInHalfOpenState: 3

Service Call:

import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;

@Service
public class OrderService {

    private final RestTemplate restTemplate;

    public OrderService(RestTemplate restTemplate) {
        this.restTemplate = restTemplate; // constructor injection
    }

    @CircuitBreaker(name = "myServiceCB", fallbackMethod = "fallbackOrders")
    public String getOrders() {
        // Call external service
        return restTemplate.getForObject("http://inventory/api/orders", String.class);
    }

    // Invoked when the circuit is open or the remote call fails
    public String fallbackOrders(Exception ex) {
        return "Fallback: inventory service not available right now.";
    }
}

🔹 Real-World Use Cases

  • Microservices → prevent cascading failures.
  • Payment gateway calls → fallback to retry queue.
  • External APIs → return cached response if provider is down.

🔹 In Short

The Circuit Breaker Pattern protects your system from cascading failures by “tripping” after repeated errors, blocking calls temporarily, and then retrying cautiously.

  • States: Closed → Open → Half-Open.
  • Helps with: resilience, fail fast, fallback handling.
  • Implemented via libs like Resilience4j, Hystrix (legacy), Spring Cloud Circuit Breaker.

5.4 Blue-Green Deployments

Definition: Run two production environments (Blue and Green); switch traffic to the new one when ready.
Problem: In‑place deploys cause downtime and risky rollouts.
Solution: Deploy to idle environment, run checks, shift traffic (router/ALB), and roll back by flipping back; pair with DB expand‑contract migrations.
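
One common sketch on Kubernetes: both versions run as separate Deployments, and traffic is switched by repointing the Service selector (names and labels are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: orders
spec:
  selector:
    app: orders
    slot: green        # flip between "blue" and "green" to switch traffic
  ports:
    - port: 80
      targetPort: 8080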


Quick usage hints

  • Prefer business capability + subdomain for boundaries, then choose API Gateway/Aggregators for client simplicity.
  • Default to Database per Service; add CQRS + Sagas when complexity warrants.
  • Bake in logs/metrics/traces/health from day one.
  • Use externalized config, discovery, circuit breakers, blue‑green to stay resilient and ship safely.