What is service discovery?

Service discovery is the automatic lookup, at runtime, of the network location (host:port) of a service instance. It is needed because instances scale up and down, move between hosts, or die, so hard‑coded addresses go stale.

Core building blocks

  • Naming: a stable service name (e.g., orders.svc).
  • Registration: instances announce themselves (or are registered by the platform).
  • Health + heartbeats: keep the registry accurate.
  • Lookup + load‑balancing: clients resolve name → instance(s) and pick one.
  • Change propagation: clients update when instances change (watch/TTL).
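
To make the building blocks concrete, here is a minimal in‑memory sketch of registration, heartbeats, and lookup. The class name TinyRegistry and its methods are hypothetical, not the API of any real registry; the caller passes the current time explicitly so the expiry rule is easy to follow.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory registry: instances stay listed only while they heartbeat. */
final class TinyRegistry {
    private final Duration ttl;
    // service name -> (host:port -> time of last heartbeat)
    private final Map<String, Map<String, Instant>> services = new ConcurrentHashMap<>();

    TinyRegistry(Duration ttl) {
        this.ttl = ttl;
    }

    /** Registration and heartbeat are the same operation: refresh the timestamp. */
    void heartbeat(String service, String address, Instant now) {
        services.computeIfAbsent(service, s -> new ConcurrentHashMap<>()).put(address, now);
    }

    /** Explicit deregistration, e.g. on graceful shutdown. */
    void deregister(String service, String address) {
        Map<String, Instant> instances = services.get(service);
        if (instances != null) {
            instances.remove(address);
        }
    }

    /** Lookup: return only instances whose last heartbeat is within the TTL. */
    List<String> lookup(String service, Instant now) {
        List<String> live = new ArrayList<>();
        Map<String, Instant> instances = services.getOrDefault(service, Map.of());
        for (Map.Entry<String, Instant> e : instances.entrySet()) {
            if (Duration.between(e.getValue(), now).compareTo(ttl) <= 0) {
                live.add(e.getKey());
            }
        }
        Collections.sort(live); // deterministic order for callers
        return live;
    }
}
```

An instance that stops calling heartbeat simply drops out of lookup results once the TTL passes, which is how a registry stays accurate when instances die without deregistering.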

Patterns

1) DNS‑based discovery (simple, ubiquitous)

  • How it works: Each service has a DNS name (A/AAAA/SRV records). Clients resolve via DNS; TTL controls caching.
  • Pros: Minimal moving parts, works everywhere (Kubernetes, clouds).
  • Cons: Coarse health awareness; TTL staleness; limited metadata.
  • Use when: Platform already gives reliable DNS (Kubernetes CoreDNS, AWS Cloud Map, Consul DNS).
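
As a rough illustration of DNS‑based lookup from a JVM client, the sketch below resolves a name to all of its A/AAAA addresses with the standard InetAddress API. The class name DnsLookup is hypothetical, and returning an empty list on failure is a simplification chosen for this example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Resolves a service name to all of its A/AAAA addresses via the system resolver. */
final class DnsLookup {
    /** Returns every resolved address, or an empty list if the name does not resolve. */
    static List<String> resolveAll(String serviceName) {
        List<String> addresses = new ArrayList<>();
        try {
            for (InetAddress a : InetAddress.getAllByName(serviceName)) {
                addresses.add(a.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Treat "no such name" as "no instances" in this sketch.
        }
        return addresses;
    }
}
```

A call such as DnsLookup.resolveAll("orders.default.svc.cluster.local") would return the addresses behind that name. Two caveats: InetAddress does not read SRV records, so ports have to come from elsewhere, and the JVM keeps its own DNS cache (controlled by the networkaddress.cache.ttl security property), which can add staleness on top of the record's TTL.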

2) Registry‑based discovery (service registry)

  • How it works: A registry (Consul, Eureka, etcd, ZooKeeper) stores live instances.
    • Self‑registration: instances register/deregister themselves.
    • Third‑party registration: sidecar/agent (or orchestrator) registers on behalf of instances.
  • Lookup styles:
    • Client‑side discovery: client queries the registry and load balances itself (Spring Cloud LoadBalancer; Netflix Ribbon in older stacks, now in maintenance mode).
    • Server‑side discovery: client calls a stable VIP/gateway; a smart LB (Envoy/NGINX/ALB) consults the registry.
  • Pros: Health‑aware, rich metadata (version, zone), quick updates.
  • Cons: Extra component to run; need HA for the registry.
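
Client‑side discovery can be sketched as "ask the registry, then pick an instance yourself". In the example below the registry is stood in for by a plain Supplier, and the class name RoundRobinClient is hypothetical; a real client would query Consul, Eureka, or similar and usually cache the result.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Client-side discovery sketch: ask a registry for instances, then round-robin. */
final class RoundRobinClient {
    // Stands in for a registry query; assumed to return only live instances.
    private final Supplier<List<String>> registryLookup;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinClient(Supplier<List<String>> registryLookup) {
        this.registryLookup = registryLookup;
    }

    /** Returns the next instance, re-reading the registry on every call. */
    String next() {
        List<String> instances = registryLookup.get();
        if (instances.isEmpty()) {
            throw new IllegalStateException("no live instances");
        }
        // floorMod keeps the index non-negative even after the counter overflows.
        int i = Math.floorMod(counter.getAndIncrement(), instances.size());
        return instances.get(i);
    }
}
```

With server‑side discovery the same selection logic lives in the load balancer instead, so clients only need the stable VIP or gateway address.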

3) Platform‑native discovery (Kubernetes)

  • How it works: Service objects provide stable virtual IPs & DNS names (orders.default.svc.cluster.local). Endpoints (EndpointSlices) update automatically as Pods become ready or go away.
  • Add‑ons:
    • Headless Services (clusterIP: None) skip the virtual IP and expose Pod IPs directly in DNS for client‑side LB.
    • Service Mesh (Istio/Linkerd): sidecar proxies + control plane provide discovery + retries, mTLS, traffic policy.
  • Pros: Built‑in, automated, integrates health/readiness.
  • Cons: K8s‑specific; cross‑cluster/multi‑region needs extra tooling (Gateway API, mesh, Global DNS).
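
The DNS name in the example above follows the pattern service.namespace.svc.cluster‑domain. A small helper shows how a client might build it; the class and method names are hypothetical, and cluster.local is the default cluster domain, which a cluster may override.

```java
/** Builds cluster-internal DNS names for Kubernetes Services. */
final class K8sNames {
    /** Fully qualified name: <service>.<namespace>.svc.<clusterDomain>. */
    static String serviceFqdn(String service, String namespace, String clusterDomain) {
        return service + "." + namespace + ".svc." + clusterDomain;
    }

    /** Convenience overload assuming the default cluster domain. */
    static String serviceFqdn(String service, String namespace) {
        return serviceFqdn(service, namespace, "cluster.local");
    }
}
```

Within the same namespace the short name (orders) normally resolves too, thanks to the Pod's DNS search path; the fully qualified form is the safer choice for calls across namespaces.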

4) Cloud‑managed discovery

  • Examples: AWS Cloud Map + App Mesh, AWS ALB/NLB target groups, Azure App Gateway + Service Fabric, GCP Service Directory + Traffic Director.
  • Pros: Managed control plane, integrates with cloud LBs & IAM.
  • Cons: Cloud lock‑in; hybrid portability needs adapters.

Operational patterns & best practices

  • Health checks: Use readiness checks to gate routing and liveness checks to trigger restarts. Don’t route to unready instances.
  • Zone‑aware routing: Prefer same AZ/zone to cut latency and egress cost.
  • Version/canary routing: Attach labels/metadata (e.g., version=v2) to target subsets for canaries and blue‑green.
  • Backoff & caching: Cache lookups with short TTLs; back off exponentially when the registry fails instead of retrying in a tight loop.
  • Bulkheads & timeouts: Even with discovery, protect remote calls (timeouts/retries/circuit breakers).
  • Secure discovery: Use mTLS between clients and the registry/mesh; issue verifiable service identities (SPIFFE/SPIRE, mesh identities).
  • High availability: Run quorum‑based registries (Consul, etcd, ZooKeeper) in odd‑sized clusters (3 or 5 nodes), test backup/restore, and monitor leader elections.
  • Cross‑cluster/region: Use global DNS, mesh federation, or gateways to bridge. Plan failover policies explicitly.
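
Zone‑aware and version/canary routing both come down to filtering the instance list by registry metadata before load balancing. The sketch below assumes each instance carries a metadata map with zone and version keys; the Instance and SubsetRouter types are hypothetical and not tied to any particular registry.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical instance record: address plus registry metadata such as zone and version. */
final class Instance {
    final String address;
    final Map<String, String> metadata;

    Instance(String address, Map<String, String> metadata) {
        this.address = address;
        this.metadata = metadata;
    }
}

final class SubsetRouter {
    /** Keep instances whose metadata contains key=value (e.g. version=v2). */
    static List<Instance> withLabel(List<Instance> all, String key, String value) {
        return all.stream()
                .filter(i -> value.equals(i.metadata.get(key)))
                .collect(Collectors.toList());
    }

    /** Prefer instances in the caller's zone; fall back to every instance if none match. */
    static List<Instance> preferZone(List<Instance> all, String localZone) {
        List<Instance> sameZone = withLabel(all, "zone", localZone);
        return sameZone.isEmpty() ? all : sameZone;
    }
}
```

The fallback in preferZone matters: strict same‑zone routing would turn a single‑zone outage into a full outage for callers in that zone.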

Quick decision guide

  • Kubernetes? Use K8s Services + DNS; add service mesh if you need traffic policy/mTLS/canaries.
  • Non‑K8s VMs/containers? Use Consul/Eureka + client‑side or server‑side LB (Envoy/NGINX/HAProxy).
  • All‑in on a cloud? Prefer the cloud’s managed discovery + native load balancers.
  • Polyglot & hybrid? Favor DNS‑compatible discovery (Consul DNS/Cloud Map) so every stack can consume it.

Tiny examples

Spring Boot + Eureka (client‑side discovery):

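# application.properties
spring.application.name=orders
eureka.client.serviceUrl.defaultZone=http://eureka:8761/eureka

// Java: a @LoadBalanced RestTemplate resolves service names through the registry
@Bean
@LoadBalanced
RestTemplate restTemplate() { return new RestTemplate(); }

// Use the service name instead of host:port
Item item = restTemplate.getForObject("http://inventory/api/items/42", Item.class);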

Kubernetes Service (server‑side LB + DNS):

apiVersion: v1
kind: Service
metadata:
  name: orders
spec:
  selector: { app: orders }
  ports:
    - port: 80
      targetPort: 8080