What is service discovery?
Automatically finding the network location (host:port) of a service instance at runtime. Needed because instances scale up/down, move, or die.
Core building blocks
- Naming: a stable service name (e.g., orders.svc).
- Registration: instances announce themselves (or are registered by the platform).
- Health + heartbeats: keep the registry accurate.
- Lookup + load‑balancing: clients resolve name → instance(s) and pick one.
- Change propagation: clients update when instances change (watch/TTL).
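The building blocks above can be sketched as a toy in-memory registry. This is only an illustration, not how any particular registry is implemented: the class ToyRegistry, its method names, the addresses, and the 5-second TTL are all hypothetical. Registration and heartbeat are folded into one call, and an instance drops out of lookups once its last heartbeat is older than the TTL.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy registry (hypothetical): instances stay listed only while heartbeats are fresh. */
class ToyRegistry {
    private final long ttlMillis;
    // service name -> (instance address -> time of last heartbeat)
    private final Map<String, Map<String, Long>> lastSeen = new ConcurrentHashMap<>();

    ToyRegistry(long ttlMillis) { this.ttlMillis = ttlMillis; }

    /** Registration and heartbeat are the same call: record "seen at nowMillis". */
    void heartbeat(String service, String address, long nowMillis) {
        lastSeen.computeIfAbsent(service, s -> new ConcurrentHashMap<>()).put(address, nowMillis);
    }

    /** Explicit deregistration, e.g. on graceful shutdown. */
    void deregister(String service, String address) {
        Map<String, Long> instances = lastSeen.get(service);
        if (instances != null) instances.remove(address);
    }

    /** Lookup: addresses whose last heartbeat is within the TTL, sorted for stable output. */
    List<String> lookup(String service, long nowMillis) {
        List<String> live = new ArrayList<>();
        Map<String, Long> instances = lastSeen.getOrDefault(service, Map.of());
        for (Map.Entry<String, Long> e : instances.entrySet()) {
            if (nowMillis - e.getValue() <= ttlMillis) live.add(e.getKey());
        }
        Collections.sort(live);
        return live;
    }
}

class ToyRegistryDemo {
    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry(5_000); // assumed TTL of 5 s
        registry.heartbeat("orders.svc", "10.0.0.1:8080", 0);
        registry.heartbeat("orders.svc", "10.0.0.2:8080", 0);
        registry.heartbeat("orders.svc", "10.0.0.1:8080", 4_000); // only .1 keeps beating
        System.out.println(registry.lookup("orders.svc", 4_500)); // both are still fresh
        System.out.println(registry.lookup("orders.svc", 8_000)); // .2 has expired
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real registry would read its own clock and usually also push changes to watchers rather than rely on polling alone.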
Patterns
1) DNS‑based discovery (simple, ubiquitous)
- How it works: Each service has a DNS name (A/AAAA/SRV records). Clients resolve via DNS; TTL controls caching.
- Pros: Minimal moving parts, works everywhere (Kubernetes, clouds).
- Cons: Coarse health awareness; TTL staleness; limited metadata.
- Use when: Platform already gives reliable DNS (Kubernetes CoreDNS, AWS Cloud Map, Consul DNS).
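As a rough sketch of the client side of DNS-based discovery, the snippet below resolves a name to all of its A/AAAA addresses with the JDK's InetAddress. The class name DnsLookup, the port 8080, and the choice to return an empty list on failure are assumptions made for illustration. InetAddress does not read SRV records, so the port has to come from configuration here.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** DNS-based lookup sketch (hypothetical helper): name -> list of "ip:port" endpoints. */
class DnsLookup {
    /** Resolve all A/AAAA records; an unresolvable name yields an empty list. */
    static List<String> resolve(String serviceName, int port) {
        List<String> endpoints = new ArrayList<>();
        try {
            for (InetAddress addr : InetAddress.getAllByName(serviceName)) {
                String host = addr.getHostAddress();
                // IPv6 literals need brackets so the port separator is unambiguous
                if (addr instanceof Inet6Address) host = "[" + host + "]";
                endpoints.add(host + ":" + port);
            }
        } catch (UnknownHostException e) {
            // Treat "name not found" as "no instances" and let the caller decide what to do
        }
        return endpoints;
    }

    public static void main(String[] args) {
        // Pass a service name such as orders.default.svc.cluster.local; defaults to localhost
        String name = args.length > 0 ? args[0] : "localhost";
        System.out.println(resolve(name, 8080));
    }
}
```

Note that the JVM keeps its own cache of successful lookups (the networkaddress.cache.ttl security property), so the record TTL is not the only source of staleness on a Java client.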
2) Registry‑based discovery (service registry)
- How it works: A registry (Consul, Eureka, etcd, ZooKeeper) stores live instances.
  - Self‑registration: instances register/deregister themselves.
  - Third‑party registration: sidecar/agent (or orchestrator) registers on behalf of instances.
- Lookup styles:
  - Client‑side discovery: client queries registry and load balances (Ribbon, Spring Cloud LoadBalancer).
  - Server‑side discovery: client calls a stable VIP/gateway; a smart LB (Envoy/NGINX/ALB) consults the registry.
- Pros: Health‑aware, rich metadata (version, zone), quick updates.
- Cons: Extra component to run; need HA for the registry.
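Client-side discovery from the list above might look roughly like the following: the client asks the registry for live instances and round-robins over them itself. The Supplier stands in for a real registry query (Consul, Eureka, and so on); RoundRobinClient and the addresses are hypothetical names used only for this sketch.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Client-side discovery sketch: query the registry, then round-robin over the result. */
class RoundRobinClient {
    private final Supplier<List<String>> registryLookup; // stand-in for a real registry call
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinClient(Supplier<List<String>> registryLookup) {
        this.registryLookup = registryLookup;
    }

    /** Pick the next instance; the list is re-read on every call so changes are picked up. */
    String choose() {
        List<String> instances = registryLookup.get();
        if (instances.isEmpty()) throw new IllegalStateException("no live instances");
        // floorMod keeps the index non-negative even after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), instances.size());
        return instances.get(i);
    }
}

class RoundRobinDemo {
    public static void main(String[] args) {
        List<String> live = List.of("10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080");
        RoundRobinClient client = new RoundRobinClient(() -> live);
        for (int n = 0; n < 4; n++) {
            System.out.println(client.choose()); // cycles .1, .2, .3, then .1 again
        }
    }
}
```

With server-side discovery the same selection logic lives in the load balancer instead, so clients only need the stable VIP or gateway address.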
3) Platform‑native discovery (Kubernetes)
- How it works: Service objects provide stable virtual IPs & DNS (orders.default.svc.cluster.local). Endpoints update automatically as Pods change.
- Add‑ons:
  - Headless Services (clusterIP: None) expose Pod IPs for client‑side LB.
  - Service Mesh (Istio/Linkerd): sidecar proxies + control plane provide discovery + retries, mTLS, traffic policy.
- Pros: Built‑in, automated, integrates health/readiness.
- Cons: K8s‑specific; cross‑cluster/multi‑region needs extra tooling (Gateway API, mesh, Global DNS).
4) Cloud‑managed discovery
- Examples: AWS Cloud Map + App Mesh, AWS ALB/NLB target groups, Azure App Gateway + Service Fabric, GCP Service Directory + Traffic Director.
- Pros: Managed control plane, integrates with cloud LBs & IAM.
- Cons: Cloud lock‑in; hybrid portability needs adapters.
Operational patterns & best practices
- Health checks: Use readiness checks to gate routing and liveness checks to trigger restarts. Don’t route to unready instances.
- Zone‑aware routing: Prefer same AZ/zone to cut latency and egress cost.
- Version/canary routing: Attach labels/metadata (e.g., version=v2) to target subsets for canaries and blue‑green.
- Backoff & caching: Cache lookups with short TTLs; exponential backoff on registry failures.
- Bulkheads & timeouts: Even with discovery, protect remote calls (timeouts/retries/circuit breakers).
- Secure discovery: mTLS between clients and registry/mesh; sign service identities (SPIFFE/SPIRE, mesh identities).
- High availability: Run quorum‑based registries with an odd number of nodes (3 or 5), test backup/restore, and monitor leader elections.
- Cross‑cluster/region: Use global DNS, mesh federation, or gateways to bridge. Plan failover policies explicitly.
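Zone-aware and version-based routing from the list above can be combined in a small selection step. The sketch below is one possible shape, assuming each instance carries zone and version metadata: it filters to the requested version, prefers instances in the caller's zone, and falls back to other zones when the local zone has none. Endpoint, ZoneAwareSelector, and the zone names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Instance plus the metadata a registry might attach to it (assumed fields). */
class Endpoint {
    final String address;
    final String zone;
    final String version;

    Endpoint(String address, String zone, String version) {
        this.address = address;
        this.zone = zone;
        this.version = version;
    }
}

class ZoneAwareSelector {
    /** Keep the wanted version; prefer the caller's zone, else fall back to every zone. */
    static List<Endpoint> candidates(List<Endpoint> all, String localZone, String version) {
        List<Endpoint> matching = all.stream()
                .filter(e -> e.version.equals(version))
                .collect(Collectors.toList());
        List<Endpoint> sameZone = matching.stream()
                .filter(e -> e.zone.equals(localZone))
                .collect(Collectors.toList());
        return sameZone.isEmpty() ? matching : sameZone;
    }
}

class ZoneAwareDemo {
    public static void main(String[] args) {
        List<Endpoint> all = List.of(
                new Endpoint("10.0.1.1:8080", "az-a", "v1"),
                new Endpoint("10.0.2.1:8080", "az-b", "v1"),
                new Endpoint("10.0.2.2:8080", "az-b", "v2"));
        // v1 exists in the local zone, so only the az-a instance is a candidate
        for (Endpoint e : ZoneAwareSelector.candidates(all, "az-a", "v1")) {
            System.out.println(e.address + " " + e.zone);
        }
        // v2 exists only in az-b, so the selector falls back across zones
        for (Endpoint e : ZoneAwareSelector.candidates(all, "az-a", "v2")) {
            System.out.println(e.address + " " + e.zone);
        }
    }
}
```

A load-balancing policy such as round robin would then pick from the returned candidates; whether cross-zone fallback is acceptable is a policy decision that depends on latency and egress cost.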
Quick decision guide
- Kubernetes? Use K8s Services + DNS; add service mesh if you need traffic policy/mTLS/canaries.
- Non‑K8s VMs/containers? Use Consul/Eureka + client‑side or server‑side LB (Envoy/NGINX/HAProxy).
- All‑in on a cloud? Prefer the cloud’s managed discovery + native load balancers.
- Polyglot & hybrid? Favor DNS‑compatible discovery (Consul DNS/Cloud Map) so every stack can consume it.
Tiny examples
Spring Boot + Eureka (client‑side discovery):
# application.properties
spring.application.name=orders
eureka.client.serviceUrl.defaultZone=http://eureka:8761/eureka
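// In a @Configuration class: a @LoadBalanced RestTemplate resolves service names through Eureka
@Bean @LoadBalanced RestTemplate rt() { return new RestTemplate(); }
// In the caller: use the service name instead of host:port
Item item = rt.getForObject("http://inventory/api/items/42", Item.class);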
Kubernetes Service (server‑side LB + DNS):
apiVersion: v1
kind: Service
metadata:
  name: orders
spec:
  selector: { app: orders }
  ports:
    - port: 80
      targetPort: 8080