⏱️ Standard Latency Benchmarks in System Design

In system design, latency is the time an operation takes to complete, whether a request, a disk I/O, or a network call. There is no single absolute standard, but the numbers below are commonly cited benchmarks, organized by layer of a distributed system.

⚙️ Common Latency Numbers (Back-of-the-Envelope):

| Operation | Approximate Latency |
| --- | --- |
| L1 cache reference | ~0.5 ns |
| L2 cache reference | ~7 ns |
| Main memory (RAM) reference | ~100 ns |
| SSD I/O (local) | ~50–100 μs |
| SSD I/O (cloud/remote) | ~0.5–2 ms |
| HDD I/O | ~5–10 ms |
| 1 Gbps network round trip (LAN) | ~0.5–1 ms |
| Data center to data center (WAN) | ~40–100 ms |
| API call to an internal service | ~10–100 ms |
| API call to a third-party service | ~100–500 ms |
| Cold start of a serverless function | ~100 ms – 1 s |
| User-perceived web latency target | <100 ms (ideal), <500 ms (acceptable) |
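
These numbers are most useful when you compose them. Here is a minimal back-of-the-envelope sketch in Python; the constants simply restate the table above, and the 90% hit ratio and 1,000 lookups are assumptions chosen for illustration:

```python
# Rough constants restating the table above (in seconds), not measurements.
RAM_S = 100e-9        # main memory reference
SSD_LOCAL_S = 100e-6  # local SSD I/O, upper end

def lookup_time(lookups: int, hit_ratio: float) -> float:
    """Total time when hits are served from RAM and misses fall through to SSD."""
    hits = lookups * hit_ratio
    misses = lookups - hits
    return hits * RAM_S + misses * SSD_LOCAL_S

# 1,000 lookups at a 90% in-memory hit ratio: ~10.09 ms, dominated by the misses.
print(f"{lookup_time(1000, 0.9) * 1e3:.2f} ms")
```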

🎯 Latency Targets by Tier:

| Tier | Ideal Latency Goal |
| --- | --- |
| In-memory cache (Redis, Memcached) | <1 ms |
| Local database query | <10 ms |
| Remote DB/API call (internal) | <100 ms |
| User-facing API response time | <300–500 ms |
| Mobile app or web UI actions | <100 ms perceived |
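
One way to use these targets is as an additive budget: every hop in a request path spends part of the user-facing allowance. A rough sketch, with hypothetical step costs drawn from the tier table:

```python
# Hypothetical request path checked against the user-facing budget above (ms).
BUDGET_MS = 500

steps_ms = {
    "in_memory_cache_lookup": 1,
    "local_db_query": 10,
    "internal_api_call": 100,
    "render_and_serialize": 20,  # assumed application overhead
}

total_ms = sum(steps_ms.values())
print(f"total={total_ms} ms, headroom={BUDGET_MS - total_ms} ms")
assert total_ms <= BUDGET_MS, "request path exceeds the latency budget"
```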

📦 Real-World Design Implications:

  • ✅ Use caching (e.g., Redis) for <1 ms lookups (cache-aside sketch below)
  • ✅ Use asynchronous processing for high-latency tasks such as emails and notifications (worker-queue sketch below)
  • ✅ Use rate limiting and bulk APIs to reduce round trips (batching arithmetic below)
  • ✅ Design SLAs/SLOs around acceptable latency thresholds, measured at percentiles (percentile sketch below)
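
For the caching point, a minimal cache-aside sketch using the redis-py client; `load_user_from_db` is a hypothetical stand-in for a ~10 ms database query:

```python
import json
import redis  # assumes the redis-py package and a local Redis server

r = redis.Redis(host="localhost", port=6379)

def load_user_from_db(user_id: str) -> dict:
    # Hypothetical stand-in for a ~10 ms local database query.
    return {"id": user_id, "name": "example"}

def get_user(user_id: str) -> dict:
    """Cache-aside: try the <1 ms cache first, fall back to the database."""
    cached = r.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)      # cache hit: sub-millisecond path
    user = load_user_from_db(user_id)  # cache miss: pay the DB latency once
    r.set(f"user:{user_id}", json.dumps(user), ex=300)  # cache for 5 minutes
    return user
```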
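
For asynchronous processing, a minimal worker-queue sketch using only the standard library; the 0.5 s sleep is a stand-in for a slow third-party email API, so the request path pays only the cost of an enqueue:

```python
import queue
import threading
import time

tasks: queue.Queue = queue.Queue()

def worker() -> None:
    while True:
        address = tasks.get()
        time.sleep(0.5)  # stand-in for a ~500 ms third-party email API call
        print(f"sent email to {address}")
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_signup(address: str) -> None:
    # Request path: enqueue and return immediately instead of blocking ~500 ms.
    tasks.put(address)

handle_signup("user@example.com")
tasks.join()  # in a real service, the worker drains the queue in the background
```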
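
The round-trip arithmetic behind bulk APIs, assuming the ~1 ms LAN round trip from the first table and a hypothetical batch size of 100:

```python
# Why bulk APIs matter: round trips dominate on a ~1 ms LAN.
RTT_MS = 1.0  # LAN round trip from the first table
ITEMS = 200
BATCH = 100   # hypothetical bulk-API batch size

one_per_item = ITEMS * RTT_MS         # 200 round trips: ~200 ms
bulked = -(-ITEMS // BATCH) * RTT_MS  # ceil(200/100) = 2 round trips: ~2 ms
print(f"per-item={one_per_item:.0f} ms, bulk={bulked:.0f} ms")
```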
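
Finally, for SLOs: latency targets are usually judged at percentiles rather than averages, since tail latency is what users notice. A sketch using simulated response times (replace with real measurements); the 500 ms threshold echoes the acceptable target above:

```python
import random
import statistics

# Simulated response times in ms; in practice, use real measurements.
samples = [random.lognormvariate(3.5, 0.6) for _ in range(10_000)]

q = statistics.quantiles(samples, n=100)  # 99 percentile cut points
p50, p99 = q[49], q[98]
print(f"p50={p50:.0f} ms  p99={p99:.0f} ms")

# An SLO like "99% of requests under 500 ms" is checked at p99, not the mean.
assert p99 < 500, "p99 SLO violated"
```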