In system design, latency refers to how long it takes for an operation to complete, such as a request, a disk I/O, or a network call. There’s no absolute “standard,” but here are commonly referenced latency benchmarks for the different layers of a distributed system.
⚙️ Common Latency Numbers (Back-of-the-Envelope):
| Operation | Approximate Latency |
| --- | --- |
| L1 cache reference | ~0.5 ns |
| L2 cache reference | ~7 ns |
| Main memory (RAM) | ~100 ns |
| SSD I/O (local) | ~50–100 μs |
| SSD I/O (cloud/remote) | ~0.5–2 ms |
| HDD I/O | ~5–10 ms |
| 1 Gbps network round-trip (LAN) | ~0.5–1 ms |
| Data center to data center (WAN) | ~40–100 ms |
| API call to internal service | ~10–100 ms |
| API call to third-party service | ~100–500 ms |
| Cold start of a serverless function | ~100 ms – 1 sec |
| User-perceived web latency target | <100 ms (ideal), <500 ms (acceptable) |
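To see how these numbers compound, here is a rough back-of-the-envelope sketch in Python. The constants come from the table above; the request mix (10 DB queries, 3 internal calls, 1 third-party call, all sequential) is purely illustrative:

```python
# Rough latency constants from the table above, in milliseconds.
SSD_READ_MS        = 0.1      # ~100 us, local SSD
LAN_ROUND_TRIP_MS  = 0.5      # ~0.5 ms within a data center
INTERNAL_API_MS    = 20.0     # within the ~10-100 ms internal-service range
THIRD_PARTY_API_MS = 250.0    # within the ~100-500 ms third-party range

def request_estimate(db_queries: int, internal_calls: int, third_party_calls: int) -> float:
    """Very rough serial estimate for one user-facing request, in milliseconds."""
    db_cost = db_queries * (LAN_ROUND_TRIP_MS + SSD_READ_MS)
    internal_cost = internal_calls * INTERNAL_API_MS
    third_party_cost = third_party_calls * THIRD_PARTY_API_MS
    return db_cost + internal_cost + third_party_cost

# Example: 10 DB queries, 3 internal calls, 1 third-party call, all sequential.
print(f"~{request_estimate(10, 3, 1):.0f} ms")  # ~316 ms, already past a 300 ms target
```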
🎯 Latency Targets by Tier:
| Tier | Ideal Latency Goal |
| --- | --- |
| In-memory cache (Redis, Memcached) | <1 ms |
| Local database query | <10 ms |
| Remote DB/API call (internal) | <100 ms |
| User-facing API response time | <300–500 ms |
| Mobile app or web UI actions | <100 ms perceived |
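One way to keep these targets visible in code is a small timing wrapper. This is a minimal sketch: the tier names mirror the table, but the thresholds-as-a-dict and the print-a-warning behavior are assumptions, and a real service would report to a metrics system instead:

```python
import time
from contextlib import contextmanager

# Ideal latency goals from the table above, in milliseconds (illustrative names).
TIER_TARGETS_MS = {
    "cache": 1,
    "local_db": 10,
    "internal_api": 100,
    "user_api": 500,
}

@contextmanager
def latency_budget(tier: str):
    """Time a block of work and warn if it exceeds the tier's latency goal."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        target = TIER_TARGETS_MS[tier]
        if elapsed_ms > target:
            print(f"[latency] {tier} took {elapsed_ms:.1f} ms (target {target} ms)")

# Usage:
with latency_budget("local_db"):
    time.sleep(0.02)  # stand-in for a real query; triggers the warning
```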
📦 Real-World Design Implications:
✅ Use caching (e.g., Redis) for sub-millisecond lookups (see the cache-aside sketch below)
✅ Use asynchronous processing for high-latency tasks such as emails and notifications (see the background-queue sketch below)
✅ Use bulk/batch APIs to reduce round-trips, and rate limiting to protect downstream services (see the batching sketch below)
✅ Design SLAs/SLOs around acceptable latency thresholds, e.g., p99 targets (see the SLO check below)
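For the caching point, a minimal cache-aside sketch using the redis-py client. The key scheme, the 5-minute TTL, and the `fetch_user_from_db` stub are illustrative assumptions:

```python
import json
import redis  # redis-py client; assumes a Redis server on localhost:6379

r = redis.Redis(host="localhost", port=6379)

def fetch_user_from_db(user_id: int) -> dict:
    """Stand-in for a real database read (~1-10 ms)."""
    return {"id": user_id, "name": "example"}

def get_user(user_id: int) -> dict:
    """Cache-aside read: sub-millisecond on a hit, one DB round-trip on a miss."""
    key = f"user:{user_id}"              # illustrative key scheme
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)        # cache hit
    user = fetch_user_from_db(user_id)   # cache miss: go to the database
    r.setex(key, 300, json.dumps(user))  # cache for 5 minutes (arbitrary TTL)
    return user
```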
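For asynchronous processing, a minimal in-process sketch using a standard-library queue and a background thread. A production setup would typically hand the work to a broker or task queue (e.g., SQS, RabbitMQ, Celery) instead; the `create_account` and `send_email` stubs are placeholders:

```python
import queue
import threading
import time

def create_account(address: str) -> None:   # stand-in for a fast DB write
    pass

def send_email(address: str) -> None:       # stand-in for a slow third-party call
    time.sleep(0.3)                          # ~300 ms, per the table above

email_queue: "queue.Queue[str]" = queue.Queue()

def email_worker() -> None:
    """Background worker: drains the queue so the request path never waits on email."""
    while True:
        send_email(email_queue.get())
        email_queue.task_done()

threading.Thread(target=email_worker, daemon=True).start()

def handle_signup(address: str) -> None:
    create_account(address)      # fast work the user waits for
    email_queue.put(address)     # slow work deferred; the handler returns immediately
```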
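For round-trip reduction, a back-of-the-envelope comparison of per-item calls versus a bulk endpoint, again using the ~0.5 ms in-datacenter round-trip figure from the table; the batch size of 100 is arbitrary:

```python
from typing import Iterable

LAN_ROUND_TRIP_MS = 0.5  # one in-datacenter round-trip, per the table above

def cost_one_by_one(ids: Iterable[int]) -> float:
    """N items -> N round-trips."""
    return len(list(ids)) * LAN_ROUND_TRIP_MS

def cost_in_batches(ids: Iterable[int], batch_size: int = 100) -> float:
    """N items -> ceil(N / batch_size) round-trips via a bulk endpoint."""
    batches = -(-len(list(ids)) // batch_size)  # ceiling division
    return batches * LAN_ROUND_TRIP_MS

print(cost_one_by_one(range(1000)))   # 500.0 ms of pure round-trip time
print(cost_in_batches(range(1000)))   # 5.0 ms
```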
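For the SLO point, a minimal check of a p99 latency target against recorded samples; the 500 ms target mirrors the user-facing tier above and is otherwise an assumption:

```python
import statistics

def p99_within_slo(latencies_ms: list[float], target_ms: float = 500.0) -> bool:
    """True if the 99th-percentile latency meets the target."""
    p99 = statistics.quantiles(latencies_ms, n=100)[98]  # 99th percentile
    return p99 <= target_ms

# Example: 1000 samples around 120 ms with a slow tail.
samples = [120.0] * 990 + [450.0] * 10
print(p99_within_slo(samples))  # True: p99 lands just under 450 ms, within 500 ms
```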