In system design, latency refers to how long it takes for an operation to complete, such as a request, a disk I/O, or a network call. There’s no absolute “standard,” but here are commonly referenced latency benchmarks for the different layers of a distributed system.
⚙️ Common Latency Numbers (Back-of-the-Envelope):
| Operation | Approximate Latency |
| --- | --- |
| L1 cache reference | ~0.5 ns |
| L2 cache reference | ~7 ns |
| Main memory (RAM) | ~100 ns |
| SSD I/O (local) | ~50–100 μs |
| SSD I/O (cloud/remote) | ~0.5–2 ms |
| HDD I/O | ~5–10 ms |
| 1 Gbps network round-trip (LAN) | ~0.5–1 ms |
| Data center to data center (WAN) | ~40–100 ms |
| API call to internal service | ~10–100 ms |
| API call to third-party service | ~100–500 ms |
| Cold start of a serverless function | ~100 ms – 1 sec |
| User-perceived web latency target | <100 ms (ideal), <500 ms (acceptable) |
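To see how these numbers compound, here is a rough back-of-the-envelope sketch in Python. The constants come from the table above; the request mix (10 DB queries, 3 internal calls, 1 third-party call, all sequential) is purely illustrative:

```python
# Rough latency constants from the table above, in milliseconds.
SSD_READ_MS        = 0.1      # ~100 us, local SSD
LAN_ROUND_TRIP_MS  = 0.5      # ~0.5 ms within a data center
INTERNAL_API_MS    = 20.0     # within the ~10-100 ms internal-service range
THIRD_PARTY_API_MS = 250.0    # within the ~100-500 ms third-party range

def request_estimate(db_queries: int, internal_calls: int, third_party_calls: int) -> float:
    """Very rough serial estimate for one user-facing request, in milliseconds."""
    db_cost = db_queries * (LAN_ROUND_TRIP_MS + SSD_READ_MS)
    internal_cost = internal_calls * INTERNAL_API_MS
    third_party_cost = third_party_calls * THIRD_PARTY_API_MS
    return db_cost + internal_cost + third_party_cost

# Example: 10 DB queries, 3 internal calls, 1 third-party call, all sequential.
print(f"~{request_estimate(10, 3, 1):.0f} ms")  # ~316 ms, already past a 300 ms target
```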
🎯 Latency Targets by Tier:
| Tier | Ideal Latency Goal |
| --- | --- |
| In-memory cache (Redis, Memcached) | <1 ms |
| Local database query | <10 ms |
| Remote DB/API call (internal) | <100 ms |
| User-facing API response time | <300–500 ms |
| Mobile app or web UI actions | <100 ms perceived |
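One way to keep these targets visible in code is a small timing wrapper. This is a minimal sketch: the tier names mirror the table, but the thresholds-as-a-dict and the print-a-warning behavior are assumptions, and a real service would report to a metrics system instead:

```python
import time
from contextlib import contextmanager

# Ideal latency goals from the table above, in milliseconds (illustrative names).
TIER_TARGETS_MS = {
    "cache": 1,
    "local_db": 10,
    "internal_api": 100,
    "user_api": 500,
}

@contextmanager
def latency_budget(tier: str):
    """Time a block of work and warn if it exceeds the tier's latency goal."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        target = TIER_TARGETS_MS[tier]
        if elapsed_ms > target:
            print(f"[latency] {tier} took {elapsed_ms:.1f} ms (target {target} ms)")

# Usage:
with latency_budget("local_db"):
    time.sleep(0.02)  # stand-in for a real query; triggers the warning
```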
📦 Real-World Design Implications:
✅ Use caching (e.g., Redis) for sub-millisecond lookups (see the cache-aside sketch below)
✅ Use asynchronous processing for high-latency tasks such as emails and notifications (see the background-queue sketch below)
✅ Use bulk/batch APIs to reduce round-trips, and rate limiting to protect downstream services (see the batching sketch below)
✅ Design SLAs/SLOs around acceptable latency thresholds, e.g., p99 targets (see the SLO check below)
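For the caching point, a minimal cache-aside sketch using the redis-py client. The key scheme, the 5-minute TTL, and the `fetch_user_from_db` stub are illustrative assumptions:

```python
import json
import redis  # redis-py client; assumes a Redis server on localhost:6379

r = redis.Redis(host="localhost", port=6379)

def fetch_user_from_db(user_id: int) -> dict:
    """Stand-in for a real database read (~1-10 ms)."""
    return {"id": user_id, "name": "example"}

def get_user(user_id: int) -> dict:
    """Cache-aside read: sub-millisecond on a hit, one DB round-trip on a miss."""
    key = f"user:{user_id}"              # illustrative key scheme
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)        # cache hit
    user = fetch_user_from_db(user_id)   # cache miss: go to the database
    r.setex(key, 300, json.dumps(user))  # cache for 5 minutes (arbitrary TTL)
    return user
```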
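For asynchronous processing, a minimal in-process sketch using a standard-library queue and a background thread. A production setup would typically hand the work to a broker or task queue (e.g., SQS, RabbitMQ, Celery) instead; the `create_account` and `send_email` stubs are placeholders:

```python
import queue
import threading
import time

def create_account(address: str) -> None:   # stand-in for a fast DB write
    pass

def send_email(address: str) -> None:       # stand-in for a slow third-party call
    time.sleep(0.3)                          # ~300 ms, per the table above

email_queue: "queue.Queue[str]" = queue.Queue()

def email_worker() -> None:
    """Background worker: drains the queue so the request path never waits on email."""
    while True:
        send_email(email_queue.get())
        email_queue.task_done()

threading.Thread(target=email_worker, daemon=True).start()

def handle_signup(address: str) -> None:
    create_account(address)      # fast work the user waits for
    email_queue.put(address)     # slow work deferred; the handler returns immediately
```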
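For round-trip reduction, a back-of-the-envelope comparison of per-item calls versus a bulk endpoint, again using the ~0.5 ms in-datacenter round-trip figure from the table; the batch size of 100 is arbitrary:

```python
from typing import Iterable

LAN_ROUND_TRIP_MS = 0.5  # one in-datacenter round-trip, per the table above

def cost_one_by_one(ids: Iterable[int]) -> float:
    """N items -> N round-trips."""
    return len(list(ids)) * LAN_ROUND_TRIP_MS

def cost_in_batches(ids: Iterable[int], batch_size: int = 100) -> float:
    """N items -> ceil(N / batch_size) round-trips via a bulk endpoint."""
    batches = -(-len(list(ids)) // batch_size)  # ceiling division
    return batches * LAN_ROUND_TRIP_MS

print(cost_one_by_one(range(1000)))   # 500.0 ms of pure round-trip time
print(cost_in_batches(range(1000)))   # 5.0 ms
```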
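For the SLO point, a minimal check of a p99 latency target against recorded samples; the 500 ms target mirrors the user-facing tier above and is otherwise an assumption:

```python
import statistics

def p99_within_slo(latencies_ms: list[float], target_ms: float = 500.0) -> bool:
    """True if the 99th-percentile latency meets the target."""
    p99 = statistics.quantiles(latencies_ms, n=100)[98]  # 99th percentile
    return p99 <= target_ms

# Example: 1000 samples around 120 ms with a slow tail.
samples = [120.0] * 990 + [450.0] * 10
print(p99_within_slo(samples))  # True: p99 lands just under 450 ms, within 500 ms
```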