How to Find Memory Leaks and Fix Them

1) Spot the leak (symptoms)

  • Process RSS/heap grows steadily and never comes down.
  • GCs become more frequent (including full GCs), old‑gen occupancy keeps rising after each collection, and pauses lengthen.
  • OOMKilled (K8s) or OutOfMemoryError in logs.
  • Throughput degrades over time without more traffic.
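
A quick way to confirm the trend on a live JVM is jstat: old‑gen occupancy (the O column) that climbs back up after every full GC is the classic leak signature.

jstat -gcutil <pid> 5000   # GC utilization summary every 5 seconds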

2) Reproduce & measure

  • Reproduce with realistic load (e.g., JMeter/k6).
  • Baseline metrics: heap used, GC time, RSS, objects allocated/sec, open FDs.
  • In containers, note memory limits (how -Xmx relates to the cgroup limit; leave headroom for metaspace, threads, and off‑heap).
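
For a baseline you can script, the JMX platform MXBeans expose heap usage and cumulative GC time; a minimal sketch (class name and the 5 s interval are illustrative):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapBaseline {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            long gcMs = ManagementFactory.getGarbageCollectorMXBeans().stream()
                    .mapToLong(GarbageCollectorMXBean::getCollectionTime)
                    .sum();
            System.out.printf("heap used=%d MB committed=%d MB gc=%d ms%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20, gcMs);
            Thread.sleep(5_000); // a leak shows up as a rising floor after each GC
        }
    }
}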

3) Capture evidence

Java

  • Enable GC logs (JDK 9+ unified logging): -Xlog:gc*:file=gc.log:time,uptime,level,tags
  • Take heap dumps at high usage or OOM:
    • On OOM: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps
    • On demand: jcmd <pid> GC.heap_dump /tmp/heap.hprof or jmap -dump:format=b,file=/tmp/heap.hprof <pid>
  • Take JFR profile for allocation hot spots: jcmd <pid> JFR.start name=leak settings=profile filename=app.jfr
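
Heap dumps can also be triggered from code via the HotSpot diagnostic MXBean (HotSpot JVMs only); a minimal sketch, class name illustrative:

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    public static void dump(String path) throws Exception {
        HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        diag.dumpHeap(path, true); // live=true keeps only reachable objects (forces a GC first)
    }
}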

Node / Python / .NET (quick notes)

  • Node: Chrome DevTools → Heap snapshot; clinic heapprofiler.
  • Python: tracemalloc, objgraph, guppy3/heapy.
  • .NET: dotMemory / PerfView; dump with dotnet-gcdump, analyze with Visual Studio.

4) Analyze the heap

  • Open dump in Eclipse MAT or YourKit/VisualVM.
  • Look for:
    • Leak suspects / dominator tree: objects retaining large subgraphs.
    • Growing collections (e.g., HashMap, ArrayList, caches) with stack traces to allocation sites.
    • Class histograms: jcmd <pid> GC.class_histogram.
  • Cross‑check with JFR allocation flame graphs to find hot allocation paths.
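
A cheap growth check is to diff two class histograms taken a few minutes apart under steady load; whatever keeps climbing is your suspect:

jcmd <pid> GC.class_histogram > histo-1.txt
# ...wait a few minutes under load...
jcmd <pid> GC.class_histogram > histo-2.txt
diff histo-1.txt histo-2.txt | head -20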

5) Usual culprits (Java)

  • Unbounded caches/collections (often static singletons).
    • ✅ Fix: add size limits & eviction (Caffeine/Guava), time‑based expiry, or Weak/SoftReference where appropriate.
  • Listeners/observers not removed, event bus subscribers lingering.
    • ✅ Fix: unsubscribe on close; use weak listeners if supported (a sketch follows this list).
  • ThreadLocals not cleared (esp. in pools).
    • ✅ Fix: try { ... } finally { threadLocal.remove(); }
  • Connections/streams not closed (JDBC, HTTP, I/O).
    • ✅ Fix: use try‑with‑resources; pool with max‑lifetime; leak detection in HikariCP.
  • Classloader leaks in app servers (static caches, threads preventing unload).
    • ✅ Fix: stop non‑daemon threads on shutdown; avoid static refs to app classes; verify libraries are container‑friendly.
  • Logging/backpressure issues buffering in memory.
    • ✅ Fix: async logging with bounded queues; drop/flush policies.
  • JSON/XML mappers reusing builders incorrectly.
    • ✅ Fix: reuse safely or create per‑request; avoid holding on to full payloads.
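
For the listener/observer culprit above, a minimal sketch of the unsubscribe‑on‑close pattern; EventBus and Widget are hypothetical stand‑ins for whatever event API you use:

import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical bus; stands in for Guava EventBus, GUI listeners, etc.
class EventBus {
    private final Set<Runnable> listeners = new CopyOnWriteArraySet<>();
    void subscribe(Runnable l)   { listeners.add(l); }
    void unsubscribe(Runnable l) { listeners.remove(l); }
}

class Widget implements AutoCloseable {
    private final EventBus bus;
    private final Runnable onEvent = () -> System.out.println("event");

    Widget(EventBus bus) {
        this.bus = bus;
        bus.subscribe(onEvent);
    }

    @Override
    public void close() {
        bus.unsubscribe(onEvent); // without this, the bus keeps every Widget reachable
    }
}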

6) Rectify systematically (step‑by‑step)

  1. Pinpoint allocation site (JFR/stack trace from MAT).
  2. Identify retention path (who holds it alive?) via dominator tree.
  3. Introduce bounds / lifecycle hooks (eviction, close, unsubscribe).
  4. Prove the fix:
    1. Re‑run load → heap stabilizes after GC, old‑gen plateau, GC time down.
    2. Compare before/after GC logs & heap histograms.
  5. Add guardrails:
    1. Leak‑detection in pools (Hikari leakDetectionThreshold; see the sketch after this list).
    2. Bounded queues, timeouts, backpressure.
    3. Canary + monitors (heap used %, GC time %, RSS vs Xmx).
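
For the Hikari leak‑detection guardrail above, a minimal sketch (URL, pool size, and timings are illustrative):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://db:5432/app"); // illustrative
config.setMaximumPoolSize(20);
config.setMaxLifetime(30 * 60 * 1000L);   // recycle connections every 30 min
config.setLeakDetectionThreshold(10_000); // warn + stack trace if a connection is held > 10 s
HikariDataSource ds = new HikariDataSource(config);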

7) Quick Java examples

A) Unbounded cache → bounded with Caffeine

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;

Cache<String, User> userCache = Caffeine.newBuilder()
    .maximumSize(100_000)                     // hard cap on entries
    .expireAfterWrite(Duration.ofMinutes(10)) // time-based eviction
    .recordStats()                            // expose hit/eviction metrics
    .build();

B) ThreadLocal cleanup

import java.text.SimpleDateFormat;
import java.util.Date;

private static final ThreadLocal<SimpleDateFormat> F = ThreadLocal.withInitial(
    () -> new SimpleDateFormat("yyyy-MM-dd") // SimpleDateFormat is not thread-safe
);

static String format(Date date) {
    try {
        return F.get().format(date);
    } finally {
        F.remove(); // important when threads are pooled; otherwise each thread retains its copy
    }
}

C) Always close resources

try (Connection c = ds.getConnection();
     PreparedStatement ps = c.prepareStatement(sql);
     ResultSet rs = ps.executeQuery()) {
    // use rs
} // auto-closed

D) MAT workflow

  • Open the dump → run the Leak Suspects Report → inspect the top dominators.
  • Right‑click the suspect collection → Path to GC Roots → find the static/singleton holder.
  • Patch code, redeploy, retest.

8) Native/off‑heap leaks

  • Check direct ByteBuffers, JNI allocations, Netty buffer arenas, and native image libraries.
  • Track with NMT (Java): -XX:NativeMemoryTracking=detail + jcmd <pid> VM.native_memory summary.
  • Ensure Netty buffers are released; cap off‑heap usage (e.g., -XX:MaxDirectMemorySize).
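
For Netty specifically, reference‑counted buffers must be released explicitly; a minimal sketch of the try/finally pattern (the payload is illustrative):

import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;

ByteBuf buf = ByteBufAllocator.DEFAULT.directBuffer(1024); // off-heap allocation
try {
    buf.writeBytes(new byte[] {1, 2, 3}); // use the buffer
} finally {
    buf.release(); // drops the refcount; native memory is freed when it hits 0
}

During tests, Netty's built‑in leak detector (-Dio.netty.leakDetection.level=paranoid) reports buffers that were garbage‑collected without release().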

9) Production hygiene (prevention)

  • Set sensible Xms/Xmx, container memory limits, and alerts (heap >80%, GC >10% CPU).
  • Autoscaling on CPU + GC time, not just requests.
  • SLOs for latency; dashboards for heap/GC/RSS/open FDs.
  • Regular heap snapshots in staging under load.
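
As a hedged starting point for containerized JVMs (percentages and paths are illustrative; tune to your workload):

-XX:MaxRAMPercentage=75.0
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps
-Xlog:gc*:file=/logs/gc.log:time,uptime,level,tags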