Garbage Collection in the JVM
From the generational hypothesis to ZGC: understanding heap memory management
Learning Objectives
By the end of this module you will be able to:
- Explain the generational hypothesis and how it shapes JVM heap layout.
- Describe the G1 collector's region-based design and how it enables predictable pause targets.
- Contrast ZGC and Shenandoah's concurrent evacuation strategies with G1.
- Explain why write barriers are necessary and how the card table supports cross-generational references.
- Select an appropriate GC strategy given latency, throughput, and memory footprint requirements.
- Contrast JVM tracing GC with Rust's ownership model and Python's reference counting.
Core Concepts
The Generational Hypothesis
The foundation of modern JVM garbage collection rests on an empirical observation: most objects have short lifetimes and "die young." This generational hypothesis has shaped decades of GC design because it enables a powerful optimization—rather than scanning the entire heap to find garbage, collectors focus effort where most garbage accumulates: the young generation.
In the JVM, the heap is logically divided into generations organized by object age. Newly allocated objects go into the young generation (containing the Eden space and survivor spaces), where most die quickly. When the young generation fills, a minor collection runs. Only objects that survive young-generation collection get promoted to the old generation, which is collected far less frequently. This asymmetry works because the cost of collecting only the young generation is proportional to the number of live objects there—typically a small fraction of total heap size.
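The aging-and-promotion mechanics above can be sketched as a toy model. This is an illustration only: the tenuring threshold of 2, the map-based "heap," and all names are invented here; HotSpot's adaptive aging and survivor-space handling are considerably more involved.

```java
// Toy model of age-based promotion from the young generation,
// assuming a tenuring threshold of 2 (HotSpot's is adaptive).
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class TinyYoungGen {
    static final int TENURING_THRESHOLD = 2;

    // One minor GC: dead objects are reclaimed for free (never touched),
    // survivors age by one, and old-enough survivors are promoted.
    static Map<String, Integer> minorGc(Map<String, Integer> young,
                                        Set<String> stillLive,
                                        Map<String, Integer> old) {
        Map<String, Integer> survivors = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : young.entrySet()) {
            if (!stillLive.contains(e.getKey())) continue;  // garbage: no work
            int age = e.getValue() + 1;
            if (age >= TENURING_THRESHOLD) old.put(e.getKey(), age);
            else survivors.put(e.getKey(), age);
        }
        return survivors;
    }

    public static void main(String[] args) {
        Map<String, Integer> young = new LinkedHashMap<>();
        for (String id : new String[] {"a", "b", "c", "d"}) young.put(id, 0);
        Map<String, Integer> old = new LinkedHashMap<>();
        // First minor GC: only a and b are still reachable; c and d cost nothing.
        young = minorGc(young, Set.of("a", "b"), old);
        // Second minor GC: only a survives again, so it is promoted.
        young = minorGc(young, Set.of("a"), old);
        System.out.println(young + " " + old);   // young is empty, old holds a at age 2
    }
}
```

Note that the collection cost here is proportional to the survivors iterated, not to the number of dead objects — the asymmetry the generational hypothesis exploits.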
For engineers coming from Rust, this is radically different. Rust has no garbage collection at all; instead, the compiler enforces ownership and borrowing rules at compile time, eliminating the need for a runtime collector. Python engineers will recognize the tracing approach (Java's GC traces reachable objects), though Python primarily uses reference counting supplemented with a generational collector to handle cyclic garbage.
Write Barriers and the Card Table
The generational design introduces a problem: the old generation can contain references to young-generation objects. If an old-generation object holds a reference to a young object, that young object cannot be garbage collected, yet during a minor collection the GC needs to know about this reference without scanning the entire old generation.
The solution is the write barrier—a small fragment of code injected by the JIT compiler at every location where an object reference is stored in a field. When an old-generation object is modified to hold a reference to a young object, the write barrier detects this and records it.
The recorded references are tracked in a card table, a data structure that divides the heap into fixed-size cards (typically corresponding to cache lines), marking them "dirty" when they contain references from older to younger generations. During a minor collection, the GC only needs to scan the marked cards—a tiny fraction of the old generation—rather than the entire old generation.
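This bookkeeping can be sketched with a toy card table. The 512-byte card size and all names below are assumptions for illustration; HotSpot's card size and barrier code differ in detail.

```java
// Toy card table: one dirty byte per fixed-size card of heap.
// Assumes 512-byte cards (CARD_SHIFT = 9); real JVMs vary.
public class CardTable {
    static final int CARD_SHIFT = 9;              // 2^9 = 512-byte cards
    final byte[] cards;                           // one byte per card

    CardTable(long heapBytes) {
        cards = new byte[(int) ((heapBytes >> CARD_SHIFT) + 1)];
    }

    // Write barrier: on a reference store, dirty the card holding the
    // updated field so a minor GC knows to scan it.
    void writeBarrier(long fieldAddr) {
        cards[(int) (fieldAddr >> CARD_SHIFT)] = 1;
    }

    // A minor collection scans only dirty cards, not the whole old gen.
    int dirtyCardCount() {
        int n = 0;
        for (byte c : cards) if (c != 0) n++;
        return n;
    }

    public static void main(String[] args) {
        CardTable ct = new CardTable(1 << 20);    // 1MB "heap" -> 2048 cards
        ct.writeBarrier(0);                       // dirties card 0
        ct.writeBarrier(100);                     // same card, no new entry
        ct.writeBarrier(1024);                    // dirties card 2
        System.out.println(ct.dirtyCardCount()); // prints 2
    }
}
```

Because a card is marked at most once no matter how many stores hit it, the scan work during a minor collection is bounded by the number of dirty cards, not the number of writes.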
G1 Heap Architecture
The Garbage-First (G1) collector extends the generational model with a region-based design. Rather than having contiguous young and old generations, G1 partitions the heap into equally-sized regions (1MB to 32MB depending on heap size, aiming for roughly 2048 regions). Eden, survivor, and old regions are now logical sets scattered across the heap, not contiguous blocks.
This architecture is powerful because it decouples GC pause targets from application growth. G1 can collect a small set of regions with predictable cost, enabling the collector to meet a user-specified pause-time target (e.g., 200ms). When the old generation grows, G1 doesn't need longer pauses; instead, it performs mixed collections that evacuate live objects from sets of old-generation regions in addition to young regions, adjusting the number of old regions collected based on a target number of mixed collections and the percentage of live objects in each region.
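The region-size choice can be illustrated with a small calculation. This is a simplified stand-in for HotSpot's actual ergonomics (which round and clamp slightly differently); the `pick` function is invented here.

```java
// Simplified sketch of region-size selection: aim for ~2048 regions,
// rounding up to a power of two and clamping to [1MB, 32MB].
// HotSpot's real heuristic differs in detail.
public class RegionSize {
    static long pick(long heapBytes) {
        long target = heapBytes / 2048;           // aim for ~2048 regions
        long size = 1L << 20;                     // start at 1MB
        while (size < target && size < (32L << 20)) size <<= 1;
        return size;
    }

    public static void main(String[] args) {
        System.out.println(pick(50L << 30) >> 20); // 50GB heap -> prints 32 (MB)
        System.out.println(pick(4L << 30) >> 20);  // 4GB heap  -> prints 2  (MB)
    }
}
```

Past a 64GB heap the 32MB cap dominates, which is why very large heaps end up with more than 2048 regions.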
On multi-socket systems, G1 includes NUMA awareness for heap allocation (introduced in JDK 14) that preferentially selects free regions from the NUMA node to which the current thread is bound, exploiting non-uniform memory access characteristics. This keeps objects on the same NUMA node in the young generation, improving performance on large multi-socket servers.
Concurrent Collectors: ZGC and Shenandoah
While G1 uses stop-the-world collection phases (pausing all mutator threads), modern low-latency collectors push toward concurrent operation. ZGC targets sub-millisecond pause times regardless of heap size, supporting heaps from 8MB to 16TB. It achieves this by performing nearly all expensive work—marking, compaction, reference updating—concurrently while the application runs.
Shenandoah similarly performs marking, evacuation, and reference updating concurrently. How is this possible? Shenandoah and ZGC use load barriers—code injected at every location where the application reads an object reference. When the application loads a reference, the barrier can perform actions like pointer remapping or marking before the application uses the referenced object, enabling the collector to move objects around while the application runs.
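The core trick can be sketched as a "self-healing" load barrier. Everything below is invented for illustration—real ZGC encodes state in colored pointer bits and Shenandoah uses a forwarding word in the object header—but the shape of the check-on-every-read is the same.

```java
// Toy self-healing load barrier, loosely modelled on concurrent
// evacuation with forwarding pointers. All names are invented.
import java.util.concurrent.atomic.AtomicReference;

public class LoadBarrierDemo {
    static class Cell {
        String payload;
        volatile Cell forwardee;              // set by the "collector" after a copy
        Cell(String p) { payload = p; }
    }

    static class GcRef {
        final AtomicReference<Cell> slot;
        GcRef(Cell c) { slot = new AtomicReference<>(c); }

        // Load barrier: runs on every read. If the collector moved the
        // object, follow the forwarding pointer and heal the slot so
        // later loads take the fast path.
        String load() {
            Cell c = slot.get();
            Cell fwd = c.forwardee;
            if (fwd != null) {
                slot.compareAndSet(c, fwd);
                c = fwd;
            }
            return c.payload;
        }
    }

    public static void main(String[] args) {
        Cell original = new Cell("order-42");
        GcRef ref = new GcRef(original);
        // The concurrent GC evacuates the object to a new region...
        Cell copy = new Cell(original.payload);
        original.forwardee = copy;
        // ...and the mutator still reads the right data through the barrier.
        System.out.println(ref.load());       // prints order-42
    }
}
```

The "healing" CAS is what keeps the overhead bounded: after the first post-move load, the reference points directly at the new copy.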
This is a fundamental shift in thinking for engineers accustomed to ownership-based (Rust) or reference-counting (Python) systems. ZGC's and Shenandoah's barriers add overhead to every reference access, trading nanoseconds on each read for consistently short GC pauses—a bet that eliminating long pauses improves overall latency predictability for latency-sensitive systems.
Compare & Contrast
JVM Garbage Collection vs. Rust Ownership
| Aspect | JVM GC | Rust Ownership |
|---|---|---|
| Memory Management | Automatic tracing; objects are reclaimed when unreachable | Static, compiler-enforced ownership; memory is freed when the owner goes out of scope |
| Flexibility | Can create reference cycles; GC must detect and break them | Cycles require opt-in shared ownership (`Rc`/`Arc`) and can leak; moving ownership is explicit |
| Pauses | Stop-the-world or concurrent pauses during collection | No runtime pauses; finalization is deterministic (destructors) |
| Performance | Adds GC overhead; trades off pause time vs. throughput | Zero GC cost; higher performance at the cost of compile-time burden on the programmer |
| Safety | Memory safe by default; dangling pointers impossible | Memory safe by default; enforced at compile time |
Rust engineers will notice that GC feels loose—you can hold references anywhere, pass objects into libraries that hold them longer, and the GC eventually reclaims them. Rust's ownership model forces you to be explicit about who owns what, but guarantees destruction happens deterministically when the owner goes out of scope (or the last `Rc`/`Arc` clone is dropped).
JVM GC vs. Python's Reference Counting
Python's primary memory management mechanism is reference counting—each object tracks how many references point to it, and is freed when the count drops to zero. This is deterministic and simple, but cannot detect cycles (a list containing itself, or A->B->A references). Python supplements reference counting with a generational garbage collector specifically designed to handle reference cycles, focusing collection on container objects that can form cycles.
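The cycle problem can be shown with a toy reference counter (a Java sketch of what CPython implements in C; the class and field names are invented for illustration):

```java
// Toy reference counting: an object is freed when its count hits zero.
// A cycle pins both counts at 1 forever, so neither is ever freed --
// which is why CPython needs its supplemental cycle collector.
import java.util.ArrayList;
import java.util.List;

public class RcObject {
    int refCount = 1;                       // the creator holds one reference
    boolean freed = false;
    final List<RcObject> fields = new ArrayList<>();

    void addField(RcObject o) {             // storing a reference: incref
        fields.add(o);
        o.refCount++;
    }

    void release() {                        // dropping a reference: decref
        if (--refCount == 0) {
            freed = true;
            for (RcObject f : fields) f.release();
        }
    }

    public static void main(String[] args) {
        RcObject a = new RcObject();
        RcObject b = new RcObject();
        a.addField(b);                      // a -> b
        b.addField(a);                      // b -> a: a cycle
        a.release();                        // drop both external references
        b.release();
        System.out.println(a.freed + " " + b.freed);   // prints false false
    }
}
```

After the external references are dropped, each object is kept at count 1 solely by the other—unreachable from the program, yet never reclaimed by counting alone.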
The JVM uses pure tracing GC: the collector starts from known roots (global variables, stack frames) and traces all reachable objects, reclaiming anything not reachable. This approach naturally handles cycles (an unreachable cycle is simply never traced) but requires pauses or concurrent barriers. Python's approach is more granular—individual objects can be freed immediately—but cycles require periodic collection of container objects. Both are forms of automatic memory management that make different trade-offs.
G1 vs. ZGC vs. Shenandoah
| Attribute | G1 | ZGC | Shenandoah |
|---|---|---|---|
| Pause Time | 10-200ms (predictable) | <1ms (sub-millisecond) | <10ms |
| Pause Variability | Depends on collection set size | Independent of heap size | Independent of heap size |
| Scalability | 4GB to 100GB+ heaps | 8MB to 16TB heaps | Any heap size |
| Concurrent Work | Young-gen marking, reference updates | Nearly everything | Marking, evacuation, updates |
| Barriers | Write barriers only | Write + load barriers + colored pointers | Write + load barriers |
| Throughput Overhead | Low (1-3% barrier cost) | Moderate (5-10% load barrier cost) | Moderate (5-10% barrier cost) |
| Default Use | JDK 9+ default; most workloads | Latency-critical systems | Latency-critical systems |
G1 is the workhorse—predictable pauses that suit most applications. ZGC and Shenandoah are specialized tools for systems where GC pauses cause unacceptable tail latency (trading throughput overhead for pause-time guarantees).
Worked Example
Scenario: Predicting G1 Pause Times
Imagine you are designing a financial trading system with a 50GB heap on a 16-core machine. Your requirement is that GC pauses never exceed 100ms. You need to estimate if G1 can meet this target.
Step 1: Understand G1 collection cost
G1's pause time is primarily determined by:
- Live objects in the collection set (must be scanned)
- Number of regions to evacuate
- GC parallel thread efficiency
With a 50GB heap and 32MB regions (the maximum region size): 51200MB / 32MB = 1600 regions.
Step 2: Estimate young-generation size
Your application allocates 10GB/s. With a 1GB young generation, a minor collection occurs roughly every 100ms. Live objects in the young generation are typically 5-10% of its size, so each minor collection evacuates only 50-100MB—comfortably within the 100ms pause budget.
Step 3: Account for mixed collections
Once young gen is full, G1 transitions to mixed collections that include old-generation regions. If 30% of your heap is live long-lived data, and G1 wants to reclaim space gradually over ~8 mixed collections, then each mixed collection adds 50GB * 0.30 / 8 / 32MB = ~60 regions.
At 32MB per region, 60 old regions span ~2GB (with only a fraction of that live); evacuating them alongside the young generation within 100ms is tight but feasible with 16 cores.
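The arithmetic from steps 1-3 can be collected into a small back-of-the-envelope calculator (the method names and parameters are invented; the numbers are the scenario's):

```java
// Back-of-the-envelope G1 estimates for the 50GB trading-system scenario.
public class G1Estimate {
    // Total region count for a given heap and region size.
    static long regions(long heapMB, long regionMB) {
        return heapMB / regionMB;
    }

    // Time between minor GCs: young-gen size / allocation rate.
    static double minorIntervalMs(double allocGBps, double youngGB) {
        return youngGB / allocGBps * 1000;
    }

    // Old regions per mixed collection: live data spread over N cycles.
    static long mixedRegions(long heapMB, double liveFraction,
                             int mixedCycles, long regionMB) {
        return Math.round(heapMB * liveFraction / mixedCycles / regionMB);
    }

    public static void main(String[] args) {
        System.out.println(regions(50 * 1024, 32));               // prints 1600
        System.out.println(minorIntervalMs(10, 1));               // prints 100.0
        System.out.println(mixedRegions(50 * 1024, 0.30, 8, 32)); // prints 60
    }
}
```

Estimates like these only bound the problem; the load test in step 4 is what actually validates the pause target.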
Step 4: Configure and test
```shell
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=100 \
-XX:InitialHeapSize=50G \
-XX:MaxHeapSize=50G \
-XX:ParallelGCThreads=16
```
Run a load test. If you see pause times consistently under 100ms and throughput is acceptable, G1 meets your requirements. If pauses spike, reduce the young-generation or overall heap size, revisit your live-object estimates, or consider ZGC.
Common Misconceptions
Misconception 1: "More Memory Always Means Longer GC Pauses"
Reality: With modern collectors, heap size does not directly determine pause time.
G1's mixed collection strategy means adding more old-generation space doesn't require longer pauses—G1 simply spreads evacuation over more collection cycles. ZGC and Shenandoah explicitly decouple pause time from heap size through concurrent work. However, heap size does affect throughput: more live objects mean more work during collection and more data to evacuate, so pause frequency may increase.
Misconception 2: "Write Barriers Add Huge Overhead"
Reality: Write barriers are cheap; load barriers are expensive.
Write barriers are lightweight because they are conditional checks that often do nothing. If an object reference is written and both objects are in the same generation, the barrier exits immediately. Load barriers, by contrast, run on every reference access and add 5-10% throughput overhead, which is why ZGC and Shenandoah are reserved for latency-critical systems.
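The "conditional check that often does nothing" can be sketched as follows. The address ranges and names are invented; the point is only that the common case falls through without any work.

```java
// Sketch of a conditional generational write barrier: only an
// old-gen object storing a reference to a young-gen object needs
// bookkeeping; every other store exits immediately.
public class WriteBarrier {
    // Pretend the young generation occupies the first 1GB of the heap.
    static final long YOUNG_END = 1L << 30;
    static int cardsDirtied = 0;

    static boolean isYoung(long addr) { return addr < YOUNG_END; }

    // Runs on every reference store 'holder.field = value'.
    static void onStore(long holderAddr, long valueAddr) {
        if (!isYoung(holderAddr) && isYoung(valueAddr)) {
            cardsDirtied++;                 // slow path: would dirty a card
        }
        // Fast path: fall through with no side effects.
    }

    public static void main(String[] args) {
        onStore(100, 200);                  // young -> young: no work
        onStore(2L << 30, 3L << 30);        // old -> old: no work
        onStore(2L << 30, 100);             // old -> young: barrier taken
        System.out.println(cardsDirtied);   // prints 1
    }
}
```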
Misconception 3: "Generational Collection Means Younger = Faster"
Reality: Generation age is a heuristic, not a guarantee.
The generational hypothesis assumes most objects die young, but long-lived objects eventually promote to old generation where they stay. If your application creates objects that seem short-lived but are kept alive by caches or thread pools, they'll stick around and be promoted, reducing the efficiency of the young-generation-only collections.
Boundary Conditions
When G1 Breaks Down
G1's region-based design excels for heaps with mixed lifetimes (some young, some old objects). It falters when:
- Heap dominated by new objects: If nearly all objects are allocated and freed quickly (e.g., request-per-thread web service), G1's old-generation handling adds overhead. Consider Shenandoah or ZGC.
- Heap dominated by long-lived objects: If most heap is permanent data structures (parsed configs, caches), G1 spends collection cycles in mixed mode evacuating old regions with few dead objects. Consider manual tuning or ZGC.
- NUMA fragmentation: On very large multi-socket systems (8+ sockets), NUMA-aware allocation helps but doesn't eliminate cross-socket references entirely. Homogeneous workloads benefit more than heterogeneous ones.
When ZGC and Shenandoah Break Down
Low-latency collectors impose costs:
- Read-heavy workloads: Load barriers add overhead to every reference access. If your application reads objects constantly (graph traversal, data structure iteration), this overhead accumulates.
- Small heaps: The GC infrastructure overhead (colored pointers, barrier code, concurrent threads) becomes noticeable on heaps under 1GB.
- Throughput-critical systems: Financial batch processing or data analytics that tolerate 500ms pauses but need maximum throughput should use G1 or even SerialGC instead.
The Fundamental Trade-off
Theoretical work on garbage collection suggests that performance is bounded by fundamental space-time trade-offs: no algorithm can simultaneously minimize pause time, collection delay, and memory overhead. You must choose which dimension to optimize:
- G1: Targets pause time (100-200ms) while maintaining reasonable throughput.
- ZGC: Targets pause time (<1ms) at the cost of 5-10% throughput and requiring 25-35% free memory.
- Shenandoah: Targets pause time (<10ms) with moderate throughput cost.
No collector dominates all three dimensions.
Key Takeaways
- The generational hypothesis is empirically sound. Most objects die young. Collectors exploit this with cheap young-generation collection and infrequent old-generation collection.
- Barriers are the key to concurrency. Write barriers (G1) are cheap checks that track old-to-young references. Load barriers (ZGC, Shenandoah) enable concurrent object relocation at higher cost but lower pause times.
- Heap partitioning enables predictable pauses. G1's regions and ZGC's concurrent marking decouple pause time from heap size, allowing predictable latency on large heaps.
- Collector choice is workload-dependent. G1 suits diverse workloads; ZGC suits latency-critical systems; Shenandoah is a middle ground. Throughput-critical systems may still benefit from simpler collectors.
- Concurrent GC requires trade-offs. Concurrent collectors offer better scalability on multicore systems but add implementation complexity and overhead. The cost-benefit analysis depends on your latency requirements.
- No free lunch. Fundamental mathematical bounds mean pause time, collection delay, and memory efficiency cannot all be minimized simultaneously. You choose which dimension to optimize.
Further Exploration
JVM Internals
- JEP 304: Garbage Collector Interface — Understand how the JVM abstracts garbage collectors
- Oracle GC Tuning Guide — Practical guidance on selecting and tuning collectors for production workloads
Performance Analysis
- Java Microbenchmark Harness (JMH) — Compare collectors on your workload; don't rely on marketing claims
Language Comparison
- Rust vs. Java Memory Management — Study how ownership rules prevent entire categories of GC problems
- Python's Cyclic GC — Compare hybrid approach (reference counting + generational GC) with JVM's pure tracing