Engineering

Garbage Collection

How runtimes reclaim memory automatically — and why the details matter

Lead Summary

Garbage collection (GC) is a form of automatic memory management that relieves programmers from explicitly tracking and releasing heap-allocated objects. When an object becomes unreachable — no longer referenced by any live part of the program — a garbage collector automatically reclaims and returns its memory to the heap. This mechanism is foundational to most modern languages, including Java, Python, Go, JavaScript, and Erlang, and has been an active area of systems research for over sixty years.

The discipline encompasses a surprisingly wide design space: when to collect, which objects to examine, how to handle cycles, and how to minimize the impact on running application code. These choices produce collectors with profoundly different latency, throughput, and memory trade-offs.

Historical Development

Garbage collection was invented by John McCarthy in 1959–1960 as part of the Lisp programming language. McCarthy devoted approximately one page of Lisp's original specification to what he called the mark-and-sweep algorithm. The technique works in two passes: first, it traces all objects reachable from a set of root references via depth-first search and marks them as live; then it sweeps the heap, reclaiming all unmarked objects. This naïve stop-the-world algorithm became the founding technique upon which all subsequent GC research was built.

Origin

The first practical garbage collection algorithm was proposed by McCarthy in 1960 and successfully applied to Lisp. It remains the conceptual baseline against which every modern collector is compared.

The next major conceptual leap came with the observation that object age is a strong predictor of object survival. Lieberman and Hewitt formalized this in their paper "A Real Time Garbage Collector Based on the Lifetimes of Objects", introducing the weak generational hypothesis and the generational GC design that followed from it.

Later, David Bacon and colleagues contributed two significant advances: the concurrent cycle-collection algorithm for reference-counted systems, and a unified theoretical framework showing that all high-performance collectors are hybrids of tracing and reference counting, rather than categorically distinct approaches.

Core Concepts

Tracing vs. Reference Counting

The two fundamental strategies for identifying dead objects approach the problem from opposite directions.

Tracing garbage collection tracks live objects: starting from a root set of known-live references (stack variables, global variables), it traverses the object graph and marks everything reachable. Anything not reached is garbage. Collection is inherently non-deterministic in timing — the application does not know when an object will be reclaimed.

Reference counting tracks dead objects: each object carries a counter of how many references point to it. When the count drops to zero, the object is immediately reclaimed. This provides deterministic object lifetimes for acyclic reference graphs, but introduces a fundamental problem: reference cycles prevent counts from ever reaching zero, causing memory leaks.

A unified theory of GC, developed by David Bacon, demonstrates that all high-performance collectors are hybrids of tracing and reference counting — the two approaches are points on a spectrum, not categorically distinct alternatives.

The unified theory provides a single cost model to quantify the trade-offs resulting from different hybridizations, enabling principled comparison of diverse collector designs.

The Weak Generational Hypothesis

The empirical foundation for a large class of modern collectors is the weak generational hypothesis: most allocated objects die young (become garbage shortly after allocation), and objects that survive past a certain age tend to persist for a long time. Object lifetime distributions show a sharp peak at young ages with exponential decay, making age a strong predictor of survival probability.

This observation directly motivates generational collection: rather than scanning the entire heap on every collection, focus effort on the young generation where most garbage accumulates.

Mechanism & Process

Stop-the-World Collection

The simplest collection model halts all application threads during a collection cycle. With no concurrent mutation, the collector can safely traverse the entire object graph without synchronization. The guarantee simplifies the algorithm considerably — no object can become newly unreachable during a collection, and no new allocations happen mid-sweep.

The cost is the "embarrassing pause": during collection, the application performs no useful work and can experience noticeable latency spikes. Research comparing stop-the-world and concurrent collectors for Java shows this is the central trade-off in collector design.

Generational Collection

Generational collectors divide the heap into regions by object age (generations). Newly allocated objects enter the young generation. When the young generation fills, a minor collection reclaims only that region. Because most young objects are already garbage at collection time, the cost of a minor collection is proportional to the number of live young objects — typically a small fraction of the young generation — rather than total heap size. This dramatically reduces both average pause times and collection frequency compared to full-heap collectors.

Trade-off

Generational collection trades completeness for efficiency: objects in older generations are collected less frequently, so cyclic garbage in those generations may persist until a full collection runs. Long-lived cyclic data structures can consume memory for extended periods.

Oracle's Java 8 documentation on generations describes how the JVM exploits the weak generational hypothesis: the vast majority of objects are allocated in the young generation and most die there. Minor collections assume this property holds and are optimized accordingly.

Mark-Compact Collection

Mark-and-sweep identifies and frees dead objects correctly but leaves heap fragmentation: live objects scattered across the heap can prevent large allocations even when total free memory is sufficient. Mark-compact solves this by adding a compaction phase: after marking, live objects are relocated toward one end of the heap, leaving a single large contiguous free block. Allocation then becomes a simple pointer bump — no free-list search required.

The compaction step also improves cache performance by placing objects that are accessed together in memory near each other, improving spatial locality. The trade-off is that compaction requires copying objects and updating all references to them.

Incremental and Concurrent Collection

Incremental collectors break collection work into small increments interleaved with application execution, replacing a single large pause with many small ones. This is particularly important for real-time systems with strict latency requirements. Incremental collectors require write barriers: every reference write must inform the collector of modifications to already-scanned objects, adding overhead to memory writes but enabling predictable latency profiles.

Concurrent collectors go further: collection threads run simultaneously with application threads. This can reduce pauses to near-zero but requires careful synchronization and adds implementation complexity. Research on concurrent real-time garbage collectors shows they can achieve microsecond-level pause-time deadlines, making them viable for latency-sensitive applications.

Cycle Collection for Reference Counting

Reference counting's inability to reclaim cyclic structures is a well-known limitation. David Bacon's concurrent cycle-collection algorithm addresses this by observing that a cycle can only be isolated when a reference count is decremented to a nonzero value. The algorithm is both concurrent — it operates in the presence of simultaneous mutation — and localized, never requiring a global search of the entire data space. It achieves the same theoretical time bounds as tracing collectors while maintaining the incrementality of reference counting.

Bacon's algorithm was successfully implemented in the Jalapeño JVM as part of the Recycler, and PHP adopted it since version 5.3 for handling circular references.

Variants & Subtypes

Hybrid Strategies in Production Runtimes

Modern garbage-collected runtimes typically combine multiple algorithms to balance competing concerns — pause time, throughput, fragmentation control, and memory efficiency. For example:

  • Java and .NET: generational collection with intra-generational mark-and-sweep, periodic mark-and-compact for fragmentation control, and rare full-heap copying collection.
  • CPython: primary reference counting supplemented by a generational cycle collector that focuses exclusively on container objects (arrays, dictionaries, lists, class instances) that can form cycles. Containers that support GC carry an extra gc_ref field.
  • Erlang (BEAM VM): per-process generational GC where each process maintains its own independent heap collected separately. GC pauses affect only individual processes, not the entire system — a fundamentally different isolation model from JVM or CLR stop-the-world collectors.

WasmGC

The WasmGC proposal introduces fixed-size structs and arrays with automatic heap management into WebAssembly, allowing garbage-collected languages compiled to Wasm to leverage the host environment's existing GC rather than shipping a custom GC implementation in the binary. This eliminates significant code size overhead for managed languages targeting the web.

GC Beyond Memory: Tombstones in CRDTs

The term "garbage collection" extends beyond heap memory to any system that accumulates unreachable state requiring cleanup. A prominent example is sequence CRDTs used for collaborative editing: deleted elements become tombstones — metadata retained to resolve concurrent edits — but accumulate indefinitely without explicit GC. A 1,000-character document with heavy edit history may internally contain 50,000 tombstones. Without active GC strategies such as epoch-based pruning or causal stability tracking, production systems risk unbounded memory growth on clients and growing storage overhead over time.

Controversies & Debates

GC Overhead vs. Manual and Arena Allocation

Garbage collection is not free. Research from UMass Amherst quantifies the cost: GC systems typically require heaps 2–3x the size of live data to allow effective collection batching, incur CPU overhead for tracing and marking phases, and cause latency spikes from stop-the-world pauses. Arena allocation — which avoids per-object tracking entirely through bulk deallocation — can achieve up to 54% CPU reduction in allocation-heavy workloads where allocation patterns align with arena semantics.

The trade-off is not simply "GC is slower." Arena allocation requires the programmer to reason explicitly about object lifetimes and is error-prone when lifetimes are complex. GC pays a runtime cost to eliminate that entire class of bugs. Which trade-off is correct depends entirely on workload characteristics and reliability requirements.

Key Takeaways

  1. Garbage collection automatically reclaims unreachable objects GC relieves programmers from explicit memory tracking, automatically returning memory to the heap when objects become unreachable.
  2. Tracing and reference counting represent opposite approaches to identifying dead objects Tracing tracks live objects from roots; reference counting tracks dead objects via counters. All high-performance collectors are hybrids of both strategies.
  3. The weak generational hypothesis explains why young objects die young Empirical evidence shows most allocated objects become garbage shortly after creation, making object age a strong predictor of survival probability.
  4. Generational collection trades completeness for efficiency By focusing collection effort on young objects, generational collectors reduce pause times and collection frequency, but cyclic garbage in old generations may persist longer.
  5. Concurrency and pause-time reduction require write barriers and careful synchronization Incremental and concurrent collectors can reduce pauses to microseconds, but require write barriers that add overhead to memory operations.

Further Exploration

Foundational Theory

Algorithm & Design

Production Implementations

Performance & Trade-offs