Concurrency and the Java Memory Model
Happens-before, visibility, and thread safety — seen through the lens of Rust, Python, and Go
Learning Objectives
By the end of this module you will be able to:
- Explain the Java Memory Model's happens-before relationship and why it is necessary for correct concurrent programs.
- Distinguish between the semantics of `synchronized`, `volatile`, and the `java.util.concurrent.atomic` types.
- Explain why immutability is the safest concurrency strategy and how to enforce it in Java.
- Compare Java's runtime concurrency model with Rust's compile-time Send/Sync system, Python's GIL, and Go's goroutines.
- Identify common concurrency pitfalls: data races, deadlocks, and visibility bugs.
Core Concepts
The Java Memory Model and Happens-Before
Modern CPUs and compilers do not execute instructions in source-code order. They reorder reads and writes for performance, and different CPU cores maintain private caches that may not reflect the latest value written by another core. In a single-threaded program this is invisible. In a multi-threaded program, it produces results that feel impossible.
Java's answer is the Java Memory Model (JMM) — a formal specification that defines when one thread's writes are guaranteed to be visible to another thread's reads. The central concept is the happens-before relationship. If action A happens-before action B, then the side effects of A are visible to B. Without such a relationship, the JVM makes no guarantee about visibility at all.
The JMM establishes several built-in happens-before edges:
- A thread's write to a `volatile` field happens-before every subsequent read of that field by any thread.
- Releasing a monitor lock (`synchronized` exit) happens-before any subsequent acquisition of the same lock.
- Starting a thread (`Thread.start()`) happens-before any action in that thread.
- All actions in a thread happen-before any other thread's successful `Thread.join()` on it.
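Two of these edges can be exercised in a minimal sketch (the class name HandoffDemo is illustrative, not from the JDK): the start() edge makes a pre-start write visible to the worker, and the join() edge makes the worker's write visible back to the main thread, all without volatile or locks.

```java
public class HandoffDemo {
    static int result; // plain field: safe only because of the edges noted below

    public static void main(String[] args) throws InterruptedException {
        result = 1;                               // visible to worker via the start() edge
        Thread worker = new Thread(() -> result = result + 41);
        worker.start();                           // start() happens-before the worker's actions
        worker.join();                            // worker's actions happen-before join() returning
        System.out.println(result);               // guaranteed 42: no volatile, no locks needed
    }
}
```

Remove either the start() or the join() edge and the program becomes a visibility bug, even though it might still print 42 in testing.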
volatile guarantees that a write is visible to other threads. It does not make a compound operation like counter++ atomic. Incrementing requires read-modify-write, which is three separate operations. Use AtomicInteger for that.
synchronized, volatile, and Atomics
Java provides three layers of concurrency primitives, each covering a different set of guarantees.
synchronized acquires a monitor lock on an object and creates a mutual exclusion region. On entry, it refreshes the thread's view of shared state from main memory. On exit, it flushes all writes back. This gives you both visibility and atomicity, but at the cost of potential contention and deadlock.
```java
public synchronized void increment() {
    this.count++; // safe: atomicity + visibility
}
```
volatile is lighter. It provides visibility without mutual exclusion. A volatile write flushes to main memory immediately; a volatile read always fetches from main memory. It is appropriate when one thread writes and others only read, or when the write is an atomic single-field update (a reference swap, a boolean flag).
```java
private volatile boolean shutdownRequested = false;

// Writer thread
public void requestShutdown() { shutdownRequested = true; }

// Reader thread
public void run() {
    while (!shutdownRequested) { doWork(); }
}
```
java.util.concurrent.atomic types (AtomicInteger, AtomicReference, AtomicLong, etc.) provide atomic compound operations through hardware-level Compare-And-Swap (CAS) instructions. They are faster than synchronized under low contention because they avoid kernel-level blocking.
```java
private final AtomicInteger counter = new AtomicInteger(0);

public int nextId() {
    return counter.incrementAndGet(); // atomic read-modify-write
}
```

Immutability as a Concurrency Strategy
The deepest concurrency guarantee is to share nothing mutable. Immutable data structures provide inherent thread safety and eliminate the need for locks or synchronization mechanisms entirely — a write to an immutable object is impossible, so there is no race condition to prevent.
Shared mutable state means that every reference to an object observes every change made through any other reference to it. Immutability severs this chain: when all shared data is immutable, reasoning about a multi-threaded program is no harder than reasoning about a single-threaded one.
In Java, immutability is a design discipline rather than a language enforcement. The pattern is:
- Declare all fields `private final`.
- Do not expose mutable internal state through getters (return defensive copies or use immutable collections).
- Mark the class `final` to prevent subclasses from reintroducing mutability.
```java
public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    public int x() { return x; }
    public int y() { return y; }

    public Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy); // returns a new object, never mutates
    }
}
```
java.lang.String, java.lang.Integer, and java.time.Instant are canonical examples in the standard library.
Java 16+ records are shallowly immutable by construction. record Point(int x, int y) {} declares final fields and a canonical constructor automatically. Prefer records for value-like data objects in new code.
The OS-Thread-Per-Java-Thread Model
Before virtual threads (covered in the next module), each Java Thread maps 1:1 to an OS kernel thread. This means:
- Creating a thread is expensive (OS allocation, stack reservation, typically 512 KB–1 MB per thread).
- The OS scheduler controls preemption: threads can be interrupted at any point, requiring explicit synchronization.
- Thread pools (`ExecutorService`, `ForkJoinPool`) exist precisely because raw thread creation does not scale to thousands of concurrent operations.
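The pooling point can be made concrete with a small sketch (PoolDemo and runTasks are illustrative names): a thousand tasks execute on a fixed pool of four reused OS threads, rather than on a thousand freshly created ones.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolDemo {
    // Runs n trivial tasks on a small fixed pool and returns how many completed.
    static int runTasks(int n) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4); // only 4 OS threads, reused
        for (int i = 0; i < n; i++) {
            pool.submit(completed::incrementAndGet);            // queued, not one-thread-per-task
        }
        pool.shutdown();                                        // stop accepting new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS);            // wait for queued tasks to drain
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(1000)); // 1000 tasks, 4 threads
    }
}
```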
Deadlock
Java's monitor-based model enables deadlock whenever two threads each hold a lock the other needs. The classic pattern: Thread A holds lock X and waits for lock Y; Thread B holds lock Y and waits for lock X. Neither can proceed.
Prevention strategies:
- Establish a consistent global lock ordering. Always acquire locks in the same order everywhere.
- Use `java.util.concurrent.locks.Lock` with `tryLock(timeout)` to detect contention and back off instead of deadlocking.
- Prefer higher-level abstractions (`ConcurrentHashMap`, `BlockingQueue`) that internally manage locking correctly.
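The tryLock back-off strategy can be sketched as follows (the TransferDemo class and transfer method are illustrative): a thread that cannot acquire the second lock within the timeout releases the first and reports failure, so the caller can retry later instead of deadlocking.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class TransferDemo {
    // Acquires both locks with timeouts; backs off rather than deadlocking.
    static boolean transfer(ReentrantLock from, ReentrantLock to) throws InterruptedException {
        if (!from.tryLock(50, TimeUnit.MILLISECONDS)) return false;
        try {
            if (!to.tryLock(50, TimeUnit.MILLISECONDS)) return false; // back off; caller retries
            try {
                // ... perform the transfer while holding both locks ...
                return true;
            } finally {
                to.unlock();
            }
        } finally {
            from.unlock(); // released on every exit path, including the back-off
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantLock a = new ReentrantLock(), b = new ReentrantLock();
        System.out.println(transfer(a, b)); // both locks free, so the transfer succeeds
    }
}
```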
Compare & Contrast
Java vs. Rust: Runtime Locks vs. Compile-Time Ownership
The most important conceptual shift for a Rust engineer is that Java's thread safety is enforced at runtime, not at compile time.
Rust's ownership system guarantees memory safety and thread safety without garbage collection. The core rules are checked by the compiler: each value has exactly one owner, and when ownership transfers, the original owner loses access. The borrowing system enforces that either multiple immutable references or exactly one mutable reference can exist at any time — a rule that directly prevents data races.
Rust enforces concurrency safety through the Send and Sync marker traits. Send permits ownership transfer to another thread; Sync permits shared references across threads (a type T is Sync exactly when &T is Send). Types that are not thread-safe lack one or both traits: Rc<T> is neither Send nor Sync, and Cell<T> and RefCell<T> are not Sync. The compiler therefore rejects any attempt to share them across thread boundaries.
Rust's type system explicitly prevents Rc<T> from crossing thread boundaries. If you clone an Rc<T> across threads, both threads could update the reference count without synchronization. Rust provides Arc<T> (atomically reference-counted) for the thread-safe equivalent. This distinction is enforced at compile time. Java has no equivalent — you can pass any object to any thread. The contract is documented in Javadoc; enforcement is yours to manage.
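The Java side of this contrast can be seen in a small sketch (NoGuardDemo and sharedMutation are illustrative names): the compiler happily lets a non-thread-safe ArrayList cross a thread boundary; correctness here rests entirely on the programmer remembering the join() happens-before edge.

```java
import java.util.ArrayList;
import java.util.List;

public class NoGuardDemo {
    // Hands a non-thread-safe list to another thread: Java compiles this
    // without complaint, where Rust's Send/Sync check would reject the analog.
    static int sharedMutation() throws InterruptedException {
        List<Integer> list = new ArrayList<>();   // not thread-safe
        Thread t = new Thread(() -> list.add(1)); // no compile-time objection
        t.start();
        t.join();                                 // the join() edge makes the add visible here
        return list.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sharedMutation());     // correct only because of the join()
    }
}
```

Delete the join() and the program still compiles; it just becomes a race. That is the burden Rust moves to the compiler and Java leaves with the programmer.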
Java vs. Python: True Parallelism vs. the GIL
Python's Global Interpreter Lock (GIL) was a deliberate design choice in CPython's bytecode interpreter to optimize single-threaded performance and simplify interpreter development at the cost of preventing true parallelism. Only one thread executes Python bytecode at any given time. Threading in CPython achieves concurrency (interleaved execution) for I/O-bound work but not parallelism (simultaneous execution) for CPU-bound work.
Java has no GIL. Java threads are OS threads running in parallel on multiple cores. This is a genuine capability difference: a Java program with four threads on a four-core machine can run four computations simultaneously. A CPython program with four threads cannot. The consequence is that Java's concurrency bugs — data races, visibility issues — are real races between parallel executing code, not simulated concurrency.
Python's GIL makes threading simpler but fundamentally limits throughput. Java's 1:1 OS-thread model provides true parallelism and full responsibility for correctness.
Java vs. Go: OS Threads vs. M:N Green Threads
Go implements an M:N threading model where M goroutines are dynamically multiplexed onto N OS kernel threads. Goroutines start with approximately 2–4 KB of stack and use dynamic growth, far smaller than OS thread stacks of 1–8 MB. This makes millions of concurrent goroutines practical. Classic Java threads cannot scale that way because each thread consumes an OS-level stack.
Go's scheduler uses work-stealing to load-balance goroutines across kernel threads. Go's goroutines use preemptive scheduling where the runtime can interrupt execution at safe points without explicit developer action, unlike Rust's async model, which requires explicit await points.
Java's classic thread model is closer to Go's OS thread layer than to goroutines. Virtual threads (Module 10) are Java's answer to Go's goroutines.
JavaScript: No Shared State by Design
JavaScript in Node.js and browsers runs on a single-threaded event loop. There is no shared mutable state between concurrent operations because there is no real concurrency in the traditional sense: callbacks, promises, and async/await represent interleaved execution on one thread, never parallel execution on multiple. The concurrency model avoids data races entirely by construction — but also prevents parallelism except through worker threads with explicit message-passing.
| Model | Thread type | Parallelism | Safety mechanism |
|---|---|---|---|
| Java (classic) | OS threads (1:1) | Yes | Locks, volatile, atomics (runtime) |
| Rust | OS threads (1:1) | Yes | Send/Sync, ownership (compile time) |
| Python (CPython) | OS threads + GIL | No (CPU-bound) | GIL (automatic, limited) |
| Go | Goroutines (M:N) | Yes | Channels, race detector |
| JavaScript | Event loop (single) | No (single thread) | Shared-nothing (by design) |
Common Misconceptions
"volatile makes operations atomic."
It does not. `volatile` guarantees visibility: a write is flushed to main memory and a read always fetches from it. But `count++` is three operations: read, increment, write. Between one thread's read and write, another thread can perform its own read-modify-write, and an increment is lost. Use `AtomicInteger.incrementAndGet()` for atomic increment.
"synchronized on this is always correct."
Synchronizing on `this` exposes your lock object to external code. Any caller can write `synchronized(yourObject) { ... }` and hold your lock indefinitely. Prefer a private final lock object: `private final Object lock = new Object();`.
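A minimal sketch of the private-lock pattern (SafeCounter is an illustrative name): the monitor is an object no outside code can reach, so no caller can interfere with the class's locking.

```java
public class SafeCounter {
    // Private lock: external code cannot synchronize on it and starve us.
    private final Object lock = new Object();
    private int count = 0;

    public void increment() {
        synchronized (lock) { count++; }
    }

    public int get() {
        synchronized (lock) { return count; }
    }
}
```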
"If I don't see a data race in tests, there is no data race." Concurrent bugs are non-deterministic. They depend on thread scheduling, CPU cache states, and JIT optimizations. A program without the right happens-before edges is wrong regardless of whether the race manifests in testing. The JMM defines what is legal, not what is observed.
"Immutability requires copying everything." Immutable objects are shareable without copying. Once created and published safely, an immutable object can be read by any number of threads with zero synchronization overhead. The cost is allocation on writes, not reads. For read-heavy shared data, immutability is cheaper than locking.
"Java's concurrency model is like Python's — threads don't run truly in parallel." This is false. Java OS threads run on separate CPU cores simultaneously. The GIL is a CPython artifact, not a JVM feature.
Worked Example
Goal: A shared counter read by many threads, incremented by several workers. Start from a broken version and fix it step by step.
Step 1: Broken — no synchronization
```java
public class BrokenCounter {
    private int count = 0;
    public void increment() { count++; } // read-modify-write, not atomic
    public int get() { return count; }   // no visibility guarantee
}
```
This is broken in two ways. First, count++ is not atomic: two threads can both read the same value, both increment, and both write back, losing one increment. Second, there is no happens-before between the write in one thread and the read in another — a reader may see a stale cached value indefinitely.
Step 2: Correct but coarse — synchronized
```java
public class SynchronizedCounter {
    private int count = 0;
    public synchronized void increment() { count++; }
    public synchronized int get() { return count; }
}
```
`synchronized` on the method acquires the monitor on `this`. The exit of `increment()` happens-before any subsequent entry to either method, ensuring visibility and atomicity. The cost: all threads serialize through the monitor. Under high contention this becomes a bottleneck.
Step 3: Correct and faster — AtomicInteger
```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {
    private final AtomicInteger count = new AtomicInteger(0);
    public void increment() { count.incrementAndGet(); }
    public int get() { return count.get(); }
}
```
AtomicInteger.incrementAndGet() uses a hardware CAS instruction: it retries in a tight loop until the compare-and-swap succeeds. No kernel lock is acquired. Under low-to-medium contention this is significantly faster than synchronized.
Step 4: The immutable alternative — no synchronization at all
If the counter's value is published once and then only read (not incremented continuously), make it immutable:
```java
public final class ImmutableSnapshot {
    private final int count;
    public ImmutableSnapshot(int count) { this.count = count; }
    public int count() { return count; }
}
```
A safely published ImmutableSnapshot requires no locks, no volatile, and no atomics. Readers never need synchronization. This is only correct when state transitions produce new objects rather than mutating existing ones — a functional style with a persistent data structure.
For an immutable object to be safely visible to other threads, it must be published through a happens-before edge — for example, via a volatile field, a final field written in a constructor, or placing it into a synchronized collection. Writing to a plain non-final, non-volatile field is not enough.
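A sketch of volatile-based publication, reusing the snapshot idea above (SnapshotHolder is an illustrative name; its nested snapshot class mirrors the one shown earlier): the volatile write of the reference is the happens-before edge, so any reader that sees the new reference also sees the snapshot's state.

```java
public class SnapshotHolder {
    static final class Snapshot {
        private final int count;              // final field: safe-publication guarantee
        Snapshot(int count) { this.count = count; }
        int count() { return count; }
    }

    // The volatile write below is the happens-before edge that publishes
    // each fully constructed Snapshot to every reader thread.
    private volatile Snapshot current = new Snapshot(0);

    public void publish(int count) { current = new Snapshot(count); }

    public int read() { return current.count(); }
}
```

Writers replace the whole snapshot; readers never lock. With a plain (non-volatile, non-final) field instead, a reader could see the new reference but stale state.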
Key Takeaways
- The JMM defines visibility, not just atomicity. Without a happens-before edge — from synchronized, volatile, thread start/join, or java.util.concurrent utilities — reads may see stale values even if no data corruption occurs.
- Three tools, three trade-offs. synchronized gives full mutual exclusion and visibility at the cost of potential contention. volatile gives visibility without mutual exclusion, suitable for simple flags and single-field updates. Atomics give atomic compound operations via CAS — faster than synchronized under low contention, less expressive than full locks.
- Immutability is the strongest safety strategy. Immutable data structures eliminate the need for locks by eliminating mutable shared state. Prefer final fields, records, and immutable collections. Reasoning about concurrent reads of immutable data is no harder than reasoning about single-threaded access.
- Rust prevents races at compile time; Java does not. Rust's Send and Sync traits, enforced by the borrow checker, prevent data races statically. Java's any-object-can-cross-any-thread model shifts the entire correctness burden to the programmer.
- Classic Java threads are OS threads. Unlike Go's lightweight goroutines or JavaScript's event loop, classic Java threads map 1:1 to OS threads. They are expensive to create and require explicit pooling. Virtual threads (Module 10) close this gap.
Further Exploration
Java Memory Model & Concurrency
- JSR 133 Java Memory Model FAQ — The authoritative plain-language explanation of the JMM
- Java Concurrency in Practice — The canonical reference for Java concurrent programming patterns
- JEP 401: Value Classes and Objects — The upcoming Valhalla proposal and its concurrency constraints
Language Comparisons
- Extensible Concurrency with Send and Sync — Primary source on Rust's trait-based thread safety
- What Is the Python Global Interpreter Lock (GIL)? — Accessible deep dive into GIL design and consequences
- How Stacks are Handled in Go — Explains goroutine stack growth and why goroutines are cheap
- Safe Systems Programming in Rust — Peer-reviewed overview of Rust's ownership model