Project Loom and Virtual Threads

How the JVM achieves million-thread concurrency without async/await

Learning Objectives

By the end of this module you will be able to:

  • Explain how virtual threads differ from platform threads and why millions of them can coexist in a single JVM process.
  • Describe the carrier thread mounting/unmounting mechanism and what triggers a yield.
  • Identify and avoid virtual thread pinning scenarios involving synchronized blocks and native methods.
  • Implement structured concurrency using StructuredTaskScope and explain its lifetime and cancellation guarantees.
  • Compare virtual threads with Go goroutines, Rust async/await, and Kotlin coroutines, explaining the trade-offs of each model.
  • Migrate a ThreadLocal usage to scoped values and understand when each is appropriate.

Core Concepts

What a Virtual Thread Is

A virtual thread is a user-space thread managed entirely by the JVM rather than by the operating system. The key consequence: a single JVM process can instantiate and manage millions of concurrent virtual threads with minimal overhead, compared to the low thousands that OS-backed platform threads can realistically support.

Platform threads carry a fixed, OS-allocated stack (typically 1 MB or more). Virtual threads store their call stacks on the Java heap, as ordinary heap objects subject to garbage collection. Stack frames grow and shrink as the virtual thread executes, and the GC reclaims frames that are no longer needed. Memory per virtual thread is measured in hundreds of bytes to a few kilobytes rather than megabytes.

API entry point
// Creating a virtual thread is a one-liner in Java 21+
Thread vt = Thread.ofVirtual().start(() -> System.out.println("Hello from a virtual thread"));

// Or via an executor whose factory creates virtual threads
ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
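To make the "millions of threads" claim concrete, here is a minimal, self-contained sketch (the class name VirtualThreadScale is ours, not part of any API) that submits a large number of tasks to a virtual-thread-per-task executor and counts completions:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

public class VirtualThreadScale {
    // Submits one task per virtual thread; a platform-thread pool of this
    // size would exhaust OS resources long before this count.
    static long countCompleted(int tasks) throws InterruptedException {
        LongAdder done = new LongAdder();
        // try-with-resources: ExecutorService.close() (Java 19+) waits for
        // all submitted tasks to finish before returning.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(done::increment);
            }
        }
        return done.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countCompleted(100_000)); // 100000
    }
}
```

Each submitted task gets its own freshly created virtual thread; none are pooled or reused.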

The M:N Scheduler and Carrier Threads

The JVM implements M:N scheduling: a large number M of virtual threads are dynamically multiplexed onto a smaller number N of carrier threads — ordinary platform threads managed by an internal ForkJoinPool. By default the carrier pool size equals the number of CPU cores, capped at 256.

Virtual threads do not have dedicated carriers. When a virtual thread has work to do it mounts onto whichever carrier is available; when it blocks on I/O or another blocking operation it unmounts, returning the carrier thread to the pool so another virtual thread can use it.

Fig 1
[Diagram: a ForkJoinPool of carrier threads (one per CPU core) with running virtual threads (VT-1, VT-4, VT-6) mounted on carriers, while parked virtual threads (VT-2, VT-3, VT-5, VT-7, and millions more) wait off-carrier]
M:N scheduling: many virtual threads share a small pool of carrier threads

The Continuation Mechanism

The mounting/unmounting model is implemented through continuations. When a virtual thread hits a blocking operation, the runtime saves its entire call stack and local variables into a continuation object on the heap, releases the carrier thread, and parks the virtual thread. When the blocking operation completes (the socket becomes readable, the lock is acquired, etc.), the scheduler picks an available carrier, restores the continuation, and execution resumes from exactly where it left off — without any changes to application code.

Continuations are what make blocking code fast: the carrier is freed while the virtual thread waits, so thousands of I/O waits happen concurrently on a handful of OS threads.

From the programmer's perspective the code looks and debugs like sequential synchronous code. The concurrency is entirely an implementation detail of the runtime.
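The unmounting behavior can be observed indirectly through wall-clock time. In this sketch (class name ConcurrentSleep is ours), far more virtual threads than carriers all sleep at once; because each sleep unmounts its carrier, total elapsed time stays close to a single sleep interval rather than growing with the thread count:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class ConcurrentSleep {
    // Start 'threads' virtual threads that each block for 200 ms. If sleeping
    // held a carrier, total time would be roughly (threads / carriers) * 200 ms;
    // because sleeping unmounts, all sleeps overlap.
    static long elapsedMillis(int threads) throws InterruptedException {
        Instant start = Instant.now();
        List<Thread> started = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            started.add(Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(Duration.ofMillis(200));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : started) {
            t.join();
        }
        return Duration.between(start, Instant.now()).toMillis();
    }

    public static void main(String[] args) throws InterruptedException {
        // Typically a few hundred milliseconds even for 10,000 threads.
        System.out.println(elapsedMillis(10_000));
    }
}
```

Note that the task body is plain `Thread.sleep` — no callbacks, futures, or suspension annotations.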

Virtual Threads and I/O-Bound Workloads

Virtual threads deliver significant scalability gains for I/O-bound, high-concurrency workloads — the dominant pattern in server applications. In one representative benchmark, a service sustained roughly 14,500 requests/second with virtual threads versus about 2,300 requests/second with a pool of 200 platform threads, simply because blocked threads no longer tie up OS resources.

They do not make CPU-bound work faster. If your task uses 100% CPU throughout its duration, a virtual thread buys nothing over a platform thread — throughput remains bounded by available cores.

Virtual Threads Are Not New — They Are Production-Ready

Virtual threads were first introduced as a preview feature in Java 19 (JEP 425), refined in Java 20, and finalized as a production feature in Java 21 LTS — supported through September 2031. The two-release preview cycle allowed the project team to incorporate real-world feedback before freezing the API.

Compare & Contrast

Virtual Threads vs. Rust async/await

Rust's async/await model compiles async functions into stackless state machines. Every .await point is an explicit suspension point encoded in the function's type signature. The compiler knows at compile time exactly what state must be saved. Rust async tasks are extremely memory-efficient (tens of bytes), but the model is infectious: once a function is async, everything that calls it must be async too.

Java virtual threads use a different strategy: suspension points are inserted transparently inside the JDK's blocking operations rather than marked explicitly in source code, so the runtime, not the programmer, decides where a virtual thread yields its carrier. Existing blocking code — Thread.sleep, InputStream.read, JDBC calls — just works without modification.
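A minimal sketch of this transparency (the class name TransparentBlocking is ours): an ordinary blocking call runs unchanged inside a virtual thread, with no async keyword or wrapper type anywhere in the signature.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class TransparentBlocking {
    // Runs a plain blocking call on a virtual thread and reports whether the
    // code observed itself running on one. The blocking call needs no
    // annotation: the JDK's retrofitted Thread.sleep unmounts the carrier.
    static boolean ranOnVirtual() throws InterruptedException {
        AtomicBoolean wasVirtual = new AtomicBoolean();
        Thread vt = Thread.ofVirtual().start(() -> {
            try {
                Thread.sleep(10); // ordinary blocking call, no .await needed
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            wasVirtual.set(Thread.currentThread().isVirtual());
        });
        vt.join();
        return wasVirtual.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(ranOnVirtual()); // true
    }
}
```

The equivalent Rust code would require `async fn`, an `.await`, and a runtime; here the suspension is invisible to the caller.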

The practical implication: thread pinning is a performance hazard specific to Java's implementation. In Rust, an unintended blocking call in async code is a compile-time problem. In Java, it silently pins a carrier thread.

Dimension | Rust async/await | Java virtual threads
Suspension model | Stackless, cooperative, explicit .await | Stackful, transparent yields at blocking calls
Memory per task | Tens of bytes | Hundreds of bytes to a few KB
Coloring problem | Yes — async is viral | No — any blocking call works
Unintentional blocking | Compiler error | Silent pinning
Scheduler | User-supplied runtime (Tokio, async-std) | JVM built-in ForkJoinPool

Virtual Threads vs. Go Goroutines

Go goroutines are also M:N, language-integrated, and use cooperative scheduling with a runtime-managed work-stealing scheduler. The conceptual models are similar. The difference is largely ergonomic: goroutines are a first-class language primitive (go func()), whereas Java virtual threads are library-level and spelled Thread.ofVirtual().

Go's goroutine scheduler has been tuned over a longer period and has tighter integration with the runtime. Java's scheduler is newer but benefits from the JIT.

Virtual Threads vs. Kotlin Coroutines

Kotlin coroutines have been stable since Kotlin 1.3 (2018) and demonstrated the viability of lightweight concurrency on the JVM before Loom arrived. They use cooperative scheduling with explicit suspend functions — structurally closer to Rust than to Java virtual threads.

The key difference: Kotlin coroutines are opt-in and viral (like Rust async), requiring suspend to be propagated up the call chain. Java virtual threads are transparent — you write plain Thread code and the JVM handles the scheduling.

Kotlin on Loom

Kotlin coroutines can run on virtual threads by configuring Dispatchers.IO to use a virtual-thread executor. The two models are interoperable.

Dimension | Kotlin coroutines | Java virtual threads
Scheduling model | Cooperative, explicit suspend | Transparent, JVM-managed
Coloring problem | Yes — suspend propagates | No
Structured concurrency | Yes, built-in (launch, async) | Yes, via StructuredTaskScope (preview)
Maturity | Stable since 2018 | Finalized in Java 21 (2023)

Virtual Threads vs. Reactive Programming (Project Reactor / RxJava)

Virtual threads and reactive frameworks are complementary approaches, not competing alternatives. Reactive frameworks provide fine-grained control and advanced flow operators (backpressure, operator fusion) at the cost of a steep cognitive overhead — callback chains and operator composition.

Virtual threads generally match or exceed reactive framework throughput in benchmarks for typical I/O-bound server workloads, while the code remains sequential and debuggable. The choice should be driven by team expertise and specific use case, not by a blanket performance claim.

Common Misconceptions

"Virtual threads make everything faster." Virtual threads improve concurrency, not raw computation speed. CPU-bound workloads see no benefit; only work that spends significant time waiting on I/O or blocking operations will scale better.

"You should pool virtual threads like platform threads." Virtual threads are cheap enough to be created per-task and discarded. Thread pooling was necessary with platform threads to amortize expensive creation cost. With virtual threads, pooling is an anti-pattern — it reintroduces resource contention and complicates lifetimes without benefit.

"ThreadLocal works fine with virtual threads." It works in the sense that virtual threads fully support ThreadLocal for backward compatibility. But using ThreadLocal as a cache for expensive mutable objects is an anti-pattern under virtual threads: with millions of virtual threads, each holding its own cached instance, memory consumption multiplies accordingly. Use scoped values for new code in high-concurrency contexts.

"Synchronized blocks are fine." Inside a synchronized block or native/JNI call, a virtual thread is pinned to its carrier thread — it cannot unmount. If the pinned virtual thread then blocks (e.g., waits for a lock held by another thread), the carrier is occupied and unavailable to other virtual threads. This is silent under normal conditions; replace synchronized with ReentrantLock in hot paths.
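The recommended migration is mechanical. Here is a hedged before/after sketch (the class name LockMigration and the counter field are ours): the critical section moves from a synchronized block into a ReentrantLock, under which a virtual thread can unmount while waiting.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockMigration {
    private final ReentrantLock lock = new ReentrantLock();
    private long counter;

    // Before: synchronized (this) { return ++counter; } — on Java 21, a
    // virtual thread blocking inside the monitor pins its carrier.
    // After: ReentrantLock lets a waiting virtual thread unmount, freeing
    // the carrier for other virtual threads.
    long incrementAndGet() {
        lock.lock();
        try {
            return ++counter;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockMigration m = new LockMigration();
        System.out.println(m.incrementAndGet()); // 1
    }
}
```

The lock/try/finally shape is the standard idiom; forgetting the finally-unlock is the main hazard the synchronized keyword never had.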

Worked Example

Migrating a Thread-Per-Request Server to Virtual Threads

Below is a progression from a traditional thread-pool executor to virtual threads, followed by structured concurrency for a composite call.

Step 1 — Classic fixed thread pool (the old way)

// Limited by pool size. Under load, requests queue up.
ExecutorService pool = Executors.newFixedThreadPool(200);

pool.submit(() -> {
    String userProfile = fetchUserProfile(userId);   // blocks
    String orders      = fetchOrders(userId);        // blocks
    return merge(userProfile, orders);
});

Step 2 — Virtual thread executor (simple migration)

// One virtual thread per task. No pool sizing needed.
ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();

executor.submit(() -> {
    String userProfile = fetchUserProfile(userId);   // still blocks — now unmounts carrier
    String orders      = fetchOrders(userId);        // same
    return merge(userProfile, orders);
});

The task code is identical to step 1; only the executor changed. The blocking calls are now non-wasteful because the carrier thread is released during each I/O wait.

Step 3 — Structured concurrency to fan out calls in parallel

// Java 21+ with --enable-preview for StructuredTaskScope
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {

    Subtask<String> profileTask = scope.fork(() -> fetchUserProfile(userId));
    Subtask<String> ordersTask  = scope.fork(() -> fetchOrders(userId));

    scope.join();           // waits for both
    scope.throwIfFailed();  // propagates first exception, cancels the other task

    return merge(profileTask.get(), ordersTask.get());
}

StructuredTaskScope.ShutdownOnFailure cancels all remaining subtasks the moment one fails. Lifetimes are scoped to the try block — no orphaned tasks, no leaked threads.

Step 4 — Replacing a ThreadLocal cache with a scoped value

// Old way — problematic with millions of virtual threads
private static final ThreadLocal<DateFormat> DATE_FORMAT =
    ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

// New way with scoped values (JEP 487, preview)
private static final ScopedValue<String> CURRENT_USER = ScopedValue.newInstance();

ScopedValue.where(CURRENT_USER, "alice").run(() -> {
    // CURRENT_USER.get() returns "alice" here and in any child task
    processRequest();
});

Scoped values are immutable within a scope, automatically inherited by child tasks forked inside a StructuredTaskScope, and require no explicit copying.

Boundary Conditions

Virtual threads do not help CPU-bound work. If your virtual thread does image processing or cryptographic operations without any I/O, it occupies its carrier for the full duration. Running many CPU-bound virtual threads competes for carrier threads and can hurt throughput compared to a fixed ThreadPoolExecutor sized to the number of cores.

Pinning degrades the M:N model. A virtual thread inside a synchronized block that then blocks on I/O will hold a carrier hostage. Widespread pinning can cause carrier thread starvation, undoing the scalability gains. Diagnose with -Djdk.tracePinnedThreads=full.

Diagnosing pinning

Add -Djdk.tracePinnedThreads=full to your JVM arguments during development. It prints a stack trace whenever a virtual thread is pinned. Common culprits: JDBC drivers and legacy libraries using synchronized internally. Check whether your database driver has a virtual-thread-compatible release.
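For experimentation, a deliberately pinning program is easy to construct (class name PinningDemo is ours). Run it with -Djdk.tracePinnedThreads=full to see the diagnostic stack trace; without the flag it completes silently, which is exactly the hazard:

```java
public class PinningDemo {
    private static final Object MONITOR = new Object();

    // A virtual thread that blocks while holding a monitor cannot unmount:
    // its carrier stays occupied for the whole sleep. With
    // -Djdk.tracePinnedThreads=full the JVM prints the pinned frames.
    static boolean runPinnedSleep() throws InterruptedException {
        Thread vt = Thread.ofVirtual().start(() -> {
            synchronized (MONITOR) {
                try {
                    Thread.sleep(100); // blocking inside synchronized => pinned
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        vt.join();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runPinnedSleep()); // true; add the flag to see the trace
    }
}
```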

Structured concurrency is still in preview. StructuredTaskScope has been in preview since Java 21 and continues to evolve (JEP 505 is the fifth preview). Production use requires --enable-preview and acceptance that the API may change before finalization.

Scoped values are also in preview. JEP 487 and JEP 506 bring scoped values through successive preview iterations. ThreadLocal remains the stable alternative for production code today.

Thread-local caching anti-pattern amplified at scale. Libraries that cache expensive objects (e.g., DateFormat, serialization buffers) in ThreadLocal as a performance optimization were designed around a small, reused pool of platform threads. With a virtual-thread-per-task model, each task creates its own instance and never reuses it — potentially creating millions of cached objects in memory. Audit dependencies (Jackson, database pools) before switching to virtual threads at scale.

Key Takeaways

  1. Virtual threads are heap-allocated, JVM-managed threads. They cost hundreds of bytes rather than megabytes, and the JVM multiplexes millions of them onto a small carrier thread pool equal in size to the number of CPU cores.
  2. Continuations are the implementation mechanism. When a virtual thread blocks, its stack is saved to the heap and the carrier thread is freed. No application code changes are needed — blocking code gains reactive-like scalability for free.
  3. Pinning is the principal hazard. A synchronized block or native/JNI call prevents unmounting. Replace hot synchronized sections with ReentrantLock and use -Djdk.tracePinnedThreads=full to detect problems early.
  4. Structured concurrency enforces task lifetime hygiene. StructuredTaskScope ensures that forked subtasks cannot outlive their enclosing scope, makes cancellation and error propagation automatic, and keeps thread relationships visible in observability tools.
  5. ThreadLocal works but does not scale well. Virtual threads support ThreadLocal for backward compatibility, but caching mutable objects in thread-locals creates one instance per task rather than per pooled thread. Prefer scoped values for new, high-concurrency code that benefits from automatic child-task inheritance.
