WebAssembly

A portable bytecode format and execution model for near-native performance across runtimes

Lead Summary

WebAssembly (Wasm) is a binary instruction format designed as a portable, efficient compilation target for a wide variety of programming languages. Unlike the JVM or the CLR, which were built as complete execution platforms for their own languages, WebAssembly is deliberately minimal: the instruction set of its initial release (the MVP, or minimum viable product) covers only four numeric types (i32, i64, f32, f64), which lets it map onto native hardware with little abstraction penalty.

Originally conceived for the web browser, WebAssembly has expanded well beyond that context. It now runs server-side, in embedded environments, and in plugin systems where security boundaries and language-agnostic composition matter. As of 2021, approximately 40 programming languages could compile to Wasm. A layered ecosystem has grown around the core spec: the WebAssembly System Interface (WASI) provides capability-controlled access to system resources, and the Component Model defines a language-agnostic composition layer sitting above the binary format itself.


Core Concepts

Stack-Based Virtual Machine

WebAssembly implements a stack-based virtual machine where instructions manipulate values on an implicit operand stack. Each instruction consumes (pops) its argument values and produces (pushes) result values. This computational model distinguishes Wasm from register-based architectures and makes bytecode verification straightforward: the type of every stack slot is known statically before execution begins.
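The pop-and-push discipline described above can be illustrated with a toy evaluator. This is a minimal sketch in Python, not a real Wasm runtime: the opcode names follow Wasm's text format, but the tuple encoding and the `run` function are assumptions made for illustration.

```python
# Toy model of a Wasm-style operand stack (illustrative only, not a real runtime).
MASK32 = 0xFFFFFFFF  # i32 arithmetic wraps modulo 2**32

def run(program):
    """Execute a list of (opcode, *immediates) tuples and return the final stack."""
    stack = []
    for op, *args in program:
        if op == "i32.const":
            stack.append(args[0] & MASK32)      # push an immediate value
        elif op == "i32.add":
            b, a = stack.pop(), stack.pop()     # pop both operands
            stack.append((a + b) & MASK32)      # push the wrapped sum
        elif op == "i32.mul":
            b, a = stack.pop(), stack.pop()
            stack.append((a * b) & MASK32)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack

# Computes (2 + 3) * 4 without naming any registers.
program = [("i32.const", 2), ("i32.const", 3), ("i32.add",),
           ("i32.const", 4), ("i32.mul",)]
result = run(program)
print(result)
```

Because every opcode has a fixed number of inputs and outputs, the height and contents of the stack at each point can be determined without running the program, which is what makes static verification practical.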

Linear Memory

WebAssembly provides a linear memory model: a contiguous, mutable array of raw bytes that programs can load from and store to at any byte address. The operand stack and call-frame state such as return addresses live outside this linear memory, so an overflow in linear memory cannot overwrite them, which protects against classic stack-smashing attacks. Linear memory is bounded, and every access is bounds-checked at runtime: an out-of-bounds access causes a trap rather than silent memory corruption.

Crucially, each WebAssembly module's linear memory is initialized to zero by default unless explicitly filled with data during instantiation. This prevents information leakage through uninitialized memory and eliminates a class of vulnerabilities where sensitive data from previous executions could bleed into new ones.
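A rough model of these properties (zero initialization, bounds checks, traps) is sketched below in Python. The `LinearMemory` and `Trap` names are hypothetical; real runtimes implement the same checks in compiled code, often with hardware assistance.

```python
import struct

PAGE_SIZE = 65536  # Wasm memory is sized in 64 KiB pages

class Trap(Exception):
    """Models a Wasm trap: execution aborts instead of corrupting memory."""

class LinearMemory:
    """Hypothetical stand-in for a module's linear memory."""

    def __init__(self, pages):
        # bytearray(n) is zero-filled, matching Wasm's zero-initialized memory.
        self.data = bytearray(pages * PAGE_SIZE)

    def _check(self, addr, size):
        if addr < 0 or addr + size > len(self.data):
            raise Trap(f"out-of-bounds access at {addr} (+{size})")

    def i32_store(self, addr, value):
        self._check(addr, 4)
        struct.pack_into("<I", self.data, addr, value & 0xFFFFFFFF)  # little-endian

    def i32_load(self, addr):
        self._check(addr, 4)
        return struct.unpack_from("<I", self.data, addr)[0]

mem = LinearMemory(pages=1)
mem.i32_store(16, 0xDEADBEEF)
print(hex(mem.i32_load(16)), mem.i32_load(0))
```

A load that would read even one byte past the end of the array, such as `mem.i32_load(65533)` on a one-page memory, raises `Trap` instead of returning adjacent data.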

Structured Control Flow

WebAssembly enforces structured control flow: all branches are restricted to properly nested block, loop, and if/then/else constructs. There are no arbitrary jumps or goto-like instructions. This design choice serves two goals simultaneously — it makes fast control-flow verification possible at load time, and it provides implicit control-flow integrity, preventing the return-oriented programming attacks that afflict native code execution.

Contrast with assembly

In unstructured native assembly, a jump can target any instruction address. WebAssembly's structured control flow closes that attack surface entirely by construction, not by runtime checking.
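To make the idea of label-relative branching concrete, here is a small Python sketch of an interpreter in which `br` and `br_if` can only name an enclosing `block` or `loop` by nesting depth. It is a simplified assumption-laden model: locals are named rather than indexed, operand-stack unwinding on branches is ignored, and the nested-list encoding is invented for illustration.

```python
class Branch(Exception):
    """Raised by br/br_if; depth counts enclosing labels, innermost first."""
    def __init__(self, depth):
        super().__init__(depth)
        self.depth = depth

def execute(body, stack, local_vars):
    for instr in body:
        op = instr[0]
        if op == "block":
            try:
                execute(instr[1], stack, local_vars)
            except Branch as br:
                if br.depth > 0:                      # target is further out
                    raise Branch(br.depth - 1) from None
                # depth 0: a branch to a block continues after its end
        elif op == "loop":
            while True:
                try:
                    execute(instr[1], stack, local_vars)
                    break                             # fell off the end: leave
                except Branch as br:
                    if br.depth > 0:
                        raise Branch(br.depth - 1) from None
                    # depth 0: a branch to a loop restarts it
        elif op == "br":
            raise Branch(instr[1])
        elif op == "br_if":
            if stack.pop() != 0:
                raise Branch(instr[1])
        elif op == "i32.const":
            stack.append(instr[1])
        elif op == "local.get":
            stack.append(local_vars[instr[1]])
        elif op == "local.set":
            local_vars[instr[1]] = stack.pop()
        elif op == "i32.add":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFFFFFFFF)
        elif op == "i32.ge_u":
            b, a = stack.pop(), stack.pop()
            stack.append(1 if a >= b else 0)
        else:
            raise ValueError(f"unknown opcode: {op}")

# Sum 1..5: "br_if 1" exits the outer block, "br 0" repeats the loop.
program = [
    ("block", [
        ("loop", [
            ("local.get", "i"), ("i32.const", 5), ("i32.ge_u",), ("br_if", 1),
            ("local.get", "i"), ("i32.const", 1), ("i32.add",), ("local.set", "i"),
            ("local.get", "sum"), ("local.get", "i"), ("i32.add",), ("local.set", "sum"),
            ("br", 0),
        ]),
    ]),
]
local_vars = {"i": 0, "sum": 0}
execute(program, [], local_vars)
print(local_vars)
```

There is no way in this encoding to express a jump into the middle of another block, which mirrors why the set of possible branch targets is fixed by the nesting structure alone.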

Module Instantiation and AOT Compilation

WebAssembly separates compilation from execution. A module is first compiled (potentially ahead-of-time) and then instantiated — a process that initializes runtime state, registers imported functions, and sets up the linear memory. This separation allows a compiled module to be shared across multiple instances and enables AOT-compiled binaries to achieve approximately 50% of native performance without JIT warmup overhead, which benefits cold-start-sensitive workloads.
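The split between a shared compiled module and per-instance state can be sketched as follows. This is a conceptual Python model only: `CompiledModule`, `Instance`, and the `bump` function are hypothetical names, and Python callables stand in for compiled machine code.

```python
class CompiledModule:
    """Result of compilation: code only, no mutable state, safe to share."""
    def __init__(self, functions, memory_pages):
        self.functions = dict(functions)
        self.memory_pages = memory_pages

class Instance:
    """Per-instantiation state: its own linear memory and import bindings."""
    def __init__(self, module, imports=None):
        self.module = module
        self.imports = dict(imports or {})
        self.memory = bytearray(module.memory_pages * 65536)

    def call(self, name, *args):
        return self.module.functions[name](self, *args)

def bump(instance):
    """Stand-in for a compiled export that increments a counter at address 0."""
    instance.memory[0] += 1
    return instance.memory[0]

module = CompiledModule({"bump": bump}, memory_pages=1)  # compile once
a = Instance(module)                                     # instantiate twice
b = Instance(module)
a.call("bump")
a.call("bump")
print(a.memory[0], b.memory[0])
```

The two instances run the same code object but observe independent counters, which is the property that lets a host pay the compilation cost once and start further instances cheaply.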


Security Model

WebAssembly's security architecture rests on Software Fault Isolation (SFI) rather than hardware virtualization or kernel privilege separation. SFI achieves sandboxing through a combination of:

  • Bounds-checked linear memory: every load and store is validated against the module's allocated range.
  • Structured control flow: branches can only target enclosing blocks, and indirect calls must go through a controlled, type-checked function table.
  • Memory isolation between modules: each module's linear memory is fully isolated from both the host system and other modules.
  • No direct syscalls: WebAssembly cannot call the host operating system directly. When a syscall accepts pointer arguments, the runtime must perform address-space translation between Wasm's linear memory and native process memory before forwarding.

WebAssembly's isolation is enforced structurally, by the shape of the bytecode itself, rather than by relying on memory protection hardware or kernel enforcement.
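The pointer translation mentioned in the last bullet can be sketched as below. This is a hedged Python illustration: `read_guest_bytes` and `host_write` are hypothetical host-side helpers, not functions from any real runtime, and the "syscall" simply appends to a Python list.

```python
class Trap(Exception):
    """Models a Wasm trap raised by the host on an invalid guest pointer."""

def read_guest_bytes(memory, ptr, length):
    """Translate a guest (ptr, len) pair into host bytes, with a bounds check.

    Guest pointers are offsets into linear memory, never host addresses.
    """
    if ptr < 0 or length < 0 or ptr + length > len(memory):
        raise Trap("guest pointer outside linear memory")
    return bytes(memory[ptr:ptr + length])

def host_write(memory, sink, ptr, length):
    """Hypothetical host import 'write(ptr, len)': copies guest bytes to a sink."""
    data = read_guest_bytes(memory, ptr, length)
    sink.append(data)          # a real host would forward this to the OS
    return len(data)

memory = bytearray(64)         # the guest's linear memory
memory[8:13] = b"hello"        # the guest wrote a message at offset 8
sink = []
written = host_write(memory, sink, 8, 5)
print(written, sink)
```

The important design point is that the guest never hands the host a native address: the host validates the offset and length against the guest's own memory before touching anything.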

Bytecode Validation

Before instantiation, WebAssembly modules undergo static validation analogous to JVM class verification. The validator uses dataflow analysis to confirm that stack machine types match at every instruction, control-flow constructs are well-formed, and imports and exports align with declared signatures. This load-time pass means that a compliant Wasm runtime can execute validated bytecode without per-instruction runtime type checks.
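A stripped-down version of the stack type check might look like the following Python sketch. It handles only straight-line code and a handful of opcodes; the `SIGNATURES` table and `validate` function are illustrative assumptions, and real validators also cover control flow, locals, memory, and tables.

```python
class ValidationError(Exception):
    pass

# (operand types popped, result types pushed) for a few real Wasm opcodes.
SIGNATURES = {
    "i32.add": (["i32", "i32"], ["i32"]),
    "i64.add": (["i64", "i64"], ["i64"]),
    "f64.mul": (["f64", "f64"], ["f64"]),
    "i32.eqz": (["i32"], ["i32"]),
    "i64.extend_i32_s": (["i32"], ["i64"]),
}

def validate(body, params, results):
    """Check a function body by simulating the stack with types instead of values."""
    stack = []
    for op, *args in body:
        if op.endswith(".const"):
            stack.append(op.split(".")[0])        # e.g. "i32.const" pushes i32
            continue
        if op == "local.get":
            stack.append(params[args[0]])         # parameters act as locals
            continue
        if op not in SIGNATURES:
            raise ValidationError(f"unknown opcode: {op}")
        pops, pushes = SIGNATURES[op]
        for expected in reversed(pops):
            if not stack or stack.pop() != expected:
                raise ValidationError(f"{op}: expected {expected} on the stack")
        stack.extend(pushes)
    if stack != list(results):
        raise ValidationError(f"body leaves {stack}, declared results {results}")
    return True

ok = validate(
    [("local.get", 0), ("local.get", 1), ("i32.add",), ("i64.extend_i32_s",)],
    params=["i32", "i32"],
    results=["i64"],
)
print(ok)
```

A body such as `i32.const 1; f64.const 2.0; i32.add` is rejected before it ever runs, which is why the execution engine can skip per-instruction type checks afterwards.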


The WASI System Interface

The WebAssembly System Interface (WASI) is the standard that allows Wasm modules to interact with a host operating system in a portable, capability-safe way. WASI implements a capability-based security model: the host runtime must explicitly grant capabilities to guest modules in the form of pre-opened file descriptors, sockets, or other resource handles passed as function parameters. A module that is not granted a capability cannot access the corresponding resource.
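The effect of a pre-opened directory can be approximated with a small Python sketch. It only models path resolution: `PreopenedDir` and `CapabilityError` are hypothetical names, no files are touched, and real WASI implementations must also deal with symlinks and perform the check against actual directory handles.

```python
import posixpath

class CapabilityError(Exception):
    pass

class PreopenedDir:
    """Stands in for a directory handle that the host chose to grant."""

    def __init__(self, host_root):
        self.host_root = posixpath.normpath(host_root)

    def resolve(self, guest_path):
        """Map a guest-relative path to a host path, refusing to leave the root."""
        if posixpath.isabs(guest_path):
            raise CapabilityError("absolute paths are not allowed")
        candidate = posixpath.normpath(posixpath.join(self.host_root, guest_path))
        inside = (candidate == self.host_root
                  or candidate.startswith(self.host_root + "/"))
        if not inside:
            raise CapabilityError(f"path escapes the granted directory: {guest_path}")
        return candidate

data_dir = PreopenedDir("/srv/app/data")      # the only capability granted
print(data_dir.resolve("logs/today.txt"))
```

A guest holding only `data_dir` has no way to name `/etc/passwd`: absolute paths are rejected, and `../` sequences that climb above the granted root raise `CapabilityError`.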

WASI specifies standardized interfaces for:

  • Filesystem access (via pre-opened directory handles, preventing path traversal)
  • Network sockets
  • Environment variables
  • Clocks and timers
  • Randomness
  • Terminal I/O (stdin, stdout, stderr)

WASIp2 extended these interfaces to include HTTP servers and clients (wasi-http) and higher-level abstractions such as Key-Value Stores (wasi-keyvalue). This layered approach means Wasm modules can be tested and deployed across different runtimes without any platform-specific adaptation.


Portability and Its Limits

WebAssembly's ISA is defined to be identical across all machine architectures, providing a strong compile-once-run-anywhere property. The standardized binary format, combined with the Component Model's runtime-level support for component linking, means a compiled Wasm component can execute on any compliant runtime without recompilation — a contrast to traditional FFI where code must be rebuilt per target platform.

However, portability is not absolute. Certain memory-constrained embedded devices cannot run standard WebAssembly, forcing runtime implementors to either deviate from the standard or abandon support for those platforms. This gap reveals that even carefully designed ISAs are constrained by practical hardware economics when memory budgets fall below the floor Wasm assumes.


Compilation Toolchains

Emscripten and the LLVM Backend

The primary toolchain for compiling C and C++ to WebAssembly is Emscripten, which uses Clang as its frontend, LLVM for intermediate representation and optimization, and Binaryen as a Wasm-specific optimizer. In version 1.39.0 (October 2019), Emscripten switched its default from the legacy "fastcomp" backend to the upstream LLVM Wasm backend combined with wasm-ld, gaining full support for incremental compilation using relocatable WebAssembly object files.

The results are significant: the LLVM Wasm path beats fastcomp on both speed and code size on most benchmarks, with observed link-time speedups reaching 7x. Because LLVM's design is modular — frontends produce IR, the backend produces code for a target — any language with LLVM support gains Wasm compilation capability by adding a Wasm backend, without language-specific toolchain modifications.

Fig 1. Emscripten compilation pipeline: C / C++ → Clang → IR → LLVM → Wasm → Binaryen → .wasm binary

The Component Model

The Component Model is WebAssembly's answer to the cross-language composition problem. The MVP's instruction set deals only in numeric primitives, which makes it hard for code written in different languages to pass complex data to one another — strings, records, lists, and tagged unions need to be agreed upon at the boundary. The Component Model addresses this by introducing a complete composition layer above the core binary format.

WIT: The Interface Definition Language

WIT (Wasm Interface Type) is the IDL used within the Component Model. WIT files specify component interfaces with rich types (records, which are structs; variants, which are tagged unions; lists; options; results; and other algebraic data types) that go far beyond Wasm's native i32/i64/f32/f64. WIT is language-agnostic: the same .wit file can be implemented by a Rust library or consumed by a Go caller, with neither needing to understand how the other works internally.

The Canonical ABI

Underneath WIT sits the Canonical ABI, which defines exactly how high-level interface-type values are "lifted" from and "lowered" into core WebAssembly's numeric primitives at component boundaries. The Canonical ABI specifies both static type rules and dynamic runtime behavior for these conversions, ensuring that independently compiled components can reliably interoperate regardless of implementation language.
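As a rough illustration of what crossing a component boundary involves, the Python sketch below passes a string as a (pointer, length) pair of integers over a byte array. It is a simplification under stated assumptions: the bump allocator in `GuestMemory` is hypothetical (real components export an allocation function for this purpose), only UTF-8 is handled, and alignment rules are ignored.

```python
class GuestMemory:
    """Hypothetical linear memory with a trivial bump allocator."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.next_free = 8                    # keep address 0 unused

    def alloc(self, size):
        ptr = self.next_free
        if ptr + size > len(self.data):
            raise MemoryError("guest memory exhausted")
        self.next_free += size
        return ptr

def lower_string(mem, text):
    """High-level string -> two core integers (ptr, len) plus bytes in memory."""
    encoded = text.encode("utf-8")
    ptr = mem.alloc(len(encoded))
    mem.data[ptr:ptr + len(encoded)] = encoded
    return ptr, len(encoded)

def lift_string(mem, ptr, length):
    """Two core integers (ptr, len) -> high-level string, with a bounds check."""
    if ptr < 0 or length < 0 or ptr + length > len(mem.data):
        raise ValueError("string lies outside linear memory")
    return bytes(mem.data[ptr:ptr + length]).decode("utf-8")

mem = GuestMemory(256)
ptr, length = lower_string(mem, "héllo ✓")
print(ptr, length, lift_string(mem, ptr, length))
```

Only the two integers travel through the core Wasm function signature; the agreement on encoding and layout is what lets both sides reconstruct the same string.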

Lifting and lowering

"Lifting" means converting core Wasm integers and floats into a rich interface type when a function is called across a component boundary. "Lowering" is the inverse — serializing the rich type back to numerics before passing it into core Wasm. The Canonical ABI pins down exactly how this conversion works so that a Rust component and a Python host always agree.

wit-bindgen: Automated Glue Code

The wit-bindgen tool suite automatically generates language-specific bindings from WIT specifications, handling data serialization, deserialization, memory management, and function calling conventions across language boundaries. This automation lets developers write Wasm components in any supported language (Rust, C, C++, C#, Go) and consume them from any host environment (JavaScript, Python, Ruby) without writing manual glue code.

Language-Agnostic Composition in Practice

The full vision is that a Rust library, a Go service, and a Python script can interoperate through shared WIT contracts, composed together at runtime by the Component Model's linking layer. No language needs to know about the implementation details of the others. The Component Model has received academic scrutiny alongside its practical development: it was presented at POPL 2024 and POPL 2025 WebAssembly Workshops, covering both design rationale and formal semantics.


Extensions and Proposals

Threading and Atomics

The WebAssembly threading proposal extends the platform with shared memory and atomic instructions. Workers can write to shared buffers observed by concurrent readers, with atomic operations facilitating user-level synchronization. The model follows SC-DRF (sequential consistency for data-race-free executions). Languages like C/C++ and Rust expose different "strengths" of atomic operations (memory orderings) that let programmers selectively weaken cross-thread synchronization for performance; the atomic instructions in the threading proposal itself are all sequentially consistent, so weaker source-level orderings are compiled to these stronger operations.
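The role of an atomic read-modify-write on shared memory can be modeled in Python as below. This is only an analogy: a `threading.Lock` stands in for the hardware atomicity that a real `i32.atomic.rmw.add` instruction provides, and the `SharedMemory` class is a hypothetical name.

```python
import struct
import threading

class SharedMemory:
    """Toy shared linear memory; the lock stands in for hardware atomics."""

    def __init__(self, size):
        self.data = bytearray(size)
        self._lock = threading.Lock()

    def atomic_rmw_add_i32(self, addr, delta):
        """Model of i32.atomic.rmw.add: adds delta and returns the old value."""
        with self._lock:
            old = struct.unpack_from("<I", self.data, addr)[0]
            struct.pack_into("<I", self.data, addr, (old + delta) & 0xFFFFFFFF)
            return old

    def atomic_load_i32(self, addr):
        with self._lock:
            return struct.unpack_from("<I", self.data, addr)[0]

def worker(mem, iterations):
    for _ in range(iterations):
        mem.atomic_rmw_add_i32(0, 1)

shared = SharedMemory(64)
threads = [threading.Thread(target=worker, args=(shared, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = shared.atomic_load_i32(0)
print(total)
```

Because every increment is a single indivisible read-modify-write, no updates are lost, and the four workers together always leave 4000 in the counter.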

WasmGC

The WasmGC proposal introduces fixed-size structs and arrays with automatic heap management, enabling garbage-collected languages to delegate GC to the host environment's existing collector rather than shipping their own inside the Wasm binary. This is a meaningful shift from the MVP design, where memory safety was achieved purely through sandboxing without any GC. WasmGC allows managed languages (Java, Kotlin, Dart) to compile to Wasm without carrying the overhead of a custom GC.


Notable Applications

SQLite in the Browser

A concrete demonstration of WebAssembly's reach: SQLite compiled to Wasm supports the full SQL feature set — including JOINs, GROUP BY, and subqueries — when running in a browser. This makes SQLite WASM fundamentally more capable than IndexedDB for relational data patterns, since IndexedDB lacks complex query support and provides only key-value iteration.
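The kind of query in question is sketched below using Python's built-in `sqlite3` module. Note the assumption: this runs native SQLite rather than the Wasm build, and the table names and data are invented for illustration; the SQL itself is what would be sent to SQLite in the browser.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Linus');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5);
""")

# One statement combining a JOIN, a subquery, and GROUP BY.
rows = conn.execute("""
SELECT c.name, COUNT(*) AS order_count, SUM(o.total) AS spent
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
WHERE o.total > (SELECT AVG(total) FROM orders) / 2
GROUP BY c.name
ORDER BY c.name
""").fetchall()
print(rows)
conn.close()
```

Expressing the same result over a key-value store would require fetching the records and performing the join, filter, and aggregation by hand in application code.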

Wasm on Embedded and IoT Devices

WebAssembly's minimal footprint and sandboxing make it attractive for embedded and IoT contexts, where running untrusted plugin code safely inside a constrained environment is valuable. However, the portability limits described above mean not all embedded targets can run standard Wasm — very memory-constrained devices may require non-standard runtime adaptations.


Controversies and Limits

WebAssembly is often described as a universal compilation target, but this is an aspiration rather than a guarantee. Portability is constrained by hardware realities: embedded devices with insufficient memory cannot run standard Wasm, pushing runtime implementors toward deviations from the standard. The security model is strong but not absolute: bounds-checked linear memory and the SFI guarantees prevent a module from reading or writing memory outside its own sandbox, but they do not protect against logic errors inside the module. A buffer overflow in C code compiled to Wasm, for example, can still overwrite other data within that module's own linear memory.