Lead Summary
Erlang/OTP is a programming language and runtime platform built around a single design thesis: failures are inevitable, so the runtime itself should manage them. Rather than asking developers to write defensive code that anticipates every error, Erlang provides a substrate, the BEAM virtual machine together with the Open Telecom Platform (OTP) framework, where failing processes are isolated, supervised, and restarted automatically. This philosophy, known as "let it crash," emerged from real-world telecom infrastructure requirements at Ericsson and produced a system capable of running millions of concurrent processes with soft real-time latency, location-transparent distribution across nodes, and hot code reloading without stopping the system.
The result is a runtime that treats concurrency, fault tolerance, and distribution not as optional features to bolt on, but as first-class concerns built into the language, the compiler, and the virtual machine itself.
Core Concepts
The Actor Model as First-Class Language Feature
Erlang implements the actor model as a native language primitive, not as a library or abstraction over threads. Every Erlang process is an actor with its own isolated state. Processes have no shared memory and communicate exclusively through asynchronous message passing. All data in messages is copied between processes (except for reference-counted binaries and literals on the same node), ensuring complete isolation: a single process failure cannot directly corrupt the state of any other process.
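This shared-nothing design eliminates an entire class of concurrency bugs. Application code uses no locks or mutexes and has no data races over shared memory, because processes have no shared memory to race over. Races over message ordering remain possible, and shared tables such as ETS still need care, but the default is isolation.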
Every Erlang process is an actor with no shared memory. Supervisors can restart, escalate, or ignore failing actors. The "let it crash" philosophy shifts fault handling from defensive programming to external monitoring and recovery.
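The isolation and message passing described above can be sketched with a small counter actor. This is a minimal illustration, not a canonical pattern: the module name actor_sketch and the message shapes (increment, {get, From, Ref}, stop) are assumptions made for this example.

```erlang
-module(actor_sketch).
-export([start/0, demo/0]).

%% Spawn a counter actor. Its state (Count) exists only as the argument
%% of its own receive loop; no other process can read or write it.
start() ->
    spawn(fun() -> loop(0) end).

loop(Count) ->
    receive
        increment ->
            loop(Count + 1);
        {get, From, Ref} ->
            From ! {Ref, Count},      % reply by sending a message back
            loop(Count);
        stop ->
            ok
    end.

%% Send three asynchronous messages, then ask the actor for its state.
demo() ->
    Pid = start(),
    Pid ! increment,
    Pid ! increment,
    Pid ! increment,
    Ref = make_ref(),                 % unique tag to match the reply
    Pid ! {get, self(), Ref},
    receive
        {Ref, Count} ->
            Pid ! stop,
            Count
    after 1000 ->
        timeout
    end.
```

Messages sent from one process to another arrive in the order they were sent, so the get request is handled only after the three increments.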
Avoiding Function Coloring
A subtle but important property of Erlang's actor model is that it avoids function coloring entirely. In languages with async/await (Rust, JavaScript, Python), asynchronous functions infect their callers: calling one from a synchronous context requires either propagating the async marker up the call chain or explicitly blocking on an executor or event loop. In Erlang, every function has the same "color." Actors communicate via asynchronous message sends and can suspend to wait for replies without changing function signatures or propagating async markers through the call graph. Concurrency is structural, not syntactic.
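To make the point concrete, the sketch below builds a blocking request/reply helper out of an asynchronous send followed by a receive. The module name color_sketch, the {square, N} request, and the helper call/2 are hypothetical names for this example; OTP's gen_server:call provides a production-grade version of the same idea.

```erlang
-module(color_sketch).
-export([start/0, call/2, sum_of_squares/3]).

start() ->
    spawn(fun server/0).

server() ->
    receive
        {call, From, Ref, {square, N}} ->
            From ! {reply, Ref, N * N},
            server()
    end.

%% An ordinary function: it sends asynchronously, then suspends in
%% receive until the reply arrives. Nothing in its signature says "async".
call(Pid, Request) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! {call, self(), Ref, Request},
    receive
        {reply, Ref, Result} ->
            erlang:demonitor(Ref, [flush]),
            Result;
        {'DOWN', Ref, process, Pid, Reason} ->
            exit({server_down, Reason})
    end.

%% Callers compose call/2 like any other function.
sum_of_squares(Pid, A, B) ->
    call(Pid, {square, A}) + call(Pid, {square, B}).
```

The monitor is there so that a caller does not wait forever if the server is already dead or dies mid-request.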
The OTP Supervision Framework
OTP (Open Telecom Platform) extends the raw actor model with a structured approach to fault tolerance. At its core is the supervision tree: a hierarchical arrangement of supervisor processes and worker processes.
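Supervisors monitor their child processes and apply a configurable restart strategy when a child fails. The three general-purpose strategies, written one_for_one, one_for_all, and rest_for_one in code, are listed below; OTP also offers simple_one_for_one, a variant of one_for_one for dynamically started children of a single type: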
- one-for-one — restart only the failing child, leave others untouched
- one-for-all — restart all children when any one fails, useful when children have dependencies
- rest-for-one — restart the failing process and all processes started after it in the initialization order
This architecture makes the "let it crash" philosophy operational. Rather than adding try/catch blocks around every operation that might fail, developers design supervision hierarchies that reflect the structure of their application. A failing database connection worker gets restarted; if it fails more often than the supervisor's configured restart intensity allows within a given period, the supervisor itself terminates and the failure escalates to the next supervisor level.
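A minimal sketch of such a hierarchy follows: one supervisor with a one_for_one strategy and a single worker that crashes on division by zero. The module name sup_sketch, the registered name sketch_worker, and the intensity and period values are assumptions chosen for illustration. The worker is a bare process rather than a full OTP behavior, which keeps the example short but is not how production workers are usually written.

```erlang
-module(sup_sketch).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0, demo/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Supervisor callback: allow at most 3 restarts within 5 seconds;
%% beyond that the supervisor gives up and terminates itself.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 5},
    Child = #{id => worker,
              start => {?MODULE, start_worker, []},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.

%% Happy-path worker: no defensive code around the division.
start_worker() ->
    Pid = spawn_link(fun worker_loop/0),
    true = register(sketch_worker, Pid),
    {ok, Pid}.

worker_loop() ->
    receive
        {divide, From, A, B} ->
            From ! {result, A div B},   % crashes with badarith if B =:= 0
            worker_loop()
    end.

%% Crash the worker, wait for the supervisor to replace it, use the new one.
demo() ->
    {ok, Sup} = start_link(),
    Old = whereis(sketch_worker),
    Old ! {divide, self(), 1, 0},
    New = wait_for_restart(Old),
    New ! {divide, self(), 10, 2},
    Result = receive {result, R} -> R after 1000 -> timeout end,
    unlink(Sup),
    Ref = erlang:monitor(process, Sup),
    exit(Sup, shutdown),
    receive {'DOWN', Ref, process, Sup, _} -> ok end,
    {Old =/= New, Result}.

wait_for_restart(Old) ->
    case whereis(sketch_worker) of
        Pid when is_pid(Pid), Pid =/= Old -> Pid;
        _ -> timer:sleep(10), wait_for_restart(Old)
    end.
```

Running demo/0 logs a crash report for the first worker. That is expected: the failure is reported and recovered from outside the worker rather than handled inside it.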
"Let it crash" does not mean "ignore errors." It means separating the code that does the work from the code that handles failure. Workers focus on the happy path; supervisors handle recovery. This produces simpler, more readable code in both places.
Mechanism & Process
BEAM: A VM Designed for Concurrency, Not Throughput
The BEAM (Bogdan/Björn's Erlang Abstract Machine) was designed specifically to optimize for concurrent execution and fault-tolerant systems, not raw single-threaded throughput. This distinguishes it from general-purpose VMs like the JVM, whose design has historically prioritized peak throughput over per-process isolation and latency.
The BEAM implements lightweight processes that are not OS-level threads. A newly spawned Erlang process uses approximately 327 words of memory, about 2.6 KB on 64-bit systems (327 × 8 bytes = 2,616 bytes). Compare this to OS threads, which typically reserve 1–2 MB of stack space each. This difference of roughly two to three orders of magnitude is what makes massive concurrency practical: the BEAM can run millions of concurrent processes within a single OS process, subject to the process limit configured with the +P emulator flag.
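The per-process footprint can be checked on a running system. The sketch below spawns N idle processes and asks the runtime how much memory one of them occupies. The module name proc_mem_sketch is an assumption, and the reported size varies with OTP release and emulator settings, so no specific figure is asserted here.

```erlang
-module(proc_mem_sketch).
-export([demo/1]).

%% Spawn N idle processes and report the footprint of one of them.
demo(N) ->
    Pids = [spawn(fun() -> receive stop -> ok end end)
            || _ <- lists:seq(1, N)],
    {memory, Bytes} = erlang:process_info(hd(Pids), memory),
    Alive = length([P || P <- Pids, is_process_alive(P)]),
    [P ! stop || P <- Pids],
    #{spawned => Alive,
      bytes_per_process => Bytes,
      words_per_process => Bytes div erlang:system_info(wordsize)}.
```

A call such as proc_mem_sketch:demo(10000) returns a map. The memory figure reported by process_info covers the heap, stack, and internal process structures, so it may differ somewhat from the documented word count.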
Preemptive Scheduling via Reduction Counting
BEAM's scheduler is preemptive, but it achieves preemption through a mechanism that operates above the OS level. The Erlang compiler and VM cooperate: each process receives a budget of reductions when it is scheduled in, and the compiled code charges roughly one reduction per function call. When the budget is exhausted, the VM performs a context switch to another process. Because Erlang expresses loops as recursion, every loop iteration is a function call and therefore a point at which the process can be preempted.
This approach, cooperative at the C level but preemptive at the language level, produces more predictable scheduling than relying on OS-level time-slicing alone. Heavy processes written in Erlang cannot starve lighter ones; every process gets runtime after a bounded number of reductions. The main caveat is native code: a long-running NIF that does not yield can still monopolize a scheduler thread.
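Reduction accounting is observable from Erlang code. The following sketch reads the calling process's reduction count before and after a recursive loop. The module name reduction_sketch and the loop length are arbitrary choices for illustration, and the exact number of reductions charged is an implementation detail that varies between releases.

```erlang
-module(reduction_sketch).
-export([demo/0]).

%% A loop in Erlang is a recursive function, so each iteration is a
%% function call and is charged against the reduction budget.
count_down(0) -> ok;
count_down(N) -> count_down(N - 1).

%% Return how many reductions the calling process consumed in the loop.
demo() ->
    {reductions, Before} = erlang:process_info(self(), reductions),
    ok = count_down(100000),
    {reductions, After} = erlang:process_info(self(), reductions),
    After - Before.
```

The result grows with the number of iterations, which is why a tight loop cannot hold a scheduler indefinitely: it runs out of budget and is switched out.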
The consequence is soft real-time behavior: predictable latency and responsiveness for most production scenarios, without hard real-time guarantees. Timeouts in Erlang are guaranteed not to fire before their deadline, but may fire somewhat after — acceptable for the telecom and web infrastructure workloads BEAM targets.
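The timeout guarantee can be observed with a receive that waits for a message nobody sends. This is a small sketch with a hypothetical module name (timeout_sketch); it measures elapsed time on the monotonic clock.

```erlang
-module(timeout_sketch).
-export([demo/1]).

%% Wait for a message that is never sent and return the elapsed
%% time in milliseconds once the 'after' clause fires.
demo(TimeoutMs) ->
    Start = erlang:monotonic_time(millisecond),
    receive
        never_sent -> unexpected
    after TimeoutMs ->
        erlang:monotonic_time(millisecond) - Start
    end.
```

The returned value is at least TimeoutMs. How far it overshoots depends on scheduler load, which is the "soft" part of soft real-time.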
Per-Process Garbage Collection
Erlang's garbage collector operates per process, not globally. Each process heap is divided into a young generation and an old generation: most data in short-lived processes becomes garbage quickly, so the collector focuses on young generation objects, reducing the overhead of full heap scans.
Because GC is per-process, garbage collection pauses are bounded by the size of a single process's heap — not the entire application's heap. A GC pause in one process does not stop other processes. This property, combined with preemptive scheduling, is what makes BEAM systems exhibit low, predictable tail latency rather than periodic long pauses.
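Per-process collection is visible through the process_info and garbage_collect BIFs. In the sketch below, a helper process builds some short-lived data and then idles; the caller forces a collection of that one process and compares its heap size before and after. The module name gc_sketch and the amount of garbage are assumptions, and the actual sizes depend on the runtime.

```erlang
-module(gc_sketch).
-export([demo/0]).

demo() ->
    Parent = self(),
    Pid = spawn(fun() ->
        %% Build data that is unreachable once this expression finishes.
        _Garbage = [lists:seq(1, 1000) || _ <- lists:seq(1, 100)],
        Parent ! {built, self()},
        receive stop -> ok end
    end),
    receive {built, Pid} -> ok end,
    {total_heap_size, Before} = erlang:process_info(Pid, total_heap_size),
    %% Collect only this one process; no other process is paused for it.
    true = erlang:garbage_collect(Pid),
    {total_heap_size, After} = erlang:process_info(Pid, total_heap_size),
    {garbage_collection, GcInfo} = erlang:process_info(Pid, garbage_collection),
    Pid ! stop,
    #{before_words => Before,
      after_words => After,
      fullsweep_after => proplists:get_value(fullsweep_after, GcInfo)}.
```

Heap sizes are reported in words. The collection touches only the helper's heap, which is what bounds the pause to the size of a single process.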
Transparent Distribution
BEAM's distribution model extends the local actor model across the network. Processes on different nodes communicate through the Erlang distribution protocol with code that is nearly identical to local message passing — the difference is largely invisible to application code.
The infrastructure behind this transparency involves two components:
- EPMD (Erlang Port Mapper Daemon) — automatically started when an Erlang node launches, EPMD maps symbolic node names to machine addresses and port numbers, enabling dynamic node discovery
- Cookie-based authentication — nodes authenticate each other using shared secrets (cookies), providing rudimentary access control over which nodes can join a cluster
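Nodes connect on demand: the first time one node refers to another node by name, for example when sending to a process there, the connection is established automatically. By default connections are also transitive, so connected nodes form a fully connected mesh. This creates what Erlang documentation calls a "loosely connected" distributed system, in which nodes are peers rather than clients and servers in the traditional sense.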
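Location transparency shows up in the send syntax itself. The sketch below addresses a registered process as {Name, Node}; here Node is the local node so that the example runs on a single machine, but the same function would work with a remote node name. The module name dist_sketch, the registered name dist_sketch_echo, and any remote node name such as 'b@host2' are hypothetical.

```erlang
-module(dist_sketch).
-export([demo/0, send_to/3]).

%% Send to a registered process on any node. If Node is remote and not
%% yet connected, the runtime attempts the connection on first use.
send_to(Name, Node, Msg) ->
    {Name, Node} ! Msg,
    ok.

demo() ->
    Parent = self(),
    Pid = spawn(fun() ->
        receive {ping, From} -> From ! {pong, node()} end
    end),
    true = register(dist_sketch_echo, Pid),
    %% node() is the local node; a remote node name would be used the same way.
    ok = send_to(dist_sketch_echo, node(), {ping, Parent}),
    receive
        {pong, Node} -> Node
    after 1000 ->
        timeout
    end.
```

On a cluster, only the Node argument changes; the receiving code and the reply path stay the same because process identifiers carry their node with them.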
Hot Code Reloading
The BEAM supports hot code reloading: new module versions can be loaded into a running system without stopping it. The VM keeps at most two versions of a module at once, called current and old. Processes already executing the old version keep running it, while any fully qualified call (one that names the module as well as the function) resolves to the current version. A long-running process therefore switches to new code by making a fully qualified call, typically to its own loop function.
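The difference between local and fully qualified calls is the core of the mechanism, and a server loop can expose it directly. In this sketch the module name hot_sketch and the upgrade message are assumptions; the example runs without loading new code, so it demonstrates only that state survives the switch point.

```erlang
-module(hot_sketch).
-export([start/0, loop/1, demo/0]).

start() ->
    spawn(fun() -> loop(0) end).

loop(Count) ->
    receive
        bump ->
            loop(Count + 1);       % local call: stays in the running version
        upgrade ->
            ?MODULE:loop(Count);   % fully qualified: enters the current version
        {get, From} ->
            From ! {count, Count},
            loop(Count);
        stop ->
            ok
    end.

demo() ->
    Pid = start(),
    Pid ! bump,
    Pid ! bump,
    Pid ! upgrade,
    Pid ! {get, self()},
    receive
        {count, C} -> Pid ! stop, C
    after 1000 ->
        timeout
    end.
```

In a live system one would recompile and load the module (for instance with c(hot_sketch) in the shell) and then send upgrade. The process carries Count into the new version of loop/1, which must be exported for the qualified call to work.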
This capability was a design requirement from Erlang's telecom origins, where switches are expected to stay in service while being upgraded. Ericsson's Erlang-based AXD301 switch is often cited as having achieved "nine nines" (99.9999999%) availability. Hot code loading remains one of BEAM's most distinctive features.
Components & Structure
The BEAM Ecosystem
While BEAM was created to run Erlang, its architecture is language-agnostic. Several languages now compile to BEAM bytecode and benefit from the same runtime properties:
- Elixir — a Ruby-inspired syntax built on BEAM, with the same OTP capabilities
- LFE (Lisp Flavoured Erlang) — a Lisp dialect running on BEAM
- Gleam — a statically typed language targeting BEAM
The BEAM process model and OTP framework are available to all of these languages, making "Erlang/OTP" increasingly a description of the platform rather than a single language.
OTP Behaviors
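OTP provides a set of standard behaviors: generic implementations of common process patterns that handle the supervision-compatible boilerplate while letting developers fill in application-specific callbacks. The names below follow Elixir's capitalized module naming; the corresponding Erlang modules are gen_server, supervisor, and application: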
- GenServer — a generic server process with synchronous call and asynchronous cast interfaces
- Supervisor — a process whose sole job is monitoring children and applying restart strategies
- Application — the top-level structure wrapping a supervision tree for deployment
These behaviors enforce a consistent structure across Erlang codebases and ensure that any OTP-compliant process can be placed into a supervision tree.
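As an illustration of the callback style, here is a minimal counter written against the gen_server behavior, with one asynchronous cast and one synchronous call. The module name counter_server and its API functions are hypothetical; only the gen_server functions and callback signatures come from OTP.

```erlang
-module(counter_server).
-behaviour(gen_server).

%% Client API
-export([start_link/0, increment/1, value/1, stop/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2]).

start_link()   -> gen_server:start_link(?MODULE, 0, []).
increment(Pid) -> gen_server:cast(Pid, increment).   % asynchronous
value(Pid)     -> gen_server:call(Pid, value).       % synchronous
stop(Pid)      -> gen_server:stop(Pid).

%% The server state is just the current count.
init(Initial) ->
    {ok, Initial}.

handle_call(value, _From, Count) ->
    {reply, Count, Count}.

handle_cast(increment, Count) ->
    {noreply, Count + 1}.
```

The generic part (message loop, reply routing, system messages, shutdown) lives in gen_server; the module supplies only state transitions. Because start_link/0 returns {ok, Pid} and links to the caller, the module can be referenced from a supervisor child specification as is.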
Reception & Influence
Erlang's actor model and supervision philosophy have influenced a generation of concurrent systems. The Akka toolkit brought actor-based supervision trees to the JVM (Scala and Java). Elixir democratized access to the BEAM platform with a more approachable syntax, expanding the community significantly. Go's goroutines adopt lightweight concurrency but without the structured fault-tolerance layer OTP provides.
The "let it crash" philosophy has become a recognized design pattern beyond Erlang, cited in discussions of resilient microservice architectures and Kubernetes pod restart policies — though neither replicates the granularity and transparency of OTP supervision trees running inside a single VM.
Controversies & Debates
Erlang's syntax — inherited from Prolog, with comma/semicolon/period termination conventions and single-assignment variables — is frequently cited as a barrier to adoption. The language's Prolog heritage makes it unfamiliar to developers coming from C-family languages. Elixir addressed this by providing Ruby-inspired syntax over the same runtime.
The cookie-based authentication model for BEAM distribution has been noted as a weak security boundary. The EEF Security Working Group explicitly documents that Erlang distribution was designed for trusted networks and requires additional hardening (TLS, firewalling) for use in untrusted environments. EPMD adds to the exposure surface: by default it listens on TCP port 4369, and any client that can reach that port can query the names and distribution ports of the nodes registered on that host.