Production Readiness
The operational realities that only surface at scale — and how to plan for them before they bite you
Learning Objectives
By the end of this module you will be able to:
- Design a tombstone garbage collection strategy that prevents unbounded table growth in a local-first system.
- Estimate the metadata overhead introduced by sync infrastructure and translate that into concrete storage budget decisions.
- Identify the sync-everything anti-pattern and apply selective sync to control client footprint.
- Recognize workloads — analytics, reporting, aggregation — that should not go through the local-first sync path.
- Assess the full implementation effort of a local-first migration and communicate that effort accurately to stakeholders.
Key Principles
These principles do not map to single algorithms. They are heuristics for production-grade judgment — the kind of thing you internalize after debugging your second incident at 2am.
1. Tombstones are not free. They are debt.
Every delete in a sequence CRDT creates a tombstone: a record that the element existed and was removed. Tombstones cannot be discarded safely without coordination, because another peer might not yet have seen the delete and could try to reference the element. Without a garbage collection strategy, tombstones accumulate indefinitely.
As documented in CRDT production experience, sequence CRDTs grow in proportion to the number of deleted items. A 1,000-character document that has been heavily edited may internally contain 50,000 tombstones. An empty todo list after 10,000 task creates and deletes still retains 10,000 tombstone entries. This is not theoretical: it is the default behavior without an active GC strategy.
Yorkie's production GC design shows one approach: every deleted element becomes a tombstone, and the GC pass can only run when causal stability is confirmed — i.e., when you know all peers have received the delete. Until then, the tombstone stays.
Design implication: plan your GC strategy before you go to production, not after your storage bill spikes. Decide on epoch-based pruning or causal stability tracking, and make that strategy part of your schema and sync protocol.
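The mechanics above can be sketched in a few lines. This is a minimal illustration, not a real CRDT: all class and method names here are hypothetical, and the point is only that deletes mark rather than remove, and that GC is gated on causal stability.

```typescript
// Illustrative sketch: a sequence whose deletes leave tombstones, with a GC
// pass that may run only once every peer has acknowledged the deletes.
interface Element {
  id: string;       // unique identifier assigned at insert time
  value: string;
  deleted: boolean; // tombstone flag: the entry stays even after deletion
}

class TombstoneSequence {
  private elements: Element[] = [];
  private counter = 0;

  insert(value: string): string {
    const id = `el-${this.counter++}`;
    this.elements.push({ id, value, deleted: false });
    return id;
  }

  delete(id: string): void {
    const el = this.elements.find((e) => e.id === id);
    // Mark, never remove: another peer may still reference this element.
    if (el) el.deleted = true;
  }

  liveLength(): number {
    return this.elements.filter((e) => !e.deleted).length;
  }

  internalLength(): number {
    return this.elements.length; // grows monotonically without GC
  }

  // A tombstone may be discarded only once it is causally stable, i.e. all
  // peers have confirmed sync past the delete. The caller supplies that set.
  gc(stableIds: Set<string>): void {
    this.elements = this.elements.filter(
      (e) => !(e.deleted && stableIds.has(e.id))
    );
  }
}

// 10,000 creates followed by 10,000 deletes leaves an "empty" list
// that still holds 10,000 internal entries.
const seq = new TombstoneSequence();
const ids: string[] = [];
for (let i = 0; i < 10_000; i++) ids.push(seq.insert(`task ${i}`));
for (const id of ids) seq.delete(id);

console.log(seq.liveLength());     // 0 — the list looks empty
console.log(seq.internalLength()); // 10000 — storage has not shrunk

seq.gc(new Set(ids));              // safe only after all peers confirmed
console.log(seq.internalLength()); // 0
```

Note that the GC pass needs external knowledge (which deletes all peers have seen); that is exactly the causal stability tracking the sync protocol must provide.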
2. Metadata overhead is a budget item, not an implementation detail.
CRDT implementations add 16–32 bytes of metadata per character to track authorship, causality, and position in the logical sequence. A 10,000-character document grows from approximately 10 KB of raw text to roughly 330 KB of CRDT state at the upper bound of that range. Every user interaction adds more metadata that the CRDT must retain.
This is why CodeMirror 6 explicitly rejected CRDTs, citing "significant cost" and "level of extra complexity and memory use." For mobile applications and resource-constrained devices, this is not a footnote — it is an architectural constraint.
Design implication: benchmark your actual document sizes early. If your app stores structured records rather than free-form text, the overhead is often more manageable. If you store large text fields, factor in a 10–30x memory multiplier per document and plan client-side storage quotas accordingly.
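The budgeting exercise is simple enough to automate. The sketch below is a back-of-envelope estimator using the 16–32 bytes/char range cited above; real overhead varies by library and compaction strategy, and the function names are hypothetical.

```typescript
// Back-of-envelope storage estimator for CRDT text overhead.
function crdtTextFootprintBytes(
  chars: number,
  metadataBytesPerChar: number // pick 16 (optimistic) to 32 (conservative)
): number {
  // raw text plus per-character metadata
  return chars + chars * metadataBytesPerChar;
}

function clientBudgetCheck(
  docs: { chars: number }[],
  quotaBytes: number,
  metadataBytesPerChar = 32 // budget for the worst case
): { totalBytes: number; fitsQuota: boolean } {
  const totalBytes = docs.reduce(
    (sum, d) => sum + crdtTextFootprintBytes(d.chars, metadataBytesPerChar),
    0
  );
  return { totalBytes, fitsQuota: totalBytes <= quotaBytes };
}

// A 10,000-character document at 32 bytes/char of metadata:
console.log(crdtTextFootprintBytes(10_000, 32)); // 330000 bytes ≈ 330 KB

// 500 such documents against a 50 MB client quota:
const check = clientBudgetCheck(
  Array.from({ length: 500 }, () => ({ chars: 10_000 })),
  50 * 1024 * 1024
);
console.log(check.fitsQuota); // false — ~165 MB blows the quota
```

Run this against your real document corpus (character counts per document, documents per typical user) rather than synthetic averages: the distribution's tail is usually what breaks the quota.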
3. Do not sync everything. Ever.
The sync-everything approach — replicating an entire dataset to every client — is the most common anti-pattern in local-first implementations. It feels like a shortcut during development (no filtering logic, no access rules, instant offline support) and becomes a liability in production.
As RxDB documents: if your dataset is measured in gigabytes, it is simply not feasible to download everything to every client. Attempting to display server logs by downloading terabytes of data to the client defeats the purpose entirely.
The correct model is dynamic partial replication: the shape of replicated data changes over time based on user context, security rules, and runtime parameters. A user working on Project A should not receive data from Project B. A user who has never opened the "Analytics" section should not have that data pre-loaded on their device.
Design implication: define your sync shape as early as you define your data model. Sync shapes are not a performance optimization you add later — they determine what goes on the wire and what stays on the server.
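A sync shape is ultimately a predicate over rows, evaluated per client. The sketch below shows the idea; the types, fields, and the exact filter are hypothetical (a real engine such as ElectricSQL would express this as a shape or filter definition).

```typescript
// Sketch of a sync-shape predicate: which rows replicate to a given client.
// The shape depends on user context, so it is dynamic partial replication,
// not a static filter baked in at build time.
interface Project {
  id: string;
  ownerTeamId: string;
  lastOpenedByUser: Record<string, number>; // userId -> epoch ms
}

interface SyncContext {
  userId: string;
  teamId: string;
  now: number;
}

const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// A project replicates only if the user can access it AND has touched it
// recently — everything else stays on the server, fetched on demand.
function inSyncShape(p: Project, ctx: SyncContext): boolean {
  const canAccess = p.ownerTeamId === ctx.teamId;
  const lastOpened = p.lastOpenedByUser[ctx.userId] ?? 0;
  const recentlyOpened = ctx.now - lastOpened < THIRTY_DAYS_MS;
  return canAccess && recentlyOpened;
}

const ctx: SyncContext = { userId: "u1", teamId: "t1", now: Date.now() };
const projects: Project[] = [
  { id: "a", ownerTeamId: "t1", lastOpenedByUser: { u1: ctx.now - 1000 } },
  { id: "b", ownerTeamId: "t1", lastOpenedByUser: {} }, // never opened
  { id: "c", ownerTeamId: "t2", lastOpenedByUser: { u1: ctx.now } }, // other team
];

console.log(projects.filter((p) => inSyncShape(p, ctx)).map((p) => p.id)); // ["a"]
```

Note that the predicate encodes both a security rule (team membership) and a footprint rule (recency); treating the shape as an API contract means both rules are versioned and tested like any other interface.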
4. Local-first is an OLTP pattern. Keep analytics on the server.
Local-first architectures are optimized for transactional workloads: a user works with a bounded set of records (their tasks, their notes, their messages), modifies them offline, and syncs changes. This is OLTP behavior with a bounded working set.
Analytics requires something fundamentally different: processing large historical datasets — often gigabytes or terabytes — to compute aggregations, trends, and cohort summaries. The local-first promise of "work offline, sync later" breaks down when you cannot practically download the entire dataset to a client device.
Trying to route analytics through the local-first sync path — pulling historical records to the client for aggregation — creates performance problems that no caching strategy can solve. Device computational capabilities vary wildly, making performance unpredictable.
Design implication: analytics, reporting dashboards, and aggregation queries belong on the server. Build a separate read path (a standard server-rendered Nuxt page calling a Postgres query) for anything that touches historical aggregation. Do not try to make the local-first layer do both jobs.
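The boundary can be made explicit in code rather than left to convention. Here is a minimal sketch of a read-path router; the `QueryPlan` fields are hypothetical, and a real system would derive them from the query itself.

```typescript
// Sketch of a hard read-path boundary: transactional reads hit the local
// replica, aggregation reads hit the server.
type ReadPath = "local" | "server";

interface QueryPlan {
  touchesHistoricalAggregate: boolean; // COUNT/SUM beyond the synced working set
  crossUser: boolean;                  // needs rows no single client holds
}

function routeRead(q: QueryPlan): ReadPath {
  // Anything that aggregates history or spans users cannot be answered
  // from a partial local replica — send it to the server read path.
  return q.touchesHistoricalAggregate || q.crossUser ? "server" : "local";
}

// "Open my task" — bounded working set, answered locally:
console.log(routeRead({ touchesHistoricalAggregate: false, crossUser: false })); // "local"
// "Tasks completed this quarter, all users" — server-side Postgres query:
console.log(routeRead({ touchesHistoricalAggregate: true, crossUser: true }));   // "server"
```

Making the routing decision a named function gives you one place to enforce the rule and one place to test it, instead of rediscovering the boundary feature by feature.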
5. CRDT reads are not guaranteed to be stable.
CRDTs guarantee eventual convergence, not safe reads. While the final state will be consistent across all peers, intermediate states observed during reads can be non-deterministic. If updates are not inflationary — that is, if a value can move down as well as up — an intermediate read can be effectively arbitrary: your application might observe a high value now and a lower one after the next sync merge.
As CRDT literature documents, many developers erroneously believe they are getting correctness guarantees from CRDTs when they are not. Developers inevitably inspect CRDT state in unsafe ways, and this creates subtle application bugs that are hard to reproduce because they depend on sync timing.
Design implication: treat every intermediate CRDT value as potentially temporary. Do not derive irreversible decisions (sending an email, billing a customer, deleting a file) from a CRDT read without first confirming stability through your sync protocol. Use well-designed library query interfaces (Yjs, Automerge) rather than inspecting raw CRDT state.
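The "observed high, then lower after merge" failure mode is easy to reproduce with a last-write-wins register, which is a common building block in local-first systems. This is an illustrative sketch, not any specific library's implementation.

```typescript
// Sketch: a last-write-wins register whose observed value moves "backwards"
// after a merge. Both replicas converge, but the intermediate read on
// replica A was never final — acting irreversibly on it would be a bug.
interface LWW<T> {
  value: T;
  timestamp: number;
  nodeId: string;
}

function merge<T>(a: LWW<T>, b: LWW<T>): LWW<T> {
  // Higher timestamp wins; ties broken by nodeId so merge is deterministic.
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  return a.nodeId > b.nodeId ? a : b;
}

// Replica A wrote 100 at t=5 while offline; replica B wrote 20 at t=7.
let replicaA: LWW<number> = { value: 100, timestamp: 5, nodeId: "A" };
const replicaB: LWW<number> = { value: 20, timestamp: 7, nodeId: "B" };

console.log(replicaA.value); // 100 — app code reads this and might act on it
replicaA = merge(replicaA, replicaB);
console.log(replicaA.value); // 20 — the observed 100 silently disappears
```

Any decision taken between the two reads (an email sent because the value was "100") cannot be merged away, which is exactly why irreversible actions need stability confirmation first.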
6. Implementation cost is front-loaded and non-trivial.
Implementing a local-first system correctly requires deep expertise: CRDT algorithms, conflict resolution semantics, sync protocol design, GC strategy, and ongoing maintenance. Most teams face a choice between spending months building and debugging custom infrastructure or adopting existing libraries with their own constraints and learning curves.
CRDT and OT implementations are notoriously difficult to debug. The engineering complexity diverts resources from core product features, which is why many teams deprioritize local-first adoption despite understanding its benefits.
Design implication: use this as a communication tool with stakeholders, not a reason to abandon the approach. Local-first migration is a multi-sprint investment with a non-linear learning curve. Scope the first milestone narrowly (one data type, one sync shape, one conflict strategy), validate it in production, then expand.
Annotated Case Study
A project management app migrates to local-first — three production incidents
This case reconstructs a representative sequence of production issues that emerges when a team ships a local-first feature without addressing the operational concerns above. It is composite and illustrative, not a single company's story.
The initial release goes well. The team ships offline task editing using a sync engine with last-write-wins conflict resolution. Users can edit tasks offline and sync when reconnected. The demo is fast. The team feels confident.
Incident 1: Storage alerts fire at week 6.
The sync metadata table — which stores tombstones for deleted tasks — has grown to 2.3 million rows for a 300-user team. The actual live tasks table has 40,000 rows. The ratio is 57:1.
What happened: Tasks are created and deleted frequently. Each deletion leaves a tombstone. No GC strategy was implemented because it seemed like a future concern. At the current growth rate, the tombstone table will exceed the live data table by 100:1 within three months.
The fix: The team implements epoch-based pruning: once a week, tombstones older than 30 days are deleted, provided all active clients have confirmed sync beyond that epoch. The job runs as a scheduled Postgres function. Storage stabilizes.
The lesson: Tombstone GC is not a future concern. It is a day-one architectural decision. The 30-day epoch is a policy choice — different apps need different values based on how long users realistically stay offline.
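The safety condition behind the fix is worth spelling out. The case study runs it as a scheduled Postgres function; this TypeScript sketch shows the same rule in application terms, with hypothetical field names.

```typescript
// Sketch of the epoch-pruning rule: a tombstone is prunable only when it is
// older than the retention window AND every active client has confirmed
// sync past its deletion epoch.
interface Tombstone {
  id: string;
  deletedAtEpoch: number;
}

interface ClientSyncState {
  clientId: string;
  confirmedEpoch: number;
}

const RETENTION_EPOCHS = 30; // the 30-day policy choice from the case study

function prunableTombstones(
  tombstones: Tombstone[],
  clients: ClientSyncState[],
  currentEpoch: number
): Tombstone[] {
  // The slowest client sets the bar: nothing past its confirmed epoch
  // can be pruned, or that client could resurrect a deleted record.
  const minConfirmed = Math.min(...clients.map((c) => c.confirmedEpoch));
  return tombstones.filter(
    (t) =>
      currentEpoch - t.deletedAtEpoch > RETENTION_EPOCHS &&
      t.deletedAtEpoch <= minConfirmed
  );
}

const prunable = prunableTombstones(
  [
    { id: "t1", deletedAtEpoch: 10 }, // old AND acked by everyone -> prunable
    { id: "t2", deletedAtEpoch: 80 }, // inside retention window -> keep
    { id: "t3", deletedAtEpoch: 40 }, // old enough, but a laggard hasn't acked -> keep
  ],
  [
    { clientId: "c1", confirmedEpoch: 90 },
    { clientId: "c2", confirmedEpoch: 35 }, // laggard: device offline for weeks
  ],
  100
);
console.log(prunable.map((t) => t.id)); // ["t1"]
```

The laggard-client case is the one that bites in production: a single long-offline device can freeze pruning, which is why many systems also evict clients that have not synced within the retention window and force them to re-sync from scratch.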
Incident 2: A new enterprise client's device runs out of storage at week 10.
The client has 8,000 projects. On first login, the app syncs all project metadata to the device. The sync shape was defined as "all projects this user can access," which seemed reasonable for a 50-project team.
What happened: The sync-everything shape hits a data volume it was never tested against. Initial sync takes 11 minutes, then the mobile app reports storage exhaustion on the user's device (an older iPad with 32 GB storage, 60% full before install).
The fix: The team redesigns sync shapes to replicate only projects the user has opened in the last 30 days, plus the 20 most recently modified. Everything else is fetched on demand. Initial sync drops to under 30 seconds.
The lesson: Dynamic partial replication is not an optimization. It is a correctness requirement for apps with non-trivial data volumes. Test sync behavior with production-representative data volumes in staging.
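The redesigned shape from the fix — projects opened in the last 30 days, plus the 20 most recently modified — is easy to express as selection logic. This is an illustrative sketch with hypothetical fields; a real sync engine would express it as a shape or filter definition rather than client-side code.

```typescript
// Sketch: select the subset of projects to replicate, everything else
// is fetched on demand.
interface Proj {
  id: string;
  lastOpenedAt: number; // epoch ms
  modifiedAt: number;   // epoch ms
}

function selectForSync(all: Proj[], now: number): Proj[] {
  const THIRTY_DAYS = 30 * 24 * 60 * 60 * 1000;
  const recentlyOpened = all.filter((p) => now - p.lastOpenedAt < THIRTY_DAYS);
  const recentlyModified = [...all]
    .sort((a, b) => b.modifiedAt - a.modifiedAt)
    .slice(0, 20);
  // Union the two sets by id; duplicates collapse automatically.
  const byId = new Map<string, Proj>();
  for (const p of [...recentlyOpened, ...recentlyModified]) byId.set(p.id, p);
  return [...byId.values()];
}

// The enterprise scenario: 8,000 projects, only 5 opened recently.
const now = Date.now();
const all: Proj[] = Array.from({ length: 8000 }, (_, i) => ({
  id: `p${i}`,
  lastOpenedAt: i < 5 ? now - 1000 : now - 90 * 24 * 60 * 60 * 1000,
  modifiedAt: now - i * 1000, // p0 is the most recently modified
}));
console.log(selectForSync(all, now).length); // 20 instead of 8000
```

The key property is that the selected set is bounded regardless of total data volume, which is what turns an 11-minute initial sync into a sub-30-second one.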
Incident 3: The analytics dashboard silently breaks at week 14.
A product manager reports that the "tasks completed this quarter" chart shows different numbers depending on which browser tab they use and whether they have been offline recently.
What happened: The team had routed the analytics aggregation query through the local CRDT state rather than through a server-side query. The dashboard was computing COUNT() over local CRDT records. Because intermediate CRDT reads are non-deterministic, and because the dashboard was reading state before the latest sync had fully merged, it was observing stale intermediate values. Different tabs had different sync states.
The fix: The analytics dashboard is rebuilt as a standard server-rendered page that queries Postgres directly. The local-first layer is explicitly scoped to transactional workflows only. A clear architectural rule is documented: if a feature requires aggregation over historical data, it goes through the server read path.
The lesson: Analytics workloads are incompatible with the local-first sync path. Route them to the server. The local-first layer is not a general-purpose database for all query shapes.
Common Misconceptions
"Tombstones are only a problem for collaborative text editing."
Tombstones appear in any CRDT that supports deletion from a sequence: task lists, comment threads, ordered collections. If your app supports delete operations on list-type data, you have tombstone accumulation. The text editing case is dramatic (50,000 tombstones for a 1,000-character document) but the pattern applies broadly.
"I'll add GC later when it becomes a problem."
GC requires knowing when it is safe to prune — which requires causal stability tracking, which requires the sync protocol to propagate confirmation that all peers have received the deletion. Retrofitting this into an existing sync protocol after production launch is significantly harder than designing it in from the start. "Later" usually means "during an incident."
"Partial sync adds complexity — I'll start with full sync and optimize."
This is the most expensive misconception on the list. Full sync shapes become load-bearing once clients have downloaded and cached data. Changing sync shapes after the fact requires careful migration of client-side state and re-sync logic. The "optimize later" assumption treats sync shape as an implementation detail when it is an API contract.
"Local-first gives me correctness guarantees everywhere."
CRDTs guarantee convergence — the final state will be consistent. They do not guarantee that any intermediate state you read is accurate or stable. If your application logic depends on reading CRDT state to make decisions, those decisions may be based on values that have not yet converged. Convergence is a guarantee about the future, not about right now.
"16–32 bytes per character is fine for my use case."
This is only fine if you have benchmarked your actual document corpus. A todo item title ("Fix login bug") is ~13 characters — metadata overhead is manageable. A description field containing a technical specification at 8,000 characters becomes roughly 256 KB of CRDT state at 32 bytes per character. Multiply by the number of documents a typical user holds and verify against your target device constraints.
Boundary Conditions
Local-first architecture is well-suited for specific problem shapes. These are the conditions under which it starts to strain:
When the working set cannot fit on the client device. Local-first depends on being able to hold a meaningful subset of the data locally. When the data the user needs exceeds device storage — either in absolute terms or after CRDT metadata overhead — you cannot provide a useful offline experience without aggressive filtering. If filtering to a manageable subset is not semantically meaningful for your app, local-first may not be the right fit.
When queries require cross-user aggregation. If a feature requires knowing "how many tasks did all users in this organization complete this month," local-first cannot help — no single client has all the data. This aggregation lives on the server. The boundary is clear: single-user or small-team transactional workflows are in scope; organization-wide analytics are out of scope.
When conflict resolution semantics cannot be defined. Local-first requires you to specify what "correct" means when two offline edits conflict. For free-form text, merge semantics are well-established. For structured state transitions (a task moving from "In Progress" to "Done" while another peer cancels it), the right merge behavior is domain-specific. If your domain has conflicts you genuinely cannot specify a policy for, local-first adds complexity without adding safety.
When your team cannot sustain the operational overhead. CRDT and sync implementations are notoriously difficult to debug and maintain. A two-person team shipping an MVP under time pressure may not be the right context for a ground-up local-first implementation. Using a library (ElectricSQL, PowerSync, Triplit) shifts maintenance burden to the library maintainers but introduces a dependency and its own constraints.
When regulatory requirements prohibit client-side data storage. Some compliance regimes (certain healthcare, finance, or government contexts) restrict where data can be persisted. If your users are subject to these requirements, verify that client-side CRDT state — which is a full local replica of the data — is compliant before committing to the architecture.
Thought Experiment
You have shipped a local-first feature to production. It works well for individual users editing their own records. The CTO now asks whether you can extend the same architecture to power a real-time executive dashboard that shows live counts of records across all users in the organization.
Before answering, work through these questions:
- Where does the data for this dashboard actually live? Does any single client have all of it?
- What happens to the dashboard when five users are offline and making changes simultaneously? What state does the dashboard show?
- If you computed these aggregates by pulling all records to a "server client" via the sync engine and running queries there — what would the tombstone accumulation look like at scale?
- What alternative architecture would you propose instead? How would you draw the boundary between what goes through the local-first sync path and what does not?
- How would you explain this architectural boundary to a CTO who sees the two features (offline task editing and live dashboard) as natural extensions of each other?
There is no single correct answer. The value is in being precise about where local-first guarantees hold and where they do not — and in being able to articulate that boundary clearly to someone who has not built the system.
Key Takeaways
- Tombstone accumulation is unbounded by default. Sequence CRDTs never discard deleted elements without an active GC strategy. Plan epoch-based or causal-stability-based pruning before you go to production.
- CRDT metadata overhead is a storage multiplier. Expect 16–32 bytes of metadata per character in text CRDTs — a 10–30x size multiplier on text documents. Budget for this in client storage quotas and test with production-representative data volumes.
- Sync shapes are an API contract, not a performance optimization. Syncing everything to every client fails at non-trivial data volumes. Define partial, user-scoped sync shapes from the start and test them with the largest realistic dataset your system will encounter.
- Local-first is OLTP. Analytics belongs on the server. The local-first sync path cannot support aggregation across historical datasets. Draw a hard architectural boundary: transactional workflows through the sync engine, reporting and analytics through direct server-side Postgres queries.
- Implementation effort is front-loaded and non-linear. Local-first migration is a multi-sprint investment requiring expertise in CRDTs, sync protocol design, GC strategy, and conflict semantics. Scope the first production milestone narrowly, validate it, then expand.
Further Exploration
On tombstone GC and CRDT memory growth
- Kevin Jahns — Are CRDTs suitable for shared editing? — The canonical analysis of sequence CRDT memory growth in real editing scenarios.
- Yorkie Garbage Collection Design — A production GC design from a team that ships CRDTs at scale.
On metadata overhead and memory budgets
- CRDTs in Production (InfoQ) — A practitioner talk on real-world memory trade-offs.
- Optimizing CRDTs for Low Memory Environments (ECOOP 2025) — Academic work on constrained-device CRDT optimization.
On sync shapes and partial replication
- ElectricSQL — Developing local-first software — Covers dynamic partial replication as a first-class architectural concern.
- RxDB — Downsides of offline-first — Honest accounting of the failure modes, including the sync-everything trap.
On CRDT read semantics
- Joe Hellerstein — CRDTs: Convergence, Determinism, Lower Bounds — A rigorous treatment of why CRDT reads are not guaranteed to be stable.
On implementation effort
- The CRDT Dictionary: A Field Guide — Clear-eyed survey of the implementation complexity landscape.