Engineering

PostgreSQL

A relational database engine at the center of modern data architecture

Lead Summary

PostgreSQL is a production-grade open-source relational database management system that has become the backbone of a wide range of modern application architectures, from traditional server-side web apps to distributed and local-first systems. Its combination of standard SQL compliance, MVCC-based concurrency control, extensible constraint system, and mature logical replication infrastructure makes it the server-side source of truth that sync engines, ORM layers, and offline-capable clients coordinate around. As of 2026, PostgreSQL occupies a central role not only as a standalone database but as the upstream publisher in a growing ecosystem of sync tools that replicate its data to embedded databases on clients.

Core Concepts

Multi-Version Concurrency Control

PostgreSQL uses Multi-Version Concurrency Control (MVCC) where readers do not block writers by default. Each transaction sees a snapshot of the database consistent with the moment the transaction began. This mechanism is what allows high-throughput concurrent access without full table locking, but it comes with a significant operational implication: vacuuming.

When a transaction runs for a long time, PostgreSQL cannot reclaim dead tuples that might still be visible to that transaction's old snapshot. This inability to autovacuum results in heap bloat, index swelling, and severely degraded performance under heavy write loads. Keeping transactions open too long is therefore a critical anti-pattern that degrades concurrency even though it does not cause immediate errors.

Constraint System and Deferability

PostgreSQL supports a rich constraint system where certain types — UNIQUE, PRIMARY KEY, REFERENCES (foreign keys), and EXCLUDE — can be declared DEFERRABLE, meaning their enforcement is delayed until commit time rather than checked immediately on each row operation. NOT NULL and CHECK constraints, however, are always checked immediately. Transactions are the units of consistency: a transaction that cannot commit with all constraints satisfied is rolled back entirely.

This deferability mechanism is important for multi-step operations that pass through intermediate invalid states. It enables optimistic mutation patterns in local-first architectures, where speculative writes to a local database mirror a server-side transaction that must ultimately satisfy all constraints at commit time.

Write-Ahead Log as a Synchronization Primitive

PostgreSQL's Write-Ahead Log (WAL) is an append-only, sequential record of all changes applied to the database. Replaying a WAL sequence is deterministic and fast: immutability guarantees the same final state regardless of when replay happens, and sequential reads are cache-friendly. This property makes the WAL the foundation for PostgreSQL's streaming and logical replication protocols, and also the mechanism that local-first sync engines exploit to keep client replicas consistent with the server.

The WAL is not just a durability tool. It is the interface through which any downstream system — replica, sync engine, or event stream — can consume a reliable, ordered feed of every state change in the database.

Replication

Asynchronous vs. Synchronous Streaming Replication

PostgreSQL supports two replication durability modes. Asynchronous replication (the default) considers a transaction complete once committed on the primary node, with replicas consuming the WAL feed on their own timeline. This has minimal performance overhead but introduces a window of data loss on failover: any operations committed but not yet shipped to replicas are lost if the primary crashes.

Synchronous replication requires that a transaction be acknowledged by all configured replicas before returning success to the client. This closes the data-loss window but can cut write throughput by more than half over slow or high-latency network links. PostgreSQL 10 introduced quorum commit as a middle ground, allowing configuration of how many replicas must acknowledge before a commit is considered durable.

Replica Lag and Failure Modes

Replicas fall out of sync through several mechanisms: network interruptions cause replicas to stop receiving WAL segments; the primary may write changes faster than a replica can apply them; and misconfigured WAL retention settings can cause log segments to be recycled before a slow replica consumes them. Hardware failures and bandwidth constraints compound these risks. Managing replica lag requires monitoring replication slot positions and tuning WAL retention settings to match the slowest replica in the fleet.

Logical Replication and Binary Format

Beyond streaming replication, PostgreSQL supports logical replication: a higher-level protocol that publishes table-level row changes (INSERT, UPDATE, DELETE) rather than raw WAL blocks. This enables selective, cross-version replication and is the mechanism used by sync engines like ElectricSQL and PowerSync to consume server changes.

PostgreSQL 16 introduced binary format for initial synchronization, significantly improving performance for large table snapshots compared to the TEXT format. Binary sync reduces both transfer time and disk I/O, mitigating WAL storage exhaustion during long initial sync operations — a problem that becomes acute when onboarding new clients with large datasets.

Logical replication does not replicate constraints

PostgreSQL's logical replication protocol does not ship schema objects like foreign key constraints or indexes to subscribers. Each subscriber must manage its own constraint definitions, and foreign keys between replicated and non-replicated tables can cause failures at the application level.

Schema Design for Distributed Clients

When PostgreSQL serves as the backend for applications with offline or distributed clients, schema design choices that are irrelevant in single-node scenarios become load-bearing architectural decisions.

Client-Generated IDs

Server-based ID generation — through SERIAL, IDENTITY, or SEQUENCE columns — is fundamentally incompatible with offline-first applications. Generating a server-assigned ID requires a round-trip to the database, which blocks while offline. UUID primary keys are the practical standard for local-first sync schemas because they can be generated on the client without any server contact, enabling offline record creation that can be synchronized when connectivity is restored.

Within UUID variants, the choice matters for server-side performance. UUIDv4 is random and scatters inserts across the B-tree index, causing page splits and bloat. UUIDv7 is time-ordered, with a 48-bit Unix timestamp prefix, which means new inserts append to the right side of the index — producing behavior similar to an auto-increment sequence but without server coordination. PostgreSQL 18 adds a native uuidv7() function, making this the recommended UUID strategy going forward.

ULIDs (Universally Unique Lexicographically Sortable Identifiers) offer an alternative with lexicographic sortability and approximately 30% faster generation via the pgx_ulid extension compared to both UUID variants, though they require extension dependencies and text storage inflates size by about 60% compared to native UUID storage.

Unique Constraints in Distributed Contexts

Unique constraints are problematic when multiple offline clients can independently generate records with conflicting values. When two clients create rows with the same unique-constrained value while offline, the first sync to the server succeeds while the second blocks or deadlocks. This is the exact pattern that ElectricSQL explicitly does not support — tables must have unique constraints removed before being "electrified." In distributed sync schemas, uniqueness validation should be handled through application logic or eventual consistency patterns rather than database-level constraints on non-primary-key columns.

Foreign Keys and Partial Replication

PostgreSQL's logical replication does not synchronize foreign key constraints themselves to subscribers. When clients hold only a partial replica of the database — a subset of rows from a subset of tables — enforcing foreign keys locally can block valid inserts where the parent record exists on the server but is outside the client's replicated set. Practical local-first schemas either disable foreign key enforcement on the client-side SQLite replica, manage referential integrity through application logic, or use sync rules that guarantee parent and child records are always replicated together.

Schema Migration Across Distributed Client Versions

In a local-first system, multiple versions of the application run simultaneously on different clients with different schema versions. Schema migration becomes a source of data loss when an older client reads and re-saves a record created by a newer client: the older client's deserializer silently strips unknown fields, and the write-back permanently removes them from the server.

The write-both/read-new pattern addresses this: new versions write data to both old and new columns during a transition window, keeping old clients functional while new columns propagate. This requires at minimum two deployment phases and temporary data duplication. PowerSync's schemaless approach sidesteps this by replicating Postgres data as JSON to client SQLite and applying views client-side, decoupling the server schema from the client schema entirely.

PostgreSQL in Local-First Architectures

A wave of sync engines has emerged that position PostgreSQL as the authoritative server database and replicate subsets of its data to embedded databases on client devices — typically SQLite, sometimes PGlite (a WebAssembly port of Postgres). The central problem these systems solve is making offline-capable, real-time-collaborative applications without requiring developers to build custom synchronization logic.

SQLite as the Client Mirror

SQLite is the dominant client-side storage solution for local-first Postgres sync, running as a WASM build in browsers (backed by the Origin Private File System for persistence) and as native SQLite on mobile platforms. Notion's migration from IndexedDB to SQLite WASM improved page navigation times by 20%, providing production-scale evidence of the performance advantages.

OPFS (Origin Private File System) provides persistent byte-level file access for SQLite WASM in browser contexts, but only inside Web Workers — not on the main UI thread. This adds architectural complexity but enables synchronous I/O and the microsecond-latency reactive query cycles that local-first UIs depend on. Multi-tab access to the same OPFS database requires a SharedWorker to coordinate exclusive access.

The Three Major Sync Engines

Fig 1
ElectricSQL Read-path sync Shapes (partial replication) Auth: sync-time CRDTs for conflicts GA: March 2025 PowerSync Postgres → SQLite Bucket model (partial replication) Auth: sync-time Sync Rules + JWT 10+ yrs production Zero (Rocicorp) Query-driven sync ZQL client queries → zero-cache Auth: query-time Transform endpoint Always-online
Authorization timing comparison across sync engines

ElectricSQL implements a read-path sync model where the sync engine handles reading from PostgreSQL and distributing changes to clients, while writes go through separate API endpoints (GraphQL, REST, tRPC, server actions) that then update the database. Shapes are the core primitive: a shape defines a subset of PostgreSQL data by specifying a table, selected columns, and an optional WHERE clause. The proxy or backend defines the shape's authorization boundary; clients can only add further subset filters, never escape the server-defined WHERE clause. ElectricSQL uses CRDTs for conflict resolution and released version 1.0 in March 2025.

PowerSync syncs backend PostgreSQL to embedded SQLite on client devices using a bucket-based replication model. Sync Rules define a two-stage ETL pipeline: a parameter query extracts user context from JWT tokens, and a data query maps rows to buckets — a row is synced to a device only if its bucket IDs intersect with the user's assigned bucket IDs. This enforces authorization at sync time. PowerSync's core sync technology has been in production for over a decade with Fortune 500 enterprise users in energy, manufacturing, and mining, reporting zero incidents of data loss.

Zero (by Rocicorp) uses a query-driven architecture where client code writes ZQL queries that define exactly what data to sync. A stateful middleware service called zero-cache maintains a SQLite replica of the upstream Postgres database via logical replication and Postgres Event Triggers. Authorization is evaluated on every query through a server-side transform endpoint, not pre-computed in sync rules. This means Zero adapts to permission changes immediately but requires always-online operation.

Sync-Time vs. Query-Time Authorization

This design dimension — when authorization is evaluated — is a fundamental trade-off across local-first systems:

  • Sync-time systems (PowerSync, ElectricSQL) batch authorization into sync rules or shape definitions. Permissions are computed once during sync and cached locally. This works well offline but creates a time-of-check-to-time-of-use gap: stale permissions from initial sync may remain in effect until the next sync cycle.
  • Query-time systems (Zero) evaluate permissions on every query via a server-side transform. Permissions are always fresh, but the system requires connectivity for every authorized query.

Conflict Resolution Strategies

When the same data is modified offline on multiple clients, sync engines must resolve the resulting conflicts. Three strategies are in common use:

Last-Write-Wins (LWW) keeps the change with the latest timestamp and discards others. It is simple to implement and sufficient for roughly 95% of applications where users edit largely non-overlapping data. However, LWW silently discards concurrent writes without any notification, making it inappropriate for rich collaborative editing where concurrent meaningful changes are common.

CRDTs (Conflict-Free Replicated Data Types) are mathematical data structures designed so that any two replicas applying the same set of operations in any order arrive at identical state. Changes are commutative — merge order does not affect the result. Both ElectricSQL and PowerSync with Yjs implement CRDT-based synchronization. An experimental pg_crdt Postgres extension from Supabase brings CRDT types natively to the database layer.

CRDTs in production are not simple

While CRDTs prevent the silent data loss of LWW, deploying them in production introduces novel complexity: managing sequence rules, preventing node divergence, and ensuring referential integrity across CRDT merges. Researchers have identified gaps that remain before CRDTs can be considered fully production-ready in general-purpose settings.

Optimistic mutations are a write-path pattern distinct from conflict resolution: the client applies a speculative update immediately to the local database and UI, then waits for server confirmation. If the server rejects the mutation, the client rolls back to the pre-mutation state. This pattern produces instant UI feedback without waiting for network round-trips. Libraries like TanStack Query and Pinia Colada expose lifecycle hooks (onMutate, onError, onSettled) that implement the rollback flow.

In server-authoritative systems like Replicache and PowerSync, the server remains the sole authority for validating and applying changes. Clients create speculative results that are pushed to the server for canonical execution, preventing malicious or buggy clients from injecting invalid state.

Performance Considerations

Distributed PostgreSQL Overhead

Distributed PostgreSQL deployments — where queries must traverse multiple nodes — can increase latencies by 10–100x compared to single-node. Network round-trips between nodes, logical replication processing, conflict detection, and write amplification across replicas all compound. Three replicas create approximately 4x total write load on the system: the primary write plus three replica writes, each of which requires updating all indexes on the modified tables.

Local-first sync architectures address a related but distinct problem: rather than distributing the Postgres cluster itself, they keep Postgres as a single-node (or conventionally replicated) authoritative server and move query execution to client-side SQLite replicas. Network latency becomes invisible to users because reads hit a local database; only writes and sync operations traverse the network.

Schema Isolation as a Performance Boundary

PostgreSQL's schema namespace combined with role-based access control provides a practical mechanism for data isolation within a single Postgres instance. By giving each application module its own schema and a scoped database role, the database itself enforces module boundaries at runtime without requiring separate physical databases. This is an accessible first step toward microservice-style data boundaries while retaining the operational simplicity of a monolith.

Integration with the Broader Ecosystem

ORM Layers

Drizzle ORM supports PostgreSQL, MySQL, and SQLite with a shared schema definition that works across WASM SQLite clients and native PostgreSQL servers. NuxtHub uses PGlite (embedded PostgreSQL) for local development when no environment variables are set, supporting local-first development workflows, and routes production reads to replicas via Drizzle's withReplicas() feature.

Prisma provides runtime type-safety by pairing schema-generated types with Zod validation, reducing the risk of runtime errors from database drift.

SQL as a Declarative Language

SQL is a declarative query language: a developer specifies the result set desired, not the procedure for obtaining it. The database's query planner determines the optimal execution plan independently of query authorship. This separation means that SQL queries can be transparently optimized by changing indexes or statistics without changing application code — a property that local-first reactive query systems exploit by re-running the same SQL query whenever underlying data changes, trusting the planner to keep the operation fast.

Compliance and Data Residency

GDPR compliance for PostgreSQL applications requires encryption in transit and at rest, audit logging, data minimization, and right-to-be-forgotten mechanisms (the ability to hard-delete personal data on request). Data residency constraints — ensuring stored data remains within a specific legal jurisdiction — add complexity in multi-region cloud deployments. Local-first architectures compound this: data synced to client devices may be persisted in OPFS storage outside the control of the server's compliance posture, requiring careful thought about what is and is not replicated to clients.

Further Exploration

Official References

Sync Engines

Deep Dives