Engineering

Frontend–Backend Contracts

Designing the API boundary from both sides: protocol choices, BFF patterns, schema-driven workflows, and resilient error handling

Learning Objectives

By the end of this module you will be able to:

  • Compare REST, GraphQL, and tRPC across caching, type safety, complexity, and team-fit dimensions and recommend one for a given scenario.
  • Describe the Backend-for-Frontend pattern, explain when it adds value versus when it adds complexity, and identify the team ownership requirement for it to work.
  • Implement schema-driven type generation from an OpenAPI spec and explain how it prevents type drift.
  • Explain the N+1 problem in GraphQL and the DataLoader solution.
  • Use React Error Boundaries and retry strategies to build resilient data-fetching flows.

Core Concepts

Protocol choice: REST, GraphQL, tRPC

Three protocols dominate modern frontend–backend communication. Each makes a distinct set of tradeoffs, and picking the wrong one for your context will cost you more than any performance micro-optimization.

REST is the default for good reason. It has decades of ecosystem stability, universal library support across languages, and it maps cleanly onto HTTP semantics. The practical payoff is caching: REST APIs get HTTP-layer caching through Cache-Control headers and CDN integration, something GraphQL cannot take advantage of at the network layer. REST's explicit schemas also enable automatic breaking-change detection and code-generation tooling. The cost is over-fetching and under-fetching, though the impact depends heavily on client context: over-fetching carries a material penalty on mobile networks and a negligible one on broadband.
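
These caching semantics are simple enough to sketch. The helper below mimics the browser's freshness decision for a max-age directive; the function name and signature are hypothetical, not any real browser API:

```typescript
// Freshness check for Cache-Control: max-age=<seconds>.
// `cachedAtMs` and `nowMs` are epoch milliseconds.
function isFresh(cachedAtMs: number, maxAgeSeconds: number, nowMs: number): boolean {
  const ageSeconds = (nowMs - cachedAtMs) / 1000;
  return ageSeconds < maxAgeSeconds;
}

// A response cached with max-age=60 is served straight from cache 30s
// later; after 90s the entry is stale and a network request goes out.
console.log(isFresh(0, 60, 30_000)); // true  -> served from cache
console.log(isFresh(0, 60, 90_000)); // false -> stale, refetch
```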

GraphQL earns its place when you serve multiple client types — mobile, web, embedded — with genuinely different data requirements, because each client can query for exactly the fields it needs. The tradeoff is steep: GraphQL's single POST endpoint breaks the HTTP-layer caching that REST leverages, forcing all caching logic to the application layer, and rate limiting is substantially more complex because a uniform endpoint obscures highly variable computational costs, requiring query complexity analysis and cost directives. The learning curve is also steeper than REST's resource-and-verb model, and the complexity can be unjustified for simple CRUD applications.

tRPC offers end-to-end TypeScript type safety with zero API schema maintenance — the types are the contract. That safety comes from sharing TypeScript types inside a monorepo, though; outside one, tRPC loses its primary advantage of automatic type synchronization. There are also scaling ceilings: TypeScript performance degrades above roughly 100–200 endpoints due to type inference overhead, and tRPC requires synchronized deployment, preventing independent backend scaling or staged rollouts.
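
The "types are the contract" idea can be illustrated without the library itself. The sketch below uses plain TypeScript; the router object and procedure names are hypothetical stand-ins, not tRPC's real initTRPC/router API:

```typescript
// "Server": a plain object of procedures. Real tRPC builds this with
// initTRPC and router(); here it is ordinary TypeScript.
const appRouter = {
  getUser: (input: { id: number }) => ({ id: input.id, name: "Ada" }),
  listPosts: (_input: { authorId: number }) => [{ id: 1, title: "Hello" }],
};

// "Client": the contract is the router's inferred type. No schema file
// exists; renaming or removing a field on the server is a compile error
// in every client that reads it.
type AppRouter = typeof appRouter;
type User = ReturnType<AppRouter["getUser"]>; // { id: number; name: string }

const user: User = appRouter.getUser({ id: 1 });
console.log(user.name); // "Ada"
```

This is also why the monorepo constraint exists: the client imports the router's type, which requires access to the server's source.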

On latency and protocol choice

On fast networks, the latency differences between REST, GraphQL, and tRPC are negligible. Protocol choice should be driven by team structure, client diversity, and caching needs rather than raw request latency.

The N+1 problem in GraphQL

The N+1 problem is the most dangerous operational risk in GraphQL. When a resolver fetches related data naively — one query to get N items, then one query per item for its associations — database load multiplies. Without batching and request-scoped caching, N+1 queries can multiply database load by 100x or more at scale.

The solution is DataLoader. Data loaders have become the production standard for mitigating N+1 problems: they collect the individual key lookups issued within a single GraphQL operation and batch them into one database query, dramatically improving latency and reducing database load.
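
The batching mechanism fits in a few lines. Below is a toy loader, a simplification of the real graphql/dataloader library (which also adds per-request caching): it queues keys requested in the same microtask tick and resolves them with one batched call.

```typescript
// Toy DataLoader: keys loaded in the same tick are resolved by a single
// call to the batch function.
class TinyLoader<K, V> {
  private queue: { key: K; resolve: (v: V) => void }[] = [];
  constructor(private batchFn: (keys: K[]) => Promise<V[]>) {}

  load(key: K): Promise<V> {
    return new Promise((resolve) => {
      this.queue.push({ key, resolve });
      // The first key enqueued in a tick schedules the flush for everyone.
      if (this.queue.length === 1) queueMicrotask(() => this.flush());
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue.splice(0);
    const values = await this.batchFn(batch.map((item) => item.key));
    batch.forEach((item, i) => item.resolve(values[i]));
  }
}

// Three author lookups from one resolver pass become one batched "query".
let batchCalls = 0;
const userLoader = new TinyLoader<number, string>(async (ids) => {
  batchCalls += 1; // stands in for: SELECT * FROM users WHERE id IN (ids)
  return ids.map((id) => `user-${id}`);
});

async function demo() {
  const names = await Promise.all([1, 2, 3].map((id) => userLoader.load(id)));
  return { names, batchCalls };
}
demo().then((result) => console.log(result)); // batchCalls: 1, not 3
```

Real DataLoader also caches by key within the request, so repeated loads of the same ID during one operation cost nothing extra.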

Fig 1 — N+1 naive fetch vs DataLoader batching within a single GraphQL operation

  Naive (N+1):           SELECT * FROM posts
                         SELECT * FROM users WHERE id=1
                         SELECT * FROM users WHERE id=2
                         ... N more queries        → 1 + N queries total

  DataLoader (batched):  SELECT * FROM posts
                         SELECT * FROM users WHERE id IN (…)  → 2 queries total

Backend-for-Frontend (BFF)

The BFF pattern adds a dedicated server layer between each client type and your internal services. Rather than exposing raw microservice APIs to frontend clients, a BFF aggregates, transforms, and tailors data for each surface — mobile, web, embedded devices — independently.

BFFs aggregate data from multiple microservices into task-centric APIs tailored to frontend UX needs rather than backend database schema structure. The word "task-centric" matters here. A BFF is not an API gateway with a thin wrapper — it reflects the operations users actually perform, not the shape of your internal data models.

Fan-out/fan-in is a fundamental composition pattern in this context: gateways initiate parallel downstream requests, collect responses, and assemble unified payloads. This is something your backend instincts will recognize as entirely natural — it is parallel service orchestration with aggregation, implemented at the API layer.
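
Those backend instincts map directly onto code. The sketch below is a hypothetical mobile-BFF handler; the three service calls are stand-ins for real HTTP or gRPC requests to internal services.

```typescript
// Stand-ins for downstream service calls (profile, orders, preferences).
async function fetchProfile(userId: number) { return { userId, name: "Ada" }; }
async function fetchOrders(_userId: number) { return [{ id: 101, total: 42 }]; }
async function fetchPrefs(_userId: number) { return { theme: "dark" }; }

// BFF handler: fan out in parallel, fan in to one task-centric payload
// shaped for the mobile dashboard rather than for the backend's schema.
async function mobileDashboard(userId: number) {
  const [profile, orders, prefs] = await Promise.all([
    fetchProfile(userId),
    fetchOrders(userId),
    fetchPrefs(userId),
  ]);
  return { name: profile.name, orderCount: orders.length, theme: prefs.theme };
}

mobileDashboard(1).then(console.log); // one roundtrip for the client, three in parallel behind it
```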

BFFs also double as security boundaries, protecting internal services from direct client access and centralizing sensitive business logic.

Where it breaks down. The BFF pattern is not free. Multiple uncoordinated BFFs become a distributed monolith; proper scaling requires BFFs owned per frontend context, not per organization. Multiple BFFs also introduce code duplication in aggregation and service-integration logic, and shared libraries create conflicting pressures between code reuse and architectural independence. Most importantly, BFF patterns introduce operational complexity that is unjustified for MVPs and single-platform applications; multi-platform demand and team growth are the signals that should trigger adoption.

BFF effectiveness critically depends on team ownership alignment: each BFF should be owned by the cross-functional team responsible for its frontend within a bounded context.

The ownership rule is the pattern's center of gravity. Without it, BFFs drift toward the distributed monolith failure mode regardless of architectural intent.

API gateways vs BFFs. These are related but distinct. API gateways provide centralized handling of cross-cutting concerns: authentication, authorization, rate limiting, request transformation, and logging. API gateways should remain thin, handling only infrastructure concerns — placing business logic in gateways creates a new monolith. BFFs own aggregation and client-specific shaping; gateways own infrastructure policy.


Schema-driven development

Schema-driven development is the practice of making the API contract the single source of truth that both sides compile against. The payoff is eliminating the type drift that causes runtime errors in production.

Code generation tools produce type definitions directly from API schemas, eliminating manual type duplication and ensuring frontend types always match API contracts at build time. Type-safe code generation significantly reduces integration bugs through compile-time checking; accessing a non-existent field becomes a compile error rather than a runtime bug.

For REST APIs, the standard toolchain uses OpenAPI. RTK Query code generation, for example, produces type-safe API slice definitions from OpenAPI and GraphQL schemas, eliminating manual endpoint-integration boilerplate. For GraphQL, schema-first development lets frontend and backend work in parallel: the shared schema contract is established before implementation, and mock servers allow frontend feature development before the backend is complete.
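
A hand-written sketch of what such generated output looks like: the interface below imitates what a tool like openapi-typescript might emit for a /users/{id} response (illustrative, not actual tool output). Because generated types vanish at runtime, teams often pair them with a runtime guard at the network boundary:

```typescript
// Stand-in for codegen output from an OpenAPI schema. If the backend's
// spec drops `email`, regeneration turns every read of user.email into
// a compile-time error instead of a production bug.
interface UserResponse {
  id: number;
  name: string;
  email: string;
}

// Compile-time types end at the network boundary; a runtime guard
// verifies the payload actually has the generated shape.
function isUserResponse(value: unknown): value is UserResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "number" &&
    typeof v.name === "string" &&
    typeof v.email === "string"
  );
}

console.log(isUserResponse({ id: 1, name: "Ada", email: "ada@example.com" })); // true
console.log(isUserResponse({ id: "1", name: "Ada" })); // false
```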

API versioning. Additive API changes — new endpoints, optional parameters — are non-breaking and don't require versioning, enabling long-lived single-version APIs that evolve gradually. Breaking changes like endpoint removal, field removal, and type changes require explicit versioning or structured deprecation, with best practice being a 6-month announcement, 12-month migration window, and 18–24 months before removal. Hybrid versioning combines API evolution with explicit versioning for truly breaking changes — a strategy Stripe has demonstrated successfully.

Consumer-driven contract testing. Schema compatibility alone is not enough — you also need to verify that the interactions consumers rely on are preserved. Consumer-driven contract testing with Pact detects integration breakages before deployment through concrete request-response examples; CI/CD integration with "can-i-deploy" prevents incompatible versions from being released. Consumer-driven contracts define expected provider interactions through actual usage patterns, ensuring providers don't break consumer functionality in microservice architectures.

Automated enforcement. Automated schema validation in CI/CD detects breaking changes at build time using tools like oasdiff, flagging removed properties, new required fields, and type changes. Schema-diffing tools automate detection and categorization of breaking versus non-breaking changes, making schema-driven workflows practical for organizations without dedicated governance specialists. The OpenAPI deprecated keyword enables machine-readable deprecation signaling in schemas, allowing tools to automatically warn developers about consuming deprecated endpoints.
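
In CI this is a one-line gate. A sketch of a GitHub Actions step, assuming oasdiff is installed on the runner and the two spec files use these hypothetical names (verify the exact flags against your installed oasdiff version):

```yaml
# Block the merge when the revised spec breaks the published contract.
- name: Detect breaking OpenAPI changes
  run: oasdiff breaking openapi-main.yaml openapi-pr.yaml --fail-on ERR
```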


Error boundaries and resilience

The frontend has to handle the cases your backend considers someone else's problem — network failures, partial service degradation, and cascading errors in the component tree.

React Error Boundaries. React error boundaries catch synchronous rendering errors in component trees and display fallback UIs instead of crashing. They use class component lifecycle methods and can be nested for granular error containment. Suspense and error boundaries are complementary: Suspense handles loading states while error boundaries catch rendering errors, providing complete resilience when combined.

Two important limitations: error boundaries cannot catch asynchronous data-fetching errors, which require separate library-specific mechanisms, and error boundaries do not function in SSR environments after HTML has been sent — developers must use stream-level error handling instead.

Retry strategies. Data-fetching libraries like React Query provide configurable retry strategies with exponential backoff and jitter mechanisms that prevent synchronized retry storms by adding randomness. This is the same exponential backoff with jitter pattern used in distributed systems — it maps directly from your backend experience.
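
The delay computation itself is small. Below is a sketch of the "full jitter" variant, where the exponential curve sets an upper bound and a uniform random draw below it de-synchronizes clients (React Query's default retryDelay is a plain capped exponential; jitter comes from supplying a custom function like this):

```typescript
// Full-jitter exponential backoff: clients retrying after a shared
// outage pick scattered delays instead of hammering in lockstep.
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 30_000): number {
  const bound = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * bound; // uniform in [0, bound)
}

// Bounds grow 200ms, 400ms, 800ms, ... and cap at 30s.
for (let attempt = 0; attempt < 4; attempt++) {
  console.log(`attempt ${attempt}: up to ${Math.min(30_000, 200 * 2 ** attempt)}ms`);
}
```

In React Query this plugs into the retryDelay option of a query's configuration.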

Graceful degradation. React Query supports offline recovery via request queuing and graceful degradation that maintains partial functionality while communicating failures to users. Error recovery UIs should provide actionable recovery options enabling users to retry or navigate away without full page reloads.

Error boundary scope

Error boundaries only catch synchronous rendering errors. Async errors from data fetching, errors inside event handlers, and SSR errors all fall outside their scope. You must compose error boundaries with library-level error handling (e.g., React Query's onError, stream error handling for SSR) to achieve full coverage.


Annotated Case Study

E-commerce platform: migrating from a monolithic REST API to a multi-client architecture

Context. A team has a monolithic REST backend serving a web frontend. A mobile app is being added, and the two clients need substantially different data shapes — the mobile app is bandwidth-constrained, the web app needs rich related data for a dashboard.

Decision 1: Protocol. The team evaluates GraphQL for query flexibility but the backend is a single service, the team is backend-heavy, and there is no existing GraphQL operational experience. GraphQL's learning curve is steeper than REST's resource-and-verb model, and complexity costs are potentially unjustified for simple CRUD applications. They stay with REST.

Decision 2: BFF adoption. Rather than adding GraphQL, they introduce a BFF per client platform. The mobile BFF returns slim payloads aggregated from three internal services in a single fan-out/fan-in call. The web BFF returns richer payloads optimized for the dashboard view. Critically, the team structures ownership so the mobile team owns the mobile BFF end-to-end — BFF effectiveness critically depends on team ownership alignment.

Decision 3: Schema-driven types. Both BFFs publish OpenAPI specs. The mobile and web frontend teams run code generation at build time — code generation tools eliminate manual type duplication and ensure frontend types always match API contracts at build time. They add oasdiff to CI to block merges that introduce breaking changes without a version bump.

Decision 4: Contract testing. The mobile team uses Pact to record the exact request-response shapes their client depends on. Consumer-driven contract testing with Pact detects integration breakages before deployment. The "can-i-deploy" check in CI prevents the mobile BFF team from shipping a change that would silently break the app.

Decision 5: Error handling. The web dashboard wraps major data panels in nested error boundaries. When the recommendations service degrades, the recommendations panel shows a fallback without crashing the rest of the dashboard — graceful degradation maintains partial functionality while communicating failures to users. React Query's retry config uses exponential backoff with jitter to avoid hammering a recovering service.

Outcome. Mobile latency drops because the mobile BFF returns exactly what the app needs in one roundtrip instead of three. Type drift bugs disappear after codegen adoption. The "can-i-deploy" gate catches two breaking API changes in the first month before they reach production.


Compare & Contrast

REST vs GraphQL vs tRPC — decision matrix

| Dimension | REST | GraphQL | tRPC |
| --- | --- | --- | --- |
| HTTP caching | Native (CDN, Cache-Control) | Application-layer only | Application-layer only |
| Type safety | Via codegen from OpenAPI | Via codegen from schema | Native (TypeScript inference) |
| Client flexibility | Fixed endpoints, over/under-fetch risk | Flexible per-query | Fixed procedures |
| N+1 risk | Low (controlled by endpoint design) | High without DataLoader | Low (controlled by procedure design) |
| Rate limiting | Straightforward per-endpoint | Complex (cost analysis required) | Straightforward |
| Multi-client support | Moderate (BFF recommended) | Strong native | Weak (monorepo constraint) |
| Monorepo required | No | No | Yes for type safety |
| Team fit | Any | Multi-client, polyglot | TypeScript full-stack, monorepo |
| Operational maturity | Decades | 10+ years | Newer, smaller ecosystem |
| Scale ceiling | Effectively unlimited | Unlimited with DataLoader | ~100–200 endpoints (TS inference) |

BFF vs API Gateway

These are often confused because they both sit in front of internal services. The distinction is intent.

| Dimension | API Gateway | Backend-for-Frontend |
| --- | --- | --- |
| Purpose | Infrastructure policy (auth, rate limiting, routing) | Client-specific aggregation and shaping |
| Business logic | None — should stay thin | Yes — task-centric composition |
| Owner | Platform/infra team | Frontend product team |
| Number | One (or a few regional) | One per client platform |
| Failure mode | Becomes a bottleneck if overloaded | Becomes a distributed monolith if uncoordinated |

Boundary Conditions

When REST breaks down. REST struggles when many clients with genuinely different data needs hit the same API. Over-fetching becomes a real cost on mobile. The BFF pattern is the idiomatic solution; switching to GraphQL is the higher-investment alternative when query flexibility is the core need.

When GraphQL breaks down. GraphQL is poorly suited to simple CRUD services with a single client. The operational cost — application-layer caching, complexity-based rate limiting, DataLoader setup, schema governance — only pays off at multi-client scale. Without DataLoader, the N+1 problem can multiply database load by 100x or more at scale.

When tRPC breaks down. tRPC's automatic type synchronization is unavailable outside a monorepo — the type sharing that makes it work requires shared source-code access. Above 100–200 endpoints, TypeScript type inference overhead degrades IDE and build performance. And synchronized deployment prevents independent scaling or staged rollouts, which matters in organizations running continuous delivery at scale.

When BFF breaks down. BFF patterns introduce operational complexity unjustified for MVPs and single-platform applications. Adding a BFF layer adds latency, deployment complexity, and an additional failure point. Multiple uncoordinated BFFs become a distributed monolith. An alternative at GraphQL scale is federation: GraphQL Federation allows domain teams to contribute subgraphs to a shared supergraph, providing an alternative to BFF sprawl.

When schema-driven codegen breaks down. Codegen only helps if the schemas are accurate and kept up-to-date. Schemas that drift from implementation are worse than no schemas — they produce false confidence. Consumer-driven contract testing (Pact) catches this class of failure. Automated breaking-change detection in CI with tools like oasdiff prevents the schema from diverging from the implementation.

When error boundaries break down. Error boundaries only catch synchronous component rendering errors. They cannot catch asynchronous data-fetching errors, errors inside event handlers, or errors in SSR contexts after HTML has been sent. Full resilience requires composing error boundaries with React Query's onError, Suspense, and stream-level SSR error handling.


Quiz

1. A REST API returns a response with Cache-Control: max-age=60. A user makes the same request 30 seconds later. What happens at the browser level?

  • A) A new network request is sent immediately.
  • B) The cached response is returned without a network request.
  • C) A conditional If-None-Match request is sent.
  • D) The response is served stale while a background revalidation starts.

Correct: B. Within the max-age window, the browser serves from cache without a network request.


2. Your GraphQL server resolves a list of 50 posts and for each post fetches its author. You measure 51 database queries per request. What is the solution?

  • A) Switch to REST.
  • B) Add a DataLoader to batch author queries by ID.
  • C) Reduce the query page size to 10 posts.
  • D) Add a Redis cache in front of the database.

Correct: B. DataLoader collects author ID requests within a single GraphQL operation and issues a single batched query (WHERE id IN (...)), reducing 51 queries to 2.


3. Your team is building a TypeScript monorepo with a Next.js frontend and a Node.js backend. You want end-to-end type safety with zero schema files. Which protocol is the best fit?

  • A) REST with OpenAPI codegen.
  • B) GraphQL with schema-first development.
  • C) tRPC.
  • D) gRPC with protobuf.

Correct: C. tRPC achieves end-to-end TypeScript type safety through shared source code in a monorepo, requiring no separate schema files.


4. A React error boundary wraps a component that fetches user data with fetch() inside a useEffect. The fetch fails with a 500. Does the error boundary catch this?

  • A) Yes — error boundaries catch all errors in the component tree.
  • B) No — error boundaries cannot catch asynchronous errors from data fetching.
  • C) Yes — if the component re-throws the error during render.
  • D) Only if the error boundary uses getDerivedStateFromError.

Correct: B. Error boundaries only catch synchronous rendering errors. An async fetch error inside useEffect falls outside their scope and requires library-level handling (e.g., React Query's onError).


5. You're adding a new optional field to an existing REST endpoint response. Does this require an API version bump?

  • A) Yes — any schema change requires versioning.
  • B) No — additive changes like new optional fields are non-breaking.
  • C) Only if clients use strict JSON parsing.
  • D) Only if the field is nullable.

Correct: B. Additive changes — new endpoints or optional response fields — are non-breaking and don't require versioning. Breaking changes like field removal or type changes require a version bump or structured deprecation.

Key Takeaways

  1. Protocol choice is an organizational decision, not a performance one. On fast networks, REST, GraphQL, and tRPC have negligible latency differences. Choose based on team structure, client diversity, and monorepo constraints.
  2. GraphQL without DataLoader is a reliability risk at scale. The N+1 problem can multiply database load by 100x. DataLoader batching is non-optional in production GraphQL.
  3. BFFs require team ownership to work. A BFF owned by the platform team rather than the frontend team drifts toward a distributed monolith. Ownership alignment is the pattern's critical requirement.
  4. Schema-driven codegen makes the API contract a compile-time guarantee. Running type generation from OpenAPI or GraphQL schemas eliminates the type drift that causes runtime errors, and automated breaking-change detection in CI prevents accidental regressions.
  5. Error boundaries are scoped to synchronous rendering. Building resilient UIs requires composing error boundaries with library-level async error handling (React Query retry, onError) and stream-level SSR error handling — no single mechanism covers all failure modes.

Further Exploration

tRPC

  • tRPC documentation — Official tRPC docs cover the monorepo setup, router composition, and the TypeScript inference model.

GraphQL

  • GraphQL Learn (graphql.org) — Official introduction to schemas, queries, and execution, including guidance on serving GraphQL over HTTP.
  • DataLoader (graphql/dataloader) — Reference implementation and README covering the batch function contract and per-request caching.

Contract Testing

  • Pact documentation — Consumer-driven contract testing, the can-i-deploy check, and CI/CD integration guides.

API Schemas

  • OpenAPI Specification — The specification itself, including operation and parameter deprecation and schema evolution conventions.

Architectural Patterns

  • "Backends For Frontends" (Sam Newman) — The canonical write-up of the BFF pattern, including the team ownership model.

React

  • React documentation — Error Boundaries (getDerivedStateFromError, componentDidCatch) and Suspense for loading and error states.