Architecture Decision Capstone

Applying the full curriculum to real product decisions

Learning Objectives

By the end of this module you will be able to:

  • Make a reasoned rendering strategy choice (CSR, SSR, ISR, RSC, islands) given a concrete product scenario with stated content and performance constraints.
  • Design a state management architecture that correctly separates UI state, server state, and URL state, with a rationale for each library or approach.
  • Select an API contract style (REST, GraphQL, or tRPC) and justify it against team size, type-safety requirements, and client diversity.
  • Choose an authentication model suited to the client type (SPA, SSR app, mobile) and threat model, identifying where the security boundary sits.
  • Specify a frontend testing strategy that allocates effort across unit, integration, visual regression, and E2E layers proportionally to risk.
  • Articulate the risks that tight server-client coupling introduces in full-stack frameworks and name concrete mitigation strategies drawn from backend engineering practice.

Annotated Case Study

The scenario

A four-person backend team at a logistics company has been asked to deliver a new self-service portal. Users will manage shipments, track live delivery status, and download invoices. The platform must serve both web browsers and a mobile app that a contract team will build in parallel. The company's existing backend is a set of Go microservices; the new portal will call them via internal APIs.

The team has no dedicated frontend engineers. Everyone has solid Go and some TypeScript. They have six weeks to a working demo and need choices they can defend to the principal engineer reviewing their architecture proposal.

Work through each decision below and read the annotations before moving on.


Decision 1: Rendering strategy

The question. Shipment lists and invoice history are mostly static per page load. The live tracking map updates every few seconds. Which rendering strategy?

A naive answer. "We'll use CSR everywhere because we know React and it's simple." This is defensible for a prototype but ignores real performance consequences. Client-side rendering tends to create fetch waterfalls: four 300 ms requests that fire one after another as nested components mount add up to roughly 1.2 seconds of waiting even on a fast network, whereas a server that issues the same requests in parallel finishes in about 300 ms.
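
The arithmetic behind those two figures can be sketched as follows. This is a simple model, not a measurement: the 300 ms latencies come from the scenario, and waterfallMs, parallelMs, loadInSequence, and loadInParallel are hypothetical helper names introduced only for illustration.

```typescript
// Latency model: requests that run one after another add up;
// requests issued together finish when the slowest one does.
function waterfallMs(latenciesMs: number[]): number {
  return latenciesMs.reduce((total, ms) => total + ms, 0);
}

function parallelMs(latenciesMs: number[]): number {
  return latenciesMs.length === 0 ? 0 : Math.max(...latenciesMs);
}

type Loader<T> = () => Promise<T>;

// The waterfall shape: each request starts only after the previous one resolves.
async function loadInSequence<T>(loaders: Loader<T>[]): Promise<T[]> {
  const results: T[] = [];
  for (const load of loaders) {
    results.push(await load());
  }
  return results;
}

// The parallel shape: every request starts immediately.
async function loadInParallel<T>(loaders: Loader<T>[]): Promise<T[]> {
  return Promise.all(loaders.map((load) => load()));
}

const scenarioLatenciesMs = [300, 300, 300, 300];
console.log(waterfallMs(scenarioLatenciesMs)); // 1200
console.log(parallelMs(scenarioLatenciesMs)); // 300
```

The two async helpers show the code shape each number corresponds to. Whether a real page ends up in the sequential shape depends on where its data fetching is triggered, not on the helper names used here.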

A better answer. Decompose by content type.

  • Shipment lists and invoice history: use SSR or ISR. The pages are data-heavy but not real-time. SSR pre-renders HTML server-side, eliminating client waterfalls for the initial load. For pages where the underlying data changes rarely, ISR (Incremental Static Regeneration) pre-renders at build time and regenerates in the background after a revalidation window, providing maximum CDN performance with manageable staleness.

  • Live tracking map: the one genuinely interactive, real-time region, and a strong candidate for an islands architecture. The rest of the tracking page renders as static HTML; a single hydrated island handles the live map. Islands load and hydrate independently and in parallel, so a slow map island does not block other page regions from becoming interactive.

Backend parallel

This is the frontend equivalent of deploying a read-heavy query to a CDN cache and a write-heavy endpoint to a stateful worker. Not everything runs on the same infrastructure simply because it's in the same product.

What the evidence says about hydration costs. Hydration cost scales sharply on mobile devices. Reported figures put the average Time to Interactive of a React SSR application at 4.2 seconds on mobile, and 53% of users abandon sites that take longer than 3 seconds to load. Partial hydration cuts the JavaScript payload to roughly half of what full-page hydration ships. Islands architecture gets there by rendering HTML on the server with placeholders for dynamic regions, which are then hydrated separately, and in parallel, as self-contained widgets.

The annotation. The architecture proposal should not pick one rendering mode globally. It should name the rendering mode per page type, explain the performance rationale, and note which framework feature (Next.js ISR, Astro islands, or React Suspense streaming) will implement it. Vague answers like "we'll see" indicate the decision hasn't been made — it's just been deferred, and deferral here means discovering the problem six weeks later under demo pressure.
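
One way to make "name the rendering mode per page type" checkable is to record the decision as data. The sketch below is illustrative only: the route names, rationale strings, and the undecidedRoutes helper are assumptions, not part of any framework.

```typescript
type RenderingMode = "CSR" | "SSR" | "ISR" | "RSC" | "islands";

interface RouteDecision {
  mode: RenderingMode;
  rationale: string;
  // The framework feature expected to implement the mode.
  implementedWith: string;
}

// Hypothetical routes for the logistics portal.
const renderingPlan: Record<string, RouteDecision> = {
  "/shipments": {
    mode: "SSR",
    rationale: "Data-heavy list, not real-time; avoid client waterfalls on first load",
    implementedWith: "server-rendered route",
  },
  "/invoices": {
    mode: "SSR",
    rationale: "Per-user history, mostly static per page load",
    implementedWith: "server-rendered route",
  },
  "/tracking/[id]": {
    mode: "islands",
    rationale: "Static shell plus one live map region",
    implementedWith: "single hydrated map island",
  },
};

// Returns every route that still has no recorded decision.
function undecidedRoutes(allRoutes: string[], plan: Record<string, RouteDecision>): string[] {
  return allRoutes.filter((route) => !(route in plan));
}

console.log(undecidedRoutes(["/shipments", "/tracking/[id]", "/settings"], renderingPlan));
```

A route that shows up in the undecided list is the "we'll see" answer made visible before the demo rather than during it.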


Decision 2: State management

The question. The portal has a logged-in user context, a filter/sort/pagination state on the shipment list, and live tracking data that polls every five seconds. Where does each live?

A naive answer. "Redux for everything, like we always see in tutorials." Redux enforces centralized, immutable state management with explicit actions and pure reducers, making changes predictable — but this comes with boilerplate cost that is rarely justified when state concerns are separate.

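A better answer. Apply the three-category split (server state, URL state, UI state) and keep a small fourth bucket for the little state that is genuinely global: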

Server state: shipment lists, invoice data, live tracking. This is data that lives remotely, can be modified by others, goes stale, and needs background refetching plus loading and error states. Server state is not client state. It belongs in a dedicated server state library (TanStack Query, SWR). TanStack Query implements stale-while-revalidate semantics: it serves cached content immediately and revalidates in the background, trading a moment of staleness for speed. For the live tracking poll, set refetchInterval to five seconds. staleTime alone does not poll: it only decides when cached data counts as stale and is therefore refetched on the next mount, window focus, or reconnect.
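
To make the stale-while-revalidate idea concrete, here is a minimal sketch of the cache decision in plain TypeScript. It is not TanStack Query's or SWR's implementation; the class name, the injected clock, and the cache key are assumptions chosen so the behaviour is deterministic.

```typescript
interface CacheEntry<T> {
  value: T;
  fetchedAtMs: number;
}

interface ReadResult<T> {
  value: T | undefined;
  // True when the caller should start a background refetch.
  shouldRevalidate: boolean;
}

class StaleWhileRevalidateCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly staleTimeMs: number;
  private readonly now: () => number;

  constructor(staleTimeMs: number, now: () => number) {
    this.staleTimeMs = staleTimeMs;
    this.now = now;
  }

  // Always returns whatever is cached, even if stale,
  // and reports whether it needs refreshing.
  read(key: string): ReadResult<T> {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      return { value: undefined, shouldRevalidate: true };
    }
    const ageMs = this.now() - entry.fetchedAtMs;
    return { value: entry.value, shouldRevalidate: ageMs >= this.staleTimeMs };
  }

  // Called when a fetch completes.
  write(key: string, value: T): void {
    this.entries.set(key, { value, fetchedAtMs: this.now() });
  }
}

// Usage with a fake clock (milliseconds).
let clockMs = 0;
const trackingCache = new StaleWhileRevalidateCache<string>(5000, () => clockMs);
trackingCache.write("shipment-42", "in_transit");

clockMs = 1000;
const fresh = trackingCache.read("shipment-42"); // cached value, no refetch requested

clockMs = 6000;
const stale = trackingCache.read("shipment-42"); // same cached value, refetch requested
```

The point of the sketch is the second read: the caller still gets a value to render immediately, and the freshness decision is separate from the rendering decision. A server state library adds request deduplication, retries, and subscriptions on top of this core.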

URL state: the filter, sort order, and current page on the shipment list. Treating the URL as the source of truth eliminates duplicate state and keeps the address bar and the UI in sync automatically. It gives the portal shareability (users can send a filtered view to a colleague), bookmarkability, and correct back-button behavior with no extra code. useSearchParams (React Router) or the framework's equivalent reads and writes this state much like useState, except that the value persists in the URL.
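
The framework hook is only the transport. The part worth designing is the mapping between the query string and a typed, validated state object. A minimal sketch, assuming hypothetical parameter names (status, sort, page) and defaults:

```typescript
interface ShipmentListState {
  status: string | null;
  sort: "eta" | "created";
  page: number;
}

// Parse a query string into typed state, falling back to defaults
// for anything missing or malformed.
function parseListState(search: string): ShipmentListState {
  const params = new URLSearchParams(search);
  const sort = params.get("sort") === "created" ? "created" : "eta";
  const page = Number.parseInt(params.get("page") ?? "1", 10);
  return {
    status: params.get("status"),
    sort,
    page: Number.isInteger(page) && page > 0 ? page : 1,
  };
}

// Serialize state back to a query string, omitting default values
// so shared links stay short.
function serializeListState(state: ShipmentListState): string {
  const params = new URLSearchParams();
  if (state.status) params.set("status", state.status);
  if (state.sort !== "eta") params.set("sort", state.sort);
  if (state.page !== 1) params.set("page", String(state.page));
  return params.toString();
}

const parsed = parseListState("?status=delayed&page=3");
console.log(parsed); // { status: "delayed", sort: "eta", page: 3 }
console.log(serializeListState(parsed)); // status=delayed&page=3
```

Because the URL is user-editable input, the parser treats it like any other untrusted request parameter: invalid values fall back to defaults instead of reaching the data layer.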

UI state — the open/closed state of a dropdown, the hover state of a row, whether an inline form is visible. This stays in component-local useState. When state is local, only that component and its direct children re-render when it updates — a significant performance advantage over unnecessarily lifted state.

Global client state — authenticated user info, session data, notification count. These are legitimately global. A lightweight library (Zustand: ~1-2KB) or React Context is appropriate. The recommendation against global stores as a default bears repeating: putting state into global stores preemptively is a common anti-pattern that introduces unnecessary complexity and disconnects state from the components using it.
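
For a sense of how little machinery genuinely global state needs, here is a minimal observable store. It is a sketch of the general pattern, not Zustand's API; createStore, SessionState, and the field names are hypothetical.

```typescript
type Listener<S> = (state: S) => void;

interface Store<S> {
  getState(): S;
  setState(partial: Partial<S>): void;
  subscribe(listener: Listener<S>): () => void;
}

function createStore<S extends object>(initial: S): Store<S> {
  let state = initial;
  const listeners = new Set<Listener<S>>();
  return {
    getState: () => state,
    // Shallow-merge the update and notify every subscriber.
    setState(partial) {
      state = { ...state, ...partial };
      listeners.forEach((listener) => listener(state));
    },
    // Returns an unsubscribe function.
    subscribe(listener) {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}

interface SessionState {
  userName: string | null;
  unreadNotifications: number;
}

const sessionStore = createStore<SessionState>({ userName: null, unreadNotifications: 0 });
```

Note what is absent from SessionState: no shipment lists, no filters, no modal flags. Those already have homes in the other three categories.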

Server state is not client state. Treat them differently from day one and you will never fight a cache invalidation battle you didn't see coming.

The annotation. A state management architecture that mixes server data, URL filters, and UI toggles in a single Redux store is a design smell recognizable from backend work: it is the equivalent of storing session context, query parameters, and application config in the same database table. The appropriate response is the same — separate concerns by kind, not by convenience.


Decision 3: API contract style

The question. The portal must talk to Go microservices internally and expose something the contract mobile team can use. Which API contract style?

Context that matters. The mobile team is external. The backend is a monorepo of Go services. The frontend and backend teams overlap — some engineers work both sides.

Evaluating the options.

tRPC provides end-to-end type safety without a schema definition language by inferring types directly from the backend's procedure definitions; when a backend procedure changes, TypeScript immediately surfaces the change in the frontend. This is compelling for a small team where backend and frontend engineers are the same people in a TypeScript monorepo. The constraints: the server must itself be TypeScript, so the Go services cannot expose tRPC directly; outside a shared codebase tRPC loses most of its advantage; and because client and server types move together, the two are usually deployed in lockstep, which complicates independent or staged rollouts. The mobile team cannot consume tRPC without a TypeScript client.

GraphQL excels when serving multiple clients with different data requirements through query flexibility. The mobile team gets to request exactly the fields it needs; the portal does the same. The costs: by default GraphQL sends every query as a POST to a single endpoint, which bypasses the HTTP-layer caching that REST leverages and pushes all caching logic into the application layer. GraphQL's learning curve is steeper than REST's resource-and-verb model. And the N+1 query problem, where a list query resolves a nested field with one extra database query per row, can multiply database load by 100x or more at scale unless resolvers use DataLoader-style batching.
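
The N+1 problem and its batching fix can be shown without a GraphQL server. The sketch below uses an in-memory table that counts queries; the types, the table, and both resolver functions are hypothetical, and real DataLoader-style libraries batch asynchronously per request rather than through an explicit function call like this.

```typescript
interface Shipment { id: string; carrierId: string }
interface Carrier { id: string; name: string }

// In-memory stand-in for a database that counts the queries it receives.
class CarrierTable {
  queryCount = 0;
  private readonly rows: Map<string, Carrier>;

  constructor(rows: Carrier[]) {
    this.rows = new Map(rows.map((row): [string, Carrier] => [row.id, row]));
  }

  findOne(id: string): Carrier | undefined {
    this.queryCount += 1;
    return this.rows.get(id);
  }

  findMany(ids: string[]): Carrier[] {
    this.queryCount += 1;
    const found: Carrier[] = [];
    for (const id of ids) {
      const row = this.rows.get(id);
      if (row !== undefined) found.push(row);
    }
    return found;
  }
}

// Naive resolution: one carrier query per shipment.
function carriersNaive(shipments: Shipment[], table: CarrierTable): (Carrier | undefined)[] {
  return shipments.map((s) => table.findOne(s.carrierId));
}

// Batched resolution: collect the distinct keys, then issue a single query.
function carriersBatched(shipments: Shipment[], table: CarrierTable): (Carrier | undefined)[] {
  const uniqueIds = Array.from(new Set(shipments.map((s) => s.carrierId)));
  const byId = new Map(table.findMany(uniqueIds).map((c): [string, Carrier] => [c.id, c]));
  return shipments.map((s) => byId.get(s.carrierId));
}

const carrierRows: Carrier[] = [{ id: "c1", name: "Northline" }, { id: "c2", name: "Eastway" }];
const shipments: Shipment[] = [];
for (let i = 0; i < 100; i += 1) {
  shipments.push({ id: `s${i}`, carrierId: i % 2 === 0 ? "c1" : "c2" });
}

const naiveTable = new CarrierTable(carrierRows);
const naiveResult = carriersNaive(shipments, naiveTable);
const batchedTable = new CarrierTable(carrierRows);
const batchedResult = carriersBatched(shipments, batchedTable);

console.log(naiveTable.queryCount, batchedTable.queryCount); // 100 1
```

Both functions return the same carriers in the same order; only the number of round trips to the data store differs.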

REST retains dominance through ecosystem stability and decades of standardization, with universal library support across languages, including Go, Swift, and Kotlin. REST APIs enable practical HTTP-layer caching through Cache-Control headers and CDN integration. Paired with an explicit schema such as OpenAPI, REST also supports automatic breaking-change detection and code generation.

A defensible answer for this scenario. REST for the external-facing API consumed by the mobile team. tRPC for internal portal-to-BFF calls where the team controls both sides and shares a TypeScript codebase. The Backend-for-Frontend sits between them: it aggregates internal Go microservices into task-centric APIs tailored to frontend UX needs, exposes REST outward, and keeps BFF ownership aligned with the frontend team. API gateways should remain thin, handling only infrastructure concerns — placing business logic in gateways creates a new monolith.
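
The aggregation role of the BFF can be pictured as a pure mapping from internal service payloads to the shape one screen needs. The record shapes, field names, and toShipmentListItems below are assumptions for illustration; the scenario does not specify the Go services' actual payloads.

```typescript
// Hypothetical response shapes from two internal Go services.
interface ShipmentRecord {
  shipment_id: string;
  status_code: "CREATED" | "IN_TRANSIT" | "DELIVERED";
  eta_unix: number; // seconds since the Unix epoch
}

interface InvoiceRecord {
  invoice_id: string;
  shipment_id: string;
  amount_cents: number;
  paid: boolean;
}

// Task-centric shape that the shipment list screen renders.
interface ShipmentListItem {
  id: string;
  status: string;
  etaIso: string;
  outstandingCents: number;
}

function toShipmentListItems(
  shipments: ShipmentRecord[],
  invoices: InvoiceRecord[],
): ShipmentListItem[] {
  // Sum unpaid invoice amounts per shipment.
  const outstanding = new Map<string, number>();
  for (const invoice of invoices) {
    if (!invoice.paid) {
      const current = outstanding.get(invoice.shipment_id) ?? 0;
      outstanding.set(invoice.shipment_id, current + invoice.amount_cents);
    }
  }
  return shipments.map((s) => ({
    id: s.shipment_id,
    status: s.status_code.toLowerCase(),
    etaIso: new Date(s.eta_unix * 1000).toISOString(),
    outstandingCents: outstanding.get(s.shipment_id) ?? 0,
  }));
}
```

The same function can back both the internal portal call and the outward REST endpoint, which keeps the join logic in one place instead of duplicating it in the web and mobile clients.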

When mobile enters the picture

Any API contract that requires a TypeScript runtime on the consumer side cannot be used by native mobile clients. Decide the external contract first and build inward from there.

The annotation. The correct answer here is not "pick one style globally." It is recognizing that the internal and external boundaries have different requirements. On fast modern networks, protocol choice differences in latency are negligible — developer experience and organizational fit matter more than performance micro-optimizations. Choose based on team topology, not benchmarks.


Decision 4: Authentication model

The question. The portal is an SSR web app. The mobile app is a native client. Users are business customers who authenticate via the company's existing identity provider (OIDC). Where do tokens live? Where does the security boundary sit?

The principle that does not change. Client-side authorization cannot be trusted to enforce access control; all security decisions must occur server-side for every request. The client environment is not trustworthy even with HTTPS encryption. This is not a frontend-specific lesson — it is the same boundary as in backend work, applied to a new layer.

Web portal (SSR). The most secure current model is the BFF pattern: the backend handles OAuth, issues HttpOnly session cookies to the frontend, and stores tokens server-side. The BFF establishes a genuine security boundary where backend tokens never reach the browser, preventing XSS-based token theft. Session cookies must be configured with HttpOnly (prevent JavaScript access), Secure (HTTPS-only), and SameSite (prevent cross-site sending).
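
The three attributes end up in a single Set-Cookie response header. A minimal sketch of building that header value, assuming a hypothetical cookie name and lifetime; most server frameworks expose a helper that produces the same string:

```typescript
interface SessionCookieOptions {
  name: string;
  value: string;
  maxAgeSeconds: number;
  sameSite: "Strict" | "Lax";
}

function buildSessionCookie(options: SessionCookieOptions): string {
  return [
    `${options.name}=${encodeURIComponent(options.value)}`,
    "Path=/",
    `Max-Age=${options.maxAgeSeconds}`,
    "HttpOnly", // not readable from JavaScript
    "Secure", // only sent over HTTPS
    `SameSite=${options.sameSite}`, // restricts cross-site sending
  ].join("; ");
}

console.log(
  buildSessionCookie({ name: "session", value: "abc123", maxAgeSeconds: 3600, sameSite: "Lax" }),
);
// session=abc123; Path=/; Max-Age=3600; HttpOnly; Secure; SameSite=Lax
```

The cookie value here is an opaque session identifier. The OAuth tokens themselves stay in the BFF's server-side session store, which is what makes the boundary real.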

If a full BFF is out of scope for the MVP, the hybrid fallback is: access tokens in JavaScript memory (not localStorage), refresh tokens in HttpOnly cookies. localStorage is fundamentally vulnerable to XSS: any JavaScript running in the page's origin can read stored tokens. The hybrid approach narrows that exposure at a UX cost: an in-memory access token disappears on every page reload, so the app must silently obtain a new one through the refresh cookie before the user notices.

Mobile app (native client). Cookies do not work cleanly for native clients. Bearer tokens in the Authorization header are inherently CSRF-resistant because, unlike cookies, they are never attached to a request automatically; client code must add them explicitly. PKCE (Proof Key for Code Exchange) is mandatory in the OAuth 2.1 draft for every client using the Authorization Code flow. It binds the authorization request to the token exchange, which prevents authorization code interception and injection attacks. The Implicit flow is deprecated: use the Authorization Code flow with PKCE.
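
The binding that PKCE provides comes from one hash. The sketch below follows the S256 method from RFC 7636 and assumes a Node.js runtime for the crypto module; a native client would do the same computation with its platform's crypto library. The function names are hypothetical.

```typescript
import { createHash, randomBytes } from "node:crypto";

// The client generates a high-entropy secret and keeps it locally.
// 32 random bytes encode to 43 base64url characters, the RFC 7636 minimum length.
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256 method: code_challenge = BASE64URL(SHA256(ASCII(code_verifier))).
// Only the challenge is sent with the authorization request.
function deriveCodeChallenge(codeVerifier: string): string {
  return createHash("sha256").update(codeVerifier, "ascii").digest("base64url");
}

// At the token exchange the client sends the original verifier. The
// authorization server recomputes the challenge and compares, so an
// intercepted authorization code is useless without the verifier.
const verifier = createCodeVerifier();
const challenge = deriveCodeChallenge(verifier);
console.log({ verifierLength: verifier.length, challengeLength: challenge.length });
```

In practice the team would rely on a maintained OIDC client library rather than hand-rolling this. The sketch is only meant to show why the verifier never needs to leave the device until the final exchange.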

Middleware is not a security boundary. Middleware in SSR frameworks improves efficiency by redirecting unauthenticated users before component rendering, but the authoritative authentication and authorization checks must be enforced in Route Handlers, Server Actions, and the Data Access Layer. This is the frontend equivalent of not trusting reverse proxy headers as a substitute for application-level auth.
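
What "enforced in the Data Access Layer" looks like can be sketched with a single data access function. Everything named here (Session, Invoice, the in-memory map, getInvoiceForSession) is a hypothetical stand-in; in the portal, the lookup would call the invoice service.

```typescript
interface Session { userId: string; customerId: string }
interface Invoice { id: string; customerId: string; amountCents: number }

class UnauthenticatedError extends Error {}
class ForbiddenError extends Error {}

// In-memory stand-in for the invoice service.
const invoiceRows = new Map<string, Invoice>([
  ["inv-1", { id: "inv-1", customerId: "acme", amountCents: 12500 }],
]);

// Re-checks the session and ownership on every call, regardless of
// what middleware did earlier in the request.
function getInvoiceForSession(session: Session | null, invoiceId: string): Invoice {
  if (session === null) {
    throw new UnauthenticatedError("No valid session");
  }
  const invoice = invoiceRows.get(invoiceId);
  // Same error for "missing" and "not yours" so callers cannot probe for IDs.
  if (invoice === undefined || invoice.customerId !== session.customerId) {
    throw new ForbiddenError("Invoice not available");
  }
  return invoice;
}
```

If a route is ever added without matching middleware configuration, this function still refuses the request. That is the property a redirect in middleware cannot provide on its own.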

The annotation. The architecture proposal should specify: which OAuth flow (Authorization Code + PKCE), where tokens are stored per client type, how the CSRF surface is handled (SameSite + explicit CSRF tokens for cookie-based flows), and who owns the session. A proposal that says "JWT in localStorage" with no further comment has answered one of those four questions poorly and left the other three unanswered.


Decision 5: Testing strategy

The question. The team has six weeks to demo. They cannot test everything. Where should effort go?

Reject the naive pyramid. The traditional testing pyramid is economically misaligned with modern frontend development. Component unit tests buy speed at the cost of confidence when they assert on implementation details rather than user-facing behavior, and heavy mocking creates false confidence that collapses when real integrations fail. A suite can report 80%+ code coverage while still missing critical DOM, event, and state management bugs.

Adopt the trophy model. The testing trophy reshapes the pyramid by making integration tests the largest layer, on the principle that tests resembling actual usage provide more confidence.

For this portal, the allocation should be:

Component integration tests (primary layer). Test entire feature subtrees: the shipment list with filters, the invoice download flow, the tracking page with its island. Mock Service Worker (MSW) intercepts requests at the network boundary (through the Service Worker API in the browser, and through request interception in Node-based test runners), so application code stays unchanged and the same mock handlers can be reused across unit tests, integration tests, E2E tests, and local development. Query by accessible role (getByRole) by default, not by test ID or CSS class name. Role queries exercise the component through the accessibility tree, so an inaccessible component becomes hard to test and accessibility is built into the test-writing workflow.

Unit tests (narrow layer). Pure utility functions, date formatting, status mappers. Not component structure.
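
A status mapper is the kind of code that belongs in this layer: pure, branch-heavy, and cheap to test exhaustively. The status values and labels below are hypothetical, and the checks are written as plain assertions so the sketch does not assume a particular test runner.

```typescript
type ShipmentStatus = "created" | "in_transit" | "out_for_delivery" | "delivered" | "exception";

function statusLabel(status: ShipmentStatus): string {
  switch (status) {
    case "created":
      return "Label created";
    case "in_transit":
      return "In transit";
    case "out_for_delivery":
      return "Out for delivery";
    case "delivered":
      return "Delivered";
    case "exception":
      return "Delivery exception";
    default: {
      // Compile-time exhaustiveness check: adding a new status without
      // a label makes this assignment a type error.
      const unreachable: never = status;
      return unreachable;
    }
  }
}

// Table-driven unit check: one row per input, no mocks, no DOM.
const labelCases: Array<[ShipmentStatus, string]> = [
  ["created", "Label created"],
  ["in_transit", "In transit"],
  ["out_for_delivery", "Out for delivery"],
  ["delivered", "Delivered"],
  ["exception", "Delivery exception"],
];

for (const [input, expected] of labelCases) {
  const actual = statusLabel(input);
  if (actual !== expected) {
    throw new Error(`statusLabel(${input}) returned ${actual}, expected ${expected}`);
  }
}
```

Nothing here renders a component, which is the point: how the label appears in the shipment row is covered by the integration layer.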

E2E tests (critical paths only). Limit to critical business flows: login, shipment filter, invoice download. E2E tests should represent 5-10% of the total test suite. End-to-end tests should be independent and isolated — each test creates and cleans up its own data.

Visual regression (deferred but planned). Not required for the demo, but the architecture proposal should identify where Storybook stories will anchor future visual regression tests. Teams with limited resources should prioritize component-level visual testing as their primary regression layer before page-level testing.

Framework choice note

Playwright surpassed Cypress in weekly npm downloads in mid-2024 and has kept that lead into 2026. It supports Chromium, Firefox, and WebKit through a unified API, making it the pragmatic choice for cross-browser E2E, especially when the mobile team may bring Safari into scope.


Decision 6: Coupling risk in full-stack frameworks

The question. The team is leaning toward Next.js App Router with React Server Components. What risks does this introduce, and how should they be managed?

The convergence dynamic. Co-locating server and client code in full-stack frameworks increases coupling risk, making backend extraction harder later as tight component interdependencies develop naturally. React Server Components run on the server and can access databases directly. This is powerful, but it means business logic can easily end up embedded in component files rather than in a service layer.

The mitigation. Backend architectural thinking — separation of concerns, dependency inversion, domain modeling — becomes more valuable when server and client code converge because disciplined patterns protect against tight coupling in unified codebases.

Concretely:

  • Server Components that fetch data should delegate to a data access layer, not embed SQL or direct HTTP calls inline. Custom hooks function as the frontend equivalent of backend service layers — but in RSC architecture, the equivalent is a server-side service module imported by the Server Component.
  • React Error Boundaries function as frontend circuit breakers: they isolate component failures and prevent cascading crashes, paralleling backend circuit breakers that stop requests to failing services.
  • The 'use client' boundary should be kept narrow. In Next.js App Router, all components in the app/ directory are Server Components by default; only add 'use client' when specifically needing client-side interactivity. Best practice is to keep Client Components small and focused on interactivity only, extracting interactive pieces into small Client Components while keeping surrounding UI as Server Components.

The discipline that prevents a Go service from becoming a distributed ball of mud (separation of concerns, explicit interfaces, domain boundaries) is exactly what prevents an App Router codebase from becoming an undifferentiated tangle of fetch calls and JSX.
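
The first bullet, a Server Component delegating to a server-side service module, can be sketched with ordinary dependency inversion. The interface, class, and seed data below are hypothetical; the in-memory repository stands in for a client of the Go services.

```typescript
interface Shipment { id: string; customerId: string; status: string }

// The port: what the domain needs, with no mention of HTTP, SQL, or React.
interface ShipmentRepository {
  listByCustomer(customerId: string): Promise<Shipment[]>;
}

// Service module: the only place that holds the business rule.
class ShipmentService {
  private readonly repository: ShipmentRepository;

  constructor(repository: ShipmentRepository) {
    this.repository = repository;
  }

  // "Active" means anything not yet delivered.
  async listActive(customerId: string): Promise<Shipment[]> {
    const all = await this.repository.listByCustomer(customerId);
    return all.filter((shipment) => shipment.status !== "delivered");
  }
}

// Adapter used here for illustration and in tests.
const seedShipments: Shipment[] = [
  { id: "s-1", customerId: "acme", status: "in_transit" },
  { id: "s-2", customerId: "acme", status: "delivered" },
  { id: "s-3", customerId: "other", status: "in_transit" },
];

const inMemoryRepository: ShipmentRepository = {
  async listByCustomer(customerId) {
    return seedShipments.filter((shipment) => shipment.customerId === customerId);
  },
};

const shipmentService = new ShipmentService(inMemoryRepository);

// A Server Component would then contain only a call such as
//   const shipments = await shipmentService.listActive(session.customerId);
// followed by JSX, with no filtering rules or fetch calls inline.
```

Because ShipmentService knows nothing about the framework, the same module can later sit behind a REST handler for a non-Next.js consumer, which is the migration path the annotation below asks the proposal to describe.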

The annotation. The architecture proposal should include: a diagram of the data access layer boundary, a policy for which concerns live in Server Components vs service modules vs shared utilities, and a note on what the migration path looks like if the team later needs to expose the same logic to a non-Next.js consumer.


Thought Experiment

You are reviewing the architecture proposal from another backend team. They have made the following choices:

  1. CSR everywhere using Vite + React, with all data fetching handled via a single global SWR cache keyed by URL string.
  2. All state — user session, shipment filters, modal open/closed — in a single Redux store with no separation between server state and client state.
  3. tRPC as the sole API layer, including for the mobile team.
  4. Access tokens in localStorage with a 24-hour expiry and no refresh token rotation.
  5. Unit tests only, with 85% coverage reported.

For each decision, identify: what the failure mode is, when it will manifest, and what you would recommend instead. Write your analysis as if it were a code review comment.

There is no single correct answer. The value is in applying the frameworks from the module to a scenario where the original engineer made plausible-sounding choices that compound each other.


Key Principles

Rendering strategy is per content type, not per application. Static, dynamic, and real-time content have different performance contracts. Islands architecture and ISR exist precisely because not all content deserves the same delivery mechanism.

Server state and client state are different problems. Conflating them in a single store creates the frontend equivalent of storing session data and business entities in the same database table. Use dedicated server state libraries (TanStack Query, SWR) for fetched data and keep them separate from UI state.

API contract selection follows team topology. tRPC requires monorepo co-ownership. GraphQL rewards multi-client diversity. REST remains the universal contract for heterogeneous consumers. On modern networks, performance differences are negligible — organizational fit is the deciding factor.

The security boundary is always server-side. Client-side authorization code is documentation, not enforcement. Every authorization decision must be validated on the server for every request, independent of previous decisions. The BFF pattern eliminates browser-side token storage entirely for web clients.

Integration tests give the highest return in frontend work. Testing through accessible roles, with real component subtrees and network mocks at the boundary, provides more confidence than unit coverage metrics. E2E tests should be limited to critical paths (5-10% of the suite).

Backend discipline protects full-stack codebases. When server and client code converge, the disciplines that prevent backend systems from coupling — separation of concerns, explicit interfaces, domain modeling — become more valuable, not less.


Stretch Challenge

Return to the annotated case study scenario — the logistics portal, four-person backend team, six weeks to demo.

Write a one-page architecture decision record (ADR) for the authentication model. Include:

  • Context: what forces are acting (client types, existing identity provider, team size, timeline).
  • Decision: the chosen approach, stated precisely enough that a new engineer could implement it.
  • Consequences: what you gain, what you trade away, and what deferred complexity looks like when you revisit this at scale (10,000 users, a third client type added, a security audit).

The ADR should be written for a reader who is a senior backend engineer — assume full understanding of OAuth 2.0, zero assumption of familiarity with web-specific attack vectors (XSS, CSRF) beyond what you can explain in one sentence each.

Key Takeaways

  1. Decompose rendering by content type. Shipment lists warrant SSR or ISR; real-time tracking warrants an island. Picking one rendering mode globally is the wrong unit of analysis.
  2. Name the state category before naming the library. Server state, URL state, local UI state, and global client state have different homes. Libraries follow the categorization, not the other way around.
  3. API contracts serve their consumers. The mobile team and the web portal are different consumers with different capabilities. A single contract style often serves neither well.
  4. Authentication threat modeling is not optional. The difference between localStorage and HttpOnly cookies is not aesthetic — it determines whether XSS compromises user sessions. The BFF pattern removes the tradeoff by keeping tokens off the browser entirely.
  5. The testing trophy outperforms the testing pyramid for frontend work. Maximize integration test coverage of realistic user flows. Keep E2E to critical paths. Reserve unit tests for pure logic and plan visual regression for after the demo.
  6. Full-stack frameworks amplify good architecture and bad architecture equally. Server Components and co-located server/client code accelerate iteration — and accelerate coupling. Bring service layer discipline into the framework boundary from day one.
