Philosophy

Epistemology for Engineering Decisions

From Knightian uncertainty to ADRs: how the theory of knowledge governs architectural judgment

Learning Objectives

By the end of this module you will be able to:

  • Distinguish Knightian uncertainty from risk and explain why this distinction changes how architectural decisions should be made.
  • Apply Peirce's abduction to diagnose a production incident: reasoning from surprising observations to the best causal hypothesis.
  • Use maximin, regret minimization, and satisficing as distinct decision heuristics and identify which is appropriate for which category of decision.
  • Explain why reversibility has epistemic value beyond mere convenience, and apply this to architectural choices.
  • Write an ADR that captures epistemic state—what was known, what was unknown, what was assumed—rather than just the decision outcome.

Core Concepts

Risk vs. Uncertainty: A Distinction That Changes Everything

Most engineering organizations conflate two fundamentally different situations under the word "risk." The distinction matters enormously.

Risk describes situations where you face multiple possible outcomes and you know—or can estimate—the probability distribution over those outcomes. A die roll is risk. A well-characterized failure mode with years of production data is risk.

Knightian uncertainty (named for economist Frank Knight, who formalized it in 1921) describes situations where you cannot assign probabilities because you lack the epistemic basis to do so. You do not know what you do not know. Novel technology choices, unprecedented load patterns, organizational dynamics that have never been stress-tested—these are uncertainty, not risk.

Why the distinction matters

Risk can be managed with probabilistic reasoning and quantitative models. Knightian uncertainty cannot. Treating genuine uncertainty as if it were risk—slapping a probability on the unknowable—produces the false confidence of a number without the epistemic substance to back it. Research on uncertainty in software engineering shows this conflation is one of the most persistent sources of architectural misjudgment.

This distinction is not merely semantic. It determines which decision framework is appropriate. When you have risk, expected value calculations are defensible. When you have uncertainty, you need different tools—and recognizing which situation you are in is a prerequisite for choosing the right one.

Peirce's Triad: Abduction, Deduction, Induction

Charles Peirce formulated a three-phase scientific method:

  1. Abduction: forming an explanatory hypothesis from a surprising observation. You notice something unexpected and infer the most plausible cause. Peirce called this "the only logical operation which introduces any new idea."
  2. Deduction: deriving predictions from that hypothesis. If the hypothesis is true, what should you expect to observe?
  3. Induction: testing those predictions against experience. Does the world conform to what the hypothesis predicted?
Fig 1
Abduction (hypothesis from surprise) → Deduction (derive predictions from the hypothesis) → Induction (test predictions against experience) → revise the hypothesis if predictions fail
Peirce's three-phase inquiry: from surprising observation to tested knowledge

Abduction is where new knowledge enters. It is the moment of saying "the service latency spiked after the deploy—the most plausible explanation is that the new serializer has O(n²) behavior on the edge case we did not benchmark." That guess, if deductively elaborated and inductively tested, becomes knowledge. Left on its own, the abductive guess is just guessing; disciplined by deduction and induction, guessing becomes inquiry.

The Stanford Encyclopedia of Philosophy's treatment of Peirce's abduction makes clear that this is not a license for speculation—abduction is constrained by plausibility and economy. Among competing hypotheses, prefer the one that explains the most with the least.

The Limits of Expected Utility Theory

Classical decision theory tells you to maximize expected utility: for each option, multiply the value of each outcome by its probability and sum. The option with the highest expected utility wins.

This works well when probabilities are known and stakes are modest. It breaks down in several ways that are endemic to software architecture:

  • It requires accurate probability estimation. Under Knightian uncertainty, you do not have that. Assigning a probability to "will this technology still be maintained in five years" is not calculation—it is false precision masquerading as rigor.
  • It neglects non-quantifiable factors. Team cognition, organizational trust, the complexity that accretes around a decision over time—these influence real decisions but resist utility quantification.
  • People systematically deviate from it. Empirical research shows preferences constructed in different contexts differ significantly, and choices can depend on presentation rather than actual costs. Expected utility theory describes an idealized agent, not actual engineering teams.
  • It exceeds human cognitive capacity for complex problems. Attention, working memory, probability calibration—the cognitive load required makes full expected-utility reasoning impractical outside of narrow, well-characterized domains.

The problem is not that engineers are irrational. The problem is that expected utility theory assumes a level of epistemic access—known probabilities, enumerable outcomes—that genuine uncertainty forecloses.
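To make the fragility concrete, here is a minimal sketch, with entirely made-up payoffs and probabilities, of how an expected-value ranking flips when one assumed probability moves within a plausible range:

```python
# Sketch (illustrative numbers, not from the text): expected-value rankings
# are fragile when the probabilities are guesses rather than measurements.

def expected_value(outcomes):
    """outcomes: list of (probability, payoff) pairs."""
    return sum(p * v for p, v in outcomes)

# Option A: mature technology -- modest upside, small chance of rework.
# Option B: novel technology -- large upside, but "probability it is
# abandoned" is an assumed number, not a measured one.
for p_abandoned in (0.10, 0.20, 0.30):
    a = expected_value([(0.95, 100), (0.05, -50)])
    b = expected_value([(1 - p_abandoned, 300), (p_abandoned, -400)])
    best = "A" if a > b else "B"
    print(f"p(abandoned)={p_abandoned:.2f}  EV(A)={a:.1f}  EV(B)={b:.1f}  -> {best}")
```

The recommendation flips purely on a number nobody can actually measure, which is the false-precision problem in miniature.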

Decision Heuristics for Deep Uncertainty

When expected utility theory fails, three principled alternatives are available:

Maximin (Wald's strategy, developed in the 1940s): choose the option whose worst-case outcome is best. You are not optimizing for expected value; you are protecting against catastrophe. This is appropriate when some failure modes are genuinely unacceptable—data loss, security breach, regulatory violation. Maximin says: look at the floor of each option, and choose the highest floor. It is conservative by design.

Satisficing: rather than maximizing performance on a single metric, find the option that meets a minimum acceptable threshold across the widest range of uncertain scenarios. Info-gap decision theory operationalizes this formally: given that you do not know how wrong your model might be, pick the choice that remains acceptable under the greatest range of model error. Satisficing is often more robust than optimization under uncertainty precisely because optimization is fragile to model misspecification.

Regret minimization: shift focus from probabilistic calculation to comparative evaluation of forgone opportunities. Project yourself forward and ask: which choice would I regret not having made? This approach addresses uncertainty by grounding decisions in long-term coherence rather than short-term risk management, and has roots in behavioral economics, particularly loss aversion. It is especially useful for high-stakes, rare decisions where the stakes are not easily quantified but the asymmetry between options is viscerally clear.

Fig 2

  • Maximin. When to use: some failure modes are genuinely unacceptable. Example: selecting a data store where data loss is catastrophic.
  • Satisficing. When to use: no option dominates; you need robustness over optimization. Example: choosing a message broker under unclear load projections.
  • Regret minimization. When to use: rare, high-stakes, hard-to-quantify choices with asymmetric options. Example: whether to rebuild a core platform vs. extend it.

Choosing a heuristic: a rough guide
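The first two criteria can be sketched over a small payoff matrix. The option and scenario names below are invented and the scores are illustrative; the point is that maximin and minimax regret need no probabilities at all, and can legitimately disagree:

```python
# Sketch: maximin and minimax-regret over an illustrative payoff matrix.
# Rows are options, columns are scenarios; no probabilities are assigned.

payoffs = {
    # option:          scenario -> payoff (made-up scores, higher is better)
    "managed_queue": {"low_load": 4, "spiky_load": 4, "sustained_high": 4},
    "self_hosted":   {"low_load": 9, "spiky_load": 8, "sustained_high": 2},
    "cloud_native":  {"low_load": 6, "spiky_load": 7, "sustained_high": 3},
}

scenarios = list(next(iter(payoffs.values())))

# Maximin: rank options by their worst-case payoff; pick the highest floor.
maximin_choice = max(payoffs, key=lambda o: min(payoffs[o].values()))

# Minimax regret: regret = best achievable payoff in a scenario minus what
# this option achieves there; pick the option whose worst regret is smallest.
best_in = {s: max(payoffs[o][s] for o in payoffs) for s in scenarios}

def worst_regret(option):
    return max(best_in[s] - payoffs[option][s] for s in scenarios)

regret_choice = min(payoffs, key=worst_regret)

print("maximin:       ", maximin_choice)  # highest floor
print("minimax regret:", regret_choice)   # smallest worst-case regret
```

With these numbers maximin picks the steady option and minimax regret picks the high-upside one: the heuristics encode different attitudes toward the worst case, which is exactly why choosing between them is itself a decision.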

Reversibility as Epistemic Strategy

Reversibility is often treated as a convenience—"we can always roll it back." But its deeper value is epistemic: a reversible decision lets you gather evidence before you are locked in.

Real options thinking in decision theory formalizes this: the ability to defer or reverse a decision has genuine positive value when the future is uncertain. If you design your architecture so that the choice of message queue can be swapped without rebuilding your consumers, you have bought yourself the option to make that choice later—after you have more data about load patterns, team expertise, and operational costs.
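As a sketch of what such an option looks like in code, consumers can depend on a narrow interface rather than a concrete broker. Everything below is hypothetical: the `Queue` protocol, `InMemoryQueue`, and `checkout_consumer` are invented names for illustration:

```python
# Sketch of a two-way door: consumers see only a narrow protocol, so the
# concrete broker behind it can be swapped once load data exists.
from typing import Callable, Protocol

class Queue(Protocol):
    def publish(self, topic: str, message: bytes) -> None: ...
    def subscribe(self, topic: str, handler: Callable[[bytes], None]) -> None: ...

class InMemoryQueue:
    """Stand-in implementation used while load patterns are still unknown."""
    def __init__(self):
        self._handlers = {}

    def publish(self, topic, message):
        for handler in self._handlers.get(topic, []):
            handler(message)

    def subscribe(self, topic, handler):
        self._handlers.setdefault(topic, []).append(handler)

def checkout_consumer(queue: Queue, received: list):
    # Consumer code knows only the protocol, not the broker behind it.
    queue.subscribe("orders", received.append)

received = []
q = InMemoryQueue()
checkout_consumer(q, received)
q.publish("orders", b"order-42")
print(received)
```

Replacing `InMemoryQueue` with a real broker adapter later touches one class, not every consumer: the commitment to a specific broker is deferred until evidence warrants it.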

The practical proxy for this is the "one-way door / two-way door" distinction:

  • Two-way doors: quick to decide, because the cost of being wrong is low. Reverse the decision when you learn more.
  • One-way doors: require deliberate, disciplined evaluation, because you will live with the consequences. Apply maximin or satisficing. Write an ADR. Involve people who will disagree with you.

The tension to hold

Cognitive research shows that keeping options open indefinitely creates decision paralysis and opportunity costs. The goal is not maximum optionality—it is appropriate optionality: defer commitment where the cost of deferral is low relative to the information value gained, and commit when the cost of waiting exceeds the benefit.

Modular, decoupled architecture is not just good engineering hygiene. It is an epistemological strategy: it aligns the structure of your system with the structure of your knowledge, keeping commitments proportional to certainty.

Epistemological Instruments: Agile, Observability, A/B Testing, ADRs

Here is the reframing that unlocks practical philosophy: Agile, observability, A/B testing, and ADRs are not project management tools. They are epistemological instruments—mechanisms for generating knowledge under uncertainty.

Agile iteration operationalizes pragmatist epistemology. Each sprint is an experiment: teams form hypotheses about what users need and what solutions will work, implement them, and test those hypotheses against real user behavior. The Agile Manifesto's emphasis on responding to change over following a plan is not a process preference—it is an epistemological position that treats requirements as provisional and continuous validation as the path to knowledge.

Observability forces empirical investigation of how systems actually behave rather than how they were designed to behave. Metrics, logs, and distributed traces are the instruments of inquiry. A dynamic dashboard that lets you ask new questions of your data instantiates the pragmatist principle that knowing is inseparable from observing real consequences. Observability tools were developed from necessity when traditional debugging methods proved inadequate for complex distributed systems—engineers were pushed toward pragmatism by the epistemic inadequacy of reasoning from first principles about systems too complex to mentally model.

A/B testing makes consequences the arbiter of product decisions. Rather than debating interface designs theoretically, teams deploy variations and measure effects on real user behavior. This directly instantiates the pragmatist principle that truth is what works—validation occurs through real consequences observed in practice, not through argument.
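A minimal sketch of the arithmetic behind such a comparison, using invented counts and a standard two-proportion z-test (a real experiment needs predetermined sample sizes and multiple-comparison discipline on top of this):

```python
# Sketch: two-proportion z-test for an A/B experiment, illustrative counts.
from math import erf, sqrt

def z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via the error function; two-sided tail probability.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Variant A: 120 conversions of 2400; variant B: 156 of 2400 (made-up data).
z, p = z_test(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"z={z:.2f}  p={p:.4f}")
```

The number that comes out is evidence, not proof: it quantifies how surprising the observed difference would be if the variants were identical, which is precisely consequences-as-arbiter in statistical form.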

ADRs are crystallized epistemic humility. An ADR that only records what was decided is an organizational artifact. An ADR that records what was known, what was unknown, and what was assumed at the time of decision is an epistemological instrument—one that enables future teams to understand the epistemic state under which a choice was made, and to challenge or reaffirm it as circumstances change.


Key Principles

1. Name the type of uncertainty before choosing a strategy. Before reaching for a decision framework, ask: do I have known probabilities (risk) or unknown distributions (uncertainty)? The answer determines which tools are legitimate. Using expected utility reasoning on genuine uncertainty does not make the decision more rigorous—it makes it more misleadingly confident.

2. Prefer consequences over arguments as validation. Across the pragmatist tradition—Peirce, James, Dewey—the measure of a valid belief is not elegant reasoning but whether it produces the expected consequences when acted upon. An architectural claim is valid if the deployed system behaves as predicted, not if the design document is internally consistent.

3. Match decision heuristic to decision type. Maximin for catastrophic downside risk. Satisficing for robustness under uncertainty. Regret minimization for rare, asymmetric, hard-to-quantify choices. Use each where it belongs, not habitually.

4. Engineer for reversibility proportional to uncertainty. When you are highly uncertain, design to keep options open. When uncertainty is resolved, commit. Modularity and loose coupling are not aesthetic choices—they are epistemic strategies that let you defer commitment until evidence warrants it.

5. Treat your tools as instruments of inquiry. Observability, iteration, A/B testing, and ADRs are how engineering teams generate knowledge. Use them with that awareness. Ask not only "did the system behave correctly?" but "what have we learned that updates our model of the world?"

6. Cultivate epistemic humility as a professional virtue. Research shows that individuals with higher levels of intellectual humility process information more effectively, leading to better decision-making and improved critical thinking. For technical leaders specifically, epistemic humility counteracts the overconfidence that accumulates with specialized expertise. The goal is not to undermine expertise but to keep it accountable to evidence.

7. Know your biases by name. Cognitive biases systematically distort engineering judgment: sunk cost fallacy (continuing despite changing conditions), availability heuristic (overweighting recent salient examples in debugging), planning fallacy (optimistic timelines). Naming them is not sufficient, but it is necessary. Structured processes—retrospectives, pre-mortems, ADRs—partially mitigate them by externalizing reasoning.


Worked Example

Diagnosing a Production Incident with Peirce's Triad

The situation. At 14:23 UTC, p99 latency for the checkout service spikes from 120ms to 1800ms. No deploy has occurred in the last six hours. Error rates remain flat. Database query times are normal. The CPU on the service instances is elevated.

Abduction: what is the most plausible explanation?

The observations are surprising: latency has spiked without a deploy, without increased errors, without database degradation. What hypothesis best explains this pattern?

Candidate hypotheses:

  • A downstream dependency has degraded (but no errors, so requests are completing)
  • A cache has been evicted and requests are now doing more work (CPU elevated, latency up, no errors)
  • Garbage collection pressure from a memory leak in the service (CPU, latency)
  • A background job is competing for CPU resources

The cache eviction hypothesis is most economical: it explains elevated CPU (recomputing cached values), elevated latency (more work per request), absence of errors (computation completes, just slowly), and absence of database degradation (work is in-process, not in queries).

Deduction: what should we observe if this hypothesis is correct?

  • Cache hit rate should have dropped sharply around 14:23
  • Cache miss metrics should correlate with the latency spike
  • Memory utilization for the cache layer should have dropped at the same time

Induction: check the evidence.

You pull the observability dashboards. Cache hit rate dropped from 94% to 11% at 14:21—two minutes before the latency spike. A scheduled job that clears expired session tokens ran at 14:20 and, due to a bug introduced in a refactor three weeks prior, was clearing all cache keys rather than only expired ones.

The hypothesis is confirmed. The fix is deployed. The incident post-mortem captures the abductive chain so the next team to face a similar pattern has a template for inquiry.
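As a template for that inquiry, the deductive predictions can be encoded as explicit checks against observed metrics. The numbers below mirror the incident narrative; the metric names and thresholds are invented for illustration:

```python
# Sketch: deductive predictions expressed as checks against observed metrics
# (values mirror the incident narrative; names and thresholds are made up).
observed = {
    "cache_hit_rate_before": 0.94,
    "cache_hit_rate_after": 0.11,
    "cache_hit_drop_time": "14:21",
    "latency_spike_time": "14:23",
    "cache_memory_dropped": True,
}

predictions = {
    "hit rate dropped sharply": observed["cache_hit_rate_after"]
        < 0.5 * observed["cache_hit_rate_before"],
    "drop precedes latency spike": observed["cache_hit_drop_time"]
        <= observed["latency_spike_time"],  # HH:MM strings compare in order
    "cache memory fell at the same time": observed["cache_memory_dropped"],
}

for name, holds in predictions.items():
    print(f"{'PASS' if holds else 'FAIL'}: {name}")
```

If every check passes, the hypothesis survives induction; a single FAIL sends you back to abduction, which is the loop Peirce described.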


Active Exercise

Write an Epistemically Honest ADR

Setup. Recall a significant architectural decision you made or observed in the last two years. It can be a technology selection, a pattern choice, a migration, or a structural boundary decision. Choose one where the outcome is now visible enough to reflect on.

Step 1: Reconstruct the epistemic state at decision time.

Answer these questions as they stood at the moment of decision—not with hindsight:

  • What did we know with confidence?
  • What did we assume but could not verify?
  • What did we know we did not know?
  • What did we not know we did not know (i.e., what surprised us later)?

Step 2: Identify which decision framework was implicitly in use.

Looking back, were you applying maximin (protecting against catastrophic failure)? Satisficing (seeking robustness under uncertainty)? Expected utility (optimizing a quantifiable metric)? Regret minimization? Or were you following convention, social pressure, or familiarity?

Name it honestly. Most architectural decisions are made under a mix of implicit heuristics and social dynamics, not explicit frameworks.

Step 3: Write the ADR.

Use this structure:

# ADR-XXX: [Decision title]

## Status
[Accepted / Superseded / Under review]

## Context
[The situation that made a decision necessary. What was uncertain? What pressures were present?]

## Epistemic state at decision time
- Known: [List what was genuinely known]
- Assumed: [List what was assumed but not verified]
- Unknown: [List what was acknowledged as unknown]
- Blind spots (added in retrospect): [What you did not know you did not know]

## Decision
[What was decided, and by what heuristic: maximin / satisficing / regret minimization / other]

## Consequences expected
[What you predicted would happen if this decision was correct]

## Consequences observed
[What actually happened—fill in after sufficient time has passed]

## Lessons for future decisions
[What the gap between expected and observed consequences teaches]

Step 4: Reflect.

What would you have done differently if you had been explicit about the epistemic state at the time? Where did you treat uncertainty as risk? Where did you apply the wrong heuristic?

What this exercise is testing

The goal is not to critique the past decision. The goal is to develop the habit of naming epistemic state explicitly—because engineers who do this prospectively make better decisions than those who only reconstruct it retrospectively.


Boundary Conditions

When maximin becomes pathological. Maximin protects against worst-case outcomes, but applied uniformly it produces extreme conservatism that prevents beneficial risk-taking. Choosing the "safest" option in every dimension compounds: you end up with a system that is locally safe but globally brittle, because safety was optimized independently at each choice. Maximin is appropriate for genuinely catastrophic, unrecoverable failure modes—not as a default heuristic for all decisions.

When satisficing licenses mediocrity. "Good enough across scenarios" can slide into "we did not commit to anything." Satisficing requires explicit criteria: what counts as acceptable? If those criteria are not set rigorously, the heuristic becomes a post-hoc rationalization for whatever was easiest to build.

When reversibility becomes decision avoidance. The cognitive research is clear: keeping options open indefinitely has real costs—decision paralysis, delayed momentum, accumulated complexity from hedging. There is a cost to optionality, and it compounds. The principle is not "maximize reversibility"—it is "defer commitment proportional to the information value of waiting and the cost of deferral."

When ADRs become compliance theater. An ADR written after the decision to document what was already done, with no honest account of uncertainty, does not preserve epistemic state—it manufactures a retrospective rationale. ADRs are only valuable if they capture genuine reasoning, including what was unknown. Most current ADR practices focus on decision documentation rather than ex-post evaluation of consequences. The exercise above is specifically designed to push against that.

When abduction becomes motivated reasoning. Peirce's abduction is constrained by the requirement to seriously consider competing hypotheses. If you form your abductive hypothesis first and then selectively gather evidence for it, you are not doing Peircean inquiry—you are doing confirmation bias with philosophical terminology. Red teaming and pre-mortems are structural countermeasures: they force the team to generate hypotheses about failure before commitment, when the cost of changing course is low.

When epistemic humility becomes epistemic cowardice. Acknowledging uncertainty is a virtue. Using it as a reason to avoid taking a position—"we can't know, so let's not commit"—is not. Epistemic humility is about calibrating confidence to evidence, not about abandoning judgment. Engineering judgment requires the courage to act under uncertainty once the available evidence has been honestly assessed.

Key Takeaways

  1. Risk and Knightian uncertainty are not the same thing. Risk admits probabilistic reasoning; uncertainty does not. The choice of decision framework depends on which situation you are actually in.
  2. Peirce's abduction — hypothesis from surprise, followed by deduction and induction — describes how engineers actually generate new knowledge during diagnosis and design. Making this structure explicit sharpens inquiry.
  3. Maximin, satisficing, and regret minimization are principled alternatives to expected utility theory when probabilities are unknown. Each is appropriate in different conditions; using them interchangeably dilutes their value.
  4. Reversibility has epistemic value. Modular, decoupled architecture is not just good engineering — it is an epistemological strategy that keeps commitment proportional to certainty and creates space for learning before you are locked in.
  5. ADRs are most valuable when they capture epistemic state at decision time, not just outcomes. A record of what was known, assumed, and unknown is a learning instrument. A record of what was decided is an audit trail.