Philosophy

Language, Meaning, and Code

Why naming is hard, why codebases drift, and why code review is an interpretive act

Learning Objectives

By the end of this module you will be able to:

Apply Frege's sense/reference distinction to explain why two engineers can use the same term to mean different things—and why synonymous function names are never truly interchangeable.
Identify the language game in play when a team debates naming conventions, and use Wittgenstein's framework to diagnose why those debates resist resolution through logic alone.
Use the hermeneutic circle to explain why onboarding into a legacy codebase is epistemologically difficult, not merely technically difficult.
Distinguish documentation as a speech act (assertion, directive, promise) from documentation as mere description, and apply the distinction when writing or reviewing an ADR.
Trace how the Sapir-Whorf hypothesis plays out in programming language choice and in the shared vocabulary of a team's domain model.

Core Concepts

1. Frege's Sense and Reference

In 1892, Gottlob Frege drew a distinction that changed philosophy of language: the difference between Sinn (sense) and Bedeutung (reference).

The reference of a term is the object it picks out in the world. The sense is the mode of presentation—the way the term frames and delivers that object to a thinking mind. Two expressions can share the same reference while carrying different senses, foregrounding different properties, and enabling different inferences.

The canonical example is astronomical: "the morning star" and "the evening star" both refer to Venus, but they carry different senses. Knowing that they co-refer is a genuine discovery, not a tautology.

"A sense is a mode of presentation of an object and a way of thinking of an object." — Frege

This maps directly to APIs. Consider process() and executeWithRetry(): both might reference the same underlying operation, but they communicate entirely different senses. The first hides error-handling semantics. The second encodes retry behavior as a design commitment and signals to readers that failures are expected and handled. According to Frege's theory, this difference is not cosmetic—different senses foreclose or enable different cognitive possibilities for callers of that API. Choosing among synonymous function names is, in Frege's terms, a choice about which sense to make available.
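The process() / executeWithRetry() contrast can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended retry implementation: the function bodies and the `attempts` parameter are assumptions made for the example, and the name is written as `execute_with_retry` to follow Python naming style.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def execute_with_retry(operation: Callable[[], T], attempts: int = 3) -> T:
    """Call `operation`, retrying after any exception, up to `attempts` calls."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for remaining in range(attempts - 1, -1, -1):
        try:
            return operation()
        except Exception:  # deliberately broad: this is only a sketch
            if remaining == 0:
                raise  # out of attempts, so surface the last failure
    raise AssertionError("unreachable")


def process(operation: Callable[[], T]) -> T:
    """Same reference as execute_with_retry; the name says nothing about retries."""
    return execute_with_retry(operation)
```

Both names pick out the same behavior, so a caller of process() gets retries without being told. Only the second name lets a reader infer that the operation may run more than once and should therefore be safe to repeat.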

The concept of mode of presentation (Frege's Art des Gegebenseins) extends this further. save(), persist(), and flush() might describe identical underlying effects, yet each guides the developer toward a different mental model of the operation: save() suggests a routine write, persist() durable storage, and flush() the forced commit of buffered I/O. The mode of presentation is not decorative; it is constitutive of what the API communicates.

A well-named variable or function encodes a sense-as-instruction: userCount encodes not only what the variable holds but how to reason about it (incrementing for new users, validating non-negativity) without requiring an explanatory comment. Empirical studies of identifier naming suggest that poorly named variables force developers to recover, by reading the surrounding code, what should have been implicit in the name itself.

2. Wittgenstein and Language Games

The later Wittgenstein dismantled the assumption that meaning is correspondence, the idea that a word's meaning is the object it refers to. Instead, he proposed that, for a large class of cases, "the meaning of a word is its use in the language." Meaning is not fixed by an external referent; it is constituted by participation in a social practice.

He introduced the concept of language games: bounded forms of activity in which words acquire meaning through use within specific rules and contexts. The word "game" itself illustrates the point—chess, football, and solitaire share no single defining property, yet they belong together through a network of overlapping similarities Wittgenstein called family resemblance.

Language Game

A language game is a bounded social practice in which words carry meaning through their role in the activity—not through reference to abstract essences. Language use is always embedded in a form of life.

This has a precise implication for software. The meaning of User in an e-commerce codebase is not determined by what "user" means in general; it is constituted by how User is used in that domain's language game—what operations it participates in, what it is composed with, what it is distinguished from. The ubiquitous language of Domain-Driven Design operationalizes this Wittgensteinian insight: shared vocabulary must be grounded in shared practice, not in abstract definitions.
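To make the point about User concrete, here is a small Python sketch of one word living in two language games. The two contexts (checkout and support), the class names, and every field are hypothetical, chosen only to show that what a "user" is depends on what the surrounding code does with it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CheckoutUser:
    """`User` as a checkout context uses it: someone who pays and receives goods."""
    user_id: str
    shipping_address: str
    payment_token: str


@dataclass(frozen=True)
class SupportUser:
    """`User` as a support context uses it: someone who opens tickets."""
    user_id: str
    open_ticket_ids: Tuple[str, ...]


def can_ship_to(user: CheckoutUser) -> bool:
    """An operation that only makes sense in the checkout game."""
    return bool(user.shipping_address.strip())
```

The shared `user_id` is the only overlap. Asking whether a SupportUser can be shipped to is not a false question so much as a move that does not exist in that game.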

Developer communities constitute distinct language games. Naming conventions, architectural idioms, and even review norms form the rules of a language game. Onboarding is fundamentally a process of learning to participate in that game: not just learning what terms mean lexically, but internalizing how they are used in this form of life. A new engineer who imports meanings from their previous team's language game will misfire constantly—not because they are wrong in general, but because they are playing the wrong game.

Family resemblance explains a recurring frustration in naming. Polymorphic methods like process(), execute(), or handle() resist precise naming because they cover cases with no single unifying essence—they form a network of overlapping similarities. Wittgenstein's framework suggests that the search for the perfect unifying name may be based on a false assumption: that the concept has essential unity. Recognizing family resemblance means accepting that context-sensitive naming may sometimes be philosophically appropriate, not a failure of rigor.
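One way to act on this is to keep the generic name only at the boundary where cases are sorted, and give each case a context-sensitive name of its own. The Python sketch below assumes hypothetical event kinds and handler functions; it illustrates the naming idea and is not a prescribed design.

```python
def refund_payment(event: dict) -> str:
    """A specific case with a specific name."""
    return f"refunded {event['amount']}"


def send_receipt(event: dict) -> str:
    """Another case that shares little with the first beyond being event-driven."""
    return f"receipt sent to {event['email']}"


# The table makes the family explicit: related members, no single essence.
HANDLERS = {
    "refund_requested": refund_payment,
    "payment_settled": send_receipt,
}


def handle(event: dict) -> str:
    """Generic on purpose: it only selects which specifically named case applies."""
    return HANDLERS[event["kind"]](event)
```

Here handle() does not pretend to describe what happens to the event. That description lives in the names of the individual cases.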

3. Linguistic Relativity and Code

The Sapir-Whorf hypothesis, in its weaker and empirically better-supported form, holds that the structures of a language influence, without fully determining, how speakers perceive and frame problems. Contemporary linguistics discusses this weaker claim under the name linguistic relativity: language shapes perception without strictly constraining it.

This applies to programming in two ways.

First, programming language choice shapes the space of solutions engineers readily conceive. Kenneth Iverson, in his 1979 Turing Award lecture "Notation as a Tool of Thought," argued that a well-designed notation aids thinking about algorithms. Yukihiro Matsumoto cited the Sapir-Whorf hypothesis (by way of a science fiction novel) as an explicit inspiration for Ruby's design. Functional languages make certain recursive decompositions natural; object-oriented languages foreground object graphs and message passing. The language does not prevent you from implementing any particular solution, but it shapes which solutions you reach for first.

Second, a team's domain vocabulary shapes the distinctions its members habitually make. The naming patterns available in a team's dialect constrain and enable the distinctions developers naturally draw. A codebase that conflates two concepts under a single name is not merely inelegant: it subtly suppresses the cognitive move of treating them as distinct.
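As a sketch of what separating a conflated name can look like, suppose a codebase uses the single word "account" for both a login identity and a billing ledger. That scenario, and every name below, is hypothetical. Python's `typing.NewType` gives each concept its own name so that a static type checker can tell them apart.

```python
from typing import NewType

# Two concepts that a conflated vocabulary would both call "account id".
LoginAccountId = NewType("LoginAccountId", str)
BillingAccountId = NewType("BillingAccountId", str)


def lock_login(account_id: LoginAccountId) -> str:
    """Applies to the login identity only."""
    return f"login {account_id} locked"


def suspend_billing(account_id: BillingAccountId) -> str:
    """Applies to the billing ledger only."""
    return f"billing {account_id} suspended"
```

At runtime both values are plain strings, so the distinction is enforced only by a type checker such as mypy and by the reader. That is the point: once the vocabulary has two words, the question "which account do you mean?" becomes easy to ask.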

Not Determinism

The Sapir-Whorf hypothesis, as applied here, is about influence, not determination. Developers can always overcome their language's framing, but the framing has real cognitive costs.

4. The Hermeneutic Circle and Code Understanding

Hermeneutics is the theory of interpretation. Its central mechanism is the hermeneutic circle: understanding a whole requires understanding its parts, but understanding a part requires understanding the whole. The movement is not circular in a vicious sense—it is iterative and progressive, with each pass refining the other level.

Applied to software, this describes exactly how experienced engineers approach an unfamiliar codebase. You read a function and form a partial understanding. You then read the module that calls it and revise. You read the architecture document and revise again. The initial understanding is not abandoned—it evolves through a process of widening context.

Gadamer extended this with the concept of horizon: the situated perspective that makes understanding possible in the first place. Every engineer brings a different horizon, formed by prior experience, tools used, architectural decisions made, and codebases worked on. When joining a new team, the engineer must undergo a fusion of horizons: their existing perspective meets the perspective embedded in the codebase, and genuine understanding emerges from the encounter—not from the engineer's prior knowledge alone, nor from the code alone, but from the fusion.

This makes onboarding epistemologically difficult in a specific sense. The difficulty is not merely that the legacy code is complex or poorly written. It is structural: understanding requires iterating between local and global comprehension, and you must bring your horizon into contact with the codebase's embedded horizon before fusion is possible. There is no shortcut.

5. Software as Text and Code Without Author

Ricoeur's hermeneutic framework treats texts as symbolic inscriptions that become autonomous once written—they outlive their authors and can be interpreted without access to authorial intent. Software is precisely this: a symbolic inscription mediating between human intention and machine execution, requiring hermeneutic reading practices.

Legacy code must be understood as an autonomous text. Author intent is unavailable, incomplete, or contradicted by later modifications. The code must be interpreted on its own terms—through the structure of the code, the tests, architectural patterns, and the traces of decision-making embedded in the system. This is not a deficiency to be overcome by asking the original author—it is the normal epistemic condition of working with any sufficiently old text.

Code has a dual character that prose does not: it must be simultaneously machine-executable (unambiguous formal semantics for the compiler) and human-interpretable (communicating design intent to readers). The executability requirement constrains what expressive possibilities are available for human-directed communication. The gap between what code does and why it was written that way defines a persistent epistemological boundary that code alone cannot fully close.

6. Documentation and Speech Acts

Austin and Searle's speech act theory distinguishes what an utterance says from what it does. Austin's foundational insight was that "to say 'I resign', 'I apologise' or 'You're fired' is, in suitable circumstances, to perform the very act"—not merely to describe it. Searle's taxonomy identifies five types of illocutionary acts: assertives (claiming), directives (instructing), commissives (committing), expressives (expressing psychological states), and declarations (bringing states of affairs into existence by the act of saying so).

Technical documentation participates in all of these. A specification does not merely describe desired system behavior—it makes promises (commissives) that constrain future work. An ADR does not merely record a decision—it declares an architectural boundary and directs future readers to preserve it. A code comment can assert what a function does, instruct how to use it, or warn what not to do.

A specification is not merely asserting facts about desired behavior. It is making promises and issuing directives that constrain future engineering work.

Treating documentation as performative rather than merely descriptive changes how it should be written and read. A poorly written ADR that does not make its commitments explicit fails not merely because it is unclear, but because it has not performed the speech acts required to constrain the team's future decisions. Its illocutionary force is absent.
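The claim that a code comment can assert, instruct, or commit can be shown in a single docstring. The sketch below is illustrative only: the `dispatch` function, its parameters, and the labels inside the docstring are assumptions made for the example, not a documentation standard.

```python
import logging

logger = logging.getLogger(__name__)


def dispatch(event: dict, handler) -> None:
    """Pass `event` to `handler` and report nothing back to the caller.

    Assertion:  handler errors are logged, not re-raised.
    Directive:  do not use this when the caller needs the handler's result.
    Commissive: this function will keep returning None; callers may rely on it.
    """
    try:
        handler(event)
    except Exception:  # broad on purpose, to match the assertion above
        logger.exception("handler failed for event %r", event)
```

Only the first labeled sentence can be checked against the current code. The other two bind people: one constrains callers, the other constrains whoever maintains the function next.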

Worked Example

Scenario: A team is naming a service that receives payment events, routes them to the appropriate handler, and triggers downstream notifications.

Round 1 — Reference-only thinking: The team proposes PaymentProcessor. This names the reference (something about payments and processing) but leaves the sense underspecified. Does "processor" mean transformation, routing, execution, or orchestration? Frege's framework would diagnose a reference without a stable sense.

Round 2 — Sense shaping: They consider alternatives:

PaymentEventRouter — encodes the sense that this component routes; callers should not expect state transformation.
PaymentOrchestrator — encodes the sense that this component coordinates downstream effects; implies broader authority.
PaymentEventDispatcher — encodes the sense that the component dispatches without concern for outcomes; the sense suggests fire-and-forget semantics.

Each name is a mode of presentation that forecloses some uses and enables others. The team's debate is not a lexical dispute—it is a dispute about which sense is correct for the domain, which requires understanding the actual business logic being modeled.
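The three candidate names can be read as three different promises, which becomes visible if each is written out as an interface. The Python sketch below uses `typing.Protocol`; the method names, signatures, and the `KindBasedRouter` class are hypothetical and stand in for whatever the team's real code would contain.

```python
from typing import List, Protocol


class PaymentEventRouter(Protocol):
    def route(self, event: dict) -> str:
        """Return the name of the handler that should receive `event`."""
        ...


class PaymentEventDispatcher(Protocol):
    def dispatch(self, event: dict) -> None:
        """Hand `event` off; the caller learns nothing about the outcome."""
        ...


class PaymentOrchestrator(Protocol):
    def run(self, event: dict) -> List[str]:
        """Coordinate downstream steps and report which ones completed."""
        ...


class KindBasedRouter:
    """A concrete router: it chooses a destination and changes nothing."""

    def route(self, event: dict) -> str:
        return f"{event['kind']}_handler"
```

The return types carry much of the sense: a router answers with a destination, a dispatcher answers with nothing, and an orchestrator answers with a report. Deciding which signature the service actually needs is the same decision as choosing its name.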

Round 3 — Language game diagnosis: A new team member from a Python background keeps suggesting handle() for internal methods. The existing team, shaped by a Go codebase, expects explicit method names. This is a collision between two language games—neither is wrong in the abstract, but only one is the game being played here. The solution is not to decide who is right; it is to make the team's language game explicit.

Round 4 — Speech act in the ADR: The team writes an ADR that concludes: "We will use Dispatcher as the suffix for all components that route events without awaiting downstream confirmation." This is not an assertion—it is a declaration that brings a naming convention into existence, and a directive to all future engineers. The ADR will outlive any individual engineer on the team. Written under Ricoeur's model, it is an autonomous text whose illocutionary force must be self-sufficient.

Annotated Case Study

The drift of "continuous integration"

Continuous integration, a practice from Extreme Programming that Martin Fowler's widely read article helped popularize, began with a precise, narrow meaning: every developer integrates into the mainline at least daily, with automated verification of each integration. The practice was designed to eliminate long-lived branches and keep integration conflicts small and cheap to resolve.

Within a decade, the term had drifted to mean "running automated tests in a pipeline on any push"—a practice that is compatible with long-lived feature branches, which is exactly what the original concept was designed to eliminate.

Annotation 1 — Wittgenstein: The meaning shifted because the language game changed. Different teams adopted the term, carrying different practices and different forms of life. "CI" now participates in a different language game than the one Fowler played. The word has the same sound; its use is different.

Annotation 2 — Meaning shift is not corruption: This is not merely degradation of a definition. From a Wittgensteinian perspective, meaning is use. The term "continuous integration" now means what it is used to mean in the dominant practice community. Fowler's original sense is still retrievable—but it requires understanding the genealogy of the term's use.

Annotation 3 — Staff engineers and semantic archaeology: A staff engineer leading an initiative to "adopt CI/CD" must first establish which language game they are playing. Does "CI" mean mainline integration, or does it mean pipeline automation? Understanding the genealogy of the term prevents spending months implementing pipelines while assuming the team has agreed to abandon long-lived branches—two completely different initiatives that share a name.

Annotation 4 — The hermeneutic circle in debugging drift: You cannot understand what a team means by "CI" by asking them directly—they will give you the term back. You understand it by watching how they use it: how long their branches live, when they merge, what triggers their pipelines. The whole (team practice) and the parts (individual decisions) must be read against each other.
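Watching how a team uses "CI" can start with something as simple as measuring how long its branches live. The Python sketch below assumes you have already collected (created_at, merged_at) timestamps for merged branches from your version control host; how that data is obtained is deliberately left out, and the function name is hypothetical.

```python
from datetime import datetime
from statistics import median
from typing import Iterable, Tuple


def median_branch_lifetime_days(
    branches: Iterable[Tuple[datetime, datetime]],
) -> float:
    """Median time from branch creation to merge, in days."""
    lifetimes = [
        (merged_at - created_at).total_seconds() / 86400
        for created_at, merged_at in branches
    ]
    if not lifetimes:
        raise ValueError("no merged branches to measure")
    return median(lifetimes)
```

A median of around a day or less is consistent with the mainline-integration sense of the term. A median measured in weeks suggests the team is using "CI" to mean pipeline automation, whatever its members say when asked.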

Common Misconceptions

Misconception 1: Naming is a style preference, not a quality concern.

Empirical studies associate identifier naming flaws with lower code quality and a heavier maintenance burden. Naming is a quality concern with measurable effects on code quality and comprehension speed (one study found that descriptive names yield approximately 19% faster comprehension), and by extension on team velocity. Treating it as a style preference misidentifies the stakes.

Misconception 2: The right name is the one that most accurately describes the function.

Frege's framework shows that accuracy of reference is not the primary challenge—the primary challenge is specifying the right sense. Two accurate names can present the same referent under incompatible senses. The right name is the one whose sense is correct for the domain's language game—which is a question about how the concept is used, not about lexical correspondence.

Misconception 3: Documentation should be as complete as possible to minimize ambiguity.

Speech act theory reveals that more description is not the same as more performative force. A ten-page specification that fails to issue clear directives, make explicit promises, or declare architectural boundaries will be less effective than a one-page ADR that performs each of those acts explicitly. The question is not: how much did we document? It is: what acts did the documentation perform?

Misconception 4: Legacy code problems are primarily technical debt.

The hermeneutic framework reveals that much of what is called "technical debt" in legacy systems is actually an interpretation problem: the code is an autonomous text whose sense has drifted from the team's current understanding of the domain. The code cannot tell you why it was written that way—and no amount of refactoring resolves the underlying epistemological gap if the team's conceptual model of the domain has not been made explicit.

Misconception 5: Onboarding difficulty is a knowledge transfer problem.

Onboarding is a horizon-fusion problem. It is not resolved by documenting more, because the new engineer's horizon must actually contact and merge with the codebase's embedded horizon. Documentation accelerates that contact—but it cannot replace the iterative hermeneutic process of reading code, forming understandings, having them revised, and gradually entering the team's language game.

Active Exercise

Part A — Sense Analysis (20 min)

Take a class, function, or module from a codebase you work in that has a generic name (Manager, Service, Handler, Processor, Util).

Write down the reference: what operation or entity does this name pick out?
Write down the current sense as you understand it: what does the name communicate about how to use this, what it assumes, and what it promises?
Propose two alternative names that would present the same reference under different senses. For each, write one inference a reader would draw from the new name that they would not draw from the current name.
Which name is most honest to the actual behavior? Which name is most honest to the team's domain language game?

Part B — Speech Act Audit (15 min)

Take an ADR, architecture document, or significant README from your organization and identify which sentences perform each of the following acts:

Assertions: claims about what is currently true
Directives: instructions to future engineers
Commissives: promises about system behavior or team behavior
Declarations: boundaries or conventions being brought into existence by the document itself

Count how many sentences are pure assertions (describing state) versus sentences that perform acts (commit, instruct, declare). Is the ratio what you would expect for a document meant to constrain future decisions?
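If you want to keep the tally honest, the count can be done mechanically once a human reader has labeled each sentence. The Python sketch below assumes that labeling has been done by hand; the function name and the label strings are hypothetical and simply mirror the four categories listed above.

```python
from collections import Counter
from typing import Iterable, Tuple

ACTS = ("assertion", "directive", "commissive", "declaration")


def performative_share(tagged: Iterable[Tuple[str, str]]) -> float:
    """Share of sentences that perform an act other than plain assertion.

    `tagged` holds (act, sentence) pairs labeled by a human reader.
    """
    counts = Counter(act for act, _sentence in tagged)
    unknown = set(counts) - set(ACTS)
    if unknown:
        raise ValueError(f"unknown act labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sentences to audit")
    return (total - counts["assertion"]) / total
```

The number is not a score to optimize. It is a prompt for the question in the exercise: whether a document meant to constrain future decisions spends most of its sentences describing the present.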

Part C — Hermeneutic Circle Reconstruction (15 min)

Identify a piece of code in your codebase that confused you when you first read it—or identify a place where you know onboarding engineers struggle.

Write a brief account of what the horizon gap is: what does an incoming engineer's horizon contain, and what does the codebase's embedded horizon require? What is the fusion path—not how to document it better, but what iterative reading process would close the gap? What parts-to-whole and whole-to-parts moves does the engineer need to make?

Key Takeaways

Naming is an epistemological act, not a lexical one. Frege's sense/reference distinction reveals that choosing a name is choosing which mode of presentation to make available. Names encode inferences, constraints, and design commitments, not just labels. Poor naming often reflects poor domain understanding rather than poor linguistic skill.
Meaning is use in a community, not correspondence to a referent. Wittgenstein's language game framework explains why the same term can mean different things across teams, why onboarding is culturally difficult, and why naming debates resist resolution by logic: meaning is constituted by practice, and practices differ.
Code is an autonomous text requiring hermeneutic reading. Following Ricoeur and Gadamer, legacy code must be interpreted without access to author intent. Understanding requires iterating between parts and whole, and requires a fusion of the engineer's horizon with the codebase's embedded horizon. Onboarding is epistemologically difficult by structure, not by accident.
Documentation is performative, not merely descriptive. Speech act theory (Austin, Searle) reveals that specifications, ADRs, and comments perform acts — assertions, directives, promises, declarations — not just descriptions. A document that fails to perform the necessary acts fails at its primary function, regardless of how complete it is.
The language you use shapes the distinctions you make. The Sapir-Whorf hypothesis applies to both programming language choice and domain vocabulary. The naming patterns available in a team's dialect constrain and enable the conceptual distinctions developers habitually draw, with real cognitive consequences.

Further Exploration

Primary Sources

Frege, "On Sense and Reference" (1892) — The foundational paper. Read sections I and II to get the sense/reference distinction from the source.
Stanford Encyclopedia: Ludwig Wittgenstein — Comprehensive coverage of the language game concept and family resemblance. Focus on the sections on Philosophical Investigations.
Stanford Encyclopedia: Speech Acts — Austin and Searle's taxonomy, the basis for the documentation-as-performative analysis.
Internet Encyclopedia of Philosophy: Gadamer — Clear treatment of horizon, the hermeneutic circle, and fusion of horizons.

Applied Scholarship

Towards a Hermeneutic Definition of Software — Academic paper applying Ricoeur's hermeneutics directly to software. The most rigorous treatment of software-as-text.
On Software Engineering Hermeneutics — Practitioner-oriented application of hermeneutics to software engineering work. Readable and directly applicable.
Linguistic Relativity and Programming Languages — Accessible treatment of Sapir-Whorf applied to language choice.

Engineering Practice

Ubiquitous Language — Martin Fowler — DDD's operationalization of the Wittgensteinian principle that meaning is use in a community.
Two Hard Things — Martin Fowler — Fowler's reflection on Phil Karlton's aphorism; read alongside the Wittgenstein material for the philosophical underpinning.
Naming Things — namingthings.co — Practitioner-focused treatment of naming as a foundational engineering practice.
Relating Identifier Naming Flaws and Code Quality (ResearchGate) — The empirical anchor: naming quality predicts defect rates and maintenance burden.