learnings.n14n.dev
A personal AI-assisted encyclopedia and learning archive, grounded in supported claims
Lead Summary
learnings.n14n.dev is a personal learning archive whose contents — encyclopedic articles and structured learning plans — are produced by a multi-agent AI pipeline rather than written by hand. The site runs on a static site generator (Zola), with all content kept under version control as Markdown. What distinguishes it from a conventional personal blog or wiki is its commitment to a single architectural rule: every piece of generated prose is grounded in a corpus of supported claims, each of which records the sources, the strength of evidence, and a verdict before any article or learning module may quote it.
The pipeline decomposes into three layers. A research stage spawns specialised agents to discover concepts, assess bias, and evaluate evidence for a topic, writing each evaluated assertion to disk as a single-claim Markdown file with a verdict of supported, uncertain, or unsupported. A claims database indexes the supported claims with semantic embeddings, providing the rest of the system with retrieval primitives. Generation stages — articles and learning plans — pull claims out of that database and write prose that links back only to sources contained in the claims they cite.
"AI-assembled learning plans on niche topics — history, engineering, design, and wherever curiosity leads next. The curation and the questions are mine." — site homepage
Origins & Background
The site sits at the intersection of three concerns visible across its supported corpus.
The first is the well-documented epistemic looseness of large language models. LLMs have been characterised as "epistemologically indifferent" systems that operate neither in facts nor in fiction, generating linguistically plausible outputs from statistical patterns without any internal mechanism to enforce truth or falsity constraints. Some researchers further argue that hallucination may be inevitable given current transformer architectures, since these models maximise next-token likelihood with no truth oracle in the training loop. A site that intends to use generative AI to produce reference material has to take a position on this problem.
The second is a critique of personal knowledge management (PKM) systems. Practitioners and researchers have repeatedly identified the collector's fallacy — the pattern in which users confuse the act of capturing information with the act of understanding it — and the related IKEA effect in PKM, where time spent customising templates and dashboards inflates a user's sense of system value disproportionate to actual learning gains. A naive "second brain" of clipped quotes risks reproducing both pathologies in machine-readable form.
The third is the older tradition of state- or individual-sponsored encyclopedic compilation. The Chinese leishu, and especially the Yongle Dadian and Siku Quanshu, demonstrate that the impulse to organise knowledge categorically and exhaustively is not a Western-Enlightenment artefact. They also show that later compilations could be designed to preserve and integrate the materials of earlier ones — knowledge organisation as cumulative rather than disposable infrastructure. This site treats its own claim corpus as the analogous substrate: a small, curated, citable base that survives across rebuilds of the surface content.
Architecture
The repository is a single Git monorepo with three coupled subsystems.
The content layer is a Zola static site. Source Markdown lives under content/ for the public-facing pages and under articles/ and learning-plans/ for the AI-generated long-form content prior to assembly into Zola pages. Static generation removes runtime AI dependencies from the published site — the inference is offline, and what readers see is plain HTML.
The claims store is a SQLite database (claims/claims.db) backed by a directory of single-claim Markdown files (claims/{topic-id}/{verdict}/{claim-id}.md). Each claim file carries a YAML header with claim-id, topic-id, concept-id, source-type, tags, and score, followed by a structured body containing the claim itself, a sources table (with type, position, relevance, and strength), and supporting and refuting evidence sections. The Markdown files are the source of truth; the database is rebuilt from them. Embeddings are computed using Nomic's nomic-embed-text-v2-moe and stored alongside the claim metadata for semantic retrieval.
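As a rough illustration of the claim header described above, the sketch below reads the six header fields from a claim file. It assumes the header is delimited by --- lines and holds flat key: value pairs, with tags written as a bracketed list. That layout and the names ClaimHeader and parse_claim_header are illustrative assumptions, not the site's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class ClaimHeader:
    claim_id: str
    topic_id: str
    concept_id: str
    source_type: str
    tags: list[str] = field(default_factory=list)
    score: float = 0.0


def parse_claim_header(text: str) -> ClaimHeader:
    """Parse a flat header delimited by '---' lines (assumed layout)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("claim file does not start with a header delimiter")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of header; the structured body follows
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    else:
        raise ValueError("claim header is not terminated")
    # Tags are assumed to look like "[sleep, memory]".
    raw_tags = fields.get("tags", "").strip("[]")
    tags = [tag.strip() for tag in raw_tags.split(",") if tag.strip()]
    return ClaimHeader(
        claim_id=fields["claim-id"],
        topic_id=fields["topic-id"],
        concept_id=fields["concept-id"],
        source_type=fields["source-type"],
        tags=tags,
        score=float(fields.get("score", "0")),
    )
```

A missing required field raises KeyError, which suits a store where the Markdown files are the source of truth and malformed files should be noticed rather than silently indexed.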
The orchestration layer is implemented as Claude Code skills and agents. Skills (/research, /research-plan, /research-run, /article, /learning-plan, /assembly-article, /assembly-learning, /claims) are user-invocable workflows that drive the main conversation; each skill spawns one or more specialised agents — concept-discoverer, concept-researcher, bias-assessor, article-writer, article-assembler, learning-sequencer, learning-writer, module-assembler — defined as separate prompt files. Agents run in parallel where they are independent and sequentially where one depends on another's output.
Storing claims as version-controlled Markdown means the corpus is auditable, diffable, and survives without the database. The SQLite store is a derived index: it can be rebuilt from the Markdown at any time. This inverts the more common pattern where the database is canonical and Markdown is exported.
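A minimal sketch of what "derived index" can mean in practice: the function below rebuilds a SQLite table purely from the claims/{topic-id}/{verdict}/{claim-id}.md directory layout. The table schema and the name rebuild_index are assumptions made for illustration, and the real index also stores embeddings, which are omitted here.

```python
import sqlite3
from pathlib import Path


def rebuild_index(claims_root: Path, db_path: str = ":memory:") -> sqlite3.Connection:
    """Recreate the claims table from the Markdown tree (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute("DROP TABLE IF EXISTS claims")
    conn.execute(
        "CREATE TABLE claims ("
        "topic_id TEXT, verdict TEXT, claim_id TEXT, path TEXT, "
        "PRIMARY KEY (topic_id, claim_id))"
    )
    # Exactly three levels below the root: topic, verdict, claim file.
    for path in sorted(claims_root.glob("*/*/*.md")):
        topic_id, verdict, _ = path.relative_to(claims_root).parts
        conn.execute(
            "INSERT INTO claims VALUES (?, ?, ?, ?)",
            (topic_id, verdict, path.stem, str(path)),
        )
    conn.commit()
    return conn
```

Because the table is dropped and recreated on every run, deleting the database file loses nothing that the Markdown tree does not already hold.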
Mechanism & Process
The pipeline as a whole
A topic enters the pipeline as a single phrase ("anarchism", "sleep", "sociotechnical systems") and exits as either an article, a learning plan, or both — but only after passing through claims. The intermediate claim corpus is the load-bearing artefact.
Research planning
When a user invokes /research <topic>, the orchestrator first disambiguates the topic interactively — asking about scope and source type — and then spawns five concept-discoverer agents in parallel, each assigned a different angle: overview, debate, recent, alternative, and cross-disciplinary. Each agent searches the live web from its angle and returns concepts it believes are central from that perspective. The orchestrator then merges overlapping concepts (concepts surfaced by multiple angles count as stronger signal), deduplicates against the existing claim corpus, and applies a four-question harm rubric — testing for uplift, reversibility, counterfactual availability, and operational framing — before locking in the research plan and writing a brief per concept.
The harm rubric is significant in itself: rather than blocking topics by surface pattern-matching, the system writes its editorial reasoning to research/{topic-id}/harm-assessment.md, making the editorial judgement explicit and auditable.
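The merge step in research planning can be sketched as a count of how many discovery angles surfaced each concept. This is a simplified illustration: it matches concepts by normalised name, whereas the real deduplication against the claim corpus may well be semantic. The function names are hypothetical.

```python
from collections import defaultdict


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so near-identical labels merge."""
    return " ".join(name.lower().split())


def merge_concepts(
    by_angle: dict[str, list[str]], known: set[str]
) -> list[tuple[str, int]]:
    """Return (concept, angle_count) pairs, strongest signal first."""
    angles_for: dict[str, set[str]] = defaultdict(set)
    for angle, concepts in by_angle.items():
        for concept in concepts:
            angles_for[normalise(concept)].add(angle)
    # Drop concepts the existing corpus already covers (exact-name match only).
    known_normalised = {normalise(name) for name in known}
    merged = [
        (concept, len(angles))
        for concept, angles in angles_for.items()
        if concept not in known_normalised
    ]
    # More angles means stronger signal; ties are broken alphabetically.
    return sorted(merged, key=lambda item: (-item[1], item[0]))
```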
Research execution
For each concept in the locked plan, a concept-researcher agent searches the web again, this time with the concept's specific brief in hand, and writes its findings as one or more claims. Each claim records its sources, along with each source's position (supports/neutral/refutes), relevance, and strength, and carries an explicit verdict. Before this research begins, a separate bias-assessor agent runs against the topic to flag systematic skews in the available sources.
A storage script then ingests all claims into claims/{topic-id}/{verdict}/, where the verdict directory acts as a coarse triage gate: only supported/ claims are eligible to ground later prose. The unsupported and uncertain directories are kept rather than discarded, because their existence is editorially load-bearing — they record what the pipeline looked at and rejected, not just what it accepted.
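The triage gate amounts to a path convention plus one check. The sketch below is an assumption-laden illustration of that idea; claim_path and is_citable are hypothetical names, not functions from the site's storage script.

```python
from pathlib import Path

VERDICTS = ("supported", "uncertain", "unsupported")


def claim_path(root: Path, topic_id: str, verdict: str, claim_id: str) -> Path:
    """Build claims/{topic-id}/{verdict}/{claim-id}.md, rejecting unknown verdicts."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return root / topic_id / verdict / f"{claim_id}.md"


def is_citable(path: Path) -> bool:
    """Only claims filed under supported/ may ground later prose."""
    return path.parent.name == "supported"
```

Rejected claims still get a path under this scheme, which is what keeps the record of what was considered and turned down.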
Claim retrieval
Once stored, claims are queryable through the /claims skill, which wraps a small command-line tool over the SQLite database. Three retrieval modes matter in practice:
- discover: given a free-text query, return all semantically related supported claims across topics, with similarity scores and tag frequencies. This is the entry point used by both article and learning-plan generation.
- similar: given an arbitrary text, return claims close to it in embedding space (used to detect duplicates before research adds redundant material).
- search: metadata filters (topic, concept, verdict, score, tag) for direct lookup.
This is recognisably a retrieval-augmented generation (RAG) architecture, whose canonical three-stage decomposition — retrieve relevant documents, augment the prompt with them, generate a grounded response — is the foundational architectural pattern used across industry and academic implementations. RAG-grounded systems generate more specific, diverse, and factual language than parametric-only models by anchoring outputs in non-parametric memory rather than in the model's pretraining weights.
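To make the retrieve step concrete, here is a minimal sketch of a discover-style lookup: cosine similarity over stored embeddings, restricted to supported claims. The vectors are toy values standing in for real embeddings, and IndexedClaim, cosine, and this discover function are illustrative assumptions rather than the interface of the site's command-line tool.

```python
import math
from dataclasses import dataclass


@dataclass
class IndexedClaim:
    claim_id: str
    verdict: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def discover(
    query: list[float], claims: list[IndexedClaim], min_score: float = 0.5
) -> list[tuple[str, float]]:
    """Return (claim_id, score) for supported claims at or above min_score."""
    scored = [
        (claim.claim_id, cosine(query, claim.embedding))
        for claim in claims
        if claim.verdict == "supported"  # verdict gate comes before similarity
    ]
    hits = [item for item in scored if item[1] >= min_score]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Filtering on the verdict before ranking means an unsupported claim can never be returned, however close it sits to the query in embedding space.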
Article generation
The /article skill spawns one article-writer agent per topic. The agent calls claims_db.py discover for its topic, fails fast if fewer than 20 supported claims exist, and otherwise reads every claim file the search returned. It then writes prose under the constraint that it may quote only what those claims contain and may link only to sources those claims cite. Articles follow a fixed frontmatter schema (title, subtitle, claim count, infobox) and a menu of permitted section types (Lead Summary, Origins & Background, Core Concepts, Controversies & Debates, Legacy, Further Reading, and so on).
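The link constraint is enforced by instruction to the agent, but it could also be checked after the fact. The sketch below is one hedged way to do that: it collects every URL found in the claim files and reports any article URL outside that set. The regular expression and the function names are assumptions for illustration, and the pattern is deliberately simple rather than a full URL grammar.

```python
import re

# Matches http(s) URLs up to whitespace or a bracketing character.
URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(text: str) -> set[str]:
    """Find URLs and strip trailing sentence punctuation."""
    return {url.rstrip(".,;:") for url in URL_RE.findall(text)}


def ungrounded_links(article: str, claim_texts: list[str]) -> set[str]:
    """Return URLs used in the article that no claim file contains."""
    allowed: set[str] = set()
    for text in claim_texts:
        allowed |= extract_urls(text)
    return extract_urls(article) - allowed
```

An empty result does not show that the prose is faithful to the claims; it only shows that every link points at a source some claim already cites.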
Learning plan generation
The /learning-plan skill takes a different shape. After confirming that enough supported claims exist for a chosen plan size — small (≥20), medium (≥50), large (≥80), or extra-large (≥200) — and asking the user about target persona and learning themes, it spawns a learning-sequencer agent to design a module sequence, then one learning-writer per module to produce content. The output is structured pedagogically rather than encyclopedically: modules are intended to be read in order and to scaffold understanding cumulatively.
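The size gate can be written down directly from the thresholds above. This is a minimal sketch; the dictionary and function names are hypothetical, and only the four minimums come from the text.

```python
# Minimum supported-claim counts per plan size, as stated in the text.
PLAN_MINIMUMS = {"small": 20, "medium": 50, "large": 80, "extra-large": 200}


def eligible_sizes(supported_claims: int) -> list[str]:
    """List the plan sizes the current corpus can support, smallest first."""
    return [
        size
        for size, minimum in PLAN_MINIMUMS.items()
        if supported_claims >= minimum
    ]


def require_plan_size(size: str, supported_claims: int) -> None:
    """Fail fast when the chosen size needs more evidence than exists."""
    minimum = PLAN_MINIMUMS[size]
    if supported_claims < minimum:
        raise ValueError(
            f"{size} plan needs at least {minimum} supported claims, "
            f"found {supported_claims}"
        )
```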
Assembly
A final pair of skills (/assembly-article, /assembly-learning) converts the AI-produced Markdown into Zola-shaped pages under content/, classifies each into one of four domains — engineering, natural-sciences, social-sciences, humanities — and assigns a sub-classification within that domain. Domain assignment is the only step where editorial judgement is applied to the placement of an article rather than its content. The site is then rebuilt with zola build.
Core Concepts
Claims as the unit of knowledge
The site does not treat articles as its unit of knowledge. Articles are derivative — they are reconstructions of the claim corpus around a particular topic frame. The unit of knowledge is the single claim: an assertion plus its sources plus its verdict.
This has practical consequences. A claim can be reused across articles. A claim that gets refuted by later research can be moved from supported/ to unsupported/ without rewriting any article that mentioned it — though the article will then need to be regenerated. The verdict directory is a soft form of provenance tracking, doing for prose what type systems do for code.
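One way to picture the regeneration consequence: if each article's cited claim ids were recorded, articles needing a rebuild could be found by checking current verdicts. The source does not describe such a record, so the citations mapping and the function name below are purely hypothetical.

```python
def stale_articles(
    citations: dict[str, set[str]], verdicts: dict[str, str]
) -> list[str]:
    """Return articles citing any claim that is no longer supported.

    citations maps an article name to the claim ids it cites (hypothetical);
    verdicts maps a claim id to its current verdict directory.
    """
    return sorted(
        article
        for article, claim_ids in citations.items()
        # A claim missing from verdicts is treated as no longer supported.
        if any(verdicts.get(cid) != "supported" for cid in claim_ids)
    )
```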
Citation grounding as hallucination defence
A persistent concern with generative writing is that LLMs are statistical coherence engines without truth constraints, producing outputs that sound right whether or not they are right. The site's defence is procedural rather than algorithmic: the article-writer agent is instructed to use claims as the only source of truth and to link only to URLs found inside claim files.
This procedural grounding is consistent with the broader RAG literature, where retrieving relevant documents into the prompt is shown to reduce — though not eliminate — fabrication. Two mechanisms in particular drive RAG hallucinations even when retrieval works: chunks that split a concept across boundaries force the model to confabulate the missing context, and retrieving noisy or irrelevant passages confuses the generation. The site mitigates the first by storing claims as small, self-contained units (a single claim per file with its sources inline) rather than chunked long-form documents, and the second by gating retrieval on the supported verdict rather than on raw similarity alone.
Multi-agent decomposition
The pipeline is structured as a many-agent system rather than as a single long prompt. Five concept-discoverers run in parallel, one per angle. One concept-researcher runs per concept, in parallel across concepts. Each agent is given a narrow, well-typed input and produces a small, well-typed output written to a known path. The orchestrator coordinates by waiting for files to exist on disk and then moving on.
This is recognisably the pattern that domain-driven design literature recommends for multi-agent AI systems: agents are organised around bounded contexts and event-driven (in this case, file-driven) interactions, rather than aggregated into ad-hoc tool collections. The mechanism by which independent agents converge on a coherent result without central control is closer to constraint propagation than to hierarchical command: each agent enforces a local constraint (the brief, the harm rubric, the verdict gate) and propagates its output downstream.
Multi-agent LLM systems exhibit emergent group behaviours that are not explicitly designed — coordination failures and unintended aggregate goals can arise even when each agent in isolation behaves acceptably. The site's mitigation is to keep agents short-lived, narrowly scoped, and stateless across runs, with all coordination state externalised to files.
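The file-driven coordination described in this section, where the orchestrator waits for agent outputs to exist on disk, can be approximated by a polling loop. The actual orchestration runs as Claude Code skills rather than Python, so this is an analogy under stated assumptions, with a hypothetical function name and arbitrary timeout defaults.

```python
import time
from pathlib import Path


def wait_for_outputs(
    paths: list[Path], timeout: float = 60.0, poll: float = 0.5
) -> None:
    """Block until every expected output file exists, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    pending = set(paths)
    while True:
        # Keep only the files that have not appeared yet.
        pending = {path for path in pending if not path.exists()}
        if not pending:
            return
        if time.monotonic() >= deadline:
            missing = ", ".join(sorted(str(path) for path in pending))
            raise TimeoutError(f"agent outputs never appeared: {missing}")
        time.sleep(poll)
```

Since the only shared state is the set of files, an agent can be rerun or replaced without the coordinator holding any memory of earlier runs.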
Prompts as first-class artefacts
Every skill and agent is a checked-in Markdown file with structured frontmatter (name, description, tools, model) and a versioned body of instructions. Prompts are diffed, reviewed, and edited like any other source artefact. This reflects a broader shift in LLM application engineering: systematic prompt engineering frameworks treat prompts as first-class program entities with dedicated primitives for versioning, testing, composition, and optimisation — and indeed as production specifications alongside code, maintained with the same rigour as formal requirements.
From a linguistic standpoint, prompts are directive speech acts: utterances intended to direct a hearer to perform a specific action. The site's agent prompts are written in correspondingly direct, imperative language — explicit input contracts, explicit output contracts, explicit failure conditions — rather than in conversational or polite forms.
Editorial Stance
The site takes several explicit positions, recorded in CLAUDE.md rather than left implicit.
Banned sources. A small allow/deny list excludes AI-generated content sites (e.g. Grokipedia), sources with a commercial incentive in the conclusion ("websites having an incentive to say something is true, e.g. because they are selling similar products"), and predatory publishers. The list is curator-defined; there is no peer review.
Verdict triage. Claims are sorted into supported, uncertain, and unsupported directories at storage time, and only supported claims may be cited. The other directories are retained rather than deleted: they record what the pipeline considered and rejected.
Harm rubric. Topics and individual concepts are assessed against a four-question harm rubric (uplift, reversibility, counterfactual availability, operational framing) before research begins, and the answers are written to a file alongside the research output. This is intentionally an editorial gate, not a content classifier — it produces a justification, not a score.
Output lengths gated by evidence. Learning plan size is gated by the number of available supported claims (≥20 for small, ≥50 for medium, ≥80 for large, ≥200 for extra-large), so the system cannot be asked to produce a long output for a thinly-evidenced topic.
Process over comprehensiveness. The site is not trying to cover all of human knowledge. The homepage frames it as following "wherever curiosity leads next". The selection bias is acknowledged rather than denied.
Sociotechnical Position
The site is itself a small sociotechnical system: a coupled technical pipeline (agents, embeddings, build tooling) and a social subsystem (a single curator's choice of topic, scope, harm tolerance, and editorial banlist). Joint optimisation is visible at several joins: the harm rubric is technical (rule-driven) but its answers are written by the curator-in-the-loop; the verdict directories are filesystem artefacts but the verdict itself is assigned by an evaluating agent; banned-source rules are encoded in CLAUDE.md but enforced only by the curator's review of cited URLs.
The agent topology exhibits a small instance of Conway's Law in reverse: because the editorial pipeline is staged (research → claims → articles, learning plans), the agent set is staged the same way, with no agent crossing stage boundaries. The pipeline shape is not incidental; it mirrors what it would look like to do this work without AI.
Limitations & Open Problems
Hallucination is not eliminated, only narrowed. The architecture is RAG, not formal verification. Hallucination may be inevitable in transformer-based systems, and detecting it by embedding similarity has certified limits, so semantic closeness between an output and its sources is not a safety guarantee. Citation grounding reduces the failure surface; it does not close it.
Single-curator scale. The corpus has one editor, one harm rubric, and one banned-source list. There is no peer review, no community editing, and no formal disagreement mechanism. This is the site's main divergence from Wikipedia-class encyclopedias and from the emperor-sponsored leishu, both of which scale through editorial multiplicity.
Claims are not facts. Verdicts are assigned by an agent reading sources, not by independent reproduction or formal proof. A supported claim is one for which the pipeline found and recorded supporting sources at research time; it is not a guarantee that the claim is true, that the sources are correct, or that newer evidence has not displaced it. The verdict structure is auditable but not authoritative.
Knowledge organisation encodes a worldview. Scholars of Chinese encyclopedic traditions emphasise that the inclusion and exclusion of categories, the hierarchical relationships between them, and their sequence all encode epistemological commitments about how knowledge maps onto cosmic order. The same is true here: the four-domain taxonomy (engineering, natural sciences, social sciences, humanities) is a Western academic structure, and the choice of what counts as a "concept" is shaped by the discovery angles the orchestrator was designed to spawn.
The justified-true-belief gap. Even with citations, LLM-produced prose does not satisfy classical justified-true-belief accounts of knowledge: the justification of an LLM output (statistical conditioning on training data plus retrieved context) differs in kind from epistemic warrant grounded in understanding or reliable truth-tracking. The site is best read as a structured digest of what its sources say, not as a system that itself knows anything.
Comparison with Adjacent Practices
Versus conventional PKM. Traditional personal knowledge management systems are vulnerable to the collector's fallacy and the IKEA effect — the more time spent on capture and customisation, the stronger the illusion of value, regardless of whether material is actually being processed. By contrast, the active-learning literature describes a rephrase-relate-revisit cycle that does produce learning gains: distil ideas in your own words, connect them to other notes, and return periodically to revise. This site automates a fragment of that loop — claims rephrase a source's findings into a constrained schema; articles relate claims across topics — but the revisit part remains the curator's responsibility.
Versus Wikipedia. The two share the goal of citation-grounded reference content, but the editorial and authorship mechanisms differ. Wikipedia is human-edited with community consensus and a formal source-quality policy; this site is AI-written with a single human's editorial gates. Wikipedia survives through editor multiplicity; this site survives through verdict directories and version control.
Versus the leishu and Siku Quanshu. Imperial Chinese encyclopedic projects compiled vast bodies of text (the Yongle Dadian ran to over 11,000 volumes) into rhyme-based or four-part categorical systems that encoded a particular cosmological worldview. The scale and the patronage are incomparable. The structural similarity is in the cumulative posture: the Siku Quanshu deliberately preserved excerpts from the depleted Yongle Dadian, treating an earlier compilation as substrate rather than as a competitor. The claim corpus is intended to play the same role here — a small, durable substrate from which surface content can be regenerated.
Further Exploration
Foundational concepts
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks — Lewis et al. (2020) — The seminal paper on the retrieve–augment–generate decomposition that this site's pipeline implements
- Rethinking Error: Hallucinations and Epistemological Indifference — Duke (2025) — Frames LLM hallucination as a property of the model class
- The Semantic Illusion: Certified Limits of Embedding-Based Hallucination Detection in RAG Systems — arXiv 2512.15068 — Shows why semantic similarity is insufficient as a safety guarantee
Methodology and systems
- Making Prompts First-Class Citizens for Adaptive LLM Pipelines — VLDB/CIDR 2026 — Treating prompts as versioned, tested artefacts
- Applying Domain-Driven Design Principles to Multi-Agent AI Systems
- Emergent Group Behaviours in Multi-Agent LLM Systems