Structured Output and Schema-Driven Prompting
Make your prompts behave like typed interfaces — testable, verifiable, and maintainable
Learning Objectives
By the end of this module you will be able to:
- Design schema-first prompts that define terms before use and avoid underspecification patterns.
- Explain how structured output constraints interact with prompt instructions and where they can conflict.
- Apply validation strategies to catch model output failures before they reach downstream systems.
- Identify LLM knowledge-gap and translation-failure patterns, and write compensating prompt language.
Core Concepts
Underspecification: The Measurable Problem
The reason structured output matters is not aesthetic — it is quantitative. Research on LLM prompts shows that when a requirement is left unspecified, accuracy drops by an average of 22.6%, with some cases reaching a 93.1% drop compared to when the requirement is explicitly stated. The model does not fail because it lacks capability; it fails because no one told it what was expected.
This is not an LLM-specific phenomenon. Studies across twelve companies consistently find that ambiguity, inconsistency, and incompleteness in natural language specifications are the most frequently reported obstacles practitioners face — across different organizations and engineering cultures, after decades of research on the subject.
Empirical analysis found that ineffective prompts contain knowledge gaps in 44.6% of cases, compared to only 12.6% in effective ones. If your prompts fail intermittently, missing specifications are the most likely cause.
Knowledge gaps fall into four categories:
- Missing context — the model lacks background to correctly interpret the task
- Missing specifications — the expected output shape, constraints, or behavior is not stated
- Multiple context conflicts — the prompt contains contradictory assumptions
- Unclear instructions — the task description is syntactically parseable but semantically ambiguous
The Semantic-Pragmatic Gap
There is a subtler failure mode than missing information. Research shows that models can correctly answer multiple-choice questions about a specification and still fail to apply its pragmatic meaning when generating a response. A model may "know" a constraint in the sense of correctly classifying it — and still violate it during generation.
This gap is not closed by adding more tokens to the prompt. It reflects how autoregressive generation compounds errors across context, and why declarative constraints at the schema level enforce what prose instructions can only suggest.
Schema as Specification
The engineering response to underspecification is schema-first development: define the data contract before writing the prompt. The schema — whether Pydantic, JSON Schema, or Zod — becomes the primary specification of what the model must produce. Field types, validation rules, and field descriptions are not supplementary to the prompt; they are the prompt, formalized.
This pattern inverts the traditional ad-hoc iteration loop. Instead of writing a prompt and iterating until the output looks right, you:
- Define the schema contract (the ground truth)
- Write the prompt to clarify that schema's semantics
- Validate outputs against the schema at runtime
The result is that constraints are explicit in the data model rather than embedded in prose, and the schema definition doubles as formal documentation for downstream consumers.
Field Descriptions Are Prompt Instructions
One non-obvious implication of schema-first development: the description field in a Pydantic Field() (and its equivalents in other schema systems) is not just documentation — it gets sent to the model as part of the instruction. The field description is dual-purpose: it tells developers what the field means, and it tells the model what to generate.
This means crafting good field descriptions is part of prompt engineering, even though the surface looks like pure data definition.
A field description that is vague for a developer is vague for the model. Treat every description= as a prompt instruction.
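You can see this dual role directly: Pydantic serializes each `description` into the JSON Schema that structured-output frameworks attach to the model call. A minimal sketch, assuming Pydantic v2 (the `Invoice` model here is illustrative, not part of the worked example):

```python
from pydantic import BaseModel, Field

class Invoice(BaseModel):
    # A vague description is a vague instruction to the model:
    customer: str = Field(description="The customer")
    # A precise description is a precise instruction:
    total_cents: int = Field(
        ge=0,
        description="Invoice total in integer cents (e.g. 1999 for $19.99). Never a float.",
    )

# The descriptions land verbatim in the JSON Schema that structured-output
# frameworks send to the model alongside your prompt:
schema = Invoice.model_json_schema()
print(schema["properties"]["total_cents"]["description"])
```

Inspecting `model_json_schema()` like this is a quick way to review exactly what instruction text the model will receive for each field.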
Structured Output as Industry Standard
Schema-based output control has converged as standard practice across OpenAI, LangChain, Instructor, and LangGraph. The pattern is consistent: Pydantic models define output shape, types, and constraints; frameworks automatically convert these models to JSON Schema and inject them into the model call.
The Model Context Protocol (MCP) goes a step further: it elevates JSON Schema from best practice to protocol component. MCP defines its core primitives — Tools, Resources, Prompts — using structured JSON Schema specifications, making schema a first-class runtime element for AI applications rather than a post-hoc validation concern. Following its November 2024 release, major providers including OpenAI and Google DeepMind adopted the protocol.
Key Principles
1. Define before you use. All terms, acronyms, and domain-specific concepts must be defined before they appear in the prompt or schema. Glossaries built during the design phase, not after initial drafting, prevent ambiguity from compounding downstream.
2. Write instructions in present tense and active voice. Present tense removes temporal ambiguity about when an obligation applies. "The system returns an ISO 8601 timestamp" is more precise than "The system should return a timestamp" — the latter admits degrees of optionality that the former does not.
3. Treat field descriptions as instructions, not documentation. The description you write for a schema field is sent to the model verbatim. Write it as if you are directly instructing the model on what to generate for that field.
4. Validate twice: syntax and semantics. JSON-valid output can be semantically wrong. "Age must be between 18 and 100" is a semantic constraint that JSON syntax validation does not catch. Use custom validators (Pydantic validators, Instructor's retry loop) to enforce semantic correctness, not just structural shape.
5. Decompose before translating. LLMs are autoregressive and prone to error compounding in single-pass generation of complex, deeply nested structures. Break multi-level schemas into sub-tasks. Validate intermediate outputs before proceeding to the next stage.
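Principles 4 and 5 can be made concrete with a two-layer validation sketch, assuming Pydantic v2. The `Applicant` model and its email rule are illustrative assumptions, not part of the worked example below:

```python
from pydantic import BaseModel, Field, ValidationError, field_validator

class Applicant(BaseModel):
    # Structural layer: constraints the type system and schema express directly.
    age: int = Field(ge=18, le=100, description="Applicant's age in years (18-100).")
    email: str = Field(description="Contact email address.")

    # Semantic layer: a constraint that JSON syntax validation cannot catch.
    @field_validator("email")
    @classmethod
    def email_looks_deliverable(cls, v: str) -> str:
        user, sep, domain = v.partition("@")
        if not (user and sep and "." in domain):
            raise ValueError("email must look like user@domain.tld")
        return v

# Schema-compliant JSON can still fail the semantic layer:
try:
    Applicant(age=30, email="not-an-email")
except ValidationError as exc:
    errors = exc.errors()  # structured detail, useful for logging and retry prompts
```

Both layers raise the same `ValidationError`, so a single validation step (or Instructor's retry loop) catches structural and semantic failures uniformly.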
Step-by-Step Procedure
Writing a Schema-First Prompt
Step 1: Define the output schema first. Before writing a single line of prompt prose, define what the model must return. Use Pydantic, JSON Schema, or Zod — whichever your runtime supports. Include:
- Field names (use explicit names, avoid abbreviations without definition)
- Field types (be specific: `datetime`, not `str`)
- Validation constraints (`ge=0`, `max_length=200`, enum values)
- Field descriptions — this is your primary instruction surface
Step 2: Identify knowledge gaps. Ask: what does the model need to know to fill each field correctly that is not already expressed in the schema? This is where your prompt prose lives. Common gaps:
- What counts as a valid value for this field in your domain?
- What should the model do when the input is ambiguous?
- Are there edge cases the model should handle explicitly?
Step 3: Write the prompt to close the gaps. Write instructions that address only the gaps identified in Step 2. Use:
- Present tense, active voice
- Explicit definitions for domain terms
- Concrete examples for non-obvious fields
- Explicit fallback instructions for ambiguous inputs (e.g., "If the date cannot be determined from the input, return null.")
Step 4: Add a validation layer. Wire the schema to a validation step that runs on every model response:
- Use Pydantic's `ValidationError` or Instructor's retry loop for structural and semantic validation
- Define custom validators for domain-specific constraints that the type system cannot express
- Log validation failures — they are your primary signal for prompt improvement
Step 5: Iterate on field descriptions before iterating on prose. If the model produces well-formed but semantically incorrect output, the problem is almost always in a field description, not in the top-level prompt instructions. Update the description first.
Decision point — when to retry vs. fail:
- If a validation error is recoverable (a missing optional field, a constraint violation that the model can fix), use a retry loop with the error appended to the prompt
- If a validation error is structural (the model produced non-JSON, or a required field is absent), log and escalate — retrying with the same prompt will not fix a systemic gap
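The retry-or-fail decision above can be sketched as a small wrapper, assuming Pydantic v2. Here `call_model` is a hypothetical stand-in for your actual client call; Instructor implements essentially this loop for you when you pass `response_model=`:

```python
from pydantic import BaseModel, ValidationError

def call_with_retry(call_model, schema: type[BaseModel], prompt: str, max_retries: int = 2):
    """Validate model output against `schema`, feeding recoverable errors back.

    `call_model` is a hypothetical function (prompt -> raw JSON string);
    substitute your real client call.
    """
    for attempt in range(max_retries + 1):
        raw = call_model(prompt)
        try:
            return schema.model_validate_json(raw)
        except ValidationError as exc:
            if attempt == max_retries:
                # Persistent failure points at a systemic gap: log and
                # escalate rather than looping forever.
                raise
            # Recoverable: append the validator's message so the model can self-correct.
            prompt = (
                f"{prompt}\n\nYour previous output failed validation:\n{exc}\n"
                "Return corrected JSON only."
            )
```

Note the simplification: this sketch retries every failure up to the cap, whereas in production you would inspect the error type first and skip the retry entirely for structural failures.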
Worked Example
Task: Extract meeting metadata from an unstructured transcript.
Without Schema-First Approach (Before)
```
Extract the key information from this meeting transcript: {transcript}
```
This prompt produces inconsistent output. The model may return a bullet list, a paragraph, or JSON depending on context. The date might be formatted as "Tuesday" in one response and "2026-04-15" in another. There is no way to validate the output programmatically.
With Schema-First Approach (After)
Step 1 — Define the schema:
```python
from pydantic import BaseModel, Field
from datetime import date
from typing import Optional, List

class MeetingMetadata(BaseModel):
    title: str = Field(
        description="A concise title for the meeting. Use the main topic discussed. Max 80 characters."
    )
    date: Optional[date] = Field(
        description="The date the meeting occurred, in ISO 8601 format (YYYY-MM-DD). Return null if no date is mentioned."
    )
    participants: List[str] = Field(
        description="Full names of all participants explicitly mentioned in the transcript. Do not infer names not stated."
    )
    decisions: List[str] = Field(
        description="Each decision reached during the meeting. A decision is a statement of agreed action or position, not a discussion point."
    )
    action_items: List[str] = Field(
        description="Each action item assigned. Format: '[Owner]: [Task] by [Deadline]'. Omit the deadline clause if no deadline is stated."
    )
```
Step 2 — Identify knowledge gaps: The schema handles structure and types. The remaining gaps are:
- What distinguishes a "decision" from a "discussion point" (defined in the `decisions` description)
- How to handle an absent date (handled by `Optional[date]` + description)
- How to format action items (handled by the `action_items` description)
Step 3 — Write the prompt:
```
You are a meeting analyst. Extract structured metadata from the meeting transcript below.
Follow the schema precisely. Do not infer participant names not explicitly stated.
If a field cannot be determined from the transcript, use the null or empty value as specified.

Transcript:
{transcript}
```
Note how sparse the prompt prose is. The schema descriptions carry the load.
Step 4 — Validate:
```python
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

metadata = client.chat.completions.create(
    model="gpt-4o",
    response_model=MeetingMetadata,
    messages=[{"role": "user", "content": prompt}],
)
# Instructor automatically retries with the ValidationError message if the schema is violated
```

The field description annotations are not passive documentation. They are sent to the model as instructions during the structured output call. Every constraint you write in a `description=` is a line of prompt you are not repeating in prose — and it is co-located with the field it governs.
Common Misconceptions
"Adding more prompt instructions will fix unreliable outputs."
Unreliable outputs are usually a sign of underspecification in the schema, not insufficient prose. Before adding instructions, identify which field is inconsistent and improve its description. Schema-level fixes are more durable than prose-level patches.
"The model knows what I mean by 'name' / 'date' / 'status'."
LLMs can fail at pragmatic inference even when they can correctly identify the right answer in a multiple-choice context. Ambiguous field names — particularly those with domain-specific meanings that differ from their colloquial meaning — require explicit definition. "Status" in a CRM context has a specific set of valid values; the model's training data has seen dozens of other interpretations of that word.
"Structured output mode guarantees correct output."
Structured output mode (where supported by the provider) guarantees that the response is valid JSON matching your schema's structural shape. It does not guarantee semantic correctness. A sentiment field constrained to ["positive", "negative", "neutral"] will always produce one of those values — but may produce the wrong one. Semantic validation is your responsibility.
"Field descriptions are just for developers."
They are prompt instructions sent to the model. If you write description="The user's name", that is the instruction the model receives for that field. Write descriptions as if you are telling the model exactly what to put there.
"Schema-first is more work upfront."
It is — the same way writing a typed function signature is more work than writing a duck-typed one. The payoff is that constraints are explicit in the data model rather than embedded in prose, making the system testable, debuggable, and maintainable without re-reading the full prompt.
Active Exercise
Exercise: Diagnose and repair an underspecified schema
Below is a schema and prompt that produce inconsistent outputs in production. Your task is to identify the knowledge gaps and fix them.
Current schema:
```python
from pydantic import BaseModel

class SupportTicket(BaseModel):
    priority: str
    category: str
    summary: str
    resolved: bool
```
Current prompt:
```
Classify this support ticket: {ticket_text}
```
Observed failures:
- `priority` returns values like "high", "High", "HIGH", "urgent", "P1" — all from different models, or sometimes the same model on different days
- `summary` sometimes returns a single sentence, sometimes five paragraphs
- `category` returns values that don't map to the system's actual categories: "billing", "technical", "account"
- `resolved` is frequently wrong for ambiguous cases ("I think I fixed it but not sure")
Your task:
- Rewrite the schema with corrected field types, validation constraints, and explicit `description` values.
- Identify which failures require prompt prose vs. which are fixed entirely at the schema level.
- Write the new minimal prompt prose for the remaining gaps.
- Add one custom semantic validator that cannot be expressed in the type system alone.
When finished, apply the "define before use" principle: could a reader who has never seen your system understand every field constraint from the schema alone?
Key Takeaways
- Underspecification has a measurable cost. Unspecified requirements produce a 22.6% average accuracy drop. Knowledge gaps appear in 44.6% of ineffective prompts vs. 12.6% of effective ones.
- Schema-first inverts the iteration loop. Define the data contract first, then write prompt prose to close only the gaps the schema cannot express.
- Field descriptions are prompt instructions. Everything in a description= is sent to the model. Write it as an instruction, not a comment.
- Validation must be semantic, not just structural. JSON-valid and schema-compliant outputs can still be semantically wrong. Use custom validators to catch domain constraint violations.
- Schema is now a protocol standard. The Model Context Protocol makes JSON Schema a first-class runtime element, not just a best practice — which means schema fluency is a durable investment, not a framework-specific skill.
Further Exploration
Foundational Guides
- How to Use Pydantic for LLMs: Schema, Validation & Prompts — Official Pydantic guide covering schema-first development and field descriptions as instructions
- Using Pydantic Models for Structured Outputs — Instructor — Implementation of validation loops and semantic constraint enforcement
Research & Empirical Evidence
- What Prompts Don't Say: Understanding and Managing Underspecification in LLM Prompts — Source for the 22.6% accuracy drop figure and underspecification taxonomy
- Towards Detecting Prompt Knowledge Gaps for Improved LLM-guided Issue Resolution — Empirical study behind the 44.6% knowledge gap statistic
- FASTRIC: Prompt Specification Language for Verifiable LLM Interactions — Formal conformance checking via execution-trace verification
Tools & Standards
- Model Context Protocol — Architecture Overview — How JSON Schema is embedded as a protocol primitive
- Control LLM output with LangChain's structured and Pydantic output parsers — Practical walkthrough in LangChain