Most production LLM bugs are not "the model is dumb" — they're "the model returned {"price": "twelve dollars"} when downstream code expected a number." This page covers the four lines of defense: native function calling per provider, JSON-Schema-constrained outputs, Pydantic-typed wrappers (instructor), and constrained decoding (Outlines, vLLM grammar mode) when you really cannot tolerate a parse failure.
Three failure modes appear with naked "respond in JSON" prompts:
- ```json\n{...}\n``` — markdown fences around the object, and your parser explodes.
- {"a": 1}\n\nNote that this assumes... — perfectly readable, completely unparsable.
- "42" instead of 42 — strings where numbers should be.

Native function-calling APIs solve the first two by construction (the API never returns prose around a tool call). Strict schema modes (OpenAI Structured Outputs, Gemini's response_schema with strict mode) and constrained decoding solve the third.
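The first failure mode is easy to reproduce with nothing but the standard library: a fenced response that is valid JSON to a human but not to `json.loads`.

```python
import json

# A typical "helpful" model response: valid JSON wrapped in markdown fences.
fenced = "```json\n{\"a\": 1}\n```"

try:
    json.loads(fenced)
    parsed = True
except json.JSONDecodeError:
    parsed = False  # the leading backtick makes the whole string invalid JSON

# parsed is False: the parser explodes on the first character
```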
OpenAI's tools field with strict: true guarantees the response matches your JSON Schema (subject to schema constraints — no $ref, all properties required and listed in required).
from openai import OpenAI
client = OpenAI()
extract_invoice = {
"type": "function",
"function": {
"name": "extract_invoice",
"description": "Extract structured fields from an invoice document.",
"strict": True,
"parameters": {
"type": "object",
"additionalProperties": False,
"required": ["vendor", "invoice_number", "total_cents", "currency", "line_items"],
"properties": {
"vendor": {"type": "string"},
"invoice_number": {"type": "string"},
"total_cents": {"type": "integer"},
"currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
"line_items": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": False,
"required": ["description", "quantity", "unit_cents"],
"properties": {
"description": {"type": "string"},
"quantity": {"type": "integer"},
"unit_cents": {"type": "integer"},
},
},
},
},
},
},
}
resp = client.chat.completions.create(
model="gpt-4o-2024-08-06",
tools=[extract_invoice],
tool_choice={"type": "function", "function": {"name": "extract_invoice"}},
messages=[{"role": "user", "content": INVOICE_TEXT}],
)
import json
data = json.loads(resp.choices[0].message.tool_calls[0].function.arguments)
OpenAI also exposes response_format={"type": "json_schema", "json_schema": {...}} for the same guarantees without a tool wrapper.
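A sketch of that shape, with the schema trimmed to two fields; the request itself is commented out so the snippet stands alone (field names are illustrative):

```python
# Trimmed invoice schema; strict mode requires additionalProperties: False
# and every property listed in "required".
invoice_schema = {
    "type": "object",
    "additionalProperties": False,
    "required": ["vendor", "total_cents"],
    "properties": {
        "vendor": {"type": "string"},
        "total_cents": {"type": "integer"},
    },
}

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "extract_invoice",
        "strict": True,
        "schema": invoice_schema,
    },
}

# resp = client.chat.completions.create(
#     model="gpt-4o-2024-08-06",
#     response_format=response_format,
#     messages=[{"role": "user", "content": INVOICE_TEXT}],
# )
# data = json.loads(resp.choices[0].message.content)
```

The payload arrives in `message.content` rather than in a tool call, which is slightly simpler to extract.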
Anthropic enforces the schema strictly when you set tool_choice={"type": "tool", "name": "..."} — the model must respond with that tool call.
import anthropic, json
client = anthropic.Anthropic()
resp = client.messages.create(
model="claude-opus-4-1",
max_tokens=1024,
tools=[{
"name": "extract_invoice",
"description": "Extract structured invoice fields.",
"input_schema": {
"type": "object",
"required": ["vendor", "invoice_number", "total_cents", "currency"],
"properties": {
"vendor": {"type": "string"},
"invoice_number": {"type": "string"},
"total_cents": {"type": "integer", "minimum": 0},
"currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
},
},
}],
tool_choice={"type": "tool", "name": "extract_invoice"},
messages=[{"role": "user", "content": INVOICE_TEXT}],
)
data = next(b.input for b in resp.content if b.type == "tool_use")
Bedrock's Converse API normalizes tool use across Claude, Llama, Mistral, Cohere, and Nova — same shape, different modelId.
import boto3
bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")
resp = bedrock.converse(
modelId="us.anthropic.claude-opus-4-1-20250805-v1:0",
messages=[{"role": "user", "content": [{"text": INVOICE_TEXT}]}],
toolConfig={
"tools": [{
"toolSpec": {
"name": "extract_invoice",
"description": "Extract structured invoice fields.",
"inputSchema": {"json": {
"type": "object",
"required": ["vendor", "total_cents"],
"properties": {
"vendor": {"type": "string"},
"total_cents": {"type": "integer"},
},
}},
}
}],
"toolChoice": {"tool": {"name": "extract_invoice"}},
},
)
for block in resp["output"]["message"]["content"]:
if "toolUse" in block:
data = block["toolUse"]["input"]
Gemini supports both function declarations and a stricter response_schema with response_mime_type="application/json".
from google import genai
from google.genai import types
client = genai.Client()
config = types.GenerateContentConfig(
response_mime_type="application/json",
response_schema={
"type": "OBJECT",
"required": ["vendor", "total_cents"],
"properties": {
"vendor": {"type": "STRING"},
"total_cents": {"type": "INTEGER"},
},
},
)
resp = client.models.generate_content(
model="gemini-2.5-pro",
contents=INVOICE_TEXT,
config=config,
)
import json
data = json.loads(resp.text)
Writing JSON Schema by hand is tedious and error-prone. instructor patches the OpenAI / Anthropic / Bedrock / Gemini clients so you can pass a Pydantic model as response_model and get a typed object back.
pip install instructor pydantic anthropic openai
import instructor, anthropic
from pydantic import BaseModel, Field
from typing import Literal
class LineItem(BaseModel):
description: str
quantity: int = Field(ge=1)
unit_cents: int = Field(ge=0)
class Invoice(BaseModel):
vendor: str
invoice_number: str
total_cents: int = Field(ge=0)
currency: Literal["USD", "EUR", "GBP"]
line_items: list[LineItem]
client = instructor.from_anthropic(anthropic.Anthropic())
invoice: Invoice = client.messages.create(
model="claude-opus-4-1",
max_tokens=1024,
response_model=Invoice,
messages=[{"role": "user", "content": INVOICE_TEXT}],
)
# instructor validates with Pydantic; on ValidationError it automatically
# re-prompts the model with the validation errors as context, up to max_retries.
print(invoice.vendor, invoice.total_cents)
The retry-with-validation-errors loop is the killer feature — Pydantic's error messages are good enough that the model usually fixes its mistake on the first retry.
For self-hosted models, you can guarantee schema compliance at the token level: at each decoding step, mask out tokens that would make the partial output invalid against a regex, JSON Schema, or context-free grammar. This is "structured generation" or "constrained decoding."
Outlines is the most popular library; it works with Hugging Face, vLLM, and llama.cpp backends.
import outlines
from pydantic import BaseModel
class Invoice(BaseModel):
vendor: str
total_cents: int
model = outlines.models.transformers("meta-llama/Llama-3.1-8B-Instruct")
generator = outlines.generate.json(model, Invoice)
invoice = generator(INVOICE_TEXT) # always a valid Invoice — guaranteed by the decoder
vLLM exposes the same capability via its OpenAI-compatible API: pass guided_json, guided_regex, or guided_grammar in extra_body.
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
schema = {
"type": "object",
"required": ["vendor", "total_cents"],
"properties": {
"vendor": {"type": "string"},
"total_cents": {"type": "integer"},
},
}
resp = client.chat.completions.create(
model="meta-llama/Llama-3.1-8B-Instruct",
messages=[{"role": "user", "content": INVOICE_TEXT}],
extra_body={"guided_json": schema, "guided_decoding_backend": "outlines"},
)
Constrained decoding eliminates parse errors but it does not eliminate semantic errors — the model can still produce a perfectly-shaped JSON with the wrong vendor name. Always combine with evals.
When you cannot use a strict mode (some providers, older models, free-form responses with embedded JSON), recover instead of crashing:
import json, re
from json_repair import repair_json # pip install json_repair
def extract_json(text: str) -> dict:
# 1. Strip markdown fences.
text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip(), flags=re.M)
# 2. Try strict parse.
try:
return json.loads(text)
except json.JSONDecodeError:
pass
# 3. Try the widest {...} span (greedy: first "{" to last "}").
match = re.search(r"\{.*\}", text, re.DOTALL)
if match:
try:
return json.loads(match.group(0))
except json.JSONDecodeError:
# 4. Repair (handles trailing commas, single quotes, missing braces).
return json.loads(repair_json(match.group(0)))
raise ValueError("no JSON object found")
- On a Pydantic ValidationError, send the error message back as a user turn ("Your previous response failed validation: {error}. Return only valid JSON matching the schema."). Two retries catch almost everything.
- Pin model snapshots: gpt-4o-2024-08-06, claude-opus-4-1. Schema strictness changes between snapshots; do not let it surprise you in production.
- Set temperature to 0 or 0.1 for extraction tasks. There is no creative upside.
- Set tool_choice to a specific tool name; it removes the "model decides whether to call" branch.
- If a field can be an enum: [...], make it one.

Strict mode (OpenAI's strict: true, Anthropic's tool-use schema validation, Gemini's response_schema) constrains the decoder so the next token is always one that keeps the output a valid prefix of the JSON Schema. The model literally cannot emit an invalid character. The cost is a slight latency hit from the constrained-decoding mask, plus a one-time schema-compilation cost cached server-side. Without strict mode you get "JSON mode", which guarantees valid JSON but not adherence to your schema — missing fields, wrong types, and extra fields all sneak through.
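The retry-with-validation-errors loop above, sketched end to end. A hand-rolled validator stands in for Pydantic so the snippet is dependency-free, and `call_model` is a hypothetical stand-in for any provider client:

```python
import json

def validate_invoice(data: dict) -> list[str]:
    # Stand-in for Pydantic validation; real code would catch ValidationError.
    errors = []
    if not isinstance(data.get("vendor"), str):
        errors.append("vendor must be a string")
    if not isinstance(data.get("total_cents"), int):
        errors.append("total_cents must be an integer")
    return errors

def extract_with_retries(call_model, text: str, max_retries: int = 2) -> dict:
    # call_model(messages) -> str is any function that returns the raw reply.
    messages = [{"role": "user", "content": text}]
    last_errors: list[str] = []
    for _ in range(max_retries + 1):
        raw = call_model(messages)
        try:
            data = json.loads(raw)
            last_errors = validate_invoice(data)
        except json.JSONDecodeError as exc:
            data, last_errors = None, [str(exc)]
        if not last_errors:
            return data
        # Feed the failure back as a user turn and try again.
        messages.append({"role": "assistant", "content": raw})
        messages.append({
            "role": "user",
            "content": f"Your previous response failed validation: "
                       f"{'; '.join(last_errors)}. "
                       "Return only valid JSON matching the schema.",
        })
    raise ValueError(f"still invalid after {max_retries} retries: {last_errors}")
```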
Strict mode supports a subset of JSON Schema — no oneOf, no $ref in some providers, no pattern regex on strings, all properties must be required (you mark optionality with type: ["string", "null"] instead). Deeply-nested or recursive schemas may fail to compile or hurt accuracy. So you flatten and normalize schemas to fit, and for genuinely complex shapes you fall back to JSON mode plus Pydantic validation with a repair loop.
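That flattening step can be mechanized. A small helper (hypothetical, not from any library) that rewrites an ordinary object schema into strict form: every property required, previously-optional ones nullable:

```python
import copy

def to_strict(schema: dict) -> dict:
    """Rewrite an object schema for strict mode: list every property in
    "required", and make previously-optional properties nullable instead."""
    out = copy.deepcopy(schema)
    out["additionalProperties"] = False
    required = set(out.get("required", []))
    for name, spec in out.get("properties", {}).items():
        if name not in required and isinstance(spec.get("type"), str):
            spec["type"] = [spec["type"], "null"]  # optionality via null
    out["required"] = list(out.get("properties", {}))
    return out
```

Nested object schemas would need the same treatment applied recursively.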
Three layers. First, try a permissive parser like json5 or dirty-json — that fixes trailing commas, single quotes, unquoted keys. Second, if Pydantic validation still fails, feed the validation error back to the model as a tool-result-style message: "your previous response failed validation: {error}. respond again with valid output." One round-trip fixes most cases. Third, if that fails twice, fall back to a smaller, cheaper model with strict mode and a simplified schema, or surface the error to the caller. Never silently drop fields — that's how data corruption ships.
Instructor is a thin wrapper that gives you Pydantic-typed responses across providers with auto-retry on validation failures — a good default for application code where you want types and don't want to hand-write the repair loop. Outlines goes deeper: it does constrained decoding locally for open-source models (Llama, Mistral) where you don't have a server-side strict mode, using regex/CFG to mask the logits at each step. Reach for Outlines when you're self-hosting and need strict-mode-equivalent guarantees on a model that doesn't ship one. For frontier models, native strict mode + Instructor is usually enough.
Keep it shallow — two levels of nesting is the sweet spot, three is the limit. Use enums everywhere a field has bounded values; "status: pending|approved|rejected" is way more reliable than a free-form string. Add docstrings (the description field) on every property — the model reads them and uses them as in-context guidance. Mark every property required in strict mode and use nullable types for optionality. Avoid additionalProperties: true; the model fills it with garbage. Test the schema with 20 representative inputs before shipping — the failures show you which fields need clearer descriptions.
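Those guidelines in one small example (field names are illustrative):

```python
status_schema = {
    "type": "object",
    "additionalProperties": False,     # never true: the model fills it with garbage
    "required": ["status", "reason"],  # everything required in strict mode
    "properties": {
        "status": {
            "type": "string",
            # Bounded values get an enum, not a free-form string.
            "enum": ["pending", "approved", "rejected"],
            "description": "Current review state of the invoice.",
        },
        "reason": {
            # Optionality via a nullable type, not by dropping it from "required".
            "type": ["string", "null"],
            "description": "Reviewer's note; null while status is pending.",
        },
    },
}
```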
Validation pass-rate is the floor, not the ceiling — a response can be schema-valid and semantically wrong. I keep a labeled fixture set of (input, expected_structured_output) and score with field-level precision/recall: did extracted_amount match? Did parties contain the right names? For free-text fields inside the structure I use LLM-as-judge with a rubric. The validation failures themselves are gold — I dump them into the eval set so the next prompt iteration has to handle them. CI fails the build if pass-rate or field-F1 drops below baseline.
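One way to sketch that field-level scoring over the fixture set: a hypothetical helper (not from any eval framework) computing plain exact-match accuracy per field, which is the simplest version of the precision/recall idea above:

```python
def field_accuracy(fixtures: list[tuple[dict, dict]]) -> dict[str, float]:
    # fixtures: (expected_structured_output, model_output) pairs.
    fields = fixtures[0][0].keys()
    scores = {}
    for f in fields:
        hits = sum(1 for expected, got in fixtures if got.get(f) == expected[f])
        scores[f] = hits / len(fixtures)
    return scores

# field_accuracy([({"vendor": "ACME"}, {"vendor": "ACME"})]) -> {"vendor": 1.0}
```

A CI gate would compare these scores against a stored baseline and fail the build on regression.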