
Native Structured Outputs: When to Skip the Framework
In my previous post I covered three Python frameworks for getting structured data from LLMs: instructor, outlines, and pydantic-ai. But there’s another option I didn’t mention: using the native structured output features built directly into OpenAI and Anthropic’s APIs.
This is an interesting tradeoff. You get vendor lock-in, but you also get features that frameworks can’t easily replicate.
OpenAI’s Structured Outputs
OpenAI launched Structured Outputs in August 2024. Unlike JSON mode (which just makes the model very likely to output valid JSON), Structured Outputs guarantees 100% schema adherence.
Here’s how it works:
```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class Company(BaseModel):
    name: str
    founded_year: int
    employee_count: int

completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "TechVision Analytics, founded 2015, 450 employees"}
    ],
    response_format=Company,
)

company = completion.choices[0].message.parsed
```

The key difference from frameworks: this happens during token generation, not after. OpenAI constrains which tokens can be selected to ensure structural validity. If you ask for an integer, the model literally cannot generate a string.
This is faster than validation-based approaches because there are no retries. It’s also faster than unconstrained generation because tokens with no valid alternatives are automatically placed rather than generated.
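Here's a toy sketch of the mechanism (emphatically not OpenAI's actual implementation, just an illustration of logit masking):

```python
import math

# Suppose the schema says the next token must open a JSON string.
# Constrained decoding masks every structurally invalid token to -inf
# before sampling, so the model cannot pick them.
vocab = ['"', '4', 'x', '{']
logits = [0.8, 1.5, 2.0, 0.1]
valid_next = {'"'}  # what the grammar allows at this position

masked = [l if tok in valid_next else -math.inf for tok, l in zip(vocab, logits)]
print(vocab[masked.index(max(masked))])  # '"' -- the only legal token wins
```

And when exactly one token is legal, there's nothing to sample: the token is simply placed, which is where the speedup on structural boilerplate comes from.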
Two ways to use it:
- Function calling with `strict: true` - for tool use and multi-step workflows
- The `response_format` parameter - for final responses
Both use the same underlying constraint mechanism.
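For reference, here's a minimal sketch of the function-calling route (the tool name `record_company` and its description are placeholders of mine, not anything from OpenAI's docs):

```python
# Same constraint machinery as response_format, surfaced through tools.
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "TechVision Analytics, founded 2015, 450 employees"}
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "record_company",
            "description": "Record extracted company information",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "founded_year": {"type": "integer"},
                    "employee_count": {"type": "integer"}
                },
                "required": ["name", "founded_year", "employee_count"],
                "additionalProperties": False
            }
        }
    }]
)
```

Note that the schema here already follows the rules in the list below: `additionalProperties` is false and every key is required.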
The catch:
- You must set `"additionalProperties": false` on all objects
- All object keys must be in `required` (no optional fields)
- Only a subset of JSON Schema is supported
- No parallel function calls when using structured outputs
The “no optional fields” limitation is the biggest pain point. If your schema has fields that might not always be present, you’ll need to work around it (usually by making them nullable instead of optional).
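In Pydantic terms, the workaround looks like this (a minimal sketch: the field stays required, but its value may be null):

```python
from typing import Optional
from pydantic import BaseModel

class Company(BaseModel):
    name: str
    founded_year: int
    # No default value: the field stays in "required" as strict mode
    # demands, but the schema allows null, so the model can return
    # null when the source text doesn't mention a head count.
    employee_count: Optional[int]
```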
Anthropic’s Structured Outputs
Anthropic launched structured outputs in November 2025, available for Claude Sonnet 4.5 and Opus 4.1.
Their approach is different. Instead of one API, they offer two modes:
1. JSON mode - for data extraction:
```python
import anthropic
from pydantic import BaseModel

client = anthropic.Anthropic()

class Company(BaseModel):
    name: str
    founded_year: int
    employee_count: int

message = client.beta.messages.parse(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "TechVision Analytics, founded 2015, 450 employees"}
    ],
    output_format=Company,
    betas=["structured-outputs-2025-11-13"]
)

company = message.output
```

2. Strict tool use - for complex workflows:
```python
# The betas parameter lives on the beta client, matching the parse
# example above.
message = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    tools=[{
        "name": "extract_company",
        "description": "Extract company information",
        "input_schema": Company.model_json_schema(),
        "strict": True
    }],
    messages=[
        {"role": "user", "content": "TechVision Analytics, founded 2015, 450 employees"}
    ],
    betas=["structured-outputs-2025-11-13"]
)
```

The separation makes sense. JSON mode is simpler for extraction tasks. Strict tool use gives you the full tool-calling machinery when you need it.
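Either way, you still have to pull the structured data out of the response yourself. A small sketch, using the standard Messages API response shape:

```python
# The tool call arrives as a tool_use content block; its .input is the
# JSON object the strict schema guaranteed. Re-validating through
# Pydantic gives you a typed object to work with.
tool_use = next(block for block in message.content if block.type == "tool_use")
company = Company.model_validate(tool_use.input)
```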
Important limitations:
- Citations don’t work with structured outputs (returns 400 error)
- Extended thinking mode (Claude 3.7 Sonnet) doesn't support forced tool calling
- Still in beta as of November 2025
The Unique Feature: Prompt Caching
Here’s where native APIs pull ahead of frameworks: prompt caching.
Both OpenAI and Anthropic support caching parts of your prompt to reduce costs and latency. But Anthropic’s implementation is particularly good for structured output use cases.
As of March 2025, Claude 3.7 Sonnet automatically identifies and reuses cached content. You don’t need to manually track which segments to cache. And with the 1-hour cache TTL option, you can cache your schema definitions and system prompts for extended sessions.
The impact is real: up to 90% cost reduction and 85% latency improvement for prompts with repeated context.
Why this matters for structured outputs:
Your schema definitions are often identical across requests. Cache the schema once, and every subsequent extraction request only pays for the actual content being processed.
Here’s an example:
```python
import anthropic

client = anthropic.Anthropic()

# Shared across every request in the batch
system_prompt = [
    {
        "type": "text",
        "text": "You are a company information extraction system...",
        "cache_control": {"type": "ephemeral"}
    }
]

extraction_tool = {
    "name": "extract_company",
    "description": "Extract company information",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "founded_year": {"type": "integer"},
            "employee_count": {"type": "integer"}
        },
        "required": ["name", "founded_year", "employee_count"]
    },
    "strict": True,
    "cache_control": {"type": "ephemeral"}
}

# First request - caches the schema and system prompt
message = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system=system_prompt,
    tools=[extraction_tool],
    messages=[{"role": "user", "content": "First document..."}],
    betas=["structured-outputs-2025-11-13"]
)

# Subsequent requests reuse the cached schema and system prompt.
# You only pay full input-token rates for "Second document..." -
# the cached prefix is billed at the discounted cache-read rate.
message2 = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system=system_prompt,
    tools=[extraction_tool],
    messages=[{"role": "user", "content": "Second document..."}],
    betas=["structured-outputs-2025-11-13"]
)
```

For batch processing hundreds or thousands of documents with the same schema, this adds up fast.
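A back-of-envelope sketch of how fast (the per-token price is illustrative; the 1.25x write and 0.1x read multipliers are Anthropic's published rates for the default 5-minute cache):

```python
base = 3.00 / 1_000_000          # illustrative $/input token
schema_tokens, doc_tokens, n = 2_000, 1_000, 10_000

no_cache = (schema_tokens + doc_tokens) * n * base
cached = (schema_tokens * 1.25 * base                 # one cache write
          + schema_tokens * 0.10 * (n - 1) * base     # cache reads after that
          + doc_tokens * n * base)                    # fresh content every time
print(f"${no_cache:.2f} without caching vs ${cached:.2f} with")
# -> $90.00 without caching vs $36.01 with
```

The bigger the shared prefix relative to each document, the closer you get to the headline 90% figure.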
Token-Efficient Tool Use
Anthropic also introduced token-efficient tool use in February 2025. When enabled, Claude calls tools using up to 70% fewer output tokens (about a 14% reduction on average in practice).
This is particularly useful for complex schemas with many fields. The model generates more compact tool calls without sacrificing accuracy.
Enable it with the beta header `token-efficient-tools-2025-02-19`.
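A minimal sketch of what that looks like (the feature targets Claude 3.7 Sonnet; the tool definition is the same placeholder used earlier):

```python
import anthropic

client = anthropic.Anthropic()

message = client.beta.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    tools=[{
        "name": "extract_company",
        "description": "Extract company information",
        "input_schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"]
        }
    }],
    messages=[{"role": "user", "content": "TechVision Analytics, founded 2015"}],
    betas=["token-efficient-tools-2025-02-19"]
)
```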
When to Use Native APIs
Choose OpenAI Structured Outputs if:
- You’re using GPT-4o or GPT-4o-mini exclusively
- You need guaranteed schema compliance with zero retries
- Your schemas don’t have optional fields
- You want the fastest possible generation
- You’re building production systems and want vendor support
Choose Anthropic Structured Outputs if:
- You’re using Claude Sonnet 4.5 or Opus 4.1
- You’re processing batches with repeated schema/context (caching wins)
- You need both simple extraction and complex tool use
- You want token-efficient tool calling
- You don’t need citations or extended thinking mode
Choose a framework (instructor/outlines/pydantic-ai) if:
- You need multi-provider support
- You want to avoid vendor lock-in
- Your schemas have optional fields (OpenAI limitation)
- You’re prototyping and want flexibility
- You need features like streaming partial models
My Take
I still mostly use pydantic-ai.
That probably sounds odd after I've just laid out the performance benefits of native structured outputs, but dropping down to the lower-level model APIs is an optimization, and in my view it's one that shouldn't be made prematurely.
Validation-based approaches like instructor and pydantic-ai let you put complex custom logic in the validation loop. That’s something pure structured generation approaches can’t do.
Here’s an example: Let’s say you’re extracting financial data from research papers. The schema is simple: company name, revenue, year. But the validation logic is not:
```python
from pydantic import BaseModel, field_validator
from pydantic_ai import Agent

class FinancialData(BaseModel):
    company_name: str
    # year is declared before revenue_millions so it has already been
    # validated and is available in info.data when the revenue check runs
    year: int
    revenue_millions: float

    @field_validator('year')
    @classmethod
    def validate_year(cls, v):
        if v < 1800 or v > 2025:
            raise ValueError(f"Year {v} is outside reasonable range")
        return v

    @field_validator('revenue_millions')
    @classmethod
    def validate_revenue(cls, v, info):
        # Check if revenue is reasonable for the year
        year = info.data.get('year')
        if year and year < 2000 and v > 100000:
            raise ValueError(
                f"Revenue of ${v}M seems unrealistic for {year}. "
                f"Check if units are correct (millions vs billions)."
            )
        return v

agent = Agent(
    "openai:gpt-4o-mini",
    result_type=FinancialData,
    retries=3
)
```

When validation fails, pydantic-ai sends the error message back to the model and asks it to try again. The model sees "Revenue of $150000M seems unrealistic for 1995. Check if units are correct" and can correct its mistake.
With native structured outputs, you get valid JSON that matches your schema, but you can’t inject this kind of domain-specific validation logic. The model could generate {"revenue_millions": 150000, "year": 1995} and it would pass structural validation even though it’s obviously wrong.
You could add these checks after extraction, but then you lose the retry loop. You’d have to handle the error yourself, craft a new prompt, and call the API again. That’s exactly what pydantic-ai does automatically.
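To make that concrete, here's a sketch of the loop you'd have to hand-roll around OpenAI's native endpoint. The validator-free model and the check_domain_rules helper are names I've made up for this example, standing in for the checks above:

```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class PlainFinancialData(BaseModel):  # schema only, no validators
    company_name: str
    year: int
    revenue_millions: float

def check_domain_rules(d: PlainFinancialData) -> None:
    # Hand-rolled stand-in for the field validators above
    if d.year < 2000 and d.revenue_millions > 100_000:
        raise ValueError(
            f"Revenue of ${d.revenue_millions}M seems unrealistic for {d.year}."
        )

msgs = [{"role": "user", "content": "Acme Corp, revenue 150000 million, 1995"}]
for attempt in range(3):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=msgs,
        response_format=PlainFinancialData,
    )
    data = completion.choices[0].message.parsed
    try:
        check_domain_rules(data)
        break                      # passed: keep `data`
    except ValueError as e:
        # Feed the failure back to the model and retry
        msgs.append({"role": "assistant", "content": data.model_dump_json()})
        msgs.append({"role": "user", "content": f"Validation failed: {e} Try again."})
```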
But there’s a deeper point here about premature optimization. The native APIs are faster and cheaper, but they’re also more constraining. OpenAI doesn’t support optional fields. Anthropic’s structured outputs are still in beta. Both lock you into a specific provider.
Starting with pydantic-ai means:
- I can switch providers if pricing changes or a better model comes out
- I can add complex validation logic as I discover edge cases
- I can prototype quickly without worrying about schema limitations
- I’m not debugging beta features in production
When the application proves itself and hits real scale, that’s when I optimize. Profile first, then optimize the bottleneck. Don’t start by optimizing for hypothetical scale.
For most applications, the flexibility and developer experience of validation-based frameworks is worth more than the performance gains of native structured outputs. When you’re processing millions of requests and cost/latency actually matter, the tradeoff flips.
Start with pydantic-ai. Optimize to native APIs when you have a real problem to solve.
Sources:
- Introducing Structured Outputs in the API | OpenAI
- Structured model outputs - OpenAI API
- Structured outputs - Claude Docs
- Prompt caching with Claude | Anthropic
- Token-saving updates on the Anthropic API
- OpenAI’s structured output vs. instructor and outlines
- When should I use function calling, structured outputs or JSON mode?