
Native Structured Outputs: When to Skip the Framework
In my previous post I covered three Python frameworks for getting structured data from LLMs: instructor, outlines, and pydantic-ai. But there’s another option I didn’t mention: using the native structured output features built directly into OpenAI and Anthropic’s APIs.
This is an interesting tradeoff. You get vendor lock-in, but you also get features that frameworks can’t easily replicate.
OpenAI’s Structured Outputs
OpenAI launched Structured Outputs in August 2024. Unlike JSON mode (which just makes the model very likely to output valid JSON), Structured Outputs guarantees 100% schema adherence.
Here’s how it works:
```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class Company(BaseModel):
    name: str
    founded_year: int
    employee_count: int

completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "TechVision Analytics, founded 2015, 450 employees"}
    ],
    response_format=Company,
)

company = completion.choices[0].message.parsed
```

The key difference from frameworks: this happens during token generation, not after. OpenAI constrains which tokens can be selected to ensure structural validity. If you ask for an integer, the model literally cannot generate a string.
This is faster than validation-based approaches because there are no retries. It’s also faster than unconstrained generation because tokens with no valid alternatives are automatically placed rather than generated.
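Here's a toy sketch of the mechanism (emphatically not OpenAI's actual implementation, just an illustration of logit masking):

```python
import math

# Suppose the schema says the next token must open a JSON string.
# Constrained decoding masks every structurally invalid token to -inf
# before sampling, so the model cannot pick them.
vocab = ['"', '4', 'x', '{']
logits = [0.8, 1.5, 2.0, 0.1]
valid_next = {'"'}  # what the grammar allows at this position

masked = [l if tok in valid_next else -math.inf for tok, l in zip(vocab, logits)]
print(vocab[masked.index(max(masked))])  # '"' -- the only legal token wins
```

And when exactly one token is legal, there's nothing to sample: the token is simply placed, which is where the speedup on structural boilerplate comes from.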
Two ways to use it:
- Function calling with `strict: true` - for tool use and multi-step workflows
- The `response_format` parameter - for final responses
Both use the same underlying constraint mechanism.
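For reference, here's a minimal sketch of the function-calling route (the tool name `record_company` and its description are placeholders of mine, not anything from OpenAI's docs):

```python
# Same constraint machinery as response_format, surfaced through tools.
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "TechVision Analytics, founded 2015, 450 employees"}
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "record_company",
            "description": "Record extracted company information",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "founded_year": {"type": "integer"},
                    "employee_count": {"type": "integer"}
                },
                "required": ["name", "founded_year", "employee_count"],
                "additionalProperties": False
            }
        }
    }]
)
```

Note that the schema here already follows the rules in the list below: `additionalProperties` is false and every key is required.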
The catch:
- You must set `"additionalProperties": false` on all objects
- All object keys must be in `required` (no optional fields)
- Only a subset of JSON Schema is supported
- No parallel function calls when using structured outputs
The “no optional fields” limitation is the biggest pain point. If your schema has fields that might not always be present, you’ll need to work around it (usually by making them nullable instead of optional).
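In Pydantic terms, the workaround looks like this (a minimal sketch: the field stays required, but its value may be null):

```python
from typing import Optional
from pydantic import BaseModel

class Company(BaseModel):
    name: str
    founded_year: int
    # No default value: the field stays in "required" as strict mode
    # demands, but the schema allows null, so the model can return
    # null when the source text doesn't mention a head count.
    employee_count: Optional[int]
```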
Anthropic’s Structured Outputs
Anthropic launched structured outputs in November 2025, available for Claude Sonnet 4.5 and Opus 4.1.
Their approach is different. Instead of one API, they offer two modes:
1. JSON mode - for data extraction:
```python
import anthropic
from pydantic import BaseModel

client = anthropic.Anthropic()

class Company(BaseModel):
    name: str
    founded_year: int
    employee_count: int

message = client.beta.messages.parse(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "TechVision Analytics, founded 2015, 450 employees"}
    ],
    output_format=Company,
    betas=["structured-outputs-2025-11-13"]
)

company = message.output
```

2. Strict tool use - for complex workflows:
```python
# The betas parameter lives on the beta client, matching the parse
# example above.
message = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    tools=[{
        "name": "extract_company",
        "description": "Extract company information",
        "input_schema": Company.model_json_schema(),
        "strict": True
    }],
    messages=[
        {"role": "user", "content": "TechVision Analytics, founded 2015, 450 employees"}
    ],
    betas=["structured-outputs-2025-11-13"]
)
```

The separation makes sense. JSON mode is simpler for extraction tasks. Strict tool use gives you the full tool-calling machinery when you need it.
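Either way, you still have to pull the structured data out of the response yourself. A small sketch, using the standard Messages API response shape:

```python
# The tool call arrives as a tool_use content block; its .input is the
# JSON object the strict schema guaranteed. Re-validating through
# Pydantic gives you a typed object to work with.
tool_use = next(block for block in message.content if block.type == "tool_use")
company = Company.model_validate(tool_use.input)
```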
Important limitations:
- Citations don’t work with structured outputs (returns 400 error)
- Extended thinking mode (Claude 3.7 Sonnet) doesn't support forced tool calling
- Still in beta as of November 2025
The Unique Feature: Prompt Caching
Here’s where native APIs pull ahead of frameworks: prompt caching.
Both OpenAI and Anthropic support caching parts of your prompt to reduce costs and latency. But Anthropic’s implementation is particularly good for structured output use cases.
As of March 2025, Claude 3.7 Sonnet automatically identifies and reuses cached content. You don’t need to manually track which segments to cache. And with the 1-hour cache TTL option, you can cache your schema definitions and system prompts for extended sessions.
The impact is real: up to 90% cost reduction and 85% latency improvement for prompts with repeated context.
Why this matters for structured outputs:
Your schema definitions are often identical across requests. Cache the schema once, and every subsequent extraction request only pays for the actual content being processed.
Here’s an example:
```python
import anthropic

client = anthropic.Anthropic()

# Shared across every request in the batch
system_prompt = [
    {
        "type": "text",
        "text": "You are a company information extraction system...",
        "cache_control": {"type": "ephemeral"}
    }
]

extraction_tool = {
    "name": "extract_company",
    "description": "Extract company information",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "founded_year": {"type": "integer"},
            "employee_count": {"type": "integer"}
        },
        "required": ["name", "founded_year", "employee_count"]
    },
    "strict": True,
    "cache_control": {"type": "ephemeral"}
}

# First request - caches the schema and system prompt
message = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system=system_prompt,
    tools=[extraction_tool],
    messages=[{"role": "user", "content": "First document..."}],
    betas=["structured-outputs-2025-11-13"]
)

# Subsequent requests reuse the cached schema and system prompt.
# You only pay full input-token rates for "Second document..." -
# the cached prefix is billed at the discounted cache-read rate.
message2 = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system=system_prompt,
    tools=[extraction_tool],
    messages=[{"role": "user", "content": "Second document..."}],
    betas=["structured-outputs-2025-11-13"]
)
```

For batch processing hundreds or thousands of documents with the same schema, this adds up fast.
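A back-of-envelope sketch of how fast (the per-token price is illustrative; the 1.25x write and 0.1x read multipliers are Anthropic's published rates for the default 5-minute cache):

```python
base = 3.00 / 1_000_000          # illustrative $/input token
schema_tokens, doc_tokens, n = 2_000, 1_000, 10_000

no_cache = (schema_tokens + doc_tokens) * n * base
cached = (schema_tokens * 1.25 * base                 # one cache write
          + schema_tokens * 0.10 * (n - 1) * base     # cache reads after that
          + doc_tokens * n * base)                    # fresh content every time
print(f"${no_cache:.2f} without caching vs ${cached:.2f} with")
# -> $90.00 without caching vs $36.01 with
```

The bigger the shared prefix relative to each document, the closer you get to the headline 90% figure.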
Token-Efficient Tool Use
Anthropic also introduced token-efficient tool use in February 2025. When enabled, Claude calls tools using up to 70% fewer output tokens (about a 14% reduction on average in practice).
This is particularly useful for complex schemas with many fields. The model generates more compact tool calls without sacrificing accuracy.
Enable it with the beta header `token-efficient-tools-2025-02-19`.
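A minimal sketch of what that looks like (the feature targets Claude 3.7 Sonnet; the tool definition is the same placeholder used earlier):

```python
import anthropic

client = anthropic.Anthropic()

message = client.beta.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    tools=[{
        "name": "extract_company",
        "description": "Extract company information",
        "input_schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"]
        }
    }],
    messages=[{"role": "user", "content": "TechVision Analytics, founded 2015"}],
    betas=["token-efficient-tools-2025-02-19"]
)
```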
When to Use Native APIs
Choose OpenAI Structured Outputs if:
- You’re using GPT-4o or GPT-4o-mini exclusively
- You need guaranteed schema compliance with zero retries
- Your schemas don’t have optional fields
- You want the fastest possible generation
- You’re building production systems and want vendor support
Choose Anthropic Structured Outputs if:
- You’re using Claude Sonnet 4.5 or Opus 4.1
- You’re processing batches with repeated schema/context (caching wins)
- You need both simple extraction and complex tool use
- You want token-efficient tool calling
- You don’t need citations or extended thinking mode
Choose a framework (instructor/outlines/pydantic-ai) if:
- You need multi-provider support
- You want to avoid vendor lock-in
- Your schemas have optional fields (OpenAI limitation)
- You’re prototyping and want flexibility
- You need features like streaming partial models
My Take
I still mostly use pydantic-ai.
That probably sounds odd after I've just laid out the performance benefits of native structured outputs, but dropping down to the lower-level model APIs is an optimization, and in my view it's one that shouldn't be made prematurely.
Validation-based approaches like instructor and pydantic-ai let you put complex custom logic in the validation loop. That’s something pure structured generation approaches can’t do.
Here’s an example: Let’s say you’re extracting financial data from research papers. The schema is simple: company name, revenue, year. But the validation logic is not:
```python
from pydantic import BaseModel, field_validator
from pydantic_ai import Agent

class FinancialData(BaseModel):
    company_name: str
    # year is declared before revenue_millions so it has already been
    # validated and is available in info.data when the revenue check runs
    year: int
    revenue_millions: float

    @field_validator('year')
    @classmethod
    def validate_year(cls, v):
        if v < 1800 or v > 2025:
            raise ValueError(f"Year {v} is outside reasonable range")
        return v

    @field_validator('revenue_millions')
    @classmethod
    def validate_revenue(cls, v, info):
        # Check if revenue is reasonable for the year
        year = info.data.get('year')
        if year and year < 2000 and v > 100000:
            raise ValueError(
                f"Revenue of ${v}M seems unrealistic for {year}. "
                f"Check if units are correct (millions vs billions)."
            )
        return v

agent = Agent(
    "openai:gpt-4o-mini",
    result_type=FinancialData,
    retries=3
)
```

When validation fails, pydantic-ai sends the error message back to the model and asks it to try again. The model sees "Revenue of $150000M seems unrealistic for 1995. Check if units are correct" and can correct its mistake.
With native structured outputs, you get valid JSON that matches your schema, but you can’t inject this kind of domain-specific validation logic. The model could generate {"revenue_millions": 150000, "year": 1995} and it would pass structural validation even though it’s obviously wrong.
You could add these checks after extraction, but then you lose the retry loop. You’d have to handle the error yourself, craft a new prompt, and call the API again. That’s exactly what pydantic-ai does automatically.
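To make that concrete, here's a sketch of the loop you'd have to hand-roll around OpenAI's native endpoint. The validator-free model and the check_domain_rules helper are names I've made up for this example, standing in for the checks above:

```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class PlainFinancialData(BaseModel):  # schema only, no validators
    company_name: str
    year: int
    revenue_millions: float

def check_domain_rules(d: PlainFinancialData) -> None:
    # Hand-rolled stand-in for the field validators above
    if d.year < 2000 and d.revenue_millions > 100_000:
        raise ValueError(
            f"Revenue of ${d.revenue_millions}M seems unrealistic for {d.year}."
        )

msgs = [{"role": "user", "content": "Acme Corp, revenue 150000 million, 1995"}]
for attempt in range(3):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=msgs,
        response_format=PlainFinancialData,
    )
    data = completion.choices[0].message.parsed
    try:
        check_domain_rules(data)
        break                      # passed: keep `data`
    except ValueError as e:
        # Feed the failure back to the model and retry
        msgs.append({"role": "assistant", "content": data.model_dump_json()})
        msgs.append({"role": "user", "content": f"Validation failed: {e} Try again."})
```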
But there’s a deeper point here about premature optimization. The native APIs are faster and cheaper, but they’re also more constraining. OpenAI doesn’t support optional fields. Anthropic’s structured outputs are still in beta. Both lock you into a specific provider.
Starting with pydantic-ai means:
- I can switch providers if pricing changes or a better model comes out
- I can add complex validation logic as I discover edge cases
- I can prototype quickly without worrying about schema limitations
- I’m not debugging beta features in production
When the application proves itself and hits real scale, that’s when I optimize. Profile first, then optimize the bottleneck. Don’t start by optimizing for hypothetical scale.
For most applications, the flexibility and developer experience of validation-based frameworks is worth more than the performance gains of native structured outputs. When you’re processing millions of requests and cost/latency actually matter, the tradeoff flips.
Start with pydantic-ai. Optimize to native APIs when you have a real problem to solve.
Sources:
- Introducing Structured Outputs in the API | OpenAI
- Structured model outputs - OpenAI API
- Structured outputs - Claude Docs
- Prompt caching with Claude | Anthropic
- Token-saving updates on the Anthropic API
- OpenAI’s structured output vs. instructor and outlines
- When should I use function calling, structured outputs or JSON mode?