MCP Sampling and Elicitation: The Features That Make Servers Smart

Most MCP servers are passive. A client calls a tool, the server does something, and returns a result. But two features in the MCP spec flip that relationship: sampling lets a server ask the client’s LLM to think about something, and elicitation lets a server ask the human user for input. Together, they turn MCP servers from simple tool providers into something much more interesting.

The Standard MCP Flow

In a typical MCP interaction, the flow is one-directional. The AI model decides to use a tool, the client calls the MCP server, and the server returns a result:

sequenceDiagram
    participant User
    participant Client as AI Client
    participant LLM as LLM
    participant Server as MCP Server

    User->>Client: "What's the weather?"
    Client->>LLM: User message + available tools
    LLM->>Client: Call get_weather(city="Atlanta")
    Client->>Server: Execute tool
    Server->>Client: Result: 72°F, sunny
    Client->>LLM: Tool result
    LLM->>Client: "It's 72°F and sunny in Atlanta"
    Client->>User: Display response

The server here is purely reactive. It gets called, does its thing, and returns data. It has no ability to reason, ask questions, or interact with the user. Sampling and elicitation change that.
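For reference, the tool call at the center of that diagram is an ordinary tools/call exchange. The shapes below follow the MCP spec; the values are illustrative:

```json
{
  "method": "tools/call",
  "params": {
    "name": "get_weather",
    "arguments": { "city": "Atlanta" }
  }
}
```

and the server replies with a content array:

```json
{
  "content": [
    { "type": "text", "text": "72°F, sunny" }
  ],
  "isError": false
}
```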

Sampling: Borrowing the Client’s Brain

Sampling lets an MCP server send a prompt to the client’s LLM and get a completion back. The server doesn’t need its own API keys or model access. It just says “hey client, can you ask your LLM this for me?” and the client handles it.

sequenceDiagram
    participant Client as AI Client
    participant LLM as LLM
    participant Server as MCP Server

    Client->>Server: Call tool
    Server->>Client: sampling/createMessage
    Note right of Server: "Analyze this code for bugs"
    Client->>LLM: Server's prompt
    LLM->>Client: Analysis result
    Client->>Server: Return completion
    Server->>Client: Tool result (using LLM analysis)

This is powerful because the MCP server can now use LLM reasoning as part of its own logic, without needing to be an AI application itself.

The Protocol

Under the hood, the server sends a sampling/createMessage request:

{
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "Analyze this code for security vulnerabilities:\n\ndef login(user, password):\n    query = f'SELECT * FROM users WHERE name={user}'\n    ..."
        }
      }
    ],
    "systemPrompt": "You are a security analyst. Be specific and actionable.",
    "maxTokens": 2048,
    "modelPreferences": {
      "hints": [{ "name": "claude-sonnet" }],
      "intelligencePriority": 0.8,
      "speedPriority": 0.5
    }
  }
}

The client receives this, optionally shows it to the user for approval (human-in-the-loop), runs it through its LLM, and returns the completion. The modelPreferences are hints, not commands. The client maps them to whatever model it has available.
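The completion comes back as the result of that same request. Per the spec it is shaped roughly like this (the model name and analysis text here are illustrative):

```json
{
  "role": "assistant",
  "content": {
    "type": "text",
    "text": "The query is built with an f-string, which allows SQL injection. Use parameterized queries instead."
  },
  "model": "claude-sonnet-4-5",
  "stopReason": "endTurn"
}
```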

Using Sampling in FastMCP

If you’re building MCP servers with FastMCP, sampling is straightforward. The Context object gives you a sample() method:

from fastmcp import FastMCP, Context

mcp = FastMCP("my-server")

@mcp.tool
async def analyze_code(code: str, ctx: Context) -> str:
    """Analyze code for potential issues."""
    result = await ctx.sample(
        messages=f"Review this code for bugs and security issues:\n\n{code}",
        system_prompt="You are a senior code reviewer. Be concise.",
        max_tokens=2048,
    )
    return result.text

That’s it. No API keys, no model configuration. The client’s LLM does the thinking.

For more complex workflows, you can give the sampled LLM its own tools to call:

def search_docs(query: str) -> str:
    """Search internal documentation."""
    results = ""  # ... search logic ...
    return results

@mcp.tool
async def research(question: str, ctx: Context) -> str:
    """Research a question using docs and LLM reasoning."""
    result = await ctx.sample(
        messages=question,
        system_prompt="Answer the question using the search tool when needed.",
        tools=[search_docs],
        max_tokens=4096,
    )
    return result.text

Here the sampled LLM can call search_docs as part of its reasoning loop. FastMCP handles the tool execution automatically.

Structured Output

You can also ask the LLM for structured responses using Pydantic models:

from pydantic import BaseModel

class RiskAssessment(BaseModel):
    risk_level: str
    description: str
    recommendations: list[str]

@mcp.tool
async def assess_risk(config: str, ctx: Context) -> RiskAssessment:
    """Assess security risk of a configuration."""
    result = await ctx.sample(
        messages=f"Assess the security risk of:\n\n{config}",
        result_type=RiskAssessment,
    )
    return result.result  # Validated Pydantic object

FastMCP handles the JSON schema generation, prompt engineering for structured output, and response validation.
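To make that concrete, here is a rough stdlib-only sketch of the validation half of that round trip. This is not FastMCP's actual code (FastMCP also generates a JSON Schema and prompts the model to match it); it just shows the parse-and-check step, assuming the LLM returned JSON with the expected fields:

```python
import json
from dataclasses import dataclass, fields

@dataclass
class RiskAssessment:
    risk_level: str
    description: str
    recommendations: list

def parse_structured(raw: str, cls):
    """Parse an LLM's JSON reply and check it against a dataclass,
    a rough stand-in for FastMCP's schema-based validation."""
    data = json.loads(raw)
    missing = [f.name for f in fields(cls) if f.name not in data]
    if missing:
        raise ValueError(f"LLM response missing fields: {missing}")
    return cls(**data)

# Simulated LLM reply that matches the schema
reply = (
    '{"risk_level": "high", '
    '"description": "SQL injection via f-string", '
    '"recommendations": ["use parameterized queries"]}'
)
assessment = parse_structured(reply, RiskAssessment)
```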

Elicitation: Asking the Human

Elicitation is the complement to sampling. Where sampling asks the LLM, elicitation asks the human user. The server sends a structured request to the client, the client shows the user a form or prompt, and the user’s response comes back as validated data.

This was introduced in the 2025-06-18 MCP spec revision, with URL mode added in the 2025-11-25 revision.

Form Mode

The most common mode. The server defines a JSON Schema for the data it needs, and the client renders an appropriate input form:

from fastmcp import FastMCP, Context

mcp = FastMCP("my-server")

@mcp.tool
async def deploy(ctx: Context) -> str:
    """Deploy the application to production."""
    # Ask user to confirm and provide details
    result = await ctx.elicit(
        "Please confirm deployment details",
        response_type=DeployConfig,
    )

    if result.action == "accept":
        config = result.data
        return f"Deploying {config.version} to {config.environment}..."
    return "Deployment cancelled."

Where DeployConfig is a Pydantic model or dataclass:

from dataclasses import dataclass
from typing import Literal

@dataclass
class DeployConfig:
    version: str
    environment: Literal["staging", "production"]
    run_migrations: bool

The client renders this as a form with appropriate input types: a text field for version, a dropdown for environment, and a checkbox for migrations. The response comes back typed and validated.
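On the wire, FastMCP turns that dataclass into an elicitation/create request whose requestedSchema is a flat JSON Schema object (the spec restricts elicitation schemas to top-level primitive properties):

```json
{
  "method": "elicitation/create",
  "params": {
    "message": "Please confirm deployment details",
    "requestedSchema": {
      "type": "object",
      "properties": {
        "version": { "type": "string" },
        "environment": { "type": "string", "enum": ["staging", "production"] },
        "run_migrations": { "type": "boolean" }
      },
      "required": ["version", "environment", "run_migrations"]
    }
  }
}
```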

You can also use simple Python types for quick questions:

# Yes/no confirmation
result = await ctx.elicit("Deploy to production?", response_type=bool)

# Pick from a list
result = await ctx.elicit(
    "Which environment?",
    response_type=["staging", "production", "development"],
)

# Free text
result = await ctx.elicit("What's the release name?", response_type=str)

The Three Response Actions

Elicitation responses always include an action field:

  • accept: User submitted data. The data field contains their response.
  • decline: User explicitly said no.
  • cancel: User dismissed the prompt without choosing.
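On the wire, an accepted elicitation comes back with both fields set (values here are illustrative):

```json
{
  "action": "accept",
  "content": {
    "version": "2.1.0",
    "environment": "staging",
    "run_migrations": true
  }
}
```

For decline and cancel, the content field is omitted and only the action comes back.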

Handle all three:

from fastmcp.server.elicitation import (
    AcceptedElicitation,
    DeclinedElicitation,
    CancelledElicitation,
)

match result:
    case AcceptedElicitation(data=config):
        return f"Proceeding with {config}"
    case DeclinedElicitation():
        return "You declined. No action taken."
    case CancelledElicitation():
        return "Cancelled."

URL Mode

For sensitive data like API keys, passwords, or OAuth flows, form mode isn’t appropriate since data passes through the client. URL mode lets the server direct the user to an external URL where they interact directly with the server:

{
  "method": "elicitation/create",
  "params": {
    "mode": "url",
    "url": "https://myserver.example.com/auth/setup",
    "message": "Please provide your API key to continue."
  }
}

The client shows the user the URL and message. The user navigates there, completes the flow, and the server sends a notifications/elicitation/complete notification when done. The sensitive data never touches the client.
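The completion notification itself is a plain JSON-RPC notification carrying an identifier that ties it back to the original request. The elicitationId field name below is my reading of the 2025-11-25 revision, so verify it against the spec version you target:

```json
{
  "method": "notifications/elicitation/complete",
  "params": {
    "elicitationId": "abc123"
  }
}
```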

Sampling vs. Elicitation: When to Use Which

| Scenario | Use |
| --- | --- |
| Need AI reasoning or analysis | Sampling |
| Need user confirmation or approval | Elicitation |
| Need structured data from the user | Elicitation |
| Need to generate text, summaries, or code | Sampling |
| Need sensitive credentials | Elicitation (URL mode) |
| Need to classify, categorize, or assess | Sampling |
| Need to choose between options | Elicitation |

They also compose well. A server might use elicitation to ask the user which files to analyze, then sampling to have the LLM do the analysis.

Client Support: The Current Reality

Here’s the part that matters for deciding whether to build on these features today:

| Client | Sampling | Elicitation |
| --- | --- | --- |
| VS Code (Copilot) | Supported | Supported |
| Visual Studio | Supported | Supported |
| JetBrains (Copilot) | Supported | Supported |
| Cursor | Not supported | Supported |
| Claude Desktop | Not supported | Not supported |
| Claude Code | Not supported | Not supported |

The irony is that Anthropic created MCP, but its own clients don’t yet support these features; both Claude Desktop and Claude Code have highly upvoted open issues on GitHub requesting them. VS Code with GitHub Copilot currently has the most complete implementation.

For servers that need sampling but want to work with clients that don’t support it, FastMCP provides a fallback handler:

from fastmcp.client.sampling.handlers.anthropic import AnthropicSamplingHandler

server = FastMCP(
    "my-server",
    sampling_handler=AnthropicSamplingHandler(
        default_model="claude-sonnet-4-5-20250929"
    ),
    sampling_handler_behavior="fallback",
)

This uses a direct API call when the client can’t handle sampling itself. The tradeoff is that you now need an API key on the server side, which partly defeats the purpose. But it’s a practical bridge until client support catches up.

Practical Patterns

Pattern 1: Discover-Then-Analyze

Use sampling twice: once to gather information, once to reason about it.

@mcp.tool
async def audit(ctx: Context) -> str:
    """Audit the current environment."""
    # Step 1: Discover
    inventory = await ctx.sample(
        messages="List all tools and capabilities you have access to.",
        system_prompt="Catalog everything thoroughly.",
        max_tokens=8192,
    )

    # Step 2: Analyze
    analysis = await ctx.sample(
        messages=f"Given this tool inventory:\n{inventory.text}\n\nIdentify risks.",
        system_prompt="You are a security analyst.",
        max_tokens=4096,
    )

    return analysis.text

Pattern 2: Elicit-Then-Execute

Ask the user what they want, then do it.

@mcp.tool
async def batch_process(ctx: Context) -> str:
    """Process files with user-specified options."""
    options = await ctx.elicit(
        "Configure batch processing",
        response_type=BatchConfig,
    )
    if options.action != "accept":
        return "Cancelled."

    # Now do the work with validated config
    return process_files(options.data)
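BatchConfig isn’t shown above; like DeployConfig earlier, it would be whatever dataclass or Pydantic model describes your options. A hypothetical example:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class BatchConfig:
    input_dir: str
    output_format: Literal["csv", "parquet"]
    overwrite: bool
```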

Pattern 3: Confirm Before Acting

Use elicitation as a gate before destructive operations.

@mcp.tool
async def delete_records(table: str, ctx: Context) -> str:
    """Delete records from a database table."""
    confirm = await ctx.elicit(
        f"This will delete all records from '{table}'. Are you sure?",
        response_type=bool,
    )
    if confirm.action == "accept" and confirm.data:
        return do_delete(table)
    return "Aborted."

Looking Forward

Sampling and elicitation are what turn MCP from a tool-calling protocol into something that enables genuinely intelligent server-side behavior. A server that can reason (via sampling) and interact with the user (via elicitation) can handle complex, multi-step workflows that simple request-response tools can’t.

The main bottleneck right now is client support. As more clients implement these features, expect to see MCP servers that are less like API endpoints and more like specialized agents that happen to run as plugins.

If you’re building MCP servers today, I’d recommend designing with sampling and elicitation in mind even if your target client doesn’t support them yet. The spec is stable, FastMCP makes the API clean, and client support is expanding. When it lands, you’ll be ready.