Governance in AI Agent Security
Two papers dropped in the last month that should make anyone deploying AI agents nervous. Not because they reveal new attacks, but because they prove our defenses don’t work.
Simon Willison summarized both, but the governance implications deserve more attention.
The Bad News
“The Attacker Moves Second,” from researchers at OpenAI, Anthropic, and Google DeepMind, tested 12 published prompt injection defenses, most of which originally reported near-zero attack success rates. Adaptive attacks broke most of them with success rates above 90%. Human red-teamers: 100% success.
We’ve been testing defenses against static attack strings instead of adaptive adversaries. It’s like testing a lock by checking if it’s locked, not whether someone can pick it.
The Rule of Two
Meta’s “Agents Rule of Two” framework says an agent should satisfy no more than two of these three properties:
- [A] Processing untrustworthy inputs
- [B] Access to sensitive systems or private data
- [C] Ability to change state or communicate externally
An email agent with all three [ABC] is vulnerable: a malicious email can tell it to exfiltrate inbox data. Break one property and the attack fails:
- [BC]: Only process trusted inputs (filter senders)
- [AC]: Sandbox without sensitive data access
- [AB]: Require human approval before actions
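As a rough illustration, the check itself is tiny. The sketch below is not from Meta’s post: the `Capability` flag, `satisfies_rule_of_two`, and `label` are hypothetical names, and it assumes that gating actions behind human approval can be modeled as removing [C] from the agent’s autonomous profile.

```python
from enum import Flag


class Capability(Flag):
    """The three Rule of Two properties (hypothetical encoding)."""
    A = 1  # processes untrustworthy inputs
    B = 2  # accesses sensitive systems or private data
    C = 4  # changes state or communicates externally


ABC = Capability.A | Capability.B | Capability.C


def satisfies_rule_of_two(profile: Capability) -> bool:
    """Return True when the agent holds at most two of the three properties."""
    return profile != ABC


def label(profile: Capability) -> str:
    """Render a profile in bracket notation, e.g. "[AB]"."""
    letters = "".join(
        c.name for c in (Capability.A, Capability.B, Capability.C) if c in profile
    )
    return f"[{letters}]"


# The email agent from the text, before and after requiring human approval.
email_agent = ABC
with_approval = email_agent & ~Capability.C  # assumption: approval removes autonomous [C]

if __name__ == "__main__":
    for profile in (email_agent, with_approval):
        print(label(profile), "ok" if satisfies_rule_of_two(profile) else "violates Rule of Two")
```

The hard part is not this comparison but knowing, reliably, which flags each agent really holds.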
This works, but it’s an architectural defense, not model-level security. We’re admitting we can’t solve prompt injection through better prompts.
The Governance Problem
The Rule of Two is a good framework, but frameworks don’t enforce themselves. Most organizations don’t have visibility into what their agents can do. You might know you’ve enabled MCP servers for email, calendar, and file access. But do you know which combination of [A], [B], and [C] each agent currently holds? Can you detect when an agent crosses from [AB] into [ABC]?
MCP lets you give agents access to email, databases, CRMs, codebases, Slack, cloud infrastructure. Powerful, but terrifying without governance.
Here’s what enterprise MCP governance needs:
1. Registry with capability metadata. Not “this is an email server,” but “this tool reads untrusted content [A], accesses private data [B], sends external communications [C].”
2. Track which servers are enabled per agent. Know that your customer support agent has Salesforce [B], email [A+C], and web search [A] enabled.
3. Detect the lethal trifecta in real time. When an agent holds all three properties at once [ABC], trigger controls: require human approval, restrict to read-only, alert security.
4. Audit trails. Show exactly which agents had which capabilities at which times.
5. Policy enforcement. Make policies like “no [ABC] without VP approval” enforceable, not just documented.
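To make the first four requirements concrete, here is a minimal sketch of a registry that tags tools with capability metadata, tracks what each agent has enabled, and records an audit entry on every change. Everything in it is hypothetical: the tool names, the `TOOL_CAPABILITIES` table, and the `AgentRecord` class are illustrations, not part of MCP or any product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical registry: tool name -> Rule of Two properties it contributes.
TOOL_CAPABILITIES: dict[str, frozenset[str]] = {
    "salesforce.read_account": frozenset("B"),  # private customer data
    "email.read_inbox": frozenset("A"),         # untrusted inbound content
    "email.send": frozenset("C"),               # external communication
    "web.search": frozenset("A"),               # untrusted web content
}


@dataclass
class AgentRecord:
    name: str
    enabled_tools: set[str] = field(default_factory=set)
    # Each entry: (timestamp, tool enabled, resulting profile such as "AB").
    audit_log: list[tuple[datetime, str, str]] = field(default_factory=list)

    def profile(self) -> str:
        """Union of the properties of every enabled tool, e.g. "ABC"."""
        props: set[str] = set()
        for tool in self.enabled_tools:
            props |= TOOL_CAPABILITIES[tool]
        return "".join(sorted(props))

    def has_trifecta(self) -> bool:
        return self.profile() == "ABC"

    def enable(self, tool: str) -> None:
        """Enable a registered tool and record the resulting profile."""
        if tool not in TOOL_CAPABILITIES:
            raise KeyError(f"unregistered tool: {tool}")
        self.enabled_tools.add(tool)
        self.audit_log.append((datetime.now(timezone.utc), tool, self.profile()))


if __name__ == "__main__":
    agent = AgentRecord("customer-support")
    for tool in ("salesforce.read_account", "email.read_inbox", "email.send"):
        agent.enable(tool)
        print(tool, "->", agent.profile(), "TRIFECTA" if agent.has_trifecta() else "")
```

Refusing unregistered tools is a deliberate choice in this sketch: a tool with no capability metadata cannot be reasoned about, so it should not be silently treated as harmless.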
Gateway Architecture
The practical implementation is a gateway MCP server. Agents connect only to the gateway, which proxies requests to actual MCP servers based on permissions and policies.
This gives you:
- Centralized visibility. Every tool call flows through the gateway. Complete audit trail, no shadow IT.
- Runtime policy enforcement. The gateway evaluates each call in real-time. Agent with [ABC] capabilities trying to write externally? Require human approval. Agent without database permissions querying your CRM? Block it.
- Tool-level capability classification. The gateway knows “this specific tool reads untrusted content [A]” and “this one sends external emails [C].” It detects lethal trifecta combinations.
- Dynamic policy updates. Lock down all agents during a security incident by updating the gateway once.
- Gradual rollout. Security vets new MCP servers once, then controls which teams get access through policy.
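The decision logic behind those properties might look roughly like the sketch below. It models only the policy check, not the MCP proxying itself, and every name in it (`Gateway`, `Decision`, the tool names) is a hypothetical illustration rather than the API of any real gateway.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


@dataclass
class Gateway:
    # Tool name -> Rule of Two properties it contributes, e.g. {"A"} or {"C"}.
    tools: dict[str, set[str]]
    # Agent name -> tool names that agent is permitted to call.
    grants: dict[str, set[str]]
    lockdown: bool = False  # flip once to block every agent during an incident
    audit_log: list[tuple[str, str, Decision]] = field(default_factory=list)

    def agent_profile(self, agent: str) -> set[str]:
        """Union of the properties of every tool granted to the agent."""
        profile: set[str] = set()
        for tool in self.grants.get(agent, set()):
            profile |= self.tools.get(tool, set())
        return profile

    def evaluate(self, agent: str, tool: str) -> Decision:
        """Decide what happens to one tool call and record the decision."""
        granted = self.grants.get(agent, set())
        if self.lockdown or tool not in self.tools or tool not in granted:
            decision = Decision.BLOCK
        elif "C" in self.tools[tool] and self.agent_profile(agent) == {"A", "B", "C"}:
            # An [ABC] agent is about to act externally: put a human in the loop.
            decision = Decision.REQUIRE_APPROVAL
        else:
            decision = Decision.ALLOW
        self.audit_log.append((agent, tool, decision))
        return decision


if __name__ == "__main__":
    gw = Gateway(
        tools={"crm.read": {"B"}, "email.read": {"A"}, "email.send": {"C"}},
        grants={"support": {"crm.read", "email.read", "email.send"}},
    )
    print(gw.evaluate("support", "email.send"))  # external write from an [ABC] agent
    print(gw.evaluate("support", "crm.read"))    # read-only call
```

Because every call passes through `evaluate`, the audit log and the lockdown switch come almost for free, which is the main argument for putting policy in one place.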
The tradeoff is latency. Every call goes through an extra hop. For most use cases, the visibility is worth it. For latency-critical apps, the gateway can issue time-limited tokens for direct access while tracking authorizations.
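One way such time-limited tokens could work is an HMAC-signed grant that the downstream server verifies without calling back to the gateway. This is a sketch under stated assumptions, not a description of how any existing gateway does it: the token format, `SECRET`, `issue_token`, and `verify_token` are all hypothetical, and agent and tool names are assumed not to contain the `|` separator.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical shared secret; in practice this would come from a secret store.
SECRET = b"replace-with-a-real-secret"

# The gateway keeps a record of every authorization it hands out.
ISSUED: list = []


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(agent: str, tool: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Issue a token letting `agent` call `tool` directly until it expires."""
    issued_at = time.time() if now is None else now
    expires = int(issued_at + ttl_seconds)
    payload = f"{agent}|{tool}|{expires}"
    ISSUED.append((agent, tool, expires))
    return f"{payload}|{_sign(payload)}"


def verify_token(token: str, agent: str, tool: str, now: Optional[float] = None) -> bool:
    """Check signature, binding to agent and tool, and expiry."""
    parts = token.split("|")
    if len(parts) != 4:
        return False
    tok_agent, tok_tool, expires, sig = parts
    expected = _sign(f"{tok_agent}|{tok_tool}|{expires}")
    # Constant-time comparison; reject before trusting any field in the token.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    return tok_agent == agent and tok_tool == tool and current < int(expires)
```

The short lifetime is what keeps this compatible with governance: a policy change at the gateway takes full effect once outstanding tokens expire.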
One complication: AI coding tools like Cursor, Claude Code, and GitHub Copilot ship with native capabilities. Bash, web search, and file access aren’t MCP servers you can proxy. The gateway only governs MCP servers. Native tools are a baseline risk you accept by choosing that client.
But you can still apply the Rule of Two. If the client already has bash and web search [A+C], any MCP server providing sensitive data access [B] creates the lethal trifecta. The gateway needs to account for native capabilities when enforcing policies. Alternatively, you can offer server-based replacements for those tools and lock down access to the native ones.
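Accounting for native capabilities can be as simple as starting each client from a baseline profile instead of an empty one. In this sketch the client names and the `NATIVE_BASELINE` table are hypothetical; the real baseline for any given tool depends on how it is configured.

```python
# Hypothetical baselines: properties a client holds before any MCP server is added.
NATIVE_BASELINE: dict[str, set[str]] = {
    "coding-client": {"A", "C"},  # e.g. built-in web search and bash
    "bare-client": set(),         # no native tools
}

TRIFECTA = {"A", "B", "C"}


def effective_profile(client: str, server_properties: list[set[str]]) -> set[str]:
    """Combine the client's native baseline with its enabled MCP servers."""
    profile = set(NATIVE_BASELINE[client])  # KeyError for unknown clients, on purpose
    for props in server_properties:
        profile |= props
    return profile


def would_create_trifecta(
    client: str, enabled: list[set[str]], candidate: set[str]
) -> bool:
    """Would enabling `candidate` push this client into [ABC]?"""
    return effective_profile(client, enabled + [candidate]) == TRIFECTA


if __name__ == "__main__":
    # A single [B] server is enough to complete the trifecta on a coding client.
    print(would_create_trifecta("coding-client", [], {"B"}))
    print(would_create_trifecta("bare-client", [], {"B"}))
```

Raising on unknown clients rather than assuming an empty baseline keeps an unclassified client from being treated as the safest case.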
What’s Available Now
Several companies and open source projects are already building MCP governance tooling:
Commercial Solutions:
Zenity - Full-featured MCP security platform with automatic discovery of all MCP servers, mapping to agents and owners, real-time detection of anomalous usage, complete audit logs with step-level context, and policy enforcement with allowlists. Covers all five governance requirements.
Reco - Real-time visibility into tool access and usage, correlates identities to access actions, detects prompt injection and abnormal behaviors, provides complete audit trails of MCP activity, and enables policy-as-code for time-bound and role-specific limits.
Prompt Security (acquired by SentinelOne) - MCP Gateway with endpoint-level monitoring, dynamic risk scoring of 13,000+ MCP servers, real-time protection and automatic enforcement, and full audit trails. Includes MCP Risk Assessment for continuous threat identification.
Harmonic Security - Locally installed gateway that intercepts all MCP traffic for discovery, enforces granular policies to block risky actions, and applies sensitive data models to prevent exfiltration.
Portkey AI - MCP hub with centralized authentication and authorization, input filtering and output sanitization, detailed logging of tool invocations with context, and policy checks across all servers.
Airia - MCP Gateway with automatic application of enterprise security policies, complete audit logging of agent interactions, granular permission controls, and real-time observability into tool usage.
Open Source Projects:
Lasso Security MCP Gateway - Plugin-based gateway with real-time threat mitigation for prompt injection and data leakage, comprehensive tracing and logging, security scanner for server reputation analysis, and flexible natural language policies. Includes guardrail plugins for token masking and PII detection.
Microsoft MCP Gateway - Kubernetes-focused reverse proxy with OAuth 2.0 and Azure Entra ID authentication, RBAC for adapter and tool-level permissions, operational visibility through status and log endpoints.
Docker MCP Gateway - Open source gateway built into Docker Desktop with signature verification for MCP container images and secret scanning on inbound/outbound payloads.
None explicitly implement the [A], [B], [C] framework from Meta’s Rule of Two yet, but several have the infrastructure to support it. The ecosystem is young but moving quickly.
What You Can Do Today
If you’re deploying AI agents with MCP:
1. Audit agent configurations. List which MCP servers each agent has and which [A], [B], [C] properties each satisfies. If you find [ABC] combinations, add human approval or sandbox a capability.
2. Build a simple registry. Even a spreadsheet tracking agents, servers, and capabilities shows you where your exposure is.
3. Assume defenses will fail. Don’t rely on prompt injection defenses as primary security. Architect so a successful injection can’t do catastrophic damage.
4. Require approval for state changes. The easiest way to break [ABC] is human review before writes. Yes, it’s slower. It also means prompt injection can’t silently exfiltrate data.
5. Evaluate the tools listed above. If you’re serious about MCP governance, look at what Zenity, Reco, Prompt Security, and others are building. The ecosystem is early, but several vendors already offer production-ready solutions.
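The audit and the spreadsheet registry from steps 1 and 2 can be combined in a few lines. The sketch below assumes a CSV export with `agent`, `server`, and `properties` columns; that layout, the sample rows, and the function names are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spreadsheet export: one row per (agent, MCP server) pair.
SAMPLE = (
    "agent,server,properties\n"
    "support,salesforce,B\n"
    "support,email,AC\n"
    "support,web-search,A\n"
    "research,web-search,A\n"
    "research,notes,C\n"
)


def audit(csv_text: str) -> dict[str, str]:
    """Map each agent to its combined profile, e.g. {"support": "ABC"}."""
    profiles: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Ignore anything in the cell that is not one of the three letters.
        profiles[row["agent"]] |= set(row["properties"].upper()) & {"A", "B", "C"}
    return {agent: "".join(sorted(props)) for agent, props in profiles.items()}


def flagged(csv_text: str) -> list[str]:
    """Agents that hold all three properties and need a control added."""
    return sorted(agent for agent, profile in audit(csv_text).items() if profile == "ABC")


if __name__ == "__main__":
    for agent, profile in audit(SAMPLE).items():
        print(f"{agent}: [{profile}]")
    print("needs review:", flagged(SAMPLE))
```

Even this much makes the exposure visible: any agent that shows up in `flagged` is a candidate for human approval or for having one capability sandboxed.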
We’re not solving this at the model level. We need architectural controls, and the tooling to make those controls practical at scale is emerging now.