SEP-1913: Trust and Sensitivity Annotations#1913
SEP-1913: Trust and Sensitivity Annotations#1913SamMorrowDrums wants to merge 11 commits intomodelcontextprotocol:mainfrom
Conversation
| Note over Web MCP: Detects prompt injection<br/>in page content | ||
| Web MCP-->>Client: Result (maliciousActivityHint: true,<br/>openWorldHint: true) | ||
|
|
||
| Client->>User: ⚠️ Warning: Potential malicious content detected |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
Nit: Should we call them MCP Server (FILE) and MCP Server (HTTP)
Although, it is kind of implied hence nit.
| User->>Client: "Summarize this webpage" | ||
| Client->>Web MCP: tools/call (fetch URL) | ||
|
|
||
| Note over Web MCP: Detects prompt injection<br/>in page content |
There was a problem hiding this comment.
Should we also highlight that this is best opportunity for servers to apply any preventative measures against indirect prompt injection (ex: Spotlighting, Prompt Sandwich etc)?
For example: Server applies Spotlighting and marks the data along with additional instruction. reference
OR do we want clients to deal with it, since the real attack of prompt injection(s) begin with LLMs?
There was a problem hiding this comment.
I haven't read it fully, but it seems like this is just a notification mechanism, and this should be, maybe, a new field inside the schema for suggestion mitigation if the server wants to do it.
|
@localden, @rreichel3 (Open AI) is seeking to co-author this SEP as they see significant value for MCP Apps, and want to ensure that it does what they need, especially with respect to consequences of tool calls (such as being irreversible), would you be happy to also take a look at Robert's PR?
He's going to get Nick Cooper to take a look also. |
|
@localden @nickcoai I merged @rreichel3's PR so now have co-author. |
936c53b to
d255f08
Compare
d255f08 to
271fcba
Compare
Introduces trust and sensitivity annotations for MCP requests and responses, enabling clients and servers to track, propagate, and enforce trust boundaries on data as it flows through tool invocations. Key features: - Result annotations: sensitiveHint, privateHint, openWorldHint, maliciousActivityHint, attribution - Request annotations for propagating trust context - Propagation rules ensuring sensitivity markers persist across agent sessions - Integration with Tool Resolution (modelcontextprotocol#1862) for pre-execution annotations - Per-item annotations for mixed results (e.g., search results) - Defense-in-depth approach complementing tool-level annotations Closes modelcontextprotocol#711
… type - Extend existing ToolAnnotations with trust fields (privateHint, sensitiveHint, etc.) - Leverage existing openWorldHint with refined meaning per context - Remove per-item annotations (response-level aggregation only) - Remove _meta nesting - trust annotations live in flat annotations field - Add Alternative 1 explaining why separate type was rejected - Update Tool Resolution integration to use flat annotations
Co-authored-by: Sam Morrow <sammorrowdrums@github.com>
Co-authored-by: Sam Morrow <sammorrowdrums@github.com>
- Rename DRAFT-trust-annotations.md to 1913-trust-and-sensitivity-annotations.md - Update header to match SEP-1850 template format (dash-prefixed list) - Add full PR URL - Move issue reference to note below header - Regenerate SEP documentation for docs site
271fcba to
f46d45e
Compare
| - **User consent** cannot be meaningfully enforced without knowing a tool's real-world impact. | ||
| - **Distrust by default** leads to confirmation fatigue and bad user experience. | ||
|
|
||
| Action security metadata provides a declarative contract that describes where inputs go, where outputs originate, and what outcomes the tool can cause. This complements trust annotations, which track data characteristics in transit. |
There was a problem hiding this comment.
Action security metadata provides a declarative contract that describes where inputs go, where outputs originate, and what outcomes the tool can cause. This complements trust annotations, which track data characteristics in transit.
Just for my understanding. Suppose my mcp is hosted inside a cluster as a pod and it needs egress to my internal service or maybe external, why do I enforce the security rule for data flow inside code running in that pod(I mean at protocol level), shouldn't I do it at infra(egress) level?
where inputs go, where outputs originate
I mean, shouldn't it be controlled at the infra level, not the protocol level? Since LLM clients are not deterministic, shouldn't we enforce security rules deterministically?
There was a problem hiding this comment.
Annotations are handled by clients, not LLMs themselves, so deterministic policy enforcement is exactly the sort of thing this could enable.
|
|
||
| Indicates the origin of returned data. | ||
|
|
||
| - **untrustedPublic** — Public but unverified sources. |
There was a problem hiding this comment.
are enterpise setup allowing untrustedPublic? There must have been a check at the egress controller , whatever the company is using.
|
Maintainer Activity CheckHi @localden! You're assigned to this SEP but there hasn't been any activity from you in 19 days. Please provide an update on:
If you're no longer able to sponsor this SEP, please let us know so we can find another maintainer. This is an automated message from the SEP lifecycle bot. |
|
About to propose an Annotations working group that will look at this amongst others. |
|
@connor4312 agreed on most of that. With regards to:
I think this feature will not be one coding agents care about, but agents with no terminal access that access medical records for example, might well want a complete account of all records accessed for example (HIPAA compliance etc.), and also might want to build agents that will not allow for mixing of records from say a different identity in the same session. So I think that you are right to flag this is a niche feature, but one that people trying to do this stuff can use for example for https://github.com/mcp-security-standard/mcp-server-security-standard |
JustinCappos
left a comment
There was a problem hiding this comment.
Really interesting proposal!
My biggest comment: I'm a little worried about how usable and how useful this is. I don't know if different people creating the same tool would use the same security annotations. I feel the categories are quite squishy / broadly defined at times and as such, I wonder how clients would use this. It feels like clients will want to err on the side of using dangerous tools instead of blocking tools and that might neuter much of the usefulness here, but this is just a gut reaction. I think having a small, informal study of a few people would give you some ammo to argue for usefulness and consistency.
| This pattern enables: | ||
|
|
||
| - **Deterministic enforcement** through declarative contracts for tool behavior | ||
| - **Data exfiltration prevention** by tracking when sensitive data flows to open-world destinations |
There was a problem hiding this comment.
Can you really do this in situations where it might be transformed by some other step? What assumptions do we make about the destination of this? How
|
|
||
| - **Deterministic enforcement** through declarative contracts for tool behavior | ||
| - **Data exfiltration prevention** by tracking when sensitive data flows to open-world destinations | ||
| - **Prompt injection defense** by marking untrusted data sources |
There was a problem hiding this comment.
Is a binary (labeled / not labeled) likely sufficient for this? Would something indicating what is data vs instructions be better?
|
|
||
| **1. Indirect Prompt Injection** | ||
|
|
||
| Data from untrusted sources (web pages, emails, user-generated content) enters the context without markers indicating its origin. An attacker can embed instructions in this data that the model may execute. |
There was a problem hiding this comment.
If we want to stop prompt injection, would it be better to instead have these be labeled as coming from email (i.e. data not code/prompts)? I don't really understand how to label github vs email with a trust level of high, low, medium, etc.
|
|
||
| **2. Data Exfiltration** | ||
|
|
||
| Sensitive information (credentials, PII, proprietary data) can be passed to tools that write to external destinations. Without declared data classifications and action metadata, clients cannot enforce policies like "don't email private repo content to external addresses." |
There was a problem hiding this comment.
How do you track flow in the system given data with a label? How do you associate this with a later LLM output?
|
|
||
| **4. Compliance Requirements** | ||
|
|
||
| Regulated industries (healthcare, finance) need audit trails and sensitivity classifications. Without standardized annotations, each implementation reinvents this wheel. |
There was a problem hiding this comment.
Are you thinking about in-toto attestations or some other cryptographically verifiable means for this?
|
|
||
| `DataClass` keeps sensitivity simple for common cases while allowing regulated data to be scoped. The `regulated` form declares applicable regimes; it does not assert compliance. | ||
|
|
||
| `RegulatoryScope` accepts arbitrary strings. The following are suggested examples for common regimes: GDPR, CCPA, HIPAA, GLBA, PCI-DSS, FERPA, COPPA, SOX. |
There was a problem hiding this comment.
Wouldn't you also want to indicate versions and specific parts of the scope which are implicated? These also often will touch on a huge array of these in different places. How should lists of these be handled? Do these need to be normalized in some way?
| - **system** — Data is stored by the platform and not accessible to users or developers. | ||
| - **user** — Data is stored and visible only to the end user. | ||
| - **internal** — Data is stored and visible to a restricted internal audience. | ||
| - **public** — Data may be transmitted to or stored in publicly accessible systems. |
There was a problem hiding this comment.
I'm not sure what these mean in some cases. Is something visible to my friends on Facebook public? Is data about my taxes which is not individually identifiable but which goes into statistics on a website listed as public or???
|
|
||
| Describes the real-world impact of invoking the tool. | ||
|
|
||
| - **benign** — No persistent state change outside the tool's execution context, or changes limited to private drafts that are not transmitted or shared. |
There was a problem hiding this comment.
The items in this section feel hard to reason about.
If I make a change which doesn't really matter (e.g., increment a counter of visitors), this is not very impactful, but is technically irreversible, since I have now way to undo it.
|
|
||
| #### Why Not Information Flow Control (IFC) Labels? | ||
|
|
||
| @JustinCappos [suggested](https://github.com/modelcontextprotocol/modelcontextprotocol/issues/711#issuecomment-2967516811) IFC-style categorical labels instead of linear sensitivity. This is a valid approach with tradeoffs: |
There was a problem hiding this comment.
I should note that my thinking about if IFC should be used anywhere in this space has evolved since then. 😄 I don't think IFC is a great fit, but I do think it does fit relatively well if you use annotations, so I guess the comment still stands.
| - Missing annotations treated as unknown (not as "safe") | ||
| - Clients should apply appropriate defaults for unlabeled data | ||
| - No enforcement happens without annotation support | ||
|
|
There was a problem hiding this comment.
I'd love to see a quick and informal study where you get 5 people to apply annotations to the same set of MCP tools and see if their annotations are the same or vary.
SEP: Trust and Sensitivity Annotations
Summary
This SEP proposes trust and sensitivity annotations for MCP requests and responses, enabling clients and servers to track, propagate, and enforce trust boundaries on data as it flows through tool invocations.
Motivation
As MCP adoption grows, data flows across tool boundaries without standardized trust metadata. This creates security gaps:
Key Features
Annotations
sensitiveHint: Granular sensitivity levels (low,medium,high)privateHint: Marks internal/private dataopenWorldHint: Indicates untrusted/external data sourcesmaliciousActivityHint: Signals detected suspicious patternsattribution: Provenance tracking for audit trailsPropagation Rules
Integration Points
trustedHintTool Annotation #1487, SEP-1560: Addition of secretHint Tool Annotation #1560, SEP-1561: Addition of unsafeOutputHint Tool Annotation #1561)Related Work
Open Questions
Closes #711
/cc @dend (sponsor)