Suggested way to pass files from the client #1197
Replies: 8 comments 16 replies
-
Hey @nickwinder - thanks for reaching out. I'm on the Anthropic team, and this is something I've been looking into recently; I want to figure out a good way to implement this sort of thing at the protocol level. I don't think you're overlooking anything - your use case is finicky in the current state of the protocol. My initial thoughts and work have been about allowing MCP server tools to return "files": basically a tool result content type that is not intended for the LLM's context window, but rather for use in a conversation's workspace. For an example scenario, this is what I'd like to enable:
This effectively allows bespoke ETL-style workflows to be done by the LLM using MCP (along with many other use cases). My thinking so far is to extend the tool result format to have a […]. Some things that are top of mind for me are:
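The "file" content type described above does not exist in the current spec; the following is a minimal sketch, assuming hypothetical type names, of how a client could route such items into a workspace instead of the model's context:

```python
# Hypothetical sketch only: MCP does not define a "file" tool-result content
# type today. This models a client splitting a tool result so that file items
# land in a conversation workspace rather than the LLM's context window.
from dataclasses import dataclass

@dataclass
class TextContent:
    type: str
    text: str

@dataclass
class FileContent:          # hypothetical content type, not in the spec
    type: str
    name: str
    mimeType: str
    data: bytes             # raw bytes; on the wire this would be base64

def route_tool_result(contents, workspace):
    """Text goes to the LLM; files go to the workspace, leaving a stub."""
    for_llm = []
    for item in contents:
        if item.type == "file":
            workspace[item.name] = item   # stored; never enters context
            for_llm.append(TextContent("text", f"[stored file: {item.name}]"))
        else:
            for_llm.append(item)
    return for_llm

workspace = {}
result = [
    TextContent("text", "Extraction complete."),
    FileContent("file", "report.csv", "text/csv", b"a,b\n1,2\n"),
]
llm_view = route_tool_result(result, workspace)
```

The key property is that the file's bytes are never serialized into the prompt; the model only ever sees the short stub text.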
-
I started a related discussion in the Specification area here: #90. I think this is what the "Audience" annotation is for - but it doesn't quite answer how best to deal with the specific implementation (which could be Roots, Resource Templates, or custom handling).

At the moment, the client doesn't expose Resource Templates to the server. Having a symmetrical approach might make sense for this case: the client could offer metadata and resource identifiers that are made available to the server (and potentially scoped to a session). This feels cleaner than a shared file system via Roots.

Another thing I'm finding is that I enjoy playing with MCP servers and having Claude orchestrate them - but in a lot of cases, once I've seen 2 clients working together, I don't necessarily need inference between them; some way of chaining them might be good.
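The symmetry idea is entirely hypothetical - the current spec has no client-side resource capability - but as a sketch, a client might advertise session-scoped resource templates in its initialization params, which the server could then inspect:

```python
# Hypothetical sketch: a client advertising its own resource templates to the
# server, mirroring the existing server-to-client direction. Neither the
# capability shape nor the "session://" scheme exists in the current MCP spec.
client_init_params = {
    "capabilities": {
        "resources": {          # hypothetical client-side capability
            "templates": [
                {
                    "uriTemplate": "session://files/{fileId}",
                    "name": "Session file",
                    "description": "A file the user made available this session",
                    "mimeType": "application/octet-stream",
                }
            ]
        }
    }
}

def server_can_read_session_files(init_params):
    """Would let a server discover that it may request session-scoped files."""
    res = init_params.get("capabilities", {}).get("resources", {})
    return any(t["uriTemplate"].startswith("session://")
               for t in res.get("templates", []))
```

The point of the shape is discoverability: the server learns what it may ask for without the client having to share a whole file system via Roots.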
-
@evalstate Any progress here?
-
I ran across the same challenge when implementing a PDF text extractor as an MCP tool. While this can all be done by keeping files in an external location and passing file references to and from the tool, it is much preferable from a security perspective to pass the files to and from the tool directly. This is where I arrived at the same approach @jerome3o-anthropic is suggesting: introducing a file return type:

`result = [types.FileContent(type="file", data=data, name='myresult.txt', mimeType='text/markdown')]` (the `name` field being optional)

The client then stores that file locally (in my case, in a virtual file system inside the client's session) and only loads the VFS file-id into the LLM context. The LLM can then access the file, or split it into chunks (depending on size), using additional client-side tools.

P.S.: When initiating the tool call, I upload the file to the tool based on annotations in the tool description, which indicate which tool parameters are file-ids. My generic MCP client, upon detecting a tool invocation, first POSTs all the referenced files from its VFS to the MCP server (which has its own VFS, a.k.a. a session dictionary ;-). This POST operation lives outside the MCP spec, so for my proprietary use case, no real protocol change is required in the upstream direction. Having an official upstream mechanism, however, would of course allow for cross-vendor tool compatibility.

I think a FileContent return type would make a great addition to the protocol. We are currently left either squeezing multipart MIME into a text response or abusing the image response to pass non-image binary data, both of which are hacks.
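The VFS pattern described above can be sketched roughly as follows; all class and parameter names are illustrative, and the dictionary copy stands in for the out-of-band POST that, as noted, lives outside the MCP spec:

```python
# Sketch of the client-side virtual file system (VFS) pattern: files live in
# the VFS, and only small file-ids circulate through tool arguments and the
# LLM context. Names are illustrative, not from any real MCP SDK.
import uuid

class Vfs:
    def __init__(self):
        self.files = {}

    def put(self, name, data, mime_type):
        file_id = str(uuid.uuid4())
        self.files[file_id] = {"name": name, "data": data, "mimeType": mime_type}
        return file_id

    def get(self, file_id):
        return self.files[file_id]

def prepare_tool_call(client_vfs, server_vfs, arguments, file_params):
    """Before invoking the tool, push every referenced file to the server's
    session VFS, so each argument stays a small id rather than file bytes."""
    for param in file_params:   # file_params comes from tool annotations
        file_id = arguments[param]
        server_vfs.files[file_id] = client_vfs.get(file_id)  # the "POST"
    return arguments

client_vfs, server_vfs = Vfs(), Vfs()
fid = client_vfs.put("input.pdf", b"%PDF-1.4 ...", "application/pdf")
args = prepare_tool_call(client_vfs, server_vfs, {"document": fid}, ["document"])
```

After `prepare_tool_call`, the MCP tool call itself carries only `{"document": "<uuid>"}`, which costs a handful of tokens regardless of the file's size.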
-
That would be great, but won't most MCP clients reject the mimeType?
-
Hi, this PR is related: #223 - it's a way to have servers intelligently adapt content types based on what is tokenizable. I assume that everything which is text is "good" (although the Anthropic Messages API doesn't, for example, accept text/markdown as a document type - so just send it as text/plain). One other option for file transfer (not recommending this...) would be to have your client respond to Sampling requests (e.g. parse the request and look for a resource name) and use the SamplingMessage to respond with your data.
-
I guess I have a similar issue of huge responses from an API and context size: https://github.com/orgs/modelcontextprotocol/discussions/272
-
How can I handle large file transfers between an MCP server and client without exceeding token limits in LLM calls? I'm using the Model Context Protocol (MCP) for a file processing pipeline.

The process: I instruct the agent by providing a filename and URL. The agent invokes an MCP server tool that downloads and reads a large Excel file from the provided URL.

The issue: due to the file's size, the JSON content is massive. The LLM (Azure OpenAI) throws a 400 Bad Request error stating that content length is limited to 128,000 tokens, but it got around 1.95 million tokens.

I've already tried:

But I still hit the token limit after processing 1-2 chunks, with Base64 still reaching around 1.12M tokens.

What I need help with: are there alternative strategies like:

What is the best approach to transfer large file content (like Excel data) using MCP, in a way that avoids hitting the token limit?
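One strategy the earlier comments point toward is to never return the extracted content inline at all: the tool keeps the data server-side and returns a small handle, and a second tool serves pages on demand. A rough sketch, with all names illustrative and an in-memory dict standing in for real storage:

```python
# Sketch: keep a large extraction out of the LLM context entirely. The
# extract tool returns only a handle and a row count; a paging tool lets the
# model pull bounded slices. Names and storage are illustrative only.
STORE = {}

def extract_tool(url):
    """Stands in for downloading and parsing the Excel file at `url`."""
    rows = [f"row-{i}" for i in range(100_000)]
    STORE[url] = rows
    # The tool result is tiny, no matter how large the file was.
    return {"handle": url, "rowCount": len(rows)}

def read_rows_tool(handle, offset, limit=500):
    """Returns one bounded page of rows plus a cursor for the next page."""
    rows = STORE[handle]
    end = offset + limit
    return {
        "rows": rows[offset:end],
        "nextOffset": end if end < len(rows) else None,
    }

meta = extract_tool("https://example.com/big.xlsx")
page = read_rows_tool(meta["handle"], 0, limit=3)
```

With this shape, each LLM call only ever carries `limit` rows, so the 128k-token ceiling bounds the page size rather than the file size; Base64-encoding the whole file into the response is avoided entirely.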

-
Your Question
👋
I'm exploring ways to enable LLMs to process files using MCP servers. As a first step, the client needs to pass the file (or a reference to it) to the server for processing.
From the specification, I see two potential approaches:
Does anyone have suggestions for better ways to pass files from the client to the server for processing? Am I overlooking something fundamental in the protocol?