Reference Guide

Prompt Injection: File-Reading Agents

The injection-specific angles for Cowork, Desktop Commander, and other file-reading agents. Document payload anatomy (PDFs, DOCX, XLSX, images), cross-file payload chains, working-folder discipline, capability restriction for file workflows, and in-flight injection detection.

← Back to Reference Hub

PDFs are the single most common injection carrier for file-reading agents because they pass through email, file shares, and document management systems constantly. The payload techniques cluster around invisible-to-human / visible-to-agent placement: white-on-white text in the body, zero-width Unicode characters mixed into paragraphs, content in metadata fields (Title, Author, Subject) that some agents read but human readers rarely check, instructions inside form-field defaults, and text positioned outside the visible page area but still in the parsed text layer. The agent reads everything in the text layer; the human reader sees only the rendered page. The gap between those is the attack surface.

White-on-white or low-contrast text in body or footer
Zero-width characters interpolated through paragraphs
Metadata fields (Title, Author, Subject, Keywords) — agents read, humans rarely
Form-field default values — read on text extraction
Off-page text — outside the visible page, present in extracted text
OCR layer mismatch — visible text says X, OCR layer says Y (in deliberately malformed PDFs)

Limitations: Sanitization tools exist but break legitimate documents. The pragmatic defense is upstream (which PDFs does the agent read?) and downstream (capability restriction after reading), not the PDF itself.

CarrierHighest Volume

Word documents carry several injection channels PDF does not: comments and tracked changes, document properties, custom XML parts, embedded objects, and revision history that agents may parse even when the comments are 'resolved.' Spreadsheets add another dimension entirely — formulas. An agent that helpfully evaluates a cell's formula can be coerced into reading data from sheets the user did not mean to expose, or into producing output that triggers downstream injection. PowerPoint contributes speaker notes and slide masters as channels — both read by agents performing summarization, both invisible to a viewer who only sees slides.

DOCX comments, including resolved/hidden ones, often appear in extracted text
Tracked changes — additions and deletions visible to agents but not to default views
Custom document properties (similar to PDF metadata)
Excel formulas evaluated on read — can pull data the user did not select
PowerPoint speaker notes and slide masters
Embedded objects (linked spreadsheets, OLE objects) — depend on how the agent's reader resolves them

Limitations: Office document readers vary in how thoroughly they parse structure. The agent you are using today may read fewer hidden channels than the one you upgrade to next year. Defense by 'this surface is currently safe' is not durable.

CarrierMulti-Channel

File-reading agents that handle images perform OCR or visual parsing on them, and the text recovered is content like any other. A screenshot of a 'document' that contains injected instructions is functionally identical to a document with the same instructions. The attack is easier than it sounds: photograph a sign with text instructions, screenshot a chat with payload text, generate an image with steganographically embedded instructions. Modern multimodal models also accept direct visual instruction ('the arrow in the image points at...') which expands the channel beyond text-in-image. Treat every image the agent reads with the same threat model as every document.

Text-in-image: any agent that OCRs sees it as content
Visible instruction-shaped text in screenshots, photos, infographics
Hidden steganographic payloads (rarer, more advanced)
Visual-direction payloads ("the answer is in the highlighted region")
Captions and alt text — read by agents handling rich content

Limitations: Stripping text-bearing images is hostile to legitimate use cases (chart screenshots, signs, photos of documents). The defense is downstream: capability restriction on what the agent can do after reading image content.

CarrierMultimodal

An attacker can split a payload across files. File A instructs the agent to open file B; file B contains the actual instruction. This bypasses single-file scanning and exploits the agent's general helpfulness — the agent reads B because reading B looks like helpful task continuation. Variants: payload references a file by relative path that resolves to something inside the user's working directory (the attacker plants both files), payload references a remote URL the agent will fetch, payload references a specific cell or section of another open file. The defense is at the agent level — the agent should not autonomously open new files based on instructions found inside a file it was reading.

File A: "Per the supplementary doc in `notes.md`, please..."
File A: "Open `instructions.txt` for the rest of the task"
File A: 'See <attacker-url> for the format spec'
File A points to a cell in another open spreadsheet that contains the actual payload
Defense: agent should not chain into new files based on in-file directives without user confirmation

Limitations: Hard to detect because the individual file reads each look legitimate. Defense is at the agent's autonomy layer — limit how readily the agent follows in-content navigation instructions.

Chain AttackHigher Severity

Cowork's design choice to grant file access at folder granularity is also a security mechanism: the agent can read what is in the folder and (mostly) nothing else. The attack surface is therefore bounded by the folder's contents. This makes folder discipline the highest-leverage Cowork defense: do not mount your home directory, do not mount Downloads, do not mount your full Documents tree. Mount a dedicated working folder, copy in the files the agent needs, and let the surface end there. Every file that should not be in the agent's attack surface is one that should not be in the folder.

Mount a dedicated working folder; never your home directory
Treat the working folder as the trust boundary — every file inside is in scope
Use copy-in rather than mount-everything; let the agent see exactly what it needs
Audit the folder contents periodically — files accumulate
Re-scope per session when the workflow changes

Limitations: Friction-heavy compared to "give the agent your whole drive." Real workflows fight back. Tighter scope is purchased with copy-in steps that feel tedious until the first incident.

FoundationalCowork-Specific

A file-reading agent that can also write files can use writes as an exfil channel: write the user's data to a new file, then upload that file via a connected service. An agent that can execute code can shell out anywhere the OS permits. An agent that can delete files can be coerced into destroying audit logs. Cowork's per-action confirmation on deletes is exactly this kind of capability restriction at the UX layer. Apply the same thinking to writes and executions: gate them, scope them, log them. The summarization agent should not have execution rights; the refactoring agent does, but in a sandboxed environment with no network egress.

Read-only is the safest capability — prefer it where possible
Write should be confined to a specific subfolder (output/, not the working root)
Execute requires a sandboxed environment with no host filesystem access
Delete requires per-action confirmation (Cowork enforces this by default)
Network egress from any tool the agent calls is the highest-priority restriction

Limitations: Many legitimate workflows need write or execute capability — the goal is right-sized capability, not zero capability. Match the capability to the workflow's actual needs.

DefenseCapability Layer

Before pointing the agent at a file from an untrusted source, give it a once-over yourself. Open a PDF and skim for visible content; check the file properties for unfamiliar metadata; look at a DOCX in 'show all comments' mode; check a spreadsheet for unfamiliar formulas. The technique is low-tech and incomplete (it misses hidden text and steganography) but catches the most common cases. Pair it with conditional autonomy: agents working on user-vetted files can run with looser gates; agents working on incoming files from external parties run with tighter gates.

Open the file yourself before pointing the agent at it
Show all comments / tracked changes when reviewing DOCX
Check PDF properties pane for unfamiliar metadata
Inspect spreadsheet formulas in unfamiliar workbooks
Higher autonomy on user-vetted files; lower autonomy on incoming files

Limitations: Low-tech, defeats only the obvious payloads. Hidden-text and steganographic attacks survive a casual inspection. Useful as a first filter, not the only one.

PracticeLow Effort

An in-progress file-agent injection has recognizable shapes. The agent suddenly talks about a topic unrelated to the file you asked it to summarize. The agent asks to open a file you did not mention. The agent proposes writing output to an unfamiliar path. The agent's response contains content that did not appear to be in the source document. These are not certain signs (the agent is sometimes legitimately surprising) but they cluster around injection events. The Anthropic guidance for Cowork is unambiguous: stop the task immediately when the pattern feels off, investigate after, not during.

Sudden topic shift mid-task — flag
Unprompted requests to open new files — flag
Output paths the user did not specify — flag
Output content that does not match the source — flag
Network or tool calls the workflow does not require — flag
When pattern feels off: stop first, investigate after

Limitations: Skill-dependent — pattern recognition improves with experience and gets worse on tasks you are doing for the first time. Set lower autonomy thresholds for unfamiliar workflows.

DetectionIn-Session

Cowork's deletion confirmation is doing more work than people realize

Anthropic's choice to require per-action approval for file deletion is a textbook capability gate — the agent does not lose the delete capability, but every delete becomes an injection-resistance checkpoint where the user sees the proposed action plainly. Most injection attacks routed through deletion attempts get caught here, not because the user is great at spotting injection, but because the proposed action looks alien when surfaced in the prompt. Apply the same pattern to other consequential actions when you have control over the UX layer — surface the action plainly, require explicit approval, and trust the user to catch what looks alien.

Carrier or threat	Channel	Visibility to human	Hardest part of defense
PDF hidden text	White-on-white, zero-width chars, metadata, off-page	Invisible by default	No clean way to sanitize without breaking the doc
DOCX comments and tracked changes	Comments, tracked changes, properties, embedded objects	Visible in show-all-comments mode	Reader implementations vary in what they extract
Excel formula payloads	Cell formulas evaluated on read	Visible if you look at the formula bar	Formulas pull from cells the user did not select
Image and screenshot OCR	Text rendered into image, OCR'd into context	Visible if the image is rendered	Hostile to strip without losing legitimate use
Cross-file chains	File A points the agent at file B	Each file looks legitimate alone	Hard to detect — chain is the malice, not the parts
Folder scope creep	Mounted scope wider than needed	Visible at mount time	Resistance to narrow scope from real workflows
Capability creep	Agent has write/exec/delete it does not need	Visible at configuration time	Many workflows resist splitting agents by capability

Defense in this surface clusters into three layers: input scope (which files does the agent see?), capability scope (what can it do after reading them?), and process (how do you watch for in-flight injection?). All three combine — none of them alone is enough.

You routinely point your Cowork at PDFs from vendors and clients.PDF Payloads + Capability Restriction — assume every external PDF is potential payload. Restrict the agent reading them to read-only output: it summarizes, you act. Do not let the same agent that reads external PDFs also have email send, code execution, or file-write privileges.

You want to give Cowork access to your Excel financial model so it can answer questions.Excel Formula Payloads + Working-Folder Discipline — open the workbook in isolation; copy only the sheets the agent needs to a fresh workbook in the working folder; remove formulas that pull from external workbooks. The agent reads what is in the folder. If the model is too complex to copy in, that is a signal it is too complex to give the agent read access to.

Marketing wants Cowork to summarize incoming partner proposal PDFs unattended.Indirect Injection via Documents + Capability Restriction + Pre-Read Inspection — the unattended part is the risk. Either keep it attended (human reviews each summary), or sandbox the agent to write-summary-to-doc only (no email, no actions). Better yet, run pre-read inspection by a script that strips known payload channels (metadata, hidden text) before the agent ever reads.

An agent run has gone strange — it is talking about an unrelated topic.Detecting Injection in Flight — stop the task immediately, do not continue gathering evidence. Preserve the conversation. Identify the most recent file the agent read. Check that file for payload content (visible and hidden). Report to security@anthropic.com with the conversation context.

You want to schedule a daily Cowork task that summarizes overnight files dropped in /inbox.Cross-File Payload Chains + Capability Restriction — scheduled tasks have no in-flight observer. Restrict the scheduled agent to read-only output to a known summary location. Do not give it write to /inbox (so it cannot move files), do not give it delete (so it cannot remove evidence), do not give it email send. Review the daily summaries on a cadence even when nothing looks wrong.

You are about to flip on 'Act without asking' for a refactor pass on a folder of source files.Capability Restriction + Detection — if the files are yours and were not touched by external parties, lower autonomy may be reasonable. If any file in the folder came from an external source within the last week, leave per-action confirmation on. The mode trades injection resistance for speed; pay for that trade only when the content trust level supports it.

Folder scope and capability scope do most of the work

For file-reading agents specifically, the two defenses that compound on each other are narrow folder mount (what content can the agent reach?) and narrow capability set (what can it do with that content?). Get both right and most of the injection threat collapses, because either ingredient 1 (sensitive data access) or ingredient 2 (exfil) of the lethal trifecta is missing. The fancier defenses — content scanning, metadata stripping, in-flight detection — are valuable additions, but they are additions to those two foundations, not substitutes.

Prompt Injection: File-Reading Agents

PDF Payloads

DOCX and Office Document Payloads

Image and Screenshot Payloads

Cross-File Payload Chains

The Working-Folder Threat Model

Capability Restriction for File Agents

Pre-Read Inspection Patterns

Detecting Injection in Flight

Cowork's deletion confirmation is doing more work than people realize

Folder scope and capability scope do most of the work