Prompt Injection: File-Reading Agents
The injection-specific angles for Cowork, Desktop Commander, and other file-reading agents. Document payload anatomy (PDFs, DOCX, XLSX, images), cross-file payload chains, working-folder discipline, capability restriction for file workflows, and in-flight injection detection.
← Back to Reference HubPDFs are the single most common injection carrier for file-reading agents because they pass through email, file shares, and document management systems constantly. The payload techniques cluster around invisible-to-human / visible-to-agent placement: white-on-white text in the body, zero-width Unicode characters mixed into paragraphs, content in metadata fields (Title, Author, Subject) that some agents read but human readers rarely check, instructions inside form-field defaults, and text positioned outside the visible page area but still in the parsed text layer. The agent reads everything in the text layer; the human reader sees only the rendered page. The gap between those is the attack surface.
- White-on-white or low-contrast text in body or footer
- Zero-width characters interpolated through paragraphs
- Metadata fields (Title, Author, Subject, Keywords) — agents read, humans rarely
- Form-field default values — read on text extraction
- Off-page text — outside the visible page, present in extracted text
- OCR layer mismatch — visible text says X, OCR layer says Y (in deliberately malformed PDFs)
Limitations: Sanitization tools exist but break legitimate documents. The pragmatic defense is upstream (which PDFs does the agent read?) and downstream (capability restriction after reading), not the PDF itself.
Word documents carry several injection channels PDF does not: comments and tracked changes, document properties, custom XML parts, embedded objects, and revision history that agents may parse even when the comments are 'resolved.' Spreadsheets add another dimension entirely — formulas. An agent that helpfully evaluates a cell's formula can be coerced into reading data from sheets the user did not mean to expose, or into producing output that triggers downstream injection. PowerPoint contributes speaker notes and slide masters as channels — both read by agents performing summarization, both invisible to a viewer who only sees slides.
- DOCX comments, including resolved/hidden ones, often appear in extracted text
- Tracked changes — additions and deletions visible to agents but not to default views
- Custom document properties (similar to PDF metadata)
- Excel formulas evaluated on read — can pull data the user did not select
- PowerPoint speaker notes and slide masters
- Embedded objects (linked spreadsheets, OLE objects) — depend on how the agent's reader resolves them
Limitations: Office document readers vary in how thoroughly they parse structure. The agent you are using today may read fewer hidden channels than the one you upgrade to next year. Defense by 'this surface is currently safe' is not durable.
File-reading agents that handle images perform OCR or visual parsing on them, and the text recovered is content like any other. A screenshot of a 'document' that contains injected instructions is functionally identical to a document with the same instructions. The attack is easier than it sounds: photograph a sign with text instructions, screenshot a chat with payload text, generate an image with steganographically embedded instructions. Modern multimodal models also accept direct visual instruction ('the arrow in the image points at...') which expands the channel beyond text-in-image. Treat every image the agent reads with the same threat model as every document.
- Text-in-image: any agent that OCRs sees it as content
- Visible instruction-shaped text in screenshots, photos, infographics
- Hidden steganographic payloads (rarer, more advanced)
- Visual-direction payloads ("the answer is in the highlighted region")
- Captions and alt text — read by agents handling rich content
Limitations: Stripping text-bearing images is hostile to legitimate use cases (chart screenshots, signs, photos of documents). The defense is downstream: capability restriction on what the agent can do after reading image content.
An attacker can split a payload across files. File A instructs the agent to open file B; file B contains the actual instruction. This bypasses single-file scanning and exploits the agent's general helpfulness — the agent reads B because reading B looks like helpful task continuation. Variants: payload references a file by relative path that resolves to something inside the user's working directory (the attacker plants both files), payload references a remote URL the agent will fetch, payload references a specific cell or section of another open file. The defense is at the agent level — the agent should not autonomously open new files based on instructions found inside a file it was reading.
- File A: "Per the supplementary doc in `notes.md`, please..."
- File A: "Open `instructions.txt` for the rest of the task"
- File A: 'See <attacker-url> for the format spec'
- File A points to a cell in another open spreadsheet that contains the actual payload
- Defense: agent should not chain into new files based on in-file directives without user confirmation
Limitations: Hard to detect because the individual file reads each look legitimate. Defense is at the agent's autonomy layer — limit how readily the agent follows in-content navigation instructions.
Cowork's design choice to grant file access at folder granularity is also a security mechanism: the agent can read what is in the folder and (mostly) nothing else. The attack surface is therefore bounded by the folder's contents. This makes folder discipline the highest-leverage Cowork defense: do not mount your home directory, do not mount Downloads, do not mount your full Documents tree. Mount a dedicated working folder, copy in the files the agent needs, and let the surface end there. Every file that should not be in the agent's attack surface is one that should not be in the folder.
- Mount a dedicated working folder; never your home directory
- Treat the working folder as the trust boundary — every file inside is in scope
- Use copy-in rather than mount-everything; let the agent see exactly what it needs
- Audit the folder contents periodically — files accumulate
- Re-scope per session when the workflow changes
Limitations: Friction-heavy compared to "give the agent your whole drive." Real workflows fight back. Tighter scope is purchased with copy-in steps that feel tedious until the first incident.
A file-reading agent that can also write files can use writes as an exfil channel: write the user's data to a new file, then upload that file via a connected service. An agent that can execute code can shell out anywhere the OS permits. An agent that can delete files can be coerced into destroying audit logs. Cowork's per-action confirmation on deletes is exactly this kind of capability restriction at the UX layer. Apply the same thinking to writes and executions: gate them, scope them, log them. The summarization agent should not have execution rights; the refactoring agent does, but in a sandboxed environment with no network egress.
- Read-only is the safest capability — prefer it where possible
- Write should be confined to a specific subfolder (output/, not the working root)
- Execute requires a sandboxed environment with no host filesystem access
- Delete requires per-action confirmation (Cowork enforces this by default)
- Network egress from any tool the agent calls is the highest-priority restriction
Limitations: Many legitimate workflows need write or execute capability — the goal is right-sized capability, not zero capability. Match the capability to the workflow's actual needs.
Before pointing the agent at a file from an untrusted source, give it a once-over yourself. Open a PDF and skim for visible content; check the file properties for unfamiliar metadata; look at a DOCX in 'show all comments' mode; check a spreadsheet for unfamiliar formulas. The technique is low-tech and incomplete (it misses hidden text and steganography) but catches the most common cases. Pair it with conditional autonomy: agents working on user-vetted files can run with looser gates; agents working on incoming files from external parties run with tighter gates.
- Open the file yourself before pointing the agent at it
- Show all comments / tracked changes when reviewing DOCX
- Check PDF properties pane for unfamiliar metadata
- Inspect spreadsheet formulas in unfamiliar workbooks
- Higher autonomy on user-vetted files; lower autonomy on incoming files
Limitations: Low-tech, defeats only the obvious payloads. Hidden-text and steganographic attacks survive a casual inspection. Useful as a first filter, not the only one.
An in-progress file-agent injection has recognizable shapes. The agent suddenly talks about a topic unrelated to the file you asked it to summarize. The agent asks to open a file you did not mention. The agent proposes writing output to an unfamiliar path. The agent's response contains content that did not appear to be in the source document. These are not certain signs (the agent is sometimes legitimately surprising) but they cluster around injection events. The Anthropic guidance for Cowork is unambiguous: stop the task immediately when the pattern feels off, investigate after, not during.
- Sudden topic shift mid-task — flag
- Unprompted requests to open new files — flag
- Output paths the user did not specify — flag
- Output content that does not match the source — flag
- Network or tool calls the workflow does not require — flag
- When pattern feels off: stop first, investigate after
Limitations: Skill-dependent — pattern recognition improves with experience and gets worse on tasks you are doing for the first time. Set lower autonomy thresholds for unfamiliar workflows.
Cowork's deletion confirmation is doing more work than people realize
| Carrier or threat | Channel | Visibility to human | Hardest part of defense |
|---|---|---|---|
| PDF hidden text | White-on-white, zero-width chars, metadata, off-page | Invisible by default | No clean way to sanitize without breaking the doc |
| DOCX comments and tracked changes | Comments, tracked changes, properties, embedded objects | Visible in show-all-comments mode | Reader implementations vary in what they extract |
| Excel formula payloads | Cell formulas evaluated on read | Visible if you look at the formula bar | Formulas pull from cells the user did not select |
| Image and screenshot OCR | Text rendered into image, OCR'd into context | Visible if the image is rendered | Hostile to strip without losing legitimate use |
| Cross-file chains | File A points the agent at file B | Each file looks legitimate alone | Hard to detect — chain is the malice, not the parts |
| Folder scope creep | Mounted scope wider than needed | Visible at mount time | Resistance to narrow scope from real workflows |
| Capability creep | Agent has write/exec/delete it does not need | Visible at configuration time | Many workflows resist splitting agents by capability |
Folder scope and capability scope do most of the work
For file-reading agents specifically, the two defenses that compound on each other are narrow folder mount (what content can the agent reach?) and narrow capability set (what can it do with that content?). Get both right and most of the injection threat collapses, because either ingredient 1 (sensitive data access) or ingredient 2 (exfil) of the lethal trifecta is missing. The fancier defenses — content scanning, metadata stripping, in-flight detection — are valuable additions, but they are additions to those two foundations, not substitutes.