Using Cowork Safely
Anthropic's official Cowork safety guidance, adapted into the practices that matter — granting access narrowly, monitoring tasks not commands, scheduled-task discipline, when 'Act without asking' is appropriate, computer-use caution, MCP and plugin vetting, and what to do when something looks wrong.
Cowork can read, write, and (with explicit permission) permanently delete the files it can see. The single most effective guardrail is keeping the visible surface small. Mount a dedicated working folder rather than your home directory or your full Documents tree, and keep backups of anything you can't afford to lose. Deletion protection still requires you to click Allow, but a tighter scope reduces the blast radius of every other class of mistake too, including ones that don't trip the deletion prompt.
- Create a dedicated /cowork (or similar) folder for Cowork work
- Avoid granting access to financial documents, credentials, or personal records
- Keep backups of important files independently of Cowork
- Re-mount narrowly per session when possible — broad mounts persist
- Deletion protection requires explicit per-action Allow, even inside a granted folder
Limitations: Doesn't help when the work genuinely requires sensitive data — in that case the right move is to scope the session itself, not the folder. Doesn't cover MCP-mediated file access or files reached through Claude in Chrome.
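One way to act on the list above is to scan a folder for sensitive-looking filenames before you mount it. The sketch below is a minimal illustration, not part of Cowork: `find_sensitive_files` and the pattern list are hypothetical, and filename matching will miss sensitive files that have innocuous names.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical patterns for credentials and financial records; tune to your own files.
SENSITIVE_PATTERNS = [
    "*.pem", "*.key", ".env", "id_rsa*", "*password*", "*tax*", "*statement*",
]

def find_sensitive_files(folder, patterns=SENSITIVE_PATTERNS):
    """Return relative paths under `folder` whose filename matches a pattern."""
    root = Path(folder)
    hits = []
    for path in sorted(root.rglob("*")):
        # Compare against the lowercased name so matching is case-insensitive.
        if path.is_file() and any(fnmatch(path.name.lower(), p) for p in patterns):
            hits.append(path.relative_to(root))
    return hits
```

Running it against the folder you plan to mount and moving any hits elsewhere keeps the check cheap enough to repeat per session.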
Cowork executes a lot of commands. Trying to validate each one is impractical and gives a false sense of safety. The right defensive posture is pattern recognition: is Claude touching files or sites you didn't mention? Is the task scope creeping? If something feels off, stop immediately. This is a different mental model from code review — you're watching for drift, not auditing instructions.
- Watch for unexpected file or site access, which Anthropic's guidance calls the primary tell
- Watch for scope creep beyond the original ask
- Stop the task immediately if something feels off; investigate after, not during
- Read the high-level task surface, not every command line
- Build a baseline expectation for each task type so drift is easier to spot
Limitations: Requires you to be actively present — doesn't transfer to scheduled tasks or 'Act without asking' runs. Skill-dependent: the more types of work you delegate, the better your pattern recognition gets.
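A baseline can also be made concrete for file changes. The sketch below snapshots the working folder before a task and compares it afterwards, flagging any changed path you did not expect. All function names are hypothetical and nothing here is a Cowork feature. It detects writes and deletions only, not reads or site access, so it complements watching the task rather than replacing it.

```python
import hashlib
from pathlib import Path

def snapshot(folder):
    """Map each file's relative path to a SHA-256 digest of its contents."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*") if p.is_file()
    }

def diff_snapshots(before, after):
    """Return (added, removed, modified) relative paths between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(k for k in before.keys() & after.keys() if before[k] != after[k])
    return added, removed, modified

def unexpected_changes(before, after, expected):
    """Changed paths that are not in the set you expected the task to touch."""
    added, removed, modified = diff_snapshots(before, after)
    return sorted(set(added + removed + modified) - set(expected))
```

Take `snapshot()` before starting the task, again after stopping it, and treat a non-empty `unexpected_changes()` result as a reason to investigate.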
Scheduled tasks run without you watching, which makes them structurally higher risk. The article's guidance is unambiguous: start with low-risk tasks like summaries and information compilation, avoid sensitive data and consequential actions (outbound email, purchases, financial transactions), review outputs after each run from the Scheduled page in the sidebar, and pause anything you're not actively using. Tasks only run while your computer is awake and the Claude Desktop app is open — that's a constraint, not a safety property.
- Begin with summarizing, drafting, and information-compilation tasks
- Avoid scheduling outbound email, payments, or anything hard to undo
- Review the Scheduled page in the left sidebar after each run
- Pause or delete tasks you no longer need — do not let them idle
- Remember the runtime constraint: desktop awake + Claude Desktop open
Limitations: Discipline-dependent — easy to add a 'just one more' task that drifts past safe scope. Reviews lag the action: scheduled tasks have already executed by the time you read the output.
'Act without asking' removes the per-step approval gate — the same gate that gives you a chance to interrupt a prompt-injection attempt mid-task. It is faster for well-defined work but materially increases injection risk. Use it only when all three conditions hold: you're actively supervising, you're working with trusted files/sites/tools, and you can stop Claude immediately if something looks wrong. If any of the three is shaky, leave per-step approval on.
- Condition 1: active supervision throughout the run
- Condition 2: trusted files, sites, and tools — no untrusted web content
- Condition 3: you can stop the run immediately
- Best fit: short, well-scoped tasks over files you just created or curated
- Worst fit: anything that ingests external content (email, web pages, documents from outside parties)
Limitations: Not a sandbox — it removes the most accessible defense against injection. Speed gains do not justify the mode for ambiguous or unattended work.
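The three-condition rule is a strict conjunction, which a few lines of code make explicit. This is only an illustration of the decision logic. `RunContext` and both functions are hypothetical names, and the judgment behind each boolean remains yours.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class RunContext:
    actively_supervised: bool    # Condition 1: you are watching the whole run
    inputs_trusted: bool         # Condition 2: trusted files, sites, and tools only
    can_stop_immediately: bool   # Condition 3: you can halt the run at once

def failed_conditions(ctx):
    """Names of the conditions that do not hold for this run."""
    return [f.name for f in fields(ctx) if not getattr(ctx, f.name)]

def act_without_asking_ok(ctx):
    """True only when all three conditions hold; otherwise keep per-step approval."""
    return not failed_conditions(ctx)
```

For example, `act_without_asking_ok(RunContext(True, False, True))` is false because the task ingests untrusted content, even though the other two conditions hold.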
Computer use is the highest-risk Cowork capability because it bypasses both the file-operation permission prompts and the code-execution VM. Claude clicks, types, and navigates the actual screen with no intervening sandbox. Build trust gradually like you would with a new colleague: start with lower-stakes tasks, block sensitive apps (banking, healthcare portals, dating apps), and stay aware that Claude is taking screenshots to read the screen. Even within granted apps, a click on a link in one app will open the target — so app-level permissions do not fully contain action.
- Start with low-stakes tasks and expand only as trust accrues
- Block sensitive apps explicitly (banking, healthcare, dating, personal messaging)
- Be aware screenshots are taken to interpret the screen
- Cross-app click-through: a link opened in an allowed app reaches its destination
- No sandbox — different risk class from file or code-execution tools
Limitations: Blocklist only protects against intended access — a chain through an allowed app can still reach blocked surfaces via opened links. Productivity loss is real if you over-block.
Web pages, emails, and documents pulled in from outside are the primary delivery surface for prompt-injection attacks. Cowork's default network access is intentionally restricted; only extend it to sites you trust. Note the asymmetry: network egress permissions do NOT apply to web fetch, web search, or MCPs (including Claude in Chrome). Web fetch runs server-side and is limited to search results and URLs you've shared, but the broader Chrome extension surface is governed separately. Team and Enterprise admins can disable both at the org level in Organization settings: web search under Capabilities, and the extension under Claude in Chrome.
- Default network access is restricted — extend deliberately, not by default
- Egress allowlists do not cover web fetch, web search, or MCPs (Chrome included)
- Web fetch is scoped to search results and URLs you have shared
- Org-wide controls: Organization settings > Capabilities (web search)
- Org-wide controls: Organization settings > Claude in Chrome (extension)
- Avoid using Claude in Chrome on sensitive sessions (banking, payroll, identity)
Limitations: Egress allowlist gaps are easy to forget — adding a single MCP can reopen network reach you thought you closed. Org-wide controls are blunt instruments.
Each MCP or plugin you install introduces a new path for attacks to reach Claude. Plugins are bundles — a single install can add skills, connectors, and sub-agents at once, significantly expanding scope of action. Local MCP servers bundled with plugins run on your computer with the same permissions as any other program you run; there is no Cowork-imposed sandbox around them. Stick to verified extensions from the Claude Desktop directory and read the requested permissions before installing.
- Install only from the Claude Desktop directory or other verified sources
- Read what permissions an extension requests before clicking install
- Treat plugins as multi-component installs — assess every bundled piece
- Local MCP servers run with your user permissions — no Cowork sandbox
- Audit installed MCPs and plugins periodically; remove ones you do not use
Limitations: Reviewing every permission is realistic for paid Cowork users but tedious. Bundled plugins make it harder to opt into a single capability without inheriting the rest.
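A periodic audit can start from a plain listing of which local MCP servers are configured and what command each one launches. The sketch below assumes a JSON document with a top-level `mcpServers` map, the shape used by Claude Desktop's `claude_desktop_config.json`. Whether every Cowork plugin or directory-installed extension appears in that file is an assumption, so treat the output as a starting point rather than a complete inventory. `list_mcp_servers` is a hypothetical helper.

```python
import json

def list_mcp_servers(config_text):
    """Return (name, command line) pairs from a config with an `mcpServers` map."""
    config = json.loads(config_text)
    servers = config.get("mcpServers", {})
    return [
        # Join the executable and its arguments so the launched command is readable.
        (name, " ".join([spec.get("command", "")] + list(spec.get("args", []))))
        for name, spec in sorted(servers.items())
    ]
```

Each command in the output runs with your user permissions, so any entry you do not recognize is a candidate for removal.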
The Claude for Excel and Claude for PowerPoint add-ins can read, edit, and pass context between applications. Claude might analyze a workbook and slide a chart into a deck without you directly instructing that transfer. Treat the Office add-ins + Cowork pairing as a single shared context — don't work with sensitive data in those add-ins while Cowork is active unless you intend the data to be available to whatever else Claude touches in the session.
- Add-ins share context with Cowork — data can move between apps
- Transfers can happen without an explicit 'move this' instruction
- Avoid sensitive data in Excel/PPT add-ins while Cowork is active
- Disable add-ins for sensitive workbooks rather than relying on prompt discipline
Limitations: Limits productivity for workflows that genuinely benefit from cross-app context. Doesn't apply to Excel/PPT used without the Claude add-ins.
When you message Cowork from your phone, the agent still runs on your desktop with whatever access you previously granted. The phone is effectively a remote control for your desktop's resources. This matters most on managed corporate computers: granting Cowork desktop access now extends that access to a personal mobile device that may not be under the same management posture. Review what you've granted Cowork and confirm the scope is appropriate for off-device triggering, not just for the corporate laptop in front of you.
- Phone messaging does not move Cowork — it triggers the desktop agent
- All previously-granted folders, connectors, and plugins remain in scope
- Especially relevant for MDM-managed laptops paired with personal phones
- Audit granted access through the lens of remote triggerability
- Treat phone-side authentication as the new perimeter for desktop scope
Limitations: No way to scope mobile-triggered sessions more narrowly than desktop-triggered ones today.
The recognizable signals of a prompt injection in progress: Claude suddenly discussing unrelated topics, attempting to access unexpected resources, or asking for sensitive information unprompted. The recommended response is to stop the task immediately and report to security@anthropic.com or via the in-app feedback button. Reports feed back into model training and content classifiers — the defenses everyone else benefits from get sharper because of yours.
- Signal 1: Claude shifts to unrelated topics mid-task
- Signal 2: attempts to access resources you did not mention
- Signal 3: unprompted requests for sensitive information
- First action: stop the task — do not continue to gather more evidence
- Then report: security@anthropic.com or in-app feedback button
Limitations: Recognition is skill-dependent — subtler injections that mimic plausible task drift are harder to flag.
Your phone is now a remote control for your desktop
| Practice | When it applies | Risk addressed | Effort | Priority |
|---|---|---|---|---|
| Narrow file access | Pre-session | Accidental read, write, deletion of sensitive files | Low — set once | Foundational |
| Monitor tasks, not commands | In-session | Scope creep, hidden tool drift, prompt injection in progress | Low — passive attention | Always |
| Scheduled task discipline | Recurring work | Unattended action, consequential side effects | Medium — setup + periodic review | High value |
| Avoid 'Act without asking' | Mode setting | Mid-task prompt injection | Low: leave per-step approval on | Default |
| Computer use caution | Per-task | Direct screen access without sandbox | Medium — blocklist + supervision | Highest stakes |
| Restrict browser/web | Pre-session + ongoing | Prompt injection from untrusted content | Low — admin or per-session | Foundational |
| Vet MCPs and plugins | Install-time | Bundled capability creep, untrusted code with full permissions | Low — read before install | Pre-install |
| Mobile awareness | Cross-device | Remote triggering of desktop scope from personal phone | Low — audit scope once | Managed devices |
Cowork safety is mostly about scope, not heroics
Most Cowork incidents trace back to broad scope at session start, not to clever attacks mid-session. The three pre-session practices — narrow file access, restrict web access, vet what you install — do more than any in-session vigilance can. The in-session practices (watch tasks not commands, leave per-step approval on by default) are the second layer. Reporting is the third. Each layer matters less if the layer above it was tight.