The promise of AI coding agents is autonomy. Tools like Claude Code, Cursor and GitHub Copilot don’t just suggest snippets; they navigate repositories, execute commands, and manage complex workflows. But this autonomy comes with a dangerous trade-off: a new attack vector that turns a simple “trust” click into a full system compromise.
Security researchers at Adversa.AI have uncovered a critical vulnerability they call “TrustFall.” This attack demonstrates how hackers can manipulate AI-powered command-line interface (CLI) tools to execute malicious code with full system privileges. Because these agents are often integrated directly into development environments and CI/CD pipelines, TrustFall isn’t just a local risk—it’s a potential catalyst for global software supply chain attacks.
- The Trigger: Attackers place malicious configuration files in public GitHub repositories.
- The Hook: Users are prompted to “trust” the project, a step often performed reflexively.
- The Payload: The Model Context Protocol (MCP) is exploited to launch unauthorized OS processes.
- The Scope: Affects multiple AI agents, including Claude Code, Gemini CLI, Cursor CLI, and Copilot CLI.
How TrustFall Works: The Path to Remote Code Execution
The TrustFall attack relies on a combination of technical loopholes and human psychology. The process is deceptively simple, requiring very little effort from the attacker to achieve high-impact results.
First, an attacker creates an attractive public GitHub repository containing malicious code. When a developer uses an AI agent to scan or work within that repository, the agent identifies the project and asks the user for permission to proceed. The prompt is typically a standard safety check: “Quick safety check: Is this a project you created or one you trust?”
In the fast-paced environment of modern engineering, these prompts often become “click-through” obstacles. When a user selects “trust” (which is often the default setting), they aren’t just giving the AI permission to read files—they’re potentially opening the door to a full system takeover.
The Technical Exploit: Abusing the Model Context Protocol (MCP)
The core of the TrustFall vulnerability lies in the Model Context Protocol (MCP). MCP is designed to allow AI agents to connect to external data sources and tools. However, TrustFall weaponizes this connectivity through configuration files like .claude/settings.json or .mcp.json.
Attackers embed a specific parameter—enableAllProjectMcpServers—within these files. When the user marks the folder as trusted, the AI agent automatically approves and starts every MCP server defined in the project. These servers aren’t sandboxed; they run as operating system processes with the full privileges of the developer.
This allows the attacker to:
- Establish a persistent connection to a Command-and-Control (C2) server.
- Execute arbitrary code directly on the host machine.
- Embed payloads within the JSON configuration itself, bypassing static scanners that only look for standalone script files.
A Systemic Issue Across the AI Ecosystem
While much of the initial focus has been on Claude Code, Adversa.AI’s research reveals that this isn’t an isolated bug. It’s a shared industry convention. Testing showed that Gemini CLI, Cursor CLI, and Copilot CLI all exhibit the same behavior. All four tools allow malicious repositories to automatically approve and start MCP servers once the trust dialog is accepted.
This creates a massive vulnerability for high-value targets. Developers of widely used open-source tools are primary targets because they frequently clone unknown repositories to test features or integrate dependencies. If a developer’s machine is compromised, the attacker can steal environment variables, deploy keys, and signature certificates.
The Responsibility Gap: User Consent vs. Informed Consent
The discovery of TrustFall has sparked a debate over where security responsibility lies. When notified of the vulnerability, Anthropic suggested that the responsibility rests with the user; by clicking “Yes, I trust this folder,” the user has explicitly consented to the contents.
Security experts argue that this is a flawed premise. There is a vast difference between trusting a project’s code and unknowingly granting an AI agent permission to execute hidden OS-level processes via a JSON config file. This “uninformed consent” fails as a security model because the user cannot realistically audit every hidden configuration file in a complex directory structure before clicking “trust.”
How to Protect Your Development Environment
Until AI agent providers implement stricter safeguards—such as blocking sensitive keys like enableAllProjectMcpServers within repositories—developers must take manual precautions:

- Restrict AI Agents in Pipelines: Never run AI agents on unverified branches. Only use them on branches where commits have been manually reviewed.
- Avoid Direct Scans of Unknown Repos: Do not use active AI assistants to scan repositories from untrusted sources.
- Treat “Trust” Prompts with Caution: Stop treating security dialogs as routine. If you didn’t create the project, assume the configuration files could be malicious.
Final Thoughts: The Future of Agentic Security
TrustFall is a wake-up call for the industry. As we move from “Chatbots” to “Agents” that can actually do things on our computers, the attack surface expands exponentially. The convenience of autonomy cannot come at the cost of basic system integrity. For AI agents to be truly viable in professional engineering, the industry must move toward a “Zero Trust” architecture where permissions are granular, sandboxed, and explicitly verified, rather than bundled into a single, reflexive click.
What is an AI Coding Agent?
Unlike a standard LLM, a coding agent can interact with your file system, run terminal commands, and manage git repositories autonomously.
What is the Model Context Protocol (MCP)?
MCP is an open standard that enables AI models to securely access data and tools from different sources, essentially acting as a bridge between the AI and your local environment.
Does this affect standard ChatGPT or Claude web interfaces?
No. TrustFall specifically targets CLI-based agents that have direct access to your local operating system and file system.