The AI security gap that no one is willing to acknowledge is already present.
On March 31, 2026, Anthropic inadvertently released the complete source code of Claude Code to the public npm registry. This included approximately 512,000 lines of TypeScript across 1,906 files, featuring 44 hidden feature flags and references to an unreleased model known as Mythos, which remained publicly accessible on a Cloudflare storage bucket until a security researcher discovered it and shared the link on X. Within hours, the codebase was replicated on GitHub, garnering thousands of stars before Anthropic could implement DMCA takedowns. Anthropic attributed the incident to a human error in packaging, an explanation that, while accurate, somewhat misses the larger issue.
By revealing the blueprints of Claude Code, Anthropic provided a guide for anyone looking to create malicious repositories designed to deceive Claude Code into executing background commands or extracting data before a user encounters a trust prompt. The logic for permission enforcement, the architecture for sandboxing, and the precise orchestration mechanics governing how the agent verifies its permitted actions are now permanently available in countless forked repositories that will never be fully addressed by DMCA notices. The leak’s implications for AI security are more troubling than the leak itself.
One Side Is Advancing More Rapidly
The traditional perspective of AI in cybersecurity views it as a balanced arms race, where offense and defense evolve at similar rates. However, this viewpoint does not hold when considering the specifics of the March incident or the experiences described by security teams in their daily operations.
The exposed permission logic from the Claude Code leak enhances the ability of attackers to silently seize control of devices if they know where to look. In contrast, defenders are working to integrate AI into their current security architectures and ensure it does not produce false positives before it can be effectively utilized. These two timelines cannot be compared.
Tim Burke, who has managed security operations for over three decades at Quest Technology Management, clearly highlights this asymmetry: “Attackers received the complete blueprint of how an agentic AI checks permissions and manages credentials without needing to reverse-engineer anything,” he notes. “This means attackers can use AI that operates more swiftly than most detection systems were designed to cope with, while security teams are still learning how to implement AI tools without adding to the workload of already strained SOCs.”
Earlier this month, Google’s Threat Intelligence Group identified the first confirmed zero-day exploit entirely created with AI support and halted a planned mass exploitation attempt before it could occur, which is the optimistic side of this narrative. However, most organizations combating these capabilities are not on Google’s level, and their detection systems were not designed for the possibilities now present.
“Most organizations still rely on detection systems that were meant to identify human attackers who systematically navigate networks over days or weeks,” says Burke. “AI has compressed those timelines to mere hours and, in some cases, minutes, which means the interval between an intrusion and damage is now shorter than the time it typically takes SOCs to investigate a single alert.”
The Missing Alert
Beneath the issue of speed lies a more foundational problem. Security platforms are designed to identify behavioral anomalies, essentially flagging what appears to be malicious activity based on observable behavior rather than the motivations behind it. They currently cannot discern whether an attack was launched by a human or an AI agent acting independently. No current platform effectively highlights that difference.
The vulnerability revealed in the Claude Code leak exemplifies this challenge: a malicious file can instruct the AI to create a command pipeline that mimics a legitimate build process, triggering actions that circumvent the permission system without raising any conventional SIEM alerts.
“AI agents can be manipulated via tool descriptions and prompts in ways that evade traditional access controls without ever triggering an authentication failure or alerting your SIEM,” Burke explains. “This means detection must begin to monitor what the agent believed it was doing and the reasoning behind its decisions, rather than simply flagging policy breaches afterward.”
The references to Mythos in the leaked files add an additional layer that has not garnered much focus. What was disclosed was not only the current tool but also the architectural trajectory that agentic AI is pursuing, which includes enhanced reasoning abilities and a deeper integration of native tool use. Security teams are designing defenses against current capabilities of these systems, while the leaked roadmap points to something significantly more advanced.
“Currently, the vast majority of platforms cannot differentiate between AI and human sources,” Burke states, “leaving security teams defending blindly against a whole category of threats they cannot see."
The Anthropic leak originated from a misconfigured debug file. Organizations now assessing whether their security infrastructure can detect actions an AI agent believed it was authorized to take are grappling with an issue that predates March 31 and will remain long after the DMCA notices are processed.
As of now, there is no clear resolution to this problem.
Other articles
The AI security gap that no one is willing to acknowledge is already present.
The leak of the Anthropic Claude Code source code uncovered more than just a packaging mistake; it highlighted the significant lead attackers have with AI compared to the challenges defenders face in keeping up.
