Anthropic, Google, and Microsoft offered bug bounties for AI agent vulnerabilities but chose to remain silent about the issues.
In summary, security researcher Aonan Guan exploited AI agents from Anthropic, Google, and Microsoft through prompt injection attacks on their GitHub Actions integrations, successfully stealing API keys and tokens in each instance. While all three companies discreetly paid bug bounties—$100 from Anthropic, $500 from GitHub, and an undisclosed sum from Google—they did not release public advisories or assign CVEs, leaving users of older versions unaware of the vulnerabilities.
The issues, uncovered by Aonan Guan over several months, impact AI tools that work with GitHub Actions: Anthropic’s Claude Code Security Review, Google’s Gemini CLI Action, and GitHub’s Copilot Agent. Each tool processes GitHub data, including pull request titles, issue texts, and comments, using it as context for task execution. However, they fail to consistently differentiate between genuine content and injected commands.
**Mechanics of the Attacks**
The main strategy utilized was indirect prompt injection. Instead of attacking the AI models outright, Guan inserted malicious commands in areas that the agents implicitly trusted, such as PR titles and issue comments. Once the agents processed this content in their workflows, they executed the commands as if they were legitimate.
In the case of Anthropic’s Claude Code Security Review, which checks pull requests for vulnerabilities, Guan manipulated a PR title to include a prompt injection payload. Claude executed the hidden commands and revealed sensitive credentials in its JSON response, which was then posted as a comment on the PR. This attack could lead to the exposure of the Anthropic API key, GitHub access tokens, and other secrets within the GitHub Actions runner environment.
The Gemini attack adhered to a similar pattern; Guan added a fictitious “trusted content section” following valid content in a GitHub issue. This approach bypassed Gemini’s safety protocols, leading the agent to mistakenly publish its API key as a comment. Google’s Gemini CLI Action treated the injected text as credible.
The Copilot attack was more discreet. Guan concealed malicious commands within an HTML comment in a GitHub issue, rendering the payload invisible in the rendered Markdown as seen by humans but visible to the AI agent interpreting the raw text. When a developer assigned the issue to the Copilot Agent, the bot executed the hidden commands without hesitation.
The reactions of the companies following these findings were telling. Anthropic received Guan's report on its HackerOne bug bounty platform in October 2025. After confirming that the method could also compromise more sensitive information like GitHub tokens, Anthropic paid a $100 bounty in November, raising the critical severity rating from 9.3 to 9.4, and updated its documentation's “security considerations” section without issuing a public advisory or assigning a CVE.
GitHub initially regarded the Copilot finding as a “known issue” that could not be replicated, but eventually awarded a $500 bounty in March. Google provided an undisclosed payment for the Gemini vulnerability, yet none of the three companies assigned CVEs or published warnings to inform users of vulnerable versions.
For Guan, the core issue lies in the fact that users of older versions of these AI agent integrations may remain unaware of their exposure. Without a CVE, vulnerability scanners won't highlight the problem, and without an advisory, security teams lack the necessary reference to monitor it.
**A Systemic Issue Rather Than a Singular Bug**
The attacks leverage a fundamental flaw in how AI agents interpret context. Large language models struggle to reliably distinguish between data and instructions. When an agent processes a GitHub issue, it assumes the text serves as input for reasoning; however, a cleverly devised prompt injection can turn that input into a command. Any data source that feeds into an AI agent’s reasoning—be it an email, calendar invite, Slack message, or code comment—poses a potential attack vector.
This concern is not merely theoretical. In January 2026, researchers from Miggo Security showed that Google Gemini could be exploited using calendar invitations embedded with hidden commands. Shortly thereafter, the “Reprompt” attack against Microsoft Copilot revealed that injected prompts could hijack entire user sessions. Further, Anthropic’s Git MCP server was found to have three CVEs that allowed attackers to implant backdoors through the repositories it processed. A comprehensive analysis of 78 studies published in January confirmed that all tested coding agents, including Claude Code, GitHub Copilot, and Cursor, were vulnerable to prompt injection, with adaptive attack success rates surpassing 85%.
The issue is exacerbated by supply chain factors. A security audit of nearly 4,000 agent skills on the ClawHub marketplace identified that over one-third contained at least one security vulnerability, with 13.4% classified as critical. When AI agents incorporate third-party tools and data sources with the same level of trust they apply to their own commands, a single compromised element can affect the entire development pipeline.
**The Disclosure Gap**
The vendors' hesitance to issue advisories highlights an uncomfortable truth: no
Other articles
Anthropic, Google, and Microsoft offered bug bounties for AI agent vulnerabilities but chose to remain silent about the issues.
Researchers exploited prompt injection to take control of Claude, Gemini, and Copilot AI agents in order to obtain API keys and tokens. Each of the three companies offered bounties but did not make a public announcement.
