A counterfeit AI agent skill successfully passed all security checks and is said to have reached 26,000 agents.
Security company AIR built a fake AI agent skill that passed every major scanner and, the firm claims, reached around 26,000 agents. The trick was to change the content behind an external URL after the scans had cleared. The decoy skill, called brand-landingpage, was designed to generate a landing page using Google’s Stitch tool and targeted non-technical users. To make it look credible, the firm focused on two trust signals widely treated as indicators of safety: GitHub stars and clean scanner results.
To acquire stars, AIR submitted a pull request to a repository for a skill marketplace that had approximately 36,000 stars and 156 skills. After a few days, this pull request was merged, allowing the skill to inherit the star count. They also ran an Instagram advertisement aimed at marketers, sales professionals, and designers, leading to installations and usage of the skill.
The scanners AIR tested, which include tools from Cisco and NVIDIA as well as those built into major skill registries, evaluate the submitted package: the skill definition file and its accompanying components. AIR's skill contained no malicious setup instructions. Instead, it directed the agent to install the “Stitch SDK” from a link AIR controlled rather than from the legitimate Google domain. Initially, the link pointed to the actual Stitch documentation, so the scanners marked the skill as safe. After the skill gained traction, AIR changed the linked page to instruct the agent to download and execute a script.
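The external link is the pivot of the attack described above, so one narrow check a reviewer could add is to list every URL a skill references and flag hosts outside an approved set. The following is a minimal sketch, not a description of any scanner named in this article. It assumes the skill definition is available as plain text, and the function names and the example hosts are hypothetical.

```python
import re
from typing import Iterable, List
from urllib.parse import urlparse

# Rough URL matcher for plain-text skill files (an assumption; real
# skill formats may need a proper parser).
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def external_urls(skill_text: str) -> List[str]:
    """Return every http(s) URL found in the skill text, in order."""
    # Strip trailing punctuation that belongs to the sentence, not the URL.
    return [u.rstrip(".,;:'\"") for u in URL_PATTERN.findall(skill_text)]


def untrusted_urls(skill_text: str, trusted_hosts: Iterable[str]) -> List[str]:
    """Return URLs whose host is neither a trusted host nor a subdomain of one."""
    trusted = [h.lower() for h in trusted_hosts]
    flagged = []
    for url in external_urls(skill_text):
        host = (urlparse(url).hostname or "").lower()
        # Match on the full host or a dot-separated suffix, so that
        # "vendor.example.evil.example" is not mistaken for "vendor.example".
        if not any(host == t or host.endswith("." + t) for t in trusted):
            flagged.append(url)
    return flagged
```

Called with the text of a skill and a set such as {"vendor.example"} (a placeholder host), untrusted_urls returns the links a human should look at before approval. A host allowlist says nothing about whether the content behind an approved link later changes, so it complements rather than replaces re-checking linked content.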
This technique is not new. Just three weeks prior to AIR’s findings, Trail of Bits managed to circumvent ClawHub’s malicious-skill detector, Cisco's scanner, and all three scanners available within the main skill registries. The takeaway was that scanners assess a fixed package, while attackers can continuously alter the payload to achieve a pass.
Real-world campaigns have used this method for several months, submitting clean skills while hosting the payload on a site that the agent accesses only during installation. The issue is systemic: scanning happens once, but the content a skill directs the agent to can be changed at any later time. Anthropic's documentation cautions that skills fetching external URLs are risky for precisely this reason, since the content may change after the skill has been vetted.
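The gap between a one-time scan and mutable linked content can be illustrated with a simple pinning scheme: record a digest of each linked page at review time, then compare on every later check. This is a sketch under stated assumptions, not a method taken from AIR, Trail of Bits, or Anthropic. The function names are hypothetical, and the fetcher is passed in as a parameter so the example stays self-contained and makes no network calls.

```python
import hashlib
from typing import Callable, Dict, Iterable, List

# A fetcher takes a URL and returns the raw bytes served at that URL.
# In practice this would wrap an HTTP client; here it is left abstract.
Fetcher = Callable[[str], bytes]


def fingerprint(content: bytes) -> str:
    """SHA-256 digest of the content, as a hex string."""
    return hashlib.sha256(content).hexdigest()


def pin_links(urls: Iterable[str], fetch: Fetcher) -> Dict[str, str]:
    """At review time, record a digest of what each linked URL serves."""
    return {url: fingerprint(fetch(url)) for url in urls}


def drifted_links(pins: Dict[str, str], fetch: Fetcher) -> List[str]:
    """Later, return the URLs whose content no longer matches its pin."""
    return [url for url, pinned in pins.items()
            if fingerprint(fetch(url)) != pinned]
```

A reviewer would call pin_links once when the skill is approved and drifted_links on a schedule or before each install. Pages that legitimately change on every request (timestamps, rotating banners) will trip a raw byte hash, so a real deployment would likely need to normalize content or treat drift as a trigger for re-review rather than as proof of an attack.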
Research from this year found that seven major scanners agreed on fewer than one in five hundred of their collective flags, since each scanner evaluates a skill independently, with no visibility into external links or into changes made after review.
The figures AIR announced come from its own data and should be viewed skeptically. The firm is launching a managed skill marketplace and closes its report by promoting it, and the 26,000 figure, along with the details about corporate accounts and the claims of potential complete control over each agent, remains unverified. The method itself is sound, however: the scanners in question assess only the submitted package, the blind spot around external links is real and documented, and the trust signals AIR exploited, stars and clean scans, are indeed treated as indicators of safety within the ecosystem.
The experiment exposes weaknesses in the trust signals around agent skills: borrowed stars, a snapshot-based scan, and a link that can be changed after the initial assessment. Whether the real count is 26,000 or smaller, it highlights a security gap that defenders have yet to address.
For security teams, the key takeaway is to treat skills as software rather than text, scrutinizing what a skill points to, not just what it contains. New skills should be routed through a controlled source, re-evaluated whenever they change, and version-controlled, and agents should be held to the principle of least privilege.
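One way to picture a controlled source with version control is an install gate that only accepts the exact package bytes that were reviewed. The sketch below is an assumption-laden illustration, not a feature of any registry mentioned here. The ApprovedSkill record and the may_install function are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ApprovedSkill:
    """One reviewed skill release (hypothetical record layout)."""
    name: str
    version: str
    sha256: str  # digest of the exact package bytes that were reviewed


def may_install(name: str, package: bytes,
                approved: Dict[str, ApprovedSkill]) -> bool:
    """Allow install only for known skills whose bytes match the reviewed digest."""
    entry = approved.get(name)
    if entry is None:
        # Skill never went through the controlled source.
        return False
    # Any change to the package, however small, forces a new review.
    return hashlib.sha256(package).hexdigest() == entry.sha256
```

A gate like this covers only the package itself. It would still have approved the skill in AIR's experiment, because the package never changed, which is why it needs to be paired with checks on linked content and with tight limits on what the agent is allowed to do.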
