Claude generates 80% of Anthropic's code as the company calls for an AI pause mechanism.
TL;DR: Anthropic says Claude now writes more than 80% of the code added to its production codebase, and that its engineers merged eight times as much code per day in Q2 2026 as in 2024. A new paper from the Anthropic Institute outlines the route to recursive self-improvement and calls for a verifiable global pause mechanism.
One Anthropic engineer has not written any code in five months, not for lack of work but because Claude now handles it. As of May 2026, more than 80% of the code added to Anthropic’s production codebase was written by Claude, up from the low single digits when Claude Code launched in February 2025.
This statistic, shared in a recent Anthropic Institute paper titled “When AI builds itself,” is not what the company wants to emphasize. Its focus is the next step: AI capable of designing and training its own successors. Anthropic says it has not reached this point yet, but may be closer than many institutions realize.
The shift in productivity is significant. In Q2 2026, the average Anthropic engineer merged eight times as much code per day as in 2024. In an internal survey of 130 research staff, the median respondent estimated their output with the latest model, Mythos Preview, at roughly four times what it would be without AI.
On the most challenging and open-ended engineering tasks, Claude’s success rate reached 76% by May 2026, an increase of 50 percentage points in just six months. Anthropic cites a specific incident in which a routine update caused crashes in numerous training jobs; an engineer pointed Claude at the live incident with minimal context. Claude tracked down a rare bug, reproduced the crash, and validated a fix in roughly two hours, a process that would typically take two to three days.
The quality of the code is also improving. Anthropic’s team noted that in late 2025, Claude's code was "somewhat worse" than code written by humans. It is now considered roughly on par, and is expected to surpass human quality within a year. A Claude-based automated reviewer now evaluates every proposed change to Anthropic's codebase before it is merged. A review found that this reviewer would have caught about one-third of the bugs behind past claude.ai issues before they reached production.
Transitioning from coding to research poses a bigger challenge. The question remains whether Claude can conduct research involving open-ended scientific reasoning that propels AI advancement.
The evidence for research capabilities is still early but compelling. In April 2026, Anthropic showcased Claude carrying out a complete AI safety research project independently. Nine parallel agents tackled a problem, generating hypotheses, running experiments, sharing results, and iterating collaboratively. Over 800 hours of work, at a compute cost of approximately $18,000, the agents recovered 97% of the performance gap on the task, whereas two human researchers recovered just 23% in a week.
Another internal test compared the “next steps” Claude chose during complex research sessions with those chosen by a human researcher. In November 2025, Claude agreed with the human's decisions 51% of the time; by April 2026, that figure had risen to 64%. Day-to-day research often hinges on such next-step choices. If this trend continues, the distinction between AI acting as an assistant and AI acting as a researcher will diminish quickly.
Anthropic’s internal data reflects a wider trend tracked by METR, a non-profit organization that benchmarks AI capabilities. The length of tasks that AI can reliably complete autonomously has been doubling approximately every four months, faster than the earlier pace of once every seven months.
In March 2024, Claude Opus 3 could manage tasks taking humans about four minutes. By early 2025, Claude Sonnet 3.7 handled tasks lasting an hour and a half. Currently, Claude Opus 4.6 can take on tasks that extend over 12 hours, while METR indicates that Mythos Preview can sustain work for at least 16 hours, nearing the limit of current benchmarking metrics. If this pattern continues, it is possible that tasks requiring several days of skilled human work will soon be achievable, with tasks spanning weeks anticipated in 2027.
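The extrapolation above can be sketched in a few lines of Python. This is a rough illustration, not METR's methodology: it assumes the four-month doubling time holds steadily, takes the 16-hour figure as the starting point, and treats a "work week" as 40 hours of skilled human effort (an assumption introduced here, not a figure from the paper). The function names are hypothetical.

```python
import math


def projected_horizon_hours(current_hours, months_ahead, doubling_months=4.0):
    """Task horizon after `months_ahead` months, assuming steady doubling."""
    return current_hours * 2 ** (months_ahead / doubling_months)


def months_until(current_hours, target_hours, doubling_months=4.0):
    """Months needed for the horizon to grow from current to target hours."""
    return doubling_months * math.log2(target_hours / current_hours)


if __name__ == "__main__":
    start_hours = 16.0  # reported lower bound for Mythos Preview
    targets = [
        ("one 40-hour work week", 40.0),     # assumed definition of a week
        ("four 40-hour work weeks", 160.0),  # assumed definition of "weeks"
    ]
    for label, target in targets:
        months = months_until(start_hours, target)
        print(f"{label}: about {months:.1f} months")
```

Under these assumptions, a one-work-week task horizon is a little over five months away and a four-work-week horizon is roughly thirteen months away, which is consistent with the article's "weeks in 2027" estimate if the clock starts in mid-2026. A slower doubling time, such as the earlier seven-month pace, would push both dates out considerably.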
The resulting impact is becoming apparent. In 2025, GitHub, the primary platform for software development globally, recorded around one billion code commits. By mid-2026, it was processing 275 million commits per week, putting it on pace for roughly 14 billion over the year. Claude Code accounts for 4.5% of all public commits on GitHub, or about 2.6 million each week.
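The commit figures can be cross-checked with simple arithmetic. The sketch below assumes a flat 52-week year at the mid-2026 weekly rate, which is a simplification since the rate is evidently still rising, and the function names are hypothetical. The implied public-commit volume is derived here from the two Claude Code figures and is not a number reported in the article.

```python
WEEKS_PER_YEAR = 52  # simplifying assumption: constant weekly rate all year


def annualized_commits(weekly_commits):
    """Yearly total if the weekly rate held for a full year."""
    return weekly_commits * WEEKS_PER_YEAR


def implied_total(part, share):
    """Total implied by a part and its fractional share of that total."""
    return part / share


if __name__ == "__main__":
    yearly = annualized_commits(275_000_000)
    print(f"annualized commits: {yearly / 1e9:.1f} billion")

    # 2.6 million weekly Claude Code commits at a 4.5% share of public commits
    public_weekly = implied_total(2_600_000, 0.045)
    print(f"implied public commits per week: {public_weekly / 1e6:.1f} million")
```

The annualized figure comes to 14.3 billion, matching the roughly 14 billion cited. The Claude Code numbers imply about 58 million public commits per week, well below the 275 million total, which would make sense if the larger figure also counts private repositories.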
GitHub’s COO stated that the company is “pushing incredibly hard” to increase capacity to keep pace. Within Anthropic, the bottleneck has shifted; as Claude produces more code, human code reviews have become the limiting factor. The company has encountered a classic instance of
