Claude writes 80% of Anthropic's production code as the company calls for a global AI pause mechanism.

      TL;DR: Anthropic has announced that Claude is now responsible for over 80% of its production code, enabling engineers to deliver code at a rate eight times higher per quarter than in 2024. The latest paper from the Anthropic Institute outlines advancements toward recursive self-improvement and advocates for a verifiable global pause mechanism.

      One engineer at Anthropic hasn't written any code in five months, not for lack of work but because Claude now handles it. As of May 2026, more than 80% of the code merged into Anthropic’s production codebase was generated by Claude, having risen from the low single digits since Claude Code launched in February 2025.

      This figure, highlighted in a new Anthropic Institute paper titled "When AI Builds Itself," is not what the company wants readers to focus on. Its real concern is the potential for AI to design and train its own successors. Anthropic admits it is not there yet but suggests it may be closer than most are ready to accept.

      The data behind this change shows large productivity gains. In Q2 2026, the average Anthropic engineer merged eight times as much code per day as in 2024. An internal survey of 130 research staff indicated that median output with Anthropic’s latest model, Mythos Preview, was approximately four times greater than without AI assistance.

      Claude's accuracy in tackling complex and open-ended engineering challenges improved to 76% in May 2026, a substantial increase of 50 percentage points in just six months. For instance, when a routine update caused numerous training jobs to crash, an engineer directed Claude to the live incident with minimal context and cluster access. Claude quickly identified a rare debugging flag, replicated the crash, and confirmed a solution in about two hours—a process that typically takes two to three days.

      The quality of code written by Claude is also improving. Anthropic employees noted that Claude's code was “somewhat worse” than that produced by humans in late 2025 but is now on par and expected to surpass human quality within the year. An automated reviewer powered by Claude assesses each proposed change to Anthropic’s codebase before it’s merged. A retrospective analysis indicated that this reviewer would have caught about one-third of the bugs leading to past issues before they reached production.

      Shifting from coding to research, Anthropic is exploring whether Claude can engage in research involving open-ended scientific reasoning. While evidence in this area is still emerging, it is compelling. In April 2026, the company showcased Claude conducting an open-ended AI safety research project autonomously. Nine agents tackled a problem, formulated hypotheses, conducted experiments, shared insights in a common forum, and refined their approaches. After more than 800 hours of cumulative work and approximately $18,000 in computing costs, they closed 97% of the performance gap, while two human researchers managed only 23% in a similar timeframe.

      Another internal experiment assessed whether Claude could make better "next step" choices than human researchers at crucial points during real research sessions. In November 2025, Claude matched human judgment 51% of the time, which rose to 64% by April 2026. Research largely consists of these ongoing decision-making sequences; if the trend continues, the distinction between AI as a helper and AI as a researcher may diminish rapidly.

      Anthropic's internal findings correspond with broader trends observed by METR, a non-profit organization that tracks AI capabilities. The duration of tasks that AI can reliably complete independently is reportedly doubling approximately every four months, a shift from the previous average of every seven months.

      In March 2024, Claude Opus 3 was capable of managing tasks that took humans about four minutes to complete. By early 2025, Claude Sonnet 3.7 could tackle hour-and-a-half tasks. Currently, Claude Opus 4.6 is adept at handling 12-hour tasks, and METR found that Mythos Preview could maintain work for at least 16 hours, the upper limit of current benchmarks. If this trend continues, tasks taking days of skilled human effort could soon be achievable, with week-long tasks potentially becoming possible by 2027.
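
The extrapolation in the paragraph above is simple doubling arithmetic, and a short sketch makes it easy to check. The Python below is an illustration only, not METR's methodology: the function names are hypothetical, the 16-hour starting point is the Mythos Preview figure quoted above, and the 40-hour "working week" and 168-hour "calendar week" targets are assumptions chosen for the example.

```python
import math


def months_until(current_hours: float, target_hours: float, doubling_months: float) -> float:
    """Months needed for the task horizon to grow from current_hours to
    target_hours, assuming it doubles every doubling_months months."""
    return doubling_months * math.log2(target_hours / current_hours)


def horizon_after(current_hours: float, months: float, doubling_months: float) -> float:
    """Task horizon in hours after the given number of months of steady doubling."""
    return current_hours * 2 ** (months / doubling_months)


if __name__ == "__main__":
    start = 16.0  # hours, the Mythos Preview figure quoted in the article
    # Assumed definitions of a "week-long task"; neither comes from the article.
    targets = {"40-hour working week": 40.0, "168-hour calendar week": 168.0}
    for label, target in targets.items():
        for doubling in (4.0, 7.0):  # the two doubling periods the article mentions
            months = months_until(start, target, doubling)
            print(f"{label}, doubling every {doubling:.0f} months: {months:.1f} months")
```

Under these assumptions, a four-month doubling period puts a 40-hour task roughly five months out and a 168-hour task roughly fourteen months out, which is broadly in line with the article's 2027 estimate. The result is sensitive to the doubling period: the older seven-month rate pushes both dates out considerably.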

      The consequences of these developments are already evident. GitHub, the main platform for software development globally, recorded approximately one billion code commits in all of 2025. By mid-2026, it was processing 275 million commits per week, with 14 billion projected for the year. Claude Code accounts for 4.5% of all public commits on GitHub, producing 2.6 million per week.
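
The GitHub figures above can be sanity-checked with a few lines of arithmetic. This is a rough sketch using only the numbers quoted in the article. The helper names are hypothetical, the 52-week year is an assumption, and the implied public-commit total is a derived estimate that neither GitHub nor Anthropic has reported.

```python
WEEKS_PER_YEAR = 52  # assumption: a flat 52-week year with no further growth


def annualize(weekly_count: int, weeks: int = WEEKS_PER_YEAR) -> int:
    """Project a weekly count over a full year at a constant rate."""
    return weekly_count * weeks


def implied_total(part: float, share: float) -> float:
    """Total implied by a part and the fraction of the total it represents."""
    return part / share


if __name__ == "__main__":
    weekly_commits = 275_000_000       # all commits per week, mid-2026 (from the article)
    commits_2025 = 1_000_000_000       # approximate total for 2025 (from the article)
    claude_weekly = 2_600_000          # Claude Code public commits per week (from the article)
    claude_share = 0.045               # 4.5% of public commits (from the article)

    projected = annualize(weekly_commits)
    print(f"Annualized commits: {projected / 1e9:.1f} billion")
    print(f"Growth over 2025: {projected / commits_2025:.1f}x")
    print(f"Implied public commits per week: {implied_total(claude_weekly, claude_share) / 1e6:.1f} million")
```

Annualizing 275 million weekly commits gives about 14.3 billion, which matches the 14 billion projection. The 4.5% share covers public commits only, so the total it implies is far smaller than the 275 million weekly figure, which presumably also counts private repositories.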

      The COO of GitHub noted that the company is "pushing incredibly hard" on capacity to keep pace. Within Anthropic, the bottleneck has shifted: as Claude generates more code, the human code review process has become the limiting factor. The company claims it has encountered a classic example of Amdahl’s law, where increasing the speed of
