Claude generates 80% of Anthropic's code as the company calls for a global AI pause mechanism

TL;DR: Anthropic says Claude now generates more than 80% of the code added to its production codebase, and its engineers merge roughly eight times as much code as they did in 2024. A new paper from the Anthropic Institute outlines the route to recursive self-improvement and calls for a verifiable global pause mechanism.

One Anthropic engineer has not written any code in five months, not for lack of work but because Claude now handles it. As of May 2026, more than 80% of the code added to Anthropic’s production codebase is written by Claude, up from the low single digits when Claude Code launched in February 2025.

This statistic, reported in a recent Anthropic Institute paper titled “When AI builds itself,” is not what the company wants to emphasize. Its focus is the next step: AI capable of designing and training its own successors. Anthropic says it has not reached that point yet, but that it may be closer than many institutions realize.

The productivity shift is large. In Q2 2026, the average Anthropic engineer merged eight times as much code per day as in 2024. In an internal survey of 130 research staff, the median respondent estimated producing about four times as much output with the latest model, Mythos Preview, as without AI.

For the most challenging and open-ended engineering tasks, Claude’s success rate reached 76% by May 2026, up 50 percentage points in six months. Anthropic cites one instance in which a routine update crashed numerous training jobs and an engineer pointed Claude at the live incident with minimal context. Claude identified a rare bug, reproduced the crash, and validated a fix in roughly two hours, work that would typically take two to three days.

Code quality is also improving. Anthropic’s team noted that in late 2025, Claude's code was "somewhat worse" than code written by humans. It is now considered roughly on par and is expected to surpass human quality within a year. A Claude-based automated reviewer now evaluates every proposed change to Anthropic's codebase before it is merged. A review concluded that this reviewer would have caught about one-third of the bugs behind past claude.ai issues before they reached production.

Moving from coding to research is a bigger challenge. The open question is whether Claude can carry out the open-ended scientific reasoning that drives AI progress.

The evidence for research capabilities is still early but compelling. In April 2026, Anthropic showcased Claude conducting a complete AI safety research project independently. Nine parallel agents tackled a problem, generating hypotheses, running experiments, sharing results, and iterating collaboratively. After more than 800 hours of work and approximately $18,000 in compute costs, the agents recovered 97% of the performance gap in the task. Two human researchers working for a week recovered just 23%.

Another internal test compared Claude's choice of “next steps” with a human researcher's during complex research sessions. In November 2025, Claude agreed with the human's decisions 51% of the time; by April 2026 that figure had risen to 64%. Day-to-day research often hinges on such next-step choices. If the trend continues, the distinction between AI as an assistant and AI as a researcher will narrow quickly.

Anthropic’s internal data reflects a wider trend tracked by METR, a non-profit organization that benchmarks AI capabilities. The length of tasks that AI can reliably complete autonomously has been doubling approximately every four months, faster than the earlier pace of every seven months.

In March 2024, Claude Opus 3 could handle tasks that take humans about four minutes. By early 2025, Claude Sonnet 3.7 handled tasks of about an hour and a half. Claude Opus 4.6 can now take on tasks of more than 12 hours, and METR reports that Mythos Preview can sustain work for at least 16 hours, close to the limit of what current benchmarks can measure. If the pattern holds, tasks requiring several days of skilled human work will soon be within reach, with tasks spanning weeks anticipated in 2027.
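The extrapolation can be checked with a short calculation. The sketch below assumes a clean exponential trend with a fixed four-month doubling time, starting from the 16-hour figure. The function name `months_until` and the conversion of working days and weeks into hours (8-hour days, 40-hour weeks) are illustrative assumptions, not METR's methodology.

```python
import math

def months_until(target_hours: float, current_hours: float = 16.0,
                 doubling_months: float = 4.0) -> float:
    """Months for the task horizon to grow from current_hours to target_hours,
    assuming it doubles every doubling_months (a simplifying assumption)."""
    if target_hours <= current_hours:
        return 0.0
    return doubling_months * math.log2(target_hours / current_hours)

# Assumption: one working day = 8 hours, one working week = 40 hours.
horizons = [("3 working days", 24), ("1 working week", 40), ("4 working weeks", 160)]
for label, hours in horizons:
    print(f"{label}: ~{months_until(hours):.1f} months")
# 3 working days: ~2.3 months
# 1 working week: ~5.3 months
# 4 working weeks: ~13.3 months
```

Under these assumptions, multi-day tasks are a few months away and multi-week tasks a little over a year away, which is in line with the 2027 estimate. A slower doubling time, such as the earlier seven months, would stretch each figure proportionally.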

The effect is becoming visible. In 2025, GitHub, the world's primary platform for software development, recorded around one billion code commits. By mid-2026 it was processing 275 million commits a week, which puts it on track for roughly 14 billion for the year. Claude Code accounts for 4.5% of all public commits on GitHub, about 2.6 million each week.
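The arithmetic behind these figures is easy to reproduce. The sketch below uses only the numbers quoted above. It assumes the mid-2026 weekly rate holds for a full 52-week year, and the variable names are illustrative.

```python
WEEKS_PER_YEAR = 52

weekly_commits = 275_000_000        # GitHub commits per week, mid-2026 (from the article)
claude_weekly_public = 2_600_000    # Claude Code public commits per week (from the article)
claude_public_share = 0.045         # Claude Code share of public commits (from the article)

# Annualized total if the weekly rate held all year (a simplifying assumption).
annualized = weekly_commits * WEEKS_PER_YEAR

# Public weekly commit volume implied by the share and the absolute count.
implied_public_weekly = claude_weekly_public / claude_public_share

print(f"annualized commits: {annualized / 1e9:.1f} billion")
print(f"implied public commits per week: {implied_public_weekly / 1e6:.1f} million")
# annualized commits: 14.3 billion
# implied public commits per week: 57.8 million
```

The annualized figure matches the roughly 14 billion projection. The implied public volume of about 58 million commits a week is well below the 275 million total, which is consistent with the 4.5% share applying to public commits only.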

      GitHub’s COO stated that the company is “pushing incredibly hard” to increase capacity to keep pace. Within Anthropic, the bottleneck has shifted; as Claude produces more code, human code reviews have become the limiting factor. The company has encountered a classic instance of
