Developers can no longer operate without AI. Research indicates that it may be hindering their performance.

      Developers are increasingly unwilling to code without AI, but research indicates this reliance may hinder their productivity. In February 2026, the AI research lab METR tried to replicate its pivotal study of how long developers take to complete tasks with and without AI, but could not: developers refused to work without AI, even for a limited number of tasks within a study.

      The original 2025 study produced an unexpected result: developers reported that AI made them more productive, but the data showed it actually slowed them down because of the time spent correcting errors, managing the AI, and waiting for it to finish tasks.

      Unable to replicate the study, METR instead ran a survey in May in which developers self-reported the value of their work with AI. Respondents claimed it doubled their worth to their organizations. However, evidence emerging from several sources suggests this perception may be mistaken.

      This week, Amazon discontinued its internal token leaderboard, Kirorank, following reports that employees were overusing AI agents and inflating costs. The episode showed that more AI usage does not guarantee more productivity.

      Uber, too, exhausted its entire 2026 AI budget within the first four months of the year, according to The Information. COO Andrew Macdonald said on a podcast that the spending did not produce any measurable gains in project output or productivity. Taken together, the two cases show how two of the most technically advanced companies spent heavily on AI coding tools without seeing a return.

      This pattern is called “tokenmaxxing”: treating token consumption as a measure of productivity. It emerged as a notable corporate trend in 2026, though it seems to be waning. The experiences of Amazon and Uber show that measuring AI adoption solely by usage, rather than by the quality of the output, can create misguided incentives.

      Salesforce anticipates spending $300 million on Anthropic tokens this year. CEO Marc Benioff has proposed an “intermediary layer” to intelligently allocate tokens between advanced and more economical models. This suggestion subtly acknowledges that not all tokens yield value, underscoring the need to align spending with task difficulty.
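The routing idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not Salesforce's design: the model names, prices, difficulty score, and threshold below are all hypothetical placeholders, and a real system would need some way to estimate task difficulty in the first place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # hypothetical price in USD, not a real quote


# Hypothetical tiers; neither name refers to an actual product.
ECONOMY = Model("economy-model", 0.25)
ADVANCED = Model("advanced-model", 5.00)


def route(difficulty: float, threshold: float = 0.6) -> Model:
    """Pick a model tier from a difficulty score in [0, 1] (assumed to be given)."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    return ADVANCED if difficulty >= threshold else ECONOMY


def estimated_cost(tasks: list[tuple[float, int]], threshold: float = 0.6) -> float:
    """Total cost of (difficulty, token_count) tasks under the routing rule."""
    return sum(
        route(difficulty, threshold).cost_per_1k_tokens * tokens / 1000
        for difficulty, tokens in tasks
    )
```

Under these made-up prices, sending only the hard tasks to the advanced tier costs a fraction of sending everything there, which is the whole argument for matching spend to task difficulty.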

      A deeper issue lies with code quality. Programmer and author James Shore emphasized in a widely shared blog post that generating code faster without reducing maintenance expenses is perilous. He cautioned, “You write code twice as quickly now? Better hope you’ve halved your maintenance costs. Otherwise, you’re in trouble. You’re swapping a temporary speed boost for enduring debt.”
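Shore's arithmetic can be made concrete with a toy model. The linear assumption here is ours, not Shore's: every unit of code ever shipped costs a fixed amount per year to maintain, and nothing is ever deleted. The numbers are illustrative only.

```python
def maintenance_burden(units_per_year: float, cost_per_unit: float, years: int) -> float:
    """Yearly maintenance cost once `years` of output have accumulated.

    Toy linear model (an assumption): each shipped unit costs `cost_per_unit`
    per year to maintain, and the codebase only grows.
    """
    return units_per_year * years * cost_per_unit


# Illustrative figures, not measurements.
baseline = maintenance_burden(units_per_year=100, cost_per_unit=1.0, years=3)
faster = maintenance_burden(units_per_year=200, cost_per_unit=1.0, years=3)
faster_and_cheaper = maintenance_burden(units_per_year=200, cost_per_unit=0.5, years=3)

print(baseline, faster, faster_and_cheaper)
```

In this model, doubling output doubles the maintenance burden, and only halving the per-unit maintenance cost brings it back to the baseline.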

      Supporting data reinforces this caution. Entelligence AI, a reliability engineering startup, reported that companies spend 44% of their tokens on fixing AI-generated bugs. CodeRabbit, a code-review tool, found that AI-generated code contains 1.7 times more issues than human-written code. Both firms offer AI code review tools, so their statistics may be self-serving, but that does not necessarily make them inaccurate.

      A report from independent researchers at Singapore Management University in April confirmed similar findings, stating that “AI-generated code can introduce long-term maintenance costs into actual software projects.” While code delivery is quicker, bugs surface later, leading to compounded maintenance debt.

      Engineering leaders are confronting a crucial question: Are the productivity benefits of AI coding tools genuine or merely perceived? If developers are unwilling to code without AI, yet the AI is generating more bugs than it resolves, the overall impact may be detrimental. The reliance on AI appears to outpace the supporting evidence.

      Scott Wu, founder of Cognition, which makes the AI coding agent Devin, acknowledges that Devin's effectiveness falls somewhere between that of a junior and a mid-level developer, meaning it cannot be used without oversight. The SMU researchers advise treating AI-generated output like that of a junior coder: review all of it, maintain robust quality assurance systems, and keep humans in charge of architectural and security design.

      The job market reflects this contradiction, with companies hiring “vibe coders” and forward deployed engineers at record rates, while simultaneously realizing that the tools those positions rely on may not produce the expected quality enhancements. The AI coding market is expanding more rapidly than the evidence supporting its efficacy.

      Developers are unlikely to revert to coding without AI. That possibility has passed. The crucial issue now is whether the industry will establish the necessary quality assurance frameworks, routing layers, and review processes to guarantee that quicker code production does not translate into faster accumulation of technical debt. Currently, the answer leans toward no. Developers are enthusiastic about the tools, yet those tools may not reciprocate their affection.
