The question AI providers hope VPs of Engineering never ask
AI coding adoption is skyrocketing. Yet many engineering leaders still track usage metrics instead of outcomes, a blind spot that gets expensive fast. There is one question the AI industry would prefer you never ask. Not OpenAI, not Anthropic, not Google, not the startups selling AI coding tools to your engineering teams. The question is simple: how much of the code your AI agents generate actually gets deployed to production?
It's not about the volume of code generated, the number of prompts executed, or the count of active users. It's about how much code survives code review, passes continuous integration, gets merged, deploys, and ultimately serves customers. Most engineering leaders can't answer that question, and AI providers have no incentive to help them find out. The spend is enormous; the transparency is not.
According to the Stanford AI Spend Index, based on data from 140 firms and more than 113,000 developers, the average company now spends $86 per developer per month on AI coding tools. The top quartile spends upwards of $195, and some companies exceed $28,000 per developer per month. Anthropic recently passed $30 billion in annualized revenue, up steeply from $9 billion just four months earlier. SemiAnalysis reports that 4% of all public GitHub commits now come from Claude Code, with projections above 20% by year's end. Linear's CEO has stated that issue tracking has become obsolete: coding agents are now part of over 75% of Linear's enterprise workspaces.
Money and code are flowing, but almost nobody is tracking how much of that code actually ships.
Here is the unspoken incentive problem: AI providers bill by the token. Every token your engineers consume is revenue for the provider. They get paid when tokens are burned, not when the generated code passes review, gets merged, deploys, or works in production.
This creates a fundamental misalignment. A developer who prompts an AI agent ten times for a function that a human ultimately rewrites costs your company ten times as much as a developer who gets a working function on the first try. The provider earns ten times more from the former; the latter is far more valuable to your organization.
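To make the arithmetic concrete, here is a minimal sketch; the price and token counts are assumed figures for illustration, not any provider's actual rates:

```python
# Hypothetical illustration: the provider bills per token either way,
# but only one workflow produces code that survives to production.
PRICE_PER_1K_TOKENS = 0.015  # assumed blended price, USD
TOKENS_PER_PROMPT = 4_000    # assumed average tokens per agent round trip

def cost_of_function(prompts: int) -> float:
    """Token cost of producing one function in `prompts` attempts."""
    return prompts * TOKENS_PER_PROMPT / 1_000 * PRICE_PER_1K_TOKENS

rewritten = cost_of_function(10)  # ten prompts, then a human rewrites it
shipped = cost_of_function(1)     # one prompt, code ships as generated

print(f"ten prompts, rewritten: ${rewritten:.2f} spent, $0 of shipped value")
print(f"one prompt, shipped:    ${shipped:.2f} spent, full shipped value")
# The provider's revenue is 10x higher in the first case; your outcome is worse.
```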
Today, most engineering leaders cannot tell those two scenarios apart. All they see is a single line item on the AI bill, with no way to know which tokens became production code and which became waste.
This isn't a conspiracy; it's a structural incentive problem, and it falls to the VP of Engineering to address it, because providers have no reason to fix it on their behalf.
We have been here before. When cloud computing first emerged, companies rushed onto AWS and Azure, spending heavily on the promise of efficiency. The reality was significant waste. It took years for FinOps to emerge, once companies realized they were overspending on cloud infrastructure by 30-40% because nobody was measuring actual usage.
AI spend is following the same trajectory, with faster growth and a wider measurement gap. Cloud providers eventually built cost-optimization tooling because customers demanded it.
A similar shift is coming to AI. Engineering leaders who measure first will optimize faster, negotiate better, and know which tools to keep and which to cut. Those who don't will keep writing checks and hoping the outcomes justify the investment.
What's missing is not more dashboards of usage trends and seat counts; engineering leaders already have plenty of those. What's missing is the ability to trace AI-generated code from inception to production: commit-level attribution that shows which agent authored the code, what proportion of a commit was AI-generated versus human-edited, whether it passed review or was rewritten, and whether it deployed successfully or failed.
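As a rough sketch of what such an attribution record and its headline metric could look like (the field names and shape here are illustrative assumptions, not Waydev's actual schema):

```python
from dataclasses import dataclass

@dataclass
class CommitAttribution:
    """Illustrative record tying one commit to its AI provenance."""
    sha: str
    agent: str | None          # e.g. "claude-code", or None for human-only
    ai_lines: int              # lines attributed to the agent
    human_edited_lines: int    # AI lines later rewritten by a human
    passed_review: bool
    deployed: bool

def ship_rate(commits: list[CommitAttribution]) -> float:
    """Fraction of AI-generated lines that reached production unmodified."""
    generated = sum(c.ai_lines for c in commits)
    shipped = sum(c.ai_lines - c.human_edited_lines
                  for c in commits if c.deployed)
    return shipped / generated if generated else 0.0
```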
Linking AI expenditure to production results lets organizations answer the questions that matter: which teams are genuinely leveraging AI agents and which are merely burning tokens; which vendors produce deployable code and which create extra work for reviewers; and whether rising AI costs reflect successful adoption or expensive failure.
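Continuing the sketch above, a spend-to-outcome roll-up might join those attribution records against the monthly bill. The function below is a hypothetical illustration, not a real API:

```python
from collections import defaultdict

def cost_per_shipped_line(commits, monthly_spend_by_agent):
    """Hypothetical roll-up: dollars of AI spend per AI-authored line
    that survived review and deployed, broken down by vendor/agent."""
    shipped = defaultdict(int)
    for c in commits:  # CommitAttribution records from the sketch above
        if c.deployed and c.agent:
            shipped[c.agent] += c.ai_lines - c.human_edited_lines
    return {
        agent: (spend / shipped[agent]) if shipped[agent] else float("inf")
        for agent, spend in monthly_spend_by_agent.items()
    }
```

A vendor whose ratio trends toward infinity is billing you for code that never ships; that is the comparison a single line item on the AI bill can never show.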
At Waydev, we dedicated the past year to building a solution to this issue. With nearly a decade of experience measuring engineering behaviors for companies like Dropbox, American Express, and PwC, we adapted our measurement framework to accommodate the changes brought by AI.
Our new platform monitors AI adoption, impact, and ROI across the entire software development lifecycle, linking organizational spending on AI agents to what ultimately makes it to production.
Usage does not equal value. The AI industry wants engineering leaders to read rising usage as rising value, but the two are not the same: adoption is not impact, and tokens consumed are not code shipped.
A team generating 10,000 lines of AI code weekly but shipping only 2,000 to production is not as productive as its usage dashboards suggest; it is paying, in tokens and review time, for 8,000 lines of discarded work every week.
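The arithmetic on those numbers makes the point; the per-line cost here is an assumed figure for illustration only:

```python
generated_lines = 10_000      # AI lines produced per week (from the example)
shipped_lines = 2_000         # AI lines that reached production
assumed_cost_per_line = 0.05  # hypothetical blended token cost, USD

ship_rate = shipped_lines / generated_lines
effective_cost = generated_lines * assumed_cost_per_line / shipped_lines

print(f"ship rate: {ship_rate:.0%}")                              # 20%
print(f"nominal cost per line:           ${assumed_cost_per_line:.2f}")
print(f"effective cost per shipped line: ${effective_cost:.2f}")  # 5x nominal
```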