The question AI providers hope VPs of Engineering never ask
AI coding adoption is skyrocketing. Yet many engineering leaders still track usage metrics instead of outcomes, a blind spot that gets expensive fast. There is one question the AI industry would prefer you never ask. Not OpenAI, not Anthropic, not Google, not the startups selling AI coding tools to your engineering teams. The question is simple: how much of the code your AI agents generate actually gets deployed to production?
It's not about the volume of code generated, the number of prompts executed, or the count of active users. It's about how much code survives code review, passes continuous integration, gets merged, deploys, and ultimately serves customers. Most engineering leaders can't answer that question, and AI providers have no incentive to help them find out. The spend is enormous; the transparency is not.
According to the Stanford AI Spend Index, based on data from 140 firms and more than 113,000 developers, the average company now spends $86 per developer per month on AI coding tools. The top quartile spends upwards of $195, and some companies exceed $28,000 per developer per month. Anthropic recently passed $30 billion in annualized revenue, up steeply from $9 billion just four months earlier. SemiAnalysis reports that 4% of all public GitHub commits now come from Claude Code, with projections above 20% by year's end. Linear's CEO has stated that issue tracking has become obsolete: coding agents are now part of over 75% of Linear's enterprise workspaces.
Money and code are flowing, but almost nobody is tracking how much of that code actually ships.
Here is the unspoken incentive problem: AI providers bill by the token. Every token your engineers consume is revenue for the provider. They get paid when tokens are burned, not when the generated code passes review, gets merged, deploys, or works in production.
This creates a fundamental misalignment. A developer who prompts an AI agent ten times for a function that a human ultimately rewrites costs your company ten times as much as a developer who gets a working function on the first try. The provider earns ten times more from the former; the latter is far more valuable to your organization.
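To make the arithmetic concrete, here is a minimal sketch; the price and token counts are assumed figures for illustration, not any provider's actual rates:

```python
# Hypothetical illustration: the provider bills per token either way,
# but only one workflow produces code that survives to production.
PRICE_PER_1K_TOKENS = 0.015  # assumed blended price, USD
TOKENS_PER_PROMPT = 4_000    # assumed average tokens per agent round trip

def cost_of_function(prompts: int) -> float:
    """Token cost of producing one function in `prompts` attempts."""
    return prompts * TOKENS_PER_PROMPT / 1_000 * PRICE_PER_1K_TOKENS

rewritten = cost_of_function(10)  # ten prompts, then a human rewrites it
shipped = cost_of_function(1)     # one prompt, code ships as generated

print(f"ten prompts, rewritten: ${rewritten:.2f} spent, $0 of shipped value")
print(f"one prompt, shipped:    ${shipped:.2f} spent, full shipped value")
# The provider's revenue is 10x higher in the first case; your outcome is worse.
```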
Today, most engineering leaders cannot tell those two scenarios apart. All they see is a single line item on the AI bill, with no way to know which tokens became production code and which became waste.
This isn't a conspiracy; it's a structural incentive problem, and it falls to the VP of Engineering to address it, because providers have no reason to fix it on their behalf.
We have been here before. When cloud computing first emerged, companies rushed onto AWS and Azure, spending heavily on the promise of efficiency. The reality was significant waste. It took years for FinOps to emerge, once companies realized they were overspending on cloud infrastructure by 30-40% because nobody was measuring actual usage.
AI spend is following the same trajectory, with faster growth and a wider measurement gap. Cloud providers eventually built cost-optimization tooling because customers demanded it.
A similar shift is coming to AI. Engineering leaders who measure first will optimize faster, negotiate better, and know which tools to keep and which to cut. Those who don't will keep writing checks and hoping the outcomes justify the investment.
What's missing is not more dashboards of usage trends and seat counts; engineering leaders already have plenty of those. What's missing is the ability to trace AI-generated code from inception to production: commit-level attribution that shows which agent authored the code, what proportion of a commit was AI-generated versus human-edited, whether it passed review or was rewritten, and whether it deployed successfully or failed.
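As a rough sketch of what such an attribution record and its headline metric could look like (the field names and shape here are illustrative assumptions, not Waydev's actual schema):

```python
from dataclasses import dataclass

@dataclass
class CommitAttribution:
    """Illustrative record tying one commit to its AI provenance."""
    sha: str
    agent: str | None          # e.g. "claude-code", or None for human-only
    ai_lines: int              # lines attributed to the agent
    human_edited_lines: int    # AI lines later rewritten by a human
    passed_review: bool
    deployed: bool

def ship_rate(commits: list[CommitAttribution]) -> float:
    """Fraction of AI-generated lines that reached production unmodified."""
    generated = sum(c.ai_lines for c in commits)
    shipped = sum(c.ai_lines - c.human_edited_lines
                  for c in commits if c.deployed)
    return shipped / generated if generated else 0.0
```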
Linking AI expenditure to production results lets organizations answer the questions that matter: which teams are genuinely leveraging AI agents and which are merely burning tokens; which vendors produce deployable code and which create extra work for reviewers; and whether rising AI costs reflect successful adoption or expensive failure.
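Continuing the sketch above, a spend-to-outcome roll-up might join those attribution records against the monthly bill. The function below is a hypothetical illustration, not a real API:

```python
from collections import defaultdict

def cost_per_shipped_line(commits, monthly_spend_by_agent):
    """Hypothetical roll-up: dollars of AI spend per AI-authored line
    that survived review and deployed, broken down by vendor/agent."""
    shipped = defaultdict(int)
    for c in commits:  # CommitAttribution records from the sketch above
        if c.deployed and c.agent:
            shipped[c.agent] += c.ai_lines - c.human_edited_lines
    return {
        agent: (spend / shipped[agent]) if shipped[agent] else float("inf")
        for agent, spend in monthly_spend_by_agent.items()
    }
```

A vendor whose ratio trends toward infinity is billing you for code that never ships; that is the comparison a single line item on the AI bill can never show.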
At Waydev, we dedicated the past year to building a solution to this issue. With nearly a decade of experience measuring engineering behaviors for companies like Dropbox, American Express, and PwC, we adapted our measurement framework to accommodate the changes brought by AI.
Our new platform monitors AI adoption, impact, and ROI across the entire software development lifecycle, linking organizational spending on AI agents to what ultimately makes it to production.
Usage does not equal value. The AI industry wants engineering leaders to read rising usage as rising value, but the two are not the same: adoption is not impact, and tokens consumed are not code shipped.
A team generating 10,000 lines of AI code weekly but shipping only 2,000 to production is not as productive as its usage dashboards suggest; it is paying, in tokens and review time, for 8,000 lines of discarded work every week.
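The arithmetic on those numbers makes the point; the per-line cost here is an assumed figure for illustration only:

```python
generated_lines = 10_000      # AI lines produced per week (from the example)
shipped_lines = 2_000         # AI lines that reached production
assumed_cost_per_line = 0.05  # hypothetical blended token cost, USD

ship_rate = shipped_lines / generated_lines
effective_cost = generated_lines * assumed_cost_per_line / shipped_lines

print(f"ship rate: {ship_rate:.0%}")                              # 20%
print(f"nominal cost per line:           ${assumed_cost_per_line:.2f}")
print(f"effective cost per shipped line: ${effective_cost:.2f}")  # 5x nominal
```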