How to stop hindering AI agents
Developers of agentic AI have been making bold claims. They have promised autonomous systems that can perform a wide range of tasks, from booking flights and monitoring competitors in real time to managing entire procurement cycles, all without a human pressing “confirm.” The technology behind many of these capabilities is largely in place, but the infrastructure needed to deploy them reliably at scale still falls short.
According to a recent Gartner projection, over 40% of agentic AI projects will be canceled by the end of 2027 because of rising costs, unclear business value, or insufficient risk controls. That is a striking figure, given the expectation that autonomous agents would mark a major advance in AI. It may not surprise those who have watched these agents hit clear limits in real-world settings. Many people assume the core issue is the quality of the models themselves, a view that is plausible but largely misplaced.
Why the Web Resists Agents
To understand what a capable agent truly requires, it's essential to recognize that accessing a website and retrieving a response is merely the beginning; an agent must also convert that response into usable information. Furthermore, it must do this consistently, in real time, and at a scale sufficient to justify the effort.
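As a rough illustration of the step from "response" to "usable information," the sketch below pulls JSON-LD structured data out of an HTML page using only the Python standard library. The class name `JsonLdExtractor`, the helper `extract_structured_data`, and the sample page are all hypothetical and exist only for this example; real pages often carry no such markup, which is precisely the difficulty described above.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed block: skip it rather than guess at its meaning


def extract_structured_data(html):
    """Return every JSON-LD object found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page: the machine-readable part sits next to an urgency cue
# that is written for humans and is useless to an agent.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Product", "name": "Example Widget",
 "offers": {"price": "19.99", "priceCurrency": "EUR"}}
</script>
</head><body><p>Only 2 left! Buy now!</p></body></html>
"""

items = extract_structured_data(SAMPLE_HTML)
print(items[0]["offers"]["price"])  # 19.99
```

When a page exposes no machine-readable markup, the function returns an empty list, and the agent has to fall back on brittle, site-specific parsing.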
Given the current state of the web, this is a formidable challenge. Take online platforms as an example: there is no technical reason an independent agent could not compare several platforms and select the one that best matches a user's preferences. However, these platforms depend on keeping such information hard to access in order to preserve their competitive edge. They rely on increasingly personalized results, sponsored placements, and urgency cues to steer user behavior in their favor. Without access to the relevant data, no AI agent can perform tasks on the web or pick the best option for a user.
As a result, the web facilitates general browsing while systematically hindering automated access. I will share some findings that illustrate this issue clearly.
Oxylabs is preparing to release a Web Openness Index, which evaluates over 120 countries based on different aspects of web accessibility. The findings reveal:
The global score for practical reachability (how effectively a site responds to standard automated HTTP requests) averages 83.4 out of 100.
The score for anti-automation friction (where a lower score indicates greater friction), including CAPTCHAs, rate limiting, fingerprinting, and bot detection, stands at an average of 62.8.
The score for structured data interoperability—whether sites deliver data in machine-friendly formats—drops further to 60.3.
The gaps of more than 20 points between reachability and the other two scores point to a structural divide. Sites generally respond to automated requests, but many then impose restrictions, and the data they return is often in a form that machines cannot readily use. Agents that depend on consistent, timely, structured information get caught in this gap.
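The size of those gaps follows directly from the three averages quoted above. A minimal sketch of the arithmetic, with dictionary keys that are labels chosen for this example rather than the index's own field names:

```python
# Global averages as quoted in the text (scores out of 100).
scores = {
    "practical_reachability": 83.4,
    "anti_automation_friction": 62.8,
    "structured_data_interoperability": 60.3,
}

baseline = scores["practical_reachability"]

# Difference between reachability and each of the other two dimensions.
gaps = {
    name: round(baseline - value, 1)
    for name, value in scores.items()
    if name != "practical_reachability"
}

for name, gap in gaps.items():
    print(f"{name}: {gap} points below reachability")
```

The differences come to 20.6 points for anti-automation friction and 23.1 points for structured data interoperability.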
Data-Starved AI
Within organizations, agents confront a different but related challenge: a lack of usable data. Essentially, relevant data is available but has not been cleaned, tagged, or organized in a manner that an AI system can comprehend.
The same issue affects customer-facing applications built on agentic systems. Without access to real-time web data—current pricing, live inventory, policy updates, market movements—they can only reason based on an outdated version of reality.
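One way an agent can avoid acting on an outdated version of reality is to attach a collection timestamp to every piece of data and refuse to act once it is too old. The sketch below is only an illustration: the `Observation` type, the `is_fresh` helper, and the five-minute limit are assumptions made for this example, not part of any particular agent framework.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Observation:
    """A single piece of web data plus the time it was collected."""
    name: str
    value: float
    fetched_at: datetime


def is_fresh(obs: Observation, now: datetime, max_age: timedelta) -> bool:
    """Return True when the observation is recent enough to act on."""
    return now - obs.fetched_at <= max_age


# Hypothetical case: a price collected 20 minutes ago, checked against a
# 5-minute freshness limit.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
price = Observation("price_eur", 19.99, now - timedelta(minutes=20))

print(is_fresh(price, now, max_age=timedelta(minutes=5)))  # False
```

The right limit depends on the data: a live inventory count may go stale in minutes, while a policy page may stay valid for days.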
Latency also presents a challenge. An agent that eventually provides the correct answer is considerably less useful than one that delivers it quickly enough to act on. For autonomous systems, the tolerance for delay is slimmer still.
In every instance, the limitation is similar: agents require trustworthy context, and they are not receiving it—neither from their organizational data nor from the web.
A Problem We Have Solved Before
It is easy to overlook, but this is not the first time the volume of information has outpaced our ability to process it. The early web is a good example. It contained a wealth of knowledge, but in its raw form that information was largely unusable. What made the difference was infrastructure built for scale: web crawlers indexed pages, scrapers compared online prices, and monitoring systems tracked fraudulent advertisements and brand impersonation across numerous domains. All of these innovations relied on the reliable, large-scale collection of public web data.
A more recent example involves Debunk.org, a partner of the pro bono Project 4β. This nonprofit organization, which combats online disinformation and fraud, ran an investigation that uncovered a large-scale, multilingual scam targeting previous fraud victims. The investigation identified over 50,000 ads, 459 domains, and more than 1,100 related web pages, affecting an estimated 52 million individuals across Europe. Coverage that extensive requires systematic, automated data collection at scale.
Agentic AI requires a similar infrastructure, albeit with even higher demands, as agents utilize data in ways
Juras Jursenas, COO of Oxylabs, contends that the true constraint for agentic AI lies not in the quality of the models, but rather in a web designed to block automated access and enterprise data that is unprepared for agents.
