Challenges of ethical proxy sourcing: ensuring compliance and integrity

      Proxy servers may not be widely recognized, yet they underpin much of today's AI infrastructure. A proxy server is an intermediary device with its own IP address that relays a client's traffic to the internet. Used in large numbers, proxies allow automated browsing of many web pages without running into CAPTCHAs or other obstacles. Without them, companies would struggle to gather enough training data for large language models, and AI agents would falter in the middle of their tasks.

      However, this capability comes with significant responsibility. When proxies are sourced carelessly, people's computers can end up as unwitting nodes in a botnet. When proxies are used maliciously, they can overwhelm websites, generate fake social media profiles, or even facilitate theft. Like any powerful tool, they can either benefit or harm, which is why proper governance matters.

      Proxyway, a site focused on web data collection infrastructure, diligently tracks the proxy server market and publishes an annual, publicly accessible report. This article draws on the report to explore the dangers of selecting an unethical provider and provides guidance on how to avoid such pitfalls.

      Proxy Servers in the Age of AI

      Proxies have existed since the early 2000s, if not earlier, primarily as tools for anonymity. In the past decade, however, the proxy server industry has flourished. Proxies are now essential for companies that compare flight prices, conduct market research, track their Google rankings, and more. The largest proxy server providers today generate hundreds of millions in revenue, with the overall market valued in the billions.

      The industry was thriving even before the rise of AI. Nonetheless, heavy investment in AI startups like OpenAI, Anthropic, and Perplexity has significantly amplified demand. Language models require vast amounts of training data, and the web is the most extensive data source available. Proxies drastically accelerate data collection, which has helped major providers like Bright Data reach $300 million in annual recurring revenue, growing 50% year over year.

      The Dark Side of Residential Proxy Networks

      Residential proxies are the most sought-after type of proxy server. They are prized because websites often resist sharing their data, even publicly available information, leading them to implement measures like Cloudflare to limit automated access. Unlike proxies hosted in data centers, residential proxies are less likely to be blocked since they appear as regular home computers connected to Internet Service Providers like Comcast or Verizon.

      What’s intriguing is that residential proxies are indeed home computers, sourced from users’ laptops, phones, and other internet-connected devices. A proxy provider leases a user’s IP address and a minimal amount of data, allowing its clients to access relevant websites. That IP address functions as a proxy, and the users' devices become servers.
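      From the client's side, using such an address usually comes down to pointing an HTTP client at the provider's gateway. The following is a minimal Python sketch using only the standard library; the gateway address `proxy.example.com:8080` and the helper name `build_proxy_opener` are hypothetical placeholders, not any real provider's endpoint or API.

```python
import urllib.request

# Hypothetical gateway address (assumption); real providers publish their own.
PROXY_URL = "http://proxy.example.com:8080"


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxy_opener(PROXY_URL)
# A call such as opener.open("https://example.com") would now travel through
# the proxy, so the target site sees the proxy's IP address, not the client's.
```

      No request is sent in this sketch; it only shows where the proxy address enters the picture.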

      Some readers might question whether their devices are being used without their explicit consent. Ideally, individuals sharing their connections should be informed and benefit from the arrangement. Unfortunately, this is not always the case. Unscrupulous proxy server operators often recruit devices through methods like installing malware, repackaging pirated software, offering free VPNs, or selling insecure smart devices. Essentially, they create botnets.

      In recent years, several large-scale botnets have emerged, some involving tens of millions of devices. The BADBOX and Aisuru botnets affected millions of inexpensive Android TV boxes, while authorities in the Netherlands recently disrupted the ASOCKS botnet, which consisted of over 17 million devices.

      Many of these botnets operate on the dark web, where they are exploited for malicious activities. For example, Aisuru was involved in some of the largest distributed denial of service (DDoS) attacks on record. Botnets are also often monetized as commercial proxy services, which makes them hard to tell apart from legitimate businesses. In January 2026, Google shuttered ten Hong Kong-based proxy server brands linked to the ASOCKS botnet.

      Malicious proxy networks abuse the trust and property of uninformed individuals, which is not only unethical but potentially harmful. Businesses that unknowingly buy services from these vendors risk damaging their reputation and network security. Botnet operators take significant legal risks too, but at least they are aware of the gamble.

      Doing It Right

      How can one distinguish a reputable proxy server provider from a mere botnet front? It is not always straightforward, but major market players have taken serious steps to self-regulate how their infrastructure is acquired and used.

      The first step in assessing legitimacy is to look at how a provider acquires its residential proxies. Under the ideal standard, the person behind the device is aware of the arrangement, consents to it, and receives something in return. Apps like Honeygain or TraffMonetizer exemplify this concept, as their primary function is to compensate users for the data they share.

      Another method involves SDKs—small pieces of code embedded in popular desktop or mobile applications. Developers often view this as an alternative revenue source to ads or subscriptions. However, SDKs raise moral questions; there’s a marked difference between obscuring the SDK in service agreements and clearly presenting a consent screen, providing inadequate rewards or compens
