Challenges of ethical proxy sourcing: ensuring compliance and integrity
Proxy servers may not be widely recognized, yet they are fundamental to much of AI infrastructure. A proxy server is an intermediary machine with its own IP address that relays a client's requests to the internet. Used in large numbers, proxies allow automated browsing of many web pages without running into CAPTCHAs or other obstacles. Without them, companies would struggle to gather enough training data for large language models, and AI agents would falter partway through their tasks.
However, this capability comes with significant responsibility. When proxies are sourced carelessly, they can turn unwitting individuals' computers into botnets. When used maliciously, they can overwhelm websites, create fake social media profiles, or even facilitate theft. Like any powerful tool, they can either benefit or harm, which is why proper governance matters.
Proxyway, a site focused on web data collection infrastructure, diligently tracks the proxy server market and publishes an annual, publicly accessible report. This article draws on the report to explore the dangers of selecting an unethical provider and provides guidance on how to avoid such pitfalls.
Proxy Servers in the Age of AI
Proxies have existed for quite some time, primarily as tools for anonymity since the early 2000s, if not earlier. However, in the past decade, the proxy server industry has flourished. Proxies are essential for companies that compare flight prices, conduct market research, monitor their rankings on Google, and more. The largest proxy server providers today generate hundreds of millions in revenue, with the overall market valued in the billions.
The industry was thriving even before the rise of AI. Nonetheless, heavy investment in AI startups like OpenAI, Anthropic, and Perplexity has significantly amplified demand. Language models require vast amounts of data for training, and the web is the most extensive data source available. Proxies drastically accelerate data collection, which has helped major providers like Bright Data reach $300 million in annual recurring revenue, growing 50% year over year.
The Dark Side of Residential Proxy Networks
Residential proxies are the most sought-after type of proxy server. They are prized because websites often resist sharing their data, even publicly available information, and deploy protections such as Cloudflare to limit automated access. Unlike proxies hosted in data centers, residential proxies are less likely to be blocked because they appear to be regular home devices connected through Internet Service Providers like Comcast or Verizon.
What’s intriguing is that residential proxies really are home devices: users’ laptops, phones, and other internet-connected hardware. A proxy provider leases a user’s IP address and a small amount of data, and its clients route their requests to target websites through that connection. The IP address functions as a proxy, and the user’s device becomes the server.
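From the client's side, using such a proxy usually amounts to pointing an HTTP client at the provider's gateway. The sketch below illustrates the idea with Python's standard urllib. The gateway host, port, and credentials are hypothetical placeholders, since every provider issues its own, and the helper name build_proxy_opener is made up for this example.

```python
import urllib.request


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical gateway address (assumption): a real provider supplies
# its own host, port, and credentials.
PROXY_URL = "http://user:password@gateway.example.com:8000"

opener = build_proxy_opener(PROXY_URL)

# No request is sent here. Calling opener.open("https://example.com") would
# route the request through the gateway, so the target site would see the
# proxy's IP address rather than the client's.
```

With a residential proxy service, the address the website sees belongs to someone's home connection, which is why the way that connection was obtained matters.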
Some readers might question whether their devices are being used without their explicit consent. Ideally, individuals sharing their connections should be informed and benefit from the arrangement. Unfortunately, this is not always the case. Unscrupulous proxy server operators often recruit devices through methods like installing malware, repackaging pirated software, offering free VPNs, or selling insecure smart devices. Essentially, they create botnets.
In recent years, several large-scale botnets have emerged, some involving tens of millions of devices. The BADBOX and Aisuru botnets affected millions of inexpensive Android TV boxes, while authorities in the Netherlands recently disrupted the ASOCKS botnet, which consisted of over 17 million devices.
Many of these botnets operate on the dark web, where they are exploited for malicious activities. For example, Aisuru was involved in some of the largest distributed denial of service (DDoS) attacks recorded. Botnets are also often monetized as commercial proxy services, which blurs the line between them and legitimate businesses. In January 2026, Google shuttered ten Hong Kong-based proxy server brands linked to the ASOCKS botnet.
Malicious proxy networks abuse the trust and property of uninformed individuals, which is not only unethical but potentially harmful. Businesses that unknowingly purchase services from these vendors risk damaging their reputation and network security. Meanwhile, botnet operators take significant legal risks, but at least they are aware of the gamble.
Doing It Right
How can one distinguish a reputable proxy server provider from a mere botnet front? It is not always straightforward, but significant market players have implemented serious measures to self-regulate the acquisition and use of their infrastructure.
The first step in assessing legitimacy is to examine how the provider acquires its residential proxies. The ideal standard is that the device owner is aware, consents, and receives something in return. Apps like Honeygain or TraffMonetizer exemplify this model, as their primary function is to compensate users for their data usage.
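For teams that vet vendors systematically, the three criteria can be written down as a simple checklist. The sketch below is illustrative only: the SourcingPractice record, its field names, and the helper functions are hypothetical, and a real assessment would rest on evidence such as audits or consent-screen reviews rather than three booleans.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SourcingPractice:
    """Hypothetical record of how a provider recruits residential devices."""
    owner_informed: bool      # the device owner knows the connection is shared
    owner_consented: bool     # the owner explicitly agreed to share it
    owner_compensated: bool   # the owner receives something in return


def missing_criteria(practice: SourcingPractice) -> List[str]:
    """List which of the three criteria the practice fails to meet."""
    checks = {
        "awareness": practice.owner_informed,
        "consent": practice.owner_consented,
        "compensation": practice.owner_compensated,
    }
    return [name for name, met in checks.items() if not met]


def meets_ideal_standard(practice: SourcingPractice) -> bool:
    """True only when all three criteria are satisfied."""
    return not missing_criteria(practice)


# Illustrative inputs, not assessments of any real provider.
paid_sharing_app = SourcingPractice(True, True, True)
bundled_malware = SourcingPractice(False, False, False)
```

Listing the unmet criteria rather than returning a single pass or fail makes it easier to see where a particular sourcing method falls short.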
Another method involves SDKs—small pieces of code embedded in popular desktop or mobile applications. Developers often view this as an alternative revenue source to ads or subscriptions. However, SDKs raise moral questions; there’s a marked difference between obscuring the SDK in service agreements and clearly presenting a consent screen, providing inadequate rewards or compens
