Challenges in ethical proxy sourcing: ways to remain compliant
Proxy servers may not be widely recognized, yet they are essential to much of today's AI infrastructure. A proxy server is an intermediary device with its own IP address that relays web requests on a user's behalf. Used in large pools, proxies let users load many webpages automatically without running into CAPTCHAs or other barriers. Without proxies, companies would struggle to gather enough training data for large language models, which would limit the effectiveness of AI agents.
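To make the mechanism concrete, here is a minimal Python sketch of how a client might route web requests through a proxy using only the standard library. The proxy address and credentials are placeholders assumed for illustration, not a real provider endpoint, and the actual request is left commented out for that reason.

```python
import urllib.request

# Hypothetical proxy endpoint; a real provider would supply its own host,
# port, and credentials.
PROXY_URL = "http://username:password@proxy.example.com:8080"


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxy_opener(PROXY_URL)

# The target site would see the proxy's IP address instead of the caller's.
# Left commented out because the endpoint above is a placeholder:
# with opener.open("https://example.com", timeout=10) as response:
#     print(response.status)
```

Swapping the proxy address between requests is what lets a pool of proxies spread automated traffic across many IP addresses.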
However, the significant capabilities of proxies come with considerable responsibilities. When sourced irresponsibly, they can enlist individuals' computers in botnets without their owners' knowledge. If used maliciously, they can overload websites, generate fake social media accounts, or even aid in theft. Like any powerful tool, proxies can be constructive or destructive, which is why they need appropriate governance.
Proxyway, a website that focuses on web data collection infrastructure, closely monitors the proxy server market and publishes an annual report accessible to the public. This article, informed by their findings, discusses the dangers of selecting an unethical provider and provides guidance on how to avoid such pitfalls.
Proxy Servers in the Age of AI
Proxy servers have been around since the early 2000s or even earlier, when they were used primarily for anonymity. Recently, however, they have grown into a sizable industry, serving as the foundation for companies that compare flight prices, conduct market research, help businesses track their Google search rankings, and more. Leading proxy server providers currently generate hundreds of millions of dollars in revenue, and the overall market is valued in the billions.
The industry was thriving even before the rise of AI, but the investments flowing into companies like OpenAI, Anthropic, and Perplexity have amplified this growth. Language models require vast amounts of data for each training run. The web is the largest data reservoir, and proxies drastically accelerate collection from it. This demand has propelled one of the major proxy providers, Bright Data, to $300 million in annual recurring revenue, a 50% year-over-year increase.
The Risks Associated with Residential Proxy Networks
Residential proxies are the most sought-after type of proxy server. Their value stems from websites' reluctance to share their data, even when it is publicly accessible: many sites deploy protections such as Cloudflare to restrict automated access. Unlike proxies hosted in data centers, residential proxies are less likely to be blocked, because they appear to be regular home computers connected through internet service providers like Comcast or Verizon.
Residential proxies look like home computers because that is where they originate. They are derived from users' laptops, phones, and other connected devices. A proxy provider uses a user's IP address and a small amount of their bandwidth, allowing clients to access websites pertinent to their business. In this setup, the IP serves as the proxy, and the users' devices function as servers.
Some readers may wonder whether their own devices are involved without their consent. Ideally, those sharing their connections should be aware of the arrangement and benefit from it. Unfortunately, this is often not the case. Unscrupulous proxy operators exploit devices by deploying malware, repackaging pirated software, offering free VPNs, or selling vulnerable smart devices like digital picture frames, effectively creating botnets.
In recent years, numerous large botnets have emerged, with some comprising millions of devices. For instance, BADBOX impacted millions of affordable Android TV boxes, and Aisuru did the same. Most recently, Dutch authorities disrupted the ASOCKS botnet, which encompassed over 17 million devices.
Many of these botnets operate on the dark web, where they are weaponized for malicious purposes. Aisuru, for example, was involved in some of the largest distributed denial-of-service (DDoS) attacks recorded. Such botnets are also often monetized as commercial proxy services, blurring the line between illegitimate and legitimate businesses. In January 2026, Google terminated ten proxy server brands based in Hong Kong that were linked to the ASOCKS botnet.
Malicious proxy networks violate the trust and property of unsuspecting individuals, which is both reprehensible and dangerous. Businesses that unknowingly purchase services from such vendors risk harming their reputation and network security. Meanwhile, botnet operators face the prospect of imprisonment, but at least they make their choices knowingly.
Choosing Wisely
How can one tell a reputable proxy server provider from a botnet storefront? It can be challenging, but major market players have taken substantial measures to self-regulate how their infrastructure is procured and used.
The first test of legitimacy is how residential proxies are sourced. Under the gold standard for acquiring these IPs, the person sharing the connection is aware of it, consents to it, and receives something in return. Bandwidth-sharing applications like Honeygain or TraffMonetizer exemplify this principle, as their primary purpose is to pay users for their traffic.
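The three sourcing criteria above (awareness, consent, and something in return) can be written down as a simple due-diligence check. The Python sketch below is only an illustration: the class, field names, and example entries are hypothetical and do not come from any provider's documentation or from Proxyway's report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcingPractice:
    """How a provider obtains a residential IP (hypothetical field names)."""
    user_informed: bool     # the device owner knows the connection is shared
    user_consented: bool    # the owner explicitly agreed to share it
    user_compensated: bool  # the owner receives something in return


def meets_gold_standard(practice: SourcingPractice) -> bool:
    """Return True only when all three sourcing criteria hold."""
    return (
        practice.user_informed
        and practice.user_consented
        and practice.user_compensated
    )


# Illustrative entries, not assessments of any real product.
bandwidth_sharing_app = SourcingPractice(True, True, True)
bundled_malware = SourcingPractice(False, False, False)

print(meets_gold_standard(bandwidth_sharing_app))  # True
print(meets_gold_standard(bundled_malware))        # False
```

In practice, a buyer would answer each question from a vendor's published sourcing policy rather than from booleans filled in by hand.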
Another option involves software development kits (SDKs), small snippets of code included in popular applications. Developers often view this as an alternative monetization strategy to subscriptions or
