Challenges in ethical proxy sourcing: ways to remain compliant

      Proxy servers may not be widely known, yet they underpin much of today's AI infrastructure. A proxy server is an intermediary device, with its own IP address, that relays web requests on someone else's behalf. Used in large numbers, proxies let users load many webpages automatically without running into CAPTCHAs or other blocks. Without them, companies would struggle to gather enough training data for large language models, and AI agents would be less effective.

      However, that capability comes with considerable responsibility. When sourced irresponsibly, proxies can turn people's computers into botnets without their owners' knowledge. When used maliciously, they can overload websites, create fake social media accounts, or even facilitate theft. Like any powerful tool, proxies can be constructive or destructive, which is why proper governance matters.

      Proxyway, a website focused on web data collection infrastructure, closely tracks the proxy server market and publishes a publicly available annual report. Drawing on its findings, this article examines the risks of choosing an unethical provider and offers guidance on how to avoid them.

      Proxy Servers in the Age of AI

      Proxy servers have been around for a long time, dating back to the early 2000s or earlier, and were primarily used for anonymity. Recently, however, they have grown into a booming industry, underpinning companies that compare flight prices, conduct market research, help businesses track their Google search rankings, and more. Leading proxy providers now generate hundreds of millions of dollars in revenue, and the overall market is valued in the billions.

      The industry was thriving even before the rise of AI, but the investment pouring into companies like OpenAI, Anthropic, and Perplexity has amplified its growth. Language models need vast amounts of data for each training run, and because the web is the largest data reservoir, proxies drastically speed up collection. This demand has helped Bright Data, one of the major proxy providers, reach $300 million in annual recurring revenue, up 50% year over year.

      The Risks Associated with Residential Proxy Networks

      Residential proxies are the most sought-after type of proxy server. Their value stems from websites' reluctance to share their data, even when it is publicly accessible, which leads them to deploy protections such as Cloudflare to restrict automated access. Residential proxies are less likely to be blocked than data center-hosted proxies because they appear to be ordinary home computers connected through internet service providers such as Comcast or Verizon.

      Residential proxies look like home computers because that is exactly what they are: they run on people's laptops, phones, and other connected devices. A proxy provider borrows a user's IP address and a small share of their bandwidth so that its clients can reach websites relevant to their business. In this setup, the user's device acts as the proxy server, and its IP address is what the target website sees.
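      For the provider's customer, this routing is largely invisible: they point an ordinary HTTP client at the provider's gateway, and the provider relays each request through a participating device. The sketch below shows only that client side, using Python's standard urllib.request.ProxyHandler. The gateway host, port, and credential format are hypothetical placeholders; real providers document their own.

```python
import urllib.request

# Hypothetical gateway: host, port and "user:password" format are placeholders,
# not any real provider's endpoint. The .example TLD is reserved and never resolves.
PROXY_GATEWAY = "http://USERNAME:PASSWORD@gateway.proxy-provider.example:8000"


def build_proxied_opener(gateway: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends both HTTP and HTTPS requests via `gateway`."""
    handler = urllib.request.ProxyHandler({"http": gateway, "https": gateway})
    # build_opener skips its default ProxyHandler when one is supplied explicitly.
    return urllib.request.build_opener(handler)


opener = build_proxied_opener(PROXY_GATEWAY)
# A real call would look like opener.open("https://example.com", timeout=10).
# With a residential pool, the target site would see the exit device's IP,
# not the customer's. It is not executed here because the gateway is fictional.
```

      Nothing in this client code reveals how the provider obtained its exit IPs, which is why sourcing has to be judged at the provider level.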

      Some readers may wonder whether their own devices are being used this way without their consent. Ideally, people who share their connections know about it and benefit from the arrangement. Unfortunately, this is often not the case. Unscrupulous proxy operators hijack devices by spreading malware, bundling proxy code into pirated software or free VPNs, or selling insecure smart devices such as digital picture frames, effectively building botnets.

      In recent years, numerous large botnets have emerged, some comprising millions of devices. BADBOX, for instance, infected millions of inexpensive Android TV boxes, and Aisuru did the same. Most recently, Dutch authorities disrupted the ASOCKS botnet, which spanned more than 17 million devices.

      Many of these botnets operate on the dark web, where they are put to malicious use. Aisuru, for example, was involved in some of the largest distributed denial-of-service (DDoS) attacks ever recorded. Botnets are also often monetized as commercial proxy services, blurring the line between illegitimate and legitimate businesses. In January 2026, Google shut down ten Hong Kong-based proxy server brands linked to the ASOCKS botnet.

      Malicious proxy networks abuse the trust and property of unsuspecting people, which is both reprehensible and dangerous. Businesses that unknowingly buy from such vendors risk damage to their reputation and their network security. Botnet operators, for their part, risk prison, but at least they make that choice knowingly.

      Choosing Wisely

      How can one tell a reputable proxy provider from a botnet storefront? It can be difficult, but major market players have taken substantial steps to self-regulate how their infrastructure is sourced and used.

      Legitimacy starts with how residential proxies are sourced. The gold standard is that the person behind each IP is aware of the arrangement, gives consent, and receives something in return. Bandwidth-sharing applications like Honeygain or TraffMonetizer exemplify this principle, since their core business is paying users for their traffic.

      Another option is software development kits (SDKs), small pieces of code embedded in popular applications. Developers often view this as an alternative monetization strategy to subscriptions or