Challenges of ethical proxy sourcing: ways to remain compliant


      Proxy servers may not be widely recognized tools, yet they are essential to much of today's AI infrastructure. A proxy server is an intermediary device with its own IP address that relays a user's web requests, so the destination website sees the proxy's address rather than the user's. Used at scale, proxies make it possible to open large numbers of web pages automatically without running into CAPTCHAs or other obstacles. Without them, companies would struggle to gather sufficient training data for large language models, and AI agents would frequently fail at their tasks.

      However, this power comes with significant responsibility. If proxies are sourced irresponsibly, they can turn individuals’ computers into unwitting parts of botnets. When used maliciously, they can overwhelm websites, create fake social media profiles, or even assist in stealing personal information. Like any powerful tool, proxies can be used for good or for harm, which underscores the need for proper governance.

      Proxyway, a website that analyzes the web data collection ecosystem, closely monitors the proxy server market and publishes an annual, publicly accessible report detailing its findings. This article, which draws on that report, explores the dangers of selecting an unethical provider and offers guidance on how to avoid doing so.

      The Role of Proxy Servers in the AI Era

      Proxies have existed for a long time: they were used primarily as anonymity tools in the early 2000s, and even earlier. In the past decade, however, the proxy server market has expanded rapidly. Proxies have become fundamental to companies that compare flight prices, conduct market research, and help businesses evaluate their visibility in Google search results, among other functions. Today, leading proxy providers generate hundreds of millions of dollars in revenue, contributing to a market worth billions.

      The industry was thriving before the rise of AI, but the extensive investments in companies like OpenAI, Anthropic, and Perplexity have amplified its growth. Language models require vast amounts of data for training, and the internet is the largest data repository available; proxies dramatically speed up data collection. This growing demand has allowed Bright Data, a major proxy provider, to reach annual recurring revenue of $300 million, a 50% year-over-year increase.

      The Dark Side of Residential Proxy Networks

      Residential proxies are the most sought-after type of proxy server. They are valuable because websites often take measures, such as using Cloudflare, to restrict automated data access. Unlike data center proxies, residential proxies are less likely to be blocked because their traffic appears to come from home computers connected through ISPs like Comcast or Verizon.

      In fact, residential proxies are home devices: they run on users’ laptops and smartphones. Proxy providers borrow users’ IP addresses and a small amount of bandwidth so that their clients can access relevant websites. At this point, some readers may wonder whether they are unwittingly participating. Ideally, those sharing their connections should be aware of it and benefit from it. Unfortunately, this is not always the case. Unscrupulous proxy providers may recruit devices through malware, repackaged pirated software, free VPN offers, or the sale of vulnerable smart devices. Essentially, they create botnets.

      Recent years have revealed numerous large-scale botnets, some encompassing tens of millions of devices. The BADBOX botnet targeted millions of inexpensive Android TV boxes, as did Aisuru. Just recently, Dutch authorities disrupted the ASOCKS botnet, which included over 17 million devices.

      Many of these botnets are traded on the dark web and exploited for malicious purposes. For example, Aisuru has been behind some of the largest distributed denial-of-service attacks on record. Frequently, however, botnets are monetized as commercial proxy services, which makes them difficult to distinguish from legitimate operations. In January 2026, Google shut down ten Hong Kong-based proxy server brands linked to the ASOCKS botnet.

      Malicious proxy networks violate the trust and property of unsuspecting individuals, which is both reprehensible and potentially hazardous. Commercial entities that unknowingly purchase from such vendors risk tarnishing their reputation and compromising their network security. The operators of these botnets, by contrast, act knowingly and accept the risk of incarceration.

      Navigating Proxy Server Legitimacy

      So, how can one differentiate between a reliable proxy server provider and a botnet operation? While it may not always be simple, prominent market players have taken significant steps to self-regulate the sourcing and utilization of their infrastructure.

      The first test of legitimacy is how residential proxies are acquired. The gold standard is that the device owner is aware of the arrangement, consents to it, and receives some form of compensation in return. Bandwidth-sharing applications like Honeygain or TraffMonetizer exemplify this model: users are paid for the traffic they share.

      Another method involves SDKs (software development kits), small code packages embedded in popular applications. Developers often use SDKs as an alternative to subscription or advertising revenue. However, SDKs can be ethically ambiguous. There is a notable difference between burying the SDK in the terms of service and clearly presenting a consent screen; offering minimal or disproportionate rewards versus fair compensation; and placing SDKs in children's apps rather than targeting

