Perplexity divides AI processing between personal computers and the cloud to reduce expenses.
Perplexity AI unveiled a platform at Computex that intelligently directs AI inference tasks between PCs and cloud servers in real time, likening it to an “air-traffic controller” for AI processes. This chip-agnostic system aims to address the financial challenges posed by centralized inference as Perplexity's revenue reaches $500 million.
The platform dynamically allocates AI workloads, deciding on the fly which tasks a PC's local processor can handle and which require data center hardware. CEO Aravind Srinivas introduced the system at Computex in Taipei on Tuesday, presenting it as a way to cut the cost of inference, the stage at which trained AI models are run to produce outputs.
Srinivas noted in a Bloomberg Television interview, “You don’t want all your compute centralised in servers and everything running through the largest models. Reports indicate some companies are alarmed by their costs, with expenses reaching half a billion dollars monthly. What is truly needed is efficient value per watt per user.”
**Operation of the System:**
The platform assesses each AI task and sends it to the most efficient computing resource available. Routine tasks that modern PC processors can manage, such as summarization, formatting, or basic classification, are handled locally without involving the cloud. However, more intricate tasks that require significant model inference, such as multi-step reasoning or retrieval-based generation with extensive datasets, are directed to cloud servers. This decision-making occurs in real-time and goes unnoticed by the user.
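Perplexity has not published how its router works, so the following is only a minimal sketch of the kind of decision described above. The task categories come from the examples in this article; the `Task` type, the `route` function, and the `local_token_limit` threshold are hypothetical names and values chosen for illustration, not part of any Perplexity API.

```python
from dataclasses import dataclass

# Task kinds taken from the article's examples (labels are illustrative).
LOCAL_KINDS = {"summarization", "formatting", "classification"}
CLOUD_KINDS = {"multi_step_reasoning", "retrieval_generation"}


@dataclass
class Task:
    kind: str          # what the user is asking for
    input_tokens: int  # rough size of the input


def route(task: Task, local_token_limit: int = 4000) -> str:
    """Return "local" or "cloud" for a task (assumed heuristic, not Perplexity's)."""
    # Heavy reasoning or retrieval work always goes to the data center.
    if task.kind in CLOUD_KINDS:
        return "cloud"
    # Routine work stays on the PC when the input is small enough.
    if task.kind in LOCAL_KINDS and task.input_tokens <= local_token_limit:
        return "local"
    # Unknown or oversized tasks fall back to the cloud.
    return "cloud"


if __name__ == "__main__":
    for t in (Task("summarization", 500), Task("multi_step_reasoning", 100)):
        print(t.kind, "->", route(t))
```

A production router would presumably weigh more signals than task type and input size, but the fallback-to-cloud default reflects the idea that the user should never notice the routing decision.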
The practical implication is that Perplexity can accommodate more users at a reduced cost by delegating a portion of the inference work to the billions of PCs already in operation. With the rising demand for AI inference straining data center capabilities and prompting plans for $1.4 trillion in infrastructure upgrades, distributing computing tasks to the user level is both an economic and infrastructural imperative.
Srinivas made this announcement alongside Intel CEO Lip-Bu Tan, whose company dominates the PC processor market and is interested in establishing PCs as a significant compute layer for AI. Srinivas pointed out, however, that the platform is “chip agnostic” and compatible with Nvidia processors as well. Nvidia also showcased the shift towards edge inference during Computex with its new RTX Spark platform for AI-enabled laptops and desktops.
**The Cost Challenge:**
Srinivas’s remark about companies “spending half a billion dollars per month” on AI computation is grounded in reality. Reports indicate that OpenAI's infrastructure expenses are at that level, while Anthropic anticipates $10.9 billion in revenue for Q2, accompanied by high compute costs that impact profit margins. The energy and financial burden of centralized AI inference is a major limiting factor in the ongoing AI surge.
Perplexity’s strategy counters the assumption that AI inference must occur in the cloud. By positioning PCs as essential computing nodes rather than mere endpoints, the company can lower its server costs while potentially improving response times for local tasks. The approach does introduce complexity, however: the routing system must gauge task difficulty accurately within milliseconds, and the quality of local inference depends on the user's hardware.
**Financial Insights:**
Perplexity’s financial growth underscores the importance of cost efficiency. Srinivas shared on X in April that the company's revenue had increased fivefold, from $100 million to $500 million, with only a 34% rise in headcount. This ratio, approximately 15x when the fivefold revenue multiple is divided by the 34% headcount increase, illustrates both the advantages of AI-native business models and Perplexity's role in aggregating queries across various AI providers rather than training its own advanced models.
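The growth figures can be compared in more than one way, and the choice changes the headline number. The short calculation below uses only the three figures quoted above; the variable names are illustrative, and the reading that the "approximately 15x" figure comes from dividing the revenue multiple by the headcount increase is an assumption, not something the source states.

```python
revenue_before = 100e6    # dollars, as quoted
revenue_after = 500e6     # dollars, as quoted
headcount_growth = 0.34   # 34% rise in headcount, as quoted

# Revenue multiple (5.0) and percentage growth (4.0, i.e. 400%).
revenue_multiple = revenue_after / revenue_before
revenue_growth = revenue_multiple - 1

# Three ways to relate revenue to headcount:
multiple_over_growth = revenue_multiple / headcount_growth       # about 14.7
growth_over_growth = revenue_growth / headcount_growth           # about 11.8
revenue_per_employee = revenue_multiple / (1 + headcount_growth)  # about 3.7

print(round(multiple_over_growth, 1))
print(round(growth_over_growth, 1))
print(round(revenue_per_employee, 1))
```

On a like-for-like basis, revenue per employee rose a little under fourfold, while the larger figures describe how much faster revenue grew than headcount.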
Srinivas stated, “Every time any of the AI improves, our unified system also enhances because we route across all of them.” This architectural efficiency, in which the product improves as the underlying AI providers improve without a proportional increase in costs, partly explains the high growth rates of AI-centric companies that are drawing investment away from traditional SaaS businesses.
The hybrid compute platform further extends this concept to hardware. If Perplexity can leverage existing user-facing computing resources to manage a substantial portion of inference work, it can lower the marginal cost per query and enhance response times for simple tasks. As AI becomes more embedded in enterprise processes, understanding who bears the computation costs—cloud providers, AI companies, or users' own hardware—will become a pivotal competitive factor.
