Perplexity divides AI inference between personal computers and the cloud to reduce expenses.

      Perplexity AI introduced a platform at Computex that routes AI inference between PCs and cloud servers in real time, acting as an “air-traffic controller” for AI workloads. The chip-agnostic system targets the cost of centralized inference as Perplexity’s revenue reaches $500 million.

      Perplexity AI’s platform divides AI tasks between personal computers and cloud servers, deciding in real time which computations can run on a PC’s processor and which need data center hardware. Announcing it at Computex in Taipei, CEO Aravind Srinivas called the system an “air-traffic controller for AI tasks.” The goal is to lower the cost of inference, the step in which a trained AI model is run to produce a response.

      “You don’t want all your compute centralized in servers with everything running through the largest models,” Srinivas commented in a Bloomberg Television interview. “You’re already seeing reports of companies panicking over costs. Some are spending as much as half a billion dollars a month. What you really want is efficient value per watt per user.”

      **How it works**

      The system assesses each AI task and sends it to the compute layer best suited to it. Basic tasks that modern PC processors can handle, such as summarization, formatting, or lightweight classification, run locally without touching the cloud. More demanding tasks that need large-model inference, such as multi-step reasoning or retrieval-augmented generation over very large datasets, go to cloud servers. The routing happens in real time and is invisible to the user.
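
Perplexity has not published how its router makes this decision, so the following is only a minimal sketch of the idea, assuming a rule-based split. The task categories come from the examples above; the `Task` and `Target` types, the `route` function, and the 4,096-token limit are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


# Hypothetical task categories, taken from the examples in the article.
LOCAL_TASKS = {"summarization", "formatting", "classification"}
CLOUD_TASKS = {"multi_step_reasoning", "retrieval_augmented_generation"}


@dataclass
class Task:
    kind: str          # e.g. "summarization"
    input_tokens: int  # rough size of the prompt


def route(task: Task, local_token_limit: int = 4096) -> Target:
    """Pick a compute layer for one task (illustrative rules only)."""
    if task.kind in CLOUD_TASKS:
        return Target.CLOUD
    if task.kind in LOCAL_TASKS and task.input_tokens <= local_token_limit:
        return Target.LOCAL
    # Unknown or oversized work falls back to the cloud.
    return Target.CLOUD
```

Defaulting to the cloud for anything unrecognized is a conservative choice in this sketch: a wrong guess costs money rather than answer quality.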

      As a result, Perplexity can serve more users at lower cost by shifting some inference work to the billions of PCs already in use. As demand for AI inference strains data center capacity and pushes utilities to plan for $1.4 trillion in grid upgrades, moving compute to the edge becomes both an economic and an infrastructure imperative.

      Srinivas shared the stage with Intel CEO Lip-Bu Tan, whose company is a major player in the PC processor market and has a vested interest in establishing PCs as an important layer for AI computation. Nevertheless, Srinivas emphasized that the platform is “chip agnostic” and is compatible with Nvidia processors as well. Nvidia also spotlighted this trend towards edge inference at Computex with its new RTX Spark platform for AI-powered laptops and desktops.

      **The cost problem**

      Srinivas’s remark that some companies are spending as much as half a billion dollars a month on AI compute is not an exaggeration. OpenAI’s infrastructure costs have been documented at that magnitude, and Anthropic is projecting $10.9 billion in revenue for Q2 alongside significant compute expenses that squeeze profit margins. The financial and energy costs of centralized AI inference are a key constraint on the current AI boom.

      Perplexity’s approach challenges the notion that AI inference must occur in the cloud. By treating the PC as a first-class compute node rather than a thin client, the company can lower its own server expenses while likely delivering quicker responses for tasks handled locally. The trade-off is added complexity: the routing system must judge task difficulty accurately within milliseconds, and the quality of local inference depends on the capabilities of the user’s hardware.
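
The hardware dependence can be made concrete with a simple eligibility check. This is a sketch under stated assumptions, not a description of Perplexity’s system: the `Device` fields, the `can_run_locally` function, and the idea of comparing an estimated generation time against a latency budget are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    free_memory_gb: float     # memory available for model weights
    tokens_per_second: float  # measured local generation speed


def can_run_locally(device: Device, model_size_gb: float,
                    output_tokens: int, latency_budget_s: float) -> bool:
    """Return True if the local model fits and should finish in time."""
    if model_size_gb > device.free_memory_gb:
        return False  # the model does not fit in available memory
    if device.tokens_per_second <= 0:
        return False  # no usable local throughput measurement
    estimated_seconds = output_tokens / device.tokens_per_second
    return estimated_seconds <= latency_budget_s
```

A check like this costs a few arithmetic operations, so it fits comfortably inside a millisecond-scale routing decision; the hard part in practice is estimating the inputs, not evaluating them.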

      **Revenue efficiency**

      Perplexity’s financial growth highlights the importance of cost efficiency. Srinivas reported on X in April that the company’s revenue surged fivefold, from $100 million to $500 million, while its workforce grew by only 34%. Dividing the fivefold revenue multiple by the 34% headcount increase gives a ratio of roughly 15; put another way, revenue per employee rose about 3.7-fold. That demonstrates the leverage of AI-native business models and of Perplexity’s role as an aggregator that directs queries across various AI providers rather than developing its own advanced models.
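
The arithmetic behind these figures is short enough to check directly. The inputs below are the numbers reported above; the variable names are only illustrative.

```python
revenue_before = 100e6   # dollars, as reported
revenue_after = 500e6    # dollars, as reported
headcount_growth = 0.34  # 34% more employees

revenue_multiple = revenue_after / revenue_before  # fivefold
headcount_multiple = 1 + headcount_growth

# Revenue multiple divided by the fractional headcount growth.
growth_ratio = revenue_multiple / headcount_growth

# How much more revenue each employee accounts for after the growth.
revenue_per_employee_multiple = revenue_multiple / headcount_multiple

print(round(growth_ratio, 1))                   # 14.7
print(round(revenue_per_employee_multiple, 2))  # 3.73
```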

      “Every time any of the AI improves, our unified system also enhances because we route across all of them,” Srinivas stated. This kind of structural efficiency, where the product advances alongside its underlying providers without a corresponding rise in costs, partly explains the growth rates of AI-native companies that are drawing investment away from traditional SaaS firms.

      The hybrid compute platform applies this reasoning to hardware. If Perplexity can use the computing resources already sitting on users’ desks to handle a significant share of inference tasks, it will lower its marginal cost per query and improve response times for simpler tasks. As AI becomes more deeply integrated into enterprise processes, the question of who bears the cost of compute (the cloud provider, the AI firm, or the user’s own hardware) will become a critical competitive factor.
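
The effect on marginal cost can be sketched as a weighted average. Perplexity has not disclosed its per-query costs or how much work it expects to offload, so the function name, its parameters, and the example values are hypothetical and expressed in arbitrary cost units.

```python
def blended_cost_per_query(cloud_cost: float, local_share: float,
                           provider_local_cost: float = 0.0) -> float:
    """Average provider cost per query when a share of queries runs locally.

    cloud_cost: provider's cost for a query served from the cloud.
    local_share: fraction of queries handled on the user's PC (0 to 1).
    provider_local_cost: what a locally served query still costs the
        provider; assumed to be zero by default.
    """
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    return (1 - local_share) * cloud_cost + local_share * provider_local_cost


# Hypothetical example: a quarter of queries move to the PC at no
# provider cost, so the blended cost falls by a quarter.
print(blended_cost_per_query(cloud_cost=1.0, local_share=0.25))  # 0.75
```

The cost does not disappear in this model; the locally served share is paid for by the user’s hardware and electricity instead.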

