Sail secures $80 million to reduce the operational costs of AI agents.
Sail Research has secured $80 million to reduce the operational costs of AI agents. The startup, established by former engineers from Apple and NVIDIA, claims it can support the token usage of agents at a cost up to ten times lower than competitors.
AI agents are resource-intensive. Running one for an extended period can consume billions of tokens on a single task, and the expenses escalate quickly. That cost keeps many agents from being deployed outside laboratory settings. The newly launched Sail Research believes it can address these economic challenges.
Sail has raised $80 million in combined seed and Series A funding at a valuation of $450 million. Sequoia Capital led the seed round, while Kleiner Perkins led the Series A. Other investors included Redpoint Ventures, Theory Ventures, Vine Ventures, CRV, A*, and Abstract Ventures.
The list of angel investors reads like a who’s who of tech leadership, featuring Alphabet chairman John Hennessy, Intel CEO Lip-Bu Tan, and Together AI chief scientist Tri Dao. The San Francisco-based company also attracted angels from Anthropic, OpenAI, SpaceX, and Thinking Machines.
Sail’s pitch, infrastructure designed for agents rather than humans, rests on a straightforward insight. Current AI infrastructure was built with humans in mind, prioritizing speed for users awaiting responses. Agents, in contrast, run autonomously over long durations and prioritize reliability, scalability, and cost-effectiveness.
This disparity presents a significant opportunity. Humans seek immediate responses, whereas agents need the capacity to manage numerous calls for extended periods without costs skyrocketing. Sail contends that the existing technology stack optimizes for the incorrect metric.
“Most inference infrastructure was designed to minimize latency on a single request, but that’s not the right optimization for agents,” explained co-founder and CTO Samir Menon. He emphasized that agents require sustained throughput for thousands of concurrent requests over long time frames, prompting Sail to redesign the stack to accommodate this requirement.
Sail calls this philosophy “abundant intelligence”: the idea that an agent becomes more effective as it gets more compute and context, so the goal is to make that compute affordable enough for widespread use.
Sail offers two main products. The first is its inference engine, reengineered for throughput rather than latency and aimed at agents that use billions of tokens per task. The company asserts that its per-token costs are up to ten times lower than those of competing solutions.
The second offering is a sandbox product known as Sailboxes. These sandboxes run for hours or even days instead of mere seconds. Importantly, users are charged only for the time the agent is actually working, which significantly reduces the cost of idle periods during lengthy tasks.
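The difference between billing for wall-clock time and billing only for active time can be illustrated with a rough sketch. The hourly rate, the session shape, and the helper names below are hypothetical and chosen only for illustration; the article does not give Sail's actual pricing or billing logic.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One stretch of sandbox time, in seconds."""
    seconds: float
    active: bool  # True while the agent is actually doing work


def wall_clock_cost(intervals, rate_per_hour):
    """Bill for every second the sandbox exists, idle or not."""
    total = sum(i.seconds for i in intervals)
    return total / 3600 * rate_per_hour


def active_time_cost(intervals, rate_per_hour):
    """Bill only for the seconds in which the agent was working."""
    busy = sum(i.seconds for i in intervals if i.active)
    return busy / 3600 * rate_per_hour


# Hypothetical 10-hour session: 2 hours of work around 8 hours of waiting.
session = [
    Interval(seconds=3600, active=True),
    Interval(seconds=8 * 3600, active=False),
    Interval(seconds=3600, active=True),
]
RATE = 0.50  # hypothetical price in dollars per sandbox-hour

print(f"wall-clock billing: ${wall_clock_cost(session, RATE):.2f}")
print(f"active-time billing: ${active_time_cost(session, RATE):.2f}")
```

Under these assumed numbers, the longer an agent sits waiting inside a long-running task, the larger the gap between the two bills becomes.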
Cost reductions are achieved by optimizing the entire stack. Sail customizes open-source inference engines to enhance GPU performance and distributes workloads across multiple providers to ensure resilience. The company also seeks out inexpensive, underutilized computing resources.
Sail cites a benchmark result, asserting that its inference reached 90.72% accuracy on BrowseComp-Plus, a deep research evaluation, at a cost up to ten times lower than leading alternatives. The platform is also built to slot into existing setups, supporting existing OpenAI workflows and various open models, including DeepSeek, Gemma, GLM, Kimi, and Nemotron.
The founding team has a strong hardware background. Co-founder and CEO Neil Movva previously worked at NVIDIA, maximizing GPU performance, and held roles at Apple and Together AI focusing on infrastructure. Menon also has experience from Apple, where he managed large-scale systems.
This background influences Sail's product development. The founders argue that their advantage lies in integrating seamlessly from silicon to API, allowing them to balance cost and latency benefits in ways that discrete layers cannot achieve.
“Sail exists to make intelligence abundant,” Movva stated. “Every decision we make, from the chip level to the API, focuses on providing teams with the tokens, scale, and runtime to develop agents without restrictions.” This broad vision positions the company as foundational for a much larger future.
Kleiner Perkins is aligned with this vision. “The infrastructure layer for the agent era is one of the most crucial investments in AI currently,” noted partner Aditya Naganath, commending the founders for their blend of computing knowledge and systems discipline, derived from working at scale.
The timing fits a clear trend: inference costs, the expense of running a model, have become a central concern in AI infrastructure. Nebius recently spent $643 million on the 20-person startup Eigen AI, reflecting the industry's demand for expertise that can enhance chip performance and reduce token expenses.
The industry faces a genuine challenge: token prices have plummeted, yet enterprise AI costs have surged because agents require significantly more tokens per task. Reducing the cost per token is one of the few viable strategies to reverse this trend.
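The arithmetic behind that squeeze is simple: the cost of a task is the number of tokens it uses multiplied by the price per token. The sketch below uses hypothetical prices and a hypothetical chat-sized request; only the "billions of tokens per task" scale and the "up to ten times lower" figure come from the claims reported above.

```python
def cost_per_task(tokens_per_task, price_per_million_tokens):
    """Dollar cost of one task: tokens used times the price per token."""
    return tokens_per_task / 1_000_000 * price_per_million_tokens


# Hypothetical chat-style request: few tokens at a higher price.
chat = cost_per_task(tokens_per_task=10_000, price_per_million_tokens=10.0)

# Hypothetical agent task: a billion tokens at a price ten times lower.
agent = cost_per_task(tokens_per_task=1_000_000_000, price_per_million_tokens=1.0)

# The same agent task if the per-token price fell by another factor of ten.
cheaper_agent = cost_per_task(tokens_per_task=1_000_000_000, price_per_million_tokens=0.1)

print(f"chat request:        ${chat:,.2f}")
print(f"agent task:          ${agent:,.2f}")
print(f"agent, 10x cheaper:  ${cheaper_agent:,.2f}")
```

With these assumed figures, a tenfold drop in token price is swamped by the jump in tokens per task, which is why a further cut in per-token cost changes the total so directly.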
Sail is not alone in addressing this issue; other companies are also exploring different strategies to reduce costs. Fractile is developing inference chips as an alternative to NVIDIA, while
