DeepSeek reduces V4-Pro prices by 75%.
The promotional discount runs until May 5, 2026. Even at its regular price, V4-Pro already undercuts GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro on per-token cost. The move directly challenges the pricing strategies of American AI companies, especially as the Trump administration has accused Chinese companies of large-scale distillation of U.S. AI models.
DeepSeek announced on Monday a 75% discount on its newly launched DeepSeek-V4-Pro model for developers, valid until May 5, 2026, along with an immediate cut in the cost of input cache hits across its entire API suite to one-tenth of the previous price. The price cuts were announced in a post on X. The move escalates a price war DeepSeek set off in January 2025 with its R1 model, which claimed to deliver frontier-level reasoning performance at a fraction of the cost of comparable OpenAI offerings.
The pricing context is significant. At full price, before any discount, the DeepSeek-V4-Pro model costs $0.145 per million input tokens and $3.48 per million output tokens, underpricing OpenAI's GPT-5.5, Google's Gemini 3.1 Pro, and Anthropic's Claude Opus 4.7 on a per-token basis. With the 75% discount on input tokens, the cost drops to approximately $0.036 per million tokens. The Flash variant, which is a smaller, faster version of V4, costs $0.14 per million input tokens and $0.28 per million output tokens at full price, already offering lower prices than GPT-5.4 Nano, Gemini 3.1 Flash, GPT-5.4 Mini, and Claude Haiku 4.5.
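The discount arithmetic above can be checked directly. A minimal sketch using only the per-million-token list prices quoted in this article:

```python
# Per-million-token list prices for DeepSeek-V4-Pro quoted above (USD).
V4_PRO_INPUT = 0.145
V4_PRO_OUTPUT = 3.48
DISCOUNT = 0.75  # promotional discount through May 5, 2026

# Discounted input price: 75% off the full rate.
discounted_input = V4_PRO_INPUT * (1 - DISCOUNT)
print(f"${discounted_input:.3f} per million input tokens")  # ~$0.036
```

This reproduces the roughly $0.036 per million input tokens cited above.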
The drastic cut in cache-hit pricing is aimed squarely at heavy users and enterprise developers who send similar or repeated requests, a common pattern in production agentic applications. The strategic rationale, consistent since the launch of R1, is clear: open-source model availability removes access barriers, aggressive API pricing removes cost barriers to deployment, and the 1 million-token context window opens up enterprise scenarios involving large codebases or lengthy documents that would otherwise require several API requests.
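To see why cheaper cache hits matter for agentic workloads that resend long shared prefixes, consider a rough cost sketch. The prices below are USD per million tokens; the cache-hit prices are hypothetical placeholders, since the article states only that cache hits now cost one-tenth of their previous price, not the absolute figures:

```python
# Sketch: cost of one API request when part of the input prefix is served
# from cache. All prices are USD per million tokens; cache-hit prices here
# are hypothetical (the article gives only the 10x ratio, not the numbers).

def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
                 input_price: float, cache_hit_price: float,
                 output_price: float) -> float:
    """Dollar cost of one request; cached_tokens is the prefix hit in cache."""
    fresh = input_tokens - cached_tokens
    return (fresh * input_price
            + cached_tokens * cache_hit_price
            + output_tokens * output_price) / 1_000_000

# Hypothetical workload: a 200k-token shared prefix reused across 50 turns.
old = sum(request_cost(210_000, 200_000, 2_000, 0.145, 0.050, 3.48)
          for _ in range(50))
new = sum(request_cost(210_000, 200_000, 2_000, 0.145, 0.005, 3.48)
          for _ in range(50))
print(f"before: ${old:.2f}, after: ${new:.2f}")
```

Under these assumed numbers, the cached prefix dominates input cost, so a 10x cut in the cache-hit rate roughly halves the workload's total bill.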
V4-Pro also integrates with Claude Code, OpenClaw, and OpenCode, the leading agentic coding frameworks among developers in the Western AI ecosystem. That combination lowers switching costs for any developer whose main constraint is price, as Akshar Keremane, co-founder of the Bangalore-based AI startup O-Health, noted: the mix of pricing, open-source availability, and the 1 million-token context window significantly lowers barriers “for developers, startups, and small enterprises.”
Launched last Friday, V4-Pro is a mixture-of-experts model with 1.6 trillion total parameters and 49 billion active parameters per token, making it the largest open-weight model currently available, surpassing Moonshot AI’s Kimi K2.6 and MiniMax’s M1. It uses a Hybrid Attention Architecture to maintain coherence over long contexts and is optimized for Huawei’s Ascend 950 chips and Cambricon hardware rather than Nvidia GPUs.
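The parameter figures reported above imply a high degree of sparsity, which is what lets a model this large remain economical to serve:

```python
# MoE sparsity implied by the reported figures: 49B active of 1.6T total.
TOTAL_PARAMS = 1.6e12
ACTIVE_PARAMS = 49e9

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"{active_fraction:.1%} of parameters active per token")  # ~3.1%
```

Only about 3% of the network runs for any given token, which is the core economic argument for mixture-of-experts designs at this scale.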
Zhang Yi, founder of tech research firm iiMedia, stated to AFP that V4’s architecture signifies a “genuine inflection point” for AI processing with long-context capabilities, suggesting that support for ultra-long contexts will transition from research labs into mainstream commercial usage. Wei Sun, principal analyst at Counterpoint Research, remarked that V4 operating on domestic chips “enables the development and deployment of AI systems without sole reliance on Nvidia” and could “accelerate domestic adoption and contribute to faster global AI advancement overall.”
The price cut lands in a tense geopolitical moment. Last Thursday, White House Director of Science and Technology Policy Michael Kratsios accused foreign entities, especially in China, of running “industrial-scale” operations to distill frontier AI models from U.S. companies, a practice in which a smaller model is trained on the outputs of a larger one to reach similar capabilities at lower cost.
While Kratsios’s memo did not name DeepSeek directly, both Anthropic and OpenAI have previously accused DeepSeek of distilling their models. CNN reported that it had contacted DeepSeek for comment on the allegations.
The U.S. government's crackdown on distillation, together with a parallel Chinese move to limit U.S. investment in its AI companies, was announced a day before V4's launch. DeepSeek's decision three days later to cut prices rather than address the accusations directly reads as both a competitive maneuver and a political statement about where it believes the AI race will ultimately be decided.
