Spirit AI surpasses Nvidia on the RoboArena robotics benchmark.
TL;DR: Chinese startup Spirit AI has taken the top position on the RoboArena leaderboard co-developed by Nvidia, scoring 1,924 against Nvidia’s 1,881, highlighting the rise of physical AI as a significant technological frontier.
Nvidia’s latest robotics model held the top place on the RoboArena leaderboard for only two days before being surpassed by a startup from Hangzhou. Spirit AI announced on Wednesday that its foundation model for embodied intelligence, Spirit v1.6, achieved a score of 1,924, surpassing Nvidia’s Cosmos3-Nano-Policy score of 1,881, while Nvidia's second project, DreamZero, came in third with 1,763. This marks the first time a Chinese model has reached the top of RoboArena, a benchmark developed in collaboration with Stanford University and the University of California, Berkeley.
The timing is significant because Nvidia launched its Cosmos 3 omnimodel on June 1 at Computex in Taipei, branding it as the “open frontier foundation model for physical AI.” Trained on 20 trillion tokens of multimodal data, Cosmos 3 was intended to showcase Nvidia’s leadership in a field it largely established, but Spirit AI had other intentions.
What physical AI actually measures:
RoboArena does not assess chatbot fluency or image generation skills. Instead, it evaluates how well a generalist robot policy performs in the real world, covering object manipulation, navigation, tool use, perception, planning, and adaptability to new situations. In essence, it measures whether a machine can think and act accordingly.
Physical AI hinges on two key capabilities: Policy capabilities, which reflect a model's ability to act based on observations (the focus of RoboArena), and World capabilities, which involve simulating and predicting outcomes based on specific actions.
The industry is shifting towards combining the two. Last September, researchers in China introduced a unified “Policy World Model” architecture that integrates world modelling with trajectory planning, and this convergence is rapidly gaining momentum.
China is making strides on multiple fronts:
Spirit AI's performance on RoboArena is part of a broader trend, as Chinese companies dominate many areas of physical AI benchmarks. For instance, the WorldArena benchmark for embodied world models is led by WorldScape-0.2 from Manifold AI, surpassing Nvidia’s Cosmos-Predict 2.5 in the policy evaluator track. AgiBot, one of China's major robotics firms, leads the perception track with its GenieEnvisioner-Sim2.0-2B model released last week. DexForce, another Chinese startup, tops the data engine track.
On the WorldScore benchmark, which assesses a model's ability to generate worlds from text prompts, Manifold AI's WorldScape-0.2 leads, outperforming WonderJourney, a collaborative project from Stanford and Google.
The influx of investment:
These technical achievements are supported by significant financial backing. Spirit AI announced a financing round of 1.5 billion yuan ($222 million) on Wednesday, marking its fourth round in just three months, a pace considered the most aggressive in the embodied AI segment. Previous funding rounds have boosted the company's valuation beyond 10 billion yuan ($1.4 billion).
On the same day, XYZ Embodied AI, incubated by the Beijing Academy of Artificial Intelligence, announced it had closed a pre-A round with 1 billion yuan raised in just 10 months for developing “embodied brains” and world models. Manifold AI has completed five funding rounds in 10 months, with its latest in April reportedly securing hundreds of millions of yuan.
The overall Chinese robotics sector attracted $3.4 billion in venture funding in 2025 alone, which is 42 percent more than the United States, and this gap appears to be widening in 2026.
Nvidia's counter-strategy:
Nvidia is actively responding. At Computex, CEO Jensen Huang announced partnerships with Chinese robotics firm Unitree, which is preparing for a $7 billion IPO, and with Singaporean robotic hand maker Sharpa to create a humanoid robot reference design. The platform combines Unitree’s H2 Plus humanoid body, Sharpa’s Wave tactile hands, and Nvidia’s Jetson AGX Thor T5000 processor.
Huang also introduced the Cosmos Coalition, which brings together AI labs including Agile Robots, Black Forest Labs, Runway, and Skild AI to advance open world models. The move clearly positions Nvidia as a foundational layer for the broader physical AI ecosystem, even if individual models may lose benchmark titles.
However, Huang acknowledged a fundamental challenge in the sector: “For robotic systems and physical AI, data is the hardest problem,” he stated at Computex, suggesting that China may have a structural advantage.
The data challenge:
Alexandr Wang, founder of Scale AI who joined Meta as its inaugural chief AI officer in 2025, reportedly said last year that China is “fundamentally very well positioned on data” and that many U.S. companies
