Spirit AI surpasses Nvidia in the RoboArena robotics benchmark.
TL;DR: The Chinese startup Spirit AI has taken the top position on the RoboArena leaderboard, scoring 1,924 compared to Nvidia’s 1,881, signaling a shift in the tech battleground of physical AI.
Nvidia's latest robotics model held the top spot on the RoboArena leaderboard for just two days before the Hangzhou-based startup Spirit AI overtook it. On Wednesday, Spirit AI revealed that Spirit v1.6, its foundation model for embodied intelligence, had scored 1,924, beating the 1,881 scored by Nvidia's Cosmos3-Nano-Policy. Another Nvidia project, DreamZero, ranked third with 1,763. This is the first time a Chinese model has led the RoboArena leaderboard, which Nvidia co-developed with Stanford University and the University of California, Berkeley.
The timing is awkward for Nvidia, which introduced its Cosmos 3 omnimodel at Computex in Taipei on June 1, branding it the “open frontier foundation model for physical AI.” Trained on 20 trillion tokens of multimodal data, Cosmos 3 was intended to showcase Nvidia’s leadership in a field it effectively created, but Spirit AI had other plans.
Understanding what RoboArena measures
RoboArena does not assess the fluency of chatbots or the quality of image generation. Instead, it tests how well a generalist robot policy translates into physical action: object manipulation, navigation, tool use, perception, planning, and adaptation to new environments. In essence, it gauges whether a machine can plan an action and then carry it out.
Physical AI depends on two primary capabilities. Policy capabilities measure a model’s ability to act on observations, which is precisely what RoboArena evaluates. World capabilities measure a model’s ability to simulate and predict the outcomes of particular actions.
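The distinction between the two capabilities can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the one-dimensional state, the function names, and the simple dynamics are hypothetical and do not come from RoboArena, Spirit AI, or Nvidia.

```python
from typing import List

# Hypothetical 1-D setup: an observation is a gripper's position on a rail,
# and an action is the displacement to apply on the next step.
Observation = float
Action = float


def policy(obs: Observation, goal: float = 1.0, gain: float = 0.5) -> Action:
    """Policy capability: choose an action from an observation.

    This toy rule moves a fixed fraction of the remaining distance to the goal.
    """
    return gain * (goal - obs)


def world_model(obs: Observation, action: Action) -> Observation:
    """World capability: predict the next observation for a given action.

    The assumed dynamics are trivial: the position shifts by the action.
    """
    return obs + action


def rollout(obs: Observation, steps: int) -> List[Observation]:
    """Alternate policy and world model to imagine a short trajectory."""
    trajectory = [obs]
    for _ in range(steps):
        obs = world_model(obs, policy(obs))
        trajectory.append(obs)
    return trajectory


if __name__ == "__main__":
    print(rollout(0.0, 3))  # [0.0, 0.5, 0.75, 0.875]
```

In this sketch a policy benchmark would score only the actions that `policy` chooses, while a world-model benchmark would score how closely `world_model` matches what actually happens.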
The industry is increasingly working to integrate the two, and the convergence is accelerating. Last September, Chinese researchers presented a unified “Policy World Model” architecture that combines world modeling and trajectory planning in a single framework.
China’s advancements on multiple fronts
Spirit AI’s success in RoboArena is not an isolated instance. In the wider landscape of physical AI benchmarks, Chinese companies dominate nearly every category.
In the WorldArena benchmark for assessing embodied world models, Manifold AI's WorldScape-0.2 holds the top position, surpassing Nvidia’s Cosmos-Predict 2.5 in the policy evaluator segment. The perception track is led by China's AgiBot with its recently unveiled GenieEnvisioner-Sim2.0-2B model. Meanwhile, the data engine track is topped by another Chinese startup, DexForce.
In the WorldScore benchmark, which measures a model’s ability to generate worlds from text inputs, Manifold AI's WorldScape-0.2 again outperforms WonderJourney, a collaboration between Stanford and Google.
Investment influx
These technical results are backed by a remarkable influx of funding. On Wednesday, Spirit AI announced a financing round of 1.5 billion yuan ($222 million), its fourth round in just three months. The pace is considered the most aggressive funding activity seen in the embodied AI sector. Prior rounds have raised the company’s valuation to over 10 billion yuan ($1.4 billion).
On the same day, XYZ Embodied AI, developed by the Beijing Academy of Artificial Intelligence, announced it had completed its pre-A round, raising 1 billion yuan within just 10 months to create “embodied brains” and world models. Manifold AI has secured funding through five rounds in 10 months, with its latest round in April reportedly raising hundreds of millions of yuan.
The broader Chinese robotics sector attracted $3.4 billion in venture capital in 2025, outpacing the United States by 42 percent. This gap appears to be widening in 2026.
Nvidia’s response strategy
Nvidia is also taking action. At Computex, CEO Jensen Huang announced a partnership with Chinese robotics company Unitree, which is planning a $7 billion IPO, and Singaporean robotic hand manufacturer Sharpa to create a humanoid robot reference design. The platform will integrate Unitree’s H2 Plus humanoid body, Sharpa’s Wave tactile hands, and Nvidia’s Jetson AGX Thor T5000 processor.
Huang also launched the Cosmos Coalition, bringing together AI labs such as Agile Robots, Black Forest Labs, Runway, and Skild AI to enhance open-world models. The intent is clear: Nvidia aims to serve as the foundational infrastructure for the entire physical AI ecosystem, even if individual models face challenges in benchmarks.
However, Huang acknowledged the fundamental bottleneck in the sector: “For robotic systems and physical AI, data is the hardest problem,” he stated at Computex. This admission highlights why China may have a structural advantage.
The data dilemma
Alexandr Wang, the founder of Scale AI who joined
