Meta's Muse Spark has arrived – and it's not open source.
In summary: Meta has launched Muse Spark, the first model from its Superintelligence Labs, a division formed under Alexandr Wang after Meta's $14.3 billion investment in Scale AI. Built from scratch over nine months, the multimodal model features a “Contemplating” reasoning mode that runs multiple sub-agents in parallel, and it is already integrated into Meta AI across various platforms. Unlike its predecessor Llama, Muse Spark is closed source.
Its introduction marks the conclusion of a journey that began in June 2025, when Mark Zuckerberg revealed the creation of Meta Superintelligence Labs, appointing Wang as the company's first chief AI officer. The objective was clear: to catch up with OpenAI, Anthropic, and Google by building a dedicated team and infrastructure. Nine months later, this team has produced results.
Rebuilding the framework took nine months
“Nine months ago we rebuilt our AI stack from scratch,” Wang stated on X on Wednesday. “New infrastructure, new architecture, new data pipelines. Muse Spark is the result of that effort, now powerfully enabling Meta AI.” This remark acknowledges the depth of the overhaul: it involved a complete replacement of the foundational infrastructure needed for training Meta's models rather than minor tweaks.
Internally referred to as Avocado, the model was delayed earlier this year after it underperformed competitors in internal assessments of reasoning, coding, and writing. The Wednesday launch implies that Meta now considers the model competitive, although the benchmark results still present a mixed picture. Wang emphasizes the development process and presents Muse Spark as the first in a series of models rather than a definitive answer to the industry leaders.
Muse Spark is natively multimodal: it accepts voice, text, and images as input, although its output is limited to text for now. It offers a fast mode for casual queries and a new “Contemplating” mode that coordinates multiple sub-agents reasoning in parallel, positioned against Google’s Gemini Deep Think and OpenAI’s GPT-5.4 Pro. Meta claims that Muse Spark completes its reasoning tasks with more than ten times less compute than Llama 4 Maverick. The company attributes this to a training approach called “thought compression,” which penalizes excessive thinking time during reinforcement learning so that the model learns to solve problems with fewer reasoning tokens while maintaining accuracy.
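Meta has not published how “thought compression” is implemented, so the following is only a minimal sketch of the general idea described above: a reinforcement-learning reward that pays for a correct answer and subtracts a penalty for reasoning tokens beyond a budget. The function name, the token budget, the penalty rate, and the cap are all hypothetical values chosen for illustration, not Meta's.

```python
def shaped_reward(
    correct: bool,
    reasoning_tokens: int,
    token_budget: int = 1000,      # hypothetical: tokens allowed penalty-free
    penalty_per_1k: float = 0.25,  # hypothetical: penalty per 1,000 extra tokens
    max_penalty: float = 0.5,      # hypothetical: cap on the length penalty
) -> float:
    """Illustrative length-penalized reward (an assumption, not Meta's formula)."""
    # Base reward: 1.0 for a correct final answer, 0.0 otherwise.
    base = 1.0 if correct else 0.0
    # Only tokens beyond the budget are penalized.
    overage = max(0, reasoning_tokens - token_budget)
    # Capping the penalty below the base reward keeps any correct answer
    # strictly better than any incorrect one.
    penalty = min(max_penalty, penalty_per_1k * overage / 1000)
    return base - penalty


if __name__ == "__main__":
    for correct, tokens in [(True, 800), (True, 3000), (False, 800)]:
        print(correct, tokens, shaped_reward(correct, tokens))
```

Under these assumed numbers, a correct answer that stays within budget earns the full reward, a correct but long-winded answer earns less, and neither falls below an incorrect answer. That ordering is one way to push a model toward shorter reasoning without rewarding it for giving up accuracy.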
Benchmark performance presents a complex narrative
Meta's published evaluations rank Muse Spark fourth on the Artificial Analysis Intelligence Index v4.0 with a score of 52, trailing Gemini 3.1 Pro Preview and GPT-5.4 (both scoring 57) as well as Claude Opus 4.6 (53). Behind that overall ranking is an uneven performance profile rather than a uniform deficit.
On GPQA Diamond, a test of graduate-level scientific reasoning, Muse Spark achieved 89.5%, behind Gemini 3.1 Pro (94.3%), GPT-5.4 (92.8%), and Claude Opus 4.6 (92.7%). The gap widens significantly on the abstract reasoning benchmark ARC AGI 2, where Muse Spark scored 42.5 in Contemplating mode compared to Gemini 3.1 Pro’s 76.5 and GPT-5.4’s 76.1, indicating that its parallel sub-agent structure does not fully close the gap on abstract reasoning tasks. In software engineering, Muse Spark scored 77.4% on SWE-bench Verified.
Muse Spark excels in certain specialized areas closely aligned with Meta's unique advantages. In CharXiv Reasoning, which evaluates image understanding of figures and charts, Muse Spark scored 86.4 in Contemplating mode, surpassing Gemini 3.1 Pro’s 80.2 and GPT-5.4’s 82.8. In the medical reasoning assessment, HealthBench Hard, Muse Spark scored 42.8%, benefiting from training data curated in collaboration with over 1,000 physicians, while Claude Opus 4.6 only achieved 14.8% and GPT-5.4 scored 40.1%.
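To make the mixed picture concrete, the short sketch below takes only the scores quoted in this article and computes Muse Spark's margin over or under the best-scoring rival on each benchmark. The helper name `gap_to_best_rival` is ours, and rivals are limited to the models the article reports for each test, so the “best rival” is not necessarily the overall leader on that benchmark.

```python
# Scores as quoted in this article; missing models were not reported here.
SCORES = {
    "GPQA Diamond": {"Muse Spark": 89.5, "Gemini 3.1 Pro": 94.3,
                     "GPT-5.4": 92.8, "Claude Opus 4.6": 92.7},
    "ARC AGI 2": {"Muse Spark": 42.5, "Gemini 3.1 Pro": 76.5, "GPT-5.4": 76.1},
    "CharXiv Reasoning": {"Muse Spark": 86.4, "Gemini 3.1 Pro": 80.2,
                          "GPT-5.4": 82.8},
    "HealthBench Hard": {"Muse Spark": 42.8, "Claude Opus 4.6": 14.8,
                         "GPT-5.4": 40.1},
}


def gap_to_best_rival(scores: dict[str, float], model: str = "Muse Spark"):
    """Return (best rival, model score minus that rival's score)."""
    rivals = {name: s for name, s in scores.items() if name != model}
    best = max(rivals, key=rivals.get)
    # Round to one decimal to avoid floating-point noise in the difference.
    return best, round(scores[model] - rivals[best], 1)


if __name__ == "__main__":
    for benchmark, scores in SCORES.items():
        rival, gap = gap_to_best_rival(scores)
        print(f"{benchmark}: {gap:+.1f} points vs {rival}")
```

A negative margin means Muse Spark trails the best reported rival, and a positive one means it leads.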
Shopping, health, and the concept of 'personal superintelligence'
The result on the health benchmark is significant. Meta's differentiation strategy for Muse Spark hinges on combining general reasoning skills with specific data strengths that Meta possesses over competitors, including three billion users, their interests, social graphs, and health inquiries. Zuckerberg characterized Muse Spark as “a world-class assistant, particularly effective in areas related to personal superintelligence such as visual understanding, health, social content, shopping, games, and more” in a Facebook post accompanying the launch.
A specialized shopping mode is a prominent embodiment of this concept. This function utilizes content from creators within Meta’s ecosystem and individual user signals
