Meta has introduced Muse Spark, which is not open source.
In summary: Meta has introduced Muse Spark, the inaugural model from Meta Superintelligence Labs, a division formed under Alexandr Wang after Meta's $14.3 billion investment in Scale AI. Built from the ground up over nine months, the model is natively multimodal and features a “Contemplating” reasoning mode that runs multiple sub-agents in parallel. It now powers Meta AI across the company’s platforms and, unlike Meta’s Llama lineage, it is closed source.
The launch concludes a journey that began in June 2025, when Mark Zuckerberg unveiled Meta Superintelligence Labs and appointed Wang as its first chief AI officer. The directive was clear: catch up with OpenAI, Anthropic, and Google, using a team and infrastructure rebuilt specifically for that purpose. Nine months later, that team has shipped its first model.
Rebuilding from the ground up
“Nine months ago, we completely revamped our AI stack,” Wang announced on X on Wednesday. “New infrastructure, architecture, and data pipelines. Muse Spark is the outcome of our efforts, now powering Meta AI.” This statement highlights the extent of the overhaul: it was not just a fine-tuning of existing systems but a fundamental replacement of the core infrastructure for training Meta’s models.
Internally known as Avocado, the model faced delays earlier in the year due to underperformance in reasoning, coding, and writing during internal evaluations. Wednesday's release indicates that these issues have been sufficiently resolved for Meta to deem it competitive, even if the overall benchmark results remain varied. Wang emphasizes the developmental journey rather than presenting Muse Spark as a definitive solution against industry leaders.
An integrated approach
Muse Spark is natively multimodal: it accepts voice, text, and image inputs, though at launch it produces text-only output. It offers a fast mode for casual queries and a new “Contemplating” mode, which orchestrates several sub-agents that reason in parallel and competes directly with the advanced reasoning modes of Google’s Gemini Deep Think and OpenAI’s GPT-5.4 Pro. Meta claims that Muse Spark reaches its reasoning capability with more than ten times less compute than Llama 4 Maverick, thanks to a training technique called “thought compression,” which penalizes the model during reinforcement learning for thinking too long, pushing it to solve problems with fewer reasoning tokens without losing accuracy.
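Meta has not published how “thought compression” is implemented, so the following is only a minimal sketch of the general idea described above: a reinforcement learning reward that pays for a correct answer and subtracts a penalty once the reasoning trace runs past a token budget. The `Rollout` type, the `shaped_reward` function, and the `token_budget` and `penalty_weight` values are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled answer from the model (hypothetical structure)."""
    correct: bool          # did the final answer pass the grader?
    reasoning_tokens: int  # tokens spent "thinking" before answering


def shaped_reward(rollout: Rollout,
                  token_budget: int = 1000,
                  penalty_weight: float = 0.5) -> float:
    """Correctness reward minus a capped penalty for over-long reasoning.

    Assumption: the penalty grows linearly with the tokens used beyond
    token_budget and is capped at penalty_weight. Meta's actual
    formulation is not public.
    """
    base = 1.0 if rollout.correct else 0.0
    overage = max(0, rollout.reasoning_tokens - token_budget)
    penalty = penalty_weight * min(1.0, overage / token_budget)
    return base - penalty


if __name__ == "__main__":
    for r in (Rollout(True, 500), Rollout(True, 1500), Rollout(True, 5000)):
        print(r.reasoning_tokens, shaped_reward(r))
```

Under these assumed settings, a correct answer that stays within budget keeps the full reward, while an equally correct but longer trace earns less, which is the pressure toward shorter reasoning that the article describes. Capping the penalty below the correctness reward is one design choice that keeps a long correct answer preferable to a short wrong one.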
Mixed benchmark results
In the latest benchmarks, Muse Spark ranks fourth on the Artificial Analysis Intelligence Index v4.0 with a score of 52, behind Gemini 3.1 Pro Preview and GPT-5.4 (both at 57) and Claude Opus 4.6 (at 53). This ranking reflects a mixed performance profile rather than a consistent shortfall.
On GPQA Diamond, a graduate-level scientific reasoning benchmark, Muse Spark scored 89.5%, behind Gemini 3.1 Pro’s 94.3%, OpenAI’s GPT-5.4 at 92.8%, and Claude Opus 4.6 at 92.7%. On the abstract reasoning benchmark ARC-AGI-2, Muse Spark trailed by a much wider margin, scoring 42.5 in Contemplating mode compared to Gemini 3.1 Pro’s 76.5 and GPT-5.4’s 76.1, which suggests that the parallel sub-agent structure does not close the gap on abstract reasoning tasks. In software engineering, Muse Spark achieved 77.4% on SWE-bench Verified.
Notable performance domains
The areas where Muse Spark leads are specific, and they line up with Meta’s particular advantages. On the CharXiv Reasoning benchmark, which evaluates comprehension of figures and charts from images, Muse Spark scored 86.4 in Contemplating mode, surpassing both Gemini 3.1 Pro’s 80.2 and GPT-5.4’s 82.8. On HealthBench Hard, a medical reasoning assessment, Muse Spark scored 42.8%, which reflects training on data curated with more than 1,000 physicians. By comparison, GPT-5.4 scored 40.1% on the same evaluation and Claude Opus 4.6 scored 14.8%.
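To make the mixed picture concrete, the short sketch below tabulates only the scores quoted in this article and computes Muse Spark's margin over the best-scoring rival on each benchmark. The `SCORES` table and the `margin_vs_best_rival` helper are illustrative names, not part of any official tooling. SWE-bench Verified is left out because the article gives no rival figures for it, and each margin covers only the rivals the article names.

```python
# Scores exactly as quoted in the article. Units differ between
# benchmarks, so margins are only meaningful within one benchmark.
SCORES = {
    "GPQA Diamond": {
        "Muse Spark": 89.5, "Gemini 3.1 Pro": 94.3,
        "GPT-5.4": 92.8, "Claude Opus 4.6": 92.7,
    },
    "ARC-AGI-2": {
        "Muse Spark": 42.5, "Gemini 3.1 Pro": 76.5, "GPT-5.4": 76.1,
    },
    "CharXiv Reasoning": {
        "Muse Spark": 86.4, "Gemini 3.1 Pro": 80.2, "GPT-5.4": 82.8,
    },
    "HealthBench Hard": {
        "Muse Spark": 42.8, "GPT-5.4": 40.1, "Claude Opus 4.6": 14.8,
    },
}


def margin_vs_best_rival(scores, model="Muse Spark"):
    """Return (best rival name, model score minus best rival score)."""
    rivals = {name: s for name, s in scores.items() if name != model}
    best_name = max(rivals, key=rivals.get)
    return best_name, round(scores[model] - rivals[best_name], 1)


MARGINS = {bench: margin_vs_best_rival(s) for bench, s in SCORES.items()}

if __name__ == "__main__":
    for bench, (rival, margin) in MARGINS.items():
        print(f"{bench:18s} {margin:+6.1f} vs {rival}")
```

A negative margin means Muse Spark trails the best quoted rival. On these figures it trails on GPQA Diamond and ARC-AGI-2 and leads on CharXiv Reasoning and HealthBench Hard.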
The significance of health and shopping capabilities
The result in the health benchmark is noteworthy. Meta argues that Muse Spark’s strength arises from its combination of general reasoning abilities with specific data advantages, including three billion users, their interests, social connections, and now their health inquiries. Zuckerberg characterized Muse Spark as “a world-class assistant, particularly proficient in areas related to personal superintelligence, such as visual understanding, health, social content, shopping, games, and more” in a Facebook post accompanying the release.
A dedicated shopping mode most clearly exemplifies this argument. This feature leverages content from creators within Meta’s ecosystem, together with signals regarding individual user interests and behaviors, enabling tailored recommendations that a general-purpose model lacking this context would struggle to replicate. Similarly,
