Meta has introduced Muse Spark, which is not open source.

      In summary: Meta has introduced Muse Spark, the inaugural model from Meta Superintelligence Labs, a division formed under Alexandr Wang after Meta's $14.3 billion investment in Scale AI. Built from the ground up over nine months, the model is natively multimodal and features a “Contemplating” reasoning mode that runs multiple sub-agents in parallel. It now powers Meta AI across the company’s platforms and, unlike Meta’s Llama lineage, is closed source.

      The launch caps an effort that began in June 2025, when Mark Zuckerberg unveiled Meta Superintelligence Labs and appointed Wang as the company's first chief AI officer. The directive was clear: catch up with OpenAI, Anthropic, and Google, using a team and infrastructure rebuilt specifically for that purpose. Nine months on, that team has shipped its first model.

      Rebuilding from the ground up

      “Nine months ago, we completely revamped our AI stack,” Wang announced on X on Wednesday. “New infrastructure, architecture, and data pipelines. Muse Spark is the outcome of our efforts, now powering Meta AI.” This statement highlights the extent of the overhaul: it was not just a fine-tuning of existing systems but a fundamental replacement of the core infrastructure for training Meta’s models.

      Internally known as Avocado, the model faced delays earlier in the year due to underperformance in reasoning, coding, and writing during internal evaluations. Wednesday's release indicates that these issues have been sufficiently resolved for Meta to deem it competitive, even if the overall benchmark results remain varied. Wang emphasizes the developmental journey rather than presenting Muse Spark as a definitive solution against industry leaders.

      An integrated approach

      Muse Spark is natively multimodal: it accepts voice, text, and image inputs and launches with text-only output. It offers a fast mode for casual queries and a new “Contemplating” mode, which orchestrates several sub-agents that reason in parallel, competing directly with the advanced reasoning modes of Google’s Gemini Deep Think and OpenAI’s GPT-5.4 Pro. Meta claims that Muse Spark's reasoning uses less than a tenth of the compute of Llama 4 Maverick, thanks to a training technique called “thought compression.” During reinforcement learning, the technique penalizes the model for prolonged thinking, pushing it to solve problems with fewer reasoning tokens without losing accuracy.
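
      The idea of penalizing long reasoning during reinforcement learning can be illustrated with a toy reward function. This is a minimal sketch in Python and not Meta's actual method: the function name, the token budget, the per-token penalty, and the penalty cap are all hypothetical values chosen for illustration.

```python
def length_penalized_reward(
    correct: bool,
    reasoning_tokens: int,
    token_budget: int = 1000,          # hypothetical free allowance of reasoning tokens
    penalty_per_token: float = 0.0001, # hypothetical cost per token beyond the budget
    max_penalty: float = 0.5,          # hypothetical cap so a correct answer always outscores a wrong one
) -> float:
    """Toy reward: 1.0 for a correct answer, reduced for reasoning past the budget."""
    base = 1.0 if correct else 0.0
    overage = max(0, reasoning_tokens - token_budget)
    penalty = min(max_penalty, penalty_per_token * overage)
    return base - penalty


if __name__ == "__main__":
    # A short correct answer earns more than a long correct one.
    print(length_penalized_reward(True, 800))
    print(length_penalized_reward(True, 3000))
```

      Under these assumed settings, a correct answer within the budget keeps the full reward, a correct answer that overruns it earns less, and the cap keeps any correct answer ahead of an incorrect one, which is one way to discourage long reasoning without rewarding wrong answers.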

      Mixed benchmark results

      In the latest benchmarks, Muse Spark ranks fourth on the Artificial Analysis Intelligence Index v4.0, scoring 52, trailing behind Gemini 3.1 Pro Preview and GPT-5.4 (both scoring 57) and Claude Opus 4.6 (scoring 53). This ranking reflects a mixed performance profile rather than a consistent shortfall.

      In the graduate-level GPQA Diamond benchmark for scientific reasoning, Muse Spark scored 89.5%, behind Gemini 3.1 Pro’s 94.3%, OpenAI’s GPT-5.4 at 92.8%, and Claude Opus 4.6 at 92.7%. In the abstract reasoning benchmark, ARC AGI 2, Muse Spark lagged more significantly, scoring 42.5 in Contemplating mode compared to Gemini 3.1 Pro’s 76.5 and GPT-5.4’s 76.1, suggesting that the parallel sub-agent structure does not entirely bridge the gap in abstract reasoning tasks. In software engineering, Muse Spark achieved 77.4% on SWE-bench Verified.
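
      The size of these gaps is simple arithmetic. A minimal sketch in Python, using only the scores quoted above; the dictionary layout and the `gap_to_leader` helper are illustrative names, not part of any benchmark tooling.

```python
# Scores as quoted in the article; models with no quoted score are omitted.
SCORES = {
    "GPQA Diamond": {
        "Muse Spark": 89.5,
        "Gemini 3.1 Pro": 94.3,
        "GPT-5.4": 92.8,
        "Claude Opus 4.6": 92.7,
    },
    "ARC AGI 2": {
        "Muse Spark": 42.5,
        "Gemini 3.1 Pro": 76.5,
        "GPT-5.4": 76.1,
    },
}


def gap_to_leader(benchmark: str, model: str = "Muse Spark") -> float:
    """Return how many points `model` trails the top score on `benchmark`."""
    results = SCORES[benchmark]
    return round(max(results.values()) - results[model], 1)


if __name__ == "__main__":
    for name in SCORES:
        print(name, gap_to_leader(name))
```

      On these figures Muse Spark trails the leader by 4.8 points on GPQA Diamond and by 34.0 points on ARC AGI 2.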

      Notable performance domains

      The areas where Muse Spark excels are specific, and they line up with Meta's particular advantages. On the CharXiv Reasoning benchmark, which evaluates comprehension of figures and charts from images, Muse Spark scored 86.4 in Contemplating mode, surpassing both Gemini 3.1 Pro’s 80.2 and GPT-5.4’s 82.8. On HealthBench Hard, a medical reasoning assessment, Muse Spark scored 42.8%, reflecting training data curated with more than 1,000 physicians. By comparison, GPT-5.4 scored 40.1% on the same evaluation and Claude Opus 4.6 scored 14.8%.

      The significance of health and shopping capabilities

      The result in the health benchmark is noteworthy. Meta argues that Muse Spark’s strength arises from its combination of general reasoning abilities with specific data advantages, including three billion users, their interests, social connections, and now their health inquiries. Zuckerberg characterized Muse Spark as “a world-class assistant, particularly proficient in areas related to personal superintelligence, such as visual understanding, health, social content, shopping, games, and more” in a Facebook post accompanying the release.

      A dedicated shopping mode most clearly exemplifies this argument. This feature leverages content from creators within Meta’s ecosystem, together with signals regarding individual user interests and behaviors, enabling tailored recommendations that a general-purpose model lacking this context would struggle to replicate. Similarly,
