Meta's Muse Spark has arrived – and it's not open source.

      In summary: Meta has launched Muse Spark, the first model from its Superintelligence Labs, a division formed under Alexandr Wang after a $14.3 billion investment in Scale AI. Built from scratch over nine months, the multimodal model offers a “Contemplating” reasoning mode that runs sub-agents in parallel, and it is already integrated into Meta AI across various platforms. Unlike its predecessor Llama, Muse Spark is closed source.

      The launch caps an effort that began in June 2025, when Mark Zuckerberg announced the creation of Meta Superintelligence Labs and named Wang the company's first chief AI officer. The goal was clear: catch up with OpenAI, Anthropic, and Google by building a dedicated team and infrastructure. Nine months later, that team has produced its first model.

      Rebuilding the framework took nine months

      “Nine months ago we rebuilt our AI stack from scratch,” Wang stated on X on Wednesday. “New infrastructure, new architecture, new data pipelines. Muse Spark is the result of that effort, now powerfully enabling Meta AI.” This remark acknowledges the depth of the overhaul: it involved a complete replacement of the foundational infrastructure needed for training Meta's models rather than minor tweaks.

      Internally referred to as Avocado, the model was delayed earlier this year after it underperformed competitors in internal assessments of reasoning, coding, and writing. The Wednesday launch implies that Meta has since improved it enough to be considered competitive, although the benchmark results still present a mixed picture. Wang emphasizes the development process, framing Muse Spark as the first in a series of models rather than a definitive answer to the industry leaders.

      Muse Spark is natively multimodal: it accepts voice, text, and images as input, although its output is initially limited to text. It offers a fast mode for casual queries and a new “Contemplating” mode that coordinates multiple sub-agents reasoning in parallel, positioning it against Google’s Gemini Deep Think and OpenAI’s GPT-5.4 Pro. Meta claims that Muse Spark completes its reasoning tasks with less than a tenth of the compute that Llama 4 Maverick requires. It attributes the saving to a training approach called “thought compression,” which penalizes excessive thinking time during reinforcement learning and so pushes the model to solve problems with fewer reasoning tokens while maintaining accuracy.
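
Meta has not published the formula behind “thought compression,” so the Python sketch below only illustrates the general idea of penalizing long reasoning during reinforcement learning. The function name, the token budget, and the penalty rate are all hypothetical values chosen for illustration; they are not Meta's.

```python
def shaped_reward(
    correct: bool,
    reasoning_tokens: int,
    token_budget: int = 2000,          # hypothetical budget, not a Meta figure
    penalty_per_token: float = 0.0002, # hypothetical penalty rate
) -> float:
    """Illustrative length-penalized reward.

    A correct answer earns 1.0 and an incorrect one 0.0. Every reasoning
    token beyond the budget subtracts a small penalty, so among equally
    correct answers the shorter chain of reasoning scores higher.
    """
    base = 1.0 if correct else 0.0
    overage = max(0, reasoning_tokens - token_budget)
    return base - penalty_per_token * overage


if __name__ == "__main__":
    # A concise correct answer, a verbose correct answer, and a verbose wrong one.
    for correct, tokens in [(True, 1500), (True, 4000), (False, 4000)]:
        print(correct, tokens, round(shaped_reward(correct, tokens), 3))
```

Under these assumed values, a correct answer that stays within the budget keeps its full reward, while a correct answer that overruns it is still rewarded but less so. That ordering is what would nudge a model toward fewer reasoning tokens without giving up accuracy.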

      Benchmark performance presents a complex narrative

      Meta's published evaluations place Muse Spark fourth on the Artificial Analysis Intelligence Index v4.0 with a score of 52, behind Gemini 3.1 Pro Preview and GPT-5.4 (both 57) and Claude Opus 4.6 (53). Behind that overall ranking is an uneven performance profile rather than a uniform deficit.

      On GPQA Diamond, a graduate-level scientific reasoning benchmark, Muse Spark scored 89.5%, behind Gemini 3.1 Pro (94.3%), GPT-5.4 (92.8%), and Claude Opus 4.6 (92.7%). The gap widens sharply on the abstract reasoning benchmark ARC-AGI-2, where Muse Spark scored 42.5 in Contemplating mode against 76.5 for Gemini 3.1 Pro and 76.1 for GPT-5.4, which suggests that its parallel sub-agent design does not close the gap on abstract reasoning tasks. In software engineering, Muse Spark scored 77.4% on SWE-bench Verified.

      Muse Spark does best in a few specialized areas that play to Meta's strengths. On CharXiv Reasoning, which tests understanding of figures and charts, it scored 86.4 in Contemplating mode, ahead of GPT-5.4 (82.8) and Gemini 3.1 Pro (80.2). On HealthBench Hard, a medical reasoning benchmark, it scored 42.8%, helped by training data curated in collaboration with more than 1,000 physicians; GPT-5.4 scored 40.1% and Claude Opus 4.6 only 14.8%.
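
To make the mixed picture concrete, the short Python sketch below tabulates Muse Spark's margin against the strongest rival on each benchmark for which this article reports competitor scores. The figures are copied from the paragraphs above; the dictionary layout and the helper name are illustrative choices, and rivals not listed in the article are not considered.

```python
# Scores as reported in this article (percent or points, per benchmark).
SCORES = {
    "GPQA Diamond": {
        "Muse Spark": 89.5, "Gemini 3.1 Pro": 94.3,
        "GPT-5.4": 92.8, "Claude Opus 4.6": 92.7,
    },
    "ARC-AGI-2": {"Muse Spark": 42.5, "Gemini 3.1 Pro": 76.5, "GPT-5.4": 76.1},
    "CharXiv Reasoning": {"Muse Spark": 86.4, "Gemini 3.1 Pro": 80.2, "GPT-5.4": 82.8},
    "HealthBench Hard": {"Muse Spark": 42.8, "GPT-5.4": 40.1, "Claude Opus 4.6": 14.8},
}


def margin_vs_best_rival(benchmark: str, model: str = "Muse Spark"):
    """Return (best rival, model score minus that rival's score)."""
    scores = SCORES[benchmark]
    rivals = [(name, score) for name, score in scores.items() if name != model]
    rival, rival_score = max(rivals, key=lambda pair: pair[1])
    return rival, round(scores[model] - rival_score, 1)


if __name__ == "__main__":
    for name in SCORES:
        rival, margin = margin_vs_best_rival(name)
        print(f"{name}: {margin:+.1f} vs {rival}")
```

On these numbers, Muse Spark trails the best listed rival by 4.8 points on GPQA Diamond and by 34.0 on ARC-AGI-2, and leads by 3.6 on CharXiv Reasoning and by 2.7 on HealthBench Hard.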

      Shopping, health, and the concept of 'personal superintelligence'

      The result on the health benchmark is significant. Meta's differentiation strategy for Muse Spark hinges on combining general reasoning skills with specific data strengths that Meta possesses over competitors, including three billion users, their interests, social graphs, and health inquiries. Zuckerberg characterized Muse Spark as “a world-class assistant, particularly effective in areas related to personal superintelligence such as visual understanding, health, social content, shopping, games, and more” in a Facebook post accompanying the launch.

      A specialized shopping mode is a prominent embodiment of this concept. This function utilizes content from creators within Meta’s ecosystem and individual user signals
