Fable 5 vs. GPT 5.5: Anthropic's model led every benchmark, then was withdrawn by the government

Fable 5 surpassed GPT 5.5 on every significant benchmark, but the US government pulled it after just three days, leaving GPT 5.5 as the leading model currently available. Anthropic’s Fable 5 was briefly the most advanced AI model ever released to the public, taking the top spot on the Chatbot Arena leaderboard and significantly outperforming OpenAI’s GPT 5.5 on coding benchmarks. On June 12, however, the US government ordered Anthropic to shut it down.

The result is an unusual situation in the AI landscape. The model that clearly outperforms all others is unavailable, while GPT 5.5, launched by OpenAI in late April under the internal codename “Spud,” is now the most powerful model that developers and consumers can access. It holds that position not because it improved, but because its main rival was withdrawn.

The benchmark gap between the two models is substantial. On SWE-Bench Pro, which measures how well a model resolves real software engineering problems in open-source codebases, Fable 5 scored 80.3% to GPT 5.5’s 58.6%, a gap of nearly 22 points. On SWE-Bench Verified, a human-validated subset of the original SWE-Bench, Fable 5 reached 95.0%.

Other coding benchmarks show the same pattern. Fable 5 leads the Code Arena by 164 Elo points, scoring 1,665 to GPT 5.5’s 1,501. On the challenging FrontierCode Diamond benchmark, Fable 5 scored 29.3%, while GPT 5.5 reached only 5.7%. On the overall Chatbot Arena leaderboard, Fable 5 ranks first and GPT 5.5 fourth.
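To give a rating gap like this some intuition, the two Code Arena scores can be converted into an expected head-to-head preference rate. The sketch below assumes the ratings behave like standard Elo ratings on a 400-point logistic scale; the leaderboard’s actual rating method is not described in this article, so treat the result as a rough illustration only. The function name is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic curve.

    Assumption: ratings are on the conventional 400-point Elo scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Code Arena scores as quoted in the article.
fable_rating = 1665
gpt_rating = 1501

rating_gap = fable_rating - gpt_rating
fable_win_rate = elo_expected_score(fable_rating, gpt_rating)

print(f"Rating gap: {rating_gap} points")
print(f"Expected preference rate for Fable 5: {fable_win_rate:.1%}")
```

Under that assumption, the quoted scores imply that Fable 5 would be preferred in roughly 72% of direct comparisons.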

GPT 5.5 comes closest on Terminal-Bench 2.0, which assesses interactive terminal-based coding tasks. Here GPT 5.5 scored 82.7%, only slightly below Fable 5’s approximately 88.0%. The gap is narrower because the benchmark evaluates a different skill set, focusing on executing commands and debugging in real time rather than handling large code repositories.

Cost also favors OpenAI. GPT 5.5 is priced at $5 per million input tokens and $30 per million output tokens, against Fable 5’s $10 and $50, respectively: half the input price and 60% of the output price. For developers running high-volume applications, where price matters more than peak performance, GPT 5.5 is often the more practical option even when both models are available.
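How much cheaper GPT 5.5 is in practice depends on the mix of input and output tokens. The following sketch applies the per-million-token prices quoted above to a hypothetical monthly workload; the token volumes and the `Pricing` class are illustrative assumptions, not figures from either vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Total cost in USD for the given token counts."""
        total = (input_tokens * self.input_per_million
                 + output_tokens * self.output_per_million)
        return total / 1_000_000


# Prices as quoted in the article.
GPT_5_5 = Pricing(input_per_million=5.0, output_per_million=30.0)
FABLE_5 = Pricing(input_per_million=10.0, output_per_million=50.0)

# Hypothetical workload: 200M input tokens and 40M output tokens per month.
input_tokens = 200_000_000
output_tokens = 40_000_000

gpt_cost = GPT_5_5.cost(input_tokens, output_tokens)
fable_cost = FABLE_5.cost(input_tokens, output_tokens)

print(f"GPT 5.5: ${gpt_cost:,.2f}")
print(f"Fable 5: ${fable_cost:,.2f}")
print(f"GPT 5.5 costs {gpt_cost / fable_cost:.0%} of Fable 5 for this workload")
```

Because the discount is larger on input than on output, input-heavy workloads move the ratio toward 50% and output-heavy workloads move it toward 60%.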

Launched on June 9, Fable 5 was Anthropic’s first public release of a Mythos-class model, featuring a one-million-token context window and an output limit of 128,000 tokens. It was offered at no additional charge to Pro, Max, Team, and Enterprise subscribers until June 22, but the government intervention cut this promotional period short after only three days.

The shutdown followed an export control directive issued on June 12, which cited a jailbreak vulnerability as the reason for removing both Fable 5 and the Mythos 5 model family. Anthropic has disputed the severity of this finding, arguing that the identified vulnerabilities are minor and publicly acknowledged, and that the same outputs can be obtained from GPT 5.5 without special techniques. Reports suggest that Amazon CEO Andy Jassy may have influenced the government’s review process.

The practical outcome is that developers and researchers who were evaluating Fable 5 for real-world applications must fall back on GPT 5.5 or Anthropic’s earlier Opus models. For coding-centered workflows the downgrade is significant: the 22-point gap on SWE-Bench Pro is the difference between a model that resolves roughly 80% of real software issues and one that handles about 60%.

The future of Fable 5 hinges on Anthropic’s negotiations with the government over the export control classification. The company has publicly argued that the directive is excessive and that the alleged vulnerabilities do not warrant withdrawing the model entirely. Until the issue is settled, GPT 5.5 remains the leading model by default: the best available option not because it is superior, but because its true competitor is absent.
