Fable 5 compared to GPT 5.5: Anthropic's model led on every benchmark but was then withdrawn by the government.
Fable 5 surpassed GPT 5.5 on every major benchmark but was pulled by the US government after just three days, leaving GPT 5.5 as the most capable model currently available. Anthropic’s Fable 5 briefly held the title of the most advanced AI model ever released to the public, taking the top spot on the Chatbot Arena leaderboard and significantly outperforming OpenAI’s GPT 5.5 on coding benchmarks. However, on June 12, the US government ordered Anthropic to shut it down.
This creates an unusual situation in the AI landscape. The model that clearly outperforms all others cannot be used, while GPT 5.5, which OpenAI launched in late April under the internal codename “Spud,” is now the most powerful model available to developers and consumers, not because of its own improvements but because its main rival was withdrawn.
The benchmark gap between the two models is substantial. On SWE-Bench Pro, which measures how well a model resolves real software engineering issues in open-source codebases, Fable 5 scored 80.3% to GPT 5.5’s 58.6%, a gap of roughly 22 points. On SWE-Bench Verified, a human-validated subset of the original SWE-Bench, Fable 5 reached 95.0%.
Other coding benchmarks show the same trend. Fable 5 leads the Code Arena with an Elo score of 1,665 against GPT 5.5’s 1,501, a 164-point margin. On the challenging FrontierCode Diamond benchmark, Fable 5 scored 29.3%, while GPT 5.5 reached only 5.7%. On the overall Chatbot Arena leaderboard, Fable 5 ranks first and GPT 5.5 fourth.
GPT 5.5 holds up best on Terminal-Bench 2.0, which assesses interactive, terminal-based coding tasks. There it scored 82.7%, still below Fable 5’s roughly 88.0%. The gap is narrower because this benchmark tests a different skill set: executing commands and debugging in real time rather than working across large code repositories.
Cost, however, favors OpenAI. GPT 5.5 is priced at $5 per million input tokens and $30 per million output tokens, compared with Fable 5’s $10 and $50, respectively. That makes GPT 5.5 half as expensive on input and 40% cheaper on output. For high-volume applications where price matters more than peak performance, GPT 5.5 is often the more practical choice even when both models are available.
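To see how the per-token prices translate into a bill, here is a minimal sketch that applies the listed rates to a token workload. The prices come from the figures above; the workload size (200 million input tokens and 40 million output tokens) is a hypothetical example, and the calculation ignores any caching, batch, or volume discounts either provider may offer.

```python
# Per-million-token list prices in USD, as quoted in this article.
PRICES = {
    "GPT 5.5": {"input": 5.0, "output": 30.0},
    "Fable 5": {"input": 10.0, "output": 50.0},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the listed per-million-token rates.

    Assumption: simple linear pricing with no discounts or cached-token rates.
    """
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200M input tokens and 40M output tokens.
    for model in PRICES:
        cost = workload_cost(model, 200_000_000, 40_000_000)
        print(f"{model}: ${cost:,.2f}")
```

For this mix the sketch gives $2,200 for GPT 5.5 and $4,000 for Fable 5, so GPT 5.5 costs 55% as much. Because the input and output discounts differ, the exact ratio depends on the workload: it approaches 50% for input-heavy jobs and 60% for output-heavy ones.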
Launched on June 9, Fable 5 was Anthropic’s first public release of a Mythos-class model, with a one-million-token context window and up to 128,000 output tokens. It was offered at no additional charge to Pro, Max, Team, and Enterprise subscribers until June 22, but the government intervention cut that promotional period short after only three days.
The shutdown followed an export control directive issued on June 12 that cited a jailbreak vulnerability as grounds for removing both Fable 5 and the Mythos 5 model family. Anthropic has disputed the severity of this finding, arguing that the identified vulnerabilities are minor and already publicly known, and that GPT 5.5 can produce the same results without any special techniques. Reports suggest that Amazon CEO Andy Jassy may have influenced the government’s review process.
In practice, developers and researchers who were evaluating Fable 5 for real-world use must fall back on GPT 5.5 or Anthropic’s earlier Opus models. For coding-centered workflows the downgrade is significant: the 22-point gap on SWE-Bench Pro is the difference between a model that resolves roughly 80% of the benchmark’s real-world software issues and one that resolves about 60%.
The future of Fable 5 depends on Anthropic’s negotiations with the government over the export control classification. The company has publicly argued that the directive is excessive and that the alleged vulnerabilities do not justify withdrawing the model entirely. Until the issue is resolved, GPT 5.5 remains the leading model by default: the best available option not because it is superior, but because its strongest competitor is off the market.
