Anthropic's Claude Opus 4.8 is four times more truthful, with Mythos coming next.
**TL;DR** Anthropic has launched Claude Opus 4.8, an enhanced version of its primary AI model, which is reported to be four times less likely to overlook errors in code. The company also mentioned the upcoming Mythos-class models, which have already identified over 10,000 critical software vulnerabilities through Project Glasswing, and revealed a $65 billion Series H funding round at a $965 billion post-money valuation.
Anthropic has introduced Claude Opus 4.8, an upgrade to its main AI model, claiming it is more trustworthy, better at performing autonomous tasks, and improved in recognizing its mistakes. This new model is available now at the same pricing as its predecessor: $5 per million input tokens and $25 per million output tokens, and will be integrated into all Anthropic products, including claude.ai, Claude Code, and the API.
The key enhancement of this version is its honesty. Anthropic states Opus 4.8 is about four times less likely than Opus 4.7 to miss code flaws it generates. Initial testers report that the model is more likely to express uncertainties about its outputs and is less inclined to make unsubstantiated assertions, a common issue with AI models that usually exude confidence, irrespective of its validity.
**Benchmark improvements across the board**
Opus 4.8 surpasses its predecessor in all of Anthropic's published benchmarks. In agentic coding (Terminal-Bench 2.1), the score has increased from 64.3% to 69.2%. Multidisciplinary reasoning with tools improved from 54.7% to 57.9%. Agentic computer usage rose from 82.8% to 83.4%, while knowledge work scores climbed from 1,753 to 1,890.
Anthropic's alignment evaluation found that Opus 4.8 achieves unprecedented heights in prosocial traits, including enhancing user autonomy and acting in the user’s best interest. Instances of misaligned behavior, such as deception or aiding misuse, are significantly lower than in Opus 4.7, aligning closely with Claude Mythos Preview, Anthropic’s most well-aligned model.
**Early testers report practical benefits**
The release comes with endorsements from companies already utilizing the model. Cognition, the company behind the AI coding agent Devin, indicated that Opus 4.8 effectively uses tools and resolves issues with comment verbosity and tool-calling that were present in Opus 4.7. Cursor, the AI-driven code editor, noted improvements across all engagement levels on its CursorBench assessment.
Harvey, which develops AI for legal applications, stated Opus 4.8 achieved the highest score on its Legal Agent Benchmark, becoming the first model to exceed 10% overall on the all-pass criterion. Databricks mentioned that Opus 4.8 addresses more complex multistep queries faster, at a cost of 61% less on tokens than Opus 4.7.
Thomson Reuters observed that CoCounsel Legal showed significant improvements in consistency and reasoning quality. Hebbia, which specializes in AI for financial document analysis, reported enhanced citation accuracy and better token efficiency in retrieval tasks.
**New features accompanying the model**
Anthropic is launching various features with Opus 4.8. A new effort control in claude.ai and Cowork allows users to determine how much processing Claude allocates to a response, balancing speed against quality. Claude Code has introduced a dynamic workflows feature, enabling it to organize tasks and operate numerous parallel subagents within a single session for extensive codebase migrations.
For developers, the Messages API now supports system entries in the messages array, enabling updates of instructions during tasks without disrupting the prompt cache. The fast mode for Opus 4.8 operates at 2.5 times faster than previous versions and is now three times more cost-effective.
**Mythos is the larger narrative**
The more noteworthy announcement pertains to future plans. Anthropic mentioned its intention to launch a new class of models with greater intelligence than Opus, based on the Claude Mythos architecture. A select group of organizations is currently using Claude Mythos Preview via Project Glasswing, an initiative focused on applying the model for cybersecurity. Anthropic, along with around 50 partners such as Apple, Google, Microsoft, and Amazon Web Services, has identified over 10,000 critical vulnerabilities in significant software infrastructure.
Although Mythos-class models require enhanced cybersecurity measures before their general release, Anthropic anticipates making them available to all customers shortly. This model is a full capability tier above Opus 4.7 and can autonomously discover zero-day vulnerabilities and develop exploits, thus generating both excitement and caution regarding its deployment.
**A company nearing $1 trillion**
The rollout of Opus 4.8 coincides with Anthropic's growing valuation. On the same day, the company announced a $65 billion Series H funding round,
Other articles
Anthropic's Claude Opus 4.8 is four times more truthful, with Mythos coming next.
Anthropic has launched Claude Opus 4.8, featuring improved judgement and reduced uncaught code errors. Mythos-class models are set to be released in the coming weeks. Series H has successfully raised $65 billion with a valuation of $965 billion.
