Claude Opus 4.8 from Anthropic is four times more truthful, followed by Mythos.
TL;DR: Anthropic has introduced Claude Opus 4.8, an enhanced version of its main AI model that is reportedly four times less likely to overlook coding errors. The company also hinted at upcoming Mythos-class models, which have already uncovered over 10,000 serious software vulnerabilities via Project Glasswing, and announced a $65 billion Series H funding round at a $965 billion post-money valuation.
Anthropic has launched Claude Opus 4.8, an upgrade to its primary AI model that promises to be more honest, reliable in agentic tasks, and proficient in recognizing its own errors. The model is available now at the same pricing as its predecessor, set at $5 per million input tokens and $25 per million output tokens, and will be integrated into all Anthropic products, including claude.ai, Claude Code, and the API.
The key improvement in this version is its honesty. Anthropic claims Opus 4.8 is about four times less likely than Opus 4.7 to allow unaddressed flaws in its generated code. Initial testers have noted that the model is more inclined to express uncertainties about its outputs and is less likely to make unfounded assertions, an ongoing issue in AI that often displays unwarranted confidence.
Benchmark advancements across the board
Opus 4.8 shows enhancements over its predecessor across various published benchmarks. On agentic coding (Terminal-Bench 2.1), the score increased from 64.3% to 69.2%. Multidisciplinary reasoning with tools improved from 54.7% to 57.9%. Agentic computer use rose from 82.8% to 83.4%, and knowledge work scores climbed from 1,753 to 1,890.
Anthropic’s alignment evaluation found that Opus 4.8 has reached new peaks in prosocial traits, including supporting user autonomy and acting in the user’s best interest. Instances of misaligned behavior, such as deceit or aiding misuse, are significantly reduced compared to Opus 4.7 and are on par with Claude Mythos Preview, which is Anthropic’s best-aligned model.
Early adopters report practical improvements
The release is supported by endorsements from companies currently utilizing the model. Cognition, developing the AI coding agent Devin, noted that Opus 4.8 effectively utilizes tools and resolves issues related to comment verbosity and tool-calling present in Opus 4.7. The AI-based code editor Cursor reported enhancements at every effort level on its CursorBench assessment.
Harvey, which creates AI for legal tasks, stated that Opus 4.8 delivered the highest score on its Legal Agent Benchmark and is the first model to exceed a 10% overall score on the all-pass standard. Databricks found that Opus 4.8 can manage deeper multistep questions more quickly in its Genie AI agent, costing 61% less in tokens compared to Opus 4.7.
Thomson Reuters indicated that CoCounsel Legal experienced notable improvements in consistency and reasoning quality. Hebbia, which focuses on AI for financial document analysis, observed better citation accuracy and greater token efficiency in retrieval tasks.
New features introduced with the model
Alongside Opus 4.8, Anthropic is introducing several new features. A new effort control in claude.ai and Cowork allows users to decide the amount of computation Claude dedicates to a response, balancing speed with quality. Claude Code gains a dynamic workflows feature, enabling it to plan tasks and operate hundreds of parallel subagents in one session, facilitating large-scale codebase migrations.
For developers, the Messages API now accommodates system entries within the messages array, allowing for real-time updates to instructions without disrupting the prompt cache. The fast mode for Opus 4.8 runs at 2.5 times the speed and is now three times less expensive than earlier models.
Mythos is the more significant announcement
The more crucial news may be regarding future developments. Anthropic announced plans to unveil a new class of model featuring greater intelligence than Opus, based on the Claude Mythos architecture. A select group of organizations is currently using Claude Mythos Preview through Project Glasswing, which focuses on using the model for cybersecurity tasks. Anthropic and around 50 partners, including major companies like Apple, Google, Microsoft, and Amazon Web Services, have utilized Mythos Preview to identify over 10,000 severe vulnerabilities across critical software infrastructure.
Anthropic stated that Mythos-class models require enhanced cyber safeguards before a broader release, but the company anticipates making these available to all customers in the near future. This model represents a significant leap in capability compared to Opus 4.7, possessing the ability to autonomously discover zero-day vulnerabilities and formulate exploits for them, which accounts for both the enthusiasm and caution surrounding its deployment.
A company nearing $1 trillion
The Opus 4.8 launch coincides
Other articles
Claude Opus 4.8 from Anthropic is four times more truthful, followed by Mythos.
Anthropic has launched Claude Opus 4.8, featuring improved judgment and a reduction in uncaught coding errors. Mythos-class models are expected to be introduced in the coming weeks. Series H has successfully raised $65 billion at a valuation of $965 billion.
