Claude Opus 4.8 from Anthropic is four times more truthful, followed by Mythos.

      TL;DR: Anthropic has introduced Claude Opus 4.8, an enhanced version of its main AI model that is reportedly four times less likely to overlook coding errors. The company also hinted at upcoming Mythos-class models, which have already uncovered over 10,000 serious software vulnerabilities via Project Glasswing, and announced a $65 billion Series H funding round at a $965 billion post-money valuation.

      Anthropic has launched Claude Opus 4.8, an upgrade to its primary AI model that promises to be more honest, reliable in agentic tasks, and proficient in recognizing its own errors. The model is available now at the same pricing as its predecessor, set at $5 per million input tokens and $25 per million output tokens, and will be integrated into all Anthropic products, including claude.ai, Claude Code, and the API.

      The key improvement in this version is its honesty. Anthropic claims Opus 4.8 is about four times less likely than Opus 4.7 to allow unaddressed flaws in its generated code. Initial testers have noted that the model is more inclined to express uncertainties about its outputs and is less likely to make unfounded assertions, an ongoing issue in AI that often displays unwarranted confidence.

      Benchmark advancements across the board

      Opus 4.8 shows enhancements over its predecessor across various published benchmarks. On agentic coding (Terminal-Bench 2.1), the score increased from 64.3% to 69.2%. Multidisciplinary reasoning with tools improved from 54.7% to 57.9%. Agentic computer use rose from 82.8% to 83.4%, and knowledge work scores climbed from 1,753 to 1,890.

      Anthropic’s alignment evaluation found that Opus 4.8 has reached new peaks in prosocial traits, including supporting user autonomy and acting in the user’s best interest. Instances of misaligned behavior, such as deceit or aiding misuse, are significantly reduced compared to Opus 4.7 and are on par with Claude Mythos Preview, which is Anthropic’s best-aligned model.

      Early adopters report practical improvements

      The release is supported by endorsements from companies currently utilizing the model. Cognition, developing the AI coding agent Devin, noted that Opus 4.8 effectively utilizes tools and resolves issues related to comment verbosity and tool-calling present in Opus 4.7. The AI-based code editor Cursor reported enhancements at every effort level on its CursorBench assessment.

      Harvey, which creates AI for legal tasks, stated that Opus 4.8 delivered the highest score on its Legal Agent Benchmark and is the first model to exceed a 10% overall score on the all-pass standard. Databricks found that Opus 4.8 can manage deeper multistep questions more quickly in its Genie AI agent, costing 61% less in tokens compared to Opus 4.7.

      Thomson Reuters indicated that CoCounsel Legal experienced notable improvements in consistency and reasoning quality. Hebbia, which focuses on AI for financial document analysis, observed better citation accuracy and greater token efficiency in retrieval tasks.

      New features introduced with the model

      Alongside Opus 4.8, Anthropic is introducing several new features. A new effort control in claude.ai and Cowork allows users to decide the amount of computation Claude dedicates to a response, balancing speed with quality. Claude Code gains a dynamic workflows feature, enabling it to plan tasks and operate hundreds of parallel subagents in one session, facilitating large-scale codebase migrations.

      For developers, the Messages API now accommodates system entries within the messages array, allowing for real-time updates to instructions without disrupting the prompt cache. The fast mode for Opus 4.8 runs at 2.5 times the speed and is now three times less expensive than earlier models.

      Mythos is the more significant announcement

      The more crucial news may be regarding future developments. Anthropic announced plans to unveil a new class of model featuring greater intelligence than Opus, based on the Claude Mythos architecture. A select group of organizations is currently using Claude Mythos Preview through Project Glasswing, which focuses on using the model for cybersecurity tasks. Anthropic and around 50 partners, including major companies like Apple, Google, Microsoft, and Amazon Web Services, have utilized Mythos Preview to identify over 10,000 severe vulnerabilities across critical software infrastructure.

      Anthropic stated that Mythos-class models require enhanced cyber safeguards before a broader release, but the company anticipates making these available to all customers in the near future. This model represents a significant leap in capability compared to Opus 4.7, possessing the ability to autonomously discover zero-day vulnerabilities and formulate exploits for them, which accounts for both the enthusiasm and caution surrounding its deployment.

      A company nearing $1 trillion

      The Opus 4.8 launch coincides

Other articles

You ought to experience Marathon before the internet makes the decision for you. Marathon's Open Play Week is coming at the perfect moment, making it an excellent opportunity for players to experience Bungie's sleek extraction shooter firsthand.

The hybrid model: reasons the most intelligent finance teams are not fully committing to AI. AI is revolutionizing finance processes, but it is unable to create a financial model on its own. The organizations that are truly benefiting combine the speed of machines with the insights of human judgment, rather than relying on one alone.

Waymo launches more affordable Ojai robotaxi manufactured by China's Geely. Waymo's latest Ojai robotaxi reduces its sensors by 42% and is priced at $75,000 less than the Jaguar I-PACE. Manufactured by Geely's Zeekr in China, it is set to debut in three cities across the US.

Anthropic's Claude Opus 4.8 is four times more truthful, followed by Mythos. Anthropic has launched Claude Opus 4.8, featuring improved judgement and reduced instances of uncaught code errors. Mythos-class models are expected to arrive in the coming weeks. Series H has secured $65 billion with a valuation of $965 billion.

Waymo launches a more affordable Ojai robotaxi manufactured by China's Geely. Waymo's latest Ojai robotaxi reduces sensor usage by 42% and is priced $75,000 lower than the Jaguar I-PACE. Manufactured by Geely's Zeekr in China, it will debut in three cities in the US.

BYD introduces China's inaugural 4nm driving chip and enhances God's Eye. BYD's Xuanji A3 is the initial 4nm automotive chip in China, delivering 700 TOPS. The God's Eye driver assistance system is now extending to mainstream electric vehicles as sales decline for the eighth consecutive month.

Claude Opus 4.8 from Anthropic is four times more truthful, followed by Mythos.

Anthropic has launched Claude Opus 4.8, featuring improved judgment and a reduction in uncaught coding errors. Mythos-class models are expected to be introduced in the coming weeks. Series H has successfully raised $65 billion at a valuation of $965 billion.