Anthropic's Claude Opus 4.8 is four times less likely to miss its own code flaws, with Mythos to follow.

**TL;DR** Anthropic has launched Claude Opus 4.8, an upgrade to its flagship AI model that is significantly better at catching errors in its own code. The company also signaled that Mythos-class models are on the way; through Project Glasswing, Mythos Preview has already found more than 10,000 high- or critical-severity vulnerabilities. Anthropic also announced a $65 billion Series H funding round at a post-money valuation of $965 billion.

Anthropic has unveiled Claude Opus 4.8, a refined version of its flagship AI model that it says is more honest, more reliable on agentic tasks, and better at recognizing its own mistakes. The model keeps its predecessor's pricing of $5 per million input tokens and $25 per million output tokens, and it is coming to all of Anthropic's products, including claude.ai, Claude Code, and the API.
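To make the unchanged pricing concrete, here is a minimal sketch that estimates the cost of a single request from its token counts. It assumes only the two list prices quoted above and deliberately ignores discounts such as prompt caching as well as fast-mode pricing; the helper name `request_cost_usd` is hypothetical and not part of any Anthropic SDK.

```python
# Hypothetical helper: estimate the cost of one Opus 4.8 request from token counts.
# Assumes only the list prices quoted in the article; prompt-caching discounts
# and fast-mode pricing are NOT modeled.

INPUT_PRICE_PER_MTOK = 5.00    # USD per million input tokens
OUTPUT_PRICE_PER_MTOK = 25.00  # USD per million output tokens


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in US dollars for one request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 20,000-token prompt that produces a 2,000-token answer.
    cost = request_cost_usd(20_000, 2_000)
    print(f"${cost:.3f}")
```

For that example the estimate is about $0.15: $0.10 for the input and $0.05 for the output. Because each output token costs five times as much as an input token, long responses tend to dominate the bill even when prompts are much larger.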

The headline improvement is honesty. According to Anthropic, Opus 4.8 is roughly four times less likely than Opus 4.7 to overlook flaws in code it has written itself. Early testers say the model is more willing to acknowledge uncertainty and less prone to making unsupported claims, a common problem in AI models, which often express unwarranted confidence.

**Benchmark improvements across all metrics**

Opus 4.8 improves on its predecessor across Anthropic's established benchmarks. Its agentic coding score on Terminal-Bench 2.1 rose from 64.3% to 69.2%, and multidisciplinary reasoning with tools improved from 54.7% to 57.9%. Agentic computer use edged up from 82.8% to 83.4%, and the knowledge-work rating climbed from 1,753 to 1,890.

Anthropic's alignment evaluation found that Opus 4.8 set records on prosocial traits, such as supporting user autonomy and acting in users' best interests. Rates of misaligned behavior, such as deception or complicity in misuse, are considerably lower than in Opus 4.7 and comparable to those of Claude Mythos Preview, Anthropic's best-aligned model.

**Early testers report practical improvements**

The launch has drawn positive feedback from companies already using the model. Cognition, the developer of the AI coding agent Devin, said Opus 4.8 uses tools effectively and resolves the verbosity and tool-calling issues seen in Opus 4.7. Cursor, the AI-driven code editor, reported improvements at every performance level on its CursorBench evaluation.

Harvey, which builds AI for legal work, said Opus 4.8 posted the highest score yet recorded on its Legal Agent Benchmark and was the first model to exceed 10% on the all-pass standard. Databricks said Opus 4.8 handles more complex, multi-step queries faster and at 61% lower token cost than Opus 4.7.

Thomson Reuters said its CoCounsel Legal product saw significant improvements in consistency and reasoning quality. Hebbia, which focuses on AI for financial document analysis, reported better citation accuracy and token efficiency on retrieval tasks.

**New features launching with the model**

Alongside Opus 4.8, Anthropic is rolling out several new features. A new effort control in claude.ai and Cowork lets users decide how much computation Claude spends on a response, trading speed against quality. Claude Code gains a dynamic workflow feature that can plan and run hundreds of parallel subagents in a single session, making codebase-scale migrations across large amounts of code practical.

For developers, the Messages API now accepts system entries inside the messages array, so task instructions can be changed partway through a task without invalidating the prompt cache. Fast mode for Opus 4.8 runs at 2.5 times the speed of standard mode and is now a third cheaper than fast mode was for earlier models.

**Mythos is the more significant development**

The bigger story may be what comes next. Anthropic plans to release a new class of models, more capable than Opus, based on Claude Mythos. A select group of organizations already uses Claude Mythos Preview through Project Glasswing, which applies the model to cybersecurity work. Anthropic and about 50 partners, including Apple, Google, Microsoft, and Amazon Web Services, have used Mythos Preview to identify more than 10,000 high- or critical-severity vulnerabilities across essential software infrastructure.

Although Mythos-class models require stronger cybersecurity safeguards before general release, Anthropic intends to make them available to all customers in the coming weeks. Mythos Preview outperforms Opus 4.7 and can autonomously find zero-day vulnerabilities and build working exploits for them, which makes its deployment a source of both excitement and caution.

**A company approaching a trillion-dollar valuation**

The launch of Opus 4.8 coincides
