Google, Microsoft, and xAI agree to pre-release government evaluations of their AI models as the Mythos crisis forces tighter oversight.


      **Summary**: Google, Microsoft, and xAI have joined OpenAI and Anthropic in granting the U.S. Commerce Department pre-release access to assess their AI models, establishing a voluntary oversight system among five major AI labs through an office lacking formal authority and staffed by fewer than 200 individuals. This expansion was prompted by the Mythos crisis and potential executive orders aimed at formalizing the review process.

      The Mythos crisis compelled the U.S. government to address an issue it had previously sidestepped: how to handle an AI model powerful enough to pose a national security risk when no formal mechanism exists to evaluate it before public release. On Tuesday, the Commerce Department announced that Google, Microsoft, and xAI would provide the U.S. government with pre-release access to their AI models for assessment. They join OpenAI and Anthropic, which have been submitting their models for evaluation to the same agency since 2024. Five companies now account for a significant share of global frontier AI development, and all have agreed to let a single government office examine their systems before they are made public. This arrangement is voluntary, has no statutory backing, and does not grant the government the power to prevent a release. Nevertheless, it is the closest thing to an AI oversight system that the U.S. has, and a small office put it in place within two years.

      The Office

      The Center for AI Standards and Innovation sits within the National Institute of Standards and Technology in the Commerce Department. It was founded under President Biden in 2023 as the AI Safety Institute and was reorganized under Trump with a focus on standards and national security instead of safety research. The center has completed more than 40 evaluations of AI models, including advanced systems not yet publicly available. Developers frequently submit versions with reduced safety measures so evaluators can investigate capabilities relevant to national security, including biological weapon synthesis, cyberattack automation, and potentially uncontrollable autonomous behaviors.

      Chris Fall now leads the center, succeeding Collin Burns, a former AI researcher at Anthropic who was ousted by the White House just four days into his appointment. Burns had left Anthropic, forfeited valuable stock, and relocated to accept the government role. His dismissal, reportedly due to his ties with an actively contested company, underscores the political complexities involved in creating an oversight structure for an industry where evaluators and the evaluated share the same talent pool. Trump's broader regulatory strategy on AI has emphasized federal preemption over state laws and a non-intrusive stance toward industry, though the model evaluation program reflects a more assertive approach: the government seeks to understand these systems before anyone else does.

      The Agreements

      The new agreements with Google, Microsoft, and xAI extend what was previously a two-company arrangement to broader coverage of the frontier. OpenAI and Anthropic have revised their existing agreements to align with Trump’s AI Action Plan, which instructs the center to lead assessments related to national security and integrates it into a wider "evaluations ecosystem." These agreements are not formal contracts but voluntary commitments that companies can withdraw from at any point. There are no legal requirements for pre-release evaluation, and the center lacks the authority to delay or prohibit deployment. The entire setup relies on the AI companies choosing to provide early access for strategic reasons.

      From the companies' viewpoint, the alternative could be more stringent legislation. Several proposed bills would grant the center permanent statutory authority, compulsory evaluation requirements, and the ability to set conditions on deployment. The Pentagon has shown a willingness to blacklist AI companies that resist government demands, labeling Anthropic a supply-chain risk after it declined to permit the use of its models for autonomous weapons or mass domestic surveillance. These voluntary evaluation agreements serve, in part, as a way for the remaining companies to demonstrate their cooperation before it becomes obligatory.

      The Catalyst

      The evaluation program is expanding against the backdrop of the Mythos crisis. Anthropic’s innovative model, announced in April, can autonomously find and exploit zero-day vulnerabilities across major operating systems and web browsers, and it has detected thousands of significant bugs, some of which had persisted undetected for decades. The White House has opposed Anthropic’s efforts to broaden access to Mythos beyond its initial partners. Despite the Pentagon's blacklisting, the NSA is using it. The EU is pressing for access to Mythos for European cybersecurity efforts, contending that a critical cybersecurity tool cannot remain under the exclusive control of a U.S. company partially blacklisted by the American government.

      Mythos exemplified the type of model the evaluation program aims to identify: one with capabilities that have immediate national security implications and that cannot be assessed post-deployment. The center's more than 40 evaluations since 2024 presumably uncovered capabilities in unreleased models that influenced policy decisions, but these evaluations were conducted under agreements with only two companies. Google’s Gemini, Microsoft’s models, and xAI’s Grok had not previously undergone pre-release review by the government. The new agreements close this gap, ensuring that any future model with Mythos-level capabilities reaches government

