Google, Microsoft, and xAI have consented to share evaluations of AI models with the government before their release, as the Mythos crisis necessitates an increase in oversight.

Google, Microsoft, and xAI have consented to share evaluations of AI models with the government before their release, as the Mythos crisis necessitates an increase in oversight.

      **Summary:** Google, Microsoft, and xAI have partnered with OpenAI and Anthropic to grant the US Commerce Department pre-release access to their AI models for evaluation, establishing a voluntary oversight mechanism among the five major frontier AI labs. This development follows the Mythos crisis and a potential executive order aimed at formalizing the review process.

      The Mythos crisis compelled the US government to address the risks posed by powerful AI models that could jeopardize national security without any formal evaluation process. On Tuesday, the Commerce Department revealed that Google, Microsoft, and xAI will provide the government with pre-release access to their AI models. They join OpenAI and Anthropic, which have been submitting their models to the office since 2024. Now, these five companies, representing most of the global frontier AI development, have consented to allow a single government office to assess their systems before deployment. This arrangement, which lacks statutory grounding, gives the government no authority to block releases and serves as the closest approximation to an AI oversight system in the U.S., developed in under two years by an office with fewer than 200 personnel.

      **The Office:**

      The Center for AI Standards and Innovation operates within the Commerce Department’s National Institute of Standards and Technology. It was established by President Biden in 2023 as the AI Safety Institute, but was rebranded under Trump to underline a focus on standards and national security rather than solely on safety research. The center has conducted over 40 evaluations of AI models, including advanced systems that have never been publicly released. Developers often submit versions of their models with reduced safety measures to allow evaluators to investigate national security-relevant capabilities, including autonomous weapon synthesis and cyberattack automation.

      Chris Fall currently leads the center after the sudden departure of Collin Burns, a former AI researcher at Anthropic. Burns was appointed to the position but left after just four days, reportedly due to political pressures stemming from his ties to a company the administration was contesting. His removal highlights the complexities of creating an oversight framework for an industry where evaluators and subjects share the same talent pool. Trump’s broader AI regulatory stance has emphasized minimizing federal oversight of state regulations and adopting a relaxed approach toward industry, but the model evaluation program underscores a tougher stance: the government aims to comprehend these systems' capabilities before they enter public usage.

      **The Agreements:**

      The new collaborations with Google, Microsoft, and xAI enhance what was previously a two-company agreement, making it more comprehensive in terms of frontier coverage. OpenAI and Anthropic have revised their existing agreements to align with Trump’s AI Action Plan, which assigns the center to oversee national security-related model assessments, integrating it into a larger "evaluations ecosystem." These agreements are not formal contracts; they are voluntary pledges that companies can rescind at any time. There are no legal requirements mandating pre-release evaluations, and the center has no authority to delay or prevent deployment, relying instead on companies’ strategic interests to grant early government access.

      From the companies' perspectives, the alternative to voluntary participation is new legislation. Several proposed bills could bestow the center with permanent legal authority, mandatory evaluation conditions, and the ability to set deployment terms. The Pentagon has already indicated its readiness to blacklist AI companies resisting government requests, labeling Anthropic as a supply-chain risk after it declined to permit its models for use in autonomous weapons or mass surveillance. The voluntary agreements thus serve partly as a means for the remaining companies to illustrate cooperation before regulation potentially becomes enforced.

      **The Catalyst:**

      The expansion of the evaluation program is occurring against the backdrop of the Mythos crisis, wherein Anthropic’s advanced model was reported to autonomously uncover and exploit vulnerabilities across major operating systems and web browsers. This model has detected numerous critical bugs, some of which had remained hidden for decades. The White House has opposed Anthropic’s attempts to broaden access to Mythos beyond its select launch partners, while the NSA has utilized it despite the Pentagon's blacklist of the company. Additionally, the EU demands access to Mythos for European cyber defense, arguing that a tool with significant cybersecurity capabilities cannot remain under the exclusive control of a partially blacklisted American entity.

      Mythos exemplified the risk the evaluation program aims to intercept: a model with capabilities that pose immediate national security concerns but cannot be appraised post-deployment. The center’s evaluations since 2024 have likely uncovered features in unreleased models that inform policy-making, but these examinations were conducted under terms with only two companies. Until now, Google’s Gemini, Microsoft’s models, and xAI’s Grok were not subject to government review prior to release. The new agreements bridge this gap, ensuring that any future model with Mythos-level capabilities reaches government evaluators before public distribution, regardless of its origin.

      **The Limits:**

      The program's structural shortcoming is evident: it relies solely on voluntary participation. A company that identifies dangerous capabilities in its model could opt not to submit it for evaluation and choose to release

Other articles

Metalenz's innovative face scanning technology is embedded beneath the phone's display, eliminating the need for unsightly cutouts. Metalenz's innovative face scanning technology is embedded beneath the phone's display, eliminating the need for unsightly cutouts. Metalenz has demonstrated that payment-grade facial authentication can function with a fully powered-on display, a feat that Apple has attempted to achieve for years without success. Founders of IronSource secure $60 million at a valuation of $500 million for Zyg, an agentic AI platform designed to automate advertising in e-commerce. Founders of IronSource secure $60 million at a valuation of $500 million for Zyg, an agentic AI platform designed to automate advertising in e-commerce. Zyg secured $60 million in funding, led by Accel, at a valuation of $500 million just two months following their stealth phase. The IronSource team is developing AI agents designed to take the place of human ad buyers for direct-to-consumer brands. Fervo Energy initiates a $1.33 billion IPO, marking the biggest climate-tech listing of 2026. Fervo Energy initiates a $1.33 billion IPO, marking the biggest climate-tech listing of 2026. Fervo Energy has initiated its IPO roadshow, aiming to raise up to $1.33 billion by offering shares at a price range of $21 to $24. This Nasdaq listing is presented as the climate-tech platform for the AI infrastructure sector. Nscale has invested €695 million in Portugal, while Microsoft contributes to a crypto-to-AI neocloud that has reached a valuation of $14.6 billion in just two years. Nscale has invested €695 million in Portugal, while Microsoft contributes to a crypto-to-AI neocloud that has reached a valuation of $14.6 billion in just two years. Nscale is set to provide 66,000 Nvidia Rubin GPUs to Microsoft's 1.2 GW Sines campus. The company, which transitioned from crypto mining, is now the most valuable AI infrastructure startup in Europe, boasting a valuation of $14.6 billion. Employees at Google DeepMind have decided to unionize following a Pentagon AI agreement that contradicts eight years of ethical commitments. Employees at Google DeepMind have decided to unionize following a Pentagon AI agreement that contradicts eight years of ethical commitments. Staff at DeepMind UK voted 98% in favor of joining the CWU and Unite following Google's signing of a classified Pentagon agreement for "any lawful purpose." They are advocating for the cessation of military AI applications and the reinstatement of ethical standards. Nscale has invested €695 million in Portugal, while the crypto-to-AI neocloud has reached a valuation of $14.6 billion in just two years with the support of Microsoft. Nscale has invested €695 million in Portugal, while the crypto-to-AI neocloud has reached a valuation of $14.6 billion in just two years with the support of Microsoft. Nscale will provide 66,000 Nvidia Rubin GPUs to Microsoft's 1.2 GW Sines campus. Once a crypto miner, it is now Europe's most valuable AI infrastructure startup, valued at $14.6 billion.

Google, Microsoft, and xAI have consented to share evaluations of AI models with the government before their release, as the Mythos crisis necessitates an increase in oversight.

Five leading AI laboratories are now providing models for evaluation by the US government. This voluntary program lacks statutory authority but includes all significant AI developers following the Mythos crisis.