Your AI browser may be vulnerable to hijacking through prompt injection; OpenAI has recently fixed Atlas.
OpenAI has announced a security update for ChatGPT Atlas in response to the discovery of a new type of agent-in-browser attacks by an internal automated red team. This update includes a model trained to anticipate adversarial actions and enhanced security measures.
Atlas operates in agent mode, mimicking user actions within the browser by being able to view pages, click, and type to complete tasks in the same context. However, this also increases its vulnerability, as it may encounter untrusted text in emails, shared documents, forums, social media, and any webpage it accesses.
The main warning from the company is straightforward: hackers may deceive the agent's decision-making process by embedding instructions within the information it processes during tasks.
The consequences of concealed instructions can be significant. OpenAI illustrates this with a scenario where an attacker sends a malicious email to a user's inbox with instructions intended for the agent. When the user requests Atlas to draft an out-of-office reply, the agent inadvertently treats the malicious instructions as legitimate. Instead of creating the out-of-office message, the agent sends a resignation letter to the user's CEO.
An attacker could manipulate third-party content within a legitimate workflow by hiding commands in what appears to be ordinary text, thus overriding the user's request.
To identify these vulnerabilities earlier, OpenAI has developed an automated attacker model and employed reinforcement learning to detect prompt-injection weaknesses against a browser agent. The aim is to rigorously test detailed workflows rather than just aiming for isolated erroneous outputs.
This attacker model can draft potential injections, simulate how the target agent would respond, and refine its strategy based on feedback from the observed reasoning and action paths. OpenAI believes that access to these traces provides its internal red team with an edge that external attackers lack.
OpenAI considers prompt injection a long-term security issue, akin to online scams rather than a one-time fix. Their strategy involves identifying new attack methods, training against them, and enhancing system-level protections.
For users, it is advisable to browse while logged out when possible, carefully review confirmations for actions such as sending emails, and provide agents with precise, narrow instructions instead of broad "handle everything" requests. If you're interested in AI browsing capabilities, it is better to choose browsers that offer updates that enhance your experience.
Other articles
Your AI browser may be vulnerable to hijacking through prompt injection; OpenAI has recently fixed Atlas.
OpenAI announced that it has fixed ChatGPT Atlas following internal red teaming that uncovered new prompt injection attacks capable of taking control of AI browser agents. The update includes an adversarially trained model and enhanced protections.
