Gemini 3.5 Flash can now see and operate your screen, and Google wants enterprises to trust it with that job.
**TL;DR** Google has built computer use into Gemini 3.5 Flash as a native tool, replacing the standalone Gemini 2.5 computer use model, and has added enterprise-grade safety features.
The tool lets AI agents view screens, click, type, and scroll across devices and browsers, something that previously required a separate model. It is available natively through the Gemini API and the Gemini Enterprise Agent Platform, the new name for Vertex AI. Developers can enable computer use as one of several tools in Flash, alongside code execution, search, and function calling, so agents that operate graphical interfaces no longer depend on a dedicated model. Product manager Mateo Quiros described the integration as strengthening Flash's ability to observe, reason, and act on screens.
Google launched the standalone Gemini computer use model in October 2025 for browser-based agent workflows, and it scored about 70% accuracy on the Online-Mind2Web benchmark. That model ran a screenshot-action loop: the developer sent a screen capture, received a structured command in return, executed it, and then sent an updated screenshot. Folding this functionality into Flash turns what was a two-model setup into a single model.
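The screenshot-action loop described above can be sketched generically. This is a minimal illustration under stated assumptions, not Google's API: `capture_screen`, `propose_action`, and `execute` are hypothetical callables standing in for a screen grabber, a model call that returns a structured command, and a local executor, and the `Action` fields are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """Hypothetical structured command; real field names will differ."""
    kind: str  # e.g. "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent_loop(
    capture_screen: Callable[[], bytes],
    propose_action: Callable[[str, bytes, list[Action]], Action],
    execute: Callable[[Action], None],
    goal: str,
    max_steps: int = 20,
) -> list[Action]:
    """Capture, ask for one command, execute it, repeat until "done" or the step cap."""
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()                      # 1. send the current view
        action = propose_action(goal, screenshot, history)  # 2. get a structured command
        if action.kind == "done":
            break
        execute(action)                                     # 3. perform it locally
        history.append(action)                              # 4. loop with an updated view
    return history


# Usage with a scripted stand-in for the model (no real API involved).
script = [Action("click", x=100, y=200), Action("type", text="quarterly report"), Action("done")]


def scripted_model(goal: str, screenshot: bytes, history: list[Action]) -> Action:
    return script[len(history)]


executed: list[Action] = []
steps = run_agent_loop(lambda: b"<png bytes>", scripted_model, executed.append, goal="find the report")
print([a.kind for a in steps])  # ['click', 'type']
```

In a real integration `propose_action` would wrap a model request, and the `max_steps` cap keeps a confused agent from looping indefinitely.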
The enterprise pitch centers on automation that goes beyond traditional chatbots. Google says the tool supports continuous software testing, with agents navigating applications and validating functionality so that human testers no longer have to check every screen by hand. Knowledge workers could use such agents to complete multi-step browser tasks, fill out forms, extract data from dashboards, and navigate internal applications.
Google has emphasized its safety architecture, including adversarial training aimed specifically at prompt injection attacks, in which malicious instructions hidden in webpages or documents trick an agent into performing unintended tasks. The concern is well founded: research has shown that AI agents can be manipulated by the content they encounter while carrying out a task.
Two optional enterprise safeguards sit on top of the base model. The first requires explicit user confirmation before the agent takes any action deemed sensitive or irreversible, such as submitting a form or making a purchase. The second automatically halts the agent when it detects an attempted indirect prompt injection, stopping execution before a compromised action can run.
Both safeguards are opt-in rather than enabled by default. Google recommends a layered approach to security, urging developers to combine several protections instead of relying on any one of them. The documentation concedes that no single safeguard is sufficient on its own, a more cautious stance than the confident marketing language surrounding other AI features.
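To make the layered idea concrete, here is a minimal sketch that places both kinds of safeguard in front of every action. It is not Google's implementation: `guarded_execute`, `SENSITIVE_KINDS`, `InjectionDetected`, and the keyword-based `naive_detector` are hypothetical, and a keyword match is only a toy stand-in for whatever detection the platform actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str
    target: str = ""


# Hypothetical policy: action kinds treated as sensitive or irreversible.
SENSITIVE_KINDS = {"submit_form", "purchase"}


class InjectionDetected(Exception):
    """Raised to halt the agent when observed content looks like an injection attempt."""


def guarded_execute(
    action: Action,
    page_text: str,
    execute: Callable[[Action], None],
    confirm: Callable[[Action], bool],
    looks_like_injection: Callable[[str], bool],
) -> bool:
    """Run one action behind two independent checks; return True if it executed."""
    # Layer 1: stop the whole run if the page appears to carry injected instructions.
    if looks_like_injection(page_text):
        raise InjectionDetected(f"halted before {action.kind!r} on {action.target!r}")
    # Layer 2: sensitive actions need an explicit human "yes"; a refusal skips the action.
    if action.kind in SENSITIVE_KINDS and not confirm(action):
        return False
    execute(action)
    return True


def naive_detector(text: str) -> bool:
    """Toy keyword check, far weaker than a trained classifier; illustration only."""
    return "ignore previous instructions" in text.lower()


def deny_all(action: Action) -> bool:
    """Simulates a user who declines every confirmation request."""
    return False


executed: list[Action] = []
print(guarded_execute(Action("click", "Search"), "Welcome", executed.append, deny_all, naive_detector))  # True
print(guarded_execute(Action("purchase", "Buy now"), "Cart", executed.append, deny_all, naive_detector))  # False
```

Keeping the two checks independent reflects the layered approach: if the detector misses an injection, the confirmation gate can still stop a sensitive action, and the detector can halt a run that no confirmation prompt would have flagged.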
The competitive landscape has shifted considerably since Anthropic pioneered the category. Anthropic's Claude computer use works across operating systems and can interact with file systems, which makes it more versatile for desktop workflows. Google's own Chrome Enterprise introduced agentic browsing features earlier this year, such as Auto Browse for automated multi-step tasks.
The Flash integration extends this capability beyond Chrome to any screen an agent can see. OpenAI has also entered the space, so the three companies now compete on several fronts. For enterprise customers, the deciding question appears to be not just which model can click a button, but which can do so safely in a regulated environment.
Google has not released updated benchmark scores comparing the built-in Flash computer use tool with the previous standalone model. Nor has it said how many enterprises use the capability or published case studies with named customers. The claims about targeted adversarial training against prompt injection appear in the accompanying blog post but are not backed by published research or red-team findings.
The tool is available on the Gemini Enterprise Agent Platform with pay-as-you-go pricing. Flash is one of the cheaper models in Google's lineup, which could make computer use more affordable for large-scale automation than running a heavier model. Whether that cost advantage holds depends on how many actions a typical agent workflow requires and how often the safeguards pause execution to ask for confirmation.
AI computer use is still at an early stage. The models handle familiar interfaces but often struggle with unexpected pop-ups, CAPTCHAs, dynamically loaded content, and unfamiliar layouts. Shipping the capability as a built-in tool rather than a standalone model signals Google's confidence that it is ready for general use, while the opt-in safeguards suggest an acknowledgment that it is not yet fit to run unsupervised.
