OpenAI's latest image model analyzes before creating visuals.
The new model engages in composition reasoning, searches the internet for context, produces up to eight coherent images from a single prompt, and generates text in non-Latin scripts with nearly perfect precision. It also achieved the top position on the Image Arena leaderboard within 12 hours of its launch, by the largest margin ever recorded.
Two years ago, asking ChatGPT to create a visual was like handing a poster brief to an exhausted intern armed with a glue stick. You would request a polished design and receive an image littered with visual debris, plus three invented words that looked like the product of a minor software glitch.
The results read as AI-generated in the now-familiar uncanny way: almost accurate, conspicuously wrong, and instantly identifiable as artificial.
That history is why this advance matters. Text rendering has been a conspicuous weakness of AI image generators ever since DALL-E first drew attention in January 2021, back when such models were regarded as captivating curiosities.
Images 2.0 claims roughly 99% text-rendering accuracy across languages and scripts, including Japanese, Korean, Chinese, Hindi, and Bengali. If that figure holds up in independent evaluations, it closes the distance between an “impressive AI demonstration” and a “tool a graphic designer would actually use for production work.”
What makes the model different, rather than merely better, is what OpenAI dubs its “thinking capabilities.” Images 2.0 is the company’s first image model built on its o-series reasoning architecture.
Before producing a single pixel, the model analyzes the prompt, plans the composition, evaluates spatial relationships between elements, and can search the web for real-time context. OpenAI frames it not as a rendering tool but as a “visual thought partner.”
This is my cat reimagined as a comic strip using ChatGPT.
In practice, this translates into two modes of access. Instant mode is available to all ChatGPT users, including those on free-tier accounts, and offers core quality enhancements: improved text, sharper editing, and richer layouts.
Thinking mode, which allows for web searching, multi-image batching, and output verification, is limited to subscribers of Plus ($20/month), Pro ($200/month), Business, and Enterprise plans.
This distinction is significant commercially. The reasoning capabilities, where the majority of the quality enhancement lies, are behind the paywall. Free users receive improved images; paying users gain access to images that the model has deliberated over.
The multi-image feature is likely to transform professional workflows. A single prompt can now yield up to eight images that maintain continuity of characters and objects throughout the set.
This means a designer can create a series of social media assets, a succession of children's book illustrations, or a collection of storyboard frames from one request, maintaining a consistent visual identity throughout.
Previously, each image had to be requested individually and manually assembled. For marketing teams and content creators, this represents a substantial reduction in production friction.
Integration with Codex, OpenAI’s coding environment, is a strategically impactful move. Developers and designers can now create UI mockups, prototypes, and visual assets within the same workspace they use for code, presentations, and browser automation, all under one ChatGPT subscription.
The image model has transitioned from being a standalone product to a feature incorporated into OpenAI’s larger platform, now competing not only with Midjourney and Google’s Nano Banana 2 on quality but also with Canva and Figma in terms of workflow integration.
The benchmark performance is impressive. Within just 12 hours of its launch, Images 2.0 secured the top position on the Image Arena leaderboard across all categories, achieving a score of 1,512, a lead of +242 points over the second-place model, Google’s Nano Banana 2, marking the largest advantage ever recorded on the leaderboard.
For most of 2026, OpenAI and Google had been closely contesting the top spot; Images 2.0 has decisively moved ahead.
DALL-E 2 and DALL-E 3 are set to be deprecated and retired on May 12, 2026. Meanwhile, GPT-Image-1.5, which was introduced in December 2025 as an intermediary upgrade, will remain accessible through the API for legacy integrations but will no longer serve as the default model.
OpenAI did not reveal the architecture of Images 2.0, referring to it merely as a “generalist model” or “GPT for images,” and did not clarify whether it employs a diffusion, autoregressive, or hybrid method. The API model identifier is gpt-image-2; developers are expected to gain access to the API in early May 2026.
Token-based pricing is set at $8 per million tokens for image input, $2 for cached input, and $30 for image output, with per-image costs generally varying
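The quoted token rates only translate into per-request dollar costs once you know the token counts involved, which OpenAI has not published. A minimal sketch of the arithmetic, using the article's rates and purely hypothetical token counts:

```python
# Cost estimate for a gpt-image-2 API request, using the rates quoted above.
# The token counts passed in below are hypothetical examples, not published figures.

RATE_INPUT = 8.00 / 1_000_000    # $ per image-input token
RATE_CACHED = 2.00 / 1_000_000   # $ per cached-input token
RATE_OUTPUT = 30.00 / 1_000_000  # $ per image-output token

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return the estimated dollar cost of one generation request."""
    fresh = max(input_tokens - cached_tokens, 0)
    return fresh * RATE_INPUT + cached_tokens * RATE_CACHED + output_tokens * RATE_OUTPUT

# Hypothetical request: 1,000 fresh input tokens, 4,000 output tokens.
print(round(estimate_cost(1_000, 4_000), 4))  # → 0.128
```

At these rates, output tokens dominate the bill, so actual per-image cost will hinge mostly on how many output tokens an image consumes at a given resolution.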
