
ChatGPT now understands images more effectively than an art critic and an investigator combined.
ChatGPT’s recent advancements in image generation have challenged our previous perceptions of AI-generated media. The newly announced GPT-4o model showed impressive capabilities in accurately interpreting images and recreating them in the style of viral trends such as Studio Ghibli-inspired art. It also handles text within AI-generated images effectively, a task that has long proven tricky for AI. Now OpenAI is introducing two new models that can analyze images for cues, extracting details that a human might miss.
Earlier this week, OpenAI unveiled two new models that enhance ChatGPT’s cognitive abilities. The brand-new o3 model, which OpenAI touts as its “most powerful reasoning model,” sharpens interpretation and perception, with claimed improvements in “coding, math, science, visual perception, and more.” The o4-mini, on the other hand, is designed as a smaller, faster model for “cost-efficient reasoning” in the same areas. The announcement follows the recent introduction of the GPT-4.1 series of models, which offers faster processing and deeper contextual understanding.
ChatGPT is now “thinking with images.”
With their improved reasoning capabilities, both models can now incorporate images into their thought processes, enabling them to “think with images,” according to OpenAI. Beyond simple image analysis, the o3 and o4-mini models can conduct thorough investigations of images and even manipulate them, performing actions such as cropping, zooming, flipping, or enhancing details to extract visual cues that can elevate ChatGPT’s problem-solving abilities.
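To make this concrete, here is a minimal sketch of how a developer might send an image to one of the new models through the OpenAI Python SDK. The multimodal message format is the SDK’s standard one for vision input; the “o3” model name, the example URL, and the prompt are assumptions based on the announcement, and availability may vary by account:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask the model to reason over a photo, e.g. to recover a small detail
# that a casual viewer would overlook.
response = client.chat.completions.create(
    model="o3",  # assumption: model name as announced; availability may vary
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What does the sign in the far background say? "
                            "Zoom in on it if you need to.",
                },
                {
                    "type": "image_url",
                    # placeholder URL for illustration only
                    "image_url": {"url": "https://example.com/street-photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Any cropping or zooming happens inside the model’s reasoning process; the caller simply supplies the image and the question.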
Introducing OpenAI o3 and o4-mini—our smartest and most capable models to date. For the first time, our reasoning models can agentically utilize and combine all available tools within ChatGPT, including web search, Python, image analysis, file interpretation, and image generation. pic.twitter.com/rDaqV0x0wE — OpenAI (@OpenAI) April 16, 2025
According to the announcement, the models merge visual and textual reasoning, which can be combined with other ChatGPT capabilities such as web search, data analysis, and code generation, and are expected to form the foundation for more advanced AI agents capable of multimodal analysis.
In practical terms, users can upload images of all kinds, from flowcharts and handwritten notes to photos of real-world objects, and ChatGPT will develop a deeper understanding of them to improve its output, even without a descriptive text prompt. This advancement brings OpenAI closer to Google’s Gemini, which can interpret the real world through live video.
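For local files such as a phone photo of a whiteboard, the image can be passed inline as a base64 data URL, a pattern the OpenAI SDK already supports for vision input. Everything here (the file name, model choice, and prompt) is a hypothetical sketch, not an announced workflow:

```python
import base64
from openai import OpenAI

client = OpenAI()

# Hypothetical input: a phone photo of a hand-drawn flowchart.
with open("whiteboard.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="o4-mini",  # assumption: the smaller reasoning model from the announcement
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Transcribe this flowchart as a numbered list of steps.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Note that no text description of the image is required; the model infers the structure of the flowchart from the pixels alone.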
Despite these ambitious claims, OpenAI is restricting access exclusively to paid members, likely to avoid overloading its GPUs as it works to meet the computational demands of the new reasoning features. For now, the o3, o4-mini, and o4-mini-high models are available only to ChatGPT Plus, Pro, and Team subscribers, with Enterprise and Education tiers gaining access within a week. Meanwhile, free users will have limited access to o4-mini when they select the “Think” option in the prompt bar.