
ChatGPT now understands images more effectively than an art critic and an investigator combined.
ChatGPT’s recent advancements in image generation have challenged our previous perceptions of AI-generated media. The newly announced GPT-4o model showed impressive capabilities in accurately interpreting images and recreating them in the style of viral trends such as Studio Ghibli-inspired art. It also handles text within AI-generated images effectively, a task that has long proven tricky for AI. Now OpenAI is introducing two new models that can analyze images for cues, extracting details that a human might miss.
Earlier this week, OpenAI unveiled two new models that enhance ChatGPT’s cognitive abilities. The brand-new o3 model, which OpenAI touts as its “most powerful reasoning model,” sharpens interpretation and perception, with claimed improvements in “coding, math, science, visual perception, and more.” The o4-mini, on the other hand, is designed as a smaller, faster model for “cost-efficient reasoning” in the same areas. The announcement follows the recent introduction of the GPT-4.1 series of models, which offers faster processing and deeper contextual understanding.
ChatGPT is now “thinking with images.”
With their improved reasoning capabilities, both models can now incorporate images into their thought processes, enabling them to “think with images,” according to OpenAI. Beyond simple image analysis, the o3 and o4-mini models can conduct thorough investigations of images and even manipulate them, performing actions such as cropping, zooming, flipping, or enhancing details to extract visual cues that can elevate ChatGPT’s problem-solving abilities.
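To make this concrete, here is a minimal sketch of how a developer might send an image to one of the new models through the OpenAI Python SDK. The multimodal message format is the SDK’s standard one for vision input; the “o3” model name, the example URL, and the prompt are assumptions based on the announcement, and availability may vary by account:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask the model to reason over a photo, e.g. to recover a small detail
# that a casual viewer would overlook.
response = client.chat.completions.create(
    model="o3",  # assumption: model name as announced; availability may vary
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What does the sign in the far background say? "
                            "Zoom in on it if you need to.",
                },
                {
                    "type": "image_url",
                    # placeholder URL for illustration only
                    "image_url": {"url": "https://example.com/street-photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Any cropping or zooming happens inside the model’s reasoning process; the caller simply supplies the image and the question.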
Introducing OpenAI o3 and o4-mini—our smartest and most capable models to date. For the first time, our reasoning models can agentically utilize and combine all available tools within ChatGPT, including web search, Python, image analysis, file interpretation, and image generation. pic.twitter.com/rDaqV0x0wE — OpenAI (@OpenAI) April 16, 2025
According to the announcement, the models merge visual and textual reasoning, which can be combined with other ChatGPT capabilities such as web search, data analysis, and code generation, and are expected to form the foundation for more advanced AI agents capable of multimodal analysis.
In practical terms, users can upload images of all kinds, from flowcharts and handwritten notes to photos of real-world objects, and ChatGPT will develop a deeper understanding of them to improve its output, even without a descriptive text prompt. This advancement brings OpenAI closer to Google’s Gemini, which can interpret the real world through live video.
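For local files such as a phone photo of a whiteboard, the image can be passed inline as a base64 data URL, a pattern the OpenAI SDK already supports for vision input. Everything here (the file name, model choice, and prompt) is a hypothetical sketch, not an announced workflow:

```python
import base64
from openai import OpenAI

client = OpenAI()

# Hypothetical input: a phone photo of a hand-drawn flowchart.
with open("whiteboard.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="o4-mini",  # assumption: the smaller reasoning model from the announcement
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Transcribe this flowchart as a numbered list of steps.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Note that no text description of the image is required; the model infers the structure of the flowchart from the pixels alone.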
Despite these ambitious claims, OpenAI is restricting access exclusively to paid members, likely to avoid overloading its GPUs as it works to meet the computational demands of the new reasoning features. For now, the o3, o4-mini, and o4-mini-high models are available only to ChatGPT Plus, Pro, and Team subscribers, with Enterprise and Education tiers gaining access within a week. Meanwhile, free users will have limited access to o4-mini when they select the “Think” option in the prompt bar.