Microsoft's MAI-Image-2 ranks among the top three AI image generators.

      The second iteration of Microsoft’s in-house image model has taken the #3 spot on Arena.ai’s leaderboard, trailing only Google and OpenAI, and begins rolling out across Copilot and Bing Image Creator today.

      A year ago, Microsoft relied primarily on OpenAI’s models for image generation in Bing and Copilot. On Thursday, the company’s internal team introduced MAI-Image-2, a next-generation image model that debuted in third place on the Arena.ai text-to-image leaderboard, just behind Google’s Gemini 3.1 Flash and OpenAI’s GPT Image 1.5.

      The announcement came from the Microsoft AI Superintelligence team, the internal research group that Mustafa Suleyman established in November 2025 and now leads full-time following a leadership restructuring at Microsoft announced two days earlier. On Monday, Suleyman scaled back his broader CEO responsibilities at Microsoft AI to concentrate solely on this team and its advanced model initiatives. MAI-Image-2 is the first model to be publicly released since that transition.

      The previous model, MAI-Image-1, launched in October 2025 and entered the top ten on LMArena, the crowd-sourced preference leaderboard that has since been slightly renamed. It was Microsoft’s first image generation model built entirely in-house and was integrated into Bing Image Creator and Copilot alongside DALL-E 3 and GPT-4o. MAI-Image-2 continues this trajectory. It was developed with feedback from photographers, designers, and visual storytellers, and targets the three areas where those creatives identified the largest gaps.

      The first focus area is photorealism: natural lighting, accurate skin tones, and environments with physical textures and signs of wear. Microsoft says the model is meant to reduce the post-production work currently required between generation and usable output.

      The second area is in-image text: MAI-Image-2 is designed to render readable text within scenes, from signs to infographics and typographic layouts, an area where many image models still struggle to produce consistent, accurate characters.

      The third area is detailed scene generation: dense compositions, surreal concepts, cinematic framing, and imaginative work where faithful adherence to the prompt and high output quality are crucial.

      MAI-Image-2 is rolling out through several channels. It is available now in the MAI Playground, Microsoft’s public testing environment at playground.microsoft.ai, and is starting to be integrated into Copilot and Bing Image Creator. Enterprise customers can use the model through API access today, and Microsoft says API access will soon be open to any developer via Microsoft Foundry, though it has given no timeline for this wider availability. A commercial application form is available for organizations interested in large-scale image generation.

      The announcement also highlighted that the team’s next-generation GB200 compute cluster is now operational, referring to NVIDIA’s Blackwell-architecture hardware, although no specifics on the cluster's scale were shared. This infrastructure claim seems to provide context for the upcoming models the superintelligence team plans to unveil, rather than offering a technically verifiable specification.

      The speed of progress is noteworthy. In August 2025, Microsoft revealed its first in-house voice model (MAI-Voice-1) and its initial text model preview (MAI-1-preview). MAI-Image-1 followed in October. Now, just five months later, the second image generation model is already ranking among the top three on the most recognized crowd-sourced image leaderboard in the industry.

      This pace indicates that the superintelligence team is moving faster than Microsoft’s traditionally slow consumer product cycles, and is doing so with hardware and infrastructure that the company increasingly owns rather than leases from OpenAI.
