Teaching small AI models with games like Battleship appears to make them significantly smarter.

      By transforming Battleship into a platform for AI training, researchers improved the reasoning efficiency of smaller models.

      Small AI models have unexpectedly gained an advantage thanks to an old game.

      Researchers at MIT used a Battleship-style setup to test whether AI agents could gather information more strategically before making a move. The result was a large jump in performance for smaller models. One went from winning only a few games against humans to winning most of them after it changed how it searched the board.

      The change targets a key weakness of current AI agents, which often face tasks whose answers depend on information they don't yet have. MIT's findings suggest that better questioning strategies can let a cheaper model perform far above its usual level.

      How much smarter did it get?

      MIT's experiment used a version of Battleship built around natural-language questions. One AI agent played the teammate hunting for hidden ships, while another could see the board and answered its questions.


      The biggest improvement came from Llama 4 Scout. According to MIT, this smaller model initially beat human players in only 8% of its games. After it switched to a more strategic inference approach, its win rate jumped to 82%, outperforming a larger frontier model at roughly 1% of the cost.

      That cost figure matters to anyone watching AI spending. The model didn't win by getting bigger; it won by asking better questions and making better use of each answer.

      Why does Battleship help AI learn?

      Battleship makes a useful testing ground because it forces an AI agent to work with incomplete information. The agent can't see the whole board, so each question has to narrow the search and set up the next move.
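
The article does not describe MIT's actual inference method. As a toy illustration of what "each question has to narrow the search" can mean, the Python sketch below scores yes/no questions by expected information gain, a standard idea from Bayesian experimental design, and picks the most informative one. The 6-cell strip, the single ship of length 2, the uniform prior, and every function name are hypothetical assumptions made for this example; this is not MIT's code.

```python
import math

# Hypothetical toy setup (not MIT's): a 1-D strip of 6 cells hiding one ship of length 2.
BOARD_CELLS = 6
SHIP_LENGTH = 2


def all_placements(cells=BOARD_CELLS, length=SHIP_LENGTH):
    """Every ship position the questioner cannot yet rule out (one frozenset of cells each)."""
    return [frozenset(range(start, start + length)) for start in range(cells - length + 1)]


def answer_entropy(hypotheses, question):
    """Entropy in bits of the yes/no answer, assuming all hypotheses are equally likely.

    With a uniform prior and a truthful, deterministic answerer, this equals the
    expected information gain of asking `question`.
    """
    yes = sum(1 for h in hypotheses if question(h))
    total = len(hypotheses)
    bits = 0.0
    for count in (yes, total - yes):
        if count:
            p = count / total
            bits -= p * math.log2(p)
    return bits


def best_question(hypotheses, questions):
    """Return the (label, predicate) pair whose answer is expected to be most informative."""
    return max(questions, key=lambda q: answer_entropy(hypotheses, q[1]))


def update(hypotheses, question, answer):
    """Keep only the placements consistent with the answer received."""
    return [h for h in hypotheses if question(h) == answer]


if __name__ == "__main__":
    hypotheses = all_placements()
    # Candidate questions: "Is cell i part of the ship?" (i=i binds each cell index).
    questions = [(f"Is cell {i} part of the ship?", lambda h, i=i: i in h)
                 for i in range(BOARD_CELLS)]
    label, predicate = best_question(hypotheses, questions)
    print(label, round(answer_entropy(hypotheses, predicate), 3))
    # Suppose the answerer says "no": the set of possible placements shrinks.
    hypotheses = update(hypotheses, predicate, False)
    print(len(hypotheses), "placements remain")
```

Run as a script, it prints `Is cell 1 part of the ship? 0.971` and then `3 placements remain`. Corner cells score lower (about 0.722 bits) because the answer there is "yes" for only one of the five placements, so asking about them rarely narrows the field.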

      The same pattern shows up in real-world AI tools. A support bot, research assistant, or planning agent often needs to ask follow-up questions before it can help. When that step goes wrong, the model may miss important details, repeat itself, or make recommendations too early.


      MIT's method probes this weakness directly by testing whether an agent can gather the information it needs before it answers.

      Where could this lead next?

      The harder test will be whether the same approach works outside games. Battleship's controlled environment is much easier to evaluate than the unpredictable workflows of search, customer support, or workplace software.

      Still, the direction is worth watching. If smaller models get good at asking sharper questions before acting, companies could build cheaper AI tools that feel more capable in everyday use.

      The next milestone is moving from the game board to real-world tasks, where vague instructions, missing documents, and rushed users pose a much greater challenge.

      Paulo Vargas is an English major turned reporter turned technical writer whose work keeps returning to technology and communication.
