It turns out that what OpenAI does can be replicated more easily, and far more cheaply, than expected.
OpenAI continues to maintain that the only route to AGI runs through substantial financial and energy investments. Independent researchers, however, are using open-source technologies to rival the capabilities of its most advanced models at a fraction of the cost.
Last Friday, a joint team from Stanford University and the University of Washington announced it had created a large language model focused on math and coding that matches the performance of OpenAI's o1 and DeepSeek's R1 reasoning models, for just $50 in cloud computing credits. The researchers reportedly took an off-the-shelf base model and distilled Google's Gemini 2.0 Flash Thinking Experimental model into it. In AI, distillation means training a smaller "student" model to reproduce the behavior of a larger "teacher" model on a specific task, transferring the relevant capability at far lower cost.
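The announcement doesn't spell out the training recipe, so the snippet below is only a minimal sketch of one classic form of distillation, in which the student learns to match the teacher's softened output distribution. The `student`, `teacher`, and `batch` objects are hypothetical stand-ins, and the Stanford-UW team reportedly distilled from Gemini's generated outputs rather than from raw logits.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soften both distributions and penalize their KL divergence."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * temperature**2

def train_step(student, teacher, batch, optimizer):
    """One hypothetical training step: the large teacher is frozen,
    the small student is updated to mimic it on the same batch."""
    with torch.no_grad():
        teacher_logits = teacher(batch)  # no gradients through the teacher
    student_logits = student(batch)
    loss = distillation_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The temperature softens both distributions so the student learns the teacher's relative preferences among all answers, not just its top pick, which is part of why a tiny budget can go so far.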
Additionally, on Tuesday, Hugging Face researchers introduced Open Deep Research, a rival to the Deep Research tools from OpenAI and Google Gemini that they developed in just 24 hours. According to Hugging Face, “While powerful LLMs are now freely available in open-source, OpenAI didn’t disclose much about the agentic framework underlying Deep Research. So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!” The project is estimated to have cost around $20 in cloud compute credits, with training taking less than 30 minutes.
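Since OpenAI hasn't published its agentic framework, Hugging Face's released code is the authoritative reference; the sketch below only illustrates the general shape of such a system, a loop in which an LLM alternates between calling tools and producing a final answer. The `call_llm` helper and both tools here are hypothetical stubs.

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real chat-completion API call; always answers here.
    return "ANSWER This is where the model's synthesized answer would go."

TOOLS = {
    "web_search": lambda q: f"<search results for {q!r}>",  # stub
    "read_page": lambda url: f"<contents of {url}>",         # stub
}

def research_agent(question: str, max_steps: int = 5) -> str:
    """Loop: the model either requests a tool or commits to an answer."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = call_llm(
            transcript + "\nRespond as 'TOOL <name> <arg>' or 'ANSWER <text>'."
        )
        if reply.startswith("ANSWER"):
            return reply.removeprefix("ANSWER").strip()
        _, name, arg = reply.split(maxsplit=2)
        transcript += f"\nObservation: {TOOLS[name](arg)}"
    return "No answer within step budget."

print(research_agent("What does the GAIA benchmark measure?"))
```

The scaffolding is cheap precisely because the heavy lifting happens inside an existing LLM; the framework just routes its requests to tools and feeds the observations back.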
Hugging Face's model achieved 55% accuracy on the General AI Assistants (GAIA) benchmark, which assesses the capabilities of agentic AI systems. By comparison, OpenAI's Deep Research scored 67% to 73%, depending on the evaluation method. The 24-hour model may not perform as well as OpenAI's, but it also didn't require billions of dollars and the energy resources of a mid-sized European country to develop.
These advancements follow a January report on a team from the University of California, Berkeley's Sky Computing Lab, which trained its Sky-T1 reasoning model for approximately $450 in cloud computing credits. The Sky-T1-32B-Preview model performed comparably to the earlier o1-preview reasoning model. As more open-source competitors to OpenAI's market leaders appear, they raise the question of whether OpenAI's strategy of pouring half a trillion dollars into AI data centers and energy production is really necessary.
Relatedly, on Monday, OpenAI CEO Sam Altman commented on DeepSeek's sudden success and hinted at upcoming releases. On Tuesday, OpenAI launched ChatGPT Gov, a product designed specifically to give U.S. government agencies access to OpenAI's frontier models. It promises stronger data security than ChatGPT Enterprise; however, it remains to be seen how it will address the inaccuracies that affect the company's other models.
OpenAI reports that since the start of 2024, more than 90,000 federal, state, and local government employees across 3,500 agencies have sent over 18 million queries to ChatGPT. The new platform will let agencies input “non-public, sensitive information” into ChatGPT while running it within their own secure hosting environments, such as the Microsoft Azure commercial cloud or Azure Government community cloud, and while complying with cybersecurity frameworks like IL5 or CJIS. This setup allows agencies to “manage their own security, privacy, and compliance requirements,” as noted by Felipe Millon, the government sales lead at OpenAI.
Moreover, Microsoft, a major investor in OpenAI, is investigating whether the Chinese company DeepSeek misused OpenAI's models to train its own reasoning models. Reports suggest that DeepSeek may have violated OpenAI's terms of service by using its application programming interface (API) to develop the recently launched R1 model.
DeepSeek, a chatbot developed by a Chinese startup of the same name, has rapidly gained traction and appears to have surpassed ChatGPT in popularity. It has dominated the news cycle, with significant developments arriving by the day: Nvidia's stock plunged in response to DeepSeek's emergence, President Donald Trump weighed in, and Mark Zuckerberg is reportedly assembling a team at Meta to address the implications. In the wake of DeepSeek's launch, Nvidia suffered the largest single-day market-value loss in history, shedding nearly $600 billion, after DeepSeek demonstrated how cheaply its large language model (LLM) could be developed compared with those of competitors such as Anthropic, Meta, or OpenAI.
