A startup claims it has solved the bottleneck that has been hindering AI development.

      A Miami startup claims to have solved a mathematical issue that has caused AI models to be slow and energy-intensive for nearly a decade. This assertion was audacious enough to draw parallels with Theranos. However, the company now possesses independent test results that substantiate much of its claim.

      The startup, named Subquadratic, emerged from stealth mode in May with $29 million in seed funding and introduced a new language model called SubQ. The company asserts that SubQ is faster, cheaper, and significantly less power-hungry than the current leading models. It can also reportedly read up to 12 times more text at once.

      The decade-old bottleneck

      To understand why this is significant, it is essential to grasp how most large language models function. At the heart of these models is a "transformer," which was developed by Google researchers in 2017. The transformer utilizes a process known as dense attention.

      Dense attention is comprehensive but costly. It compares every word in a text to every other word. Therefore, when the text length is doubled, the computational effort roughly quadruples. This "quadratic" scaling is the primary reason large language models consume so much computing power and energy.
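
      The quadratic growth can be seen in a minimal sketch of dense attention. This is a generic textbook-style illustration in plain Python, not code from any production model; the helper names `dense_attention` and `toy_tokens` are made up for the example.

```python
import math

def dense_attention(queries, keys, values):
    """Naive dense attention: every query is scored against every key."""
    outputs = []
    comparisons = 0
    dim = len(values[0])
    for q in queries:
        scores = []
        for k in keys:
            # Scaled dot product between one query and one key.
            scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)))
            comparisons += 1
        # Softmax over all scores (shifted by the max for numerical stability).
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted average of all value vectors.
        outputs.append([
            sum(w / total * v[d] for w, v in zip(weights, values))
            for d in range(dim)
        ])
    return outputs, comparisons

def toy_tokens(n, dim=4):
    """Hypothetical stand-in for token embeddings."""
    return [[math.sin(i + d) for d in range(dim)] for i in range(n)]

for n in (8, 16, 32):
    x = toy_tokens(n)
    _, count = dense_attention(x, x, x)
    print(n, count)  # 8 64, then 16 256, then 32 1024
```

      Each doubling of the input length multiplies the number of query-key comparisons by four.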

      Subquadratic’s solution

      Subquadratic addresses this issue by replacing dense attention with “sparse attention.” Instead of comparing every word with all others, sparse attention focuses only on the relevant pairs. While this idea has been around for some time, no team had previously used it to match the quality of dense attention.

      According to the company, its version finally achieves this. Notably, it dynamically selects which words to emphasize based on content rather than following a fixed pattern. “That’s kind of where the secret sauce is,” explains co-founder and chief technology officer Alex Whedon.
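
      The company has not published the details of its method, so the following is only a generic sketch of content-based sparse attention, using a simple top-k rule: each query attends to the `k` keys it scores highest against, and ignores the rest. The function name `topk_sparse_attention` and the top-k rule itself are assumptions chosen for illustration, not SubQ's actual mechanism.

```python
import math

def topk_sparse_attention(queries, keys, values, k):
    """Illustrative top-k sparse attention (NOT SubQ's method).

    Each query keeps only its k highest-scoring keys, chosen by content
    rather than by a fixed positional pattern.
    """
    outputs = []
    dim = len(values[0])
    for q in queries:
        # Note: this toy version still scores every key to pick the top k,
        # so the selection step itself remains quadratic. A genuinely
        # subquadratic method needs a cheaper way to find candidates.
        scores = [
            sum(a * b for a, b in zip(q, key)) / math.sqrt(len(q))
            for key in keys
        ]
        keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        # Softmax restricted to the selected keys.
        top = max(scores[i] for i in keep)
        weights = {i: math.exp(scores[i] - top) for i in keep}
        total = sum(weights.values())
        outputs.append([
            sum(weights[i] / total * values[i][d] for i in keep)
            for d in range(dim)
        ])
    return outputs

queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(topk_sparse_attention(queries, keys, values, k=1))  # [[1.0, 1.0]]
```

      With `k=1` the query attends only to its best-matching key, so the output is that key's value vector. The softmax and weighted sum touch only `k` entries per query instead of all of them.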

      The evidence

      Initially, the startup's claims relied on a few self-reported scores, which led to skepticism. One AI engineer remarked on X that SubQ might be “the biggest breakthrough since the Transformer … or it’s AI Theranos.”

      To substantiate its claims, the company engaged a third-party evaluator, Appen, to conduct tests. The findings were impressive: in a raw speed test, SubQ ran 56 times faster than FlashAttention, a leading existing method. On a challenging coding benchmark, it scored 89.7 percent, close to the best models available.

      The cost difference is equally significant. According to the startup, running a long-context test on Anthropic’s leading model costs approximately $2,600, whereas the same test on SubQ would cost only $8.
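
      As a quick check of the scale involved, the two figures quoted by the startup can be compared directly. The variable names are illustrative, and the dollar amounts are the company's own numbers, not independently verified ones.

```python
# Figures as reported by the startup for the same long-context test.
anthropic_cost_usd = 2600  # approximate
subq_cost_usd = 8

ratio = anthropic_cost_usd / subq_cost_usd
print(f"Claimed cost ratio: {ratio:.0f}x")  # Claimed cost ratio: 325x
```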

      Still too good to be true?

      Despite these promising results, caution is warranted. Benchmark scores do not always translate into real-world performance, and SubQ is not yet widely accessible. Many thousands of people have signed up for the waitlist, but only a limited number have received access.

      Additionally, there is a twist in the development narrative. Instead of creating SubQ from the ground up, Subquadratic started with an existing open-weight model and integrated its new attention method. While this is a common approach, it somewhat undercuts the claim of completely reinventing how large language models operate.

      “They may have built something real and useful,” remarks Will Depue, an independent researcher formerly with OpenAI. “However, the public evidence does not yet support the more ambitious assertion that they have resolved the quadratic attention bottleneck.”

      Why it matters

      If these results prove consistent, the potential benefits are substantial. More affordable and faster long-context models could process complete codebases, sets of contracts, or large collections of documents in a single pass. They would also reduce the costs and energy associated with running AI systems.

      The entire industry is chasing this prize. Companies are already struggling with the escalating costs of running AI agents, and other startups, like Thomas Reardon's Flourish, are pursuing efficiency through different methods. Nevertheless, Subquadratic is betting that the entire field will pivot in its direction. “We believe that in a few years, no one will be building on transformers,” says chief executive Justin Dangel.
