A startup claims it has solved the bottleneck that has been hindering AI development.
A Miami startup claims to have solved a mathematical problem that has made AI models slow and energy-hungry for nearly a decade. The claim was audacious enough to draw comparisons to Theranos. However, the company now has independent test results that support much of it.
The startup, named Subquadratic, emerged from stealth mode in May with $29 million in seed funding and introduced a new language model called SubQ. The company asserts that SubQ is faster, cheaper, and far less power-hungry than the current leading models. It can also reportedly read up to 12 times more text at once.
The decade-old bottleneck
To understand why this is significant, it is essential to grasp how most large language models function. At the heart of these models is a "transformer," which was developed by Google researchers in 2017. The transformer utilizes a process known as dense attention.
Dense attention is comprehensive but costly. It compares every word in a text to every other word. Therefore, when the text length is doubled, the computational effort roughly quadruples. This "quadratic" scaling is the primary reason large language models consume so much computing power and energy.
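The quadratic scaling can be seen with a quick count. The sketch below is a simplification that assumes one comparison per pair of words and ignores everything else a model does; the function name is made up for illustration.

```python
def dense_attention_comparisons(num_tokens: int) -> int:
    """Count comparisons when every token is compared with every token."""
    return num_tokens * num_tokens


# Doubling the text length quadruples the number of comparisons.
for n in (1_000, 2_000, 4_000):
    print(n, dense_attention_comparisons(n))
```

Going from 1,000 to 2,000 words takes the count from one million to four million comparisons.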
Subquadratic’s solution
Subquadratic addresses this issue by replacing dense attention with “sparse attention.” Instead of comparing every word with all others, sparse attention focuses only on the relevant pairs. While this idea has been around for some time, no team had previously matched the quality of dense attention.
According to the company, its version finally achieves this. Notably, it dynamically selects which words to emphasize based on content rather than following a fixed pattern. “That’s kind of where the secret sauce is,” explains co-founder and chief technology officer Alex Whedon.
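Subquadratic has not published its method, so the following is only a toy sketch of the general idea of content-based sparse attention, not the company's algorithm. It assumes a simple "keep the k best-matching words" rule, and every name in it is hypothetical. Note that this toy still scores every pair before choosing which to keep, so it shows the selection idea without delivering any speedup.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(queries, keys, values, k):
    """For each query, attend only to the k keys that match it best.

    Which keys are kept depends on the content of the query and keys,
    not on a fixed positional pattern such as a sliding window.
    """
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Toy shortcut: score every key, then keep the top k.
        scores = [dot(q, key) for key in keys]
        keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        weights = softmax([scores[i] for i in keep])
        outputs.append(
            [sum(w * values[i][d] for w, i in zip(weights, keep)) for d in range(dim)]
        )
    return outputs


queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]
print(topk_sparse_attention(queries, keys, values, k=1))
```

With k set to 1, the query attends only to the first key, its closest match, and the output is that key's value.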
The evidence
Initially, the startup's claims relied on a few self-reported scores, which led to skepticism. One AI engineer remarked on X that SubQ might be “the biggest breakthrough since the Transformer … or it’s AI Theranos.”
To substantiate its claims, the company engaged a third-party evaluator, Appen, to conduct tests. The findings were impressive: in a raw speed test, SubQ ran 56 times faster than FlashAttention, a leading existing method. On a challenging coding benchmark, it scored 89.7 percent, close to the best models available.
The cost difference is equally striking. According to the startup, running a long-context test on Anthropic’s leading model costs approximately $2,600, whereas the same test on SubQ would cost only $8.
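To put the two quoted prices in proportion, here is the arithmetic. The figures are the startup's own, and the variable names are only illustrative.

```python
leading_model_cost_usd = 2600  # approximate figure quoted by the startup
subq_cost_usd = 8              # figure quoted by the startup

ratio = leading_model_cost_usd / subq_cost_usd
print(f"{ratio:.0f}x cheaper")  # 325x cheaper
```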
Still too good to be true?
Despite these promising results, caution is warranted. Benchmark scores do not translate directly into real-world performance, and SubQ is not yet widely accessible. Thousands of people have signed up for the waitlist, but only a limited number have received access.
There is also a twist in the development story. Instead of building SubQ from the ground up, Subquadratic started with an existing open-weight model and integrated its new attention method. While this is a common approach, it sits uneasily with the claim of completely reinventing how large language models operate.
“They may have built something real and useful,” remarks Will Depue, an independent researcher formerly with OpenAI. “However, the public evidence does not yet support the more ambitious assertion that they have resolved the quadratic attention bottleneck.”
Why it matters
If these results prove consistent, the potential benefits are substantial. More affordable and faster long-context models could process complete codebases, sets of contracts, or large collections of documents in a single pass. They would also reduce the costs and energy associated with running AI systems.
It is a prize the entire industry is chasing. AI companies are already struggling with the escalating costs of AI agents, and other startups, like Thomas Reardon's Flourish, are pursuing efficiency through different methods. Nevertheless, Subquadratic is betting that the entire field will pivot in its direction. “We believe that in a few years, no one will be building on transformers,” says chief executive Justin Dangel.
