SpaceX leased Colossus 1 to Anthropic after struggling to make the data center work for Grok
SpaceX has leased its Colossus 1 data center to Anthropic after struggling with latency and chip compatibility issues while trying to use it for its Grok AI models. Its newer facilities, by contrast, are built uniformly around Blackwell chips.
The decision to rent out Colossus 1 was not driven by excess capacity. SpaceX simply could not use the facility effectively for its own AI needs. According to Bloomberg, SpaceX faced latency challenges connecting the Memphis site to two other data centers more than 10 miles away, and outdated network infrastructure made the problem worse.
SpaceX had aimed to train its advanced Grok models on a combined setup spanning three facilities. Training large AI models requires very fast connections between locations, and older or lower-bandwidth links introduce delays that drag down the efficiency of the entire cluster. SpaceX therefore concluded that it was better to generate revenue from the facility than to let it sit underused.
Hardware integration issues made matters worse. Colossus 1 contains a mix of Nvidia chips, including Hopper and Blackwell, along with some older models. Colossus 2 and 3, in contrast, were designed more uniformly around Nvidia’s Blackwell chips. In distributed training, workloads are split across machines that must remain synchronized. Older chips become bottlenecks and leave faster accelerators idle, so the entire cluster performs at the level of its slowest component.
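The slowest-component effect can be illustrated with a minimal Python sketch. It assumes a simple synchronous setup in which every chip gets the same share of work and a step ends only when the last chip finishes. The per-step times and the function names are hypothetical, chosen for illustration, and are not figures from the Bloomberg report.

```python
def sync_step_time(per_worker_seconds):
    """A synchronized step finishes only when the slowest worker finishes."""
    return max(per_worker_seconds)


def busy_fraction(per_worker_seconds):
    """Fraction of each step that every worker spends computing rather than waiting."""
    step = sync_step_time(per_worker_seconds)
    return [t / step for t in per_worker_seconds]


# Hypothetical per-step compute times in seconds (illustrative only):
# a newer chip, a mid-generation chip, and an older chip.
mixed_cluster = [1.0, 2.0, 4.0]
uniform_cluster = [1.0, 1.0, 1.0]

mixed_step = sync_step_time(mixed_cluster)        # set by the oldest chip
mixed_busy = busy_fraction(mixed_cluster)         # the fastest chip waits most of the step
uniform_busy = busy_fraction(uniform_cluster)     # no worker waits on another
```

With these assumed numbers the fastest chip is busy for only a quarter of each step, while a uniform cluster keeps every chip fully occupied. Real training systems can rebalance work across unequal chips, so this is an upper bound on the waste, not a measurement.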
As a result, Anthropic is now paying $1.25 billion monthly to utilize a facility that SpaceX engineers were unable to fully exploit. Together with a $920 million monthly deal with Google, SpaceX is generating approximately $2.17 billion per month in compute revenue from infrastructure initially developed for its own use.
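The revenue arithmetic can be checked with a short Python sketch. The two monthly figures are the ones reported above. The constant and function names are illustrative, and the annualized value assumes both deals run unchanged for twelve months, which the 180-day lease term does not guarantee.

```python
# Monthly figures as reported in the article, in US dollars.
ANTHROPIC_MONTHLY_USD = 1_250_000_000
GOOGLE_MONTHLY_USD = 920_000_000


def combined_monthly(*deals_usd):
    """Sum the monthly value of all compute deals."""
    return sum(deals_usd)


def annualized(monthly_usd, months=12):
    """Naive run rate: assumes the monthly figure holds for every month."""
    return monthly_usd * months


monthly_total = combined_monthly(ANTHROPIC_MONTHLY_USD, GOOGLE_MONTHLY_USD)
annual_run_rate = annualized(monthly_total)

print(f"Monthly: ${monthly_total / 1e9:.2f} billion")
print(f"Annualized: ${annual_run_rate / 1e9:.2f} billion")
```

The monthly total matches the $2.17 billion cited, and the twelve-month run rate comes to about $26 billion.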
This development complicates the narrative SpaceX put forth during its IPO roadshow, where the company emphasized that Colossus 1 was built in just 122 days, faster than the industry norm. The rapid construction was a selling point, but Bloomberg’s report indicates that the speed came at a cost: the facility was not built uniformly enough for large-scale training.
SpaceX CFO Bret Johnsen said the company has not abandoned its internal AI initiatives, including Grok. Elon Musk described the lease with Anthropic as a 180-day arrangement with a 90-day mutual cancellation option, which leaves room to reclaim the capacity. “If compute gets super tight, I said we might need it back at some point,” he noted.
Grok's weak performance, however, makes reclaiming the compute less urgent. Downloads dropped from 20 million in January to 8.3 million in April, and paid conversions reached only a fifth of ChatGPT’s. Federal adoption has also stalled. The product that was expected to justify the data center investment is not meeting expectations, while the rental income from Anthropic and Google has grown into a $26 billion annual revenue stream. SpaceX built a data center for AI training and has inadvertently become a compute landlord for the AI sector.
