SpaceX leased Colossus 1 to Anthropic because it was unable to get the data center operational for Grok.
TL;DR: SpaceX leased Colossus 1 to Anthropic because latency and chip-compatibility problems kept it from using the facility for Grok. Its newer facilities are built more uniformly around Blackwell chips.
SpaceX rented out its Colossus 1 data center to Anthropic not due to excess capacity, but rather because it was unable to effectively utilize the facility for its AI models. Bloomberg reported on Friday that SpaceX faced latency challenges when trying to connect the Memphis site with two other data center campuses situated over 10 miles apart, exacerbated by outdated network infrastructure.
The company had aimed to train its most advanced Grok models using a network of three facilities working in unison. Training large AI models demands ultra-fast connections between locations. If the connections are older or of lower bandwidth, they cause delays that can hinder the entire cluster. Ultimately, SpaceX concluded that the facility would generate more revenue leased out than it would sitting underutilized.
The hardware mismatch further complicated matters. Colossus 1 is equipped with a blend of Nvidia chip generations, incorporating both Hopper and Blackwell systems along with older accelerators. In contrast, Colossus 2 and 3 are more uniformly constructed around Nvidia’s Blackwell chips. In a distributed training cluster, workloads are shared across machines that must remain synchronized. Older chips can create bottlenecks, forcing quicker accelerators to wait, which results in the cluster performing closer to its slowest components rather than its fastest.
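The bottleneck effect described above can be illustrated with a toy model. This is a minimal sketch, assuming a synchronous training step in which every worker receives an equal share of work and no worker can proceed until all have finished. The function names and the relative chip speeds are hypothetical and are not taken from Bloomberg's reporting.

```python
def sync_step_time(work_per_worker, throughputs):
    """Time for one synchronous step: the step ends only when the slowest worker finishes."""
    return max(work_per_worker / t for t in throughputs)


def cluster_throughput(work_per_worker, throughputs):
    """Total work completed per unit time across the whole cluster."""
    total_work = work_per_worker * len(throughputs)
    return total_work / sync_step_time(work_per_worker, throughputs)


# Hypothetical relative speeds: 2.0 stands in for a newer chip, 1.0 for an older one.
uniform = [2.0] * 8              # eight newer accelerators
mixed = [2.0] * 6 + [1.0] * 2    # six newer, two older

uniform_rate = cluster_throughput(1.0, uniform)
mixed_rate = cluster_throughput(1.0, mixed)

print(f"uniform cluster: {uniform_rate} units of work per unit time")
print(f"mixed cluster:   {mixed_rate} units of work per unit time")
```

Under these assumptions, swapping just two of the eight fast workers for slower ones halves the cluster's throughput, because the six fast workers sit idle until the slow ones finish each step. Real schedulers can soften this by giving slower devices less work, so the sketch shows the worst case rather than a measured result.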
As a result, Anthropic is now paying $1.25 billion per month to utilize a facility that SpaceX's engineers could not completely optimize. Coupled with the $920 million monthly partnership with Google, SpaceX is generating approximately $2.17 billion per month in computing revenue from infrastructure initially intended for its own use.
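The arithmetic behind these figures is simple to check. The short sketch below uses only the two monthly amounts reported above. The variable names are illustrative, and the annualized figure assumes both payments hold steady for twelve months, which the lease terms do not guarantee.

```python
from decimal import Decimal

# Monthly figures as reported, in billions of US dollars.
anthropic_monthly = Decimal("1.25")
google_monthly = Decimal("0.92")

monthly_total = anthropic_monthly + google_monthly

# Assumption: both payments continue unchanged for a full year.
annualized = monthly_total * 12

print(f"monthly total: ${monthly_total} billion")
print(f"annualized:    ${annualized} billion")
```

The monthly total comes to $2.17 billion, and annualizing it gives about $26 billion.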
This information complicates the narrative SpaceX shared during its IPO roadshow. Musk’s firm highlighted that Colossus 1 was constructed in just 122 days, surpassing industry standards. The speed of construction was a key selling point. However, Bloomberg’s findings imply that this rapid development may have compromised the facility's uniformity, impacting its suitability as part of a larger training cluster.
SpaceX CFO Bret Johnsen mentioned that the company has not abandoned its internal AI operations, including Grok. Musk characterized the arrangement with Anthropic as a 180-day lease with a 90-day mutual cancellation option, allowing for the possibility of reclaiming the facility. “If compute gets super tight, I said we might need it back at some point,” he noted.
However, Grok's performance trajectory makes reclaiming the resources less pressing. Downloads dropped from 20 million in January to 8.3 million in April, and paid conversions are only a fifth of ChatGPT’s. Federal adoption has stalled, and the product that was expected to validate the data center investment is falling short. Meanwhile, the rental income from Anthropic and Google has become a roughly $26 billion annual revenue stream. SpaceX built a data center for AI training but inadvertently became an AI landlord.
