Days after Anthropic released Claude Opus 4.8, users are reporting a familiar and frustrating message: "API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited." The error, which indicates Anthropic's infrastructure is overloaded rather than any issue with the user's account, has sparked fresh discussion about whether the company has actually resolved the capacity constraints that have plagued it for months.
The timing is notable. Opus 4.8 launched on May 28, arriving just 41 days after its predecessor. Anthropic described the upgrade as a "modest but tangible improvement," with stronger coding benchmarks and better honesty metrics. But for users hitting server-side limits within days of release, the model's capabilities matter less than its availability.
The xAI Deal Was Supposed to Fix This
Three weeks before Opus 4.8 shipped, Anthropic announced a landmark compute agreement with SpaceX-xAI. The deal gives Anthropic access to all of Colossus 1's capacity, a Memphis data center housing over 220,000 Nvidia GPUs drawing 300 megawatts of power. According to SpaceX's S-1 filing with the SEC, Anthropic will pay $1.25 billion per month through May 2029 for the privilege.
Anthropic explicitly tied the deal to relieving user-facing constraints, stating the additional capacity would "directly improve capacity for Claude Pro and Claude Max subscribers." The company said it had already doubled Claude Code rate limits for Pro, Max, and Team plans and removed peak-hour restrictions that had frustrated paying users since March.
But the Colossus infrastructure isn't fully online yet. The contract includes a discounted rate for the first two months while xAI completes its ramp-up, with one report indicating Anthropic is still scaling to 100% utilization of the facility. If Colossus 1 isn't running at full capacity, the rate limit relief it promised hasn't fully materialized.
A Pattern of Infrastructure Strain
This isn't Anthropic's first capacity crisis. In early March, Claude experienced multiple outages after a surge in signups following OpenAI's Pentagon contract sparked a user boycott. Claude topped U.S. App Store downloads, and web traffic jumped over 30% month-over-month. Anthropic's infrastructure buckled under the load.
The company responded by tightening session limits during weekday peak hours and running a temporary promotion that doubled off-peak usage. In April, a Fortune report revealed Anthropic acknowledged engineering missteps behind a widely noticed decline in Claude Code performance. The company's statement was blunt: "Demand for Claude has grown at an unprecedented rate, and our infrastructure has been stretched to meet it."
An internal OpenAI memo, first reported by CNBC, claimed Anthropic had made a "strategic misstep" by failing to secure sufficient compute and was "operating on a meaningfully smaller curve" than competitors. Whether that characterization is accurate or competitive posturing, the rate limit errors suggest Anthropic's capacity problems haven't fully resolved.
The Neocloud Dependency
Anthropic's compute portfolio now spans multiple providers: Amazon (up to 5 GW, with nearly 1 GW by end of 2026), Google and Broadcom (5 GW starting 2027), Microsoft and Nvidia ($30 billion in Azure capacity), a $50 billion infrastructure investment with Fluidstack, and now the SpaceX deal. The scale is staggering. So is the dependency risk.
The xAI arrangement in particular carries unusual terms. Either side can terminate with 90 days' notice, and Elon Musk has publicly stated that SpaceX "might need it back at some point" if compute gets tight. Simon Willison, a respected developer, noted the clause represents "a new form of supply chain risk for Anthropic."
For enterprise customers building production workflows on Claude, server-side rate limiting is more than an inconvenience. It breaks agentic workflows and erodes confidence in the platform's reliability. Anthropic's technical documentation distinguishes between 429 errors (user rate limits) and 529 errors (server overload), but for developers watching their requests fail, the distinction offers cold comfort.
The Opus 4.8 launch adds pressure. The model defaults to higher effort settings, which burn through tokens faster. Anthropic says it raised rate limits to compensate, but early reports suggest the headroom isn't enough for heavy users. The new Dynamic Workflows feature, which spins up hundreds of parallel subagents, will only intensify demand once it moves beyond research preview.
Anthropic hasn't commented on the specific Opus 4.8 rate limiting reports. The company's status page shows resolved incidents from May 27 and 28, coinciding with the model launch, though current API services show no active disruptions.


