NVIDIA's Vera CPU Marks a Structural Shift in How AI Thinks About Hardware

On Friday, NVIDIA Vice President Ian Buck hand-delivered the first Vera CPU systems to three leading AI labs in the Bay Area: Anthropic, OpenAI, and SpaceXAI. Oracle Cloud Infrastructure received its unit on Monday. These are the first production units of NVIDIA's first custom CPU, a chip the company has positioned as its next multibillion-dollar business.

The deliveries come two months after Jensen Huang introduced Vera at GTC San Jose in March, and they mark a turning point in how the AI industry thinks about compute architecture. For years, the assumption was simple: GPUs do the heavy lifting, CPUs play host. That assumption is breaking down.

Why CPUs Are Suddenly Cool Again

Agentic AI changes the workload profile entirely. Instead of processing a prompt and returning an answer, agents break goals into multi-step workflows, call external tools, manage persistent memory, and coordinate with other agents. Every orchestration layer, every tool call, every sandbox environment runs on the CPU. As Buck put it during the delivery tour: "When AI models are posed a question, the answer, often, isn't already prepped and ready to go. The models actually have to generate some Python code to arrive at the correct answer."

The numbers support this shift. According to Arm, traditional AI data centers require about 30 million CPU cores per gigawatt of power. In the agentic era, that figure jumps to 120 million cores per gigawatt, a fourfold increase. TrendForce expects the CPU-to-GPU ratio to shift from between 1:4 and 1:8 in chatbot-era infrastructure to somewhere between 1:1 and 1:2 as agentic workloads scale.
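The scale of that change is easier to see with rough arithmetic. The Python sketch below uses only the figures quoted above; the 100,000-GPU cluster size is a hypothetical input chosen for illustration, not a number from Arm or TrendForce.

```python
from fractions import Fraction

# Figures quoted above (Arm): CPU cores needed per gigawatt of data center power.
CORES_PER_GW_TRADITIONAL = 30_000_000
CORES_PER_GW_AGENTIC = 120_000_000

# Hypothetical cluster size, used only to make the ratios concrete.
GPUS = 100_000

# CPU-to-GPU ratios quoted above (TrendForce), written as CPUs per GPU.
SCENARIOS = {
    "chatbot era, 1:8": Fraction(1, 8),
    "chatbot era, 1:4": Fraction(1, 4),
    "agentic era, 1:2": Fraction(1, 2),
    "agentic era, 1:1": Fraction(1, 1),
}


def cpus_needed(gpus: int, cpus_per_gpu: Fraction) -> int:
    """CPUs required for a GPU fleet at a given CPU-to-GPU ratio."""
    return int(gpus * cpus_per_gpu)


# Growth in cores per gigawatt between the two eras.
core_multiplier = CORES_PER_GW_AGENTIC / CORES_PER_GW_TRADITIONAL

if __name__ == "__main__":
    print(f"Cores per gigawatt grow {core_multiplier:.0f}x")
    for label, ratio in SCENARIOS.items():
        print(f"{label}: {cpus_needed(GPUS, ratio):,} CPUs for {GPUS:,} GPUs")
```

Under these assumptions, the same 100,000-GPU cluster would need 12,500 to 25,000 CPUs at chatbot-era ratios and 50,000 to 100,000 CPUs at agentic-era ratios.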

Vera is built for this new reality. It features 88 custom Olympus cores with a 10-wide instruction decoder and a neural branch predictor, and it delivers 1.2 TB/s of memory bandwidth. NVIDIA claims Vera runs agentic sandboxes 50% faster and at twice the energy efficiency of traditional x86 rack-scale CPUs. The chip uses LPDDR5X memory in SOCAMM modules, a first for the data center market, and connects to Rubin GPUs via NVLink-C2C at 1.8 TB/s, seven times the bandwidth of PCIe Gen 6.
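The "seven times" comparison can be sanity-checked with a quick calculation. This sketch assumes the baseline is a PCIe Gen 6 x16 link at its raw signaling rate of 64 GT/s per lane, counted in both directions and ignoring encoding and protocol overhead. That basis is an assumption; the comparison as quoted does not say which link width or direction count was used.

```python
# Assumption: the baseline is a PCIe Gen 6 x16 link at its raw signaling rate,
# counted in both directions and ignoring encoding and protocol overhead.
PCIE6_GT_PER_S_PER_LANE = 64  # raw transfer rate per lane, per direction
PCIE6_LANES = 16
BITS_PER_BYTE = 8

NVLINK_C2C_GB_PER_S = 1800  # 1.8 TB/s, as quoted above


def pcie6_x16_bidirectional_gb_per_s() -> float:
    """Raw bidirectional bandwidth of a PCIe Gen 6 x16 link in GB/s."""
    per_direction = PCIE6_GT_PER_S_PER_LANE * PCIE6_LANES / BITS_PER_BYTE
    return per_direction * 2


def nvlink_to_pcie_ratio() -> float:
    """How many times faster the quoted NVLink-C2C figure is than the baseline."""
    return NVLINK_C2C_GB_PER_S / pcie6_x16_bidirectional_gb_per_s()


if __name__ == "__main__":
    print(f"PCIe Gen 6 x16: {pcie6_x16_bidirectional_gb_per_s():.0f} GB/s")
    print(f"NVLink-C2C advantage: {nvlink_to_pcie_ratio():.2f}x")
```

On that basis the baseline works out to 256 GB/s and the ratio to about 7.03, which is consistent with the quoted figure.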

The Competitive Landscape Is Suddenly Crowded

NVIDIA is not alone in recognizing this opportunity. The same month Vera was announced, Arm unveiled its AGI CPU, a 136-core chip built on TSMC's 3nm process that marks the company's first foray into production silicon after 35 years of pure licensing. Meta, OpenAI, and Cerebras are launch partners.

AMD sees this as validation of its existing strategy. CEO Lisa Su expects the server CPU market to grow by strong double digits in 2026, driven by agentic workloads. The company's 5th Gen EPYC Turin CPUs captured more than half of total server CPU revenue by Q4 2025. AMD's forthcoming EPYC Venice, built on TSMC 2nm with 256 Zen 6 cores, is positioned as a direct competitor to Vera. Intel's response comes via Xeon 6+ Clearwater Forest, a 288-core E-core chip on its 18A process, though yield issues may push volume production into 2027.

The competitive picture is unusually complex. When a GPU company, an IP licensor, and two x86 incumbents all prioritize agentic CPU design simultaneously, the shift is structural.

The Token Multiplier Problem

Hardware is only half the constraint. Agentic AI faces a fundamental economics problem that no CPU can fully solve: multi-step reasoning multiplies token consumption. Standard generative AI processes input tokens and returns output tokens in a single exchange. Agentic AI executes reasoning chains, tool calls, and coordination loops that generate 20 to 30 times more tokens per interaction.

This compounds non-linearly, because a naive loop typically re-sends its accumulated context at every step. A naive 5-step agent loop can process 27,000 tokens compared with 2,000 tokens for a single chatbot call, a 13.5x cost increase. Google reportedly processes approximately 1.3 quadrillion tokens per month as of 2026, a 130-fold jump from the previous year. Gartner predicts 40% of agent projects will be canceled by 2027 due to infrastructure cost overruns.
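One way to see where a figure like 27,000 tokens could come from is a toy model in which each step re-sends the full conversation history. The per-step token counts below are hypothetical values chosen so that a 5-step loop lands on the 27,000 figure above; they are assumptions for illustration, not measurements from any real agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopParams:
    """Hypothetical per-step token counts; assumptions, not measurements."""
    prompt_tokens: int = 1_000      # system prompt plus user request
    output_tokens: int = 1_000      # model output generated at each step
    tool_result_tokens: int = 700   # tool output appended after each step


def tokens_at_step(step: int, p: LoopParams) -> int:
    """Tokens processed at a 1-indexed step when the full history is re-sent."""
    history = (step - 1) * (p.output_tokens + p.tool_result_tokens)
    return p.prompt_tokens + history + p.output_tokens


def total_tokens(steps: int, p: LoopParams) -> int:
    """Total tokens processed across all steps of the loop."""
    return sum(tokens_at_step(k, p) for k in range(1, steps + 1))


if __name__ == "__main__":
    params = LoopParams()
    single = total_tokens(1, params)
    for steps in (1, 5, 10):
        total = total_tokens(steps, params)
        print(f"{steps:>2} steps: {total:,} tokens ({total / single:.2f}x)")
```

With these assumed values, one step costs 2,000 tokens, five steps cost 27,000 (13.5x), and ten steps cost 96,500 (48.25x). Doubling the step count more than triples the total, which is the non-linear growth the paragraph describes.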

The tension between latency and accuracy creates additional friction. A single LLM call might take 800 milliseconds. An orchestrator-worker flow with reflection loops can take 10 to 30 seconds. For user-facing applications, that latency is often unacceptable. But without multi-turn reasoning, accuracy plateaus at roughly 60 to 70 percent on complex tasks.
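A rough latency budget shows how quickly sequential calls add up. The sketch assumes every call in a flow takes the 800 milliseconds quoted above and that calls run strictly one after another. Real flows mix call durations, tool execution time, and parallel workers, so this is only an order-of-magnitude estimate.

```python
SINGLE_CALL_MS = 800  # single LLM call latency quoted above, in milliseconds


def sequential_latency_ms(calls: int, per_call_ms: int = SINGLE_CALL_MS) -> int:
    """Total latency when calls run strictly back to back."""
    return calls * per_call_ms


def calls_within_budget(budget_ms: int, per_call_ms: int = SINGLE_CALL_MS) -> int:
    """Largest number of back-to-back calls that fits in a latency budget."""
    return budget_ms // per_call_ms


if __name__ == "__main__":
    for budget_s in (10, 30):
        n = calls_within_budget(budget_s * 1000)
        print(f"{budget_s} s budget fits {n} sequential calls")
```

Under these assumptions, a 10 to 30 second flow corresponds to roughly 12 to 37 back-to-back model calls, which gives a sense of how many reasoning and reflection steps such a flow might contain.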

What Constrains the Agentic Future

Several bottlenecks remain unresolved. Key-value cache management is becoming critical as agents maintain longer context across sessions and users. NVIDIA's BlueField-4 STX storage system addresses this with dedicated KV cache processing, claiming a 5x improvement in inference throughput. But memory architecture remains a limiting factor.

Supply chains are already strained. Intel has admitted it cannot meet demand and is deprioritizing consumer PC CPUs to free fab capacity for data center products. AMD's lead times have stretched to 8-10 weeks. Server CPU prices in China have risen more than 10%, with high-end lines facing potential increases of up to 50% if TSMC wafer allocation shortfalls persist.

Software orchestration adds another layer of complexity. Agents require tools, databases, API access, permission checks, memory retrieval, and output validation. Each operation introduces latency and failure modes. Hallucination, looping, context overflow, and tool misuse remain common problems. The non-determinism inherent to LLM outputs makes cost forecasting nearly impossible.

Oracle Plans Hundreds of Thousands of Veras

Despite these constraints, hyperscalers are moving aggressively. Oracle's product management lead stated plans to deploy hundreds of thousands of Vera CPUs beginning in 2026. CoreWeave will be the first cloud provider to offer Vera as a standalone platform. Meta has signed a deal for multiple generations of NVIDIA CPU-only systems.

The AWS-OpenAI partnership announced last November may be the most telling signal. The $38 billion, seven-year deal specified GPU access "with the ability to expand to tens of millions of CPUs to rapidly scale agentic workloads." Most analysts focused on the GPU headline. The CPU figure deserved more attention.

NVIDIA expects Vera-based systems from major OEMs in the second half of 2026. Partners include Dell, HPE, Lenovo, Supermicro, ASUS, Foxconn, and Gigabyte. National laboratories planning deployments include Los Alamos, Lawrence Berkeley, and the Texas Advanced Computing Center.

Whether Vera becomes the dominant platform for enterprise agentic infrastructure depends on factors beyond the chip itself: software ecosystem maturity, competitive pricing from AMD and Intel, and whether the token economics of agentic AI can be made sustainable at scale. For now, the deliveries to Anthropic and OpenAI signal that the companies building the most capable agents believe NVIDIA's architecture is worth the bet.