NVIDIA GTC 2026: Seven Chips Live, $1 Trillion in Orders, and the Official Start of the Agentic AI Era


On Monday afternoon at the SAP Center in San Jose, Jensen Huang walked onto a stage in front of a capacity crowd and confirmed the single most consequential infrastructure announcement in years: the NVIDIA Vera Rubin platform is not a roadmap slide. Seven new chips are in full production. Five new rack-scale systems are shipping to customers this year. And NVIDIA — now a $4.5 trillion company — expects combined orders for Blackwell and Vera Rubin to reach $1 trillion through 2027.

From Roadmap to Reality: The Vera Rubin Platform Is Shipping

For months, Vera Rubin existed primarily as a compelling promise: a full-stack, rack-scale system that NVIDIA claimed would deliver 10x better inference performance per watt than its predecessor, Grace Blackwell. At GTC 2026, Huang confirmed the transition from promise to production.

The Vera Rubin platform now comprises seven chips — the Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch, and the newly integrated Groq 3 LPU — all in full production and designed to operate as a single coordinated AI supercomputer. The platform ships in five distinct rack configurations, each purpose-built for a specific phase of AI: pretraining, post-training, test-time scaling, agentic inference, and storage.

The flagship Vera Rubin NVL72 rack integrates 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6. NVIDIA claims the NVL72 trains large mixture-of-experts models using one-fourth the number of GPUs required by the Blackwell platform, while delivering 10x higher inference throughput per watt at one-tenth the cost per token — a compound efficiency improvement that, if it holds at hyperscale, would substantially reshape the unit economics of running frontier AI models.
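Taken at face value, those multipliers compound. The back-of-envelope sketch below uses only the figures quoted above; they are keynote claims, not independent benchmarks, and the normalization to Blackwell = 1.0 is an illustrative assumption.

```python
# Back-of-envelope on NVIDIA's claimed NVL72-vs-Blackwell multipliers.
# These are marketing figures from the keynote, not measured benchmarks.

blackwell = {"gpus": 1.0, "tokens_per_watt": 1.0, "cost_per_token": 1.0}

vera_rubin = {
    "gpus": blackwell["gpus"] / 4,                          # 1/4 the GPUs for MoE training
    "tokens_per_watt": blackwell["tokens_per_watt"] * 10,   # 10x throughput per watt
    "cost_per_token": blackwell["cost_per_token"] / 10,     # 1/10 the cost per token
}

# For a fixed token budget, energy scales inversely with throughput/watt:
energy_ratio = 1 / vera_rubin["tokens_per_watt"]  # joules per token vs Blackwell
spend_ratio = vera_rubin["cost_per_token"]        # dollars per token vs Blackwell

print(f"Energy per token vs Blackwell: {energy_ratio:.1f}x")
print(f"Cost per token vs Blackwell:   {spend_ratio:.1f}x")
```

If the claims hold, a fixed serving workload would draw roughly a tenth of the power and a tenth of the spend, which is the "compound efficiency improvement" the unit-economics argument rests on.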

"Vera Rubin is a generational leap — seven breakthrough chips, five racks, one giant supercomputer — built to power every phase of AI," Huang said from the GTC stage. "The agentic AI inflection point has arrived with Vera Rubin kicking off the greatest infrastructure buildout in history."

The Vera CPU: When NVIDIA's Quietest Chip Becomes the Most Strategic

Historically, when people talked about NVIDIA, they meant GPUs. That framing is now incomplete. The Vera CPU — purpose-built for the orchestration demands of agentic AI — is becoming one of the company's most strategically important assets, and GTC 2026 marked its operational debut as a standalone compute resource.

The Vera CPU Rack packs 256 Vera CPUs into a dense, liquid-cooled MGX chassis with Spectrum-X Ethernet networking. Its target workload is the part of agentic AI pipelines that GPUs are structurally bad at: reinforcement learning environments, agent validation loops, tool-calling scaffolds, and the millions of CPU-bound tasks that modern multi-agent systems spawn. NVIDIA claims the Vera CPU delivers results twice as efficiently and 50% faster than conventional data center CPUs in these workloads.
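The workload shape NVIDIA is describing can be pictured as a fan-out of small host-side jobs around each model call. The sketch below is purely illustrative: every function name is hypothetical, and a thread pool stands in for the process- and node-level parallelism a real scaffold would use.

```python
# Illustrative sketch: agent scaffolds spawn many small CPU-bound tasks
# (tool-output parsing, schema checks, environment steps) per model call.
# All names here are hypothetical; a thread pool stands in for real
# multi-node parallelism.
from concurrent.futures import ThreadPoolExecutor

def validate_tool_output(payload: str) -> bool:
    """Stand-in for a CPU-bound check such as parsing or schema validation."""
    return bool(payload) and payload == payload.strip()

def run_agent_step(candidate_outputs: list[str]) -> list[str]:
    # Fan the validation work out across workers; no GPU is involved here.
    with ThreadPoolExecutor() as pool:
        verdicts = list(pool.map(validate_tool_output, candidate_outputs))
    return [out for out, ok in zip(candidate_outputs, verdicts) if ok]

print(run_agent_step(["result_a", "  malformed  ", "result_b"]))
```

Multiply a loop like this by millions of concurrent agents and the GPU sits idle while the host cores saturate, which is the gap the Vera CPU Rack is aimed at.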

The strategic context matters here. In February, NVIDIA struck a multiyear deal with Meta that included the first large-scale deployment of Grace CPUs running without GPU pairs — a signal that standalone NVIDIA CPU racks are no longer a niche configuration. Thousands of standalone CPUs are also powering supercomputers at the Texas Advanced Computing Center and Los Alamos National Lab. Bank of America projects the CPU market to more than double, from $27 billion in 2025 to $60 billion by 2030, and NVIDIA is now positioned to capture a growing share of that expansion.

"CPUs are becoming the bottleneck in terms of growing out this AI and agentic workflow," Dion Harris, NVIDIA's head of AI infrastructure, told CNBC ahead of the conference. "It's an exciting opportunity."

Groq 3 LPX: The $20 Billion Bet Starts Paying Off

One of the most closely watched announcements at GTC 2026 was the first commercial reveal of the NVIDIA Groq 3 LPU — the silicon that emerged from NVIDIA's $20 billion asset acquisition of Groq in December 2025, its largest deal in company history.

The Groq 3 LPX Rack holds 256 LPU processors, each with 128GB of on-chip SRAM, delivering 640 TB/s of scale-up bandwidth. Deployed alongside Vera Rubin NVL72, the LPX rack is designed to attack the fundamental inefficiency of GPU-based inference: decode latency. The Groq LPU architecture — optimized for deterministic, low-latency token generation — is paired with Rubin GPUs so that the two processors jointly compute every layer of the model for each output token. The result, NVIDIA claims, is up to 35x higher inference throughput per megawatt compared with GPU-only configurations, and up to 10x more revenue opportunity for trillion-parameter models.
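The bandwidth figure matters because autoregressive decode is memory-bound: generating each token requires streaming the active model weights through the processor, so single-stream decode speed is capped by memory bandwidth rather than FLOPs. The arithmetic below illustrates that bound; the model size, HBM figure, and utilization factor are hypothetical, chosen only to show the shape of the calculation.

```python
# Why decode latency favors SRAM-heavy designs: per-token decode is
# memory-bandwidth-bound. All numbers below are hypothetical illustrations
# except the 640 TB/s figure quoted for the LPX rack.

def max_decode_tokens_per_sec(active_params_bytes: float,
                              bandwidth_bytes_per_sec: float,
                              utilization: float = 0.5) -> float:
    """Upper bound: each output token must stream the active weights once."""
    return bandwidth_bytes_per_sec * utilization / active_params_bytes

hbm_bw = 8e12     # ~8 TB/s, an HBM-class GPU (hypothetical figure)
sram_bw = 640e12  # 640 TB/s scale-up bandwidth quoted for the LPX rack

active = 100e9    # e.g. 100B active params at 1 byte each (hypothetical)

print(f"HBM-bound decode:  ~{max_decode_tokens_per_sec(active, hbm_bw):,.0f} tok/s")
print(f"SRAM-bound decode: ~{max_decode_tokens_per_sec(active, sram_bw):,.0f} tok/s")
```

Under these toy assumptions the bandwidth ratio alone opens a gap of nearly two orders of magnitude in single-stream decode speed, which is the property the throughput-per-megawatt claim leans on.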

"We united two processors of extreme differences, one for high throughput, one for low latency," Huang said from the stage. "And so we're just going to add a whole bunch of Groq chips, which expands the amount of memory it has."

The LPX rack is fully liquid-cooled, built on MGX infrastructure, and expected to ship in the second half of 2026. Mistral AI's CTO Timothée Lacroix called the BlueField-4 STX storage system — which works in tandem with LPX to handle KV cache data at AI factory scale — "ideally positioned to ensure that our models can maintain coherence and speed when reasoning across massive datasets."

Storage and Networking: The Infrastructure Nobody Talks About

Two less-discussed but operationally critical components rounded out the Vera Rubin platform reveal. The BlueField-4 STX rack is an AI-native storage layer that extends GPU memory coherently across an entire POD. Powered by a chip that combines the Vera CPU with the ConnectX-9 SuperNIC, STX is built for the KV cache problem — the growing bottleneck in long-context and multi-turn agentic inference. NVIDIA's DOCA Memos framework enables dedicated KV cache processing that NVIDIA claims boosts inference throughput by up to 5x while improving power efficiency compared with general-purpose storage architectures.
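To see why KV cache is the bottleneck STX targets, it helps to work out its size. The formula below is the standard transformer KV accounting; the model dimensions are hypothetical and do not describe any specific NVIDIA or partner model.

```python
# Standard KV-cache sizing for a transformer decoder. The model
# dimensions below are hypothetical, for illustration only.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """K and V tensors cached for every token, per layer, per sequence."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

# A hypothetical 70B-class model with grouped-query attention, fp16 cache:
per_seq = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                         seq_len=128_000)
print(f"KV cache for one 128k-token sequence: {per_seq / 2**30:.1f} GiB")

# Serving 1,000 concurrent long-context sessions:
print(f"1,000 sessions: {1000 * per_seq / 2**40:.1f} TiB")
```

A single long-context session runs to tens of gigabytes, and a modest fleet of concurrent agents runs to tens of terabytes, far beyond GPU memory, which is why a dedicated storage tier for KV cache is a rack-level product rather than a software tweak.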

The Spectrum-6 SPX Ethernet rack handles east-west traffic across AI factories, configurable with either Spectrum-X Ethernet or Quantum-X800 InfiniBand switches. At hyperscale, keeping GPU clusters tightly synchronized across thousands of nodes is one of the largest practical constraints on training efficiency. Spectrum-6 is NVIDIA's answer to that constraint for the Vera Rubin generation.

What Comes After Vera Rubin: Feynman, Rosa, and AI Factories in Orbit

Huang devoted a substantial portion of the keynote to what comes after Vera Rubin, cementing NVIDIA's roadmap visibility at a time when rivals are racing to close the gap.

The next major architecture is called Feynman, named for physicist Richard Feynman. It introduces a new CPU, NVIDIA Rosa — named for Rosalind Franklin, the crystallographer whose X-ray work revealed the structure of DNA — paired with the LP40 LPU, BlueField-5, and ConnectX-10, connected through NVIDIA Kyber for both copper and co-packaged optical interconnects. Huang showed a prototype of the Kyber rack architecture: 144 GPUs arranged in vertical compute trays instead of horizontal ones, boosting density and reducing latency. The Kyber design will debut in Vera Rubin Ultra, the next rack-scale system after the current NVL72, expected to ship in 2027.

Then there was the announcement that no one had on their bingo card: NVIDIA Space-1 Vera Rubin. Huang announced that future systems are being designed to bring AI data centers into orbit, extending accelerated computing from terrestrial facilities to space. Details were limited, but the signal was unmistakable — NVIDIA is thinking about AI infrastructure at a scale that transcends geography.

To help companies plan deployments before committing to physical hardware, NVIDIA also announced the Vera Rubin DSX AI Factory reference design and the Omniverse DSX Blueprint. DSX Air lets organizations simulate AI factories in software before any steel is bolted together.

$1 Trillion and the Scale of What's Being Built

The number that defined the keynote was $1 trillion. Huang told the audience that combined purchase orders for Blackwell and Vera Rubin are expected to reach that level through 2027 — double the $500 billion estimate NVIDIA gave last year. NVIDIA's finance chief Colette Kress had already signaled in February's earnings call that growth would exceed the prior estimate; GTC put the revised number in public view for the first time.

The revenue backdrop supports the claim. NVIDIA's latest quarterly revenue forecast stands at approximately $78 billion — a 77% year-over-year increase — and the company has reported 11 consecutive quarters of revenue growth above 55%. The $4.5 trillion market cap reflects a market that has come to treat AI compute as something closer to indispensable infrastructure than discretionary technology spending.

Huang framed the scale of demand in stark terms: "I believe computing demand has increased by 1 million times over the last few years." He also cited $150 billion in venture investment into AI-native startups over the past year, calling the category one of the fastest-growing in the history of technology. "If they could just get more capacity, they could generate more tokens, their revenues would go up," Huang said of the startup ecosystem — describing a demand curve that, from NVIDIA's vantage, shows no signs of plateauing.

What the Vera Rubin Launch Actually Means for the Industry

The transition from Blackwell to Vera Rubin is not merely an upgrade cycle. It represents a structural change in how AI infrastructure is conceived, procured, and operated. Previous NVIDIA generations sold chips; the Vera Rubin platform sells an entire operating environment — compute, memory, storage, networking, and the software stack to orchestrate all of it — as a vertically integrated unit.

For hyperscalers, that vertical integration creates both efficiency and dependency. The 10x inference throughput per watt improvement is not achievable by cherry-picking NVIDIA components and mixing them with third-party networking or storage. It requires the full stack. That codesign advantage is NVIDIA's deepest competitive moat — and it's one that becomes harder to replicate as each generation ships faster and integrates more deeply.

OpenAI CEO Sam Altman said at GTC that with Vera Rubin, the company will "run more powerful models and agents at massive scale and deliver faster, more reliable systems to hundreds of millions of people." Anthropic's Dario Amodei noted that the infrastructure requirements of complex reasoning and agentic workflows "demand infrastructure that can keep pace." Neither company commented on the vertical dependency that comes with that reliance — but both are deeply embedded in it.

GTC 2026 was, in essence, the launch event for the next phase of AI infrastructure. Vera Rubin is not shipping someday. It is shipping. Seven chips are in production. The orders are logged. The factories are being built. What happens next — how quickly model capabilities scale once inference costs fall to one-tenth of Blackwell's, how agentic architectures evolve now that CPUs have a dedicated NVIDIA rack, and whether the Feynman roadmap holds — will define the AI hardware landscape through the end of the decade.
