Nvidia’s 2028 data center roadmap stacks the deck for AI scaling

By
Craig Nash
Tech writer at All Things Geek. Covers artificial intelligence, semiconductors, and computing hardware.

Nvidia’s data center roadmap charts a path from 2026 through 2028 that leans heavily on memory density and networking bandwidth to unlock the next phase of AI scaling. The company announced the multi-year strategy at GTC 2025, positioning itself to handle exponentially larger models and faster training cycles across enterprise and cloud infrastructure.

Key Takeaways

  • Rubin GPUs (2026) pair HBM4/HBM4e memory with Vera CPUs and NVLink 6/7 switches delivering 3600 GB/s bandwidth.
  • Feynman GPUs (2028) use 3D-stacked custom HBM for higher memory density and efficiency versus prior generations.
  • Rosa CPU replaces Vera CPU in 2028; Spectrum7 204T CPO doubles network capacity compared to Rubin.
  • Optical NVLink and ConnectX 10 networking address physical transmission limits for massive AI workloads.
  • Groq LPUs with NVFP4 and NVLink support signal broader ecosystem expansion beyond Nvidia’s own silicon.

Rubin Sets the 2026 Baseline for HBM4 Memory

Nvidia’s 2026 Rubin generation promises several times the performance of the prior Blackwell series by pairing its GPUs with HBM4 memory in standard configurations and HBM4e in advanced models. The Vera CPU replaces Grace as the compute companion and, like Grace before it, connects directly to the Rubin GPUs over NVLink rather than through intermediate hops that would throttle data movement between processor and accelerator.

Rubin’s networking pairs NVLink 6 and NVLink 7 switches, rated at 3600 GB/s of NVLink bandwidth, with the CX9 1600G NIC and the Spectrum6 102T CPO switch for traffic beyond the rack. For data center operators, this means training loops that previously bottlenecked at the interconnect can exchange data between GPUs far faster, although still well below the GPUs’ local HBM throughput. The BlueField-4 DPU handles the networking layer, offloading security and packet processing from the main compute path.
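To put the 3600 GB/s figure in perspective, the sketch below estimates how long an idealized ring all-reduce of one set of gradients would take. It is a back-of-the-envelope model, not Nvidia’s method: it assumes the bandwidth-optimal ring algorithm, zero per-step latency, and that the full 3600 GB/s is usable in each direction (if the figure is bidirectional, halve it). The function name, payload size and GPU count are illustrative.

```python
"""Back-of-the-envelope ring all-reduce time over NVLink (idealized)."""

# Assumption: the full 3600 GB/s quoted for Rubin's NVLink is usable in each
# direction. If the figure is bidirectional, pass half of it instead.
NVLINK_GB_PER_S = 3600.0


def ring_allreduce_seconds(payload_gb: float, num_gpus: int,
                           link_gb_per_s: float = NVLINK_GB_PER_S) -> float:
    """Time for a bandwidth-optimal ring all-reduce, ignoring latency.

    Each GPU sends (and receives) 2 * (N - 1) / N of the payload:
    N - 1 reduce-scatter steps plus N - 1 all-gather steps, each
    moving 1/N of the data.
    """
    if num_gpus < 2:
        return 0.0  # nothing to exchange with a single GPU
    traffic_gb = 2 * (num_gpus - 1) / num_gpus * payload_gb
    return traffic_gb / link_gb_per_s


if __name__ == "__main__":
    # Hypothetical example: 70B parameters x 2 bytes (16-bit) = 140 GB of gradients
    seconds = ring_allreduce_seconds(payload_gb=140.0, num_gpus=72)
    print(f"{seconds * 1000:.1f} ms per all-reduce")
```

Under these assumptions, exchanging 140 GB of gradients across a hypothetical 72 GPUs takes roughly 77 ms per step, which is why interconnect bandwidth shows up directly in training throughput.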

The 2027 Rubin Ultra variant adds incremental improvements but remains within the HBM4 family. For organizations planning 2026-2027 deployments, Rubin represents the immediate scaling target: a defined architecture that builds on existing software stacks and established cooling and power delivery practices.

Feynman’s 3D Stacking and Rosa CPU Drive 2028 Efficiency

By 2028, Feynman GPUs introduce a fundamental shift: custom HBM memory stacked directly onto the GPU die using 3D die-stacking techniques. This approach shortens the distance between compute and memory, reducing latency and power consumption per transaction. Custom HBM, tailored to Feynman’s specific memory access patterns, squeezes additional capacity into the same physical footprint compared with off-the-shelf HBM4e modules.
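The power argument for moving memory closer to compute comes down to simple arithmetic: interface power is energy per bit multiplied by bits moved per second. The sketch below uses hypothetical energy-per-bit values and a hypothetical 10 TB/s of memory bandwidth; none of these are published Feynman or HBM figures.

```python
"""Memory-interface power = energy per bit x bits per second."""


def interface_power_watts(bandwidth_tb_per_s: float, pj_per_bit: float) -> float:
    """Power (W) to move data at the given bandwidth and energy cost."""
    bits_per_s = bandwidth_tb_per_s * 1e12 * 8  # TB/s -> bits/s
    return bits_per_s * pj_per_bit * 1e-12      # picojoules -> joules


if __name__ == "__main__":
    bandwidth = 10.0  # TB/s, hypothetical per-GPU memory bandwidth
    # Hypothetical energy costs: a longer side-by-side path vs. a shorter stacked one
    for label, pj in [("side-by-side HBM", 4.0), ("3D-stacked HBM", 2.0)]:
        print(f"{label}: {interface_power_watts(bandwidth, pj):.0f} W")
```

With these placeholder numbers, halving the energy per bit saves 160 W per GPU at the same bandwidth, or lets the same power budget feed twice the bandwidth.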

Rosa CPU, the successor to Vera, pairs with Feynman to form a cohesive platform. Spectrum7 204T CPO switching doubles the 102T capacity of Rubin’s Spectrum6. ConnectX 10 network interfaces provide the physical connection layer, while the BlueField-5 DPU handles the data plane. For large-scale AI factories training trillion-parameter models, this combination targets a hard physical limit: getting data into and out of the GPU as fast as the GPU can process it.
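One way to read "doubles capacity" is in ports. The sketch below assumes that "102T" and "204T" mean 102.4 and 204.8 Tb/s of switching capacity and that ports run at 1.6 Tb/s to match a CX9 1600G NIC; actual Spectrum port configurations may differ.

```python
"""How many ports of a given speed a switch's aggregate capacity can feed."""


def port_count(switch_gbps: int, port_gbps: int) -> int:
    """Whole ports at port_gbps that fit in switch_gbps of switching capacity."""
    return switch_gbps // port_gbps


# Assumptions: "102T" and "204T" mean 102.4 and 204.8 Tb/s (expressed in Gb/s),
# and ports run at 1.6 Tb/s. Real port breakouts may differ.
SWITCHES = {"Spectrum6 102T": 102_400, "Spectrum7 204T": 204_800}

if __name__ == "__main__":
    for name, capacity in SWITCHES.items():
        print(f"{name}: {port_count(capacity, 1600)} x 1.6 Tb/s ports")
```

Doubling capacity can instead keep 64 ports while doubling each one to 3.2 Tb/s; the arithmetic is the same either way.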

The 3D-stacking approach also signals Nvidia’s willingness to move away from standard memory interfaces in favor of custom silicon. This is not a minor tweak; it departs from the standard HBM4 configuration that underpins Rubin. Competitors relying on off-the-shelf memory modules could face latency and power penalties by comparison.

Optical NVLink and Ecosystem Expansion Beyond Nvidia Silicon

The roadmap explicitly mentions optical NVLink and support for Groq LPUs with NVFP4 precision and NVLink connectivity. At scale, optical interconnects replace copper, sharply reducing signal degradation over longer distances and enabling denser rack configurations. Optical links are not new technology, but integrating them into NVLink, Nvidia’s proprietary interconnect, is a strategic move that could lock ecosystem partners into Nvidia’s networking standard.

Groq’s LPU (Language Processing Unit) integration with NVFP4 and NVLink support signals that Nvidia is opening its interconnect to third-party accelerators, not just its own GPUs. This could fragment or strengthen the ecosystem depending on adoption. If Groq LPUs become a standard pairing with Nvidia GPUs in production clusters, it validates Nvidia’s interconnect as the industry standard. If adoption stalls, it remains a niche feature.

The optical NVLink announcement also hints at where Nvidia sees the bottleneck moving. In Nvidia’s framing, GPU compute will not be the constraint in 2028; interconnect bandwidth will be. By moving to optical, Nvidia buys another generation of headroom before the laws of physics force a new architecture.

What This Means for Data Center Planning

Organizations evaluating data center refresh cycles should treat the 2026 Rubin launch as the practical inflection point. Once Rubin systems pairing HBM4 memory with Vera CPUs arrive, existing Blackwell-era infrastructure will begin to age by comparison. By 2028, Feynman’s 3D-stacked memory will make 2026-era Rubin systems look dated in turn, but the efficiency gain may not justify wholesale replacement for all workloads.

The roadmap also reveals where Nvidia is placing its bets: memory density and networking bandwidth, not raw compute cores. This reflects a maturing market where GPU FLOPS are abundant but data movement is the bottleneck. Teams building AI infrastructure today should design for interconnect scalability, not just GPU count.
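The "abundant FLOPS, scarce data movement" argument is what a roofline model captures: attainable throughput is the lower of peak compute and arithmetic intensity times memory bandwidth. The sketch below is a generic roofline, not an Nvidia tool; the 2000 TFLOP/s peak, 10 TB/s bandwidth and kernel intensities are hypothetical.

```python
"""Minimal roofline model: is a workload compute-bound or memory-bound?"""


def attainable_tflops(intensity: float, peak_tflops: float,
                      bandwidth_tb_per_s: float) -> float:
    """Attainable TFLOP/s for a kernel with `intensity` FLOPs per byte moved.

    TB/s x FLOPs/byte = TFLOP/s, so the memory roof is intensity * bandwidth.
    """
    return min(peak_tflops, intensity * bandwidth_tb_per_s)


def is_memory_bound(intensity: float, peak_tflops: float,
                    bandwidth_tb_per_s: float) -> bool:
    """True when data movement, not compute, caps throughput."""
    return intensity * bandwidth_tb_per_s < peak_tflops


if __name__ == "__main__":
    peak, bandwidth = 2000.0, 10.0  # hypothetical: 2000 TFLOP/s, 10 TB/s
    print(f"ridge point: {peak / bandwidth:.0f} FLOPs/byte")
    for ai in (2.0, 50.0, 500.0):  # hypothetical kernel intensities
        bound = "memory" if is_memory_bound(ai, peak, bandwidth) else "compute"
        tflops = attainable_tflops(ai, peak, bandwidth)
        print(f"{ai:>6.0f} FLOPs/byte -> {tflops:.0f} TFLOP/s ({bound}-bound)")
```

Below the ridge point, adding compute changes nothing; only more bandwidth raises the ceiling. Token-by-token inference typically sits at low arithmetic intensity, which fits the roadmap’s emphasis on memory and interconnect bandwidth.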

Will Feynman’s 3D stacking deliver the promised efficiency gains?

3D die-stacking reduces latency between compute and memory, but the gains depend on thermal management and on manufacturing yields for the custom HBM stacks. If Nvidia can build these stacks at volume without defect rates spiraling, the efficiency gains are real. If yields fall below 70-75%, costs could spike and availability could slip past 2028.
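Stacking makes yield compound: if every die in a stack must be good, a small per-die defect rate multiplies. The sketch below assumes independent defects and no known-good-die screening or repair, both simplifications; the 95% per-die yield and eight-die stack are hypothetical, not Feynman figures.

```python
"""How compound yield on stacked dies drives cost per good part."""


def stack_yield(per_die_yield: float, dies_in_stack: int) -> float:
    """Yield of a stack where every die must be good (independent defects, no repair)."""
    return per_die_yield ** dies_in_stack


def cost_per_good_unit(cost_per_unit_started: float, yield_fraction: float) -> float:
    """Cost of each working unit when failed units are scrapped."""
    if not 0.0 < yield_fraction <= 1.0:
        raise ValueError("yield_fraction must be in (0, 1]")
    return cost_per_unit_started / yield_fraction


if __name__ == "__main__":
    # Hypothetical: 95% yield per die, 8 dies per stack, normalized cost of 1.0
    y = stack_yield(0.95, 8)
    print(f"stack yield: {y:.1%}, "
          f"relative cost per good stack: {cost_per_good_unit(1.0, y):.2f}x")
```

Under those assumptions, eight dies at 95% each yield about 66% good stacks, below the 70-75% range cited above, and each good stack costs roughly 1.5 times as much as it would at perfect yield.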

How does optical NVLink compare to copper NVLink in real deployments?

Optical interconnects greatly reduce signal loss over longer distances and support higher density, but they require optoelectronic transceivers that add cost and latency. In 2028, the trade-off likely favors optical for large-scale clusters where rack-to-rack distances exceed 10 meters. Smaller deployments may stick with copper.
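The distance rule of thumb can be written as a simple planning rule. This is a toy sketch, not a vendor sizing tool: the 10-meter threshold comes from the estimate above, while the link lengths and the assumption of one transceiver at each end of an optical link are hypothetical.

```python
"""Toy link-selection rule based on a 10-meter rack-to-rack heuristic."""

OPTICAL_THRESHOLD_M = 10.0  # runs longer than this are assumed to favor optical


def choose_link(distance_m: float, threshold_m: float = OPTICAL_THRESHOLD_M) -> str:
    """Return 'optical' for runs longer than the threshold, else 'copper'."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return "optical" if distance_m > threshold_m else "copper"


if __name__ == "__main__":
    # Hypothetical link lengths in meters for a small cluster
    runs = [1.5, 3.0, 12.0, 40.0]
    plan = {d: choose_link(d) for d in runs}
    optical_links = sum(1 for kind in plan.values() if kind == "optical")
    print(plan)
    # Assumption: one optoelectronic transceiver at each end of an optical link
    print(f"optical links: {optical_links}, transceivers needed: {2 * optical_links}")
```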

Can third-party LPUs like Groq’s actually integrate with Nvidia’s NVLink ecosystem?

The roadmap states Groq LPUs support NVFP4 and NVLink, but full integration requires software stacks, driver support, and ecosystem certification. Early adopters will likely face compatibility friction until both companies mature the integration.

Nvidia’s data center roadmap is not a revolution; it is a disciplined engineering response to the reality that GPU compute is no longer the limiting factor in AI training. Rubin in 2026 delivers the immediate scaling tool with HBM4 memory and faster networking. Feynman in 2028 bets that custom 3D-stacked memory and optical interconnects will sustain another generation of growth before the next architectural rethink becomes necessary. For teams building AI infrastructure, the roadmap signals where to invest: interconnect capacity and memory bandwidth, not just raw GPU counts.

Edited by the All Things Geek team.

Source: Tom's Hardware
