The Nvidia Vera Rubin POD is a 40-rack AI supercomputer announced at GTC 2026 that delivers 60 exaflops of compute power through a seven-chip architecture designed to turn entire data centers into AI factories. Rather than a single monolithic design, the system stacks five purpose-built rack-scale subsystems, each targeting different AI bottlenecks, to handle everything from pretraining to agentic inference at unprecedented scale.
Key Takeaways
- Vera Rubin POD combines 1,152 Rubin GPUs across 40 racks with 10 petabytes per second total bandwidth
- NVL72 core rack delivers 260 TB/s NVLink bandwidth, more than the entire global internet, per Nvidia
- Seven-chip platform includes Rubin GPU, Vera CPU, NVLink 6 switch, BlueField-4 DPU, ConnectX-8 SuperNIC, CX-9 NIC, and Spectrum-6 Ethernet
- Intelligent Power Smoothing reduces peak current by 25%; warm-water cooling runs at a 45°C inlet temperature
- Production units ship in second half of 2026
The Seven-Chip Architecture Behind Vera Rubin POD
The Nvidia Vera Rubin POD is built on seven distinct chips working in concert across the platform. The Rubin GPU forms the compute backbone, packing 288 GB of HBM4 memory with 22 TB/s of memory bandwidth and 3.6 TB/s of NVLink connectivity per GPU. Paired with it is the Vera CPU, an Arm-compatible processor with 88 Olympus cores and 1.2 TB/s of LPDDR5X bandwidth, which handles control plane tasks and reduces GPU overhead.
The remaining five chips handle networking and data movement. NVLink 6 switches connect GPUs within each rack at 3.6 TB/s per GPU, totaling roughly 260 TB/s per NVL72 rack. The BlueField-4 data processing unit doubles the BlueField-3's bandwidth to 800 Gb/s and carries 64 Arm Neoverse V2 cores, 128 GB of memory, and 250 GB/s of memory bandwidth. ConnectX-8 SuperNICs deliver 800 Gb/s per GPU for inter-rack communication, while CX-9 NICs integrate PCIe switches for scale-out flexibility. Finally, Spectrum-6 Ethernet components reduce optical signal loss from 22 dB to 4 dB, an 18 dB cut that works out to roughly 64x better signal integrity than prior generations.
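The two derived figures in this paragraph can be checked with a few lines of arithmetic. The sketch below is only a sanity check on the quoted numbers, assuming the per-rack NVLink total is a simple sum over 72 GPUs and that the 64x figure is a power ratio derived from the decibel values; the function names are illustrative and not part of any Nvidia tool.

```python
# Sanity check on two figures quoted in the article.
# Assumptions: rack NVLink bandwidth is a plain sum over GPUs, and the
# "64x" claim is a power ratio derived from the dB loss figures.

GPUS_PER_RACK = 72          # NVL72 rack
NVLINK_TBPS_PER_GPU = 3.6   # TB/s of NVLink bandwidth per Rubin GPU


def rack_nvlink_tbps(gpus: int, tbps_per_gpu: float) -> float:
    """Aggregate NVLink bandwidth for one rack, in TB/s."""
    return gpus * tbps_per_gpu


def db_to_power_ratio(db: float) -> float:
    """Convert a decibel difference into a linear power ratio."""
    return 10 ** (db / 10)


rack_bw = rack_nvlink_tbps(GPUS_PER_RACK, NVLINK_TBPS_PER_GPU)
loss_ratio = db_to_power_ratio(22 - 4)  # 18 dB less optical loss

print(f"NVLink per rack: {rack_bw:.1f} TB/s")      # 259.2 TB/s
print(f"18 dB as a power ratio: {loss_ratio:.1f}x")  # 63.1x
```

Both results land close to the rounded figures in the article: 259.2 TB/s against the quoted 260 TB/s, and about 63x against the quoted 64x.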
NVL72 Rack Performance and Thermal Innovation
Each NVL72 core compute rack houses 72 Rubin GPUs and 36 Vera CPUs connected via NVLink 6, delivering 3,600 petaflops of NVFP4 inference performance and 20.7 TB of HBM4 memory with 1,580 TB/s of aggregate bandwidth. Nvidia pitches this single rack as serving all four AI scaling laws (pretraining, post-training, test-time scaling, and agentic scaling) within one architecture.
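The rack-level totals follow directly from the per-GPU specifications quoted earlier. As a rough cross-check, assuming the rack figures are straight multiples of the per-GPU numbers and that memory is counted in decimal terabytes:

```python
# Cross-check of NVL72 rack totals against per-GPU figures from the article.
# Assumption: rack totals are simple multiples of per-GPU specs (decimal units).

GPUS_PER_RACK = 72
HBM4_GB_PER_GPU = 288        # GB of HBM4 per Rubin GPU
HBM4_TBPS_PER_GPU = 22       # TB/s of HBM4 bandwidth per Rubin GPU
RACK_NVFP4_PFLOPS = 3600     # quoted NVFP4 inference petaflops per rack

hbm_capacity_tb = GPUS_PER_RACK * HBM4_GB_PER_GPU / 1000
hbm_bandwidth_tbps = GPUS_PER_RACK * HBM4_TBPS_PER_GPU
pflops_per_gpu = RACK_NVFP4_PFLOPS / GPUS_PER_RACK

print(f"HBM4 capacity:  {hbm_capacity_tb:.1f} TB")      # 20.7 TB
print(f"HBM4 bandwidth: {hbm_bandwidth_tbps} TB/s")     # 1584 TB/s
print(f"NVFP4 per GPU:  {pflops_per_gpu:.0f} PFLOPS")   # 50 PFLOPS
```

The computed 1,584 TB/s matches the article's rounded 1,580 TB/s, and the rack figure implies about 50 petaflops of NVFP4 inference per GPU.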
Power and thermal design separate Vera Rubin POD from previous Nvidia systems. The platform uses Intelligent Power Smoothing with 400 joules of energy storage per GPU, which reduces peak current draw by 25%, and it supports warm-water cooling at 45°C inlet temperatures. Water that warm can typically be cooled with ambient air rather than chillers, and the approach allows operators to fit 10% more racks within the same power budget, directly lowering total cost of ownership for hyperscalers. The third-generation MGX rack architecture distributes 1.2 quadrillion transistors across the full POD, with six rack-level energy storage units providing the smoothing capacity.
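To get a feel for what 400 joules per GPU means, the sketch below scales the figure to a rack and to the full POD, then estimates how long that buffer could absorb a power swing. The 500 W swing is a hypothetical value chosen purely for illustration, not a number from the article or from Nvidia, and the function name is likewise made up for this example.

```python
# Rough scale of the energy-storage buffer described in the article.
# ASSUMPTION: HYPOTHETICAL_SWING_W is an illustrative value, not an Nvidia spec.

JOULES_PER_GPU = 400         # energy storage per GPU, per the article
GPUS_PER_RACK = 72           # NVL72 rack
POD_GPUS = 1152              # Rubin GPUs in the full POD
HYPOTHETICAL_SWING_W = 500   # assumed transient power swing per GPU, in watts


def ride_through_ms(energy_j: float, swing_w: float) -> float:
    """Milliseconds a buffer of energy_j joules can supply swing_w watts."""
    return energy_j / swing_w * 1000


rack_energy_kj = JOULES_PER_GPU * GPUS_PER_RACK / 1000
pod_energy_kj = JOULES_PER_GPU * POD_GPUS / 1000
buffer_ms = ride_through_ms(JOULES_PER_GPU, HYPOTHETICAL_SWING_W)

print(f"Per rack: {rack_energy_kj:.1f} kJ")   # 28.8 kJ
print(f"Per POD:  {pod_energy_kj:.1f} kJ")    # 460.8 kJ
print(f"Covers a {HYPOTHETICAL_SWING_W} W swing for {buffer_ms:.0f} ms")  # 800 ms
```

Under that assumed swing, the buffer covers sub-second transients, which is consistent with its role of smoothing peaks rather than providing backup power.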
How Vera Rubin POD Compares to Blackwell-Era Systems
By Nvidia's figures, the Vera Rubin POD delivers 10x better inference performance per watt than Blackwell, the previous-generation architecture. This efficiency gain stems from both the new GPU design and a seven-chip ecosystem tuned for data movement as much as raw compute. Where Blackwell systems could be limited by bandwidth between GPUs and CPUs, Vera Rubin’s NVLink 6 and dedicated Vera CPU are designed to relieve that constraint.
The architectural shift also reflects a change in AI workload priorities. Blackwell excelled at dense training jobs, but Vera Rubin’s design, with its support for all four scaling laws and agentic inference, targets an emerging era in which inference, reasoning, and agent-based tasks dominate data center spending. The seven-chip modular approach also allows Nvidia to iterate faster: the company has already announced Rubin Ultra for 2027, with 576 GPUs per rack and 600-kilowatt Kyber racks, suggesting the platform will scale beyond the POD’s current 40-rack footprint.
Data Center Integration and Serviceability
The Vera Rubin POD integrates into data centers as a complete system, not a collection of loosely coupled components. The compute tray in CPX racks uses four daughter cards per side (two stacked), each with two 800G CX-9 NICs and E1.S SSD storage, connected to a central BlueField-4 with Grace CPU. Power delivery boards step down from 48–54 V to 12–13.5 V, and modular slide-in daughter cards enable field serviceability without full rack teardown.
This modular design matters because the Vera Rubin POD is not a research prototype — production units begin shipping in the second half of 2026. Hyperscalers deploying these systems will need to replace failed components, upgrade memory, or swap networking hardware without taking entire racks offline. Nvidia’s engineering choices reflect this operational reality, prioritizing serviceability alongside performance.
Is the Vera Rubin POD worth the infrastructure investment?
The Vera Rubin POD targets hyperscalers and cloud providers building next-generation AI infrastructure, not individual researchers or smaller companies. At 60 exaflops across 40 racks, the system demands data center space, power delivery, and cooling infrastructure that most organizations do not have. For companies running agentic AI workloads at scale, however, the efficiency gains and bandwidth advantages may justify the capital expenditure.
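The 60-exaflop headline can be roughly reconciled with the per-rack numbers quoted earlier. The sketch below assumes that all 1,152 Rubin GPUs sit in NVL72 racks and that the headline counts only their NVFP4 inference throughput; the article does not confirm either point, so treat the result as a plausibility check rather than Nvidia's own accounting.

```python
# Plausibility check on the POD-level headline figure.
# ASSUMPTIONS: every Rubin GPU lives in an NVL72 rack, and the headline
# counts only NVFP4 inference throughput from those racks.

POD_GPUS = 1152             # Rubin GPUs in the POD
GPUS_PER_NVL72 = 72         # GPUs per NVL72 rack
POD_RACKS = 40              # total racks in the POD
RACK_NVFP4_PFLOPS = 3600    # NVFP4 inference petaflops per NVL72 rack

nvl72_racks = POD_GPUS // GPUS_PER_NVL72
other_racks = POD_RACKS - nvl72_racks
pod_exaflops = nvl72_racks * RACK_NVFP4_PFLOPS / 1000

print(f"NVL72 racks: {nvl72_racks}")            # 16
print(f"Other racks: {other_racks}")            # 24
print(f"NVFP4 total: {pod_exaflops:.1f} EF")    # 57.6 EF
```

Under these assumptions, 16 of the 40 racks hold GPUs and together supply about 57.6 exaflops, which rounds to the quoted 60; the remaining 24 racks would belong to the other rack-scale subsystems.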
When will Vera Rubin POD systems be available?
Production units ship in the second half of 2026, according to Nvidia’s GTC 2026 announcement. Pricing has not been disclosed. Early adopters will likely be cloud providers like AWS, Google Cloud, and Microsoft Azure, which can amortize the cost across many customers.
How does Vera Rubin POD’s bandwidth compare to previous systems?
The NVL72 rack provides 260 TB/s of NVLink bandwidth, which Nvidia claims exceeds the entire global internet’s capacity. While that comparison is promotional hyperbole, the underlying point is real: the seven-chip architecture is built to keep interconnect bandwidth from being the limiting factor for AI workloads, shifting bottlenecks toward algorithmic efficiency and memory capacity instead.
The Nvidia Vera Rubin POD represents a significant shift in how data centers approach AI infrastructure. By combining seven purpose-built chips into a modular, thermally efficient 40-rack system, Nvidia has created a platform intended to support all current and near-term AI scaling paradigms in one architecture. With production units shipping in the second half of 2026, the Vera Rubin POD is positioned to set the baseline for large-scale AI capacity among organizations ready to invest in next-generation compute.
Edited by the All Things Geek team.
Source: Tom's Hardware


