Nvidia Vera Rubin NVL72 racks hit $8.8 million as AI costs spiral

By Craig Nash
Tech writer at All Things Geek. Covers artificial intelligence, semiconductors, and computing hardware.

Nvidia Vera Rubin NVL72 racks are entering the market at eye-watering prices—$5 million to $7 million per unit, with some quotes reaching $8.8 million—as the chip giant shifts toward shipping complete, ready-to-deploy systems rather than letting server makers assemble them. This move consolidates Nvidia’s control over the AI infrastructure stack but leaves equipment manufacturers with razor-thin margins and little room for customization.

Key Takeaways

  • Vera Rubin NVL72 racks cost $5–8.8 million each, double or triple the price of older systems
  • Liquid cooling hardware is quoted at $49,860–$55,710 per rack, and the figure rises with each generation
  • Nvidia is shipping complete rack systems with 80+ ecosystem partners, reducing ODM assembly work
  • A single rack delivers 3.6 exaflops of NVFP4 inference performance with 20.7 TB of HBM4 memory
  • First deployments expected in H2 2026 from AWS, Google Cloud, Microsoft, and other cloud providers

Why Vera Rubin NVL72 Racks Cost More Than Ever

The Vera Rubin NVL72 is not just expensive; it is a step change in AI infrastructure pricing. A single rack now carries a price tag that rivals a luxury property or a small aircraft. The core reason is straightforward: power and cooling have become the bottleneck. Vera Rubin GPUs consume up to 1,800 watts each, and a full NVL72 rack houses 72 of them, which requires 100% liquid cooling with 45°C inlet water. That cooling hardware alone is quoted at $49,860 per rack for the predecessor Blackwell generation and $55,710 for the next-generation NVL144 variant, an increase of about 12% based on those two figures. Add the cost of 72 Rubin GPUs, 36 Vera CPUs, NVLink 6 switches, ConnectX-9 SuperNICs, and BlueField-4 DPUs, and the total quickly climbs into the millions. The result is a system priced at two to three times a comparable Blackwell NVL72, though Nvidia claims up to five times the inference performance.
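The power and cooling figures above can be sanity-checked with simple arithmetic. The sketch below uses only numbers quoted in this article; the function names are illustrative, and the power figure covers the GPUs alone, which is an assumption since CPUs, switches, and NICs also draw power.

```python
# Back-of-the-envelope check using only figures quoted in the article.
GPU_WATTS = 1_800            # peak draw per Rubin GPU, as quoted
GPUS_PER_RACK = 72
COOLING_BLACKWELL = 49_860   # USD, Blackwell-generation NVL72 cooling
COOLING_NVL144 = 55_710      # USD, NVL144 cooling
RACK_PRICE_LOW = 5_000_000   # USD, low end of the quoted rack price
RACK_PRICE_HIGH = 8_800_000  # USD, highest quote mentioned


def gpu_power_kw(gpus: int = GPUS_PER_RACK, watts: int = GPU_WATTS) -> float:
    """GPU-only power for one rack in kW (assumption: excludes CPUs, NICs, switches)."""
    return gpus * watts / 1000


def pct_increase(old: float, new: float) -> float:
    """Percentage increase from old to new."""
    return (new - old) / old * 100


def cooling_share(cooling: float, rack_price: float) -> float:
    """Cooling hardware cost as a percentage of the rack price."""
    return cooling / rack_price * 100


if __name__ == "__main__":
    print(f"GPU power per rack: {gpu_power_kw():.1f} kW")
    print(f"Cooling cost increase: {pct_increase(COOLING_BLACKWELL, COOLING_NVL144):.1f}%")
    print(f"Cooling share of a $5M rack: {cooling_share(COOLING_NVL144, RACK_PRICE_LOW):.2f}%")
    print(f"Cooling share of an $8.8M rack: {cooling_share(COOLING_NVL144, RACK_PRICE_HIGH):.2f}%")
```

With the quoted figures, the GPUs alone draw about 129.6 kW per rack, the two cooling prices differ by roughly 12%, and the cooling hardware works out to around 1% of a $5 million rack.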

Nvidia is also pulling the assembly work in-house. The company’s new MGX modular rack design is cable-free and arrives nearly ready to operate, with Nvidia claiming 2-hour full-rack assembly time—down from 2 days with earlier designs. This shift eliminates a traditional revenue stream for original equipment manufacturers and server integrators, who previously charged for engineering, integration, and testing. Now they are glorified logistics partners, and their margins are collapsing.

Server Makers Face Margin Squeeze

The real story is not the price to customers—it is the price to the companies that used to profit from selling these systems. Server makers and ODMs are caught in a squeeze. Nvidia controls the GPU, the memory, the networking chips, and now increasingly the system design itself. Equipment manufacturers can still order Vera Rubin NVL72 components and assemble them, but Nvidia is increasingly shipping complete racks directly to cloud providers and enterprises. This vertical integration leaves little margin for value-added services or differentiation. A server maker that once charged $500,000 to $1 million for integration, testing, and support now finds that Nvidia has automated much of that work and captured the margin itself. The ecosystem still exists—Nvidia lists 80+ partners—but their role has shrunk to distribution and deployment logistics rather than engineering and customization.

This dynamic reflects a broader shift in AI infrastructure. As systems grow larger and more standardized, the advantage goes to the company that controls the entire stack. Nvidia has done this ruthlessly. When you buy a Vera Rubin NVL72 rack, you are buying Nvidia’s design, Nvidia’s integration, Nvidia’s cooling strategy, and Nvidia’s support model. Server makers are becoming contract manufacturers, not systems architects.

Performance That Justifies the Price—Barely

On paper, Vera Rubin NVL72 delivers staggering performance: 3.6 exaflops of NVFP4 inference throughput, 2.5 exaflops for training, and 18 petaops INT8 per rack. A single unit contains roughly 1,300 chips and 1.3 million components, all connected via NVLink 6 for 260 TB/s of scale-up bandwidth and Quantum-X800 InfiniBand or Spectrum-X Ethernet for scale-out. Memory is abundant: 20.7 TB of HBM4 with 1.6 PB/s of bandwidth, plus 54 TB of LPDDR5X. Nvidia claims the system delivers 5x the inference performance of Blackwell at one-tenth the cost per million tokens, measured at the rack level. It also says training needs roughly one-quarter as many GPUs as on earlier generations.
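Dividing the rack-level totals by the 72 GPUs gives a feel for what each GPU contributes. This is a minimal sketch under two assumptions: the totals are spread evenly across GPUs, and units are decimal (1 TB = 1,000 GB). The helper name is illustrative.

```python
# Per-GPU figures derived from the rack-level numbers quoted in the article.
GPUS_PER_RACK = 72
RACK_NVFP4_EXAFLOPS = 3.6   # NVFP4 inference throughput per rack
RACK_HBM4_TB = 20.7         # HBM4 capacity per rack
RACK_HBM4_PB_PER_S = 1.6    # HBM4 bandwidth per rack


def per_gpu(rack_total: float, scale: float = 1000.0) -> float:
    """Split a rack-level total evenly across GPUs, converting to the next
    smaller decimal unit (exa -> peta, TB -> GB, PB/s -> TB/s) via `scale`."""
    return rack_total * scale / GPUS_PER_RACK


nvfp4_pflops = per_gpu(RACK_NVFP4_EXAFLOPS)   # petaflops per GPU
hbm4_gb = per_gpu(RACK_HBM4_TB)               # GB of HBM4 per GPU
hbm4_tb_per_s = per_gpu(RACK_HBM4_PB_PER_S)   # TB/s of HBM4 bandwidth per GPU

print(f"NVFP4 per GPU: {nvfp4_pflops:.1f} PFLOPS")
print(f"HBM4 per GPU: {hbm4_gb:.1f} GB")
print(f"HBM4 bandwidth per GPU: {hbm4_tb_per_s:.1f} TB/s")
```

Under those assumptions the rack figures imply about 50 petaflops of NVFP4 throughput, 287.5 GB of HBM4, and roughly 22 TB/s of memory bandwidth per GPU.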

But here is the catch: those cost-per-token claims are Nvidia’s own projections, not independently verified benchmarks. Cloud providers and enterprises will need to test them in real deployments, which are not expected until H2 2026. Groq has also entered the conversation with the Groq 3 LPX rack, co-designed with Vera Rubin NVL72, offering 256 LPUs, 128 GB of SRAM, and 640 TB/s of scale-up bandwidth per rack. It is a different architectural approach that trades some GPU flexibility for specialized inference speed. The market will ultimately decide which trade-off wins.

The Vera Rubin POD: Scaling to 40 Racks and 60 Exaflops

Nvidia is also pushing beyond the single rack. A full Vera Rubin POD (point of delivery) scales to 40 racks, 1,152 GPUs, and 60 exaflops of performance. At 72 GPUs per rack, 1,152 GPUs corresponds to 16 NVL72 racks, so the remaining racks in the POD presumably hold other equipment rather than GPUs. Pricing all 40 racks at $6 million each, the midpoint of the $5 million to $7 million range, would put a single POD at roughly $240 million; the 16 GPU racks alone would come to about $96 million. Either way it is a staggering capital expenditure, but for a hyperscale cloud provider training trillion-parameter models, it is the cost of staying competitive. AWS, Google Cloud, Microsoft, and OCI have all signaled interest in early deployments. CoreWeave, an AI infrastructure specialist, is also planning to offer Vera Rubin capacity to customers who cannot afford their own PODs.
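The POD cost estimate depends heavily on how many of the 40 racks are priced as NVL72 units. The sketch below brackets the estimate using only figures from this article. It assumes 72 GPUs per NVL72 rack and a flat $6 million per rack; the article does not break down the other racks, so leaving them unpriced in the lower bound is an assumption.

```python
# Rough bounds on Vera Rubin POD cost from the article's figures.
GPUS_PER_NVL72 = 72
POD_GPUS = 1_152
POD_RACKS = 40
RACK_PRICE_MID = 6_000_000  # USD, midpoint of the $5M-$7M range


def gpu_racks(pod_gpus: int = POD_GPUS, gpus_per_rack: int = GPUS_PER_NVL72) -> int:
    """Number of NVL72 racks needed to hold the POD's GPUs (rounded up)."""
    return -(-pod_gpus // gpus_per_rack)  # ceiling division


def pod_cost(racks: int, price_per_rack: int = RACK_PRICE_MID) -> int:
    """Total cost in USD if every counted rack costs the same."""
    return racks * price_per_rack


lower = pod_cost(gpu_racks())  # only GPU racks priced (assumption)
upper = pod_cost(POD_RACKS)    # all 40 racks priced as NVL72 units

print(f"NVL72 racks implied by {POD_GPUS} GPUs: {gpu_racks()}")
print(f"GPU racks only: ${lower / 1e6:.0f} million")
print(f"All {POD_RACKS} racks at NVL72 pricing: ${upper / 1e6:.0f} million")
```

The gap between the two bounds, $96 million versus $240 million, shows how sensitive the headline figure is to what the non-GPU racks actually cost.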

What Happens to the Next Generation?

Nvidia is already preparing NVL144 and NVL576 variants with even higher power demands and cooling costs. The NVL144 cooling system will cost $55,710, up about 12% from the $49,860 quoted for the NVL72 generation. The NVL576 ‘Kyber’ system, with 144 GPU packages, will demand even higher-capacity cooling plates, some exceeding $400 per unit. This trajectory suggests that within two or three generations, a single rack could cost $10–15 million, and the cooling bill alone could rival the cost of a mid-range data center from five years ago. The arms race is accelerating, and margins are evaporating faster than liquid coolant.

Is the Vera Rubin NVL72 rack price justified for enterprises?

For hyperscale cloud providers and large AI labs, yes—the performance-per-rack is unmatched, and the cost per token for inference is competitive if Nvidia’s claims hold up in real deployments. For mid-market companies, the $5–8.8 million entry cost is a significant barrier; they may wait for smaller systems or turn to cloud providers that absorb the capital expense. For server makers, the shift to Nvidia-controlled full systems is a wake-up call to move upstream into software, optimization, and managed services—the only layers where margins still exist.

Will Vera Rubin NVL72 racks become cheaper over time?

Unlikely in the near term. Nvidia controls supply, demand is insatiable, and cooling costs are rising, not falling. Prices may stabilize once competition emerges, but that is years away. Server makers and customers should expect Vera Rubin NVL72 racks to remain in the $5–8.8 million range through 2026 and beyond, with newer variants commanding even higher prices.

Nvidia Vera Rubin NVL72 racks represent a watershed moment in AI infrastructure: the end of the era when server makers could differentiate on integration and the beginning of an era where Nvidia controls the entire system, the pricing, and the margins. Cloud providers and enterprises will pay the price—literally—but they will get unmatched performance in return. For everyone else, the AI infrastructure arms race has just become a lot more expensive.

Edited by the All Things Geek team.

Source: Tom's Hardware
