Nvidia drops Rubin CPX from roadmap, pivots to Groq 3 LPUs

By Craig Nash
Tech writer at All Things Geek. Covers artificial intelligence, semiconductors, and computing hardware.

Nvidia has removed the Rubin CPX accelerator from its official roadmap, a strategic pivot away from the specialized long-context inference chip it announced just months earlier. Rubin CPX’s absence from the company’s GTC slides suggests Nvidia is instead elevating Groq 3 LPUs as its priority accelerator for massive-context AI workloads, even as the broader Vera Rubin platform moves toward general availability in Q3 2026.

Key Takeaways

  • Rubin CPX, a specialized GPU optimized for 1M+ token inference, has been removed from Nvidia’s roadmap despite earlier announcements
  • The chip featured 30 PFLOPS of NVFP4 compute, 128GB of GDDR7 memory, and attention acceleration hardware for prefill processing, pitched as 3x faster at attention than GB300 NVL72 systems
  • Vera Rubin platform now targets Q3 2026 general availability, accelerated from original H2 2026 timeline
  • Groq 3 LPUs now take center stage in Nvidia’s inference acceleration strategy
  • Vera Rubin NVL144 CPX integrated system promised 8 exaflops AI compute and 1.7PB/s bandwidth per rack

What Rubin CPX Was Supposed to Do

Rubin CPX represented a departure from Nvidia’s traditional GPU design philosophy. Rather than chasing peak throughput across all workloads, it was purpose-built for a specific problem: processing massive context windows efficiently. The chip was engineered to handle prefill operations—the computationally intensive first phase of inference where an AI model processes a full input prompt before generating output. For applications like long-form code generation, video analysis, and document processing where context windows exceed 1 million tokens, this specialization mattered.
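To see why prefill and token generation stress hardware differently, a rough arithmetic-intensity estimate helps. The sketch below is a simplification rather than Nvidia's own model. It counts only dense-layer weights and ignores attention and KV cache traffic. The function name, the 4-bit weight size, and the 8,192-token prefill chunk are illustrative assumptions.

```python
def arithmetic_intensity(tokens_per_pass: int, bytes_per_weight: float = 0.5) -> float:
    """Approximate FLOPs per byte of weights read in one forward pass.

    Each weight does about 2 FLOPs (multiply + add) per token, but is read
    from memory once per pass no matter how many tokens share that pass.
    bytes_per_weight=0.5 assumes 4-bit weights (an assumption, not a spec).
    """
    flops_per_weight = 2 * tokens_per_pass
    return flops_per_weight / bytes_per_weight


# Decode: one new token per pass, so every weight is re-read for 2 FLOPs of work.
decode_intensity = arithmetic_intensity(tokens_per_pass=1)

# Prefill: many prompt tokens share one pass (8,192 is a hypothetical chunk size).
prefill_intensity = arithmetic_intensity(tokens_per_pass=8192)

print(f"decode:  {decode_intensity:,.0f} FLOPs per byte")
print(f"prefill: {prefill_intensity:,.0f} FLOPs per byte")
```

Under these assumptions prefill does thousands of times more math per byte moved than single-token decode, which is the sense in which prefill is compute-heavy.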

The architecture diverged from standard Rubin GPUs in several ways. Where regular Rubin GPUs pair a dual-die design with HBM4 memory, Rubin CPX relied on a single monolithic die paired with 128GB of GDDR7 memory running at approximately 32Gbps. This choice prioritized cost efficiency over bandwidth: GDDR7 delivers roughly 2TB/s per chip, about a tenth of the 20.5TB/s of HBM4, but sufficient for prefill-dominated workloads where compute, not memory bandwidth, is the bottleneck. The chip also integrated dedicated attention acceleration hardware and video encode/decode units, making it a genuinely novel processor rather than a rebadged variant.
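The bandwidth figures can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes a 512-bit memory bus, which the article does not state, and treats the quoted 30 PFLOPS as a peak figure. The ridge point is the standard roofline-model threshold, not a number Nvidia has published.

```python
# Back-of-envelope figures; values marked "assumed" are not from the article.
GDDR7_PIN_RATE_GBPS = 32        # per-pin data rate quoted above (approximate)
ASSUMED_BUS_WIDTH_BITS = 512    # assumed: the bus width is not stated
HBM4_BANDWIDTH_TBPS = 20.5      # HBM4 figure quoted above
PEAK_NVFP4_PFLOPS = 30          # Rubin CPX NVFP4 compute quoted above


def memory_bandwidth_tbps(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in TB/s: per-pin Gbit/s x pins, / 8 bits per byte, / 1000."""
    return pin_rate_gbps * bus_width_bits / 8 / 1000


def ridge_point_flops_per_byte(peak_pflops: float, bandwidth_tbps: float) -> float:
    """Arithmetic intensity above which a kernel is compute-bound (roofline model)."""
    return (peak_pflops * 1e15) / (bandwidth_tbps * 1e12)


gddr7_tbps = memory_bandwidth_tbps(GDDR7_PIN_RATE_GBPS, ASSUMED_BUS_WIDTH_BITS)
hbm4_to_gddr7_ratio = HBM4_BANDWIDTH_TBPS / gddr7_tbps
ridge = ridge_point_flops_per_byte(PEAK_NVFP4_PFLOPS, gddr7_tbps)

print(f"GDDR7 bandwidth: {gddr7_tbps:.3f} TB/s")
print(f"HBM4 / GDDR7 ratio: {hbm4_to_gddr7_ratio:.1f}x")
print(f"Ridge point: {ridge:,.0f} FLOPs per byte")
```

With the assumed bus width, the result lands at about 2TB/s, roughly a tenth of the HBM4 figure. A workload would need on the order of 15,000 FLOPs per byte of memory traffic before compute rather than bandwidth became the limit.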

According to Nvidia’s original positioning, Rubin CPX would deliver 3x faster attention processing compared to Blackwell GB300 NVL72 systems while maintaining cost advantages for inference scenarios where prefill compute dominates over token generation. The integrated Vera Rubin NVL144 CPX system was projected to deliver 8 exaflops of AI compute—roughly 7.5x the capacity of GB300 NVL72 configurations—with 100TB of fast memory and 1.7PB/s bandwidth per rack.
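The rack-level numbers imply a couple of derived figures worth spelling out. The sketch below uses only the values quoted above. The implied GB300 NVL72 baseline and the memory sweep time are derived here for illustration and are not figures Nvidia has published in this form.

```python
# Rack-level figures quoted for the Vera Rubin NVL144 CPX system.
RACK_AI_EXAFLOPS = 8.0
SPEEDUP_VS_GB300_NVL72 = 7.5
RACK_FAST_MEMORY_TB = 100.0
RACK_BANDWIDTH_PBPS = 1.7

# Baseline implied by the quoted multiple (derived, not an official figure).
implied_gb300_exaflops = RACK_AI_EXAFLOPS / SPEEDUP_VS_GB300_NVL72

# Time to read all fast memory once at full bandwidth: TB / (TB/s), in milliseconds.
rack_bandwidth_tbps = RACK_BANDWIDTH_PBPS * 1000
sweep_ms = RACK_FAST_MEMORY_TB / rack_bandwidth_tbps * 1000

print(f"Implied GB300 NVL72 baseline: {implied_gb300_exaflops:.2f} exaflops")
print(f"Full memory sweep: {sweep_ms:.1f} ms")
```

On these numbers the comparison system sits at a little over 1 exaflop, and the rack could in principle stream its entire 100TB of fast memory in under 60 milliseconds.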

Why Nvidia Is Abandoning the Approach

The removal of Rubin CPX from the roadmap reflects a broader market reality: specialized hardware for inference is harder to justify than it initially appeared. Long-context inference is growing, but it remains a niche workload compared to general-purpose model serving. Building, qualifying, and supporting an entirely new GPU family requires enormous engineering investment—software optimization, compiler support, customer training, and multi-year commitment to the platform.

Groq 3 LPUs, now elevated in Nvidia’s messaging to the slot Rubin CPX once held, follow a different architectural philosophy. Groq’s Language Processing Units are not CUDA-based GPUs: they are processors optimized for transformer inference, with their own instruction set and memory hierarchy. By promoting Groq 3 LPUs at GTC, Nvidia signals that it may be hedging its bets on specialized inference accelerators while maintaining its core GPU business for training and general-purpose workloads.

The timing also matters. Targeting Q3 2026 for the Vera Rubin platform means Nvidia is compressing its development timeline. Removing a late-stage product variant from the roadmap simplifies manufacturing, reduces SKU complexity, and lets engineering teams focus on getting the main Rubin GPU and Vera CPU to market on schedule. This is pragmatic: shipping a solid general-purpose accelerator on time beats shipping a specialized variant late.

What This Means for Long-Context Inference

Users requiring long-context inference acceleration will need to rely on standard Rubin GPUs, which will support massive-context workloads through software optimization and architectural improvements, or explore alternative solutions. Groq’s LPU approach, now positioned as Nvidia’s preferred inference strategy, offers a fundamentally different trade-off: fixed-function hardware optimized for transformer inference at the cost of less flexibility for other workloads.

The broader Vera Rubin platform remains on track. All chips have been taped out with working silicon, and general availability is now targeted for Q3 2026, at the early end of the original H2 2026 window. This includes the Rubin GPU for general-purpose AI workloads, the Vera CPU for control plane and software, the BlueField-4 DPU for networking, and NVLink connectivity. Rubin CPX simply will not be part of that lineup.

Is Rubin CPX completely dead?

Nvidia has removed Rubin CPX from its official roadmap and is no longer promoting it in public communications, but the company has not explicitly stated that the chip will never exist. Rubin CPX could still be revived as an internal or limited-availability variant, but based on current messaging, it should not be factored into purchasing or deployment decisions.

When will Vera Rubin platform actually launch?

Vera Rubin platform general availability is targeted for Q3 2026, an acceleration from the original H2 2026 timeline. All necessary chips have been taped out and are in working silicon, according to Nvidia. Partners will begin offering systems in that timeframe, though exact availability may vary by region and partner.

Nvidia’s removal of Rubin CPX from the roadmap is a clear signal that specialized inference hardware, while theoretically appealing, faces real-world challenges in justifying the engineering and manufacturing complexity. By pivoting to Groq 3 LPUs and focusing the Vera Rubin platform on general-purpose acceleration, Nvidia is making a calculated bet that software optimization and architectural flexibility matter more than single-purpose silicon for the next generation of AI compute.

Edited by the All Things Geek team.

Source: Tom's Hardware
