Nvidia will ship LPU decode hardware this year, a turning point in the company’s inference strategy, according to VP of Hyperscale and HPC Ian Buck at GTC 2026. The decision to shelve CPX in favor of accelerating LPU decode deployment signals Nvidia’s confidence in a new architectural direction for cost-conscious AI workloads at scale.
Key Takeaways
- Nvidia is shelving the CPX architecture and prioritizing LPU decode shipping in 2026.
- Ian Buck, VP of Hyperscale and HPC, confirmed the strategic pivot during GTC 2026 press Q&A.
- The Vera Rubin data center system uses copper and optical interconnects for scaled AI infrastructure.
- The Groq 3 LPX chip, fabricated by Samsung, is scheduled to ship in Q3 2026.
- Tiered inference economics and co-packaged optics (CPO) are now in production.
Nvidia’s Vera Rubin Architecture Redefines Hyperscale Inference
Nvidia CEO Jensen Huang’s keynote at GTC 2026 introduced Vera Rubin as the company’s next-generation data center system, built on a fundamentally different approach to scaling AI workloads. Rather than chasing raw compute density, Vera Rubin emphasizes interconnect efficiency through copper and optical technologies, reducing the cost bottleneck that has plagued hyperscale deployments. This architectural choice reflects a maturing market where inference, not training, drives margin pressure.
The shift away from CPX toward LPU decode represents Nvidia’s bet that specialized decode processors will outcompete general-purpose GPUs on price-per-token metrics. By shelving CPX, the company is signaling that the traditional GPU path to inference efficiency has hit a ceiling. Vera Rubin’s scale-up capabilities using copper and optical interconnects suggest Nvidia is rethinking how data centers should be wired, not just how chips should be designed.
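The price-per-token metric mentioned above comes down to simple arithmetic: what a processor costs to run per hour, divided by how many tokens it generates in that hour. The sketch below is only an illustration. The function name and every number passed to it are hypothetical placeholders, not figures from Nvidia, Groq, or the GTC 2026 announcements.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Return the dollar cost of generating one million tokens.

    hourly_cost_usd: assumed all-in cost of running one accelerator for an hour.
    tokens_per_second: assumed sustained decode throughput of that accelerator.
    Both inputs are illustrative assumptions supplied by the caller.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def cheaper_option(options: dict[str, tuple[float, float]]) -> str:
    """Pick the option with the lowest cost per million tokens.

    options maps a label to (hourly_cost_usd, tokens_per_second).
    """
    return min(options, key=lambda name: cost_per_million_tokens(*options[name]))
```

With placeholder inputs of 3.60 USD per hour and 1,000 tokens per second, the first function works out to roughly 1 USD per million tokens. A specialized decode processor wins on this metric only if its throughput advantage outweighs any difference in hourly cost.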
Why Shelving CPX Matters for the Inference Market
CPX was positioned as Nvidia’s answer to inference cost optimization, but market realities have shifted faster than product cycles. Ian Buck’s confirmation that CPX is shelved indicates the company has concluded that a purpose-built decode processor—the LPU—offers better economics than a modified GPU architecture. This is a rare admission from Nvidia that one of its planned products cannot compete in its intended market.
The decision also reflects pressure from specialized competitors. Groq’s integration into Nvidia’s ecosystem, via the Groq 3 LPX chip fabricated by Samsung and shipping in Q3 2026, suggests Nvidia is hedging against inference startups by acquiring or partnering with them rather than outcompeting them alone. This is a defensive move dressed as a strategic partnership—Nvidia is bringing the LPU threat inside the tent rather than fighting it from outside.
LPU Decode Shipping This Year: What Changes
Shipping LPU decode hardware in 2026 accelerates the timeline for inference workload migration away from general-purpose GPUs. The LPU’s specialized instruction set and memory hierarchy are optimized for the token-generation phase of large language models, where latency and power efficiency matter more than raw throughput. Hyperscalers running inference at billions of requests per day stand to see cost reductions if the LPU delivers on its architectural promises.
However, shipping LPU decode this year also raises integration questions. Customers will need to partition their inference workloads between GPU-based prefill (the initial prompt-processing phase) and LPU-based decode (the token-generation phase). This hybrid approach adds operational complexity. Whether LPU decode succeeds depends on Nvidia and its partners delivering seamless orchestration software, a problem the company has historically struggled with outside of CUDA.
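To make the prefill/decode split concrete, here is a minimal sketch of how a scheduler might plan a single request across two hardware pools. It is not Nvidia’s orchestration software or any real API. The class names, the pool labels "gpu-pool" and "lpu-pool", and the planning function are all hypothetical, and a real system would also have to move the key-value cache produced by prefill over to the decode hardware.

```python
from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    prompt_tokens: int    # tokens in the incoming prompt
    max_new_tokens: int   # upper bound on tokens to generate


@dataclass
class Stage:
    request_id: str
    phase: str   # "prefill" or "decode"
    pool: str    # hardware pool assigned to this phase
    tokens: int  # tokens handled in this phase


# Hypothetical pool names: prefill on GPUs, decode on LPUs.
PHASE_TO_POOL = {"prefill": "gpu-pool", "decode": "lpu-pool"}


def plan_request(req: Request) -> list[Stage]:
    """Split one request into an ordered list of per-phase stages."""
    stages = [
        Stage(req.request_id, "prefill", PHASE_TO_POOL["prefill"], req.prompt_tokens)
    ]
    # Decode only runs if the caller asked for generated tokens.
    if req.max_new_tokens > 0:
        stages.append(
            Stage(req.request_id, "decode", PHASE_TO_POOL["decode"], req.max_new_tokens)
        )
    return stages
```

Calling `plan_request(Request("r1", 512, 128))` yields a prefill stage on the GPU pool followed by a decode stage on the LPU pool. The operational complexity described above sits in everything this sketch leaves out: cache transfer, load balancing between pools, and failure handling.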
Vera Rubin’s Copper and Optical Interconnects: The Real Innovation
Beyond the chip announcements, Vera Rubin’s use of copper and optical interconnects is where Nvidia is making its boldest bet. Traditional data centers rely on electrical interconnects that scale poorly beyond certain distances and power budgets. By putting co-packaged optics (CPO) into production alongside copper, Vera Rubin enables denser, more power-efficient clusters. This is infrastructure-level innovation, not chip-level tweaking.
The implication is stark: future AI supercomputers will be defined by their interconnect topology, not their GPU count. Nvidia is positioning Vera Rubin as the architecture for this new era. Competitors like AMD and Intel will struggle to match this level of integration without massive capital investment in new fab partnerships and system-level design.
CPU Sales as a Multi-Billion Dollar Business
One often-overlooked detail from Huang’s keynote is the revelation that Nvidia’s standalone CPU business is now multi-billion-dollar in scale. This reflects the company’s success in data center CPUs (Grace) and the growing importance of CPU-GPU coherence in modern AI systems. As inference becomes more cost-sensitive, CPU efficiency matters more—a token-generation processor still needs a capable CPU for orchestration, caching, and control logic.
This diversification also hedges Nvidia against GPU commoditization. If LPUs and other specialized processors eventually erode GPU margins, Nvidia’s CPU business provides a fallback revenue stream. The company is no longer a GPU company; it is becoming a system company.
Will AI Inference Economics Kill SaaS as We Know It?
During the Q&A, Buck addressed a question that has haunted the industry: does the rise of tiered inference economics—where decode costs plummet—destroy the SaaS model? His answer was no, but with caveats. As inference becomes cheaper, SaaS margins compress unless companies can shift value upstream (e.g., better models, faster inference, lower latency). The winners will be those who own the inference stack end-to-end.
Nvidia’s Vera Rubin and LPU decode strategy is designed to capture value at every layer—chips, interconnects, system design, and software. This vertical integration is Nvidia’s insurance policy against margin collapse in the inference era.
What About Groq 3 and Samsung Fabrication?
The Groq 3 LPX chip, fabricated by Samsung and shipping in Q3 2026, adds another layer to Nvidia’s inference strategy. By not fabricating LPU decode entirely in-house, Nvidia is managing risk and securing supply diversity. Samsung’s involvement also signals that specialized inference processors have crossed a threshold of maturity—they are now worth fabbing at scale.
The Q3 2026 timeline is aggressive. It suggests Nvidia has already finalized designs and secured capacity. Missing this window would be a credibility blow, so Nvidia has every incentive to hit the date.
FAQ
Why is Nvidia shelving CPX instead of shipping it?
CPX was designed as a cost-optimized inference chip, but the rise of specialized LPU processors made it obsolete before launch. Nvidia chose to redirect engineering resources toward LPU decode, which offers better economics. This is a rare product cancellation from Nvidia and signals confidence that the LPU path is superior.
What does Nvidia LPU decode shipping mean for hyperscalers?
With LPU decode shipping in 2026, hyperscalers get a new tool for reducing inference costs at scale. By offloading token generation to a specialized processor, companies can reduce power consumption and latency compared to GPU-based inference. Early adopters could gain a competitive advantage on cost-per-inference metrics.
Is Vera Rubin replacing Hopper for inference workloads?
Vera Rubin is not a replacement for Hopper; it is a complementary architecture designed for inference-heavy clusters. Hopper will continue to dominate training and prefill-phase inference. Vera Rubin’s value lies in its interconnect efficiency and integration with LPU decode for the full inference stack.
Nvidia’s GTC 2026 announcements reveal a company betting hard on inference as the next growth engine for AI infrastructure. By shelving CPX and shipping LPU decode this year, the company is signaling that specialized processors have won the inference wars. Vera Rubin’s interconnect innovations and Groq 3’s Q3 2026 arrival suggest Nvidia is moving faster than competitors to capture margin in a market where cost-per-token is king. For hyperscalers, the message is clear: the era of GPU-only inference is ending, and the race to optimize hybrid inference stacks has begun.
Edited by the All Things Geek team.
Source: Tom's Hardware


