DirectStorage GPU decompression: Blackwell’s edge over prior generations

Craig Nash
Tech writer at All Things Geek. Covers artificial intelligence, semiconductors, and computing hardware.

DirectStorage GPU decompression changes how PC games stream assets from storage to GPU memory: instead of decompressing assets on the CPU, a long-standing bottleneck in PC asset loading, the GPU does the work. NVIDIA’s Blackwell architecture (RTX 5090) handles this workload more gracefully than its Ada predecessor, the RTX 4090, but the technology still carries performance trade-offs that developers and gamers need to understand before assuming it is a universal win.

Key Takeaways

  • Blackwell GPUs show reduced performance penalties from DirectStorage GPU decompression versus Ada generation cards.
  • GDeflate, NVIDIA and Microsoft’s GPU-optimized compression standard, can more than double effective bandwidth, because throughput scales with the compression ratio.
  • Spider-Man 2 performs better with CPU decompression at 4K, 1440p, and 1080p, while Ratchet & Clank: Rift Apart favors GPU decompression.
  • DirectStorage on PC lacks true hardware decompression; GPU and CPU paths both carry overhead.
  • RTX 5090 architecture improvements reduce scheduling conflicts between rendering and GDeflate compute shaders.

What DirectStorage GPU decompression actually does

DirectStorage GPU decompression bypasses traditional CPU-based asset decompression by routing compressed data directly into GPU memory, where a compute shader decompresses the GDeflate-compressed data on the fly. This sounds elegant in theory. Consoles achieve this with dedicated hardware decompression engines that sit between the NVMe controller and unified memory. PC has no such luxury. Instead, developers choose between CPU decompression (which ties up processor cores) and GPU decompression (which adds shader workload during rendering). Neither is free.

NVIDIA and Microsoft developed GDeflate as an open standard for DirectStorage 1.1, designed specifically to scale with GPU parallelism. On NVIDIA hardware, it requires Game Ready Driver 526.47 or later and uses hidden compute queues to run decompression shaders alongside rendering. In the ideal scenario, compressed textures and geometry land directly in VRAM without touching the CPU staging heap, reducing memory copies and freeing up system resources.
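To make "scales with parallelism" concrete, here is a minimal sketch of one way a compression format can expose parallel work: split the data into independent tiles so that each tile can be decompressed on its own. This is not GDeflate and not the DirectStorage API. Python's standard `zlib` (plain DEFLATE) stands in for the codec, CPU threads stand in for GPU thread groups, and the 64 KiB tile size and the function names `compress_tiles` and `decompress_tiles` are assumptions made for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Assumed tile size for this illustration only.
TILE_SIZE = 64 * 1024


def compress_tiles(data: bytes, tile_size: int = TILE_SIZE) -> list[bytes]:
    """Compress each fixed-size tile independently (zlib as a stand-in codec)."""
    return [
        zlib.compress(data[offset:offset + tile_size])
        for offset in range(0, len(data), tile_size)
    ]


def decompress_tiles(tiles: list[bytes], workers: int = 4) -> bytes:
    """Decompress tiles concurrently and reassemble them in their original order.

    Because no tile depends on another, the work can be spread across as many
    workers as are available. pool.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, tiles))


if __name__ == "__main__":
    asset = bytes(range(256)) * 1000          # 256,000 bytes of sample data
    tiles = compress_tiles(asset)
    print(len(tiles), "tiles")
    print(decompress_tiles(tiles) == asset)   # round-trip check
```

The design trade-off in a tiled layout is that each tile gives up some compression ratio, since matches cannot reference data outside the tile, in exchange for work that can be distributed freely.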

The minimum DirectStorage performance guarantee is 2.0 GB/s of raw data over a 250 millisecond window, but effective throughput scales with the compression ratio. A texture compressed at a 4:1 ratio and read at that rate theoretically delivers 8.0 GB/s of effective bandwidth (2.0 GB/s × 4), a meaningful advantage for asset-heavy open worlds and next-gen streaming demands.
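The arithmetic behind those figures can be sketched in a few lines. The helper names below are hypothetical, GB is taken to mean 10^9 bytes, and the model is the idealized one from the paragraph above: it ignores decompression cost entirely and simply multiplies the raw read rate by the compression ratio.

```python
def effective_bandwidth_gbps(raw_gbps: float, compression_ratio: float) -> float:
    """Idealized decompressed throughput for a given raw read rate.

    compression_ratio is uncompressed size / compressed size, so 4.0 means 4:1.
    """
    if raw_gbps < 0 or compression_ratio <= 0:
        raise ValueError("raw_gbps must be >= 0 and compression_ratio must be > 0")
    return raw_gbps * compression_ratio


def raw_bytes_in_window(raw_gbps: float, window_ms: float) -> float:
    """Raw bytes read in a time window at a sustained rate (1 GB = 1e9 bytes)."""
    return raw_gbps * 1e9 * (window_ms / 1000.0)


if __name__ == "__main__":
    print(effective_bandwidth_gbps(2.0, 4.0))   # 8.0 GB/s effective
    print(raw_bytes_in_window(2.0, 250))        # 500000000.0 raw bytes per window
```

Under these assumptions, the 2.0 GB/s floor corresponds to 500 MB of compressed data per 250 ms window, and a 4:1 ratio turns that into 2 GB of usable assets over the same window.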

Blackwell closes the GPU decompression gap

GPU decompression cost the RTX 4090 up to 17% in 1% lows in some games, with penalties reaching as much as 30% on the older RTX 3080. This penalty stemmed from scheduling conflicts between rendering workloads and GDeflate compute shaders competing for GPU resources. Blackwell’s architecture improvements reduce this contention. The RTX 5090 does not eliminate the penalty, but it narrows it enough that GPU decompression becomes viable in more scenarios.

Testing in Ratchet & Clank: Rift Apart showed GPU decompression delivering a 1.19% average FPS gain and a 5.26% improvement in 1% lows compared to CPU decompression across tested resolutions. Spider-Man 2 told a different story: CPU decompression outperformed GPU decompression at 4K (8.2% higher average FPS, 10.53% better 1% lows), at 1440p (3.53% higher average FPS, 14.29% better 1% lows), and at 1080p (3.3% higher average FPS, 9.23% better 1% lows). The inconsistency reveals a hard truth: DirectStorage GPU decompression is not universally better. Game engines, asset types, and GPU utilization patterns all determine whether offloading decompression to the GPU actually improves frame times.

Why CPU decompression still wins in some games

Spider-Man 2’s CPU decompression advantage hints at a deeper issue: GPU decompression only benefits scenarios where the GPU would otherwise be idle during asset streaming. In a game already pushing the RTX 5090 to its limits, adding decompression shaders creates contention rather than relief. CPU decompression, by contrast, leverages processor cores that might otherwise sit underutilized during GPU-bound rendering. The architecture of each game determines which path is optimal.
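A deliberately crude toy model illustrates this reasoning. It assumes the CPU and GPU work in parallel, so a frame takes as long as the busier of the two, and that decompression time is simply added to whichever processor performs it. Real scheduling is far more complex (async compute overlap, spare CPU cores, memory bandwidth), and every number and name below is hypothetical rather than measured.

```python
def frame_time_ms(gpu_render_ms: float, cpu_game_ms: float,
                  gpu_decomp_ms: float, cpu_decomp_ms: float,
                  path: str) -> float:
    """Toy frame time: the busier processor sets the pace.

    path is "gpu" or "cpu" and selects which processor pays the decompression cost.
    """
    if path == "gpu":
        gpu_busy = gpu_render_ms + gpu_decomp_ms
        cpu_busy = cpu_game_ms
    elif path == "cpu":
        gpu_busy = gpu_render_ms
        cpu_busy = cpu_game_ms + cpu_decomp_ms
    else:
        raise ValueError("path must be 'gpu' or 'cpu'")
    return max(gpu_busy, cpu_busy)


def better_path(gpu_render_ms: float, cpu_game_ms: float,
                gpu_decomp_ms: float, cpu_decomp_ms: float) -> str:
    """Return which decompression path yields the shorter toy frame time."""
    gpu = frame_time_ms(gpu_render_ms, cpu_game_ms, gpu_decomp_ms, cpu_decomp_ms, "gpu")
    cpu = frame_time_ms(gpu_render_ms, cpu_game_ms, gpu_decomp_ms, cpu_decomp_ms, "cpu")
    if gpu < cpu:
        return "gpu"
    if cpu < gpu:
        return "cpu"
    return "tie"


if __name__ == "__main__":
    # Hypothetical GPU-bound frame: the GPU has no headroom, the CPU does.
    print(better_path(16.0, 8.0, 2.0, 2.0))   # cpu
    # Hypothetical CPU-bound frame: the GPU has headroom, the CPU does not.
    print(better_path(8.0, 16.0, 2.0, 2.0))   # gpu
```

In this sketch the decompression work is free only when it lands on the processor with slack, which is the same pattern the two games show in opposite directions.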

This is why the console advantage remains so pronounced. PlayStation 5 and Xbox Series X route compressed data through dedicated decompression silicon that operates entirely outside the rendering pipeline. PC developers must choose between two imperfect solutions: tie up CPU cores or add GPU overhead. Neither approach scales as cleanly as purpose-built hardware.

DirectStorage adoption remains sluggish despite improvements

Years after DirectStorage arrived on Windows, adoption among PC games remains minimal. Ratchet & Clank: Rift Apart and Spider-Man 2 are among the few titles leveraging it meaningfully. Developers cite complexity, inconsistent driver support, and unclear performance benefits as reasons to delay integration. Blackwell’s improved handling of GPU decompression removes one barrier, but it does not solve the fundamental problem: DirectStorage delivers fast asset streaming, not guaranteed FPS gains. That distinction is harder to market than raw speed.

The technology also reveals a deeper PC architecture problem. Consoles integrate storage, GPU, and CPU through unified memory and custom silicon. PCs split these domains—separate VRAM and system RAM, discrete GPUs, commodity SSDs. DirectStorage tries to bridge these gaps through software, but software solutions always carry overhead that hardware solutions avoid.

Machine learning compression as the next frontier

Blackwell’s exceptional machine learning performance opens a new possibility for DirectStorage: learned texture compression using neural networks. Rather than relying solely on GDeflate’s algorithmic compression, future games could use Blackwell’s Tensor Cores to decompress AI-compressed assets in real time. This approach could push effective bandwidth even higher while reducing file sizes more aggressively than traditional codecs. It is speculative, but it represents the most promising direction for DirectStorage optimization on high-end GPUs.

Is DirectStorage GPU decompression worth enabling?

For RTX 5090 owners playing Ratchet & Clank: Rift Apart, yes—the GPU path shows measurable frame-time improvements. For Spider-Man 2 or other CPU-decompression-favoring titles, disable it and let the CPU handle the load. There is no universal answer. Gamers need to test both paths in each title and measure 1% lows, not just average FPS, since DirectStorage’s real benefit is frame-time consistency during asset streaming, not peak frame rates.
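For readers who capture frame times with a logging tool, the comparison can be sketched as follows. Definitions of "1% low" vary between tools; this sketch assumes it means the average FPS over the slowest 1% of frames. The function names and the sample data are illustrative, not taken from any benchmark.

```python
def average_fps(frame_times_ms: list[float]) -> float:
    """Average FPS over a capture: frames rendered divided by total time."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def one_percent_low_fps(frame_times_ms: list[float]) -> float:
    """Average FPS over the slowest 1% of frames (always at least one frame)."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    slowest_first = sorted(frame_times_ms, reverse=True)
    count = max(1, len(slowest_first) // 100)
    worst = slowest_first[:count]
    return 1000.0 * len(worst) / sum(worst)


def percent_change(new: float, old: float) -> float:
    """Relative change of new versus old, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Made-up capture: 99 smooth frames and one 50 ms streaming hitch.
    capture = [10.0] * 99 + [50.0]
    print(round(average_fps(capture), 2))   # 96.15
    print(one_percent_low_fps(capture))     # 20.0
```

The made-up capture shows why the two metrics diverge: a single hitch barely moves the average but dominates the 1% low, which is the figure to compare between the CPU and GPU paths with `percent_change`.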

Does Blackwell handle DirectStorage better than RTX 4090?

Yes, but not dramatically. The RTX 5090 shows smaller performance penalties from GPU decompression than the RTX 4090, particularly at lower resolutions where scheduling conflicts are more pronounced. However, the improvement is incremental, not revolutionary. Blackwell’s architectural refinements reduce contention between rendering and compute shaders, but the fundamental limitation—GPU decompression adds workload during rendering—remains.

Will DirectStorage ever match console-style decompression?

Not without dedicated hardware. PCs could theoretically add a dedicated decompression engine between the SSD and GPU, but this would require significant hardware redesign and would only benefit a small subset of games. The more realistic path is improving GPU decompression efficiency through better scheduling, AI-assisted compression, and driver optimization. Blackwell takes a step in that direction, but it is not the final answer.

DirectStorage GPU decompression on Blackwell represents genuine progress—faster asset streaming with smaller performance trade-offs than prior generations. But progress is not perfection. Developers still face a choice between imperfect solutions, and gamers should test each game individually rather than assuming GPU decompression is always the right call. The technology is maturing, but it remains a specialized tool for specific scenarios, not a universal performance upgrade.

Edited by the All Things Geek team.

Source: Tom's Hardware
