DeepSeek V4 Challenges AI Giants With 1.6T Parameters and Huawei Chips

Craig Nash
By
Craig Nash
Tech writer at All Things Geek. Covers artificial intelligence, semiconductors, and computing hardware.
7 Min Read
A person holding a cell phone in their hand

DeepSeek V4 is a large language model with 1.6 trillion total parameters released by Chinese AI startup DeepSeek on April 24, 2026, running on Huawei chips and available globally via API. The release marks a significant escalation in the open-source AI race, arriving amid intensifying U.S. government accusations that DeepSeek and other Chinese AI firms have engaged in intellectual property theft. Two versions launched simultaneously: V4-Pro with 1.6 trillion total parameters and 49 billion active parameters, and V4-Flash with 284 billion total parameters and 13 billion active parameters, both supporting a 1 million token context window.

Key Takeaways

  • DeepSeek V4-Pro claims performance rivaling Anthropic Claude Opus 4.6, OpenAI GPT-5.4, and Google Gemini 3.1 Pro
  • Both versions support 1 million token context length, roughly equivalent to 750,000 words
  • V4-Pro pricing starts at $3.48 per 1 million tokens, undercutting closed-source competitors
  • Hybrid Attention architecture reduces inference compute to 27% of DeepSeek V3.2 at 1M context
  • Launched on Huawei chips as U.S. escalates IP theft allegations against Chinese AI companies

How DeepSeek V4 Compares to Frontier Models

DeepSeek V4-Pro trails marginally behind GPT-5.4 and Gemini 3.1 Pro, suggesting a developmental trajectory approximately three to six months behind state-of-the-art frontier models. Yet within the open-source ecosystem, DeepSeek V4 dominates decisively. It beats all open-source competitors in agentic coding benchmarks, mathematics, STEM reasoning, and general coding tasks. On the MATH benchmark, V4-Pro-Base scores 64.5% compared to V3.2-Base’s 60.5%, demonstrating measurable improvement in mathematical reasoning. V4-Flash-Base achieves 85.7% on the MGSM benchmark, delivering strong performance in a lighter-weight variant.

Against GPT-4o specifically, DeepSeek V4 maintains 100% logical consistency in long-context tasks, whereas GPT-4o experiences what users describe as logic hallucinations beyond 10,000 tokens. This architectural advantage matters for real-world applications requiring sustained reasoning across documents, code repositories, or multi-turn conversations. The comparison underscores a critical weakness in competing models: their inability to maintain coherence at scale.

DeepSeek V4’s Architectural Innovations and Efficiency Gains

The core innovation powering DeepSeek V4 is its Hybrid Attention mechanism, which combines Compressed Sparse Attention, Heavily Compressed Attention, and DeepSeek Sparse Attention. This design reduces inference compute requirements to just 27% of what DeepSeek V3.2 required at the 1 million token context length, while cutting key-value cache memory to 10%. For practitioners, this translates to lower latency, reduced GPU memory consumption, and dramatically lower operating costs—a direct economic advantage in production deployments.

The Mixture-of-Experts architecture in V4-Pro activates only 49 billion parameters during inference despite the 1.6 trillion parameter count, allowing the model to maintain parameter efficiency while accessing specialized knowledge pools. This design choice reflects lessons learned from DeepSeek’s earlier V3 and R1 releases, which demonstrated that frontier performance need not require proportional computational overhead.

Pricing and API Availability

DeepSeek V4-Pro pricing begins at $3.48 per 1 million input tokens, making it substantially cheaper than comparable closed-source models. V4-Flash offers even more aggressive pricing for cost-sensitive use cases, though specific per-token rates were not disclosed at launch. Both versions became available immediately via API on April 24, 2026, supporting OpenAI ChatCompletions and Anthropic API formats. Users can access the model by updating their model name parameter while keeping the same base URL, lowering switching friction.

The model supports both Thinking and Non-Thinking modes, allowing users to trade inference speed for reasoning depth depending on task requirements. DeepSeek also open-sourced the weights on Hugging Face, enabling researchers and organizations to self-host the model without relying on external APIs.

The Huawei Hardware Question and U.S. Accusations

DeepSeek V4 runs on Huawei chips, a detail that carries geopolitical weight. The launch arrives as the U.S. government escalates accusations that DeepSeek and other Chinese AI firms have engaged in intellectual property theft. These allegations have not been independently verified and remain contested. The timing of the accusations alongside the V4 release suggests intensifying scrutiny of Chinese AI development, particularly regarding how quickly these companies achieve frontier capabilities despite U.S. export controls on advanced semiconductors.

Whether DeepSeek developed V4 through independent innovation or by leveraging proprietary techniques from other organizations remains a matter of official investigation. What is clear: the model’s performance on public benchmarks demonstrates technical competence regardless of its origin story.

Is DeepSeek V4 worth using over GPT-5.4 or Gemini 3.1 Pro?

For cost-sensitive applications, DeepSeek V4 is compelling. Its pricing undercuts closed-source competitors by a significant margin, and its long-context performance rivals much pricier alternatives. For organizations prioritizing absolute frontier capability in knowledge or reasoning, GPT-5.4 and Gemini 3.1 Pro maintain narrow advantages. For open-source deployments and research, DeepSeek V4 is the strongest available option.

Can DeepSeek V4 handle 1 million tokens reliably?

Yes. Both V4-Pro and V4-Flash support 1 million token context length, roughly 750,000 words, with the Hybrid Attention architecture designed to maintain consistency across such extended inputs. Real-world performance depends on specific use cases, but the architectural approach suggests sustained reliability compared to models that degrade in long-context settings.

How does V4-Flash compare to V4-Pro for production use?

V4-Flash trades some reasoning capability for speed and cost, making it suitable for latency-sensitive applications like real-time chat, customer support, and rapid content generation. V4-Pro is better suited for complex reasoning, code generation, and mathematical problem-solving where accuracy outweighs speed. The choice depends on your application’s priority: throughput or precision.

DeepSeek V4 reshapes the cost-performance frontier in AI. It does not displace GPT-5.4 or Gemini 3.1 Pro for organizations willing to pay for marginal capability gains, but it makes those premium models harder to justify for budget-conscious teams and researchers. The real story is not whether V4 is better than the best—it is that it is good enough for most use cases at a fraction of the price, while running on hardware that avoids U.S. export restrictions. That combination is what makes this launch significant.

Edited by the All Things Geek team.

Source: Tom's Hardware

Share This Article
Tech writer at All Things Geek. Covers artificial intelligence, semiconductors, and computing hardware.