The LineShine LX2 supercomputer represents a significant shift in how nations approach extreme-scale computing when access to foreign accelerators is restricted. China’s National Supercomputing Center in Shenzhen deployed this CPU-only system, which claims 1.54 exaflops of BF16 training performance—a direct challenge to the GPU-centric infrastructure that has dominated high-performance computing for the past decade.
Key Takeaways
- LineShine LX2 delivers a claimed 1.54 exaflops of BF16 training performance using 2.4 million Armv9 CPU cores, no GPUs required
- System contains 20,480 compute nodes, each with two Huawei LX2 processors
- Each LX2 processor contains 304 cores; interconnect bandwidth reaches 1.6 Tb/s per node
- Peaks at 2.16 exaflops during AI training workloads on 6.3-billion-parameter models
- CPU-only design mirrors Japan’s Fugaku supercomputer strategy, avoiding CUDA dependency
Why CPU-Only Architecture Matters for LineShine LX2
The LineShine LX2 supercomputer abandons the accelerator-heavy model, dominated by Nvidia’s GPUs and CUDA ecosystem, that has become standard for AI training and scientific simulation. Instead, the system packs a reported 2,451,840 Armv9 cores across 40,960 LX2 processors, a brute-force approach to raw compute that sidesteps the need for restricted foreign hardware. This matters because U.S. export controls have made advanced Nvidia accelerators inaccessible to Chinese institutions, forcing a rethink of how exascale systems are architected.
The CPU-only strategy is not new. Japan’s Fugaku supercomputer proved that dense CPU arrays can deliver world-class performance without relying on GPU acceleration, using Fujitsu’s A64FX processor architecture. The LineShine LX2 follows this precedent, but at a scale that pushes the boundaries of what all-CPU systems can achieve. Each compute node pairs two LX2 processors and connects them through the LingQi high-speed network, delivering 1.6 Tb/s of bandwidth per node—critical for keeping 608 cores per node fed with data.
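To put the per-node bandwidth in perspective, the short Python sketch below divides the quoted 1.6 Tb/s evenly across the 608 cores in a node. The even split is an illustrative assumption, not something the reporting describes: real traffic is bursty and shared unevenly. The constant and function names are made up for this example, and Tb/s is taken to mean 10^12 bits per second.

```python
# Figures quoted in the article.
NODE_BANDWIDTH_TBPS = 1.6    # network bandwidth per node, terabits per second
PROCESSORS_PER_NODE = 2
CORES_PER_PROCESSOR = 304


def bandwidth_per_core_gbps(node_tbps, processors, cores_per_processor):
    """Split a node's network bandwidth evenly across its cores (Gb/s).

    Assumes an idealized even split; actual per-core bandwidth depends on
    the workload's communication pattern.
    """
    cores = processors * cores_per_processor
    return node_tbps * 1000 / cores  # 1 Tb/s = 1000 Gb/s


cores_per_node = PROCESSORS_PER_NODE * CORES_PER_PROCESSOR
per_core_gbps = bandwidth_per_core_gbps(
    NODE_BANDWIDTH_TBPS, PROCESSORS_PER_NODE, CORES_PER_PROCESSOR
)

print(f"{cores_per_node} cores per node")
print(f"{per_core_gbps:.2f} Gb/s per core ({per_core_gbps / 8:.2f} GB/s)")
```

Under these assumptions each core gets roughly 2.6 Gb/s of network bandwidth, about a third of a gigabyte per second.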
Raw Performance Claims and What They Mean
The LineShine LX2 supercomputer reports three different performance metrics, and understanding the distinction matters. The system delivers 1.54 exaflops of BF16 training performance, a standard metric for AI workloads using lower-precision floating-point arithmetic. During a specific task—training a 6.3-billion-parameter Earth observation generative compression model—it reportedly peaks at 2.16 exaflops. Theoretical peak FP64 performance (the gold standard for scientific computing) is cited as 2.47 exaflops.
These numbers are impressive on paper, but context is crucial. High BF16 throughput is easier to achieve than high FP64 throughput because each 16-bit operation needs less arithmetic hardware and moves a quarter of the data of a 64-bit one. The 2.16-exaflop figure is task-specific, not a general-purpose sustained rate. The 2.47-exaflop theoretical peak is exactly that: theoretical, assuming perfect utilization and zero overhead, which no real system achieves. What matters for users is how the LineShine LX2 supercomputer performs on their actual workloads, not marketing-friendly peak numbers.
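One way to make the system-wide figures easier to judge is to break them down per node, per processor, and per core. The Python sketch below does that using only numbers quoted in this article (20,480 nodes, two processors per node, 304 cores per processor). It assumes every node contributes equally, which is a simplification, and the names in the code are invented for illustration.

```python
# Figures quoted in the article.
NODES = 20_480
PROCESSORS_PER_NODE = 2
CORES_PER_PROCESSOR = 304

REPORTED_EXAFLOPS = {
    "BF16 training": 1.54,
    "task-specific peak": 2.16,
    "FP64 theoretical peak": 2.47,
}


def per_unit_rates(exaflops):
    """Return (TFLOPS per node, TFLOPS per processor, GFLOPS per core).

    Assumes the system-wide figure is spread evenly over all nodes.
    """
    flops = exaflops * 1e18
    per_node = flops / NODES
    per_processor = per_node / PROCESSORS_PER_NODE
    per_core = per_processor / CORES_PER_PROCESSOR
    return per_node / 1e12, per_processor / 1e12, per_core / 1e9


breakdown = {
    label: per_unit_rates(value) for label, value in REPORTED_EXAFLOPS.items()
}

for label, (node_tf, proc_tf, core_gf) in breakdown.items():
    print(
        f"{label}: {node_tf:.1f} TFLOPS/node, "
        f"{proc_tf:.1f} TFLOPS/processor, {core_gf:.1f} GFLOPS/core"
    )
```

For the 1.54-exaflop BF16 figure this works out to roughly 75 TFLOPS per node and about 124 GFLOPS per core, numbers that are easier to compare against a single accelerator or CPU than the headline total.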
Geopolitics and Semiconductor Independence
The LineShine LX2 supercomputer is not primarily a technical achievement—it is a geopolitical statement. China has designed, manufactured, and deployed an exascale system using entirely domestic Armv9 processors and interconnect technology. The LingQi network (also called LQLink) is a homegrown fabric, not reliant on Mellanox or Intel networking stacks. This matters because it demonstrates that China can build world-class HPC infrastructure without depending on U.S. suppliers.
The system deployed at Shenzhen’s National Supercomputing Center signals that China is serious about reducing dependence on CUDA and the broader Nvidia ecosystem. For researchers and institutions operating under U.S. export restrictions, the LineShine LX2 supercomputer offers a viable alternative for AI training and climate modeling, though the software ecosystem around it remains smaller than CUDA’s mature tooling. Developers accustomed to CUDA will face a learning curve migrating to whatever software stack runs on these Armv9 cores.
How LineShine LX2 Compares to GPU-Accelerated Systems
GPU-accelerated supercomputers like the U.S. Department of Energy’s El Capitan use tens of thousands of AMD Instinct MI300A accelerators, which combine CPU and GPU cores in one package. Such systems achieve higher FLOPS per watt and per dollar than pure CPU designs, but they depend on access to the latest U.S.-designed accelerators, access China no longer has. The LineShine LX2 supercomputer trades efficiency for independence. Running 2.4 million cores requires massive power consumption and cooling infrastructure, but it avoids the bottleneck of waiting on vendors such as Nvidia for hardware they are legally prohibited from exporting.
The architectural trade-off is real. GPU-accelerated systems excel at tensor operations and matrix multiplication, the core of modern AI training. CPUs are more flexible but less specialized. The LineShine LX2 supercomputer likely performs well on the AI workloads it was specifically optimized for, but may struggle with workloads whose performance depends on hand-tuned CUDA kernels. Performance per watt and per dollar favor GPUs; supply-chain independence favors the CPU-only approach.
What This Means for Global Supercomputing
The LineShine LX2 supercomputer is a watershed moment for supercomputing competition. It proves that exascale performance is achievable without Nvidia, though at a cost in efficiency and software maturity. Other nations and institutions facing export restrictions will study this design closely. The success or failure of the LineShine LX2 in real-world AI and climate modeling workloads will determine whether CPU-only exascale becomes a viable alternative or remains a geopolitical workaround.
For the global HPC community, the message is clear: continued GPU dominance is not inevitable. Competitors with sufficient resources and manufacturing capability can build alternatives. Whether those alternatives are preferable depends on workload, power budget, and software ecosystem maturity, factors that will take years to fully evaluate.
Can the LineShine LX2 supercomputer compete with GPU systems for AI training?
The LineShine LX2 supercomputer can likely compete on specific AI tasks, particularly those already optimized for CPU execution or where lower precision (BF16) is acceptable. However, workloads with existing CUDA implementations will require significant porting effort. The system’s real advantage is availability to Chinese institutions under export restrictions, not outright superiority over GPU alternatives.
How many cores does the LineShine LX2 supercomputer contain?
The LineShine LX2 supercomputer contains 2,451,840 total Armv9 CPU cores distributed across 40,960 LX2 processors. Each processor has 304 cores, and each of the 20,480 compute nodes contains two processors, yielding 608 cores per node. This extreme core count compensates for the lower per-core performance of CPUs compared to specialized AI accelerators.
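The figures in this answer can be cross-checked with a few lines of arithmetic. The Python sketch below multiplies out the quoted node, processor, and core counts and compares the result with the reported total. All inputs come from this article; the variable names are illustrative.

```python
# Figures quoted in the article.
NODES = 20_480
PROCESSORS_PER_NODE = 2
CORES_PER_PROCESSOR = 304
REPORTED_TOTAL_CORES = 2_451_840

processors = NODES * PROCESSORS_PER_NODE
cores_per_node = PROCESSORS_PER_NODE * CORES_PER_PROCESSOR
derived_total_cores = processors * CORES_PER_PROCESSOR
difference = derived_total_cores - REPORTED_TOTAL_CORES

print(f"processors:          {processors:,}")
print(f"cores per node:      {cores_per_node:,}")
print(f"derived total cores: {derived_total_cores:,}")
print(f"reported total:      {REPORTED_TOTAL_CORES:,}")
print(f"difference:          {difference:,}")
```

The processor count (40,960) and cores per node (608) match the text, but the derived total is 12,451,840, exactly 10,000,000 more than the reported 2,451,840. At least one of the quoted figures therefore appears to be mis-stated, and the available reporting does not make clear which one.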
What is the power consumption of the LineShine LX2 supercomputer?
Power consumption figures for the LineShine LX2 supercomputer were not disclosed in available reporting. CPU-only systems typically consume more power than GPU-accelerated alternatives for equivalent performance, but exact figures have not been published. Operating costs at scale are a critical factor that will determine long-term viability.
The LineShine LX2 supercomputer is a pragmatic response to U.S. export controls, not a revolutionary leap forward in computing. It proves that exascale CPU-only systems are feasible and can deliver respectable performance on appropriate workloads. Whether it becomes a model for future systems depends on how well it performs in production use and whether the software ecosystem matures to support it. For now, it stands as evidence that semiconductor independence is possible—if you have the resources and patience to build it yourself.
Edited by the All Things Geek team.
Source: TechRadar


