AI transformer model runs on 47-year-old PDP-11 computer

By Craig Nash
Tech writer at All Things Geek. Covers artificial intelligence, semiconductors, and computing hardware.

An AI transformer model has been successfully implemented and trained on a 47-year-old PDP-11 computer, demonstrating that machine learning fundamentals can run on severely constrained hardware from the 1970s. The project, called ATTN-11 (Attention 11), is a minimal single-layer, single-head transformer written entirely in PDP-11 assembly language, capable of training on sequence-reversal tasks using fixed-point arithmetic. This “gloriously absurd” achievement highlights how understanding neural networks at their core does not require modern GPUs, cloud infrastructure, or frameworks like PyTorch.

Key Takeaways

  • ATTN-11 trains a transformer model on PDP-11 hardware with a 6 MHz CPU and 64KB of RAM using only assembly language
  • The system uses activations with 8 fractional bits and gradients with 15 fractional bits, with products shifted back to Q15 fixed-point format
  • Execution is simple: boot the PDP-11, type S1000 to run, and watch training progress on a screen or a DECwriter
  • The project serves educational purposes, demonstrating transformer and self-attention mechanics without modern ML frameworks
  • Hardware sourced from junkyard parts shows retro computing remains accessible for hands-on learning

Why Run AI on 1979-Era Hardware?

The AI transformer model on PDP-11 exists primarily as an educational tool, not a practical alternative to modern systems. By stripping machine learning down to its mathematical essence—matrix operations, softmax tables, and attention mechanisms—the project reveals how transformers work at the lowest level. A developer running this system can see exactly how each computation flows through registers, understand fixed-point arithmetic constraints, and grasp why modern hardware accelerators matter. There is no abstraction layer hiding the work.
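To make the "softmax tables" idea concrete, here is a minimal Python sketch of a table-driven softmax in fixed point. It is not ATTN-11's code: the table length, the step size, the Q8 input format, the Q15 output format, and every name below are assumptions chosen for illustration.

```python
import math

# All constants below are illustrative assumptions, not values from ATTN-11.
TABLE_SIZE = 256   # number of precomputed exponentials
STEP_SHIFT = 3     # one table step = 2**3 Q8 units = 1/32

# EXP_TABLE[k] approximates exp(-k / 32), scaled to Q15.
EXP_TABLE = [int(round(math.exp(-k / 32) * 32767)) for k in range(TABLE_SIZE)]


def softmax_fixed(scores_q8):
    """Table-driven softmax over Q8 scores; returns Q15 probabilities."""
    top = max(scores_q8)
    weights = []
    for s in scores_q8:
        # Distance below the largest score, measured in table steps.
        idx = (top - s) >> STEP_SHIFT
        # Scores far below the maximum contribute nothing.
        weights.append(EXP_TABLE[idx] if idx < TABLE_SIZE else 0)
    total = sum(weights)
    # Integer division normalizes the weights so they sum to about 1.0 in Q15.
    return [(w << 15) // total for w in weights]


scores_q8 = [512, 256, 0]   # 2.0, 1.0 and 0.0 in Q8
probs = softmax_fixed(scores_q8)
print([p / 32768 for p in probs])
```

Subtracting the largest score first keeps every table index non-negative, so the table only needs to cover exp of values at or below zero. The only runtime work left is a subtraction, a shift, a lookup, and one division per element.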

This approach differs fundamentally from using PyTorch or TensorFlow on a laptop. Those frameworks obscure implementation details behind libraries optimized for speed. On the PDP-11, every multiplication, every memory access, every shift operation is explicit. Students and retrocomputing enthusiasts gain a view of transformer architecture that is hard to get when training models through high-level frameworks on contemporary hardware.

The Hardware and Implementation Details

The PDP-11 used for ATTN-11 testing typically includes a 6 MHz CPU, 64KB of RAM (the program requires at least 32KB of core or MOS memory), and an extended instruction set. One well-documented build sourced parts from junkyards: a backplane, an M8189 CPU card, two 512KB RAM cards totaling roughly 1MB, a QBone Linux boot device, and power/halt-run switches. The system boots 2.9BSD, from which users type S1000 to launch the transformer training routine.

The mathematical operations rely on fixed-point arithmetic, so the code needs no floating-point hardware, which the PDP-11 lacks. The system multiplies activations holding 8 fractional bits by gradients holding 15 fractional bits, producing intermediate products with 23 fractional bits that are stored in register pairs, then shifts them right by 8 bits to return to Q15 format. This constraint-driven design is not a limitation but a teaching moment: it shows how quantization, precision trade-offs, and arithmetic efficiency shaped early computing.
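The multiply-then-shift step can be sketched in a few lines of Python. This illustrates the arithmetic only, assuming activations are stored with 8 fractional bits (Q8) and gradients with 15 (Q15). The function names are hypothetical and do not come from the ATTN-11 source.

```python
# Sketch of a Q8 x Q15 -> Q15 multiply. Names are illustrative, not ATTN-11's.

Q8_FRAC_BITS = 8    # activations: 8 fractional bits
Q15_FRAC_BITS = 15  # gradients: 15 fractional bits


def to_fixed(x: float, frac_bits: int) -> int:
    """Convert a float to a fixed-point integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))


def from_fixed(v: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return v / (1 << frac_bits)


def mul_q8_q15(activation_q8: int, gradient_q15: int) -> int:
    """Multiply a Q8 value by a Q15 value and return a Q15 result."""
    product_q23 = activation_q8 * gradient_q15  # 8 + 15 = 23 fractional bits
    return product_q23 >> Q8_FRAC_BITS          # drop 8 bits: Q23 -> Q15


a = to_fixed(1.5, Q8_FRAC_BITS)      # 384
g = to_fixed(-0.25, Q15_FRAC_BITS)   # -8192
r = mul_q8_q15(a, g)
print(from_fixed(r, Q15_FRAC_BITS))  # -0.375
```

Python integers never overflow, so this sketch does not model 16-bit word limits. On the real machine the wide product is what forces the use of a register pair before the shift.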

How the AI Transformer Model Trains and Performs

Running ATTN-11 involves booting the PDP-11, loading the assembly-language transformer code, and letting it train on a sequence-reversal task. The model outputs its parameter count and mathematical operation sizes to the terminal or a connected DECwriter printer. Training is slow by modern standards—the performance gap between a 1976 PDP-11/34 and a 2023 M2 Ultra system is roughly 200,000x in single-thread Dhrystone benchmarks—but the point is not speed. It is comprehension.

The transformer learns to reverse input sequences, a task simple enough to complete in reasonable time on 1979-era hardware yet complex enough to require attention mechanisms and gradient descent. Observers watching the training progress on screen see the model gradually improve, reinforcing that transformers are not magic—they are mathematical systems that learn through iteration, even on a computer that predates the personal computer era.
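A short Python sketch shows why sequence reversal is a natural fit for attention. It builds one training pair and then applies a hand-written attention pattern that points each output position at its mirror input position. The sizes, seed, and helper names are assumptions for illustration. A trained model would have to learn such a pattern through gradient descent rather than have it supplied.

```python
import random


def make_reversal_example(length, vocab_size, rng):
    """Return (input_tokens, target_tokens) where the target is the input reversed."""
    tokens = [rng.randrange(vocab_size) for _ in range(length)]
    return tokens, tokens[::-1]


def reversal_attention(length):
    """Attention weights that send output position i to input position length-1-i."""
    return [[1.0 if j == length - 1 - i else 0.0 for j in range(length)]
            for i in range(length)]


def apply_attention(weights, values):
    """Weighted sum of value vectors for each output position."""
    dim = len(values[0])
    return [[sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
            for row in weights]


rng = random.Random(0)
VOCAB, LENGTH = 10, 8   # illustrative sizes, not ATTN-11's
inputs, targets = make_reversal_example(LENGTH, VOCAB, rng)

# One-hot encode the tokens so attention can mix them as vectors.
one_hot = [[1.0 if t == k else 0.0 for k in range(VOCAB)] for t in inputs]
mixed = apply_attention(reversal_attention(LENGTH), one_hot)
predicted = [row.index(max(row)) for row in mixed]
print(predicted == targets)  # True
```

The anti-diagonal weight matrix is the ideal solution the model is trying to approximate, which makes the task easy to check by eye as training progresses.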

Educational Value and Retro-Computing Context

The AI transformer model project sits at the intersection of machine learning education and retro-computing nostalgia. For students learning transformers, seeing the algorithm implemented in assembly forces a deeper engagement with matrix math and attention operations than clicking through a Jupyter notebook. For retro-computing enthusiasts, it proves that old hardware is not just a museum piece—it can still execute sophisticated algorithms, albeit slowly.

This differs from modern AI research, which chases speed, scale, and accuracy on specialized hardware. ATTN-11 instead prioritizes clarity and accessibility. A hobbyist with a junkyard PDP-11 and assembly language knowledge can build and run this project end to end, which is not realistic for a large language model trained on consumer hardware. The project makes transformer mechanics understandable in a way that large-scale systems cannot.

Is the PDP-11 transformer project practical for real AI work?

No. ATTN-11 trains on toy tasks and delivers no practical advantage over modern systems. The project is educational, demonstrating transformer concepts at the lowest level, not a viable approach to building production AI systems.

What makes the AI transformer model run on such old hardware?

The model is minimal—a single-layer, single-head transformer—and uses fixed-point arithmetic instead of floating-point, eliminating the need for specialized math hardware. Assembly language provides direct hardware control, avoiding the overhead of higher-level languages and frameworks.
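For readers who want to see what "single-layer, single-head" means in practice, here is a minimal scaled dot-product self-attention in plain Python. It uses floating point for readability, whereas ATTN-11 uses fixed-point arithmetic in assembly. The toy input and identity projection matrices are assumptions for illustration only.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = 1.0 / math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]            # transpose of k
    scores = [[s * scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]      # one distribution per position
    return matmul(weights, v), weights


# Toy input: 3 positions, model width 2; identity projections for readability.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(x, identity, identity, identity)
for row in attn:
    print([round(w, 3) for w in row])
```

Everything here reduces to multiply-accumulate loops plus a softmax, which is why the same structure can be expressed with integer instructions on a machine with no floating-point unit.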

Could this AI transformer model scale to larger tasks?

Theoretically yes, but practically no. The 64KB RAM constraint and 6 MHz processing speed would make training on realistic datasets infeasible on human timescales. The project remains a proof of concept for understanding transformer fundamentals on constrained hardware.

The AI transformer model running on a 47-year-old PDP-11 is a reminder that computing principles transcend hardware generations. Modern machine learning frameworks hide complexity behind abstraction layers; this project strips that away and shows transformers as they truly are—mathematical operations that run anywhere sufficient memory and processing power exist, even on computers from the 1970s. For educators and enthusiasts, that clarity is worth the trade-off in speed.

Edited by the All Things Geek team.

Source: Tom's Hardware
