MIT’s Sandook nearly doubles SSD performance without new hardware

By Craig Nash
AI-powered tech writer covering artificial intelligence, chips, and computing.

SSD performance optimization has become critical for data centers handling massive AI and machine learning workloads, and MIT researchers have just demonstrated a way to nearly double throughput without buying new hardware. Their software system, Sandook, addresses three major sources of SSD slowdown simultaneously: read-write interference, garbage collection delays, and performance variability across devices in a pooled storage system. The name Sandook means “box” or “storage” in Urdu.

Key Takeaways

  • Sandook boosts SSD throughput 12-94% depending on workload, achieving 95% of theoretical maximum performance
  • Improves overall SSD utilization by 23% through intelligent workload distribution across drives
  • Requires no hardware changes or application-specific updates, making it immediately deployable in existing data centers
  • Uses a two-tier architecture to dynamically balance workloads and adapt to changing demands in real time
  • Tested on a 10-SSD pool running database, machine learning, image compression, and data storage tasks

How SSD performance optimization works at scale

Data centers don’t run a single application on a single drive. They pool storage across dozens or hundreds of SSDs, each handling different workloads simultaneously. This creates a coordination problem: when one drive gets bogged down by garbage collection (the background process that reclaims space), other drives might be idle while applications wait. Sandook solves this by rotating which SSDs handle read and write operations, preventing simultaneous reads and writes on the same drive that cause interference delays. When the system detects that garbage collection is slowing a particular drive, it profiles that drive’s typical performance and then redirects workloads to healthier drives until collection completes.
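The routing idea described above can be sketched in a few lines. This is an illustrative toy, not MIT's actual code: every class, field, and policy here is an assumption. Drives alternate read/write duty, and work is steered away from any drive that is busy with garbage collection.

```python
# Toy sketch of Sandook-style routing, assuming each drive exposes its
# current duty, GC status, and queue depth. All names are illustrative.
from dataclasses import dataclass

@dataclass
class Drive:
    name: str
    role: str = "read"      # current duty: "read" or "write"
    in_gc: bool = False     # paused by garbage collection?
    queued: int = 0         # outstanding operations

class Router:
    def __init__(self, drives):
        self.drives = drives

    def rotate_roles(self):
        # Swap duties so no drive serves reads and writes at once,
        # avoiding the interference delays the article describes.
        for d in self.drives:
            d.role = "write" if d.role == "read" else "read"

    def route(self, op):
        # Prefer healthy drives whose duty matches the operation;
        # fall back to any healthy drive, least-loaded first.
        healthy = [d for d in self.drives if not d.in_gc]
        candidates = [d for d in healthy if d.role == op] or healthy
        target = min(candidates, key=lambda d: d.queued)
        target.queued += 1
        return target
```

A drive flagged `in_gc` simply drops out of the candidate set until collection completes, which mirrors the redirect-to-healthier-drives behavior the researchers describe.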

The two-tier architecture enables real-time adaptation. As workloads shift—a database query finishing, a machine learning training job starting—Sandook rebalances tasks across the pool dynamically rather than using static allocation rules. This approach outperforms traditional methods that address only one source of variability at a time, delivering 12-94% throughput improvements per application depending on the workload. Across a diverse pool of tasks, the system achieved 95% of theoretical maximum SSD performance, unlocking hidden capacity that conventional systems leave on the table.
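The gap between static allocation and dynamic rebalancing is easy to demonstrate with a toy scheduler. The drive speeds and task sizes below are invented for illustration; Sandook's real policy is not public.

```python
# Contrast a fixed round-robin assignment with greedy dynamic
# rebalancing when one drive in the pool has slowed (e.g. due to GC).

def static_round_robin(tasks, speeds):
    # Fixed rule: task i always goes to drive i % n, regardless of load.
    n = len(speeds)
    loads = [0.0] * n
    for i, t in enumerate(tasks):
        loads[i % n] += t / speeds[i % n]
    return max(loads)            # makespan: slowest drive finishes last

def dynamic_least_loaded(tasks, speeds):
    # Rebalance: each task goes to whichever drive would finish it soonest.
    loads = [0.0] * len(speeds)
    for t in tasks:
        i = min(range(len(speeds)), key=lambda j: loads[j] + t / speeds[j])
        loads[i] += t / speeds[i]
    return max(loads)

tasks = [10] * 8
speeds = [1.0, 1.0, 0.25]        # third drive running at quarter speed
assert dynamic_least_loaded(tasks, speeds) < static_round_robin(tasks, speeds)
```

With these made-up numbers, the static rule keeps feeding the slow drive while the dynamic rule routes around it, which is the intuition behind adapting as workloads shift.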

Why SSD performance optimization matters for data center economics

At data center scale, every percentage point of efficiency compounds. A 23% improvement in SSD utilization means fewer drives are needed to handle the same workload, directly reducing hardware costs, power consumption, and physical space. As one MIT researcher noted, “Our dynamic solution can unlock more performance for all the SSDs and really push them to the limit. Every bit of capacity you can save really counts at this scale.” For operators managing thousands of drives, those savings translate to millions of dollars annually.
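The economics are simple to work through. Only the 23% utilization figure comes from the article; the fleet size and per-drive cost below are hypothetical assumptions for the sake of a back-of-envelope estimate.

```python
# Back-of-envelope estimate of drive savings from a 23% utilization
# gain. Fleet size and per-drive cost are assumed, not from the study.
fleet_size = 10_000            # drives in an assumed fleet
gain = 0.23                    # utilization improvement from the article
cost_per_drive = 400           # assumed USD, hardware only

drives_needed = fleet_size / (1 + gain)     # same work at higher utilization
drives_saved = fleet_size - drives_needed   # roughly 1,870 drives avoided
capex_saved = drives_saved * cost_per_drive
```

Under these assumptions a 10,000-drive fleet avoids buying roughly 1,870 drives, before counting power and floor space.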

The timing matters too. AI and machine learning workloads are accelerating, and data centers face mounting pressure to improve efficiency without proportional increases in hardware spending. Sandook delivers performance gains through software alone—no specialized hardware, no application rewrites, no integration overhead. That “for free” framing in headlines reflects the reality: existing infrastructure immediately becomes more capable.

Comparing Sandook to alternative approaches

Other research has tackled data center storage efficiency differently. LightStore, another MIT project, modifies SSDs for direct network connection, achieving 620 Mbps throughput on 10 Gigabit Ethernet while consuming 10W per node versus the typical 20W, and requiring half the physical space. But LightStore requires hardware changes—it is not a drop-in software solution. Sandook’s advantage is deployment simplicity: it works with existing drives and systems. The trade-off is scope: Sandook optimizes workload distribution across a pooled SSD environment, while LightStore redesigns the drive itself. For data centers with mature infrastructure, Sandook offers immediate gains. For new deployments, hardware-level improvements like LightStore might offer complementary benefits.

What the testing actually showed

MIT researchers tested Sandook on a pool of 10 SSDs running four distinct tasks: running a database, training a machine learning model, compressing images, and storing user data. The throughput improvements ranged from 12% to 94% depending on which application was running. This variation matters—the “nearly doubled” framing applies to best-case scenarios, not average performance. Across all tasks, the system improved overall SSD capacity utilization by 23%.

The constraint is important: Sandook is designed for data centers with pooled storage environments, not consumer laptops or single-drive systems. It assumes multiple SSDs working in parallel and the ability to dynamically route workloads between them. A home user with one SSD won’t see these gains. But for cloud providers, hyperscalers, and enterprise data centers running diverse workloads on shared storage infrastructure, the performance uplift and utilization improvement represent substantial competitive advantage.

Is Sandook being commercialized or released?

The research brief does not specify whether Sandook is available for commercial deployment, under active development, or still in the research phase. MIT has published the findings, but specifics about licensing, availability timelines, or partnerships with storage vendors are not detailed in available sources.

How does Sandook compare to upgrading SSDs?

Buying faster SSDs is the traditional route to data center performance improvement, but it requires capital expenditure, hardware replacement, and downtime. Sandook achieves comparable or better throughput gains through software optimization of existing hardware. The choice depends on current utilization: if SSDs are already near capacity, a hardware upgrade may be necessary; if utilization is poor due to workload imbalance, Sandook unlocks existing capacity first.

What makes SSD performance optimization hard?

SSDs are not simple storage boxes—they contain complex firmware that manages wear leveling, garbage collection, and error correction. Performance variability stems from these background processes, device-to-device manufacturing differences, and interference between simultaneous read-write operations. Traditional approaches tackle one problem at a time. Sandook’s insight is that addressing all three simultaneously, dynamically, across a pool of drives yields compounding gains that exceed the sum of individual improvements.
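Why combined fixes can beat the sum of their parts is a matter of arithmetic: independent gains multiply rather than add. The per-source percentages below are invented for illustration; the article gives no breakdown.

```python
# Toy illustration of compounding: three hypothetical fixes with
# individual gains multiply, exceeding their simple sum.
gains = [0.10, 0.08, 0.06]     # assumed gain from each fix alone

additive = sum(gains)          # naive sum of the three gains: 0.24
compounded = 1.0
for g in gains:
    compounded *= 1 + g        # gains apply on top of each other
compounded -= 1                # combined gain, slightly above 0.24

assert compounded > additive
```

The difference is small for modest percentages, but across a large pool it is part of why a system addressing all three sources at once can outperform single-fix approaches.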

For data center operators drowning in hardware costs and power budgets, Sandook represents a rare win: substantial performance gains from software optimization alone. It won’t replace hardware upgrades forever, but it defers them, saves money, and improves sustainability by extracting more value from existing infrastructure. That is why this research matters right now.

This article was written with AI assistance and editorially reviewed.

Source: TechRadar
