
1. Introduction

In January 2025, Jensen Huang took the stage at CES and announced "Project DIGITS" — a Grace Blackwell–powered AI supercomputer that would fit on your desk. By March at GTC, it was renamed the DGX Spark and positioned as the entry point to NVIDIA's DGX ecosystem. After months of anticipation, the Founder's Edition shipped in October 2025 at $3,999.

The pitch is compelling: 128 GB of unified memory, a Blackwell GPU with full CUDA support, 1 petaFLOP of AI compute, all in a champagne-gold box smaller than a Mac Mini. The reality is more nuanced: the memory bandwidth is a fraction of what discrete GPUs offer, the 1 PFLOP claim relies on sparse FP4, and the ARM CPU means not everything "just works" yet.

This guide covers everything: the real specs behind the marketing, actual benchmark numbers from LMSYS and others, power consumption, the software ecosystem, who should buy one, and how it stacks up against the Mac Studio, DIY GPU rigs, and cloud alternatives.

2. What Is the DGX Spark?

The DGX Spark is NVIDIA's first desktop-class AI workstation built on the Grace Blackwell architecture. At its core is the GB10 Superchip, a system-on-chip co-developed with MediaTek that combines a 20-core ARM CPU with a Blackwell GPU on TSMC's 3nm process.

Unlike traditional GPU workstations where the CPU and GPU have separate memory pools, the DGX Spark uses a unified memory architecture — both processors share 128 GB of LPDDR5x RAM. This means models load directly into a single address space without the overhead of system-to-VRAM transfers, and you can run models far larger than any consumer GPU's VRAM allows.

The machine measures just 150 × 150 × 50.5 mm (roughly 6 inches square) and weighs about 1.2 kg. It's powered by a 240W external USB-C power brick. The design pays homage to the original DGX-1 that Jensen hand-delivered to OpenAI in 2016 — NVIDIA even recreated that moment by delivering a Spark to Elon Musk at launch.

Key Connectivity

The dual QSFP ports are the secret weapon: two DGX Sparks connected via a copper DAC cable can operate as a mini-cluster with 256 GB unified memory, running models up to 405B parameters in FP4. NVIDIA officially supports two-node clusters, though technically nothing stops you from going further.

3. Full Specifications

| Component | Specification |
|---|---|
| SoC | NVIDIA GB10 Grace Blackwell Superchip (TSMC 3nm) |
| CPU | 20-core ARM: 10× Cortex-X925 (4 GHz) + 10× Cortex-A725 (2.8 GHz) |
| GPU Architecture | Blackwell (same as RTX 50-series) |
| CUDA Cores | 6,144 |
| Tensor Cores | 192 (5th-gen) |
| RT Cores | 48 (4th-gen) |
| AI Compute | 1 PFLOP sparse FP4 / ~500 TFLOPS dense FP4 |
| Memory | 128 GB unified LPDDR5x @ 8533 MT/s |
| Memory Bus | 256-bit |
| Memory Bandwidth | 273 GB/s |
| Storage | 4 TB NVMe SSD (Founder's Edition) |
| Networking | 10 GbE + 2× ConnectX-7 200 Gbps QSFP56 |
| Wireless | WiFi 7, Bluetooth 5.4 |
| USB | 4× USB-C 3.2 (20 Gbps), 1 for PD |
| Display | HDMI 2.1a + USB-C DisplayPort Alt Mode |
| NVENC / NVDEC | 1× / 1× |
| Dimensions | 150 × 150 × 50.5 mm |
| Weight | ~1.2 kg |
| Peak Power | 240W (USB-C PD) |
| OS | DGX OS (Ubuntu 24.04 + NVIDIA drivers) |
⚠️ The "1 PetaFLOP" Asterisk
NVIDIA's headline claim of 1 PFLOP uses sparse FP4 — a technique called structured sparsity that assumes ~50% of tensor values are zero. In dense workloads (which most real inference is), you get roughly 500 TFLOPS. This puts the GPU's raw capability between an RTX 5070 and 5070 Ti. The real bottleneck, however, is memory bandwidth at 273 GB/s — far below the 1,792 GB/s of an RTX 5090's GDDR7.
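Why bandwidth rather than FLOPS is the bottleneck can be sanity-checked with roofline arithmetic: each decoded token has to stream the full set of model weights through the memory bus, so decode speed is capped at bandwidth divided by model size. A back-of-envelope sketch (ignoring KV-cache reads and activations):

```python
def decode_tps_upper_bound(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Roofline estimate: each decoded token streams every weight through
    the memory bus once, so tokens/sec <= bandwidth / weight size."""
    return bandwidth_gb_s / model_size_gb

# DGX Spark (273 GB/s) vs an 8B model in FP8 (~8 GB of weights)
print(round(decode_tps_upper_bound(273, 8), 1))   # 34.1 tps ceiling

# ...vs a 70B model in FP8 (~70 GB of weights)
print(round(decode_tps_upper_bound(273, 70), 1))  # 3.9 tps ceiling
```

These ceilings are in the right ballpark for the measured numbers in the benchmark section: real decode speeds land below them because the estimate ignores everything except weight reads.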

4. Real-World Benchmarks

The most comprehensive benchmarks come from LMSYS (the SGLang team), who tested extensively with both SGLang and Ollama across multiple model sizes. The key takeaway: the DGX Spark shines on small-to-medium models, but memory bandwidth limits large model performance significantly.

SGLang Benchmarks (FP8)

| Model | Batch | Prefill (tps) | Decode (tps) | Notes |
|---|---|---|---|---|
| Llama 3.1 8B | 1 | 7,991 | 20.5 | Excellent single-user speed |
| Llama 3.1 8B | 32 | 7,949 | 368 | Linear batch scaling — impressive |
| DeepSeek-R1 14B | 8 | 2,074 | 83.5 | Sustained without thermal throttle |
| Gemma 3 27B | 1 | ~3,500 | ~10 | Usable for prototyping |
| Qwen 3 32B | 1 | ~3,200 | ~9 | Similar to Gemma 27B |
| Llama 3.1 70B | 1 | 803 | 2.7 | Loads, but very slow decode |

Ollama Benchmarks

| Model | Quant | Prefill (tps) | Decode (tps) |
|---|---|---|---|
| GPT-OSS 20B | MXFP4 | 2,053 | 49.7 |
| GPT-OSS 120B | MXFP4 | ~350 | ~5 |
| Llama 3.1 8B | q8_0 | ~4,500 | ~30 |
| Llama 3.1 70B | q4_K_M | ~700 | ~3 |
| DeepSeek-R1 14B | q8_0 | ~2,800 | ~22 |

Comparison Context

For context, running GPT-OSS 20B (MXFP4) on an RTX 5090 yields ~8,519 tps prefill / 205 tps decode — roughly 4× faster than the Spark. The RTX PRO 6000 Blackwell hits 10,108 / 215 tps. But neither of those GPUs can load a 70B or 120B model into their 32 GB / 96 GB of VRAM respectively. The Spark's 128 GB of unified memory is its competitive moat.

✅ The DGX Spark's sweet spot
Models under 30B parameters at FP8/Q8 — you get excellent throughput with batch scaling. Think Llama 3.1 8B, DeepSeek-R1 14B, Gemma 12B. For 70B+ models, the Spark lets you run them locally for prototyping, but decode speed (~3 tps) is too slow for interactive use.
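If you want to reproduce numbers like the Ollama table on your own hardware, Ollama's `/api/generate` response already contains the raw counters: `prompt_eval_count`/`prompt_eval_duration` for prefill and `eval_count`/`eval_duration` for decode, with durations in nanoseconds. A small sketch of the conversion, using made-up response values in the Spark's ballpark:

```python
def ollama_tps(response: dict) -> tuple[float, float]:
    """Compute prefill and decode tokens/sec from the timing fields
    in an Ollama /api/generate response (durations are nanoseconds)."""
    prefill = response["prompt_eval_count"] / (response["prompt_eval_duration"] / 1e9)
    decode = response["eval_count"] / (response["eval_duration"] / 1e9)
    return prefill, decode

# Illustrative values, not a real measurement:
stats = {
    "prompt_eval_count": 512, "prompt_eval_duration": 250_000_000,  # 0.25 s
    "eval_count": 256, "eval_duration": 5_150_000_000,              # 5.15 s
}
prefill, decode = ollama_tps(stats)
print(f"{prefill:.0f} tps prefill, {decode:.1f} tps decode")  # 2048 tps prefill, 49.7 tps decode
```

Run a fixed prompt a few times and average these figures; the first run also includes model load time (`load_duration`), so discard it.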

5. Energy Consumption & Power Costs

One of the DGX Spark's strongest advantages is its power efficiency. The system uses a 240W USB-C power adapter, but real-world measurements paint a compelling picture:

| State | Power Draw | Notes |
|---|---|---|
| Idle | 40-45W | Higher than typical ARM devices due to Blackwell GPU + ConnectX-7 |
| CPU Only Load | 120-130W | All 20 ARM cores active |
| Heavy AI Load | ~170W | Typical during inference (Signal65 measured) |
| Peak Measured | ~200W | ServeTheHome couldn't reach the 240W rated max |
| GB10 SoC TDP | 140W | CPU + GPU combined |

Annual Electricity Cost Comparison

Assuming 24/7 operation at average load, using $0.15/kWh (US average):

| System | Avg. Power | Annual Cost |
|---|---|---|
| DGX Spark (idle/light) | ~50W | ~$66/yr |
| DGX Spark (inference) | ~170W | ~$223/yr |
| Mac Studio M4 Ultra | ~60-120W | ~$79-$158/yr |
| Single RTX 5090 desktop | ~400-575W | ~$526-$756/yr |
| 4× RTX 3090 rig | ~1,200-1,600W | ~$1,577-$2,102/yr |
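The annual figures above are straight watts-to-dollars arithmetic; a minimal sketch using the same 24/7 duty cycle and $0.15/kWh assumption:

```python
def annual_cost_usd(avg_watts: float, rate_per_kwh: float = 0.15) -> float:
    """Electricity cost of running a device 24/7 for one year."""
    kwh_per_year = avg_watts / 1000 * 24 * 365
    return kwh_per_year * rate_per_kwh

print(round(annual_cost_usd(170)))   # 223  -> Spark under inference load
print(round(annual_cost_usd(1600)))  # 2102 -> 4x RTX 3090 rig, high end
```

Swap in your local electricity rate; at European prices of $0.30+/kWh, the gap between the Spark and a multi-GPU rig roughly doubles.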
🔑 Key Insight
The DGX Spark saves $1,300-1,900/year in electricity vs a 4× GPU rig while offering comparable model capacity (128 GB unified vs ~96 GB VRAM). The trade-off is ~4× slower inference. For prototyping and development where you need to run large models but not at production speed, the economics are excellent.
⚠️ John Carmack's 100W Observation
In November 2025, John Carmack reported that his DGX Spark appeared to max out at 100W, delivering only about half the expected performance. NVIDIA investigated and this appears related to early firmware/driver issues with power management. Check for firmware updates if you experience similar behavior.

6. Software Stack

The DGX Spark runs DGX OS, which is Ubuntu 24.04 with NVIDIA's drivers, CUDA toolkit, and AI software pre-installed. Setup is straightforward — plug in a keyboard, mouse, and monitor for desktop mode, or power on without peripherals for headless mode (accessible via WiFi or Ethernet).

What Comes Pre-installed

Container Support

Full Docker and Podman support with GPU passthrough. NVIDIA's NGC (NVIDIA GPU Cloud) container registry provides optimized containers for PyTorch, TensorFlow, Triton Inference Server, and more. This is identical to the experience on DGX datacenter systems, which means code and containers developed on the Spark can deploy directly to larger NVIDIA infrastructure.
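As a sketch of that workflow (the image tag is illustrative; check the NGC catalog at ngc.nvidia.com for current releases):

```shell
# Pull NVIDIA's optimized PyTorch container from NGC and run it with
# GPU access; the same image runs unchanged on datacenter DGX systems.
docker pull nvcr.io/nvidia/pytorch:25.01-py3
docker run --gpus all -it --rm \
    -v "$PWD":/workspace \
    nvcr.io/nvidia/pytorch:25.01-py3 \
    python -c "import torch; print(torch.cuda.get_device_name(0))"
```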

What Works Well

7. ARM Architecture — What Works, What Doesn't

The DGX Spark uses ARM Cortex cores rather than x86 (Intel/AMD). This is the same approach Apple took with M-series chips, and it has real implications:

Advantages

Challenges

🔑 The CUDA Advantage
Despite the ARM challenges, the DGX Spark's killer feature is full CUDA support. Apple's Metal and AMD's ROCm have matured significantly, but nearly two decades of CUDA ecosystem development mean most AI code works out of the box; the ARM CPU is largely invisible to GPU workloads.

8. Pricing & Availability

Pricing

OEM Partners

Where to Buy

The Founder's Edition was sold directly through NVIDIA's website to reservation holders. OEM versions are available through:

Bundles

NVIDIA offers a copper DAC cable bundle for connecting two Sparks. Individual QSFP56 cables are available separately. Some OEM partners offer bundles with monitors or peripherals.

9. Who Is the DGX Spark For?

✅ Great For

  • AI developers who need CUDA and large memory on their desk
  • Researchers prototyping with 70B+ parameter models
  • Small businesses wanting local AI without cloud costs
  • Data scientists building RAG pipelines with large context windows
  • Executives evaluating AI capabilities hands-on
  • Teams who need a shared inference server (batch mode is excellent)
  • NVIDIA ecosystem developers who want code portability to datacenter DGX

❌ Not Ideal For

  • Gamers — No gaming GPU drivers, limited display support
  • Production inference at scale — 3 tps on 70B is too slow
  • Training large models — Memory bandwidth is the bottleneck
  • Budget-conscious hobbyists — A Mac Mini M4 Pro at $1,999 runs smaller models well
  • People who need x86 compatibility — ARM ecosystem is still maturing

The Register's review summarized it perfectly: the DGX Spark is "the AI equivalent of a pickup truck" — it's not the fastest at any one thing, but it can haul loads that nothing else in its class can handle.

10. Competitors

| Feature | DGX Spark | Mac Studio M4 Ultra | DIY 4× RTX 3090 | Cloud (A100) |
|---|---|---|---|---|
| Price | $3,999 | $3,999-$5,999 | ~$3,500-$4,300 | ~$2-4/hr |
| Memory | 128 GB unified | 192 GB unified | 96 GB VRAM | 80 GB HBM2e |
| Mem Bandwidth | 273 GB/s | 800 GB/s | ~3,700 GB/s total | 2,039 GB/s |
| CUDA | ✅ Native | ❌ Metal only | ✅ Native | ✅ Native |
| Max Model Size | ~200B (FP4) | ~300B (FP4) | ~65B (FP16) | ~65B (FP16) |
| Power | ~170W | ~120W | ~1,400W | N/A |
| Form Factor | Mini PC | Mini desktop | Full tower/rack | Cloud |
| Noise | Low (stable) | Very low | Loud | N/A |
| Multi-node | 2-node 200 Gbps | ❌ | Possible (complex) | Multi-GPU easy |

DGX Spark vs Mac Studio M4 Ultra

Apple's M4 Ultra offers more memory (up to 192 GB), 3× higher bandwidth (800 GB/s), and whisper-quiet operation. It's the better choice if you don't need CUDA. But if your workflow depends on CUDA, TensorRT, or NVIDIA-specific tools — and most AI development still does — the Spark is the only desktop option with full Blackwell compatibility. The Mac also doesn't cluster; the Spark can scale to two nodes.

DGX Spark vs DIY Multi-GPU Rig

A 4× RTX 3090 rig gives you ~96 GB of VRAM with ~3,700 GB/s aggregate bandwidth — dramatically faster for inference. But it costs similar money, draws 8-10× more power, generates significant heat/noise, and requires complex multi-GPU software configuration. The Spark is plug-and-play; the rig is a project.

DGX Spark vs Cloud

An A100 on AWS/GCP costs $2-4/hour. At 8 hours/day of usage, that's $500-1,000/month. The DGX Spark pays for itself in 4-8 months of moderate use. Plus, your data stays local — no egress costs, no latency, no compliance concerns.
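The payback arithmetic can be made explicit; a sketch using the figures above (hardware price, hourly cloud rate, 8 hours/day, ~30 days/month):

```python
def payback_months(device_cost: float, cloud_rate_per_hr: float,
                   hours_per_day: float = 8, days_per_month: float = 30) -> float:
    """Months of cloud spend needed to equal the device's purchase price."""
    monthly_cloud_cost = cloud_rate_per_hr * hours_per_day * days_per_month
    return device_cost / monthly_cloud_cost

print(round(payback_months(3999, 2.0), 1))  # 8.3 months at $2/hr
print(round(payback_months(3999, 4.0), 1))  # 4.2 months at $4/hr
```

This ignores the Spark's electricity cost (~$19/month under load, per the earlier table), which pushes payback out only slightly.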

11. Community Reception

Hacker News

Simon Willison's post "Nvidia DGX Spark: great hardware, early days for the ecosystem" hit 189 points and 111 comments on HN. The consensus: the hardware is impressive, but the ARM + NVIDIA software ecosystem needs time to mature. Many commenters noted that Ollama and SGLang work well, but edge cases and less-common tools may need workarounds.

Reddit (r/LocalLLaMA)

The LocalLLaMA community has been actively benchmarking DGX Sparks. Common themes:

Industry Reviews

12. The Verdict — Does It Deliver?

Does the DGX Spark deliver on NVIDIA's promise of a "personal AI supercomputer"? Yes, with caveats.

The 128 GB of unified memory is genuinely groundbreaking for a desktop device. No other CUDA-compatible system lets you load and run 200B parameter models on your desk. The build quality is exceptional, the form factor is tiny, and the power efficiency is remarkable. The DGX Playbooks and software stack provide a real on-ramp to AI development.

But the "supercomputer" label stretches the truth. The 273 GB/s memory bandwidth means large models run at a crawl — usable for prototyping, not for production. The 1 PFLOP claim requires sparse FP4, which few real workloads use. And at $3,999, you're paying a premium for NVIDIA's ecosystem and the Blackwell architecture.

Our Recommendation

✅ Bottom Line
The DGX Spark is the first desktop device where you can `ollama run llama3.1:70b` on a machine the size of a tissue box. That alone makes it significant. It won't replace your GPU rig or cloud instances, but it fills a gap that nothing else in the CUDA ecosystem does.

References

  1. NVIDIA, "DGX Spark Product Page," nvidia.com.
  2. LMSYS Org, "NVIDIA DGX Spark In-Depth Review: A New Standard for Local AI Inference," lmsys.org, October 2025.
  3. Tom's Hardware, "Nvidia DGX Spark Review," tomshardware.com, January 2026.
  4. The Register, "DGX Spark, Nvidia's Tiniest Supercomputer," theregister.com, October 2025.
  5. ServeTheHome, "NVIDIA DGX Spark Review," servethehome.com, October 2025.
  6. Robert McDermott, "NVIDIA's DGX Spark: Mini AI Supercomputer Overview and Review," medium.com, October 2025.
  7. Simon Willison, "Nvidia DGX Spark: Great Hardware, Early Days for the Ecosystem," simonwillison.net, October 2025.
  8. Signal65, "NVIDIA DGX Spark First Look," signal65.com, October 2025.
  9. TWOWIN Technology, "NVIDIA DGX Spark Performance Evaluation and Analysis," twowintech.com, October 2025.
  10. IntuitionLabs, "NVIDIA DGX Spark Review: Pros, Cons & Performance Benchmarks," intuitionlabs.ai, October 2025.
  11. NVIDIA Developer Forums, "DGX Spark Power Clarification," forums.developer.nvidia.com, October 2025.
  12. NVIDIA Developer Forums, "Suggestions for Reducing Idle Power Consumption," forums.developer.nvidia.com, October 2025.
  13. r/LocalLLaMA, "DGX Spark Review with Benchmark," reddit.com, October 2025.
  14. AI Multiple Research, "DGX Spark vs Mac Studio & Halo: Benchmarks & Alternatives," aimultiple.com.


This article was written collaboratively by Michel (human) and Yaneth (AI agent) as part of ThinkSmart.Life's research initiative. Prices reflect February 2026 market conditions and may fluctuate.
