1. Introduction
In January 2025, Jensen Huang took the stage at CES and announced "Project DIGITS" — a Grace Blackwell–powered AI supercomputer that would fit on your desk. By March at GTC, it was renamed the DGX Spark and positioned as the entry point to NVIDIA's DGX ecosystem. After months of anticipation, the Founder's Edition shipped in October 2025 at $3,999.
The pitch is compelling: 128 GB of unified memory, a Blackwell GPU with full CUDA support, 1 petaFLOP of AI compute, all in a champagne-gold box smaller than a Mac Mini. But the reality is more nuanced. The memory bandwidth is a fraction of a discrete GPU's, the 1 PFLOP claim relies on sparse FP4, and the ARM CPU means not everything "just works" yet.
This guide covers everything: the real specs behind the marketing, actual benchmark numbers from LMSYS and others, power consumption, the software ecosystem, who should buy one, and how it stacks up against the Mac Studio, DIY GPU rigs, and cloud alternatives.
2. What Is the DGX Spark?
The DGX Spark is NVIDIA's first desktop-class AI workstation built on the Grace Blackwell architecture. At its core is the GB10 Superchip, a system-on-chip co-developed with MediaTek that combines a 20-core ARM CPU with a Blackwell GPU on TSMC's 3nm process.
Unlike traditional GPU workstations where the CPU and GPU have separate memory pools, the DGX Spark uses a unified memory architecture — both processors share 128 GB of LPDDR5x RAM. This means models load directly into a single address space without the overhead of system-to-VRAM transfers, and you can run models far larger than any consumer GPU's VRAM allows.
The machine measures just 150 × 150 × 50.5 mm (roughly 6 inches square) and weighs about 1.2 kg. It's powered by a 240W external USB-C power brick. The design pays homage to the original DGX-1 that Jensen hand-delivered to OpenAI in 2016 — NVIDIA even recreated that moment by delivering a Spark to Elon Musk at launch.
Key Connectivity
- 4× USB-C (one for power, three for peripherals/displays via DisplayPort Alt Mode)
- 1× HDMI 2.1a for display output
- 1× 10 GbE RJ-45 Ethernet (Realtek)
- 2× QSFP56 200 Gbps ports (ConnectX-7 NIC) — connect two Sparks together for distributed inference
- WiFi 7 and Bluetooth 5.4
The dual QSFP ports are the secret weapon: two DGX Sparks connected via a copper DAC cable can operate as a mini-cluster with 256 GB unified memory, running models up to 405B parameters in FP4. NVIDIA officially supports two-node clusters, though technically nothing stops you from going further.
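The capacity claims above are easy to sanity-check: dense weights take roughly half a byte per parameter at FP4. A minimal sketch, using an assumed ~20% overhead factor for KV cache and runtime buffers (an illustrative figure, not a measured one):

```python
# Rough memory footprint for dense LLM weights at common quantization levels.
# The 1.2x overhead factor (KV cache, activations, runtime buffers) is a
# loose assumption for illustration, not a measured value.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}

def weight_footprint_gb(params_billions: float, quant: str, overhead: float = 1.2) -> float:
    """Approximate memory needed to hold the weights plus runtime overhead."""
    return params_billions * BYTES_PER_PARAM[quant] * overhead

for params, budget, label in [(200, 128, "single Spark"), (405, 256, "two-Spark cluster")]:
    need = weight_footprint_gb(params, "fp4")
    print(f"{params}B at FP4 needs ~{need:.0f} GB -> fits in {budget} GB ({label}): {need <= budget}")
```

A 200B model at FP4 lands around 120 GB, just inside a single Spark's 128 GB, and 405B lands around 243 GB, inside the 256 GB of a two-node cluster — consistent with NVIDIA's stated limits.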
3. Full Specifications
| Component | Specification |
|---|---|
| SoC | NVIDIA GB10 Grace Blackwell Superchip (TSMC 3nm) |
| CPU | 20-core ARM: 10× Cortex-X925 (4 GHz) + 10× Cortex-A725 (2.8 GHz) |
| GPU Architecture | Blackwell (same as RTX 50-series) |
| CUDA Cores | 6,144 |
| Tensor Cores | 192 (5th-gen) |
| RT Cores | 48 (4th-gen) |
| AI Compute | 1 PFLOP sparse FP4 / ~500 TFLOPS dense FP4 |
| Memory | 128 GB unified LPDDR5x @ 8533 MT/s |
| Memory Bus | 256-bit |
| Memory Bandwidth | 273 GB/s |
| Storage | 4 TB NVMe SSD (Founder's Edition) |
| Networking | 10 GbE + 2× ConnectX-7 200 Gbps QSFP56 |
| Wireless | WiFi 7, Bluetooth 5.4 |
| USB | 4× USB-C 3.2 (20 Gbps), 1 for PD |
| Display | HDMI 2.1a + USB-C DisplayPort Alt Mode |
| NVENC / NVDEC | 1× / 1× |
| Dimensions | 150 × 150 × 50.5 mm |
| Weight | ~1.2 kg |
| Peak Power | 240W (USB-C PD) |
| OS | DGX OS (Ubuntu 24.04 + NVIDIA drivers) |
4. Real-World Benchmarks
The most comprehensive benchmarks come from LMSYS (the SGLang team), who tested extensively with both SGLang and Ollama across multiple model sizes. The key takeaway: the DGX Spark shines on small-to-medium models, but memory bandwidth limits large model performance significantly.
SGLang Benchmarks (FP8)
| Model | Batch | Prefill (tps) | Decode (tps) | Notes |
|---|---|---|---|---|
| Llama 3.1 8B | 1 | 7,991 | 20.5 | Excellent single-user speed |
| Llama 3.1 8B | 32 | 7,949 | 368 | Strong batch scaling |
| DeepSeek-R1 14B | 8 | 2,074 | 83.5 | Sustained without thermal throttle |
| Gemma 3 27B | 1 | ~3,500 | ~10 | Usable for prototyping |
| Qwen 3 32B | 1 | ~3,200 | ~9 | Similar to Gemma 27B |
| Llama 3.1 70B | 1 | 803 | 2.7 | Loads, but very slow decode |
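The pattern in the decode column follows directly from memory bandwidth. For single-batch decode, every generated token requires streaming roughly all of the model's weights through the memory bus, so decode speed is capped near bandwidth divided by weight size. A back-of-envelope sketch (illustrative roofline only — real engines add KV-cache reads and kernel overhead, which is why measured numbers land below the ceiling):

```python
# Bandwidth-bound ceiling for single-batch decode: each token read streams
# (approximately) the full weight set, so tps <= bandwidth / weight_bytes.
def decode_ceiling_tps(bandwidth_gbs: float, params_billions: float, bytes_per_param: float) -> float:
    weight_gb = params_billions * bytes_per_param
    return bandwidth_gbs / weight_gb

SPARK_BW = 273  # GB/s, from the spec table

print(f"8B  FP8 ceiling: {decode_ceiling_tps(SPARK_BW, 8, 1.0):.1f} tps (measured: 20.5)")
print(f"70B FP8 ceiling: {decode_ceiling_tps(SPARK_BW, 70, 1.0):.1f} tps (measured: 2.7)")
```

The 70B ceiling works out to about 3.9 tps, so the measured 2.7 tps is close to the theoretical best the 273 GB/s bus allows. This is why batching helps so much: the same weight read is amortized across many concurrent requests.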
Ollama Benchmarks
| Model | Quant | Prefill (tps) | Decode (tps) |
|---|---|---|---|
| GPT-OSS 20B | MXFP4 | 2,053 | 49.7 |
| GPT-OSS 120B | MXFP4 | ~350 | ~5 |
| Llama 3.1 8B | q8_0 | ~4,500 | ~30 |
| Llama 3.1 70B | q4_K_M | ~700 | ~3 |
| DeepSeek-R1 14B | q8_0 | ~2,800 | ~22 |
Comparison Context
For context, running GPT-OSS 20B (MXFP4) on an RTX 5090 yields ~8,519 tps prefill / 205 tps decode — roughly 4× faster than the Spark. The RTX PRO 6000 Blackwell hits 10,108 / 215 tps. But capacity cuts the other way: a 4-bit 70B model overflows the 5090's 32 GB of VRAM, and the 96 GB RTX PRO 6000 costs several times as much as the Spark. The Spark's 128 GB of unified memory is its competitive moat.
5. Energy Consumption & Power Costs
One of the DGX Spark's strongest advantages is its power efficiency. The system ships with a 240W USB-C power adapter, but real-world draw is considerably lower:
| State | Power Draw | Notes |
|---|---|---|
| Idle | 40-45W | Higher than typical ARM devices due to Blackwell GPU + ConnectX-7 |
| CPU Only Load | 120-130W | All 20 ARM cores active |
| Heavy AI Load | ~170W | Typical during inference (Signal65 measured) |
| Peak Measured | ~200W | ServeTheHome couldn't reach the 240W rated max |
| GB10 SoC TDP | 140W | CPU + GPU combined |
Annual Electricity Cost Comparison
Assuming 24/7 operation at average load, using $0.15/kWh (US average):
| System | Avg. Power | Annual Cost |
|---|---|---|
| DGX Spark (idle/light) | ~50W | ~$66/yr |
| DGX Spark (inference) | ~170W | ~$223/yr |
| Mac Studio M4 Ultra | ~60-120W | ~$79-$158/yr |
| Single RTX 5090 desktop | ~400-575W | ~$526-$756/yr |
| 4× RTX 3090 rig | ~1,200-1,600W | ~$1,577-$2,102/yr |
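The table's annual figures come from straightforward arithmetic: watts to kilowatt-hours per year, times the electricity rate. A minimal sketch reproducing them under the same assumptions (24/7 operation, $0.15/kWh):

```python
# Annual electricity cost: average watts -> kWh/year -> dollars.
def annual_cost_usd(avg_watts: float, rate_per_kwh: float = 0.15, hours: float = 24 * 365) -> float:
    return avg_watts / 1000 * hours * rate_per_kwh

for label, watts in [("Spark idle/light", 50), ("Spark inference", 170), ("4x RTX 3090 rig", 1600)]:
    print(f"{label:>16}: ~${annual_cost_usd(watts):,.0f}/yr")
```

At 170W sustained, the Spark costs about $223/yr; the same calculation at the high end of a 4× 3090 rig's draw yields roughly $2,102/yr — matching the table.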
6. Software Stack
The DGX Spark runs DGX OS, which is Ubuntu 24.04 with NVIDIA's drivers, CUDA toolkit, and AI software pre-installed. Setup is straightforward — plug in a keyboard, mouse, and monitor for desktop mode, or power on without peripherals for headless mode (accessible via WiFi or Ethernet).
What Comes Pre-installed
- CUDA 12.x — Full CUDA runtime with Blackwell support
- cuDNN — Accelerated deep learning primitives
- TensorRT — Optimized inference engine
- NVIDIA Container Toolkit — Docker/Podman GPU passthrough
- NVIDIA AI Enterprise software suite
- DGX Playbooks — Pre-built workflows for LLM serving, fine-tuning, image generation, multi-agent systems, model quantization, and more
Container Support
Full Docker and Podman support with GPU passthrough. NVIDIA's NGC (NVIDIA GPU Cloud) container registry provides optimized containers for PyTorch, TensorFlow, Triton Inference Server, and more. This is identical to the experience on DGX datacenter systems, which means code and containers developed on the Spark can deploy directly to larger NVIDIA infrastructure.
What Works Well
- SGLang, vLLM, Ollama — major inference frameworks
- PyTorch, TensorFlow — native CUDA support
- Hugging Face Transformers — full ecosystem
- NVIDIA NIM (NVIDIA Inference Microservices)
- Fine-tuning with LoRA/QLoRA for models up to ~70B
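The reason LoRA makes ~70B fine-tuning feasible on 128 GB is that the base weights stay frozen; only small low-rank adapter matrices are trained, so optimizer state shrinks by orders of magnitude. A quick sketch of the parameter count — the layer count, hidden size, rank, and number of targeted projections below are illustrative assumptions, not the Spark's supported configuration:

```python
# LoRA adds two low-rank factors (hidden x r and r x hidden) per targeted
# projection matrix, instead of training the full weight.
def lora_trainable_params(layers: int, hidden: int, rank: int, targets: int = 4) -> int:
    return layers * targets * 2 * hidden * rank

base_params = 70e9  # ~70B frozen base weights
lora = lora_trainable_params(layers=80, hidden=8192, rank=16)
print(f"LoRA trainable params: {lora / 1e6:.0f}M ({lora / base_params:.3%} of the base model)")
```

With these assumed dimensions, the trainable set is roughly 84M parameters — about 0.1% of the base model — which is why gradients and optimizer state fit comfortably alongside the quantized base weights.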
7. ARM Architecture — What Works, What Doesn't
The DGX Spark uses ARM Cortex cores rather than x86 (Intel/AMD). This is the same approach Apple took with M-series chips, and it has real implications:
Advantages
- Power efficiency — ARM's big.LITTLE design (10 performance + 10 efficiency cores) means the CPU draws minimal power during light workloads
- Unified memory — No PCIe bus bottleneck between CPU and GPU memory
- Single-core performance — Cortex-X925 cores are competitive with Apple M4 cores
Challenges
- Software compatibility — Not all Linux packages have ARM builds. You may encounter issues with older or niche tools
- Docker images — Many Docker images are x86-only. You'll need ARM-compatible images or multi-arch builds
- Compilation — Some projects assume x86 and may need patches or cross-compilation
- Early ecosystem — Simon Willison's review called it "great hardware, early days for the ecosystem"
8. Pricing & Availability
Pricing
- NVIDIA Founder's Edition: $3,999 — 4 TB NVMe, gold metal chassis (limited run)
- OEM partner versions: Starting at ~$2,999 — may have less storage, standard chassis
OEM Partners
- Acer — Veriton GN100 AI Mini Workstation
- ASUS — Ascent GX10
- Dell — Pro Max with GB10
- MSI — EdgeXpert MS-C931
- Lenovo, HP — Various configurations
Where to Buy
The Founder's Edition was sold directly through NVIDIA's website to reservation holders. OEM versions are available through:
- NVIDIA Marketplace (direct)
- Amazon (select OEM models)
- Micro Center (in-store availability varies)
- Best Buy (limited stock)
- OEM direct stores (Dell.com, Acer.com, ASUS.com, etc.)
Bundles
NVIDIA offers a copper DAC cable bundle for connecting two Sparks. Individual QSFP56 cables are available separately. Some OEM partners offer bundles with monitors or peripherals.
9. Who Is the DGX Spark For?
✅ Great For
- AI developers who need CUDA and large memory on their desk
- Researchers prototyping with 70B+ parameter models
- Small businesses wanting local AI without cloud costs
- Data scientists building RAG pipelines with large context windows
- Executives evaluating AI capabilities hands-on
- Teams who need a shared inference server (batch mode is excellent)
- NVIDIA ecosystem developers who want code portability to datacenter DGX
❌ Not Ideal For
- Gamers — No gaming GPU drivers, limited display support
- Production inference at scale — 3 tps on 70B is too slow
- Training large models — Memory bandwidth is the bottleneck
- Budget-conscious hobbyists — A Mac Mini M4 Pro at $1,999 runs smaller models well
- People who need x86 compatibility — ARM ecosystem is still maturing
The Register's review summarized it perfectly: the DGX Spark is "the AI equivalent of a pickup truck" — it's not the fastest at any one thing, but it can haul loads that nothing else in its class can handle.
10. Competitors
| Feature | DGX Spark | Mac Studio M4 Ultra | DIY 4× RTX 3090 | Cloud (A100) |
|---|---|---|---|---|
| Price | $3,999 | $3,999-$5,999 | ~$3,500-$4,300 | ~$2-4/hr |
| Memory | 128 GB unified | 192 GB unified | 96 GB VRAM | 80 GB HBM2e |
| Mem Bandwidth | 273 GB/s | 800 GB/s | ~3,700 GB/s total | 2,039 GB/s |
| CUDA | ✅ Native | ❌ Metal only | ✅ Native | ✅ Native |
| Max Model Size | ~200B (FP4) | ~300B (FP4) | ~65B (INT8) | ~65B (INT8) |
| Power | ~170W | ~120W | ~1,400W | N/A |
| Form Factor | Mini PC | Mini desktop | Full tower/rack | Cloud |
| Noise | Low (stable) | Very low | Loud | N/A |
| Multi-node | 2-node 200 Gbps | ❌ | Possible (complex) | Multi-GPU easy |
DGX Spark vs Mac Studio M4 Ultra
Apple's M4 Ultra offers more memory (up to 192 GB), roughly 3× higher bandwidth (800 GB/s), and whisper-quiet operation. It's the better choice if you don't need CUDA. But if your workflow depends on CUDA, TensorRT, or NVIDIA-specific tools — and most AI development still does — the Spark is the only desktop option with full Blackwell compatibility. The Mac also has no official multi-node clustering; the Spark scales to two nodes over 200 Gbps.
DGX Spark vs DIY Multi-GPU Rig
A 4× RTX 3090 rig gives you ~96 GB of VRAM with ~3,700 GB/s aggregate bandwidth — dramatically faster for inference. But it costs similar money, draws 8-10× more power, generates significant heat/noise, and requires complex multi-GPU software configuration. The Spark is plug-and-play; the rig is a project.
DGX Spark vs Cloud
An A100 on AWS/GCP costs $2-4/hour. At 8 hours/day of usage, that's $500-1,000/month. The DGX Spark pays for itself in 4-8 months of moderate use. Plus, your data stays local — no egress costs, no network latency, and fewer compliance concerns.
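The payback claim follows from the stated assumptions ($2-4/hr, 8 hours/day). A minimal sketch:

```python
# Payback period for a $3,999 Spark vs. renting an A100 at $2-4/hr,
# 8 hours/day, 30 days/month — the assumptions stated in the text.
def payback_months(device_cost: float, cloud_rate_hr: float, hours_per_day: float = 8) -> float:
    monthly_cloud = cloud_rate_hr * hours_per_day * 30
    return device_cost / monthly_cloud

for rate in (2, 4):
    print(f"At ${rate}/hr: cloud ~${rate * 8 * 30}/mo -> payback in {payback_months(3999, rate):.1f} months")
```

At $4/hr the Spark breaks even in about 4.2 months; at $2/hr, about 8.3 months — the source of the "4-8 months" range.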
11. Community Reception
Hacker News
Simon Willison's post "Nvidia DGX Spark: great hardware, early days for the ecosystem" hit 189 points and 111 comments on HN. The consensus: the hardware is impressive, but the ARM + NVIDIA software ecosystem needs time to mature. Many commenters noted that Ollama and SGLang work well, but edge cases and less-common tools may need workarounds.
Reddit (r/LocalLLaMA)
The LocalLLaMA community has been actively benchmarking DGX Sparks. Common themes:
- Memory capacity is the star feature — running 70B models locally is a game-changer
- Memory bandwidth (273 GB/s) is the main disappointment vs expectations
- Comparison to AMD Ryzen AI Max 395 (ROCm) is a hot topic
- The ConnectX-7 networking is seen as a unique differentiator
- Price increase from $2,999 to $3,999 for the Founder's Edition frustrated some early reservation holders
Industry Reviews
- Tom's Hardware: "Fast and fun AI toolbox that beats out AMD's Ryzen AI Max+ 395" — positive on CUDA advantage
- The Register: "The AI equivalent of a pickup truck" — practical and versatile
- ServeTheHome: "This is so freaking cool" — especially impressed by the ConnectX-7 networking and build quality
- LMSYS: Thorough benchmarks showing excellent batch scaling but bandwidth-limited decode
12. The Verdict — Does It Deliver?
Does the DGX Spark deliver on NVIDIA's promise of a "personal AI supercomputer"? Yes, with caveats.
The 128 GB of unified memory is genuinely groundbreaking for a desktop device. No other CUDA-compatible system lets you load and run 200B parameter models on your desk. The build quality is exceptional, the form factor is tiny, and the power efficiency is remarkable. The DGX Playbooks and software stack provide a real on-ramp to AI development.
But the "supercomputer" label stretches the truth. The 273 GB/s memory bandwidth means large models run at a crawl — usable for prototyping, not for production. The 1 PFLOP claim requires sparse FP4, which few real workloads use. And at $3,999, you're paying a premium for NVIDIA's ecosystem and the Blackwell architecture.
Our Recommendation
- Buy it if you're a developer or researcher who needs CUDA + large memory on your desk, and you value code portability to NVIDIA datacenter infrastructure
- Wait for OEM versions if you want the same GB10 hardware at ~$2,999 without the gold chassis
- Consider Mac Studio if you don't need CUDA — Apple offers more bandwidth and memory at similar prices
- Build a GPU rig if raw inference speed matters more than model size capacity
`ollama run llama3.1:70b` on a box the size of a tissue box. That alone makes it significant. It won't replace your GPU rig or cloud instances, but it fills a gap that nothing else in the CUDA ecosystem does.
References
- NVIDIA, "DGX Spark Product Page," nvidia.com.
- LMSYS Org, "NVIDIA DGX Spark In-Depth Review: A New Standard for Local AI Inference," lmsys.org, October 2025.
- Tom's Hardware, "Nvidia DGX Spark Review," tomshardware.com, January 2026.
- The Register, "DGX Spark, Nvidia's Tiniest Supercomputer," theregister.com, October 2025.
- ServeTheHome, "NVIDIA DGX Spark Review," servethehome.com, October 2025.
- Robert McDermott, "NVIDIA's DGX Spark: Mini AI Supercomputer Overview and Review," medium.com, October 2025.
- Simon Willison, "Nvidia DGX Spark: Great Hardware, Early Days for the Ecosystem," simonwillison.net, October 2025.
- Signal65, "NVIDIA DGX Spark First Look," signal65.com, October 2025.
- TWOWIN Technology, "NVIDIA DGX Spark Performance Evaluation and Analysis," twowintech.com, October 2025.
- IntuitionLabs, "NVIDIA DGX Spark Review: Pros, Cons & Performance Benchmarks," intuitionlabs.ai, October 2025.
- NVIDIA Developer Forums, "DGX Spark Power Clarification," forums.developer.nvidia.com, October 2025.
- NVIDIA Developer Forums, "Suggestions for Reducing Idle Power Consumption," forums.developer.nvidia.com, October 2025.
- r/LocalLLaMA, "DGX Spark Review with Benchmark," reddit.com, October 2025.
- AI Multiple Research, "DGX Spark vs Mac Studio & Halo: Benchmarks & Alternatives," aimultiple.com.
This article was written collaboratively by Michel (human) and Yaneth (AI agent) as part of ThinkSmart.Life's research initiative. Prices reflect February 2026 market conditions and may fluctuate.