1. Two Paths to Local AI — Same Budget, Very Different Machines
You've decided to run AI models locally. No more cloud bills, no more rate limits, no more sending sensitive data to someone else's servers. You have roughly $4,000 to spend, and in early 2026 two radically different paths sit at the same price point.
Path A is the NVIDIA DGX Spark ($3,999) — a sealed, Mac Mini-sized box powered by the Grace Blackwell GB10 chip with 128 GB of unified memory. Plug it in, power it on, run 200B-parameter models. One petaFLOP of AI compute in a silent golden cube.
Path B is a DIY 4×RTX 3090 rig (~$3,600) — an open-air mining frame stuffed with four used RTX 3090 GPUs, each with 24 GB of VRAM, totaling 96 GB of dedicated GPU memory. It's loud, it's ugly, and it draws 1,400 watts at full tilt. But it has raw CUDA horsepower that embarrasses the Spark on models that fit in VRAM.
This post puts them head-to-head across a dozen categories with real benchmark data. By the end, you'll know exactly which one to buy — and it depends entirely on what you plan to run.
2. Side-by-Side Specs
| Spec | DGX Spark ($3,999) | DIY 4×RTX 3090 (~$3,600) |
|---|---|---|
| Processor | Grace Blackwell GB10 (ARM) | Intel Celeron G5905 (x86) |
| GPU | Blackwell GPU (integrated) | 4× NVIDIA RTX 3090 (discrete) |
| Memory | 128 GB unified LPDDR5x | 96 GB VRAM (4×24 GB GDDR6X) + 16 GB DDR4 system |
| Memory Bandwidth | ~273 GB/s (unified) | ~936 GB/s each = 3,744 GB/s aggregate VRAM |
| Storage | 4 TB NVMe SSD | 1 TB NVMe SSD (expandable) |
| PCIe | N/A (SoC) | PCIe 3.0 ×1 via USB risers |
| Peak AI Performance | 1 PFLOP (FP4 sparse) | ~284 TFLOPS (FP16) across 4 GPUs |
| TDP / Power | 240W PSU (draws ~100-150W typical) | ~1,400W at load (350W × 4 GPUs + system) |
| Networking | ConnectX-7 (100GbE capable) | 1 GbE onboard |
| OS | Ubuntu (ARM64) + NVIDIA AI stack | Ubuntu (x86_64) + vLLM/Ollama |
| Form Factor | 150×150×50 mm (Mac Mini size) | Veddha open-air frame (~60×35×35 cm) |
| Noise | Near-silent | Loud (4× GPU fans at load) |
3. Raw Performance — Tokens Per Second
This is the category everyone cares about most. We collected benchmark data from Ollama's official DGX Spark tests, community benchmarks on GitHub, vLLM 4×3090 experiments, and Reddit comparisons. All numbers are decode speed (token generation) unless noted.
| Model | DGX Spark (tok/s) | Single RTX 3090 (tok/s) | 4× RTX 3090 (tok/s) | Winner |
|---|---|---|---|---|
| Llama 3.1 8B (Q4) | 38 | ~112 | ~105 | 🏆 3090 (2.8×) |
| DeepSeek-R1 14B (Q4) | 20 | ~55 | ~50 | 🏆 3090 (2.5×) |
| Gemma3 27B (Q4) | 10.8 | ~40 | ~35 | 🏆 3090 (3.3×) |
| Qwen3 32B (Q4) | 9.4 | ~35.6 | ~32 | 🏆 3090 (3.4×) |
| QwQ-32B (FP8, vLLM) | ~12 | N/A | 39 output tok/s | 🏆 3090 (3.3×) |
| Llama 3.1 70B (Q4) | 4.4 | N/A (OOM) | ~16.9 | 🏆 3090 (3.8×) |
| GPT-oss 120B (MXFP4) | 41 | N/A (OOM) | ~9.6 | 🏆 Spark (4.3×) |
| 200B+ models | Runs (slowly) | N/A (OOM) | Cannot load — exceeds 96 GB VRAM | 🏆 Spark (∞) |
Why is the Spark slower on smaller models? Its 128 GB of unified LPDDR5x memory has a bandwidth of ~273 GB/s. Each RTX 3090 has ~936 GB/s of GDDR6X bandwidth. LLM inference is memory-bandwidth-bound during token generation — the GPU that can read weights faster generates tokens faster. Four 3090s have 13.7× more aggregate memory bandwidth than the Spark.
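The bandwidth argument can be sketched as a back-of-envelope roofline: for a dense model, every generated token requires streaming all of the weights from memory, so decode speed is roughly capped at bandwidth divided by weight size. The ~18 GB figure below for a Q4 32B model is an assumption for illustration, not a published number.

```python
# Roofline estimate for decode speed: a dense model must stream all of its
# weights from memory for every generated token, so tokens/sec is roughly
# bounded by (memory bandwidth) / (weight bytes). Real systems land below
# this ceiling due to KV-cache reads and kernel overhead.

def decode_ceiling_toks(bandwidth_gb_s: float, weight_gb: float) -> float:
    """Upper bound on decode tokens/sec for a dense model."""
    return bandwidth_gb_s / weight_gb

# A 32B model at Q4 is roughly 18 GB of weights (assumed size).
weights_gb = 18.0

spark_ceiling = decode_ceiling_toks(273, weights_gb)  # unified LPDDR5x
gpu_ceiling = decode_ceiling_toks(936, weights_gb)    # one 3090's GDDR6X

print(f"Spark ceiling: {spark_ceiling:.1f} tok/s")  # ~15 tok/s
print(f"3090 ceiling:  {gpu_ceiling:.1f} tok/s")    # ~52 tok/s
```

The measured numbers in the table (9.4 and ~35.6 tok/s on Qwen3 32B) sit at roughly two-thirds of these ceilings — typical for real inference stacks — which is why the bandwidth ratio predicts the speed gap so well.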
John Carmack publicly noted that the DGX Spark appears to max out at ~100W of actual power draw — about half its 240W rating — delivering roughly half the advertised 1 PFLOP performance. NVIDIA has acknowledged this and pointed to firmware updates, but as of early 2026 the real-world FP4 throughput is closer to ~480 TFLOPS.
4. Memory & Maximum Model Size
DGX Spark: 128 GB Unified
The Spark's 128 GB is shared between CPU and GPU with no copy overhead. NVIDIA states ~120 GB is available for models after OS/framework overhead. This means you can load:
- 200B-parameter models at FP4/MXFP4 quantization (~100 GB)
- 120B models at MXFP4 (~60 GB)
- 70B models at FP16 (~140 GB) — doesn't fit; needs Q8 or lower
- 405B models — too large even at aggressive quantization
DIY 4×3090: 96 GB Dedicated VRAM
Four RTX 3090s provide 96 GB of dedicated GDDR6X VRAM. With tensor parallelism across 4 GPUs:
- 70B parameter models at Q4 quantization (~40 GB) — fits easily, runs fast
- 70B at FP16 (~140 GB) — does not fit
- 120B at Q4 (~60 GB) — fits but tight; limited KV cache
- 200B+ models — cannot load
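The fit rules in both lists reduce to simple arithmetic: weight memory is parameters × bits ÷ 8, and a model "fits" if that leaves headroom for KV cache and runtime overhead. A minimal sketch, with the ~15% headroom figure as an assumption:

```python
# Rough footprint estimator: weights only, under simple assumptions
# (dense model; KV cache, CUDA context, and fragmentation folded into
# a flat ~15% headroom). Illustrative, not vendor-published figures.

def weight_gb(params_b: float, bits: int) -> float:
    """Weight memory in GB for params_b billion parameters at `bits` precision."""
    return params_b * 1e9 * bits / 8 / 1e9

def fits(params_b: float, bits: int, budget_gb: float) -> bool:
    # Leave ~15% headroom for KV cache and runtime overhead (assumed).
    return weight_gb(params_b, bits) <= budget_gb * 0.85

SPARK_GB = 120.0  # usable unified memory per NVIDIA
RIG_GB = 96.0     # 4 x 24 GB VRAM

for params_b, bits, label in [(70, 4, "70B Q4"), (120, 4, "120B Q4"),
                              (200, 4, "200B FP4"), (70, 16, "70B FP16")]:
    print(f"{label:9s} ~{weight_gb(params_b, bits):5.0f} GB | "
          f"Spark: {fits(params_b, bits, SPARK_GB)} | "
          f"4x3090: {fits(params_b, bits, RIG_GB)}")
```

Run it and the table's conclusions fall out: 70B Q4 fits everywhere, 200B FP4 squeezes onto the Spark but not the rig, and 70B FP16 fits on neither machine.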
5. Energy Cost
| Metric | DGX Spark | DIY 4×3090 |
|---|---|---|
| Power at idle | ~30W | ~120W (4 GPUs idle) |
| Power at load | ~100-150W (observed) | ~1,400-1,550W |
| Monthly cost (8h/day load) | $2.88-$4.32 | $40.32-$44.64 |
| Monthly cost (24/7 load) | $8.64-$12.96 | $120.96-$133.92 |
| Annual cost (8h/day) | $34.56-$51.84 | $483.84-$535.68 |
| Annual cost (24/7) | $103.68-$155.52 | $1,451.52-$1,607.04 |
Using $0.12/kWh US average. The DGX Spark sips power — roughly 10× less than the 4×3090 rig under load. Running the 3090 rig 24/7 adds $1,300-$1,600/year in electricity. That's nearly half the cost of the rig itself every year.
However: you can power-limit each 3090 to ~220W (from 350W) and lose only ~10% performance, dropping the rig to ~930W at load and saving ~30% on electricity. The vLLM benchmark data shows 220W is the sweet spot for efficiency.
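The table's dollar figures, and the power-limiting payoff, come straight from kilowatt-hour arithmetic. A quick sketch at the $0.12/kWh rate assumed above:

```python
# Electricity cost math behind the table above, at $0.12/kWh (assumed
# US-average rate; adjust RATE for your utility).

RATE = 0.12  # $/kWh

def monthly_cost(watts: float, hours_per_day: float, rate: float = RATE) -> float:
    kwh = watts / 1000 * hours_per_day * 30  # 30-day month
    return kwh * rate

print(f"Spark  @150W, 8h/day: ${monthly_cost(150, 8):.2f}/mo")   # $4.32
print(f"Rig   @1400W, 8h/day: ${monthly_cost(1400, 8):.2f}/mo")  # $40.32
print(f"Rig    @930W, 8h/day: ${monthly_cost(930, 8):.2f}/mo (power-limited)")
```

At the power-limited ~930 W, the rig's monthly bill drops to about $26.78 — the ~30% saving the benchmarks point to, for only ~10% lost throughput.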
6. Noise & Physical Footprint
DGX Spark
150×150×50 mm. Fits on your desk next to your monitor. Near-silent operation — early reviewers describe it as inaudible under normal workloads. TechRadar called it "gorgeous engineering."
DIY 4×3090
A Veddha 8-GPU open-air mining frame is roughly 60×35×35 cm — about the size of a small nightstand. It's an exposed skeleton of aluminum with four large graphics cards hanging off it. Under load, four 3090 fans spin up to ~2,000 RPM producing 45-55 dBA — comparable to a running dishwasher or window air conditioner. It belongs in a closet, basement, or spare room — not on your desk.
7. Setup Complexity
DGX Spark: ~30 minutes
Unbox, plug in power and Ethernet, power on. Ubuntu is preinstalled with NVIDIA's AI stack (CUDA, cuDNN, TensorRT, container runtime). Install Ollama with one command. It's as close to plug-and-play as AI hardware gets.
DIY 4×3090: 1-3 days
Assemble the open-air frame. Mount the motherboard. Install CPU, RAM, SSD. Wire dual server PSUs through breakout boards. Mount 4 GPUs on risers. Install Ubuntu. Install NVIDIA drivers. Configure CUDA. Set up vLLM or Ollama with tensor parallelism. Debug PCIe riser issues. Power-limit GPUs. Configure fan curves. Set up Open WebUI for a chat interface. You'll learn a lot — but it's a weekend project minimum.
8. Upgradeability
DGX Spark: Sealed — No Upgrades
The GB10 is a system-on-chip. You cannot add RAM, swap the GPU, or expand storage (beyond external drives). What you buy is what you get. If 128 GB isn't enough in 2028, you buy a new unit.
DIY 4×3090: Fully Expandable
- Add 2 more 3090s to reach 6 GPUs (144 GB VRAM) — the H510 BTC+ supports 6 slots
- Swap in RTX 5090s or PRO 6000s when prices drop (though the budget board's PCIe 3.0 ×1 limits them — see our Budget vs Pro comparison)
- Upgrade to the Pro tier (ROMED8-2T) for PCIe 4.0 ×16 across 7 full-size slots (more GPUs possible with bifurcation risers)
- Add more RAM, faster storage, 10GbE networking
9. Software Compatibility
DGX Spark: ARM64 + Blackwell CUDA
- Ollama — Full support, officially benchmarked
- vLLM — Works, though ARM64 builds are newer and less battle-tested
- SGLang — Officially supported, LMSYS published DGX Spark benchmarks with speculative decoding (2× speedup)
- PyTorch/TensorFlow — Full support via NVIDIA containers
- Random GitHub repos — Many x86-only scripts/tools won't work without modification
DIY 4×3090: x86_64 + Ampere CUDA
- Everything works. The RTX 3090 is the most battle-tested AI GPU in existence
- vLLM, Ollama, llama.cpp, exllamaV2, TabbyAPI — all first-class support
- Every Docker container, every Python package, every CUDA kernel — built for x86 + Ampere first
- Compute capability 8.6 — supports FP16, BF16, INT8, but no native FP4 (that's Blackwell only)
10. Multi-User / Serving
DGX Spark
The Spark comes with a ConnectX-7 NIC (100GbE capable, though you need the right switch). It can serve multiple users via vLLM or SGLang, but the single-stream decode speed is relatively slow (~10-40 tok/s depending on model). Batched throughput is better, but the memory bandwidth ceiling limits concurrent users.
DIY 4×3090
vLLM on 4×3090 excels at multi-user serving. The vLLM benchmarks show 353-400 tok/s total throughput on QwQ-32B with batched requests. With continuous batching, you can serve 5-10+ concurrent users at acceptable speeds. The bottleneck is the 1 GbE network on the budget build — upgrade to 10 GbE with the Pro tier.
11. Training vs Inference
DGX Spark
NVIDIA's own benchmarks show the Spark can fine-tune a 3B model via LoRA at decent speeds. For full fine-tuning of larger models (7B+), the limited compute and memory bandwidth make it impractical. It's designed as an inference and development machine, not a training rig.
DIY 4×3090
Four RTX 3090s are a legitimate training platform. Each card has 24 GB of VRAM and 35.6 TFLOPS of FP16 compute. With DeepSpeed ZeRO or FSDP, you can fine-tune 7B-13B models across 4 GPUs. The PCIe 3.0 ×1 bandwidth is a real problem for training though — gradient synchronization is constantly limited by the ~1 GB/s inter-GPU link. The Pro tier build with PCIe 4.0 ×16 is strongly recommended for training.
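To see how badly the ×1 link hurts, estimate the per-step gradient synchronization time: a ring all-reduce pushes roughly 2·(N−1)/N times the gradient size through each GPU's link. The ~1 GB/s and ~32 GB/s link speeds below are rough assumptions for PCIe 3.0 ×1 and 4.0 ×16 respectively:

```python
# Why PCIe 3.0 x1 cripples training: a ring all-reduce moves roughly
# 2*(N-1)/N times the gradient size through each GPU's link per step.
# Assumed figures: FP16 gradients (2 bytes/param), ~1 GB/s per x1 link,
# ~32 GB/s per x16 PCIe 4.0 link.

def allreduce_seconds(params_b: float, link_gb_s: float, n_gpus: int = 4) -> float:
    grad_gb = params_b * 2  # FP16 gradients: 2 GB per billion params
    traffic_gb = 2 * (n_gpus - 1) / n_gpus * grad_gb
    return traffic_gb / link_gb_s

# 7B model: gradient sync time per step on each link type
print(f"x1 link:  {allreduce_seconds(7, 1):.0f} s per sync")   # ~21 s
print(f"x16 link: {allreduce_seconds(7, 32):.1f} s per sync")  # ~0.7 s
```

Twenty-plus seconds of communication per synchronization on the budget board, versus well under a second on the Pro tier — gradient accumulation and ZeRO sharding can hide some of this, but not 30× of it.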
12. Resale Value
RTX 3090: Proven resale market
The RTX 3090 has held remarkably steady at $650-$800 on eBay since 2024. It's a known quantity with a massive secondary market. If you decide AI isn't for you, or you upgrade to 5090s, you can sell your 3090s and recover 70-80% of your investment.
DGX Spark: Unknown resale
The DGX Spark is a proprietary, sealed ARM device. Its resale market is tiny and uncertain. It's unlikely to hold value as well as commodity GPUs, especially once the next generation (DGX Spark 2?) arrives. First-gen proprietary hardware historically depreciates faster than commodity components.
13. Total Cost of Ownership
Let's calculate the real cost of owning each system, assuming 8 hours/day of active inference at $0.12/kWh:
| Cost Component | DGX Spark | DIY 4×3090 (Budget) | DIY 4×3090 (Pro Tier) |
|---|---|---|---|
| Hardware | $3,999 | $3,620 | $4,300 |
| Year 1 electricity | ~$44 | ~$484 | ~$484 |
| Year 1 total | $4,043 | $4,104 | $4,784 |
| Year 2 electricity | ~$44 | ~$484 | ~$484 |
| Year 3 electricity | ~$44 | ~$484 | ~$484 |
| 3-Year TCO | $4,131 | $5,072 | $5,752 |
| Resale value (est.) | ~$1,500 | ~$2,600 (GPUs) | ~$3,000 (GPUs+board) |
| Net 3-year cost | $2,631 | $2,472 | $2,752 |
Surprising result: after 3 years with electricity and resale factored in, the budget 3090 rig is actually the cheapest option — because used 3090 GPUs hold their value so well. The Spark's electricity savings (~$1,300 over 3 years) are partially offset by its lower resale value.
If you run 24/7 instead of 8h/day, the math shifts dramatically toward the Spark — the 3090 rig's electricity cost balloons to $4,300+ over 3 years.
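The net figures in the TCO table reduce to one formula — hardware plus three years of electricity minus resale. A sketch reproducing them (the resale values are the table's rough estimates, not guarantees):

```python
# Reconstructing the 3-year TCO table: hardware + 3 years of electricity
# at 8h/day, minus estimated resale. Resale figures are rough estimates.

def net_3yr(hardware: int, yearly_power: int, resale: int) -> int:
    return hardware + 3 * yearly_power - resale

spark = net_3yr(3999, 44, 1500)
budget = net_3yr(3620, 484, 2600)
pro = net_3yr(4300, 484, 3000)

print(spark, budget, pro)  # 2631 2472 2752
```

Swap in 24/7 electricity (~$1,450/year for the rig) and the budget build's net cost jumps past $4,500 while the Spark's barely moves — which is exactly where the math flips.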
14. The Verdict
🟢 Buy the DGX Spark If…
- You need to run models >100B parameters locally
- You want silent, desk-friendly operation
- Electricity cost matters (apartment, high rates)
- You value plug-and-play setup
- You need NVIDIA's full AI stack on ARM
- You're running 24/7 and every watt counts
- You're a researcher who needs Blackwell-specific features (FP4, speculative decoding)
🔵 Build the 4×3090 Rig If…
- You primarily run models ≤70B (the vast majority of use cases)
- Raw speed matters — 3-4× faster decode on most models
- You want upgradeability (more GPUs, newer GPUs)
- You serve multiple users via vLLM
- You plan to fine-tune models
- You have space for a loud, large rig
- You want maximum resale value
- You enjoy building and tinkering
The ideal $4,300 setup? The Pro tier 4×3090 rig (ROMED8-2T + EPYC + PCIe 4.0 ×16). You get the 3090 rig's raw speed, proper PCIe bandwidth for training and future GPU upgrades, IPMI remote management, and 10GbE networking. It costs $300 more than the DGX Spark and outperforms it on every model up to 70B. Check our Budget vs Pro Tier comparison and Pro Tier shopping list for the full build guide.
References
- Ollama Blog, "NVIDIA DGX Spark performance," ollama.com, October 2025.
- LMSYS Org, "NVIDIA DGX Spark In-Depth Review: A New Standard for Local AI Inference," lmsys.org, October 2025.
- r/LocalLLaMA, "Benchmarking the DGX Spark against the RTX 3090," reddit.com, October 2025.
- XiongjieDai, "GPU Benchmarks on LLM Inference," github.com, 2025.
- Himesh P., "VLLM Performance Benchmarks 4x RTX 3090," blogspot.com, March 2025.
- IntuitionLabs, "NVIDIA DGX Spark Review: Pros, Cons & Performance Benchmarks," intuitionlabs.ai, October 2025.
- NVIDIA Developer Forums, "DGX Spark Power Clarification," forums.developer.nvidia.com, October 2025.
- Tom's Hardware, "John Carmack slams Nvidia's $4,000 DGX Spark," tomshardware.com, October 2025.
- Simon Willison, "Nvidia DGX Spark: great hardware, early days for the ecosystem," simonwillison.net, October 2025.
- Jeff Geerling, "Dell's version of the DGX Spark fixes pain points," jeffgeerling.com, 2025.
- NVIDIA, "DGX Spark Hardware Overview," docs.nvidia.com.
- Robert McDermott, "NVIDIA's DGX Spark: Mini AI Supercomputer overview and review," medium.com, December 2025.
- ThinkSmart.Life, "Budget vs Pro Tier GPU Rig," thinksmart.life, February 2026.
This article was written collaboratively by Michel (human) and Yaneth (AI agent) as part of ThinkSmart.Life's research initiative. Prices reflect February 2026 market conditions and may fluctuate.