
1. Two Paths to Local AI — Same Budget, Very Different Machines

You've decided to run AI models locally. No more cloud bills, no more rate limits, no more sending sensitive data to someone else's servers. You have roughly $4,000 to spend, and in early 2026 two radically different paths sit at the same price point.

Path A is the NVIDIA DGX Spark ($3,999) — a sealed, Mac Mini-sized box powered by the Grace Blackwell GB10 chip with 128 GB of unified memory. Plug it in, power it on, run 200B-parameter models. One petaFLOP of AI compute in a silent golden cube.

Path B is a DIY 4×RTX 3090 rig (~$3,600) — an open-air mining frame stuffed with four used RTX 3090 GPUs, each with 24 GB of VRAM, totaling 96 GB of dedicated GPU memory. It's loud, it's ugly, and it draws 1,400 watts at full tilt. But it has raw CUDA horsepower that embarrasses the Spark on models that fit in VRAM.

This post puts them head-to-head across 11 categories with real benchmark data. By the end, you'll know which one to buy, because the answer depends entirely on what you plan to run.

2. Side-by-Side Specs

| Spec | DGX Spark ($3,999) | DIY 4×RTX 3090 (~$3,600) |
|---|---|---|
| Processor | Grace Blackwell GB10 (ARM) | Intel Celeron G5905 (x86) |
| GPU | Blackwell GPU (integrated) | 4× NVIDIA RTX 3090 (discrete) |
| Memory | 128 GB unified LPDDR5x | 96 GB VRAM (4×24 GB GDDR6X) + 16 GB DDR4 system |
| Memory Bandwidth | ~273 GB/s (unified) | ~936 GB/s each = 3,744 GB/s aggregate VRAM |
| Storage | 4 TB NVMe SSD | 1 TB NVMe SSD (expandable) |
| PCIe | N/A (SoC) | PCIe 3.0 ×1 via USB risers |
| Peak AI Performance | 1 PFLOP (FP4 sparse) | ~284 TFLOPS (FP16) across 4 GPUs |
| TDP / Power | 240W PSU (draws ~100-150W typical) | ~1,400W at load (350W × 4 GPUs + system) |
| Networking | ConnectX-7 (100GbE capable) | 1 GbE onboard |
| OS | Ubuntu (ARM64) + NVIDIA AI stack | Ubuntu (x86_64) + vLLM/Ollama |
| Form Factor | 150×150×50 mm (Mac Mini size) | Veddha open-air frame (~60×35×35 cm) |
| Noise | Near-silent | Loud (4× GPU fans at load) |

3. Raw Performance — Tokens Per Second

This is the category everyone cares about most. We collected benchmark data from Ollama's official DGX Spark tests, community benchmarks on GitHub, vLLM 4×3090 experiments, and Reddit comparisons. All numbers are decode speed (token generation) unless noted.

| Model | DGX Spark (tok/s) | Single RTX 3090 (tok/s) | 4× RTX 3090 (tok/s) | Winner |
|---|---|---|---|---|
| Llama 3.1 8B (Q4) | 38 | ~112 | ~105 | 🏆 3090 (2.8×) |
| DeepSeek-R1 14B (Q4) | 20 | ~55 | ~50 | 🏆 3090 (2.5×) |
| Gemma3 27B (Q4) | 10.8 | ~40 | ~35 | 🏆 3090 (3.3×) |
| Qwen3 32B (Q4) | 9.4 | ~35.6 | ~32 | 🏆 3090 (3.4×) |
| QwQ-32B (FP8, vLLM) | ~12 | N/A | 39 output tok/s | 🏆 3090 (3.3×) |
| Llama 3.1 70B (Q4) | 4.4 | N/A (OOM) | ~16.9 | 🏆 3090 (3.8×) |
| GPT-oss 120B (MXFP4) | 41 | N/A (OOM) | ~9.6 | 🏆 Spark (4.3×) |
| 200B+ models | Runs (slowly) | Cannot load | Cannot load (exceeds 96 GB VRAM) | 🏆 Spark (∞) |

📊 The pattern is clear: For any model that fits in 96 GB of VRAM (roughly up to 70B Q4), the 4×3090 rig is 2.5–3.8× faster than the DGX Spark. But for models exceeding 96 GB (100B+), the Spark wins by default — the 3090 rig literally can't load them. The crossover point is around 70–80B parameters at Q4 quantization.

Why is the Spark slower on smaller models? Its 128 GB of unified LPDDR5x memory has a bandwidth of ~273 GB/s. Each RTX 3090 has ~936 GB/s of GDDR6X bandwidth. LLM inference is memory-bandwidth-bound during token generation — the GPU that can read weights faster generates tokens faster. Four 3090s have 13.7× more aggregate memory bandwidth than the Spark.
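This bandwidth-bound behavior can be sketched with a back-of-envelope estimate. The sketch below is a simplification: it uses peak bandwidth (real systems typically reach 50-80% of peak), and the ~4.5 bits/weight figure for Q4 including quantization overhead is an assumption, not a measured value.

```python
# Decode speed is memory-bandwidth-bound: the GPU streams every model
# weight from memory once per generated token, so the upper bound is
#   tok/s ≈ memory_bandwidth / model_size_in_bytes

def est_decode_tok_s(params_b: float, bits_per_weight: float, bw_gb_s: float) -> float:
    model_gb = params_b * bits_per_weight / 8  # weight footprint in GB
    return bw_gb_s / model_gb

# Qwen3 32B at Q4 (~4.5 bits/weight with overhead -- an assumption):
spark = est_decode_tok_s(32, 4.5, 273)        # DGX Spark unified LPDDR5x
single_3090 = est_decode_tok_s(32, 4.5, 936)  # one RTX 3090's GDDR6X

print(f"Spark upper bound: {spark:.1f} tok/s")        # ~15 tok/s (measured: 9.4)
print(f"3090 upper bound:  {single_3090:.1f} tok/s")  # ~52 tok/s (measured: ~35.6)
```

Both measured numbers land at roughly 60-70% of this theoretical ceiling, which is consistent with the bandwidth-bound model.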

John Carmack publicly noted that the DGX Spark appears to max out at roughly 100W of actual power draw, well below its 240W rating, and delivers roughly half the advertised 1 PFLOP. NVIDIA has acknowledged this and pointed to firmware updates, but as of early 2026 the real-world FP4 throughput is closer to ~480 TFLOPS.

4. Memory & Maximum Model Size

DGX Spark: 128 GB Unified

The Spark's 128 GB is shared between CPU and GPU with no copy overhead. NVIDIA states ~120 GB is available for models after OS/framework overhead. In practice, that means you can load models like GPT-oss 120B (MXFP4) or heavily quantized 200B-class models, sizes the 3090 rig simply cannot hold.

DIY 4×3090: 96 GB Dedicated VRAM

Four RTX 3090s provide 96 GB of dedicated GDDR6X VRAM. With tensor parallelism across 4 GPUs, anything up to roughly 70B at Q4 loads comfortably, and 120B-class MXFP4 models still squeeze in, but 200B+ models exceed the 96 GB ceiling entirely.
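A rough "will it fit?" check reproduces the pattern from the benchmark table. The 4 bits/weight for Q4 and the ~15% KV-cache/runtime overhead below are assumed fudge factors, not measured figures:

```python
# Estimate whether a quantized model fits: weight bytes plus an assumed
# ~15% for KV cache and runtime overhead (a fudge factor, not exact).

def fits(params_b: float, bits_per_weight: float, mem_gb: float,
         overhead: float = 1.15) -> bool:
    needed_gb = params_b * bits_per_weight / 8 * overhead
    return needed_gb <= mem_gb

# Spark exposes ~120 GB for models; the 3090 rig has 96 GB of VRAM.
for params in (32, 70, 120, 200):
    print(f"{params:>3}B @ ~4-bit: "
          f"Spark(120GB)={fits(params, 4.0, 120)}  "
          f"4x3090(96GB)={fits(params, 4.0, 96)}")
```

Only the 200B row diverges: it fits in the Spark's 120 GB budget but not in 96 GB of VRAM, which is exactly the crossover the benchmarks show.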

⚠️ The PCIe bottleneck trap: The budget 3090 build uses PCIe 3.0 ×1 USB risers — just ~1 GB/s per GPU. For inference where weights stay in VRAM, this is usually fine. But tensor parallelism requires GPUs to communicate, and that happens over PCIe. The ~1 GB/s link becomes a bottleneck for multi-GPU workloads. The Pro tier build (ROMED8-2T, PCIe 4.0 ×16) avoids this — see our Budget vs Pro comparison.

5. Energy Cost

| Metric | DGX Spark | DIY 4×3090 |
|---|---|---|
| Power at idle | ~30W | ~120W (4 GPUs idle) |
| Power at load | ~100-150W (observed) | ~1,400-1,550W |
| Monthly cost (8h/day load) | $2.88-$4.32 | $40.32-$44.64 |
| Monthly cost (24/7 load) | $8.64-$12.96 | $120.96-$133.92 |
| Annual cost (8h/day) | $34.56-$51.84 | $483.84-$535.68 |
| Annual cost (24/7) | $103.68-$155.52 | $1,451.52-$1,607.04 |

Using $0.12/kWh US average. The DGX Spark sips power — roughly 10× less than the 4×3090 rig under load. Running the 3090 rig 24/7 adds $1,300-$1,600/year in electricity. That's nearly half the cost of the rig itself every year.

However: you can power-limit each 3090 to ~220W (from 350W) and lose only ~10% performance, dropping the rig to ~930W at load and saving ~30% on electricity. The vLLM benchmark data shows 220W is the sweet spot for efficiency.
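The costs in the table reduce to one line of arithmetic, so you can rerun them for your own electricity rate (the $0.12/kWh default is the US average cited above):

```python
# Monthly electricity cost for a given load power, duty cycle, and rate.
def monthly_cost(watts: float, hours_per_day: float,
                 usd_per_kwh: float = 0.12) -> float:
    kwh = watts / 1000 * hours_per_day * 30  # kWh per 30-day month
    return kwh * usd_per_kwh

print(f"3090 rig, stock:        ${monthly_cost(1400, 8):.2f}")  # ~$40.32
print(f"3090 rig, 220W-limited: ${monthly_cost(930, 8):.2f}")   # ~$26.78
print(f"DGX Spark:              ${monthly_cost(150, 8):.2f}")   # ~$4.32
```

At European rates (often $0.30+/kWh), the gap widens by 2.5× or more, which can flip the long-term math toward the Spark.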

6. Noise & Physical Footprint

DGX Spark

150×150×50 mm. Fits on your desk next to your monitor. Near-silent operation — early reviewers describe it as inaudible under normal workloads. TechRadar called it "gorgeous engineering."

DIY 4×3090

A Veddha 8-GPU open-air mining frame is roughly 60×35×35 cm — about the size of a small nightstand. It's an exposed skeleton of aluminum with four large graphics cards hanging off it. Under load, four 3090 fans spin up to ~2,000 RPM producing 45-55 dBA, roughly as loud as a window air conditioner. It belongs in a closet, basement, or spare room — not on your desk.

🏠 Living situation matters: If you're running AI in a studio apartment, the Spark wins this category by a mile. If you have a garage, basement, or server closet, the 3090 rig's noise and size become non-issues.

7. Setup Complexity

DGX Spark: ~30 minutes

Unbox, plug in power and Ethernet, power on. Ubuntu is preinstalled with NVIDIA's AI stack (CUDA, cuDNN, TensorRT, container runtime). Install Ollama with one command. It's as close to plug-and-play as AI hardware gets.

DIY 4×3090: 1-3 days

Assemble the open-air frame. Mount the motherboard. Install CPU, RAM, SSD. Wire dual server PSUs through breakout boards. Mount 4 GPUs on risers. Install Ubuntu. Install NVIDIA drivers. Configure CUDA. Set up vLLM or Ollama with tensor parallelism. Debug PCIe riser issues. Power-limit GPUs. Configure fan curves. Set up Open WebUI for a chat interface. You'll learn a lot — but it's a weekend project minimum.

8. Upgradeability

DGX Spark: Sealed — No Upgrades

The GB10 is a system-on-chip. You cannot add RAM, swap the GPU, or expand storage (beyond external drives). What you buy is what you get. If 128 GB isn't enough in 2028, you buy a new unit.

DIY 4×3090: Fully Expandable

✅ The upgrade path is the 3090 rig's killer feature. You can start with 2 GPUs ($1,500 for cards) and scale to 6+ as your needs grow. The Spark is a fixed-performance appliance.

9. Software Compatibility

DGX Spark: ARM64 + Blackwell CUDA

The Spark ships with NVIDIA's full AI stack preinstalled, but it runs ARM64 Linux on the brand-new Blackwell architecture, so tooling maturity lags behind x86.

DIY 4×3090: x86_64 + Ampere CUDA

The 3090 rig runs standard x86_64 Ubuntu on the mature Ampere architecture. Virtually every AI tool, prebuilt wheel, and container image works out of the box.

⚠️ The ARM gotcha: The DGX Spark runs ARM64 Linux. Most AI tooling works, but you'll occasionally hit x86-only binaries, incompatible conda packages, or build scripts that assume x86. Early adopters on Hacker News reported friction with the software ecosystem — Simon Willison titled his review "great hardware, early days for the ecosystem."

10. Multi-User / Serving

DGX Spark

The Spark comes with a ConnectX-7 NIC (100GbE capable, though you need the right switch). It can serve multiple users via vLLM or SGLang, but the single-stream decode speed is relatively slow (~10-40 tok/s depending on model). Batched throughput is better, but the memory bandwidth ceiling limits concurrent users.

DIY 4×3090

vLLM on 4×3090 excels at multi-user serving. The vLLM benchmarks show 353-400 tok/s total throughput on QwQ-32B with batched requests. With continuous batching, you can serve 5-10+ concurrent users at acceptable speeds. The bottleneck is the 1 GbE network on the budget build — upgrade to 10 GbE with the Pro tier.
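The concurrent-user figure follows directly from the batched throughput. This is a simplification that ignores per-request latency variance and prompt-processing time, but it gives the right order of magnitude:

```python
# With continuous batching, total decode throughput is shared across
# active requests, so capacity ≈ total throughput / per-user target.
def max_users(total_tok_s: float, per_user_tok_s: float) -> int:
    return int(total_tok_s // per_user_tok_s)

# Using the low end of the measured 353-400 tok/s on QwQ-32B:
print(max_users(353, 30))  # 11 users at a brisk 30 tok/s each
print(max_users(353, 60))  # 5 users at 60 tok/s each
```

That lines up with the "5-10+ concurrent users at acceptable speeds" figure above; in practice throughput also rises somewhat as batch size grows.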

11. Training vs Inference

DGX Spark

NVIDIA's own benchmarks show the Spark can fine-tune a 3B model via LoRA at decent speeds. For full fine-tuning of larger models (7B+), the limited compute and memory bandwidth make it impractical. It's designed as an inference and development machine, not a training rig.

DIY 4×3090

Four RTX 3090s are a legitimate training platform. Each card has 24 GB of VRAM and 35.6 TFLOPS of FP16 compute. With DeepSpeed ZeRO or FSDP, you can fine-tune 7B-13B models across 4 GPUs. The PCIe 3.0 ×1 bandwidth is a real problem for training though — gradient synchronization is constantly limited by the ~1 GB/s inter-GPU link. The Pro tier build with PCIe 4.0 ×16 is strongly recommended for training.
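The severity of that gradient-sync bottleneck is easy to estimate. The sketch below assumes a ring all-reduce (each GPU transfers roughly 2(N−1)/N of the gradient bytes per step) and ignores compute/communication overlap and gradient accumulation, both of which soften the hit in real training runs:

```python
# Per-step all-reduce time for synchronizing FP16 gradients across GPUs.
def allreduce_seconds(params_b: float, bytes_per_grad: int,
                      n_gpus: int, link_gb_s: float) -> float:
    grad_gb = params_b * bytes_per_grad            # gradient bytes in GB
    traffic_gb = 2 * (n_gpus - 1) / n_gpus * grad_gb  # ring all-reduce traffic
    return traffic_gb / link_gb_s

# 7B model, FP16 gradients (2 bytes/param), 4 GPUs:
print(f"PCIe 3.0 x1 riser (~1 GB/s): {allreduce_seconds(7, 2, 4, 1.0):.1f} s/step")
print(f"PCIe 4.0 x16 (~32 GB/s):     {allreduce_seconds(7, 2, 4, 32.0):.2f} s/step")
```

On ~1 GB/s risers, every optimizer step pays on the order of 20 seconds of communication; at PCIe 4.0 ×16 it drops under a second, which is why the Pro tier board matters so much for training.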

🎯 Training verdict: Neither is ideal for serious training. The 3090 rig on the Pro tier board is the better training platform, but even then, cloud GPUs (H100/A100) are more cost-effective for training jobs. Both setups shine at inference.

12. Resale Value

RTX 3090: Proven resale market

The RTX 3090 has held remarkably steady at $650-$800 on eBay since 2024. It's a known quantity with a massive secondary market. If you decide AI isn't for you, or you upgrade to 5090s, you can sell your 3090s and recover 70-80% of your investment.

DGX Spark: Unknown resale

The DGX Spark is a proprietary, sealed ARM device. Its resale market is tiny and uncertain. It's unlikely to hold value as well as commodity GPUs, especially once the next generation (DGX Spark 2?) arrives. First-gen proprietary hardware historically depreciates faster than commodity components.

13. Total Cost of Ownership

Let's calculate the real cost of owning each system, assuming 8 hours/day of active inference at $0.12/kWh:

| Cost Component | DGX Spark | DIY 4×3090 (Budget) | DIY 4×3090 (Pro Tier) |
|---|---|---|---|
| Hardware | $3,999 | $3,620 | $4,300 |
| Year 1 electricity | ~$44 | ~$484 | ~$484 |
| Year 1 total | $4,043 | $4,104 | $4,784 |
| Year 2 electricity | ~$44 | ~$484 | ~$484 |
| Year 3 electricity | ~$44 | ~$484 | ~$484 |
| 3-Year TCO | $4,131 | $5,072 | $5,752 |
| Resale value (est.) | ~$1,500 | ~$2,600 (GPUs) | ~$3,000 (GPUs + board) |
| Net 3-year cost | $2,631 | $2,472 | $2,752 |
Net 3-year cost$2,631$2,472$2,752

Surprising result: after 3 years with electricity and resale factored in, the budget 3090 rig is actually the cheapest option — because used 3090 GPUs hold their value so well. The Spark's electricity savings (~$1,300 over 3 years) are partially offset by its lower resale value.

If you run 24/7 instead of 8h/day, the math shifts dramatically toward the Spark — the 3090 rig's electricity cost balloons to $4,300+ over 3 years.
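The net figures in the table reduce to one formula, so you can plug in your own resale assumptions or a 24/7 duty cycle:

```python
# 3-year net cost = hardware + 3 years of electricity - estimated resale.
def net_3yr(hardware: int, annual_electricity: int, resale: int) -> int:
    return hardware + 3 * annual_electricity - resale

print(net_3yr(3999, 44, 1500))   # DGX Spark:       2631
print(net_3yr(3620, 484, 2600))  # Budget 3090 rig: 2472
print(net_3yr(4300, 484, 3000))  # Pro 3090 rig:    2752
```

Swap the $484 for the ~$1,450-1,600 annual 24/7 figure from section 5 and the 3090 rigs' net cost jumps by roughly $3,000, handing the win back to the Spark.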

14. The Verdict

🟢 Buy the DGX Spark If…

  • You need to run models >100B parameters locally
  • You want silent, desk-friendly operation
  • Electricity cost matters (apartment, high rates)
  • You value plug-and-play setup
  • You need NVIDIA's full AI stack on ARM
  • You're running 24/7 and every watt counts
  • You're a researcher who needs Blackwell-specific features (FP4, speculative decoding)

🔵 Build the 4×3090 Rig If…

  • You primarily run models ≤70B (the vast majority of use cases)
  • Raw speed matters — 3-4× faster decode on most models
  • You want upgradeability (more GPUs, newer GPUs)
  • You serve multiple users via vLLM
  • You plan to fine-tune models
  • You have space for a loud, large rig
  • You want maximum resale value
  • You enjoy building and tinkering

⚡ Michel's take: For most people reading this, the 4×3090 rig is the better buy. The models people actually use daily — Llama 70B, Qwen 32B, DeepSeek-R1, Gemma 27B — all fit comfortably in 96 GB of VRAM and run 3-4× faster on the 3090 rig. The DGX Spark's killer feature is running 120-200B models that won't fit in 96 GB, but that's a niche use case today. If you're deciding right now: build the rig, and spend the savings on a Pro tier motherboard.

The ideal $4,300 setup? The Pro tier 4×3090 rig (ROMED8-2T + EPYC + PCIe 4.0 ×16). You get the 3090 rig's raw speed, proper PCIe bandwidth for training and future GPU upgrades, IPMI remote management, and 10GbE networking. It costs $300 more than the DGX Spark and outperforms it on every model up to 70B. Check our Budget vs Pro Tier comparison and Pro Tier shopping list for the full build guide.

References

  1. Ollama Blog, "NVIDIA DGX Spark performance," ollama.com, October 2025.
  2. LMSYS Org, "NVIDIA DGX Spark In-Depth Review: A New Standard for Local AI Inference," lmsys.org, October 2025.
  3. r/LocalLLaMA, "Benchmarking the DGX Spark against the RTX 3090," reddit.com, October 2025.
  4. XiongjieDai, "GPU Benchmarks on LLM Inference," github.com, 2025.
  5. Himesh P., "VLLM Performance Benchmarks 4x RTX 3090," blogspot.com, March 2025.
  6. IntuitionLabs, "NVIDIA DGX Spark Review: Pros, Cons & Performance Benchmarks," intuitionlabs.ai, October 2025.
  7. NVIDIA Developer Forums, "DGX Spark Power Clarification," forums.developer.nvidia.com, October 2025.
  8. Tom's Hardware, "John Carmack slams Nvidia's $4,000 DGX Spark," tomshardware.com, October 2025.
  9. Simon Willison, "Nvidia DGX Spark: great hardware, early days for the ecosystem," simonwillison.net, October 2025.
  10. Jeff Geerling, "Dell's version of the DGX Spark fixes pain points," jeffgeerling.com, 2025.
  11. NVIDIA, "DGX Spark Hardware Overview," docs.nvidia.com.
  12. Robert McDermott, "NVIDIA's DGX Spark: Mini AI Supercomputer overview and review," medium.com, December 2025.
  13. ThinkSmart.Life, "Budget vs Pro Tier GPU Rig," thinksmart.life, February 2026.

This article was written collaboratively by Michel (human) and Yaneth (AI agent) as part of ThinkSmart.Life's research initiative. Prices reflect February 2026 market conditions and may fluctuate.
