1. Why Build a GPU Rig?
If you want to run large AI models locally — LLaMA 70B, Stable Diffusion XL, video generation, fine-tuning — you need VRAM. Lots of it. Affordable consumer GPUs top out around 24 GB, but a four-GPU rig gives you 96 GB of combined VRAM for a fraction of cloud GPU costs.
Here's what a $5,000 multi-GPU rig unlocks:
- AI Inference — Run 70B parameter models at full quality with vLLM or llama.cpp
- Fine-Tuning — LoRA and QLoRA training on your own data, no cloud bills
- Image & Video Generation — Stable Diffusion, ComfyUI, Wan2.1 at maximum resolution
- 3D Rendering — Blender, Unreal Engine, with GPU-accelerated rendering
- No Monthly Bills — Pay once, run forever. A $5K rig pays for itself in 3-6 months vs. cloud GPU rental
2. The Build Philosophy
This build follows one principle: start with 4 GPUs, expand to 8 without replacing anything.
Every component — the frame, motherboard, power supply — is chosen with headroom. When you're ready to scale, you just buy more GPUs and plug them in. No new frame, no new motherboard, no new PSU.
- Frame: 8-GPU open-air frame (start with 4 installed)
- Motherboard: Multi-GPU board with 6 PCIe 3.0 slots (start with 4 connected)
- PSU: Dual 1200W server supplies (2,400W total handles 4 GPUs with headroom; add a third unit when you scale to 8)
- GPUs: RTX 3090 — 24 GB VRAM each, the best dollar-per-VRAM card available
3. 🛒 Complete Shopping List
Click any "Buy on Amazon" button to go directly to the product page. Prices reflect February 2026 market conditions and may fluctuate.
| Component | Product | Qty | Price Each | Total | Buy |
|---|---|---|---|---|---|
| GPU | NVIDIA RTX 3090 24GB GDDR6X (EVGA FTW3 Ultra or equivalent) | 4 | $750 | $3,000 | Buy on Amazon |
| Frame | Kingwin 8-GPU open-air mining frame (aluminum, stackable, fits 8 full-length GPUs) | 1 | $70 | $70 | Buy on Amazon |
| Motherboard | ASUS Prime Z390-P (Intel Z390, LGA 1151, 6× PCIe 3.0 slots, up to 64GB DDR4) | 1 | $196 | $196 | Buy on Amazon |
| CPU | Intel Celeron G4900 (LGA 1151, 2-core, 3.1GHz, integrated graphics; enough to feed data to the GPUs) | 1 | $42 | $42 | Buy on Amazon |
| RAM | Crucial 16GB DDR4-3200 (CT16G4DFRA32A; expandable to 64GB for smooth model loading) | 1 | $32 | $32 | Buy on Amazon |
| Power Supply | HP 1200W server PSU (DPS-1200FB A; server-grade, 94% efficient, ultra reliable) | 2 | $35 | $70 | Buy on Amazon |
| Breakout Board | Server PSU breakout board (17× 6-pin; converts server PSU output to GPU power connectors) | 2 | $22 | $44 | Buy on Amazon |
| PCIe Risers | USB 3.0 PCIe riser (VER 009S), 6-pack (x16-to-x1 adapter with USB cable and power) | 1 | $30 | $30 | Buy on Amazon |
| SSD | Kingston NV2 500GB M.2 NVMe (up to 3,500 MB/s; no cables, plugs directly into the motherboard) | 1 | $35 | $35 | Buy on Amazon |
| Power Cables | 6-pin to 8-pin (6+2) PCIe power cables, 12-pack (connect breakout board to GPUs) | 1 | $25 | $25 | Buy on Amazon |
| Cooling Fans | 120mm case fans, 3-pack (mount on the frame for extra airflow across GPUs) | 1 | $15 | $15 | Buy on Amazon |
| Accessories | Zip ties, thermal paste, power strip, Ethernet cable (the finishing touches) | 1 | $30 | $30 | — |
| TOTAL (4× RTX 3090 build) | | | | ~$3,589 | |
Budget Options & Upgrades
| If you want… | Swap to… | Impact |
|---|---|---|
| Cheaper GPUs | RTX 3080 10GB (~$350 used) | Total drops to ~$2,000 but only 40GB VRAM |
| Newer GPUs | RTX 4070 Ti Super (~$800) | 16GB VRAM each, better perf/watt, but less VRAM |
| Maximum VRAM | RTX 3090 Ti (~$900 used) | Same 24GB but faster memory, slight premium |
| Better mobo | ASUS B250 Mining Expert | 19 PCIe slots, more stable, ~$200-300 |
| Single ATX PSU | EVGA SuperNOVA 1600W | Simpler wiring, more expensive (~$300) |
4. Why Each Part Was Chosen
GPUs: RTX 3090 — The VRAM King
The RTX 3090 packs 24 GB of GDDR6X with 936 GB/s of memory bandwidth. For AI inference, VRAM is the bottleneck, not compute. Four RTX 3090s give you 96 GB — enough to run Llama 3 70B at 8-bit quantization (~70 GB of weights), Mixtral 8×7B, or multiple Stable Diffusion instances simultaneously. At $700-750 used on Amazon (prices crashed in mid-2025 after the 5000-series launch), it's the best VRAM-per-dollar card on the market.
Frame: Kingwin KC-8GPU
Open-air frames solve the thermal problem. Eight GPUs in a closed case would thermal throttle instantly. The Kingwin KC-8GPU is a premium aluminum frame from an established PC hardware brand — stackable design, room for 8 full-length triple-fan GPUs with proper spacing, and fan mounts included. At $70, it's the best value 8-GPU frame on Amazon.
Motherboard: ASUS Prime Z390-P
The ASUS Prime Z390-P hits the sweet spot — 6 PCIe 3.0 slots (2× x16, 4× x1) for connecting GPUs, full-size DDR4 RAM expandable up to 64GB for future fine-tuning, and USB 3.1 Gen 2. The Intel Z390 chipset supports 8th/9th-gen Intel CPUs. Pair it with a cheap Intel Celeron G4900: you don't need a powerful processor for GPU inference, since the CPU just feeds data to the GPUs. PCIe 3.0 keeps model loading and CPU-to-GPU transfers reasonably fast, unlike cheaper mining boards stuck on PCIe 2.0 (though with x1 risers, each GPU's link runs at about 1 GB/s regardless).
Power: Dual Server PSUs + Breakout Boards
Server PSUs are dirt cheap ($35 each used), ultra-reliable (designed for 24/7 data center operation), and 94%+ efficient. Two HP DPS-1200FB units give you 2,400W of clean power (four 3090s draw roughly 1,400W under load, ~350W each, leaving comfortable headroom). Each breakout board converts the server PSU's proprietary edge connector into 17× standard 6-pin PCIe power outputs. Way cheaper, and arguably more reliable, than a single $300 ATX PSU.
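If you want even more electrical and thermal margin (or quieter fans), you can cap each card's power draw with nvidia-smi; community testing suggests a 3090 gives up only a few percent of inference speed at around a 280W cap. A minimal sketch (the 280W figure is an assumption, so tune it for your cards):
# Enable persistence mode, then cap all GPUs at 280W (resets on reboot)
sudo nvidia-smi -pm 1
sudo nvidia-smi -pl 280
# Verify the new limits
nvidia-smi --query-gpu=index,power.limit --format=csv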
PCIe Risers
USB-style PCIe risers connect each GPU to the motherboard via a short USB 3.0 cable (used purely as a physical wire for a PCIe x1 link; no USB protocol is involved). This lets you space GPUs across the open frame instead of cramming them into adjacent PCIe slots. Each riser adapts a motherboard slot to a full x16 connector but carries only a single PCIe lane (~1 GB/s on PCIe 3.0). That's workable for loading models and running inference on each card; it becomes a real constraint for training or heavy tensor-parallel traffic, where GPUs exchange data constantly.
SSD & RAM
A 500GB M.2 NVMe SSD plugs directly into the motherboard — no cables needed — and delivers up to 3,500 MB/s reads (6× faster than SATA). That means faster model loading from disk to GPU. Plenty of space for Ubuntu, CUDA toolkit, and a few models. Store larger models on a NAS or external drive. 16GB DDR4 RAM handles model loading and data preprocessing smoothly — the heavy lifting happens in GPU VRAM, but adequate system RAM prevents bottlenecks.
5. Assembly Tips
- Build the frame first. Assemble the Kingwin aluminum frame following the included instructions (20-30 minutes).
- Install the motherboard. Secure the ASUS Prime Z390-P to the frame's motherboard tray. Install the CPU, RAM, and connect the SSD.
- Attach breakout boards. Mount one on each side of the frame. Plug the server PSUs into the breakout boards.
- Install PCIe risers. Plug the risers into the motherboard's PCIe slots. Route USB cables to GPU positions.
- Mount GPUs. Slide each RTX 3090 into a frame slot. Connect the riser's USB cable, then the power cables from the breakout board (2-3× 8-pin per GPU, depending on the card).
- Connect power. Wire the PSU power switch. Plug both server PSUs into a heavy-duty power strip (20A circuit recommended).
- Boot and test. Connect a monitor to the motherboard's video output (the Celeron's integrated graphics handle display), then boot from USB with the Ubuntu installer.
6. Software Setup — From Bare Metal to Running AI
Here's how to go from a pile of hardware to a fully working AI inference server. You'll need a USB drive (8GB+), a monitor, and a keyboard for initial setup — after that, everything is done remotely from your laptop.
Step 1: Create a Bootable USB
Download Ubuntu Server 22.04 LTS (the ISO file, ~2GB). Then flash it to a USB drive:
- Windows: Use Rufus — select the ISO, select your USB drive, click Start
- Mac: Use Balena Etcher — drag the ISO in, select USB, click Flash
- Linux:
# Replace /dev/sdX with your USB device (find it with lsblk) — dd overwrites the target!
sudo dd if=ubuntu-22.04-server.iso of=/dev/sdX bs=4M status=progress
Step 2: Install Ubuntu
- Plug the USB into your GPU rig, connect a monitor and keyboard
- Power on and press F2/DEL/F12 to enter BIOS — set USB as first boot device
- Follow the Ubuntu installer — use defaults for most options
- Important: Enable OpenSSH server when prompted (this lets you connect remotely)
- Set a username and password you'll remember
- Install completes → remove USB → reboot
After reboot, note the IP address shown on screen (e.g., 192.168.1.50). You can now unplug the monitor and keyboard — everything from here is done remotely.
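If you miss the IP (or it changes later), you can recover it from the rig's console, or scan from your laptop. A quick sketch, assuming a typical 192.168.1.0/24 home network:
# On the rig's console:
ip -4 addr show
# Or from your laptop (requires nmap):
nmap -sn 192.168.1.0/24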
Step 3: Connect via SSH
From your laptop (Mac/Linux terminal or Windows PowerShell):
# Connect to your GPU rig
ssh your-username@192.168.1.50
# You're now inside your GPU rig's terminal!
Tip: For easy access, add this to your laptop's ~/.ssh/config:
Host gpurig
    HostName 192.168.1.50
    User your-username
Now you can just type ssh gpurig to connect.
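Optional but worth it: copy an SSH key to the rig so you never type the password again.
# Generate a key if you don't already have one, then install it on the rig
ssh-keygen -t ed25519
ssh-copy-id gpurig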
Step 4: Install NVIDIA Drivers + CUDA
# Update the system
sudo apt update && sudo apt upgrade -y
# Install NVIDIA drivers and CUDA
sudo apt install -y nvidia-driver-535 nvidia-cuda-toolkit
# Reboot to load the drivers
sudo reboot
After reboot, SSH back in and verify your GPUs are detected:
nvidia-smi
You should see all 4 GPUs listed with their temperature, memory, and utilization. If you see all 4 — you're golden! 🎉
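For a compact health check over SSH, nvidia-smi can also print one line per GPU:
# Index, name, temperature, power draw, and VRAM usage per GPU
nvidia-smi --query-gpu=index,name,temperature.gpu,power.draw,memory.used,memory.total --format=csv
# Live view, refreshing every 2 seconds
watch -n 2 nvidia-smi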
Step 5: Install AI Frameworks
# Install Python and pip
sudo apt install -y python3-pip python3-venv
# Create a virtual environment
python3 -m venv ~/ai-env
source ~/ai-env/bin/activate
# vLLM — the fastest LLM inference engine (serves models like an API)
pip install vllm
# Ollama — the easiest way to download and run models
curl -fsSL https://ollama.com/install.sh | sh
# llama.cpp — lightweight, efficient, runs quantized models (now built with CMake)
sudo apt install -y cmake build-essential
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release
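Before downloading any models, it's worth confirming that PyTorch (installed as a vLLM dependency) actually sees all four cards. Run this inside the ~/ai-env virtual environment:
# Should print 4, then one name per GPU
python3 - <<'EOF'
import torch
print(torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
EOF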
Step 6: Run Your First Model
Option A: Ollama (easiest — one command)
# Download and run Llama 3 70B
ollama run llama3:70b
# Chat with it right in your terminal!
# Type your question and press Enter
Option B: vLLM (best for serving as an API)
# Serve a 70B model across all 4 GPUs. Note: full FP16 70B weights (~140 GB)
# won't fit in 96 GB of VRAM — serve a quantized (AWQ/GPTQ) build instead,
# e.g. this community AWQ quant:
vllm serve casperhansen/llama-3-70b-instruct-awq --tensor-parallel-size 4
# Now accessible at http://192.168.1.50:8000
# Compatible with the OpenAI API format!
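Since vLLM speaks the OpenAI API format, any OpenAI-compatible client can talk to it. A quick smoke test with curl (the model field must match whatever you served):
curl http://192.168.1.50:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "casperhansen/llama-3-70b-instruct-awq",
    "messages": [{"role": "user", "content": "Say hello in five words."}]
  }'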
Option C: llama.cpp (best for quantized/GGUF models)
# Run a quantized 70B model
# The binary lives in build/bin after the CMake build
./build/bin/llama-server -m llama-3-70b-Q4_K_M.gguf -ngl 99 --split-mode layer
# Web interface at http://192.168.1.50:8080
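You'll need a GGUF file first. The huggingface-cli tool handles the download; the repo name below is a placeholder, so browse Hugging Face for a current Llama 3 70B GGUF quant:
pip install -U huggingface_hub
# Substitute a real repo and filename for the placeholders
huggingface-cli download <user/llama-3-70b-gguf-repo> llama-3-70b-Q4_K_M.gguf --local-dir .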
Step 7: Add a Chat Interface (Optional)
Want a ChatGPT-like web interface? Install Open WebUI:
# Install with Docker
sudo apt install -y docker.io
sudo docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --restart always \
  --name open-webui ghcr.io/open-webui/open-webui:main
# Open http://192.168.1.50:3000 in your browser
# You now have your own private ChatGPT! 🚀
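If the page doesn't load, check that the container came up cleanly:
sudo docker ps --filter name=open-webui   # should show the container running
sudo docker logs --tail 20 open-webui     # recent startup logs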
Step 8: Connect OpenClaw (Optional)
You can point OpenClaw at your local GPU rig instead of paying for cloud AI:
# In your OpenClaw config, set the model endpoint to your rig:
# baseUrl: http://192.168.1.50:8000/v1
# Now your AI agent runs on YOUR hardware — no API costs!
Daily Usage
Once everything is set up, your daily workflow is simple:
- SSH in: `ssh gpurig`
- Start your model: `ollama serve` or `vllm serve ...`
- Use it: open the web interface or hit the API from any app
- Leave it running: use `tmux` or `screen` to keep models running after you disconnect
Pro tip: Set up a systemd service to auto-start your model on boot — then your GPU rig is always ready, 24/7.
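Here's a minimal sketch of such a unit file for vLLM, assuming the username and venv path from the earlier steps (adjust the model and paths to your setup; Ollama's installer typically creates its own service already):
sudo tee /etc/systemd/system/vllm.service <<'EOF'
[Unit]
Description=vLLM inference server
After=network-online.target

[Service]
User=your-username
ExecStart=/home/your-username/ai-env/bin/vllm serve casperhansen/llama-3-70b-instruct-awq --tensor-parallel-size 4
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now vllm.service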
7. Expansion Path
The beauty of this build: expanding is trivial. Here's the upgrade path:
| Stage | GPUs | Total VRAM | Additional Cost | What It Unlocks |
|---|---|---|---|---|
| Base Build | 4× RTX 3090 | 96 GB | ~$3,590 | 70B models, Stable Diffusion, fine-tuning |
| +2 GPUs | 6× RTX 3090 | 144 GB | +$1,500 | 120B models, batch inference, multiple models |
| +4 GPUs | 8× RTX 3090 | 192 GB | +$3,000 | Llama 3.1 405B quantized, massive throughput |
To add GPUs, you literally just:
- Buy more RTX 3090s + risers (beyond the board's 6 PCIe slots, M.2-to-PCIe riser adapters are the usual mining-world trick for attaching GPUs 7 and 8)
- Slide them into empty frame slots
- Connect power cables and USB risers
- Reboot — `nvidia-smi` sees them automatically (quick check below)
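After the reboot, one command confirms the new cards are visible:
nvidia-smi -L   # lists every detected GPU with its index and UUID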
8. What You Can Run
| Model | Size | VRAM Needed | 4× 3090 (96GB) |
|---|---|---|---|
| Llama 3 8B (FP16) | 16 GB | ~16 GB | ✅ Runs on 1 GPU |
| Llama 3 70B (FP16) | 140 GB | ~140 GB | ❌ Too big; run the Q4 version below |
| Llama 3 70B (Q4_K_M) | 40 GB | ~42 GB | ✅ Across 2 GPUs |
| Mixtral 8×7B | 93 GB | ~93 GB | ✅ Across all 4 GPUs |
| Gemma 3 27B (QAT int4) | 14 GB | ~14 GB | ✅ Runs on 1 GPU |
| Stable Diffusion XL | 6.5 GB | ~8 GB | ✅ 4 instances simultaneously |
| DeepSeek-V2 (Q4) | ~66 GB | ~70 GB | ✅ Across 3 GPUs |
9. How People Are Building These Rigs
The multi-GPU rig community is thriving. Here's what builders are sharing:
- 5× RTX 3090 builds — Users on r/LocalLLaMA report running 5 RTX 3090s on consumer hardware with open-air frames, serving LLMs 24/7 for their teams. Total build cost: ~$4,500.
- Gemma 3 27B on desktop GPUs — Google's QAT-optimized int4 models slashed VRAM from 54GB to 14.1GB, making the 27B model accessible on a single RTX 3090 via Ollama.
- 3× RTX 3090 rigs — Popular entry point for LocalLLaMA builders who start with 3 cards and expand. Running Llama 70B quantized across all three GPUs with impressive tokens/second.
- "The 3090 is the Honda Civic of AI GPUs" — Community consensus: reliable, cheap, massive aftermarket, and 24GB of VRAM that aged beautifully as model quantization improved.
10. Total Cost Breakdown
The GPUs are roughly 84% of the cost ($3,000 of ~$3,589). Everything else is commodity hardware you could replace for $50-100 per part. This is why the "start small, expand later" strategy works — the expensive part (the GPUs) is modular.
Software cost: $0. Ubuntu, CUDA toolkit, vLLM, llama.cpp, Ollama, and ComfyUI are all free and open source.
References
- Best Value GPU, "RTX 3090 Price Tracker US — February 2026," bestvaluegpu.com.
- r/LocalLLaMA, "RTX 3090 prices crashed and are back to baseline," reddit.com, June 2025.
- r/LocalLLaMA, "5x RTX 3090 GPU rig built on mostly used consumer hardware," reddit.com, August 2024.
- Google AI Developers, "Run Gemma 3 27B on your desktop GPU," x.com, April 2025.
- r/LocalLLaMA, "Dual 3090 Build," reddit.com, June 2024.
- r/LocalLLaMA, "New build: 3x RTX 3090," reddit.com, May 2024.
- NVIDIA, "CUDA Toolkit Documentation," nvidia.com.
- vLLM Project, "Easy, fast, and cheap LLM serving," github.com.
- llama.cpp, "Port of Facebook's LLaMA model in C/C++," github.com.
- Amazon, "EVGA GeForce RTX 3090 FTW3 Ultra Gaming," amazon.com.
- Amazon, "Kingwin 8-GPU Miner Rig Case Frame," amazon.com.
- Amazon, "ASUS Prime Z390-P LGA1151 6× PCIe 3.0 Motherboard," amazon.com.
This article was written collaboratively by Michel (human) and Yaneth (AI agent) as part of ThinkSmart.Life's research initiative. Prices reflect February 2026 market conditions and may fluctuate — always check current Amazon listings.