I’ve been wanting to run larger AI models locally on my Unraid server, but I was completely out of PCIe slots and didn’t want to rebuild my entire setup. Full-length GPUs also don’t fit well in most server cases. So I decided to try something different:
Run a full-size GPU in an external enclosure and connect it to my Unraid server using an OCuLink PCIe extension
And you know what? It worked. Really well. No BIOS hacks, no PCIe bifurcation, no soldering — just hardware and Docker containers.
Here’s how I added an external NVIDIA RTX 4070 Super to Unraid 7.2, and got it working with Ollama + Open-WebUI to run local AI models like LLaMA 3, Phi-3, and more — all accelerated by the GPU.
Hardware I Used
- Server OS: Unraid 7.2
- Platform: Intel system with a free PCIe slot
- GPU: NVIDIA RTX 4070 Super (12GB)
External GPU enclosure
PCIe 4.0 x4 OCuLink eGPU enclosure that converts back to a standard PCIe x16 slot. It fits full-length GPUs and includes cooling fans, but requires your own ATX or SFX power supply.
PCIe to OCuLink adapter (server side)
This card exposes a true external OCuLink port directly from a PCIe 4.0 x4 slot. No lane splitting, no PCIe bifurcation, no BIOS changes.
Other components
- Shielded SFF-8612 OCuLink cable (kept as short as possible)
- ATX power supply for the enclosure (Corsair 650W)
- NVIDIA Driver plugin for Unraid
Why This Works
This setup works because it’s just PCIe — extended outside the case.
The PCIe adapter exposes a single PCIe Gen4 x4 link over OCuLink. The enclosure converts that back into a PCIe x16 slot for the GPU.
From Unraid’s perspective, the GPU looks like it’s installed directly on the motherboard.
This avoids the sketchy solder mods and PCIe bifurcation hacks you’ll see mentioned in other OCuLink builds.
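If you’re curious whether the link actually negotiated at Gen4 x4, you can check it from the Unraid terminal. A quick sketch; the bus address (0b:00.0 here) is just an example and will be different on your system:
lspci | grep -i nvidia                  # note the GPU's bus address, e.g. 0b:00.0
lspci -vv -s 0b:00.0 | grep -i lnksta   # should report Speed 16GT/s, Width x4 for a Gen4 x4 OCuLink link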
What I’m Using It For
This external RTX 4070 Super is dedicated to local AI workloads.
- Ollama for running local models
- Open-WebUI as a browser-based chat interface
Models I’m currently running:
- LLaMA 3 (8B)
- Phi-3
- Qwen
- Smaller vision and utility models
Performance has been excellent. Depending on the model and quantization, I’m seeing roughly 50–120 tokens/sec, which is more than enough for everyday use.
It’s basically my own local ChatGPT — private, offline, and fully under my control.
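If you want to check the same kind of numbers on your own setup, Ollama prints timing stats when you run a model with --verbose. A minimal sketch, assuming the ollama container from the Docker setup below and the llama3:8b tag:
docker exec -it ollama ollama run llama3:8b --verbose
# the stats printed after each reply include "eval rate", i.e. generated tokens per second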
Getting It Working in Unraid
After everything was connected and powered on, the setup was simple.
Verify GPU detection:
lspci | grep -i nvidia
Install the NVIDIA Driver plugin from Community Apps and reboot.
Confirm the GPU is available:
nvidia-smi
Docker Setup
Ollama
docker run -d \
--name=ollama \
--gpus=all \
-p 11434:11434 \
-v ollama:/root/.ollama \
ollama/ollama
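Once the container is up, models are pulled and managed with the ollama CLI inside it. A quick example (the model tag is just one of the ones I listed above):
docker exec -it ollama ollama pull llama3:8b   # download the model into the ollama volume
docker exec -it ollama ollama list             # confirm what's installed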
Open-WebUI
docker run -d \
--name=open-webui \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--restart unless-stopped \
ghcr.io/open-webui/open-webui:main
Open-WebUI listens on port 8080 inside the container, so host port 3000 maps to 8080, and the volume keeps your chats and settings across container updates.
Access the UI at:
http://<unraid-ip>:3000
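As a quick end-to-end sanity check, you can hit the Ollama API directly and watch the GPU while it generates. A sketch, assuming the ports published by the containers above:
curl http://<unraid-ip>:11434/api/generate -d '{"model": "llama3:8b", "prompt": "Why is the sky blue?", "stream": false}'
# in another terminal, VRAM usage and GPU utilization should climb while the model generates
watch -n 1 nvidia-smi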
Final Thoughts
This ended up being far easier than I expected.
- No PCIe bifurcation
- No BIOS hacks
- No soldering
- No custom drivers
If you’re out of PCIe slots and want GPU power for AI, transcoding, or compute workloads, OCuLink is absolutely worth a look. With the right parts, it really does just work.