I’ve been wanting to run larger AI models locally on my Unraid server, but I was completely out of PCIe slots and didn’t want to rebuild my entire setup. Full-length GPUs also don’t fit well in most server cases. So I decided to try something different:
Run a full-size GPU in an external enclosure and connect it to my Unraid server using an OCuLink PCIe extension
And you know what? It worked. Really well. No BIOS hacks, no PCIe bifurcation, no soldering — just hardware and Docker containers.
Here’s how I added an external NVIDIA RTX 4070 Super to Unraid 7.2, and got it working with Ollama + Open-WebUI to run local AI models like LLaMA 3, Phi-3, and more — all accelerated by the GPU.
Hardware I Used
- Server OS: Unraid 7.2
- Platform: Intel system with a free PCIe slot
- GPU: NVIDIA RTX 4070 Super (12GB)
External GPU enclosure
PCIe 4.0 x4 OCuLink eGPU enclosure that converts back to a standard PCIe x16 slot. It fits full-length GPUs and includes cooling fans, but requires your own ATX or SFX power supply.
PCIe to OCuLink adapter (server side)
This card exposes a true external OCuLink port directly from a PCIe 4.0 x4 slot. No lane splitting, no PCIe bifurcation, no BIOS changes.
Other components
- Shielded SFF-8612 OCuLink cable (kept as short as possible)
- ATX power supply for the enclosure (Corsair 650W)
- NVIDIA Driver plugin for Unraid
Why This Works
This setup works because it’s just PCIe — extended outside the case.
The PCIe adapter exposes a single PCIe Gen4 x4 link over OCuLink. The enclosure converts that back into a PCIe x16 slot for the GPU.
From Unraid’s perspective, the GPU looks like it’s installed directly on the motherboard.
This avoids the sketchy solder mods and PCIe bifurcation hacks you’ll see mentioned in other OCuLink builds.
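If you’re curious whether the link actually negotiated at Gen4 x4, you can check it from the Unraid terminal. A quick sketch; the bus address (0b:00.0 here) is just an example and will be different on your system:
lspci | grep -i nvidia                  # note the GPU's bus address, e.g. 0b:00.0
lspci -vv -s 0b:00.0 | grep -i lnksta   # should report Speed 16GT/s, Width x4 for a Gen4 x4 OCuLink link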
What I’m Using It For
This external RTX 4070 Super is dedicated to local AI workloads.
- Ollama for running local models
- Open-WebUI as a browser-based chat interface
Models I’m currently running:
- LLaMA 3 (8B)
- Phi-3
- Qwen
- Smaller vision and utility models
Performance has been excellent. Depending on the model and quantization, I’m seeing roughly 50–120 tokens/sec, which is more than enough for everyday use.
It’s basically my own local ChatGPT — private, offline, and fully under my control.
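If you want to check the same kind of numbers on your own setup, Ollama prints timing stats when you run a model with --verbose. A minimal sketch, assuming the ollama container from the Docker setup below and the llama3:8b tag:
docker exec -it ollama ollama run llama3:8b --verbose
# the stats printed after each reply include "eval rate", i.e. generated tokens per second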
Getting It Working in Unraid
After everything was connected and powered on, the setup was simple.
Verify GPU detection:
lspci | grep -i nvidia
Install the NVIDIA Driver plugin from Community Apps and reboot.
Confirm the GPU is available:
nvidia-smi
Docker Setup
Ollama
docker run -d \
--name=ollama \
--gpus=all \
-p 11434:11434 \
-v ollama:/root/.ollama \
ollama/ollama
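Once the container is up, models are pulled and managed with the ollama CLI inside it. A quick example (the model tag is just one of the ones I listed above):
docker exec -it ollama ollama pull llama3:8b   # download the model into the ollama volume
docker exec -it ollama ollama list             # confirm what's installed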
Open-WebUI
docker run -d \
--name=open-webui \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--restart unless-stopped \
ghcr.io/open-webui/open-webui:main
Open-WebUI listens on port 8080 inside the container, so host port 3000 maps to 8080, and the volume keeps your chats and settings across container updates.
Access the UI at:
http://<unraid-ip>:3000
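As a quick end-to-end sanity check, you can hit the Ollama API directly and watch the GPU while it generates. A sketch, assuming the ports published by the containers above:
curl http://<unraid-ip>:11434/api/generate -d '{"model": "llama3:8b", "prompt": "Why is the sky blue?", "stream": false}'
# in another terminal, VRAM usage and GPU utilization should climb while the model generates
watch -n 1 nvidia-smi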
Final Thoughts
This ended up being far easier than I expected.
- No PCIe bifurcation
- No BIOS hacks
- No soldering
- No custom drivers
If you’re out of PCIe slots and want GPU power for AI, transcoding, or compute workloads, OCuLink is absolutely worth a look. With the right parts, it really does just work.