Run GPU Workloads in Docker
The recommended way to run a GPU workload on an E2E GPU node is in a container. The container ships the CUDA toolkit and framework versions your application needs; the host only provides the NVIDIA driver. This guide shows how to verify the setup and run a container with GPU access.
For host-side driver issues, see Troubleshoot GPU Nodes. For SSH access, see Connect to a Linux GPU node.
How GPU Containers Work
- The host runs the NVIDIA datacenter driver. The driver exposes the GPU to the kernel.
- The container ships the CUDA runtime, cuDNN, frameworks (PyTorch, TensorFlow, JAX, vLLM, TGI), and your application code.
- The NVIDIA Container Toolkit is the glue. It tells the Docker daemon to mount the driver, devices, and required libraries into the container at runtime.
A container's CUDA version can be older than the maximum CUDA version the host driver supports, but it cannot be newer. Before picking a base image, run nvidia-smi on the host and read the CUDA Version field in its header.
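The version rule above can be checked mechanically before you pull an image. The following is a minimal sketch, not an official tool: driver_cuda_max and cuda_fits are hypothetical helper names, the parsing assumes the "CUDA Version:" field that nvidia-smi prints in its header, and the comparison looks at major.minor only and relies on GNU sort -V.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of any NVIDIA or Docker tooling).

# Read nvidia-smi output on stdin and print the driver's maximum CUDA
# version, e.g. "12.4". Assumes a header field like "CUDA Version: 12.4".
driver_cuda_max() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed when the image's CUDA version (arg 2) is not newer than the
# driver's maximum (arg 1). Only major.minor is compared.
cuda_fits() {
  local driver_max image_cuda newest
  driver_max="$(printf '%s' "$1" | cut -d. -f1,2)"
  image_cuda="$(printf '%s' "$2" | cut -d. -f1,2)"
  # sort -V orders version strings; the image fits if the driver sorts last.
  newest="$(printf '%s\n%s\n' "$image_cuda" "$driver_max" | sort -V | tail -n 1)"
  [ "$newest" = "$driver_max" ]
}

# Example (on a GPU node):
#   max="$(nvidia-smi | driver_cuda_max)"
#   cuda_fits "$max" 12.4.1 && echo "a CUDA 12.4.1 base image fits this driver"
```

Comparing major.minor only is a deliberate simplification: the nvidia-smi header reports two components, while image tags such as 12.4.1 carry a patch level.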
Verify the Setup
E2E GPU images based on Ubuntu 22.04 ship Docker and the NVIDIA Container Toolkit pre-installed. Ubuntu 24.04-based images do not — install them first using the steps in the Install the NVIDIA Container Toolkit section below. After SSH login, run:
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
A healthy run prints the same GPU model, driver version, and CUDA version that nvidia-smi reports on the host, this time from inside the container.
If the run fails with unknown flag: --gpus or could not select device driver "" with capabilities: [[gpu]], the toolkit is missing or misconfigured. See Docker Cannot Access the GPU.
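If you script the verification, the two messages above can be told apart from the captured error text. This is a rough sketch under assumptions: gpu_error_hint is a hypothetical helper, it recognises only the two messages quoted above, and the hint for the first one assumes the usual cause, which is a Docker CLI older than 19.03 (the release that introduced --gpus).

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the stderr text of a failed "docker run --gpus"
# to a short hint. Only the two messages quoted in this guide are recognised.
gpu_error_hint() {
  case "$1" in
    *"unknown flag: --gpus"*)
      echo "docker CLI does not know --gpus: check the Docker version" ;;
    *"could not select device driver"*)
      echo "NVIDIA Container Toolkit missing or not configured for Docker" ;;
    *)
      echo "unrecognised error: see Troubleshoot GPU Nodes" ;;
  esac
}

# Example (on a GPU node): capture stderr only, then classify it on failure.
#   err="$(docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi 2>&1 >/dev/null)" \
#     || gpu_error_hint "$err"
```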
Install the NVIDIA Container Toolkit
This step is only needed if the toolkit is missing from your image or if you reinstalled the OS.
Ubuntu 22.04 / 24.04
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Rocky Linux 9
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
nvidia-docker2 is deprecated and is not compatible with Docker Engine 25 or later. Use nvidia-container-toolkit and the --gpus flag on all current GPU nodes.
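After either install path above, you can check that nvidia-ctk runtime configure registered a runtime with Docker. The sketch below assumes the entry lands in /etc/docker/daemon.json, the default location; has_nvidia_runtime is a hypothetical helper that uses a plain text match rather than a JSON parser, so treat a positive result as a hint, not proof.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given Docker daemon config contains an
# "nvidia" runtime key. Defaults to /etc/docker/daemon.json (assumed path).
has_nvidia_runtime() {
  local config="${1:-/etc/docker/daemon.json}"
  # A missing or unreadable file counts as "not configured".
  [ -r "$config" ] && grep -q '"nvidia"[[:space:]]*:' "$config"
}

# Example (on a GPU node):
#   if has_nvidia_runtime; then
#     echo "nvidia runtime registered"
#   else
#     echo "re-run: sudo nvidia-ctk runtime configure --runtime=docker"
#   fi
```

The final test remains the docker run command from Verify the Setup; this check only narrows down where a failure comes from.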
Common Run Patterns
Single GPU, Interactive Shell
docker run --rm -it --gpus all \
-v "$PWD":/workspace -w /workspace \
pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime bash
Inside the container:
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
Specific GPU(s) on a Multi-Card Node
# Only card 0
docker run --rm --gpus '"device=0"' nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
# Cards 0 and 1
docker run --rm --gpus '"device=0,1"' nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
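The nested quoting in the commands above is easy to get wrong: the inner double quotes must reach Docker, which parses the value as comma-separated fields, and the outer single quotes stop the shell from stripping them. A small sketch of a helper that builds the value from a list of indices follows; gpus_arg is a hypothetical name, not a Docker feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn GPU indices into a value for --gpus.
# With no arguments it prints "all"; otherwise it prints device=<i>,<j>,...
# wrapped in literal double quotes, matching the commands in this guide.
gpus_arg() {
  if [ "$#" -eq 0 ]; then
    echo all
    return
  fi
  local IFS=,            # "$*" joins the arguments with the first IFS character
  printf '"device=%s"\n' "$*"
}

# Example (on a GPU node):
#   docker run --rm --gpus "$(gpus_arg 0 1)" nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```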
Long-Running Service with Restart Policy
docker run -d --name inference \
--gpus all \
--restart unless-stopped \
-p 127.0.0.1:8080:8080 \
-v /data/models:/models \
your-image:tag
The -p 127.0.0.1:8080:8080 mapping publishes the port on the node's loopback interface only. If you want browser access, tunnel over SSH instead of opening the port in the security group.
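The SSH tunnel for a loopback-bound port can be sketched as follows. tunnel_cmd is a hypothetical helper that only prints the command, so you can review it before running it; the user name and address are placeholders, and the default port 8080 matches the docker run example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ssh command that forwards a local port to
# the same port on the node's loopback interface.
#   $1  SSH target, e.g. user@node-address (placeholder)
#   $2  port, defaults to 8080 to match the docker run example
tunnel_cmd() {
  local target="$1" port="${2:-8080}"
  # -N: run no remote command; -L: forward local port to 127.0.0.1 on the node
  printf 'ssh -N -L %s:127.0.0.1:%s %s\n' "$port" "$port" "$target"
}

# Example (on your workstation; the address is a placeholder):
#   tunnel_cmd user@203.0.113.10
#   # run the printed command, then open http://localhost:8080
```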
Picking a Base Image
| Use case | Recommended base |
|---|---|
| Lightweight verification | nvidia/cuda:12.4.1-base-ubuntu22.04 |
| CUDA development | nvidia/cuda:12.4.1-devel-ubuntu22.04 (includes nvcc) |
| PyTorch training | pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime |
| TensorFlow training | tensorflow/tensorflow:2.16.1-gpu |
| vLLM inference | vllm/vllm-openai:latest |
| Text Generation Inference | ghcr.io/huggingface/text-generation-inference:latest |
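Once your workload is stable, pin to a specific version tag instead of latest so that redeployments pull the same image. This applies in particular to the vLLM and TGI rows above, which list latest.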
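A deployment script can refuse unpinned image references before it calls docker run. The check below is a sketch under simple assumptions: is_pinned is a hypothetical helper, it treats a missing tag the same as latest (Docker's default), and it accepts any other tag or a sha256 digest as pinned.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if an image reference carries a tag other
# than "latest", or a sha256 digest.
is_pinned() {
  local ref="$1" name
  case "$ref" in
    *@sha256:*) return 0 ;;    # digest reference
  esac
  name="${ref##*/}"            # last path component, e.g. "vllm-openai:latest"
  case "$name" in
    *:latest) return 1 ;;
    *:*)      return 0 ;;
    *)        return 1 ;;      # no tag: Docker falls back to latest
  esac
}

# Example:
#   is_pinned "your-image:tag" || echo "warning: image is not pinned" >&2
```

Only the last path component is inspected for a tag, so a registry port such as localhost:5000 is not mistaken for one.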
Storage Considerations
GPU container images are large — PyTorch and TensorFlow images are 5–10 GB; vLLM and TGI add model weights on top. The root disk fills fast.
- Mount large datasets and model weights from an attached block storage volume or Parallel File Storage, not the root disk.
- Bake your application image once and reuse it. See Bake and Reuse a GPU Image.
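- Run docker system prune -af --volumes to reclaim space from stopped containers, unused images, and unused volumes. The -a flag removes every image that no container uses, not only dangling ones, so run it only when you can afford to pull those images again.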
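To see when the root disk is getting full, a short check like the one below can run before a pull or a build. It is a sketch: disk_usage_pct is a hypothetical helper that reads the capacity column of POSIX df -P output, and the 80 percent threshold in the example is an arbitrary illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the used-space percentage (integer, no "%")
# of the filesystem that holds the given path (default: /).
# Relies on "df -P", where column 5 of the second line is the capacity.
disk_usage_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example (on a GPU node; 80 is an arbitrary threshold):
#   if [ "$(disk_usage_pct /)" -ge 80 ]; then
#     docker system df    # shows space used by images, containers, volumes
#   fi
```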
Related Resources
| Resource | Use it for |
|---|---|
| Troubleshoot GPU Nodes | Driver and CUDA issues, container-toolkit failures. |
| Bake and Reuse a GPU Image | Skip multi-GB first-boot installs on new nodes. |
| Serve LLM Inference | Run vLLM or TGI on a GPU node. |
| Block Storage | Attach additional volumes for datasets and model artifacts. |
| Connect to a Linux GPU node | SSH and verify the driver. |