
Run GPU Workloads in Docker

The recommended way to run a GPU workload on an E2E GPU node is in a container. The container ships the CUDA toolkit and framework versions your application needs; the host only provides the NVIDIA driver. This guide shows how to verify the setup and run a container with GPU access.

For host-side driver issues, see Troubleshoot GPU Nodes. For SSH access, see Connect to a Linux GPU node.


How GPU Containers Work

  • The host runs the NVIDIA datacenter driver, which exposes each GPU to user space through /dev/nvidia* device nodes and the driver libraries (such as libcuda).
  • The container ships the CUDA runtime, cuDNN, frameworks (PyTorch, TensorFlow, JAX, vLLM, TGI), and your application code.
  • The NVIDIA Container Toolkit is the glue. It tells the Docker daemon to mount the driver, devices, and required libraries into the container at runtime.

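A container's CUDA version can be older than the highest CUDA version the host driver supports, but not newer. Before picking a base image, check the CUDA Version field in the nvidia-smi header on the host; it reports the maximum CUDA version the driver supports, not a toolkit installed on the host.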
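The same check can be scripted. This is a minimal POSIX shell sketch: driver_max_cuda and cuda_fits are hypothetical helper names, the parser assumes the nvidia-smi header contains a field of the form CUDA Version: 12.4, and only major.minor versions are compared.

```shell
#!/bin/sh
# Print the highest CUDA version the driver supports (for example 12.4),
# parsed from nvidia-smi output on stdin so it can be tested without a GPU.
driver_max_cuda() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed (exit 0) if the image's CUDA major.minor ($2) is not newer than
# the driver's maximum ($1). Both arguments look like 12.4.
cuda_fits() {
  awk -v d="$1" -v i="$2" 'BEGIN {
    split(d, dv, "[.]"); split(i, iv, "[.]")
    if (iv[1] + 0 < dv[1] + 0) exit 0    # older major version: fine
    if (iv[1] + 0 > dv[1] + 0) exit 1    # newer major version: too new
    if (iv[2] + 0 <= dv[2] + 0) exit 0   # same major: compare minor
    exit 1
  }'
}
```

For example, cuda_fits "$(nvidia-smi | driver_max_cuda)" 12.1 succeeds only when the host driver supports CUDA 12.1 or newer, which is what an image built on CUDA 12.1 needs.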


Verify the Setup

E2E GPU images based on Ubuntu 22.04 ship with Docker and the NVIDIA Container Toolkit pre-installed. Ubuntu 24.04-based images do not, so install Docker Engine first and then the toolkit, using the steps in the Install the NVIDIA Container Toolkit section below. After logging in over SSH, run:

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

A healthy run prints the same GPUs, driver version, and CUDA version as nvidia-smi on the host, except that the output is produced from inside the container.

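If the run fails with unknown flag: --gpus, the Docker Engine is older than 19.03, which introduced the --gpus flag, and must be upgraded. If it fails with could not select device driver "" with capabilities: [[gpu]], the toolkit is missing, or Docker was not configured for it and restarted. See Docker Cannot Access the GPU.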
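The two failure messages above can be told apart automatically. This is a minimal POSIX shell sketch; gpu_run_hint and verify_gpu_docker are hypothetical helpers, and the hints only cover the two error strings named in this section.

```shell
#!/bin/sh
# Map the stderr text of a failed GPU container run to a suggested next step.
gpu_run_hint() {
  case "$1" in
    *"unknown flag: --gpus"*)
      echo "Docker Engine is older than 19.03 and has no --gpus flag; upgrade Docker." ;;
    *"could not select device driver"*)
      echo "NVIDIA Container Toolkit missing or not registered: install it, run sudo nvidia-ctk runtime configure --runtime=docker, then restart Docker." ;;
    *)
      echo "Unrecognized error; see Troubleshoot GPU Nodes." ;;
  esac
}

# Run the verification container; on failure, capture stderr and print a hint.
verify_gpu_docker() {
  err=$(docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi 2>&1 >/dev/null) && return 0
  gpu_run_hint "$err"
  return 1
}
```

Call verify_gpu_docker after logging in; it returns 0 silently when the container can see the GPU.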


Install the NVIDIA Container Toolkit

Only needed if the toolkit is missing from your image, or if you reinstalled the OS.

Ubuntu 22.04 / 24.04

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Rocky Linux 9

curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Warning: nvidia-docker2 is deprecated and is not compatible with Docker Engine 25 or later. Use nvidia-container-toolkit and the --gpus flag on all current GPU nodes.
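After either install path, one way to confirm that the configure step took effect is to look for the NVIDIA runtime entry that nvidia-ctk runtime configure --runtime=docker adds to /etc/docker/daemon.json. This is a minimal POSIX shell sketch; has_nvidia_runtime and toolkit_installed are hypothetical helpers, and the config path defaults to the standard location unless you pass another one.

```shell
#!/bin/sh
# Succeed if the Docker daemon config registers nvidia-container-runtime.
# Takes the config path as $1 (default /etc/docker/daemon.json) so it can be tested.
has_nvidia_runtime() {
  cfg=${1:-/etc/docker/daemon.json}
  [ -r "$cfg" ] && grep -q 'nvidia-container-runtime' "$cfg"
}

# Succeed if the nvidia-ctk CLI from the toolkit is on PATH.
toolkit_installed() {
  command -v nvidia-ctk >/dev/null 2>&1
}
```

If both succeed but docker run --gpus still fails, the daemon has probably not been restarted since the configure step; after a restart, docker info also lists nvidia under Runtimes.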


Common Run Patterns

Interactive Shell with All GPUs

docker run --rm -it --gpus all \
-v "$PWD":/workspace -w /workspace \
pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime bash

Inside the container:

python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"

Specific GPU(s) on a Multi-Card Node

# Only card 0
docker run --rm --gpus '"device=0"' nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# Cards 0 and 1
docker run --rm --gpus '"device=0,1"' nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
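The nested quotes are needed because Docker reads the --gpus value as comma-separated fields, so a device list that contains a comma must be wrapped in literal double quotes. When the GPU indices come from a variable, a small helper keeps that quoting correct. This is a minimal sketch; gpus_arg is a hypothetical helper, not a Docker feature.

```shell
#!/bin/sh
# Build the --gpus value for a list of GPU indices, e.g. 0 1 -> "device=0,1"
# (with the double quotes included). With no arguments, fall back to "all".
gpus_arg() {
  [ "$#" -gt 0 ] || { echo all; return; }
  ids=$(printf '%s,' "$@")          # 0 1 -> 0,1,
  printf '"device=%s"\n' "${ids%,}" # strip trailing comma, add quotes
}
```

Usage: docker run --rm --gpus "$(gpus_arg 0 1)" nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi passes the same value as '"device=0,1"' above.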

Long-Running Service with Restart Policy

docker run -d --name inference \
--gpus all \
--restart unless-stopped \
-p 127.0.0.1:8080:8080 \
-v /data/models:/models \
your-image:tag

Publishing the port on 127.0.0.1 keeps the service off the node's public interface. For browser access, tunnel the port over SSH instead of opening it in the security group.
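The tunnel runs on your workstation, not on the node. This minimal sketch prints the SSH local-forwarding command for the service above; tunnel_cmd is a hypothetical helper, and the user name and host are placeholders for your node's login user and public IP.

```shell
#!/bin/sh
# Print an SSH command that forwards local port $3 (default 8080) to the
# same port on the node's loopback interface. -N: no remote command.
tunnel_cmd() {
  user=$1 host=$2 port=${3:-8080}
  printf 'ssh -N -L %s:127.0.0.1:%s %s@%s\n' "$port" "$port" "$user" "$host"
}
```

Run the printed command locally, keep it open, and browse to http://localhost:8080.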


Picking a Base Image

| Use case | Recommended base |
|---|---|
| Lightweight verification | nvidia/cuda:12.4.1-base-ubuntu22.04 |
| CUDA development | nvidia/cuda:12.4.1-devel-ubuntu22.04 (includes nvcc) |
| PyTorch training | pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime |
| TensorFlow training | tensorflow/tensorflow:2.16.1-gpu |
| vLLM inference | vllm/vllm-openai:latest |
| Text Generation Inference | ghcr.io/huggingface/text-generation-inference:latest |

The vLLM and TGI rows use latest only as a starting point. Once your workload is stable, pin every image to a specific version tag so that pulls and rebuilds are reproducible.
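To catch unpinned references in deployment scripts, a check like the following can help. This is a minimal POSIX shell sketch; is_pinned is a hypothetical helper that treats an untagged reference (which Docker resolves to :latest) or an explicit :latest tag as unpinned, and a version tag or @sha256 digest as pinned.

```shell
#!/bin/sh
# Return 0 if an image reference is pinned, 1 if it floats on :latest.
is_pinned() {
  ref=$1
  case "$ref" in
    *@sha256:*) return 0 ;;   # pinned by content digest
  esac
  name=${ref##*/}             # drop registry/namespace (a registry may contain host:port)
  case "$name" in
    *:latest) return 1 ;;     # explicit latest tag
    *:?*) return 0 ;;         # any other tag
    *) return 1 ;;            # no tag: Docker defaults to :latest
  esac
}
```

For example, is_pinned vllm/vllm-openai:latest fails, while is_pinned tensorflow/tensorflow:2.16.1-gpu succeeds.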


Storage Considerations

GPU container images are large: PyTorch and TensorFlow images are 5–10 GB, and vLLM and TGI deployments add model weights on top of that. The root disk fills fast.

  • Mount large datasets and model weights from an attached block storage volume or Parallel File Storage, not the root disk.
  • Bake your application image once and reuse it. See Bake and Reuse a GPU Image.
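  • Run docker system prune -af --volumes to reclaim space. It removes all stopped containers, unused networks, build cache, unused volumes, and every image not used by a container (not only dangling images), so make sure nothing you need lives only in a stopped container or an unused volume.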
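Before pulling a multi-GB image, it can help to confirm that the disk holding Docker's data has room. This is a minimal sketch assuming Docker's default data root /var/lib/docker and POSIX df -P -k output; has_free_gb is a hypothetical helper.

```shell
#!/bin/sh
# Succeed only if the filesystem containing $1 has at least $2 GB available.
# df -P -k prints a header line, then one line whose 4th field is available KiB.
has_free_gb() {
  path=$1 min_gb=$2
  avail_kb=$(df -P -k "$path" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}
```

For example, has_free_gb /var/lib/docker 20 || echo "less than 20 GB free"; docker system df then shows how much of that space images, containers, local volumes, and build cache use.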

| Resource | Use it for |
|---|---|
| Troubleshoot GPU Nodes | Driver and CUDA issues, container-toolkit failures. |
| Bake and Reuse a GPU Image | Skip multi-GB first-boot installs on new nodes. |
| Serve LLM Inference | Run vLLM or TGI on a GPU node. |
| Block Storage | Attach additional volumes for datasets and model artifacts. |
| Connect to a Linux GPU node | SSH and verify the driver. |
Last updated on May 26, 2026.