Run GPU Workloads in Docker

The recommended way to run a GPU workload on an E2E GPU node is in a container. The container ships the CUDA toolkit and framework versions your application needs; the host only provides the NVIDIA driver. This guide shows how to verify the setup and run a container with GPU access.

For host-side driver issues, see Troubleshoot GPU Nodes. For SSH access, see Connect to a Linux GPU node.


How GPU Containers Work

The host runs the NVIDIA datacenter driver. The driver exposes the GPU to the kernel.
The container ships the CUDA runtime, cuDNN, frameworks (PyTorch, TensorFlow, JAX, vLLM, TGI), and your application code.
The NVIDIA Container Toolkit is the glue. It tells the Docker daemon to mount the driver, devices, and required libraries into the container at runtime.

The CUDA version inside the container can be older than or equal to the highest CUDA version the host driver supports, but not newer. Before picking a base image, run nvidia-smi on the host and read the CUDA Version field in its header, which reports that maximum.
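The compatibility rule above can be checked before pulling an image. The following is a minimal sketch, assuming nvidia-smi prints its usual "CUDA Version: X.Y" header field and that sort supports -V (GNU coreutils). The function names parse_max_cuda and cuda_le are illustrative and not part of any NVIDIA tool. Pass major.minor values only, since that is what the driver reports.

```shell
# Hypothetical helpers; the names are illustrative.

# Read nvidia-smi output on stdin and print the "X.Y" value
# from its "CUDA Version: X.Y" header field.
parse_max_cuda() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed when version $1 is less than or equal to version $2.
cuda_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage on a GPU node (12.1 stands for the CUDA version of the image you plan to run):
#   max=$(nvidia-smi | parse_max_cuda)
#   cuda_le 12.1 "$max" && echo "image CUDA is within the driver's supported range"
```

Keeping the parser separate from the comparison means the comparison can be reused with a version read from an image tag.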

Verify the Setup

E2E GPU images based on Ubuntu 22.04 ship with Docker and the NVIDIA Container Toolkit pre-installed. Ubuntu 24.04-based images do not; install them first by following the Install the NVIDIA Container Toolkit section below. After logging in over SSH, run:

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

A healthy run prints the same card, driver version, and CUDA version as nvidia-smi on the host - but produced from inside the container.

If the run fails with unknown flag: --gpus or could not select device driver "" with capabilities: [[gpu]], the toolkit is missing or misconfigured. See Docker Cannot Access the GPU.
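When the verification run fails, it can help to capture the error text and match it against the two messages quoted above. This is a rough sketch only: explain_gpu_error is a hypothetical helper, and the causes it prints are likely explanations rather than a definitive diagnosis.

```shell
# Hypothetical helper: map a docker error message ($1) to a likely cause.
# The two patterns are the error strings quoted in this section.
explain_gpu_error() {
  case "$1" in
    *"unknown flag: --gpus"*)
      echo "Docker CLI does not recognize --gpus: check the Docker installation and version" ;;
    *"could not select device driver"*)
      echo "NVIDIA runtime not registered with Docker: install or reconfigure the toolkit" ;;
    *)
      echo "unrecognized error: see Docker Cannot Access the GPU" ;;
  esac
}

# Usage on a GPU node: keep stderr, discard stdout, classify on failure.
#   err=$(docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi 2>&1 >/dev/null) \
#     || explain_gpu_error "$err"
```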

Install the NVIDIA Container Toolkit

This step is needed only if the toolkit is missing from your image or if you reinstalled the OS.

Ubuntu 22.04 / 24.04

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Rocky Linux 9

curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
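On either distribution, you can confirm that the configure step took effect before running a workload. The sketch below assumes nvidia-ctk wrote its runtime entry to the default Docker daemon config at /etc/docker/daemon.json. The function name nvidia_runtime_configured is illustrative.

```shell
# Hypothetical helper: succeed when the Docker daemon config at $1
# (default /etc/docker/daemon.json) references the nvidia-container-runtime binary.
nvidia_runtime_configured() {
  grep -q 'nvidia-container-runtime' "${1:-/etc/docker/daemon.json}" 2>/dev/null
}

# Usage on a GPU node:
#   nvidia_runtime_configured && echo "runtime registered" \
#     || echo "rerun: sudo nvidia-ctk runtime configure --runtime=docker"
#   docker info --format '{{json .Runtimes}}'   # should list "nvidia" after the restart
```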

Warning: nvidia-docker2 is deprecated and is not compatible with Docker Engine 25 or later. Use nvidia-container-toolkit and the --gpus flag on all current GPU nodes.

Common Run Patterns

Single GPU, Interactive Shell

docker run --rm -it --gpus all \
  -v "$PWD":/workspace -w /workspace \
  pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime bash

Inside the container:

python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"

Specific GPU(s) on a Multi-Card Node

# Only card 0
docker run --rm --gpus '"device=0"' nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# Cards 0 and 1
docker run --rm --gpus '"device=0,1"' nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
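The nested quoting in the device examples above is easy to get wrong when the card list comes from a script. A small sketch, assuming you want to build the same "device=..." value from a list of card indices. gpus_value is a hypothetical helper name.

```shell
# Hypothetical helper: build the value for --gpus from one or more card indices.
# The printed double quotes are part of the value, matching the examples above.
gpus_value() {
  [ "$#" -gt 0 ] || return 1
  # "$*" joins the arguments with the first character of IFS, here a comma.
  printf '"device=%s"\n' "$(IFS=,; echo "$*")"
}

# Usage on a multi-card node:
#   docker run --rm --gpus "$(gpus_value 0 1)" nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```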

Long-Running Service with Restart Policy

docker run -d --name inference \
  --gpus all \
  --restart unless-stopped \
  -p 127.0.0.1:8080:8080 \
  -v /data/models:/models \
  your-image:tag

Bind the published port to 127.0.0.1 and tunnel over SSH if you want browser access, instead of opening the port in the security group.
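The SSH tunnel mentioned above can be opened with local port forwarding. This is a sketch under assumptions: user@gpu-node is a placeholder for your own SSH destination, open_tunnel is a hypothetical helper, and the port defaults to 8080 to match the run command in this section.

```shell
# Hypothetical helper: forward a local port to the same port on the node's loopback interface.
# $1 is the SSH destination (user@host); $2 is the port, defaulting to 8080.
open_tunnel() {
  # -N: do not run a remote command; -L: local port forwarding.
  ssh -N -L "${2:-8080}:127.0.0.1:${2:-8080}" "$1"
}

# Usage from your workstation (user@gpu-node is a placeholder):
#   open_tunnel user@gpu-node 8080
# While the tunnel is open, the service answers on http://127.0.0.1:8080 locally.
```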

Picking a Base Image

| Use case | Recommended base |
| --- | --- |
| Lightweight verification | `nvidia/cuda:12.4.1-base-ubuntu22.04` |
| CUDA development | `nvidia/cuda:12.4.1-devel-ubuntu22.04` (includes `nvcc`) |
| PyTorch training | `pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime` |
| TensorFlow training | `tensorflow/tensorflow:2.16.1-gpu` |
| vLLM inference | `vllm/vllm-openai:latest` |
| Text Generation Inference | `ghcr.io/huggingface/text-generation-inference:latest` |

Pin to a specific tag (not latest) for reproducibility once your workload is stable.
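A deployment script can refuse unpinned image references before they reach production. The sketch below treats a reference as pinned when it has a digest or an explicit tag other than latest. is_pinned is a hypothetical helper, and the rule it encodes is one reasonable reading of the advice above, not a Docker feature.

```shell
# Hypothetical helper: succeed when image reference $1 is pinned, meaning it
# carries a digest (@sha256:...) or an explicit tag other than "latest".
is_pinned() {
  case "$1" in
    *@sha256:*) return 0 ;;
  esac
  # Inspect only the final path component so a registry port
  # (for example localhost:5000/...) is not mistaken for a tag.
  case "${1##*/}" in
    *:latest) return 1 ;;
    *:*)      return 0 ;;
    *)        return 1 ;;  # no tag: Docker defaults to "latest"
  esac
}

# Usage:
#   is_pinned vllm/vllm-openai:latest || echo "pin this image before deploying"
```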

Storage Considerations

GPU container images are large - PyTorch and TensorFlow images are 5–10 GB; vLLM and TGI add model weights on top. The root disk fills fast.

Mount large datasets and model weights from an attached block storage volume or Parallel File Storage, not the root disk.
Bake your application image once and reuse it. See Bake and Reuse a GPU Image.
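Run docker system prune -af --volumes to reclaim space. This removes stopped containers, every image not used by a container (not only dangling ones), unused networks, build cache, and unused volumes, so confirm that nothing you still need lives in those volumes first.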
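To see how close the disk is to full before pruning, the usage can be read from df. A minimal sketch, assuming Docker keeps its data in the default /var/lib/docker directory. disk_used_pct is a hypothetical helper, and the 80 percent threshold is an arbitrary example value.

```shell
# Hypothetical helper: print the used percentage (digits only)
# of the filesystem that holds path $1, using POSIX df output.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage on a GPU node:
#   docker system df    # space used by images, containers, volumes, and build cache
#   [ "$(disk_used_pct /var/lib/docker)" -ge 80 ] && echo "root disk is filling up"
```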

Related Resources

| Resource | Use it for |
| --- | --- |
| Troubleshoot GPU Nodes | Driver and CUDA issues, container-toolkit failures. |
| Bake and Reuse a GPU Image | Skip multi-GB first-boot installs on new nodes. |
| Serve LLM Inference | Run vLLM or TGI on a GPU node. |
| Block Storage | Attach additional volumes for datasets and model artifacts. |
| Connect to a Linux GPU node | SSH and verify the driver. |


Last updated on June 26, 2026.
