---
title: Run GPU Workloads in Docker
---

# Run GPU Workloads in Docker

The recommended way to run a GPU workload on an E2E GPU node is in a container. The container ships the CUDA toolkit and framework versions your application needs; the host only provides the NVIDIA driver. This guide shows how to verify the setup and run a container with GPU access.

For host-side driver issues, see [Troubleshoot GPU Nodes](/docs/myaccount/gpu/troubleshoot). For SSH access, see [Connect to a Linux GPU node](/docs/myaccount/gpu/connect-to-gpu/linux-gpu-node).

---

## How GPU Containers Work

- The **host** runs the NVIDIA datacenter driver. The driver is a kernel module that exposes the GPU devices to user space.
- The **container** ships the CUDA runtime, cuDNN, frameworks (PyTorch, TensorFlow, JAX, vLLM, TGI), and your application code.
- The **NVIDIA Container Toolkit** is the glue. It tells the Docker daemon to mount the driver, devices, and required libraries into the container at runtime.

A container's CUDA can be older than the driver's supported CUDA, but it cannot be newer. Before picking a base image, run `nvidia-smi` on the host and read the `CUDA Version` field in the header: that is the maximum CUDA version the installed driver supports.
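
The compatibility rule above can be checked before pulling an image. The following is a minimal sketch, assuming GNU `sort -V` is available (it is on Ubuntu and Rocky Linux) and that the `CUDA Version:` field in the `nvidia-smi` header reports the host's maximum. The function names and the example image version `12.4` are illustrative, not part of any E2E tooling.

```bash
# Succeeds when the image's CUDA version (major.minor) is not newer than the host's maximum.
cuda_image_ok() {
  local host_max="$1" image_cuda="$2"
  # sort -V orders version strings; the image is fine if it sorts first (or is equal).
  [ "$(printf '%s\n%s\n' "$image_cuda" "$host_max" | sort -V | head -n1)" = "$image_cuda" ]
}

# Extract "X.Y" from the "CUDA Version: X.Y" field in the nvidia-smi header.
host_cuda_max() {
  nvidia-smi | sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p' | head -n1
}

if command -v nvidia-smi >/dev/null 2>&1; then
  host_max="$(host_cuda_max)"
  # 12.4 is the major.minor of the example tag nvidia/cuda:12.4.1-base-ubuntu22.04.
  if [ -n "$host_max" ] && cuda_image_ok "$host_max" "12.4"; then
    echo "CUDA 12.4 image is within the host maximum ($host_max)"
  else
    echo "CUDA 12.4 image is newer than the host maximum (${host_max:-unknown})"
  fi
fi
```

Compare only the major.minor part of the image tag: `nvidia-smi` does not report a patch level, so passing `12.4.1` would sort as newer than a host value of `12.4`.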

---

## Verify the Setup

E2E GPU images based on Ubuntu 22.04 ship Docker and the NVIDIA Container Toolkit pre-installed. Ubuntu 24.04-based images do not — install them first using the steps in the [Install the NVIDIA Container Toolkit](#install-the-nvidia-container-toolkit) section below. After SSH login, run:

```bash
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```

A healthy run prints the same card, driver version, and CUDA version as `nvidia-smi` on the host — but produced from inside the container.

If the run fails with `could not select device driver "" with capabilities: [[gpu]]`, the toolkit is missing or misconfigured. If it fails with `unknown flag: --gpus`, the installed Docker predates version 19.03, which introduced the flag, and must be upgraded. See [Docker Cannot Access the GPU](/docs/myaccount/gpu/troubleshoot#docker-cannot-access-the-gpu).
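
Before opening the troubleshooting page, a quick first look at the Docker side can narrow things down. This is a rough sketch that checks whether an `nvidia` runtime is registered with the daemon, which is what `nvidia-ctk runtime configure --runtime=docker` sets up. The helper name `has_nvidia_runtime` is hypothetical, and a missing runtime entry is only a hint, not proof that `--gpus` will fail.

```bash
# Reads Docker's runtime list as JSON on stdin; succeeds if an "nvidia" runtime appears in it.
has_nvidia_runtime() {
  grep -q '"nvidia"'
}

if command -v docker >/dev/null 2>&1; then
  docker --version
  # docker info needs access to the daemon; run with sudo if your user is not in the docker group.
  if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime; then
    echo "nvidia runtime is registered with Docker"
  else
    echo "nvidia runtime not found; check /etc/docker/daemon.json"
  fi
else
  echo "docker is not installed or not on PATH"
fi
```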

---

## Install the NVIDIA Container Toolkit

Only needed if the toolkit is missing from your image, or if you reinstalled the OS.
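
To decide whether this section applies, you can check for the `nvidia-ctk` binary and read the distribution ID. The snippet below is a small sketch under the assumption that the node has a standard `/etc/os-release`; `distro_id` is an illustrative helper name.

```bash
# Print the distribution ID (for example "ubuntu" or "rocky") from an os-release file.
distro_id() {
  local file="${1:-/etc/os-release}"
  [ -r "$file" ] || return 1
  # Source in a subshell so the file's variables do not leak into the current shell.
  ( . "$file" && printf '%s\n' "$ID" )
}

if command -v nvidia-ctk >/dev/null 2>&1; then
  echo "toolkit already installed: $(nvidia-ctk --version | head -n1)"
else
  echo "toolkit not found; follow the steps for: $(distro_id || echo unknown)"
fi
```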

### Ubuntu 22.04 / 24.04

```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

### Rocky Linux 9

```bash
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

:::warning
`nvidia-docker2` is deprecated and is not compatible with Docker Engine 25 or later. Use `nvidia-container-toolkit` and the `--gpus` flag on all current GPU nodes.
:::

---

## Common Run Patterns

### Single GPU, Interactive Shell

```bash
docker run --rm -it --gpus all \
  -v "$PWD":/workspace -w /workspace \
  pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime bash
```

Inside the container:

```bash
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```

### Specific GPU(s) on a Multi-Card Node

```bash
# Only card 0
docker run --rm --gpus '"device=0"' nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# Cards 0 and 1
docker run --rm --gpus '"device=0,1"' nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```
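
The nested quoting in `'"device=0,1"'` is easy to get wrong: the inner double quotes must reach Docker literally so the comma is not treated as a field separator. As a sketch, a small helper can build the value from a list of indices; `gpus_arg` is a hypothetical name, not a Docker feature.

```bash
# Build the value for --gpus from a list of GPU indices, keeping the literal double quotes.
gpus_arg() {
  local IFS=,            # "$*" joins the arguments with the first character of IFS
  printf '"device=%s"' "$*"
}

gpus_arg 0 1; echo       # prints "device=0,1" including the double quotes

# Usage (not executed here):
#   docker run --rm --gpus "$(gpus_arg 0 1)" nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```

Run `nvidia-smi -L` on the host to list the cards and their indices before choosing which ones to pass.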

### Long-Running Service with Restart Policy

```bash
docker run -d --name inference \
  --gpus all \
  --restart unless-stopped \
  -p 127.0.0.1:8080:8080 \
  -v /data/models:/models \
  your-image:tag
```

Bind the published port to `127.0.0.1` and tunnel over SSH if you want browser access, instead of opening the port in the security group.
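
The tunnel described above is a standard SSH local port forward. The sketch below only prints the command to run on your workstation; the login `user@203.0.113.10` is a placeholder (a documentation-range address), and `tunnel_cmd` is an illustrative helper name.

```bash
# Print the ssh command that forwards a local port to the same port on the node's loopback.
tunnel_cmd() {
  local node="$1" port="${2:-8080}"
  # -N: do not run a remote command; -L: forward local PORT to 127.0.0.1:PORT on the node.
  printf 'ssh -N -L %s:127.0.0.1:%s %s\n' "$port" "$port" "$node"
}

tunnel_cmd user@203.0.113.10 8080
# prints: ssh -N -L 8080:127.0.0.1:8080 user@203.0.113.10
```

While the printed command is running, `http://127.0.0.1:8080` on your workstation reaches the service in the container.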

---

## Picking a Base Image

| Use case                  | Recommended base                                            |
| ------------------------- | ----------------------------------------------------------- |
| Lightweight verification  | `nvidia/cuda:12.4.1-base-ubuntu22.04`                       |
| CUDA development          | `nvidia/cuda:12.4.1-devel-ubuntu22.04` (includes `nvcc`)    |
| PyTorch training          | `pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime`             |
| TensorFlow training       | `tensorflow/tensorflow:2.16.1-gpu`                          |
| vLLM inference            | `vllm/vllm-openai:latest`                                   |
| Text Generation Inference | `ghcr.io/huggingface/text-generation-inference:latest`      |

Pin to a specific tag (not `latest`) for reproducibility once your workload is stable.
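
One way to enforce the pinning advice in a deploy script is to reject unpinned references before running them. This is a minimal sketch, assuming that "pinned" means either an explicit tag other than `latest` or a `@sha256:` digest; `is_pinned` is a hypothetical helper.

```bash
# Succeeds when an image reference carries a digest or an explicit tag other than "latest".
is_pinned() {
  local ref="$1" name tag
  case "$ref" in *@sha256:*) return 0 ;; esac
  # Look only at the last path component so a registry port (host:5000/img) is not read as a tag.
  name="${ref##*/}"
  case "$name" in
    *:*) tag="${name##*:}" ;;
    *)   tag="latest" ;;      # Docker defaults to "latest" when no tag is given
  esac
  [ "$tag" != "latest" ]
}

for ref in vllm/vllm-openai:latest pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime; do
  if is_pinned "$ref"; then echo "pinned:   $ref"; else echo "unpinned: $ref"; fi
done
```

For a stricter pin, `docker image inspect --format '{{index .RepoDigests 0}}' IMAGE` shows the digest of an image you have already pulled from a registry, which you can then use in place of the tag.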

---

## Storage Considerations

GPU container images are large — PyTorch and TensorFlow images are 5–10 GB; vLLM and TGI add model weights on top. The root disk fills fast.

- Mount large datasets and model weights from an attached block storage volume or Parallel File Storage, not the root disk.
- Bake your application image once and reuse it. See [Bake and Reuse a GPU Image](./save-and-reuse-images).
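- Run `docker system prune -af --volumes` to reclaim space. This removes stopped containers, unused networks, build cache, every image not used by a container (not only dangling ones), and unused volumes, so confirm that nothing you still need lives in an unused volume before running it.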
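
Before pruning, it helps to see how full the root disk is and what Docker is holding. The following sketch reports both; the 80% warning threshold and the `root_used_pct` name are assumptions chosen for illustration.

```bash
# Print the used percentage of the root filesystem as a bare integer.
root_used_pct() {
  # -P forces the portable one-line-per-filesystem format; column 5 is the capacity (e.g. 42%).
  df -P / | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

pct="$(root_used_pct)"
echo "root disk: ${pct:-?}% used"
if [ "${pct:-0}" -ge 80 ]; then
  echo "root disk is above 80%; consider moving data to an attached volume or pruning"
fi

# Show space used by images, containers, local volumes, and build cache.
if command -v docker >/dev/null 2>&1; then
  docker system df || true
fi
```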

---

## Related Resources

| Resource                                                                          | Use it for                                                       |
| --------------------------------------------------------------------------------- | ---------------------------------------------------------------- |
| [Troubleshoot GPU Nodes](/docs/myaccount/gpu/troubleshoot)                        | Driver and CUDA issues, container-toolkit failures.              |
| [Bake and Reuse a GPU Image](./save-and-reuse-images)                             | Skip multi-GB first-boot installs on new nodes.                  |
| [Serve LLM Inference](./serve-llm-inference)                                      | Run vLLM or TGI on a GPU node.                                   |
| [Block Storage](/docs/myaccount/storage/block_storage)                            | Attach additional volumes for datasets and model artifacts.      |
| [Connect to a Linux GPU node](/docs/myaccount/gpu/connect-to-gpu/linux-gpu-node)  | SSH and verify the driver.                                       |