---
title: Run GPU Workloads in Docker
---

# Run GPU Workloads in Docker

The recommended way to run a GPU workload on an E2E GPU node is in a container. The container ships the CUDA toolkit and framework versions your application needs; the host only provides the NVIDIA driver. This guide shows how to verify the setup and run a container with GPU access.

For host-side driver issues, see [Troubleshoot GPU Nodes](/docs/myaccount/gpu/troubleshoot). For SSH access, see [Connect to a Linux GPU node](/docs/myaccount/gpu/connect-to-gpu/linux-gpu-node).

---

## How GPU Containers Work

- The **host** runs the NVIDIA datacenter driver. The driver exposes the GPU to the kernel.
- The **container** ships the CUDA runtime, cuDNN, frameworks (PyTorch, TensorFlow, JAX, vLLM, TGI), and your application code.
- The **NVIDIA Container Toolkit** is the glue. It tells the Docker daemon to mount the driver, devices, and required libraries into the container at runtime.

The CUDA version in a container can be older than the maximum CUDA version the host driver supports, but it cannot be newer. Check the maximum CUDA version with `nvidia-smi` on the host before picking a base image.

---

## Verify the Setup

E2E GPU images based on Ubuntu 22.04 ship Docker and the NVIDIA Container Toolkit pre-installed. Ubuntu 24.04-based images do not; install them first using the steps in the [Install the NVIDIA Container Toolkit](#install-the-nvidia-container-toolkit) section below.

After SSH login, run:

```bash
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```

A healthy run prints the same card, driver version, and CUDA version that `nvidia-smi` reports on the host, this time from inside the container.

If the run fails with `unknown flag: --gpus` or `could not select device driver "" with capabilities: [[gpu]]`, the toolkit is missing or misconfigured. See [Docker Cannot Access the GPU](/docs/myaccount/gpu/troubleshoot#docker-cannot-access-the-gpu).

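
The compatibility rule from How GPU Containers Work (container CUDA no newer than the host driver's maximum) can be checked before pulling a multi-gigabyte image. The sketch below is illustrative only: the `cuda_fits` function name and the hard-coded `image_cuda` value are assumptions, not part of any E2E or NVIDIA tooling, and it compares only the major and minor version numbers. It relies on the `CUDA Version:` field that `nvidia-smi` prints in its header.

```bash
#!/usr/bin/env bash
# Sketch: check whether an image's CUDA version fits under the host driver's
# maximum supported CUDA version. cuda_fits is a hypothetical helper name.

# cuda_fits HOST_MAX IMAGE_CUDA
# Returns 0 when IMAGE_CUDA (major.minor[.patch]) is not newer than HOST_MAX,
# comparing the major and minor components only.
cuda_fits() {
  local h_major h_minor i_major i_minor rest
  IFS=. read -r h_major h_minor rest <<< "$1"
  IFS=. read -r i_major i_minor rest <<< "$2"
  h_minor="${h_minor:-0}"   # treat "12" as "12.0"
  i_minor="${i_minor:-0}"
  if (( i_major < h_major )); then
    return 0
  fi
  if (( i_major == h_major && i_minor <= h_minor )); then
    return 0
  fi
  return 1
}

# Only query the driver when nvidia-smi is actually present on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
  # Extract the "CUDA Version: X.Y" value from the nvidia-smi header.
  host_max="$(nvidia-smi | sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1)"
  image_cuda="12.4.1"   # assumption: CUDA version taken from the image tag you plan to pull
  if [ -n "$host_max" ] && cuda_fits "$host_max" "$image_cuda"; then
    echo "CUDA $image_cuda fits under the host maximum ($host_max)"
  else
    echo "CUDA $image_cuda may be too new for this host (maximum: ${host_max:-unknown})"
  fi
fi
```

Saved as a script and run on the host, this prints one line saying whether the tag's CUDA version fits. On a machine without `nvidia-smi` it only defines the function.
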

---

## Install the NVIDIA Container Toolkit

Only needed if the toolkit is missing from your image, or if you reinstalled the OS.

### Ubuntu 22.04 / 24.04

```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

### Rocky Linux 9

```bash
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

sudo dnf install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

:::warning
`nvidia-docker2` is deprecated and is not compatible with Docker Engine 25 or later. Use `nvidia-container-toolkit` and the `--gpus` flag on all current GPU nodes.
:::

---

## Common Run Patterns

### Single GPU, Interactive Shell

```bash
docker run --rm -it --gpus all \
  -v "$PWD":/workspace -w /workspace \
  pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime bash
```

Inside the container:

```bash
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```

### Specific GPU(s) on a Multi-Card Node

```bash
# Only card 0
docker run --rm --gpus '"device=0"' nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# Cards 0 and 1
docker run --rm --gpus '"device=0,1"' nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```

### Long-Running Service with Restart Policy

```bash
docker run -d --name inference \
  --gpus all \
  --restart unless-stopped \
  -p 127.0.0.1:8080:8080 \
  -v /data/models:/models \
  your-image:tag
```

If you want browser access, bind the published port to `127.0.0.1` and tunnel over SSH instead of opening the port in the security group.

---

## Picking a Base Image

| Use case                  | Recommended base                                          |
| ------------------------- | --------------------------------------------------------- |
| Lightweight verification  | `nvidia/cuda:12.4.1-base-ubuntu22.04`                     |
| CUDA development          | `nvidia/cuda:12.4.1-devel-ubuntu22.04` (includes `nvcc`)  |
| PyTorch training          | `pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime`           |
| TensorFlow training       | `tensorflow/tensorflow:2.16.1-gpu`                        |
| vLLM inference            | `vllm/vllm-openai:latest`                                 |
| Text Generation Inference | `ghcr.io/huggingface/text-generation-inference:latest`    |

Pin to a specific tag (not `latest`) for reproducibility once your workload is stable.

---

## Storage Considerations

GPU container images are large: PyTorch and TensorFlow images are 5–10 GB, and vLLM and TGI add model weights on top. The root disk fills fast.

- Mount large datasets and model weights from an attached block storage volume or Parallel File Storage, not the root disk.
- Bake your application image once and reuse it. See [Bake and Reuse a GPU Image](./save-and-reuse-images).
- Run `docker system prune -af --volumes` to reclaim space. This removes stopped containers, unused networks, build cache, every image not used by a container (not only dangling ones), and unused volumes (on Docker 23 and later, only anonymous ones), so images you still need must be pulled or rebuilt afterwards.

---

## Related Resources

| Resource                                                                         | Use it for                                                  |
| -------------------------------------------------------------------------------- | ----------------------------------------------------------- |
| [Troubleshoot GPU Nodes](/docs/myaccount/gpu/troubleshoot)                       | Driver and CUDA issues, container-toolkit failures.         |
| [Bake and Reuse a GPU Image](./save-and-reuse-images)                            | Skip multi-GB first-boot installs on new nodes.             |
| [Serve LLM Inference](./serve-llm-inference)                                     | Run vLLM or TGI on a GPU node.                              |
| [Block Storage](/docs/myaccount/storage/block_storage)                           | Attach additional volumes for datasets and model artifacts. |
| [Connect to a Linux GPU node](/docs/myaccount/gpu/connect-to-gpu/linux-gpu-node) | SSH and verify the driver.                                  |