PyTorch Distributed
PyTorch Distributed Data Parallel (DDP) lets you train a model across multiple GPUs on a single node by running one process per GPU and averaging gradients across those processes during each backward pass. On TIR, your Training Cluster node comes fully pre-configured: GPU drivers, CUDA, NCCL, and PyTorch are ready to use.
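To make the one-process-per-GPU pattern concrete, here is a minimal sketch of a DDP training loop. It is illustrative only: the function name `train`, the toy `nn.Linear` model, the synthetic batches, and the single-process fallback values for the rendezvous variables are assumptions made for this example, not part of the TIR setup. On a machine without GPUs it falls back to the `gloo` backend so the sketch can still run.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train(steps: int = 3) -> float:
    """Run a few DDP steps on a toy model and return the final loss."""
    # Assumption: fall back to single-process defaults so the sketch also
    # runs with plain `python`; torchrun sets these variables itself.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    os.environ.setdefault("LOCAL_RANK", "0")

    use_cuda = torch.cuda.is_available()
    # NCCL is the usual backend for GPUs; gloo lets the sketch run on CPU.
    dist.init_process_group(backend="nccl" if use_cuda else "gloo")

    # Each process drives the GPU whose index matches its local rank.
    local_rank = int(os.environ["LOCAL_RANK"])
    if use_cuda:
        torch.cuda.set_device(local_rank)
        device = torch.device("cuda", local_rank)
    else:
        device = torch.device("cpu")

    model = nn.Linear(16, 1).to(device)  # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank] if use_cuda else None)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    loss = torch.tensor(0.0)
    for _ in range(steps):
        inputs = torch.randn(32, 16, device=device)  # synthetic batch
        targets = torch.randn(32, 1, device=device)
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), targets)
        loss.backward()  # DDP all-reduces gradients during this call
        optimizer.step()

    dist.destroy_process_group()
    return loss.item()


if __name__ == "__main__":
    print(f"final loss: {train():.4f}")
```

Saved under a hypothetical name such as `train_ddp.py`, a script like this would typically be launched with `torchrun --standalone --nproc_per_node=<number of GPUs> train_ddp.py`, which starts one process per GPU and sets `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` for each of them. A real job would also shard its dataset across ranks, for example with `torch.utils.data.distributed.DistributedSampler`.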
Environment
Each TIR Training Cluster node comes pre-configured with:
- PyTorch, CUDA, and NCCL installed and optimized
- GPU drivers and high-bandwidth interconnects (NVLink or PCIe)
- Identical software environments across all GPUs
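If you want to confirm these components from inside the node before starting a job, a short check along the following lines may help. This is a sketch, not a TIR-provided tool: the function name `describe_environment` and the dictionary keys are made up for this example, and it only reports what PyTorch itself can see.

```python
import importlib.util


def describe_environment() -> dict:
    """Collect basic facts about the local PyTorch, CUDA, and NCCL setup."""
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not info["torch_installed"]:
        return info

    # Imported lazily so the check still runs when PyTorch is missing.
    import torch
    import torch.distributed as dist

    info["torch_version"] = torch.__version__
    info["cuda_available"] = torch.cuda.is_available()
    info["gpu_count"] = torch.cuda.device_count()
    info["nccl_available"] = dist.is_available() and dist.is_nccl_available()
    if info["cuda_available"]:
        info["gpu_names"] = [
            torch.cuda.get_device_name(i) for i in range(info["gpu_count"])
        ]
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

On a correctly configured node you would expect `cuda_available` and `nccl_available` to be true and `gpu_count` to match the number of GPUs in your plan. To inspect the interconnect topology (NVLink or PCIe), `nvidia-smi topo -m` is the usual command.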
Connect to the Node
ssh $hostname
Shared Storage
Write all datasets, checkpoints, and logs to the shared directory so they persist after the deployment ends:
/mnt/shared
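One way to keep runs organized under the shared directory is a small path helper like the sketch below. The per-run layout (`checkpoints` and `logs` subfolders under a run name) and the names `run_paths` and `SHARED_ROOT` are assumptions for illustration; only the `/mnt/shared` root comes from this guide.

```python
from pathlib import Path

SHARED_ROOT = Path("/mnt/shared")  # shared directory named in this guide


def run_paths(run_name: str, root: Path = SHARED_ROOT) -> dict:
    """Create and return per-run output directories under the shared root."""
    base = Path(root) / run_name
    # Hypothetical layout: one subfolder per kind of output.
    paths = {name: base / name for name in ("checkpoints", "logs")}
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
    return paths


if __name__ == "__main__":
    import tempfile

    # Demonstrated against a temporary directory; pass no `root` on the node
    # to use /mnt/shared instead.
    with tempfile.TemporaryDirectory() as tmp:
        for name, path in run_paths("example-run", root=Path(tmp)).items():
            print(f"{name}: {path}")
```

In a DDP job every process runs the same script, so a common convention is to let only rank 0 write checkpoints to these paths, which avoids several processes writing the same file at once.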