Training Cluster
TIR Training Cluster is a dedicated GPU compute environment for distributed AI model training. Built on Slurm-native scheduling, it gives you fixed-price node allocations, elastic scaling, and high availability. You can run pre-built framework images or any custom container via Enroot, with full job visibility from the TIR dashboard and no per-job charges.
Quick Start
Overview
Understand what a Training Cluster is and when to use it.
Quick Start
Create your first Training Cluster and start running training jobs.
Features
Explore scheduling, images, monitoring, scaling, and storage capabilities.
Troubleshooting & FAQs
Resolve common issues and get answers to frequently asked questions.
Explore Training Cluster
Cluster Management
Create, scale & manage
Images & Containers
Ubuntu Slurm & Enroot
API Reference
Training Cluster API Reference
Programmatically create, manage, and monitor TIR Training Clusters. Automate cluster provisioning, scale nodes, and control the cluster lifecycle via REST.
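As a rough illustration of how the endpoint paths in this reference are laid out, the sketch below assembles them from a project ID and a cluster ID. The function and parameter names (`project_id`, `cluster_id`) are hypothetical labels for the two `{id}` placeholders. The base URL, authentication scheme, and HTTP methods are not stated on this page, so the sketch builds relative paths only and makes no requests.

```python
from typing import Union

Id = Union[int, str]  # ID type is an assumption; the page does not specify it


def cluster_collection_path(project_id: Id) -> str:
    """Path used for listing and creating training clusters."""
    return f"/projects/{project_id}/distributed_jobs_v2/cluster/"


def cluster_plans_path(project_id: Id) -> str:
    """Path used for listing available cluster plans."""
    return cluster_collection_path(project_id) + "plans/"


def cluster_detail_path(project_id: Id, cluster_id: Id) -> str:
    """Path used for cluster details, cluster actions, and deletion."""
    return cluster_collection_path(project_id) + f"{cluster_id}/"


if __name__ == "__main__":
    # Placeholder IDs, for illustration only.
    print(cluster_plans_path(42))
    print(cluster_detail_path(42, 7))
```

Keeping the path construction in small helpers means the host and credentials can be supplied separately once they are known from the full API reference.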
/projects/{id}/distributed_jobs_v2/cluster/plans/
List available cluster plans
/projects/{id}/distributed_jobs_v2/cluster/
List training clusters
/projects/{id}/distributed_jobs_v2/cluster/
Create a training cluster
/projects/{id}/distributed_jobs_v2/cluster/{id}/
Get cluster details
/projects/{id}/distributed_jobs_v2/cluster/{id}/
Perform a cluster action
/projects/{id}/distributed_jobs_v2/cluster/{id}/
Delete a training cluster
Billing & Plans
Billing & Credits
Training Clusters are billed at a fixed rate based on your cluster plan. Pricing does not vary with resource utilization, and jobs running on the cluster incur no additional charges.
Fixed-rate billing
Billed per hour based on your cluster plan — cost does not change with GPU utilization.
No per-job charges
All Slurm jobs running on the cluster are included at no extra cost.
On-Demand or Committed
Start with On-Demand hourly pricing and convert to a Committed plan when ready.
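The fixed-rate model described above reduces to simple arithmetic: the charge depends only on the plan's hourly rate and how long the cluster runs, not on GPU utilization or the number of Slurm jobs. The sketch below is a minimal illustration under that reading. The rate is a made-up placeholder, the function name is hypothetical, and how partial hours are rounded or how Committed plans are priced is not covered here.

```python
from decimal import Decimal


def cluster_cost(hourly_rate: Decimal, hours: Decimal) -> Decimal:
    """Estimate the charge for a cluster under fixed-rate hourly billing.

    Utilization and job count are deliberately not parameters:
    under fixed-rate billing they do not affect the total.
    """
    if hourly_rate < 0 or hours < 0:
        raise ValueError("hourly_rate and hours must be non-negative")
    return hourly_rate * hours


if __name__ == "__main__":
    # Hypothetical rate of 100.00 per hour; not an actual plan price.
    rate = Decimal("100.00")
    print(cluster_cost(rate, Decimal("24")))  # one day of uptime
```

Using `Decimal` rather than floats avoids rounding drift when estimates are summed over many hours or clusters.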