Training Cluster
TIR Training Cluster provides a dedicated compute environment for running distributed AI model training workloads. Get fixed-price allocations of GPU, CPU, and RAM with support for PyTorch, PyTorch Lightning, Slurm, and OpenMPI frameworks — no extra charges for deployments running inside the cluster.
Quick Start
Overview
Understand what a Training Cluster is and when to use it.
Quick Start
Create your first Training Cluster and launch a training deployment.
Deployment Frameworks
Configure PyTorch, Slurm, or MPI deployments on your cluster.
Troubleshooting & FAQs
Resolve common issues and get answers to frequently asked questions.
Explore Training Cluster
Cluster Management
Create, upgrade & terminate
Monitoring
Cluster health & GPU metrics
Storage & Data
Shared file systems & datasets
Deployment Actions
Restart, clone & manage jobs
API Reference
Training Cluster API Reference
Programmatically create, manage, and monitor TIR Training Clusters and their deployments. Automate cluster provisioning, retrieve job status, and control the full cluster and deployment lifecycle over REST.
List all training clusters: /projects/{id}/distributed_jobs/cluster/
Create a training cluster: /projects/{id}/distributed_jobs/cluster/
Get cluster details: /projects/{id}/distributed_jobs/cluster/{id}/
Delete a training cluster: /projects/{id}/distributed_jobs/cluster/{id}/
Create a deployment: /projects/{id}/distributed_jobs/jobs/
List all deployments: /projects/{id}/distributed_jobs/jobs/
Billing & Plans
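A minimal sketch of calling these endpoints from Python with the standard library. The API host, project identifier, auth scheme, and response shapes below are assumptions for illustration, not the documented contract — consult the API Reference for the real base URL and authentication details.

```python
import json
import urllib.request

BASE_URL = "https://api.tir.example.com"  # assumed host; substitute the real API endpoint
PROJECT_ID = "my-project"                 # assumed project identifier

def endpoint(path: str) -> str:
    """Build a full URL for one of the distributed_jobs routes listed above."""
    return f"{BASE_URL}/projects/{PROJECT_ID}/distributed_jobs/{path}"

def list_clusters(token: str) -> list:
    """List all training clusters (/projects/{id}/distributed_jobs/cluster/)."""
    req = urllib.request.Request(
        endpoint("cluster/"),
        headers={"Authorization": f"Bearer {token}"},  # assumed bearer-token auth
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def create_deployment(token: str, spec: dict) -> dict:
    """Create a deployment (/projects/{id}/distributed_jobs/jobs/)."""
    req = urllib.request.Request(
        endpoint("jobs/"),
        data=json.dumps(spec).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The same URL-building pattern extends to the remaining routes (cluster details, deletion, listing deployments) by changing the path suffix and HTTP method.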
Billing & Credits
Training Clusters are billed at a fixed rate based on your cluster plan. Pricing does not vary with resource utilization, and deployments running inside the cluster incur no additional charges.
Fixed-rate billing
Billed per hour based on cluster plan — costs do not change with GPU utilization.
No per-deployment charges
Run multiple training deployments inside your cluster at no extra cost.
Plan-based pricing
Choose a cluster plan that matches your RAM, CPU, and GPU workload requirements.