Training Cluster

TIR Training Cluster is a dedicated GPU compute environment for distributed AI model training. It uses Slurm-native scheduling and provides fixed-price node allocations, elastic scaling, and high availability. You can run pre-built framework images or any custom container via Enroot, with full job visibility from the TIR dashboard and no per-job charges.

Slurm-Native Scheduling · Ubuntu Slurm Images · NeMo (Megatron-Bridge) Images · Fixed Pricing · Elastic Scaling

Quick Start

Overview

Understand what a Training Cluster is and when to use it.


Quick Start

Create your first Training Cluster and start running training jobs.


Features

Explore scheduling, images, monitoring, scaling, and storage capabilities.


Troubleshooting & FAQs

Resolve common issues and get answers to frequently asked questions.


Explore Training Cluster

Cluster Management

Create, scale & manage

→ Create a training cluster

→ Scale cluster nodes

→ Terminate a cluster

Images & Containers

Ubuntu Slurm & Enroot

→ Available images

→ Custom containers via Enroot

→ Update cluster image

Jobs & Monitoring

Slurm squeue & GPU metrics

→ Jobs dashboard

→ GPU monitoring

→ Node health (DCGM)

Storage & Data

PFS, SFS & datasets

→ Parallel File System (PFS)

→ Shared File System (SFS)

→ Datasets

Network & Security

SSH, IPs & security groups

→ SSH access

→ Reserve IP

→ Security groups

API Reference

REST API

Training Cluster API Reference

Programmatically create, manage, and monitor TIR Training Clusters. Automate cluster provisioning, scale nodes, and control the cluster lifecycle via REST.

Explore REST APIs

Authentication & Endpoints

Request and Response Schemas

Open API Reference →

tir.e2enetworks.com/api/v1

GET /projects/{id}/distributed_jobs_v2/cluster/plans/ - List available cluster plans

GET /projects/{id}/distributed_jobs_v2/cluster/ - List training clusters

POST /projects/{id}/distributed_jobs_v2/cluster/ - Create a training cluster

GET /projects/{id}/distributed_jobs_v2/cluster/{id}/ - Get cluster details

PUT /projects/{id}/distributed_jobs_v2/cluster/{id}/ - Perform a cluster action

DELETE /projects/{id}/distributed_jobs_v2/cluster/{id}/ - Delete a training cluster
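
As a rough illustration of how the endpoints listed above fit together, the following Python sketch builds (but does not send) requests for the list and detail routes. Several things here are assumptions rather than documented behavior: the `https` scheme, bearer-token authentication, the placeholder token, and the names `cluster_request`, `project_id`, and `cluster_id` are all hypothetical. Check the API reference for the actual authentication scheme and request schemas.

```python
from urllib.parse import quote
from urllib.request import Request

# Host and path prefix as shown on this page; the https scheme is an assumption.
BASE_URL = "https://tir.e2enetworks.com/api/v1"


def cluster_request(method, project_id, cluster_id=None, token="YOUR_API_TOKEN"):
    """Build, without sending, a request for a training cluster endpoint.

    project_id and cluster_id are illustrative names for the two {id}
    placeholders in the endpoint paths.
    """
    path = f"/projects/{quote(str(project_id))}/distributed_jobs_v2/cluster/"
    if cluster_id is not None:
        path += f"{quote(str(cluster_id))}/"
    headers = {
        # Assumption: bearer-token auth. The real scheme may differ.
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    return Request(BASE_URL + path, headers=headers, method=method)


# List all clusters in a (hypothetical) project, then address one cluster.
list_req = cluster_request("GET", project_id=123)
detail_req = cluster_request("GET", project_id=123, cluster_id=456)
delete_req = cluster_request("DELETE", project_id=123, cluster_id=456)

print(list_req.get_method(), list_req.full_url)
print(detail_req.get_method(), detail_req.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would send it. Building the request separately keeps the URL construction easy to inspect before any call reaches a live cluster.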

Billing & Plans

Billing & Credits

Training Clusters are billed at a fixed rate based on your cluster plan. Pricing does not vary with resource utilization, and all jobs running on the cluster incur no additional charges.

View Billing Docs →

Fixed-rate billing

Billed per hour based on your cluster plan — cost does not change with GPU utilization.

No per-job charges

All Slurm jobs running on the cluster are included at no extra cost.

On-Demand or Committed

Start with On-Demand hourly pricing and convert to a Committed plan when ready.
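
The fixed-rate model described above amounts to simple arithmetic: cost is the plan's hourly rate multiplied by the hours the cluster exists, regardless of how many jobs run or how busy the GPUs are. The sketch below shows this with a made-up rate. The function name, the rate, and the use of whole hours are assumptions for illustration only; actual plan prices and any rounding rules are in the billing docs.

```python
def cluster_cost(hourly_rate, hours, jobs_run=0, avg_gpu_utilization=0.0):
    """Estimate fixed-rate cluster cost.

    jobs_run and avg_gpu_utilization are accepted only to make the point
    explicit: they are deliberately unused, because fixed-rate billing
    does not depend on them.
    """
    if hourly_rate < 0 or hours < 0:
        raise ValueError("hourly_rate and hours must be non-negative")
    return hourly_rate * hours


# Hypothetical rate in arbitrary currency units per hour, not a real plan price.
HYPOTHETICAL_RATE = 100.0

idle_day = cluster_cost(HYPOTHETICAL_RATE, 24)
busy_day = cluster_cost(HYPOTHETICAL_RATE, 24, jobs_run=50, avg_gpu_utilization=0.95)

print(idle_day, busy_day)
```

Under these assumptions an idle day and a fully loaded day cost the same, which is the practical meaning of "no per-job charges".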


Last updated on May 15, 2026.
