---
title: Quick Start
---

This guide walks you through creating a Training Cluster and connecting to it.

## Step 1: Navigate to Training Cluster

1. Go to the **TIR Dashboard**.
2. In the left sidebar, click **Training Cluster**.
3. Click **Create Training Cluster**.

---

## Step 2: Configure Your Cluster

Enter a name for your cluster.

### Image

Select the image and version for your cluster nodes.

- **Ubuntu Slurm** and **NeMo Framework** images are available in multiple versions. Each includes pre-installed GPU drivers, CUDA, NCCL, and the Slurm runtime.
- The version dropdown shows available releases for the selected image.

For custom Docker or OCI images, you can pull and convert them using Enroot, then cache the resulting squash files on your shared storage (PFS/SFS). Jobs can then reuse the cached files instead of pulling the image on every run.

### Plan Configuration

:::tip Recommended
We recommend setting up a [Private Cluster](/docs/tir/private_cluster/) before creating a Training Cluster. It reserves a dedicated pool of nodes for your team, ensuring GPU capacity is always available and provisioning is faster.
:::

Choose between the **GPU** and **Private Cluster** tabs:

- **GPU**: Select from available GPU plan cards. Each card shows the GPU type, CPU count, RAM, and hourly rate. Use the **Workers** counter to set the number of nodes.
- **Private Cluster**: Nodes are reserved exclusively for your workloads. This option is ideal if you plan to create and recreate clusters frequently while keeping node capacity reserved. See the [Private Cluster](/docs/tir/private_cluster/) documentation for setup details.

A **Pricing** summary appears automatically based on your selected plan and node count.

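
The custom image workflow described under **Image** can be sketched roughly as follows. This is a minimal illustration, not the platform's prescribed procedure: the mount point `/mnt/pfs`, the cache directory name, and the `ubuntu:22.04` image are hypothetical placeholders, and the script defaults to a dry run that only prints the commands. It assumes Enroot is available on the node where you run it.

```bash
# Assumptions (not from this guide): /mnt/pfs is a placeholder mount point for
# your shared volume, and ubuntu:22.04 stands in for your own image.
CACHE_DIR="${CACHE_DIR:-/mnt/pfs/enroot-images}"
IMAGE_URI="${IMAGE_URI:-docker://ubuntu:22.04}"
SQSH_FILE="${SQSH_FILE:-$CACHE_DIR/ubuntu-22.04.sqsh}"
DRY_RUN="${DRY_RUN:-1}"   # 1 prints the commands, 0 executes them

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Import the image into a squash file unless a cached copy already exists.
import_image() {
  if [ -f "$SQSH_FILE" ]; then
    echo "cached: $SQSH_FILE"
  else
    run mkdir -p "$CACHE_DIR"
    run enroot import -o "$SQSH_FILE" "$IMAGE_URI"
  fi
}

import_image
```

Because the squash file lives on shared storage, later runs take the `cached:` branch and skip the pull. Set `DRY_RUN=0` once the paths match your cluster.
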
### Access

| Field | Required | Description |
|-------|----------|-------------|
| **SSH Keys** | Yes | SSH key for connecting to the login node, which is used to schedule and submit workloads to the worker nodes |
| **Parallel File System** | Yes | PFS volume mounted on all cluster nodes |

### Advanced Settings

Expand **Advanced Settings** to configure:

| Field | Required | Description |
|-------|----------|-------------|
| **Security Group** | Yes | Controls inbound and outbound network access to the cluster |
| **Lifecycle Script** | No | Script that runs on each node after the cluster is created |
| **Shared File System** | No | SFS volume mounted on cluster nodes |
| **Dataset Storage** | No | Dataset attached to cluster nodes, mounted as read-only |

:::info
At least one storage volume must be mounted on the cluster. Use PFS or SFS for read-write access. Datasets are mounted as read-only and are suitable for loading training data but cannot be used to write checkpoints or logs.
:::

Once all required fields are filled, click **Create Training Cluster**.

:::info
Provisioning typically completes within a few minutes. Node count, SSH keys, and container image can be updated after creation without recreating the cluster.
:::

You can also create a Training Cluster using the API. Refer to the [Training Cluster API Reference](/api/tir/#/paths/distributed_jobs-cluster/post) for parameters and examples.

---

## Step 3: Connect to Your Cluster

Once the cluster status shows **Running**:

1. Click the **Connect** button (terminal icon) in the top-right area of the Cluster Details page.
2. Follow the connection instructions in the sidebar, or use the **SSH Command** displayed in the **Details** tab directly:

   ```bash
   ssh root@
   ```

The SSH command and Floating IP are also visible under **Connection Details** in the **Details** tab.

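
As a quick check after connecting, the sketch below runs two standard Slurm commands on the login node over SSH. It is an illustration under stated assumptions: the address `203.0.113.10` and the key path are hypothetical placeholders (use the Floating IP from the **Details** tab and the private key that matches the SSH key you added), and the `srun` line assumes the cluster has at least two worker nodes. It defaults to a dry run that only prints the commands.

```bash
# Hypothetical placeholders: copy the real Floating IP from the Details tab.
FLOATING_IP="${FLOATING_IP:-203.0.113.10}"    # documentation-range example address
SSH_KEY="${SSH_KEY:-$HOME/.ssh/id_ed25519}"   # private key matching the key added under Access
DRY_RUN="${DRY_RUN:-1}"                       # 1 prints the commands, 0 executes them

# Run a command on the login node over SSH, or print it in dry-run mode.
on_login_node() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "ssh -i $SSH_KEY root@$FLOATING_IP $*"
  else
    ssh -i "$SSH_KEY" "root@$FLOATING_IP" "$@"
  fi
}

on_login_node sinfo                # list Slurm partitions and node states
on_login_node srun -N 2 hostname   # run hostname on two worker nodes
```

If `sinfo` lists the worker nodes and `srun` returns one hostname per node, the login node can schedule work on the workers.
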
---

## Next Steps

- [Features](/docs/tir/TrainingCluster/tc-features): Scheduling, images, monitoring, scaling, and more
- [Actions](/docs/tir/TrainingCluster/tc-actions): Manage your cluster
- [Billing](/docs/tir/TrainingCluster/tc-billing): Understand cluster pricing

---