Actions

Manage your Training Cluster through the Actions menu and the quick-action buttons on the Cluster Details page.

Actions Menu

The Actions menu is available in two places:

Training Cluster list — click the ⋮ icon on any cluster row.
Cluster Details page — click the Actions button in the top-right area.

Action	Description
Update SSH Keys	Add or replace SSH keys on the cluster nodes
Update Image	Change the container image running on cluster nodes without recreating the cluster
Scale Cluster	Increase the number of nodes on a running cluster
Convert to Committed	Switch from On-Demand hourly pricing to a Committed plan
Restart All Workers	Restart all compute nodes in the cluster simultaneously
Restart Cluster	Restart the full cluster, including the Slurm controller and all nodes
Clone Cluster	Create a new cluster with the same configuration
Terminate Cluster	Stop the cluster and release all associated resources
Delete Training Cluster	Permanently delete a terminated cluster and all its records

Danger: Terminating a cluster is irreversible. Ensure no active workloads are running before you terminate it.

Warning: Only terminated clusters can be deleted. Terminate the cluster first if it is still running.
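Taken together, the two notes above describe an ordering rule: terminate first, then delete. The following is a minimal Python sketch of that rule, which may help if you track cluster cleanup in your own tooling. The `ClusterState` enum and both helper functions are hypothetical names used for illustration only; they are not part of any platform API, and the platform may expose more states than the two shown here.

```python
from enum import Enum


class ClusterState(Enum):
    # Hypothetical states for illustration; the platform may expose more.
    RUNNING = "running"
    TERMINATED = "terminated"


def can_delete(state: ClusterState) -> bool:
    """Only a terminated cluster may be deleted."""
    return state is ClusterState.TERMINATED


def next_cleanup_step(state: ClusterState) -> str:
    """Return the Actions menu entry to use next when removing a cluster."""
    if can_delete(state):
        return "Delete Training Cluster"
    # A running cluster must be terminated first, and that step is irreversible.
    return "Terminate Cluster"
```

For a running cluster, `next_cleanup_step` returns "Terminate Cluster"; once the cluster is terminated, it returns "Delete Training Cluster".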

Quick Actions

The top-right area of the Cluster Details page also provides direct shortcut buttons:

Button	Description
Refresh	Reload the current cluster status and metrics
Connect	Open a connection help panel with SSH instructions for the cluster
Restart All Workers	Restart all compute nodes without restarting the Slurm controller
Restart Cluster	Restart the entire cluster including the Slurm controller
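The Connect panel provides the exact SSH instructions for your cluster, and those should be treated as authoritative. As a rough illustration of what such a command typically looks like, the Python sketch below assembles one from a Floating IP, a login user, and an optional private key path. The function name, the user name, the key path, and the IP address (taken from the documentation-only range 203.0.113.0/24) are all assumptions, not values from this page.

```python
import shlex
from typing import Optional


def build_ssh_command(floating_ip: str, user: str, key_path: Optional[str] = None) -> str:
    """Assemble an ssh command line; all inputs are supplied by the caller."""
    args = ["ssh"]
    if key_path:
        # -i selects the private key file that matches a key added to the cluster.
        args += ["-i", key_path]
    args.append(f"{user}@{floating_ip}")
    # shlex.join quotes each argument so the result is safe to paste into a shell.
    return shlex.join(args)


# Hypothetical values for illustration only.
print(build_ssh_command("203.0.113.10", "exampleuser", "/home/me/.ssh/id_ed25519"))
```

Use the user name and address shown in the Connect panel or on the Details tab rather than the placeholders above.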

Cluster Tabs Reference

Each cluster's detail page is organized into the following tabs:

Tab	Description
Details	Cluster name, image version, status, node count, created by/at; Plan details (plan name, price, CPU, memory, GPU); Connection details (SSH keys, Floating IP, SSH command)
Cluster Overview	GPU and job summary cards, node health status (IDLE / ALLOCATED / MIXED / UNKNOWN), and Slurm partition table
Nodes	Node-level DCGM metrics; filter nodes by All, Failed, XID Errors, or Healthy
Jobs	All Slurm jobs with Running / Pending / Completed / Failed / Unknown counters and a detailed job table
Monitoring	Per-node GPU metrics overlay with time-interval controls (5m, 15m, 1h, 6h, 1d) and job summary stats
Logs	Slurm controller and node logs; select replica, set auto-refresh, and filter by last N lines
Volumes	Manage attached storage — Datasets, Shared File System (SFS), and Parallel File System (PFS)
Network & Security	VPC configuration, Reserve IP management, and Security Group assignment
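The node health statuses listed for the Cluster Overview tab (IDLE, ALLOCATED, MIXED, UNKNOWN) match Slurm node state names. If you are connected to the cluster over SSH, a similar summary can be approximated from `sinfo -N -h -o "%N %T"`, which prints one node name and state per line. The sketch below tallies that output in Python. It assumes the text has already been captured from `sinfo`; the function name is hypothetical, and this page does not state how the tab itself collects its data.

```python
from collections import Counter


def tally_node_states(sinfo_output: str) -> Counter:
    """Count nodes per state from lines of the form '<node> <state>'."""
    state_by_node = {}
    for line in sinfo_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        node, state = parts
        # Slurm may append flag characters such as '*' to a state; strip them.
        # Keying by node name removes duplicates from nodes in several partitions.
        state_by_node[node] = state.rstrip("*~#!%$@^+-").upper()
    return Counter(state_by_node.values())


# Sample text in the assumed '<node> <state>' format.
sample = "node01 idle\nnode02 allocated\nnode03 mixed\nnode04 down*\nnode01 idle\n"
print(tally_node_states(sample))
```

In the sample, `node01` appears twice but is counted once, and `down*` is counted under `DOWN`.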


Last updated on May 15, 2026.
