---
title: Features
---

import { PrivateClusterFeaturesNav, PrivateClusterBestPractices } from './PrivateClusterFeaturesCards'

# Features

## 1. Node Lifecycle States

Every node in a Private Cluster transitions through the following states:

| State | Description |
|-------|-------------|
| **Free** | Available for allocation to a project |
| **Allocated** | Assigned to a project but not yet running a workload |
| **Occupied** | Actively running a workload (Node, Inference Endpoint, Training Cluster, or Vector Database) |

:::warning
**Occupied nodes cannot be deallocated.** The running workload must be stopped or deleted before the node can be freed.
:::

Understanding these states helps you manage cluster capacity without disrupting running services.

## 2. Node Monitoring

The **Node Monitoring** view provides real-time visibility into the health and performance of every node in your cluster.

### Available Metrics

| Metric | Description |
|--------|-------------|
| **GPU Usage** | Current GPU utilization as a percentage |
| **Memory** | Memory consumption and availability |
| **Uptime** | Node uptime percentage indicating reliability |
| **Power** | Current power consumption in watts |

### How to Access

1. Open your Private Cluster.
2. Navigate to the **Cluster Nodes** tab.
3. Select a node to view its detailed metrics.

### Benefits

- **Proactive Management** – Identify underutilized or overloaded nodes before issues arise
- **Cost Optimization** – Make informed allocation decisions based on real usage data
- **Performance Tracking** – Ensure service reliability through uptime monitoring

## 3. Access Control

Private Cluster access is governed by IAM roles, ensuring only authorized users can manage cluster capacity and node allocation.

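
As a rough illustration, the permissions listed in the Role-Based Access Matrix below can be encoded as a lookup table. This is only a sketch: `Role`, `Action`, `PERMISSIONS`, and `is_allowed` are hypothetical names, not part of any platform API or IAM policy format, and the Scope column (cluster, assigned projects) is deliberately left out.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "Admin / Owner"
    PROJECT_MANAGER = "Project Manager"
    PROJECT_LEAD = "Project Lead"
    MEMBER = "Member"


class Action(Enum):
    VIEW_CLUSTER = "View Cluster"
    UPDATE_CLUSTER = "Create / Update Cluster"
    ALLOCATE_NODES = "Allocate Nodes"
    DEALLOCATE_NODES = "Deallocate Nodes"


# Hypothetical encoding of the access matrix: each role maps to the
# set of actions marked "Yes" in its row.
PERMISSIONS = {
    Role.ADMIN: set(Action),  # every action is allowed
    Role.PROJECT_MANAGER: {Action.VIEW_CLUSTER},
    Role.PROJECT_LEAD: {Action.VIEW_CLUSTER},
    Role.MEMBER: {Action.VIEW_CLUSTER},  # read-only
}


def is_allowed(role: Role, action: Action) -> bool:
    """Return True if the role's row in the matrix permits the action."""
    return action in PERMISSIONS.get(role, set())
```

With this model, `is_allowed(Role.PROJECT_MANAGER, Action.DEALLOCATE_NODES)` evaluates to `False`, while `is_allowed(Role.ADMIN, Action.ALLOCATE_NODES)` evaluates to `True`.
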
### Role-Based Access Matrix

| Role | View Cluster | Create / Update Cluster | Allocate Nodes | Deallocate Nodes | Scope |
|------|-------------|------------------------|----------------|------------------|-------|
| **Admin / Owner** | Yes | Yes | Yes | Yes | Cluster (CRN) |
| **Project Manager** | Yes | No | No | No | Assigned projects |
| **Project Lead** | Yes | No | No | No | Assigned project only |
| **Member** | Yes (Read-only) | No | No | No | As per IAM policy |

### Common Access Scenarios

- **I am a Project Manager and want to free GPUs from a project** → Not allowed
- **I am a Project Lead and want to resize the cluster** → Not allowed
- **I am a Member and want to view cluster usage** → Allowed
- **I am an Admin and want to allocate nodes to a project** → Allowed

## 4. Node Allocation

Node allocation controls how GPU resources are distributed across your projects.

### Allocation Flow

```
Cluster Created
      │
      ▼
Nodes: Free ──► Allocate to Project ──► Nodes: Allocated
                                               │
                                               ▼
                                        Launch Workload ──► Nodes: Occupied
```

### Key Rules

- **Free nodes** can be allocated to any project by users with the appropriate permissions.
- **Allocated nodes** can be deallocated as long as no workload is running on them.
- **Occupied nodes** cannot be deallocated; stop or delete the workload first.
- Allocation and deallocation can be done from both the **Project Allocation** and **Cluster Nodes** views.

## 5. Multi-Project Sharing

A single Private Cluster can serve multiple projects simultaneously. Each project gets its own slice of the cluster without sharing workload environments.

### Example

A 10-node cluster shared across two projects, with the remaining nodes left in the free pool:

| Project | Nodes Allocated | Workloads Running |
|---------|----------------|-------------------|
| Project A (model training) | 4 | 4 Training Clusters |
| Project B (inference) | 3 | 3 Inference Endpoints |
| Free pool | 3 | None |

:::info
Billing remains fixed regardless of how nodes are distributed.
GPUs can be reallocated at any time without redeploying infrastructure.
:::

---
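
The lifecycle states and key allocation rules described above can be sketched as a small state machine. This is an illustrative model only: `NodeState`, `Node`, and its methods are hypothetical names rather than a platform SDK, and the sketch assumes that stopping or deleting a workload returns the node to the Allocated state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NodeState(Enum):
    FREE = "Free"
    ALLOCATED = "Allocated"
    OCCUPIED = "Occupied"


@dataclass
class Node:
    name: str
    state: NodeState = NodeState.FREE
    project: Optional[str] = None

    def allocate(self, project: str) -> None:
        # Only Free nodes can be allocated to a project.
        if self.state is not NodeState.FREE:
            raise ValueError(f"{self.name} is {self.state.value}, not Free")
        self.state, self.project = NodeState.ALLOCATED, project

    def launch_workload(self) -> None:
        # A workload can only start on a node already assigned to a project.
        if self.state is not NodeState.ALLOCATED:
            raise ValueError(f"{self.name} is {self.state.value}, not Allocated")
        self.state = NodeState.OCCUPIED

    def stop_workload(self) -> None:
        # Assumption: stopping or deleting the workload leaves the node Allocated.
        if self.state is not NodeState.OCCUPIED:
            raise ValueError(f"{self.name} has no running workload")
        self.state = NodeState.ALLOCATED

    def deallocate(self) -> None:
        # Occupied nodes cannot be deallocated; the workload must go first.
        if self.state is NodeState.OCCUPIED:
            raise ValueError("Stop or delete the workload before deallocating")
        if self.state is NodeState.FREE:
            raise ValueError(f"{self.name} is already Free")
        self.state, self.project = NodeState.FREE, None
```

In this model, calling `deallocate()` on a node that has gone through `allocate()` and `launch_workload()` raises a `ValueError` until `stop_workload()` is called, which mirrors the rule that Occupied nodes must be released from their workload first.
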