TIR Nodes

TIR Nodes are collaborative environments for AI development. They combine containers, JupyterLab, and AI/ML frameworks into a ready-to-use workspace for you and your entire team.

Some of the most common use cases are:

Run a script or notebook to fine-tune a Large Language Model (LLM) on a single GPU using PyTorch or the Hugging Face Trainer.
Run a script or notebook to tokenize and fine-tune LLMs or Diffusion models with multiple GPUs (single machine) using DeepSpeed and Accelerate.
Open and run a Jupyter notebook (.ipynb) from platforms like GitHub, Kaggle, or Colab.
Download and review datasets stored on TIR or other platforms like Hugging Face.
Download and test models like Stable Diffusion or any LLM.
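
Before running a single-GPU or multi-GPU job like the ones listed above, it can help to confirm how many GPUs the node exposes. The following is a minimal sketch, assuming a node image with PyTorch installed; the helper name `gpu_count` is hypothetical and not part of TIR.

```python
def gpu_count() -> int:
    """Return the number of CUDA devices PyTorch can see (0 if none)."""
    try:
        import torch  # assumed to be present in PyTorch-based node images
    except ImportError:
        # PyTorch is not installed in this environment.
        return 0
    return torch.cuda.device_count()


if __name__ == "__main__":
    print(f"GPUs visible to PyTorch: {gpu_count()}")
```

On a CPU plan this would be expected to report zero devices.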

Note

A TIR node is a fully functional coding environment. If you prefer the command line (shell) to JupyterLab, you can configure SSH on a node. You can then upload your data using SFTP or sync your code with Git tools, and run scripts as you would on your local system.

Getting Started with Nodes on TIR

1. Log in with your MyAccount credentials and select the TIR AI platform.
2. On the TIR landing page, click the "Create project" button to create a project. The project is created in your private workspace.
3. Inside the newly created project, click "Nodes" in the side panel.
4. Select one of the available images to create a node.

Note

You can also filter the available images by category: TIR Pre-built, Container Registry, or Base OS.

1. Choose a CPU or GPU plan. For CPU, you can select the Free Tier plan while exploring; it is sufficient for this exercise. For GPU, if the plan you need is not available, you can request it, and you will be notified by email once it becomes available.
2. Choose a name for your node and select the node type: New Notebook or Import Notebook. If you select Import Notebook, provide the URL of the notebook to import.

Note

Choosing New Notebook opens an empty JupyterLab workspace, while Import Notebook lets you pull a publicly available notebook from GitHub or Colab.

1. Select the workspace size. By default, each node includes 30 GB of free workspace.
2. (Optional) Turn on the Enable SSH Access switch to add or select your SSH key.
3. When the node is ready, you will see both Jupyter Labs and SSH options (if configured). Choose any of these to access the node environment.

Node Options

Most node settings have a sensible default, but you can adjust the following configurations in a node environment:

Enable SSH: You can enable SSH access on the node using a public key or password (not recommended). If you decide to enable SSH after starting a node, you will have to first stop the node before making changes.
Disk Size: Each TIR node can have a disk of up to 5000 GB; the default is 30 GB. The disk is mounted at /home/jovyan in your node environment. We recommend using this path as your workspace so that your content persists across restarts. Because TIR is container-native, changes made to any other path on the node are not persisted on restart. You can also increase the disk size after the node has started. The workspace is deleted when the associated node is deleted.

Note

Please raise a support ticket if you need more than 5TB of disk workspace.
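
To make the persistence rule from the Disk Size option concrete, here is a minimal sketch that checks whether a path falls inside the persistent workspace. The mount point /home/jovyan comes from the text above; the helper name `is_persistent` is hypothetical.

```python
import posixpath
from pathlib import PurePosixPath

# Persistent workspace mount point, as stated in the Disk Size option.
WORKSPACE = PurePosixPath("/home/jovyan")


def is_persistent(path: str) -> bool:
    """Return True if `path` lies inside the persistent workspace.

    The path is normalized first so that segments like ".." cannot
    make a path outside the workspace look as if it were inside.
    """
    p = PurePosixPath(posixpath.normpath(path))
    return p == WORKSPACE or WORKSPACE in p.parents


if __name__ == "__main__":
    for candidate in ("/home/jovyan/project/model.pt", "/tmp/model.pt"):
        print(candidate, "->", is_persistent(candidate))
```

A check like this could be run before writing large outputs, so that results are not placed on a path that is lost on restart.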

Local NVMe Storage: Available only for H100 plans. This fast local storage is mounted at /mnt/local and exists only for the duration of the run. We recommend this path when you need faster writes (e.g., saving model checkpoints) or reads. Be sure to move the data to an EOS bucket or to /home/jovyan before shutting down the node. The size of this storage is fixed and cannot be expanded during the node's lifecycle.
Plan (Pricing): You can choose between an hourly or committed plan. We recommend committed plans, as they offer discounts and may also offer access to local NVMe storage (for H100 plans only).
Node Image: TIR environments are container-native. You can use pre-built images with well-known frameworks like PyTorch and Transformers, or customize the pre-built images. You can make your own images TIR-compatible using the image builder utility. We recommend starting with pre-built images. If you need to install packages with pip or apt-get, we recommend doing so from a Jupyter notebook (.ipynb) or maintaining a requirements.txt file.
Configuration: TIR offers a variety of CPU and GPU options. We recommend using A100 or H100 for the best performance.
Update Node: You can upgrade or downgrade both the configuration (e.g., upgrade from CPU to GPU) and Plan (e.g., hourly to committed) of a node if desired. This is a useful option when restarting nodes and the original hardware plan (GPU) on the node is no longer available.
Stop Node: If the plan and configuration allow, you can stop a node and restart it. In the case of an hourly plan, you will not be billed for the GPU or CPU when the node is in a stopped state. However, if your disk usage is beyond the free tier, you will be charged for it.
Delete Node: When a node is deleted, all the resources associated with it will be deleted, including the workspace (disk).
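
For the Local NVMe Storage option above, data under /mnt/local needs to be moved to persistent storage before the node shuts down. The following is a minimal sketch of one way to copy checkpoints into the workspace, assuming Python 3.8 or later. The two mount points come from the text; the `checkpoints` subfolders and the helper name `persist_checkpoints` are hypothetical.

```python
import shutil
from pathlib import Path

# Mount points are from the text above; the "checkpoints" subfolders are hypothetical.
LOCAL_NVME = Path("/mnt/local/checkpoints")
WORKSPACE = Path("/home/jovyan/checkpoints")


def persist_checkpoints(src: Path = LOCAL_NVME, dst: Path = WORKSPACE) -> int:
    """Copy everything under `src` into `dst`.

    Returns the number of files found under `src`, or 0 if `src`
    does not exist (for example, on a node without local NVMe storage).
    """
    if not src.is_dir():
        return 0
    # dirs_exist_ok lets repeated runs update an existing destination.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sum(1 for f in src.rglob("*") if f.is_file())


if __name__ == "__main__":
    print(f"Copied {persist_checkpoints()} file(s) to {WORKSPACE}")
```

Copying rather than moving keeps the fast local copy available for the rest of the run; the local copy disappears with the node in any case.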

Node Statuses

Waiting: The node instance is being deployed on the hardware of your choice.
Running: The node is active, and you can use either Jupyter Labs or SSH (if enabled) to access it.
Stopped: The node is not assigned to any machine. However, the workspace (disk mounted at /home/jovyan) will continue to exist until you delete the node. Depending on the size of the disk, you will be charged for the usage.
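
The three statuses above can be modeled in a script that reacts to a node's state. This is only an illustrative sketch: the enum and helper names are hypothetical, and how a status label is obtained (for example, copied from the Manage Nodes page) is left to the reader.

```python
from enum import Enum


class NodeStatus(Enum):
    WAITING = "Waiting"   # node is being deployed
    RUNNING = "Running"   # node is active
    STOPPED = "Stopped"   # not assigned to a machine; workspace is kept


def parse_status(label: str) -> NodeStatus:
    """Map a status label such as "running" to a NodeStatus (case-insensitive)."""
    return NodeStatus(label.strip().capitalize())


def is_accessible(status: NodeStatus) -> bool:
    """JupyterLab or SSH access is described only for running nodes."""
    return status is NodeStatus.RUNNING


if __name__ == "__main__":
    for label in ("waiting", "Running", "STOPPED"):
        status = parse_status(label)
        print(status.value, "-> accessible:", is_accessible(status))
```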

How to Create Node?

To create a node, click Create Node in the right corner of the page.

Create Node

After you click the Create Node button, the image selection page appears. Select a node image from TIR PRE-BUILT, BASE OS, or CONTAINER REGISTRY. You can also search the node images.

Node Image Selection

The Base OS node image does not come with JupyterLab pre-installed.

Base OS Node

When installing an image from the Container Registry, the user must specify whether the selected image includes JupyterLab pre-installed or not.

Container Registry Node

After selecting the machine, the Resource page will appear. At this stage, choose a plan based on either CPU or GPU requirements.

Node Resource Selection

Additionally, you can filter CPU and GPU resources based on your specific requirements for a more tailored selection.

CPU Filter

At this step, the user can also add the required dataset to the node being created.

Node Storage

The user is prompted to provide the essential details before the node is created.

Node Details

After all steps are completed, the Node Summary details will be displayed.

Node Summary

After you click the 'Create' button, you are redirected to the 'Manage Nodes' page, which displays the node and all its details.

Manage Nodes

Node Details

Overview

You can see the Node Details and Plan Details under the Overview tab.

Node Overview

Node Events

Monitoring

You can view graphs of CPU Utilization and Memory Utilization for a selected interval.

Metrics Graph

You can view up to one month of activity, displayed in days or hours as needed.

One Month Activity

Workspace Size

You can view the current disk size and change it as needed.

Disk Size

To update the disk size, enter the new size and click the Update button.

Update Disk Size

Associated Datasets

The Associated Datasets tab lists datasets in two groups: Mounted and Unmounted. You can also unmount a mounted dataset.

Associated Datasets

Associated File System

Network & security

You can configure an SSH key under the Network & security tab.

Launch Node

Start script

In this section, users can select a script to attach to the Node. The attached script runs on the Node and can install and configure the dependencies the workload needs.

Users can create a new script by clicking on the Click Here button.

Users have the flexibility to either upload a script file directly from their local machine or manually write/copy the script code within the interface. Before saving and utilizing the script, users are required to provide a name for the script file.

Users can select a script from the list of available scripts across the project. Additionally, users have the ability to update an existing script or delete any script as needed.

Note

Any changes made to the script will only take effect once the Node, to which the script is attached, is restarted.

Note

Users can only delete a script if it is not currently attached to any Node.

When a script is added to or removed from the Node, a confirmation dialog box appears to verify the action.

Node Actions

The available actions include Launch Notebook, Stop Node, Restart Node, Update Image, Update Plan, and Delete.

Node Actions

Launch Notebook

After you click Launch Notebook, the node is launched and appears as follows.

Node Launched 1

Node Launched 2

Stop Node

To stop the node, click the Stop button.

Stop Node

Restart Node

To restart the node, click the Restart button.

Restart Node

Update Image

To update the node image, click Update Image.

Update Node

Select an image and click the Update button.

Update Node

Update Plan

To update the node's plan, click the Update button.

Update Node

Note

The node must be in the Stopped state before it can be updated.

Convert to Committed

After creating a node, it can be converted to a committed node if the committed SKU is available by selecting the "Convert to Committed" option.

Nodes on an hourly plan without an associated SKU will display a grayed-out "Convert to Committed" button.

Nodes on a Private Cluster or already on a committed plan do not show this option.

Note

A node can be converted to a committed node without being stopped or restarted.

Convert to Committed Option

Save Image

When the node is restarted, any packages you manually installed will need to be re-installed. By saving an image, you can preserve all dependencies, allowing you to restore them later or create new nodes from the saved image.
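
As a lighter-weight complement to saving an image, packages can be re-installed after a restart from a requirements.txt kept in the persistent workspace. The following is a minimal sketch under that assumption; the file location and the helper names are hypothetical, and only the standard `pip install -r` invocation is used.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical location; any path under /home/jovyan persists across restarts.
REQUIREMENTS = Path("/home/jovyan/requirements.txt")


def pip_install_command(requirements: Path = REQUIREMENTS) -> list:
    """Build the pip command for the interpreter that runs this script."""
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


def reinstall(requirements: Path = REQUIREMENTS) -> bool:
    """Run pip if the file exists; return True only if pip exits successfully."""
    if not requirements.is_file():
        return False
    return subprocess.run(pip_install_command(requirements)).returncode == 0


if __name__ == "__main__":
    print("installed" if reinstall() else "skipped or failed")
```

Using `sys.executable -m pip` installs into the same Python environment that runs the script, which avoids picking up a different `pip` on the PATH.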

Create Image

To save an image, click the 'Save Image' button, enter a valid name, and select the container registry where the image should be saved. The saved image will then also appear in the chosen container registry.

Create Saved Image

To restore all saved dependencies and packages, click the Restore Image icon under the Images tab.

Restore Image

Delete Node

To delete the node, click the Delete button.

Delete Node

Launch Node from Sidebar

You can also launch a node from the left side of the node name.

Launch from Sidebar

Advanced Filter on Node

You can locate the node by entering its name in the search bar.

Node Search

You can access advanced filter options by clicking on this button.

Search Button

You can apply the advanced filter configurations and then click the search button.

Advanced Search
