Nodes

TIR nodes are fully collaborative environments that make AI development possible. They combine the power of containers, JupyterLab, and AI/ML frameworks to create a readily usable workspace for you and your entire team.

Some of the most common use cases are:

Run a script or notebook to fine-tune a Large Language Model (LLM) on a single GPU using PyTorch or the Hugging Face Trainer
Run a script or notebook to tokenize and fine-tune LLMs or diffusion models with multiple GPUs (single machine) using DeepSpeed and Accelerate
Open and run a Jupyter notebook (.ipynb) from platforms like GitHub, Kaggle, or Colab
Download and review datasets stored on TIR or other platforms like Hugging Face
Download and test models like Stable Diffusion, or any LLM

Note

A TIR Node is a fully functional coding environment.

If you prefer working with the command line (shell) over JupyterLab, you can configure SSH on a Node. This way you can upload your data using SFTP or sync your code with Git tools, and run scripts as you would on your local system.

Getting Started

Go to the TIR Dashboard
Create or select a project
Click on Node in the sidebar
Click CREATE Node
Choose an appropriate name for your Node
Select NEW NOTEBOOK as the Node type. To open a notebook hosted on another platform such as GitHub or Google Colab, choose the IMPORT NOTEBOOK option, which lets you enter the URL of the target notebook.
Next, select a pre-built image or one of your own. For simplicity, click the Pytorch 2 option.
Next, choose a CPU or GPU plan. Feel free to choose the Free Tier plan for this exercise.
(Optional) Set the Enable SSH Access switch to enabled and add or select your SSH key.
Leave the rest of the options as-is and click CREATE
A new Node will appear in the list of containers. Wait for the Node to reach a ready state.
When the Node is ready, you will see both JupyterLab and SSH options (if configured). Choose either of these to access the Node environment and work your magic.
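
Once the Node is running, a quick check from a notebook cell or an SSH session can confirm which accelerator the chosen plan exposes. The sketch below is illustrative only: it assumes a PyTorch image such as the Pytorch 2 option selected above, the helper name describe_accelerator is hypothetical, and it falls back to a plain message if torch is not installed.

```python
import importlib.util


def describe_accelerator() -> str:
    """Return a short description of the compute device visible to PyTorch.

    Hypothetical helper for illustration; it is not part of TIR.
    """
    # Avoid an ImportError on images that do not ship PyTorch.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this image"

    import torch

    if torch.cuda.is_available():
        count = torch.cuda.device_count()
        name = torch.cuda.get_device_name(0)
        return f"cuda: {name} (devices: {count})"
    return "cpu only"


print(describe_accelerator())
```

On a CPU plan the function reports that only the CPU is available; on a GPU plan it should report the device name and the number of visible devices.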

Node options

TIR nodes are extremely powerful and flexible. While most configurations have a default to make your life easier, sometimes you may need to tweak the knobs. The following are the configurations that you can tweak in a Node environment:

Enable SSH: You can enable SSH access on the Node using a public key or a password (not recommended). If you decide to enable SSH after starting a Node, you must first stop the Node before making changes.
Disk Size: Each TIR Node can have a disk size of up to 5000 GB. The default is 30 GB. The selected disk will be mounted at /home/jovyan in your Node environment. We recommend using this path as your workspace so that your content persists across restarts. Because TIR is container-native, changes that you make to any other path on the Node will not be persisted on restart. You can also extend the disk size after the container has started. This workspace will be deleted when the associated Node is deleted.

Note

Please raise a support ticket if you need more than 1TB of disk workspace.

Local NVMe Storage: Only available for H100 plans. This fast local storage will be available at /mnt/local and only for the duration of the run. We recommend using this path when you need faster writes (e.g. saving model checkpoints) or reads. Be sure to move this data to an EOS bucket or under /home/jovyan before shutting down the Node. This type of storage is fixed and cannot be expanded at any time during the Node lifecycle.
Plan (Pricing): You can choose between an hourly or committed plan. We recommend using committed plans as they offer discounts and may also offer access to Local NVMe Storage (for H100 plans only).
Node Image: TIR environments are container-native. You can use pre-built images with well-known frameworks like PyTorch and Transformers, or customise the pre-built images. You can make your own images TIR-compatible using the image builder utility. We recommend starting with pre-built images. In case you need to install packages from pip or apt-get, we recommend doing so from a Jupyter notebook (.ipynb) or maintaining a requirements.txt.
Configuration: TIR offers a variety of CPU and GPU options. We recommend using A100 or H100 for best performance.
Update Node: You can upgrade or downgrade both the configuration (e.g. upgrade from CPU to GPU) and the plan (e.g. hourly to committed) of a Node if desired. This is a useful option when restarting a Node whose original hardware plan (GPU) is no longer available.
Stop Node: If the plan and configuration allow it, you can stop a Node and restart it later. On an hourly plan, you will not be billed for the CPU or GPU while the Node is in a stopped state. However, if your disk usage is beyond the free tier, you will be charged for it.
Delete Node: When a Node is deleted, all the resources associated with it will be deleted including the workspace (disk).
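
The Local NVMe Storage advice above (write checkpoints to /mnt/local, then move them before shutdown) could be scripted along these lines. This is a minimal sketch, not part of TIR: the checkpoints subdirectories and the persist_checkpoints helper are hypothetical names, and only the /mnt/local and /home/jovyan mount points come from the text above.

```python
import shutil
from pathlib import Path

# Mount points are taken from the text; the "checkpoints" subdirectories
# are hypothetical and only serve the example.
FAST_SCRATCH = Path("/mnt/local/checkpoints")
PERSISTENT = Path("/home/jovyan/checkpoints")


def persist_checkpoints(src: Path = FAST_SCRATCH, dst: Path = PERSISTENT) -> list:
    """Copy every file under src into dst, keeping the relative layout.

    Returns the list of destination paths that were written.
    """
    copied = []
    if not src.is_dir():
        # Nothing to do, e.g. on plans without local NVMe storage.
        return copied
    for path in sorted(src.rglob("*")):
        if path.is_file():
            target = dst / path.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied
```

Calling persist_checkpoints() at the end of a training run, or periodically during it, leaves a copy of the checkpoints on the persistent workspace disk. Copying rather than moving is a deliberate choice here so that an interrupted copy does not lose the original file.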

Node Status

Waiting: The Node instance is being deployed on the hardware of your choice.
Running: The Node is active and you can use either JupyterLab or SSH (if enabled) to access it.
Stopped: The Node is not assigned to any machine. However, the workspace (disk mounted at /home/jovyan) will continue to exist until you delete the Node. Depending on the size of the disk, you will be charged for the usage.

How to create a Node?

To create a Node, click Create Node at the right corner of the page.

After you click the Create Node button, a page will appear. Select a Node image from the TIR PRE-BUILT, BASE OS, and CONTAINER REGISTRY options. You can also search the Node images.

The Base OS node image does not come with JupyterLab pre-installed.

When installing an image from the Container Registry, the user must specify whether the selected image includes JupyterLab pre-installed or not.

After selecting the image, you can choose between TIR Cluster and Private Cluster. When TIR Cluster is chosen, the Resource page will appear. At this stage, choose a plan based on either CPU or GPU requirements.

Additionally, you can filter CPU and GPU resources based on your specific requirements for a more tailored selection.

../_images/node_create_tir_cluster_filter.png

When selecting a Private Cluster, you can either choose from the available options or create a new one. For the chosen Private Cluster, you can specify the quantity of vCPUs, RAM, and GPU.

../_images/node_create_private_cluster.png

At this step, the user can also add the required dataset to the node being created.

The user is prompted to provide the essential details before the node is created.

After all steps are completed, the Node Summary details will be displayed.

After you click the ‘Create’ button, you will be redirected to the ‘Manage Nodes’ page, which displays all the details.

Node Details

Overview

You can see the Node details, plan details, and connection details under the Overview tab.

Disk Size

You can view the disk size details and change the disk size as per your requirements.

To update the disk size, enter the new size and then click the Update button.
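
Before resizing, it can help to see how full the workspace is. The following sketch uses Python's standard shutil.disk_usage; the workspace_usage helper is a hypothetical name, and /home/jovyan is the workspace mount point described under Node options.

```python
import shutil


def workspace_usage(path: str = "/home/jovyan") -> dict:
    """Summarise disk usage of the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    # Guard against a zero-sized filesystem to avoid dividing by zero.
    used_pct = round(100 * usage.used / usage.total, 1) if usage.total else 0.0
    return {
        "total_gib": round(usage.total / gib, 2),
        "used_gib": round(usage.used / gib, 2),
        "free_gib": round(usage.free / gib, 2),
        "used_pct": used_pct,
    }
```

Running workspace_usage() inside the Node reports the figures for the workspace disk; passing another path, such as /mnt/local on plans that provide it, reports on that filesystem instead.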

Metrics

You can view metrics graphs for CPU Utilization and Memory Utilization, and select the Interval.

You can view up to one month of activity, in days or hours, as per your requirement.

Associated Datasets

You can view the Associated Datasets, categorized into Mounted and Unmounted datasets. To mount an unmounted dataset, simply select the dataset and click Update.

Similarly, to unmount a dataset, unselect the desired dataset and click Update.

Configure SSH

You can see the SSH key details under the SSH Key tab.

Update SSH Key

Note

Only one SSH key can be added to a Node.

Add SSH Key After Node Creation

Note

To add an SSH key after Node creation, first stop the Node and then add the SSH key.

Application ports

Application Ports lets you set your own custom ports, enabling you to run services on your preferred IP and port.

To add a custom port, click the Add Ports button under the Application Ports tab.

A screen will then appear where you can add multiple ports.

After a port is added successfully, you will get a public IP.

To delete a port, click the delete icon.

To reset the ports, click the Reset button.
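
As an illustration of running a service on a custom port, the sketch below starts a tiny HTTP server using only the Python standard library. The HealthHandler class, the make_server helper, and the port number 8080 mentioned afterwards are assumptions made for the example; use whichever port you added under Application ports. Binding to 0.0.0.0 is also an assumption, chosen so that the service listens on all interfaces of the Node.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Reply to every GET request with a short plain-text body."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def make_server(port: int, host: str = "0.0.0.0") -> ThreadingHTTPServer:
    """Create (but do not start) a server bound to host:port."""
    return ThreadingHTTPServer((host, port), HealthHandler)
```

Calling make_server(8080).serve_forever() from an SSH session or a notebook cell keeps the server running until it is interrupted.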

Node Actions

The available actions include Launch Notebook, Stop, Update Node, and Delete.

Launch Notebook

After you click Launch Notebook, the notebook will be launched.

Restart Node

To restart the Node, click the Restart Node button. The notebook will be restarted.

Stop Node

To stop the Node, click the Stop button. The notebook will be stopped.

Update Node

To update the Node, click the Update button.

Note

The Node must be in the Stopped state before it can be updated.

Delete Node

To delete the Node, click the Delete button.