Troubleshooting and FAQs

FAQs

1. If an instance is deleted, are attached resources (PFS, SFS, Datasets) also deleted?

No. Deleting an instance does not remove attached resources such as PFS, SFS, or Datasets. These resources are independent of the instance lifecycle and remain intact after deletion. You can attach them to a new instance at any time.


2. Can I run multiple containers on a single instance?

No. Each instance is designed to run only one container. This ensures predictable performance, dedicated resource allocation, and better workload isolation.


3. Can I change the image of an instance?

Yes. You can update the container image of an existing instance using the Update Image action. Changing the image replaces the currently running container with the new image while keeping the instance configuration intact. This is useful for switching frameworks or deploying updated environments.


4. Can I change the GPU or CPU configuration?

For On-Demand instances, yes. You can scale up, scale down, or switch between CPU and GPU configurations at any time using Update Plan. For example, you can move from a CPU instance to a GPU instance or from 1 GPU to 4 GPUs based on your workload needs. Committed instances cannot be modified during the commitment period.


5. Can I stop a committed instance?

No. Committed instances cannot be stopped during the commitment period. Resources are exclusively reserved for you for the full duration. If you no longer need the instance, you can delete it, but no refund will be issued for the remaining period.


6. Can I delete a committed instance?

Yes. You can delete a committed instance at any time. Once deleted, it is removed from the cluster and its compute resources are released. However, no refund is issued for the remaining commitment period upon deletion.


7. Can I delete a running instance?

Yes. Running instances can be deleted at any time. Deleting an instance immediately removes it from the cluster and releases all associated compute resources. Ensure all important data is saved before deletion, as the action is irreversible.


8. Can I change the plan of an instance?

Yes, but only for On-Demand instances. Changing the plan updates the allocated compute resources such as CPU, GPU, or RAM. This allows you to scale resources up or down without recreating the instance. Committed instances do not support plan changes during the commitment period.


9. What is the average time to create an instance?

Instance creation typically takes 30 seconds to 1 minute under normal conditions. The exact time may vary depending on resource availability, hardware inventory, and configuration complexity. For faster provisioning with minimal inventory dependency, using a Private Cluster is recommended.


10. What is the average time to save an image?

Saving an image usually takes 1 to 2 minutes. The duration depends on image size, storage performance, and system load at the time of saving.


11. What happens to my instances if a project or team is deleted?

If a project or team is deleted, all instances associated with it are automatically and permanently deleted. This action is irreversible. Ensure all important data is backed up before deleting a project or team.


12. What happens to my instances if my account is deprovisioned?

If your account is deprovisioned, all associated instances and active compute resources are permanently deleted.


13. What happens to my instances if my account is suspended?

If your account is suspended, your instances will not be deleted immediately. However, access may be restricted and operations such as starting, stopping, or modifying instances may be temporarily disabled until the account is reactivated.


14. Will I be charged if my instance is in a Stopped state?

Not for compute. On-Demand compute billing pauses while the instance is stopped, but the workspace disk and any attached storage resources continue to be billed until you explicitly delete them.


15. Why do my manually installed packages disappear after a restart?

Packages installed directly in the terminal are not persisted by default across restarts. Use a Start Script to reinstall them automatically on every boot, or use Save Image to capture your environment as a reusable image.
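As an illustration of the Start Script approach, the sketch below reinstalls a list of packages with pip. It is a minimal example under assumptions: the file name `reinstall_packages.py`, the helper names, and the idea that your Start Script can invoke a Python file are all hypothetical, so adapt it to whatever your Start Script actually supports.

```python
import subprocess
import sys
from typing import Callable, List, Sequence


def pip_install_command(packages: Sequence[str]) -> List[str]:
    """Build a pip command that targets the interpreter running this script."""
    # Using sys.executable keeps the install in the same environment as the script.
    return [sys.executable, "-m", "pip", "install", "--quiet", *packages]


def install(
    packages: Sequence[str],
    run: Callable[[List[str]], object] = subprocess.check_call,
) -> None:
    """Install the given packages; do nothing if the list is empty."""
    if not packages:
        return
    # check_call raises CalledProcessError if pip exits with a non-zero status.
    run(pip_install_command(packages))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Package names are passed on the command line (hypothetical usage).
    install(sys.argv[1:])
```

A Start Script could then contain a single line such as `python3 reinstall_packages.py <package> <package>`. For environments with many or large packages, Save Image avoids repeating the install on every boot.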


16. Is JupyterLab available on all images?

No. JupyterLab is only available on images that include it, such as TIR prebuilt images or images built using the TIR Image Builder. Always verify compatibility before enabling the JupyterLab Supported option during instance creation.


17. What is the difference between Restart and Stop?

Restart refreshes the instance session while preserving saved data and configurations. Stop shuts down the instance and releases compute resources while keeping your data intact. Use Restart for a fresh session and Stop when you want to pause usage and compute billing.


Troubleshooting

This section covers resource constraints and transitional states. Most automated processes involve hardware orchestration, so some statuses can take several minutes to resolve.


  1. Inventory Not Available

Cause: The required hardware inventory (e.g., a specific GPU model such as H100) was not available at the time of the operation.

Resolution: Retry later. Capacity is released as other workloads finish, so we recommend waiting 15 to 20 minutes before relaunching the instance or updating your plan.
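If you launch instances from your own automation, the wait-and-retry advice can be sketched as follows. This is not a platform API: `InventoryNotAvailable` and the `operation` callable are hypothetical stand-ins for however your tooling launches an instance and reports this error. Only the 15-minute wait comes from the recommendation above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class InventoryNotAvailable(Exception):
    """Hypothetical error representing an 'Inventory Not Available' failure."""


def retry_on_inventory(
    operation: Callable[[], T],
    wait_seconds: float = 15 * 60,   # lower end of the 15 to 20 minute guidance
    max_attempts: int = 4,           # assumption: give up after four tries
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, waiting between attempts while inventory is unavailable."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except InventoryNotAvailable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(wait_seconds)
```

Capping the number of attempts keeps a script from waiting indefinitely when a GPU model stays out of stock. At that point, a different plan or a Private Cluster may be the better option.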


  2. Transitional Status Delays

It is normal for an instance to remain in a transitional state while the underlying infrastructure completes its operations. The table below describes common transitional states and the recommended actions.

| Action Performed | Status | System Activity | What to Do |
| --- | --- | --- | --- |
| Create, Restart, or Update Plan | WAITING | Provisioning hardware and mounting network storage volumes | Avoid refreshing the browser. The instance typically reaches the Running state within 2 to 3 minutes. |
| Stop Instance | STOPPING | Deallocating compute resources while preserving attached Datasets, SFS, and PFS | No action needed. The instance reaches the Stopped state within a few minutes. |
| Save Image | SAVING IMAGE | Compressing the environment and transferring snapshots to the Container Registry | Wait for the success notification before restarting. Duration depends on disk size. |

System Health Best Practices

  • Monitor Transitional States: If an instance remains in a transitional state for more than 15 minutes, it may indicate a backend timeout. Note your Instance ID and contact the support team.
  • Avoid Redundant Actions: Repeatedly clicking Restart or Update while an instance is in the WAITING or SAVING IMAGE state can queue duplicate commands and extend wait times.
  • Check Registry Capacity: Before saving an image, ensure your Container Registry has sufficient available capacity to prevent stuck upload states.
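The 15-minute guideline above can be expressed as a small polling loop for anyone scripting around instance status. This is a sketch under assumptions: `get_status` is a hypothetical callable that returns the current status string by whatever means you have, and the 30-second poll interval is arbitrary. The status names and the 15-minute threshold are taken from this page.

```python
import time
from typing import Callable

# Transitional statuses listed in the table above.
TRANSITIONAL = {"WAITING", "STOPPING", "SAVING IMAGE"}
ESCALATE_AFTER_SECONDS = 15 * 60  # contact support past this point


def wait_until_settled(
    get_status: Callable[[], str],
    poll_seconds: float = 30,  # assumption: arbitrary poll interval
    timeout_seconds: float = ESCALATE_AFTER_SECONDS,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the status is no longer transitional, or raise TimeoutError."""
    start = clock()
    while True:
        status = get_status()
        if status.upper() not in TRANSITIONAL:
            return status
        if clock() - start >= timeout_seconds:
            raise TimeoutError(
                f"Still {status} after {timeout_seconds:.0f}s; "
                "note the Instance ID and contact support."
            )
        sleep(poll_seconds)
```

The loop only reads status and never re-issues Restart or Update, which matches the advice to avoid redundant actions while an instance is transitioning.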