Effective Date — May 7, 2026
As part of the IAM hierarchy simplification, a new format of the TIR APIs is now available. All endpoints that previously included a /teams/:team_id segment now follow a service-centric structure, with project_id passed as a query parameter instead of a path segment.

Create Endpoints

Create a new model inference endpoint. The following models and frameworks are supported — select the corresponding example from the request body:

| # | Model / Framework | framework value | Model source |
|---|---|---|---|
| 1 | vLLM | vllm | HuggingFace model ID |
| 2 | SGLang | sglang | HuggingFace model ID |
| 3 | Nvidia Dynamo | dynamo | HuggingFace model ID |
| 4 | PyTorch / TorchServe | pytorch | Fine-tuned TIR model (model_id) |
| 5 | Triton Inference Server | triton | Fine-tuned TIR model (model_id) |
| 6 | Custom Container | custom | Any Docker image |
| 7 | LLaMA 2 Chat 7B | llma | HuggingFace model ID |
| 8 | CodeLlama 7B Instruct | codellama | HuggingFace model ID |
| 9 | Mistral 7B Instruct | mistral-7b-instruct | HuggingFace model ID |
| 10 | Mixtral 8x7B Instruct | mixtral-8x7b-instruct | HuggingFace model ID |
| 11 | Gemma 2B IT | gemma-2b-it | HuggingFace model ID |
| 12 | LLaMA 3 8B Instruct | llama-3-8b-instruct | HuggingFace model ID |
| 13 | Phi-3 Mini 128K Instruct | Phi-3-mini-128k-instruct | HuggingFace model ID |
| 14 | LLaMA 3.1 8B Instruct | llama-3_1-8b-instruct | HuggingFace model ID |
| 18 | BGE Large EN v1.5 | bge-large-en-v1_5 | Built-in (via args) |
| 19 | BGE Reranker Large | bge-reranker-large | Built-in (via args) |
| 20 | Stable Diffusion XL | stable_diffusion_xl | E2E registry |
| 21 | YOLOv8 | yolov8 | E2E registry |
| 22 | Stable Video Diffusion | stable-video-diffusion-img2vid-xt | E2E registry |
| 23 | NVIDIA NV-Embed v1 | nvidia-nv-embed-v1 | E2E registry |
| 24 | TensorRT-LLM | tensorrt | Fine-tuned TIR model (model_id) |
| 25 | Nemotron 3 8B Chat | nemotron-3-8b-chat-4k-rlhf | E2E registry |
| 26 | Nemotron 3 Nano 30B A3B | nemotron-3-nano-30b-a3b | HuggingFace model ID |
| 27 | Nemotron Nano 12B v2 VL | nemotron-nano-12b-v2-vl | HuggingFace model ID |
| 28 | Nemotron Speech Streaming EN 0.6B | nemotron-speech-streaming-en-0.6b | E2E registry |

To launch on TIR Cluster

Replace "cluster_type", "sku_id", and "sku_item_price_id" with the values for your location:

| Location | sku_id | sku_item_price_id |
|----------|--------|-------------------|
| Delhi | 162 | 755 |
| Chennai | 40 | 1628 |
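
The location table can be encoded as a small lookup so that the matching values are merged into the request body. This is a minimal sketch, not part of the TIR API: the names `TIR_CLUSTER_SKUS` and `tir_cluster_fields` are hypothetical, and the SKU values are simply copied from the table, so they should be checked against the current values for your account.

```python
# SKU values copied from the location table in this section (may change over time).
TIR_CLUSTER_SKUS = {
    "Delhi": {"sku_id": 162, "sku_item_price_id": 755},
    "Chennai": {"sku_id": 40, "sku_item_price_id": 1628},
}


def tir_cluster_fields(location):
    """Return the cluster-related request body fields for a TIR Cluster launch.

    Hypothetical helper: it only assembles the three fields named in this section.
    """
    if location not in TIR_CLUSTER_SKUS:
        raise ValueError(f"No SKU values documented for location {location!r}")
    return {"cluster_type": "tir-cluster", **TIR_CLUSTER_SKUS[location]}
```

Calling `tir_cluster_fields("Delhi")` yields the same three fields as the Delhi fragment in this section.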


"cluster_type": "tir-cluster",
"sku_id": 162,
"sku_item_price_id": 755

To launch on Private Cluster

Replace "cluster_type", "sku_id", and "sku_item_price_id" with the following fields:


"cluster_type": "private-cluster",
"custom_sku": {
  "cpu": 1,
  "gpu": 0,
  "memory": 1
},
"private_cloud_id": < Private Cluster Id>

POST /serving/inference/

Query parameters

  • project_id (query, integer, required)

    Project ID

  • active_iam (query, integer, optional)

    Active IAM ID, used to access a contact person's account.

  • location (query, string, required)

    Location
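
As a rough illustration of how the three query parameters attach to the endpoint path, the sketch below builds the request URL with the standard library. The base URL is a placeholder and an assumption (this section does not state the API host), and `build_create_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Placeholder host: the real API base URL is not given in this section.
BASE_URL = "https://api.example.com"


def build_create_url(project_id, location, active_iam=None, base_url=BASE_URL):
    """Build the URL for POST /serving/inference/ with its query parameters."""
    params = {"project_id": project_id, "location": location}
    # active_iam is optional, so it is only sent when provided.
    if active_iam is not None:
        params["active_iam"] = active_iam
    return f"{base_url}/serving/inference/?{urlencode(params)}"
```

`urlencode` also percent-encodes values, so a location containing spaces or other reserved characters is still sent safely.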

Request body

application/json

object
name (string, required)

Name of the deployment

custom_endpoint_details (object, required)
model_id (string)
replica (integer, required)
committed_replicas (integer)
path (string)
framework (string, required)
is_auto_scale_enabled (boolean, required)
auto_scale_policy (object, required)
detailed_info (object, required)
model_load_integration_id (integer, required)
dataset_id (string)
dataset_path (string)
cluster_type (string)
sku_id (integer, required)
sku_item_price_id (integer, required)
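
Before sending a request, it can help to check a body against the fields the schema marks as required. The sketch below does only that: `REQUIRED_FIELDS` mirrors the schema as listed here, `missing_required_fields` is a hypothetical helper, and the contents of the nested objects are not validated because this section does not document them.

```python
from typing import List

# Field names marked "required" in the request body schema of this section.
REQUIRED_FIELDS = (
    "name",
    "custom_endpoint_details",
    "replica",
    "framework",
    "is_auto_scale_enabled",
    "auto_scale_policy",
    "detailed_info",
    "model_load_integration_id",
    "sku_id",
    "sku_item_price_id",
)


def missing_required_fields(body: dict) -> List[str]:
    """Return the required top-level fields absent from `body`, in schema order."""
    return [field for field in REQUIRED_FIELDS if field not in body]
```

The private-cluster form replaces "sku_id" and "sku_item_price_id" with other fields, so this check is only meaningful for the TIR Cluster form of the body.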

Responses

201: Model deployment created successfully.
object
code (integer)

HTTP response status code.

Example: 201
data (object)

Deployment creation response data.

errors (object)
message (string)

Success or error message.

Example: Created Successfully
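
A client might unpack the response envelope along these lines. This is a sketch under assumptions: `parse_create_response` is a hypothetical helper, the fields inside "data" are not documented in this section, and the sample string in the usage note is made up to match the documented shape.

```python
import json


def parse_create_response(raw: str) -> dict:
    """Parse the response envelope and return its "data" object.

    Raises RuntimeError when the "code" field is not 201.
    """
    payload = json.loads(raw)
    if payload.get("code") != 201:
        # Surface the server-provided message and errors to the caller.
        raise RuntimeError(
            f"Create failed: {payload.get('message')} (errors: {payload.get('errors')})"
        )
    return payload.get("data") or {}
```

For example, `parse_create_response('{"code": 201, "data": {}, "errors": {}, "message": "Created Successfully"}')` returns an empty dict, while any other "code" value raises an error that carries the response message.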