Create Endpoints
Create a new model inference endpoint. The following models and frameworks are supported; select the matching example in the request body:
| # | Model / Framework | framework value | Model source |
|---|---|---|---|
| 1 | vLLM | vllm | HuggingFace model ID |
| 2 | SGLang | sglang | HuggingFace model ID |
| 3 | Nvidia Dynamo | dynamo | HuggingFace model ID |
| 4 | PyTorch / TorchServe | pytorch | Fine-tuned TIR model (model_id) |
| 5 | Triton Inference Server | triton | Fine-tuned TIR model (model_id) |
| 6 | Custom Container | custom | Any Docker image |
| 7 | LLaMA 2 Chat 7B | llma | HuggingFace model ID |
| 8 | CodeLlama 7B Instruct | codellama | HuggingFace model ID |
| 9 | Mistral 7B Instruct | mistral-7b-instruct | HuggingFace model ID |
| 10 | Mixtral 8x7B Instruct | mixtral-8x7b-instruct | HuggingFace model ID |
| 11 | Gemma 2B IT | gemma-2b-it | HuggingFace model ID |
| 12 | LLaMA 3 8B Instruct | llama-3-8b-instruct | HuggingFace model ID |
| 13 | Phi-3 Mini 128K Instruct | Phi-3-mini-128k-instruct | HuggingFace model ID |
| 14 | LLaMA 3.1 8B Instruct | llama-3_1-8b-instruct | HuggingFace model ID |
| 18 | BGE Large EN v1.5 | bge-large-en-v1_5 | Built-in (via args) |
| 19 | BGE Reranker Large | bge-reranker-large | Built-in (via args) |
| 20 | Stable Diffusion XL | stable_diffusion_xl | E2E registry |
| 21 | YOLOv8 | yolov8 | E2E registry |
| 22 | Stable Video Diffusion | stable-video-diffusion-img2vid-xt | E2E registry |
| 23 | NVIDIA NV-Embed v1 | nvidia-nv-embed-v1 | E2E registry |
| 24 | TensorRT-LLM | tensorrt | Fine-tuned TIR model (model_id) |
| 25 | Nemotron 3 8B Chat | nemotron-3-8b-chat-4k-rlhf | E2E registry |
| 26 | Nemotron 3 Nano 30B A3B | nemotron-3-nano-30b-a3b | HuggingFace model ID |
| 27 | Nemotron Nano 12B v2 VL | nemotron-nano-12b-v2-vl | HuggingFace model ID |
| 28 | Nemotron Speech Streaming EN 0.6B | nemotron-speech-streaming-en-0.6b | E2E registry |
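A client may want to check the framework value before sending a request. The following is a minimal sketch, not part of the API: `SUPPORTED_FRAMEWORKS` and `validate_framework` are hypothetical names, and the check assumes that values are matched case-sensitively, exactly as listed in the table above.

```python
# Framework values copied verbatim from the table above (case-sensitive).
SUPPORTED_FRAMEWORKS = frozenset({
    "vllm", "sglang", "dynamo", "pytorch", "triton", "custom",
    "llma", "codellama", "mistral-7b-instruct", "mixtral-8x7b-instruct",
    "gemma-2b-it", "llama-3-8b-instruct", "Phi-3-mini-128k-instruct",
    "llama-3_1-8b-instruct", "bge-large-en-v1_5", "bge-reranker-large",
    "stable_diffusion_xl", "yolov8", "stable-video-diffusion-img2vid-xt",
    "nvidia-nv-embed-v1", "tensorrt", "nemotron-3-8b-chat-4k-rlhf",
    "nemotron-3-nano-30b-a3b", "nemotron-nano-12b-v2-vl",
    "nemotron-speech-streaming-en-0.6b",
})


def validate_framework(value: str) -> str:
    """Return `value` if it appears in the table, otherwise raise ValueError."""
    if value not in SUPPORTED_FRAMEWORKS:
        raise ValueError(f"unsupported framework value: {value!r}")
    return value
```

Failing early on the client side surfaces a typo in the framework value before any request is sent.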
To launch on TIR Cluster
Replace "cluster_type", "sku_id", and "sku_item_price_id" with the values for your location:
| Location | sku_id | sku_item_price_id |
|----------|----------|---------------------|
| Delhi | 162 | 755 |
| Chennai | 40 | 1628 |
"cluster_type": "tir-cluster",
"sku_id": 162,
"sku_item_price_id": 755
To launch on Private Cluster
Replace "cluster_type", "sku_id", and "sku_item_price_id" with the following fields:
"cluster_type": "private-cluster",
"custom_sku": {
"cpu": 1,
"gpu": 0,
"memory": 1
},
"private_cloud_id": < Private Cluster Id>
/serving/inference/
Query parameters
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| project_id | Query | integer | required | Project ID |
| active_iam | Query | integer | optional | Active IAM ID (to access the contact person's account) |
| location | Query | string | required | Location |
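The query parameters above can be assembled into a request URL as follows. This is only a sketch: `build_create_endpoint_url` is a hypothetical helper, and the base URL is a placeholder that must be replaced with the actual API host, which this page does not state.

```python
from typing import Optional
from urllib.parse import urlencode


def build_create_endpoint_url(base_url: str, project_id: int, location: str,
                              active_iam: Optional[int] = None) -> str:
    """Build the /serving/inference/ URL with its query string."""
    # project_id and location are required; active_iam is optional.
    params = {"project_id": project_id, "location": location}
    if active_iam is not None:
        params["active_iam"] = active_iam
    return f"{base_url.rstrip('/')}/serving/inference/?{urlencode(params)}"


# Placeholder host, not the real API endpoint.
url = build_create_endpoint_url("https://api.example.invalid", 123, "Delhi")
```

Using `urlencode` keeps the values correctly escaped if a location name ever contains spaces or other reserved characters.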
Request body
application/json
Name of the deployment
Responses
201: Model deployment created successfully.
HTTP response status code. Example: 201
Deployment creation response data.
Success or error message. Example: Created Successfully