Create Endpoints

Create a new model inference endpoint. The following models and frameworks are supported. Set "framework" in the request body to the value listed for your model:

| # | Model / Framework | framework value | Model source |
|---|---|---|---|
| 1 | vLLM | vllm | HuggingFace model ID |
| 2 | SGLang | sglang | HuggingFace model ID |
| 3 | NVIDIA Dynamo | dynamo | HuggingFace model ID |
| 4 | PyTorch / TorchServe | pytorch | Fine-tuned TIR model (model_id) |
| 5 | Triton Inference Server | triton | Fine-tuned TIR model (model_id) |
| 6 | Custom Container | custom | Any Docker image |
| 7 | LLaMA 2 Chat 7B | llma | HuggingFace model ID |
| 8 | CodeLlama 7B Instruct | codellama | HuggingFace model ID |
| 9 | Mistral 7B Instruct | mistral-7b-instruct | HuggingFace model ID |
| 10 | Mixtral 8x7B Instruct | mixtral-8x7b-instruct | HuggingFace model ID |
| 11 | Gemma 2B IT | gemma-2b-it | HuggingFace model ID |
| 12 | LLaMA 3 8B Instruct | llama-3-8b-instruct | HuggingFace model ID |
| 13 | Phi-3 Mini 128K Instruct | Phi-3-mini-128k-instruct | HuggingFace model ID |
| 14 | LLaMA 3.1 8B Instruct | llama-3_1-8b-instruct | HuggingFace model ID |
| 18 | BGE Large EN v1.5 | bge-large-en-v1_5 | Built-in (via args) |
| 19 | BGE Reranker Large | bge-reranker-large | Built-in (via args) |
| 20 | Stable Diffusion XL | stable_diffusion_xl | E2E registry |
| 21 | YOLOv8 | yolov8 | E2E registry |
| 22 | Stable Video Diffusion | stable-video-diffusion-img2vid-xt | E2E registry |
| 23 | NVIDIA NV-Embed v1 | nvidia-nv-embed-v1 | E2E registry |
| 24 | TensorRT-LLM | tensorrt | Fine-tuned TIR model (model_id) |
| 25 | Nemotron 3 8B Chat | nemotron-3-8b-chat-4k-rlhf | E2E registry |
| 26 | Nemotron 3 Nano 30B A3B | nemotron-3-nano-30b-a3b | HuggingFace model ID |
| 27 | Nemotron Nano 12B v2 VL | nemotron-nano-12b-v2-vl | HuggingFace model ID |
| 28 | Nemotron Speech Streaming EN 0.6B | nemotron-speech-streaming-en-0.6b | E2E registry |
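Because the correct "framework" value depends on where the model comes from, it can help to check a value against the table before building a request. The sketch below groups the framework values from the table by model source and looks one up. It is a minimal sketch under our own assumptions: the category names (`huggingface`, `tir_model`, and so on) are hypothetical labels, not API values, and matching is case-sensitive because the table mixes cases (for example "Phi-3-mini-128k-instruct").

```python
# Framework values copied from the table above, grouped by the "Model source" column.
# The group labels are hypothetical; only the framework strings come from this page.
SOURCES = {
    "huggingface": {
        "vllm", "sglang", "dynamo", "llma", "codellama", "mistral-7b-instruct",
        "mixtral-8x7b-instruct", "gemma-2b-it", "llama-3-8b-instruct",
        "Phi-3-mini-128k-instruct", "llama-3_1-8b-instruct",
        "nemotron-3-nano-30b-a3b", "nemotron-nano-12b-v2-vl",
    },
    "tir_model": {"pytorch", "triton", "tensorrt"},
    "custom_container": {"custom"},
    "built_in": {"bge-large-en-v1_5", "bge-reranker-large"},
    "e2e_registry": {
        "stable_diffusion_xl", "yolov8", "stable-video-diffusion-img2vid-xt",
        "nvidia-nv-embed-v1", "nemotron-3-8b-chat-4k-rlhf",
        "nemotron-speech-streaming-en-0.6b",
    },
}


def model_source(framework: str) -> str:
    """Return the model-source group for a framework value (case-sensitive)."""
    for source, values in SOURCES.items():
        if framework in values:
            return source
    raise ValueError(f"unsupported framework value: {framework!r}")
```

For example, `model_source("tensorrt")` returns `"tir_model"`, which suggests (though this page does not state it outright) that the request should carry a fine-tuned TIR model in "model_id" rather than a HuggingFace ID in "detailed_info.hugging_face_id".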

To launch on TIR Cluster

Set "cluster_type" to "tir-cluster", and set "sku_id" and "sku_item_price_id" to the values for your location:

| Location | sku_id | sku_item_price_id |
|---|---|---|
| Delhi | 162 | 755 |
| Chennai | 40 | 1628 |

For example, for Delhi:
"cluster_type": "tir-cluster",
"sku_id": 162,
"sku_item_price_id": 755

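The location table can also be kept in code so the placement fields are merged into a request body in one step. This is a minimal sketch: the SKU numbers come from the table above, but the helper name and the assumption that the dictionary keys ("Delhi", "Chennai") are what you pass around in your own code are ours; this page does not list the exact strings the "location" query parameter accepts.

```python
# SKU values per location, copied from the table above.
TIR_SKUS = {
    "Delhi": {"sku_id": 162, "sku_item_price_id": 755},
    "Chennai": {"sku_id": 40, "sku_item_price_id": 1628},
}


def tir_cluster_fields(location: str) -> dict:
    """Return the TIR-cluster placement fields for a location from the table."""
    try:
        sku = TIR_SKUS[location]
    except KeyError:
        raise ValueError(f"no SKU listed for location {location!r}") from None
    return {"cluster_type": "tir-cluster", **sku}


# Merge into a request body (other required fields omitted for brevity).
body = {"name": "my-endpoint", "framework": "vllm"}
body.update(tir_cluster_fields("Delhi"))
```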
To launch on Private Cluster

Set "cluster_type" to "private-cluster", and replace "sku_id" and "sku_item_price_id" with "custom_sku" and "private_cloud_id":

"cluster_type": "private-cluster",
"custom_sku": {
  "cpu": 1,
  "gpu": 0,
  "memory": 1
},
"private_cloud_id": <private_cluster_id>

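The same switch can be made programmatically: drop the TIR SKU fields and add the private-cluster placement. This is a minimal sketch mirroring the snippet above; the non-negative integer check is our own guard rather than a documented API rule, the units of "cpu", "gpu", and "memory" are not stated on this page, and the cluster ID 12 is purely hypothetical.

```python
def private_cluster_fields(private_cloud_id: int, cpu: int, gpu: int, memory: int) -> dict:
    """Build the private-cluster placement fields shown in the snippet above."""
    for name, value in (("cpu", cpu), ("gpu", gpu), ("memory", memory)):
        # Our own sanity check (bool is a subclass of int, so exclude it explicitly).
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer, got {value!r}")
    return {
        "cluster_type": "private-cluster",
        "custom_sku": {"cpu": cpu, "gpu": gpu, "memory": memory},
        "private_cloud_id": private_cloud_id,
    }


# Start from a body prepared for a TIR cluster (other fields omitted).
body = {"name": "my-endpoint", "framework": "vllm",
        "cluster_type": "tir-cluster", "sku_id": 162, "sku_item_price_id": 755}
# The private-cluster snippet carries no SKU fields, so remove them.
for key in ("sku_id", "sku_item_price_id"):
    body.pop(key, None)
# 12 is a hypothetical private cluster ID; substitute your own.
body.update(private_cluster_fields(private_cloud_id=12, cpu=1, gpu=0, memory=1))
```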
POST /serving/inference/

Query parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| project_id | integer | Yes | Project ID |
| active_iam | integer | No | Active IAM ID (used to access the contact person's account) |
| location | string | Yes | Location |
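The query parameters are appended to the endpoint URL. The sketch below builds that URL with the standard library, using the base URL from the curl example on this page. It is an illustration under assumptions: the function name is hypothetical, "apikey" is included only because it appears in the curl example (it is not listed in the table above), and "Delhi" is a placeholder since this page does not enumerate valid "location" values.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL taken from the curl example on this page.
BASE_URL = "https://api.e2enetworks.com/myaccount/api/v1/gpu/serving/inference/"


def create_endpoint_url(project_id: int, location: str,
                        active_iam: Optional[int] = None,
                        apikey: Optional[str] = None) -> str:
    """Build the POST URL; optional parameters are left out when not given."""
    params = {"project_id": project_id, "location": location}
    if active_iam is not None:
        params["active_iam"] = active_iam
    if apikey is not None:
        params["apikey"] = apikey  # seen in the curl example, not in the table
    return f"{BASE_URL}?{urlencode(params)}"


url = create_endpoint_url(123, "Delhi")  # 123 is a placeholder project ID
```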

Request body

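Content type: application/json

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Name of the deployment |
| custom_endpoint_details | object | Yes | |
| model_id | string | No | Fine-tuned TIR model ID (see the "Model source" column above) |
| replica | integer | Yes | |
| committed_replicas | integer | No | |
| path | string | No | |
| framework | string | Yes | A framework value from the table above |
| is_auto_scale_enabled | boolean | Yes | |
| auto_scale_policy | object | Yes | |
| detailed_info | object | Yes | |
| model_load_integration_id | integer | Yes | |
| dataset_id | string | No | |
| dataset_path | string | No | |
| cluster_type | string | No | "tir-cluster" or "private-cluster" |
| sku_id | integer | Yes | See the location table above |
| sku_item_price_id | integer | Yes | See the location table above |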
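A client-side check against the required fields and types listed above can catch mistakes before the request is sent. This is a minimal sketch that follows the table literally (it checks top-level fields only, and the function name is ours); note that the private-cluster snippet omits "sku_id" and "sku_item_price_id" even though the table marks them required, so adjust the list if you deploy to a private cluster.

```python
# Required top-level fields and their JSON types, from the table above.
REQUIRED_FIELDS = {
    "name": str,
    "custom_endpoint_details": dict,
    "replica": int,
    "framework": str,
    "is_auto_scale_enabled": bool,
    "auto_scale_policy": dict,
    "detailed_info": dict,
    "model_load_integration_id": int,
    "sku_id": int,
    "sku_item_price_id": int,
}


def check_request_body(body: dict) -> list:
    """Return a list of problems; an empty list means the body passed the check."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in body:
            problems.append(f"missing required field: {field}")
            continue
        value = body[field]
        # bool is a subclass of int in Python, so reject True/False for integer fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"{field} should be {expected.__name__}, got {type(value).__name__}")
    return problems
```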

Responses

201: Model deployment created successfully.

| Field | Type | Description | Example |
|---|---|---|---|
| code | integer | HTTP response status code | 201 |
| data | object | Deployment creation response data | |
| errors | object | | |
| message | string | Success or error message | Created Successfully |
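The response wraps the result in an envelope of "code", "data", "errors", and "message". The sketch below unwraps a 201 response and raises otherwise. It assumes, without confirmation from this page, that error responses use the same envelope and that a non-empty "errors" object signals failure; the function name is hypothetical.

```python
import json


def parse_create_response(raw: str) -> dict:
    """Return the "data" object of a successful create response, or raise."""
    payload = json.loads(raw)
    # Assumption: failures carry a non-201 code and/or a non-empty "errors" object.
    if payload.get("code") != 201 or payload.get("errors"):
        raise RuntimeError(
            f"create failed (code={payload.get('code')}): "
            f"{payload.get('message')} {payload.get('errors')}"
        )
    return payload["data"]
```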

Request

curl -X POST 'https://api.e2enetworks.com/myaccount/api/v1/gpu/serving/inference/?project_id=%3Cproject_id%3E&active_iam=%3Cactive_iam%3E&location=%3Clocation%3E&apikey=%3Capikey%3E' \
  -H 'Authorization: Bearer <JWT>' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "string",
  "custom_endpoint_details": {
    "service_port": false,
    "metric_port": false,
    "container": {
      "container_name": "string",
      "container_type": "string",
      "private_image_details": {},
      "advance_config": {
        "image_pull_policy": "string",
        "is_readiness_probe_enabled": false,
        "is_liveness_probe_enabled": false,
        "readiness_probe": {
          "protocol": "string",
          "initial_delay_seconds": 0,
          "success_threshold": 0,
          "failure_threshold": 0,
          "port": 0,
          "period_seconds": 0,
          "timeout_seconds": 0,
          "path": "string",
          "grpc_service": "string",
          "commands": "string"
        },
        "liveness_probe": {
          "protocol": "string",
          "initial_delay_seconds": 0,
          "success_threshold": 0,
          "failure_threshold": 0,
          "port": 0,
          "period_seconds": 0,
          "timeout_seconds": 0,
          "path": "string",
          "grpc_service": "string",
          "commands": "string"
        }
      }
    },
    "resource_details": {
      "disk_size": 0,
      "mount_path": "string",
      "env_variables": [
        "string"
      ]
    },
    "public_ip": "string"
  },
  "model_id": "string",
  "replica": 0,
  "committed_replicas": 0,
  "path": "string",
  "framework": "string",
  "is_auto_scale_enabled": false,
  "auto_scale_policy": {
    "min_replicas": 0,
    "max_replicas": 0,
    "rules": [
      {
        "metric": "string",
        "condition_type": "string",
        "value": 0,
        "watch_period": 0,
        "granularity": 0,
        "window": 0
      }
    ],
    "stability_period": 0
  },
  "detailed_info": {
    "commands": "string",
    "args": "string",
    "hugging_face_id": "string",
    "tokenizer": "string",
    "server_version": "string",
    "world_size": 0,
    "error_log": false,
    "info_log": false,
    "warning_log": false,
    "log_verbose_level": 0,
    "model_serve_type": "string"
  },
  "model_load_integration_id": 0,
  "dataset_id": "string",
  "dataset_path": "string",
  "cluster_type": "string",
  "sku_id": 0,
  "sku_item_price_id": 0
}'
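For clients that do not shell out to curl, the same call can be prepared with Python's standard library. This sketch only builds the request object; it does not send it. The function name is hypothetical, the placeholders mirror the curl example, and the body shown is deliberately incomplete, so a real call needs every required field from the request-body table.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_create_request(token: str, query: dict, body: dict) -> urllib.request.Request:
    """Prepare the POST request from the curl example without sending it."""
    url = ("https://api.e2enetworks.com/myaccount/api/v1/gpu/serving/inference/?"
           + urlencode(query))
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # JSON body, like curl -d
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


req = build_create_request(
    token="<JWT>",
    query={"project_id": 123, "location": "Delhi", "apikey": "<apikey>"},
    body={"name": "my-endpoint", "framework": "vllm"},  # incomplete, for illustration
)
# To send it: urllib.request.urlopen(req)
```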

Response · 201

{
  "code": 201,
  "data": {
    "id": 3036,
    "model": null,
    "model_type": null,
    "sku_details": {
      "specs": {
        "sku_id": 162,
        "name": "GDC.1xH200-30.375GB_SXM",
        "series": "GPU",
        "cpu": 30,
        "gpu": 1,
        "memory": 375,
        "is_free": false,
        "is_active": true,
        "gpu_switch_type": null,
        "local_storage_in_gb": 0
      },
      "plan": {
        "sku_item_price_id": 755,
        "sku_type": "hourly",
        "committed_days": 0,
        "unit_price": 470.4,
        "currency": "INR",
        "is_active": true,
        "description": "Hourly Compute Instance: inference_service"
      }
    },
    "sku_item_price_id": 755,
    "name": "tir-endpoint-1115165asd83333",
    "storage_url": null,
    "status": "waiting",
    "created_by": {
      "id": 3573,
      "name": "Nipun",
      "email": "nipun.arora@e2enetworks.com",
      "username": "nipun.arora@e2enetworks.com"
    },
    "created_at": "2024-11-15T11:32:35.527296Z",
    "updated_at": "2024-11-15T11:32:35.527371Z",
    "detailed_info": {
      "commands": [],
      "args": [],
      "hugging_face_id": "",
      "tokenizer": "",
      "server_version": "",
      "world_size": 1,
      "error_log": true,
      "info_log": true,
      "warning_log": true,
      "log_verbose_level": 1,
      "model_serve_type": ""
    },
    "framework": "stable_diffusion_xl",
    "custom_endpoint_details": {
      "public_ip": "no",
      "container": {
        "container_name": "registry.e2enetworks.net/aimle2e/stable-diffusion-xl-base-1.0:hf",
        "container_type": "public",
        "advance_config": {
          "image_pull_policy": "Always",
          "is_readiness_probe_enabled": false,
          "is_liveness_probe_enabled": false,
          "readiness_probe": {
            "initial_delay_seconds": 10,
            "period_seconds": 10,
            "timeout_seconds": 10,
            "success_threshold": 1,
            "failure_threshold": 3,
            "protocol": "http",
            "port": "8080",
            "path": "/v2/health/ready"
          },
          "liveness_probe": {}
        }
      },
      "resource_details": {
        "disk_size": 100,
        "mount_path": "",
        "env_variables": []
      }
    },
    "replica": 1,
    "auto_scale_policy": {
      "min_replicas": 1,
      "max_replicas": 1,
      "rules": [
        {
          "metric": "",
          "condition_type": "limit",
          "value": 100,
          "watch_period": 60,
          "granularity": 1,
          "window": 1,
          "custom_metric_name": ""
        }
      ],
      "stability_period": "300"
    },
    "is_auto_scale_enabled": false,
    "desired_replica": 1,
    "model_load_integration_id": 428,
    "dataset": null,
    "dataset_storage_url": null,
    "committed_replicas": 0,
    "private_cloud_id": null,
    "custom_sku": null
  },
  "errors": {},
  "message": "Created Successfully"
}
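The "data" object nests the SKU specs and price plan. The sketch below pulls out the fields most useful right after creation. The rough hourly figure rests on an assumption this page does not confirm: that an "hourly" plan bills "unit_price" once per replica per hour. Treat it as an estimate only, and note that the helper name is ours.

```python
def summarize_endpoint(data: dict) -> dict:
    """Extract key fields from the "data" object of a create response."""
    sku = data.get("sku_details") or {}
    specs = sku.get("specs") or {}
    plan = sku.get("plan") or {}
    replicas = data.get("desired_replica") or data.get("replica") or 0
    summary = {
        "id": data.get("id"),
        "name": data.get("name"),
        "framework": data.get("framework"),
        "status": data.get("status"),
        "sku": specs.get("name"),
        "currency": plan.get("currency"),
        "estimated_hourly_cost": None,
    }
    # Assumption: an "hourly" plan charges unit_price per replica per hour.
    if plan.get("sku_type") == "hourly" and plan.get("unit_price") is not None:
        summary["estimated_hourly_cost"] = round(plan["unit_price"] * replicas, 2)
    return summary


# Trimmed copy of the "data" object from the 201 response above.
sample = {
    "id": 3036,
    "name": "tir-endpoint-1115165asd83333",
    "status": "waiting",
    "framework": "stable_diffusion_xl",
    "desired_replica": 1,
    "sku_details": {
        "specs": {"name": "GDC.1xH200-30.375GB_SXM"},
        "plan": {"sku_type": "hourly", "unit_price": 470.4, "currency": "INR"},
    },
}
summary = summarize_endpoint(sample)
```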