Update/Actions (Start/Stop/Restart/Autoscale/Async)

Use PUT to update a model endpoint's configuration or to perform an action on it. The action field in the request body selects the operation.

| action value | Purpose |
|---|---|
| patch | Update endpoint configuration |
| restart | Restart the running endpoint |
| stop | Stop the endpoint |
| start | Start a stopped endpoint |
| update_auto_scale | Create or update the autoscaling policy |
| update_async_configuration | Enable or disable async invocation |
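As a rough illustration of how the action values above might be used, the sketch below assembles a request body and rejects unknown actions before anything is sent. The helper name build_action_body is hypothetical, and whether the API accepts a body containing only action (for example for restart) is an assumption, not something this page states.

```python
# Action values copied from the table above.
ALLOWED_ACTIONS = {
    "patch",
    "restart",
    "stop",
    "start",
    "update_auto_scale",
    "update_async_configuration",
}


def build_action_body(action, **fields):
    """Return a request-body dict for the given action (hypothetical helper).

    Extra keyword arguments are copied into the body unchanged, so the
    caller is responsible for using field names from the request schema.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    body = {"action": action}
    body.update(fields)
    return body


# Assumption: lifecycle actions need no fields other than "action".
restart_body = build_action_body("restart")

# A patch that changes the replica count ("replica" is a documented field).
patch_body = build_action_body("patch", replica=2)
```

Validating the action locally is only a convenience; the server remains the authority on which values it accepts.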

Autoscaling metrics

The metric field in each rule supports:

concurrency — scales based on the number of concurrent in-flight requests.
requestRate — scales based on incoming requests per second.
custom — scales based on a framework-specific metric; requires custom_metric_name.
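The metric rules above can be sketched as a small payload builder. This is a minimal example under stated assumptions: the helper names are hypothetical, only a subset of the rule fields from the request schema is filled in (window, granularity, watch_period and condition_type are omitted and may be required by the API), and the unit of value is not specified on this page.

```python
SUPPORTED_METRICS = {"concurrency", "requestRate", "custom"}


def build_rule(metric, value, custom_metric_name=None):
    """Build one autoscaling rule dict (hypothetical helper)."""
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"unsupported metric: {metric!r}")
    # The text above states that "custom" requires custom_metric_name.
    if metric == "custom" and not custom_metric_name:
        raise ValueError("custom metric requires custom_metric_name")
    rule = {"metric": metric, "value": value}
    if metric == "custom":
        rule["custom_metric_name"] = custom_metric_name
    return rule


def build_autoscale_body(min_replicas, max_replicas, rules):
    """Wrap rules in an update_auto_scale request body (hypothetical helper)."""
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    return {
        "action": "update_auto_scale",
        "auto_scale_policy": {
            "min_replicas": min_replicas,
            "max_replicas": max_replicas,
            "rules": list(rules),
        },
    }


# Illustrative values only; the threshold 10 is not a recommendation.
autoscale_body = build_autoscale_body(1, 4, [build_rule("concurrency", 10)])
```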

Async invocation

Use action: "update_async_configuration" to enable or disable async invocation.

| async_enabled | Effect |
|---|---|
| true | Enables async invocation with the specified worker count and dataset destination |
| false | Disables async invocation |

When enabling, provide:

async_concurrent_workers — number of parallel async workers
async_dataset_id — ID of the dataset to store async results
async_destination_type — destination type (e.g. "dataset")
async_target_routes — list of route mappings (route_name → route_value)
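The enable and disable cases above might be assembled as follows. This is a sketch, not the service's own client: build_async_body is a hypothetical helper, the route name and value in the usage lines are placeholders, and treating the worker count and dataset ID as the only mandatory fields follows the field descriptions on this page rather than any verified server behaviour.

```python
def build_async_body(enabled, workers=None, dataset_id=None,
                     destination_type="dataset", routes=None):
    """Build an update_async_configuration request body (hypothetical helper).

    routes is a dict mapping route_name to route_value.
    """
    body = {"action": "update_async_configuration", "async_enabled": enabled}
    if not enabled:
        # Disabling needs no further fields according to the table above.
        return body
    if workers is None or dataset_id is None:
        raise ValueError("workers and dataset_id are required when enabling")
    body.update({
        "async_concurrent_workers": workers,
        "async_dataset_id": dataset_id,
        "async_destination_type": destination_type,
        "async_target_routes": [
            {"route_name": name, "route_value": value}
            for name, value in (routes or {}).items()
        ],
    })
    return body


# Placeholder values for illustration only.
enable_body = build_async_body(
    True, workers=2, dataset_id=101,
    routes={"completions": "/v1/completions"},
)
disable_body = build_async_body(False)
```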

PUT /serving/inference/{inference_id}/

Path parameters

inference_id (path, integer, required)

Query parameters

project_id (query, integer, required)
Project ID
active_iam (query, integer, optional)
Active IAM ID, used to access a contact person's account.
location (query, string, required)
Location
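The path and query parameters above can be combined into a request URL as sketched below. The base URL is copied from the curl request shown on this page; the function name is hypothetical, and the location value in the usage line is a placeholder rather than a documented option. The curl request also passes an apikey query parameter, which this sketch leaves out.

```python
from urllib.parse import urlencode

# Base URL as it appears in the curl request on this page.
BASE_URL = "https://api.e2enetworks.com/myaccount/api/v1/gpu"


def build_endpoint_url(inference_id, project_id, location, active_iam=None):
    """Return the PUT URL for one model endpoint (hypothetical helper)."""
    params = {"project_id": project_id, "location": location}
    # active_iam is optional, so it is only added when supplied.
    if active_iam is not None:
        params["active_iam"] = active_iam
    return f"{BASE_URL}/serving/inference/{inference_id}/?{urlencode(params)}"


# Placeholder identifiers for illustration only.
url = build_endpoint_url(123, 456, "Delhi")
```

Using urlencode keeps the query string correctly escaped if a value ever contains spaces or reserved characters.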

Request body

application/json

object

name (string)

The name of the model configuration.

path (string)

The path for the model configuration. Can be empty.

custom_endpoint_details (object)

model_id (string)

The model ID (null if not specified).

committed_replicas (integer)

Number of committed replicas.

replica (integer)

Number of replicas to run.

framework (string)

The framework used (e.g., vllm).

is_auto_scale_enabled (boolean)

Whether auto-scaling is enabled.

auto_scale_policy (object)

detailed_info (object)

model_load_integration_id (integer)

Integration ID for model load.

dataset_id (string)

Dataset ID associated with the model.

dataset_path (string)

Path to the dataset if available.

cluster_type (string)

The type of cluster for deployment.

storage_type (string)

The storage type used (e.g., disk).

sfs_path (string)

Path for the shared file system.

disk_path (string)

Path to the disk storage.

sku_id (integer)

SKU ID for the model.

sku_item_price_id (integer)

Price ID for the model.

action (string)

The action to be performed (e.g., patch, restart, stop, start, update_auto_scale, update_async_configuration).

async_enabled (boolean)

Set true to enable async invocation, false to disable. Used with action: update_async_configuration.

async_concurrent_workers (integer)

Number of parallel async workers. Required when enabling async invocation.

async_dataset_id (integer)

ID of the dataset to store async results. Required when enabling async invocation.

async_destination_type (string)

Destination type for async results (e.g. "dataset").

async_target_routes (array)

List of route mappings for async inference.

Responses

200: Action performed successfully on the model endpoint.

object

code (integer)

HTTP status code.

Example: 200

data (string)

Message describing the result of the action performed.

Example: Model Endpoint restarted successfully

errors (object)

message (string)

Example: Success

Request

curl -X PUT 'https://api.e2enetworks.com/myaccount/api/v1/gpu/serving/inference/{inference_id}/?project_id=<project_id>&active_iam=<active_iam>&location=<location>&apikey=<apikey>' \
  -H 'Authorization: Bearer <JWT>' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "string",
  "path": "string",
  "custom_endpoint_details": {
    "service_port": false,
    "metric_port": false,
    "container": {
      "advance_config": {
        "liveness_probe": {
          "commands": "string"
        },
        "readiness_probe": {
          "path": "string",
          "port": "string",
          "protocol": "string",
          "period_seconds": 0,
          "timeout_seconds": 0,
          "failure_threshold": 0,
          "success_threshold": 0,
          "initial_delay_seconds": 0,
          "commands": "string"
        },
        "image_pull_policy": "string",
        "is_liveness_probe_enabled": false,
        "is_readiness_probe_enabled": false
      },
      "container_name": "string",
      "container_type": "string",
      "private_image_details": {}
    },
    "public_ip": "string",
    "resource_details": {
      "disk_size": 0,
      "mount_path": "string",
      "env_variables": [
        "string"
      ]
    }
  },
  "model_id": "string",
  "committed_replicas": 0,
  "replica": 0,
  "framework": "string",
  "is_auto_scale_enabled": false,
  "auto_scale_policy": {
    "min_replicas": 0,
    "max_replicas": 0,
    "rules": [
      {
        "value": 0,
        "metric": "string",
        "window": 0,
        "granularity": 0,
        "watch_period": 0,
        "condition_type": "string",
        "custom_metric_name": "string"
      }
    ],
    "stability_period": "string",
    "initial_cooldown_period": "string",
    "is_hold_queue": false
  },
  "detailed_info": {
    "args": "string",
    "commands": "string",
    "info_log": false,
    "error_log": false,
    "tokenizer": "string",
    "world_size": 0,
    "engine_args": {},
    "warning_log": false,
    "server_version": "string",
    "hugging_face_id": "string",
    "model_serve_type": "string",
    "log_verbose_level": 0
  },
  "model_load_integration_id": 0,
  "dataset_id": "string",
  "dataset_path": "string",
  "cluster_type": "string",
  "storage_type": "string",
  "sfs_path": "string",
  "disk_path": "string",
  "sku_id": 0,
  "sku_item_price_id": 0,
  "action": "string",
  "async_enabled": false,
  "async_concurrent_workers": 0,
  "async_dataset_id": 0,
  "async_destination_type": "string",
  "async_target_routes": [
    {
      "route_name": "string",
      "route_value": "string"
    }
  ]
}'

Response · 200

{
  "code": 200,
  "data": "Model Endpoint restarted successfully",
  "errors": {},
  "message": "Success"
}
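A caller might unpack the response envelope above along these lines. The sketch assumes that failed calls use the same code, data, errors and message keys, which this page does not confirm; parse_action_response is a hypothetical helper.

```python
import json


def parse_action_response(raw):
    """Return the data message from a response body, or raise on failure.

    Assumption: error responses share the envelope shown above, with a
    non-200 code or a non-empty errors object.
    """
    payload = json.loads(raw)
    if payload.get("code") != 200 or payload.get("errors"):
        detail = payload.get("errors") or payload.get("message")
        raise RuntimeError(f"action failed: {detail}")
    return payload["data"]


# The sample 200 response from this page.
sample = (
    '{"code": 200, "data": "Model Endpoint restarted successfully", '
    '"errors": {}, "message": "Success"}'
)
result = parse_action_response(sample)
# result is "Model Endpoint restarted successfully"
```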