Effective Date — May 7, 2026
As part of the IAM hierarchy simplification, a new format of the TIR APIs is now available. All endpoints that previously included a /teams/:team_id segment now follow a service-centric structure, with project_id passed as a query parameter instead of a path segment.

Update/Actions (Start/Stop/Restart/Autoscale/Async)

Use PUT to update a model endpoint's configuration or to perform an action on it. The action field in the request body selects the operation.

| action value | Purpose |
|---|---|
| patch | Update endpoint configuration |
| restart | Restart the running endpoint |
| stop | Stop the endpoint |
| start | Start a stopped endpoint |
| update_auto_scale | Create or update the autoscaling policy |
| update_async_configuration | Enable or disable async invocation |
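The action values in the table can be checked on the client before a request is sent. The following is a minimal Python sketch, not an official client: `VALID_ACTIONS` and `build_action_body` are hypothetical names, and whether a given action needs further body fields is not specified in this section.

```python
# Action values copied from the table; nothing else is assumed about them.
VALID_ACTIONS = frozenset({
    "patch",
    "restart",
    "stop",
    "start",
    "update_auto_scale",
    "update_async_configuration",
})


def build_action_body(action, **fields):
    """Return a JSON-serializable request body for the given action.

    Hypothetical helper: it validates the action name and merges any extra
    request-body fields supplied by the caller.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return {"action": action, **fields}


restart_body = build_action_body("restart")
patch_body = build_action_body("patch", replica=2)
```

Rejecting an unknown action locally avoids a round trip for a request that cannot succeed.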

Autoscaling metrics

The metric field in each rule supports:

  • concurrency — scales based on the number of concurrent in-flight requests.
  • requestRate — scales based on incoming requests per second.
  • custom — scales based on a framework-specific metric; requires custom_metric_name.
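The metric constraints listed here can be expressed as a small validation step. This is a sketch under stated assumptions: a rule is treated as a plain dict, only the `metric` and `custom_metric_name` keys named in this section are checked, and `validate_rule` is a hypothetical helper. Any other keys a rule may need are not described here and are left untouched.

```python
SUPPORTED_METRICS = ("concurrency", "requestRate", "custom")


def validate_rule(rule):
    """Check the metric-related fields of one autoscaling rule (a dict).

    Assumption: only `metric` and `custom_metric_name` are inspected; the
    rest of the rule schema is not covered by this sketch.
    """
    metric = rule.get("metric")
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"unsupported metric: {metric!r}")
    if metric == "custom" and not rule.get("custom_metric_name"):
        raise ValueError("metric 'custom' requires custom_metric_name")
    return rule


# "my_framework_metric" is a placeholder, not a documented metric name.
custom_rule = validate_rule(
    {"metric": "custom", "custom_metric_name": "my_framework_metric"}
)
```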

Async Invocation

Use action: "update_async_configuration" to enable or disable async invocation.

| async_enabled | Effect |
|---|---|
| true | Enables async invocation with the specified worker count and dataset destination |
| false | Disables async invocation |

When enabling, provide:

  • async_concurrent_workers — number of parallel async workers
  • async_dataset_id — ID of the dataset to store async results
  • async_destination_type — destination type (e.g. "dataset")
  • async_target_routes — list of route mappings (route_name, route_value)
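A request body for enabling or disabling async invocation could be assembled as follows. This is a hedged sketch: `build_async_body` is a hypothetical helper, the default of `"dataset"` for the destination type is an assumption based on the example value given here, and route mappings are passed through unchanged because their exact item shape is not spelled out in this section.

```python
def build_async_body(enabled, workers=None, dataset_id=None,
                     destination_type="dataset", target_routes=None):
    """Build the body for action "update_async_configuration".

    Assumptions: the worker count and dataset ID are treated as mandatory
    when enabling; target_routes is forwarded as given by the caller.
    """
    body = {"action": "update_async_configuration", "async_enabled": enabled}
    if not enabled:
        # Disabling needs no further fields according to the table.
        return body
    if workers is None or dataset_id is None:
        raise ValueError("workers and dataset_id are required when enabling")
    body["async_concurrent_workers"] = workers
    body["async_dataset_id"] = dataset_id
    body["async_destination_type"] = destination_type
    if target_routes is not None:
        body["async_target_routes"] = list(target_routes)
    return body


# 2 and 123 are placeholder values for illustration only.
enable_body = build_async_body(True, workers=2, dataset_id=123)
disable_body = build_async_body(False)
```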
PUT /serving/inference/{inference_id}/

Path parameters

  • inference_id (path, integer, required)

Query parameters

  • project_id (query, integer, required)

    Project ID

  • active_iam (query, integer, optional)

    Active IAM ID (to access the contact person account).

  • location (query, string, required)

    Location
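The path and query parameters combine into a request URL along these lines. This is a minimal sketch: the base URL is a placeholder (the real host is not given in this section), `build_endpoint_url` is a hypothetical helper, and `"Delhi"` is only an illustrative location value.

```python
from urllib.parse import urlencode

# Placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.com"


def build_endpoint_url(inference_id, project_id, location,
                       active_iam=None, base_url=BASE_URL):
    """Compose the PUT URL from the path and query parameters."""
    query = {"project_id": project_id, "location": location}
    if active_iam is not None:
        # Optional parameter: include it only when provided.
        query["active_iam"] = active_iam
    return f"{base_url}/serving/inference/{inference_id}/?{urlencode(query)}"


url = build_endpoint_url(42, project_id=7, location="Delhi")
```

Using `urlencode` keeps values correctly escaped if a location name contains spaces or other reserved characters.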

Request body

application/json

object

name (string)

The name of the model configuration.

path (string)

The path for the model configuration. Can be empty.

custom_endpoint_details (object)

model_id (string)

The model ID (null if not specified).

committed_replicas (integer)

Number of committed replicas.

replica (integer)

Number of replicas to run.

framework (string)

The framework used (e.g., vllm).

is_auto_scale_enabled (boolean)

Whether auto-scaling is enabled.

auto_scale_policy (object)

detailed_info (object)

model_load_integration_id (integer)

Integration ID for model load.

dataset_id (string)

Dataset ID associated with the model.

dataset_path (string)

Path to the dataset if available.

cluster_type (string)

The type of cluster for deployment.

storage_type (string)

The storage type used (e.g., disk).

sfs_path (string)

Path for the shared file system.

disk_path (string)

Path to the disk storage.

sku_id (integer)

SKU ID for the model.

sku_item_price_id (integer)

Price ID for the model.

action (string)

The action to be performed (e.g., patch, restart, stop, start, update_auto_scale, update_async_configuration).

async_enabled (boolean)

Set true to enable async invocation, false to disable. Used with action: update_async_configuration.

async_concurrent_workers (integer)

Number of parallel async workers. Required when enabling async invocation.

async_dataset_id (integer)

ID of the dataset to store async results. Required when enabling async invocation.

async_destination_type (string)

Destination type for async results (e.g. "dataset").

async_target_routes (array)

List of route mappings for async inference.
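Putting the method, URL, and JSON body together, a request object might be prepared as shown below. This is a sketch under assumptions, not the documented client flow: the host is a placeholder, `build_put_request` is a hypothetical helper, and the `Authorization: Bearer` header is an assumed authentication scheme that this section does not specify. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_put_request(url, body, token):
    """Prepare (but do not send) a PUT request with a JSON body."""
    headers = {
        "Content-Type": "application/json",
        # Assumed auth scheme; check the platform's authentication docs.
        "Authorization": f"Bearer {token}",
    }
    data = json.dumps(body).encode("utf-8")
    return Request(url, data=data, headers=headers, method="PUT")


req = build_put_request(
    "https://api.example.com/serving/inference/42/?project_id=7&location=Delhi",
    {"action": "stop"},
    token="YOUR_TOKEN",
)
```

Passing `req` to `urllib.request.urlopen` would send it; keeping construction separate makes the request easy to inspect or log first.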

Responses

200: Action performed successfully on the model endpoint.

object

code (integer)

HTTP status code.

Example: 200

data (string)

Message describing the result of the action performed.

Example: Model Endpoint restarted successfully

errors (object)

message (string)

Example: Success
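A client could interpret the response fields roughly as follows. This is a sketch: `summarize_response` is a hypothetical helper, and the sample payload assumes that `message` sits at the top level next to `errors`, which the flattened schema in this section does not make explicit.

```python
import json


def summarize_response(raw):
    """Return (ok, data, errors) from a JSON response body string."""
    payload = json.loads(raw)
    ok = payload.get("code") == 200
    return ok, payload.get("data"), payload.get("errors")


# Sample built from the example values in the response schema.
sample = json.dumps({
    "code": 200,
    "data": "Model Endpoint restarted successfully",
    "errors": {},
    "message": "Success",
})

ok, data, errors = summarize_response(sample)
```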