Update/Actions (Start/Stop/Restart/Autoscale/Async)
Use PUT to update a model endpoint's configuration or to perform an action on it. The `action` field in the request body selects the operation.
| action value | Purpose |
|--------------------------------|----------------------------------------------|
| patch | Update endpoint configuration |
| restart | Restart the running endpoint |
| stop | Stop the endpoint |
| start | Start a stopped endpoint |
| update_auto_scale | Create or update the autoscaling policy |
| update_async_configuration | Enable or disable async invocation |
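
As a minimal sketch of how an `action` value might be placed in the request body, the helper below checks the value against the table above and serializes the body as JSON. The name `build_action_body` is hypothetical, and the sketch assumes that any other configuration fields sit beside `action` in one flat JSON object; the service may require more fields than shown here.

```python
import json

# Action values listed in the table above.
VALID_ACTIONS = {
    "patch", "restart", "stop", "start",
    "update_auto_scale", "update_async_configuration",
}

def build_action_body(action, **fields):
    """Return a JSON request body for the given action (hypothetical helper)."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    # Assumption: extra configuration fields live next to `action` in one flat object.
    return json.dumps({"action": action, **fields}, sort_keys=True)

print(build_action_body("restart"))
```

Rejecting unknown values on the client side catches typos such as `"re-start"` before a request is sent.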
Autoscaling metrics
The `metric` field in each rule supports:
- `concurrency`: scales based on the number of concurrent in-flight requests.
- `requestRate`: scales based on incoming requests per second.
- `custom`: scales based on a framework-specific metric; requires `custom_metric_name`.
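
The constraint on `custom` can be checked before a rule is submitted. The following is a sketch only: `validate_rule` is a hypothetical helper, the rule is assumed to be a plain dictionary, `queue_depth` is a made-up metric name, and other rule fields (thresholds, replica bounds) are omitted because they are not described here.

```python
SUPPORTED_METRICS = {"concurrency", "requestRate", "custom"}

def validate_rule(rule):
    """Check the metric-related fields of one autoscaling rule (a dict)."""
    metric = rule.get("metric")
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"unsupported metric: {metric!r}")
    # `custom` needs the name of the framework-specific metric.
    if metric == "custom" and not rule.get("custom_metric_name"):
        raise ValueError("metric 'custom' requires custom_metric_name")
    return rule

# Both rules pass validation; "queue_depth" is a placeholder name.
validate_rule({"metric": "concurrency"})
validate_rule({"metric": "custom", "custom_metric_name": "queue_depth"})
```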
Async Invocation
Set `action` to `"update_async_configuration"` and use `async_enabled` to enable or disable async invocation.
| async_enabled | Effect |
|---|---|
| true | Enables async invocation with the specified worker count and dataset destination |
| false | Disables async invocation |
When enabling, provide:
- `async_concurrent_workers`: number of parallel async workers
- `async_dataset_id`: ID of the dataset to store async results
- `async_destination_type`: destination type (e.g. `"dataset"`)
- `async_target_routes`: list of route mappings (`route_name` → `route_value`)
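
The fields above could be assembled as in the sketch below. `build_async_body` is a hypothetical helper. It assumes each route mapping is an object with `route_name` and `route_value` keys, and the worker count, dataset ID, and route values are placeholders rather than values taken from the API.

```python
def build_async_body(enabled, workers=None, dataset_id=None,
                     destination_type="dataset", target_routes=None):
    """Build a request body for action update_async_configuration."""
    body = {"action": "update_async_configuration", "async_enabled": enabled}
    if not enabled:
        # Disabling needs no further fields.
        return body
    if workers is None or dataset_id is None:
        raise ValueError(
            "async_concurrent_workers and async_dataset_id are required when enabling"
        )
    body.update({
        "async_concurrent_workers": workers,
        "async_dataset_id": dataset_id,
        "async_destination_type": destination_type,
        # Assumed shape: [{"route_name": ..., "route_value": ...}, ...]
        "async_target_routes": target_routes or [],
    })
    return body

enable_body = build_async_body(
    True, workers=2, dataset_id=101,
    target_routes=[{"route_name": "predict", "route_value": "/predict"}],
)
disable_body = build_async_body(False)
```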
PUT /serving/inference/{inference_id}/
Path parameters

| Name | In | Type | Required |
|---|---|---|---|
| inference_id | Path | integer | required |
Query parameters

| Name | In | Type | Required | Description |
|---|---|---|---|---|
| project_id | Query | integer | required | Project ID |
| active_iam | Query | integer | optional | Active IAM ID (to access contact person account) |
| location | Query | string | required | Location |
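
The path and query parameters combine into a request URL roughly as follows. This is a sketch under stated assumptions: `BASE_URL` is a placeholder rather than the real API host, `build_endpoint_url` is a hypothetical helper, and `"example-location"` stands in for a valid location value.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder host, not the real API base

def build_endpoint_url(inference_id, project_id, location, active_iam=None):
    """Return the URL for the PUT request on one model endpoint."""
    query = {"project_id": project_id, "location": location}
    # active_iam is optional, so it is only added when supplied.
    if active_iam is not None:
        query["active_iam"] = active_iam
    return f"{BASE_URL}/serving/inference/{int(inference_id)}/?{urlencode(query)}"

print(build_endpoint_url(42, project_id=7, location="example-location"))
```

Using `urlencode` keeps any special characters in the location value correctly escaped.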
Request body
application/json
The name of the model configuration.
The path for the model configuration. Can be empty.
The model ID (null if not specified).
Number of committed replicas.
Number of replicas to run.
The framework used (e.g., vllm).
Whether auto-scaling is enabled.
Integration ID for model load.
Dataset ID associated with the model.
Path to the dataset if available.
The type of cluster for deployment.
The storage type used (e.g., disk).
Path for the shared file system.
Path to the disk storage.
SKU ID for the model.
Price ID for the model.
`action`: The action to be performed (one of patch, restart, stop, start, update_auto_scale, update_async_configuration).
`async_enabled`: Set true to enable async invocation, false to disable. Used with action: update_async_configuration.
`async_concurrent_workers`: Number of parallel async workers. Required when enabling async invocation.
`async_dataset_id`: ID of the dataset to store async results. Required when enabling async invocation.
`async_destination_type`: Destination type for async results (e.g. "dataset").
`async_target_routes`: List of route mappings for async inference.
Responses
200: Action performed successfully on the model endpoint.
HTTP status code. Example: 200
Message describing the result of the action performed. Example: "Model Endpoint restarted successfully"