Model Endpoints
TIR provides two methods for deploying containers that serve model API endpoints for AI inference services:
Deploy Using Pre-built Containers (Provided by TIR)
Before launching a service with TIR's pre-built containers, create a TIR Model and upload the necessary model files. These containers automatically download the model files from an EOS (E2E Object Storage) bucket and start the API server. Once the endpoint is ready, you can make synchronous inference requests to it.
Deploy Using Your Own Container
You can launch an inference service using your own Docker image, either public or private. Once the endpoint is ready, you can make synchronous requests for inference. Optionally, you can attach a TIR model to automate the download of model files from an EOS bucket to the container.
Model Endpoints Plans
Let's explore the TIR model endpoint plans available for the various frameworks.
Example Usage
data "tir_model_endpoint_plans" "model_endpoint_plans" {
  active_iam = <active_iam : string>
  framework  = "VLLM"
}
Schema
Required
- active_iam (String)
- framework (String)
Read-Only
- id (String) : The ID of this resource.
- plans (List of Object) (see below for nested schema)
Nested Schema for plans
Read-Only:
- committed_days (Number)
- cpu (String)
- currency (String)
- gpu (String)
- memory (String)
- name (String)
- sku_type (String)
- unit_price (Number)
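As a sketch of how the returned plans can be consumed (attribute names follow the schema above; the variable `var.active_iam` is an assumption), you could surface the plan list via an output:

```hcl
# Hypothetical usage: summarize the plans returned by the data source.
data "tir_model_endpoint_plans" "vllm_plans" {
  active_iam = var.active_iam # assumed to be supplied as a variable
  framework  = "VLLM"
}

output "vllm_plan_summary" {
  # Each element of `plans` exposes name, sku_type, unit_price, and currency.
  value = [
    for p in data.tir_model_endpoint_plans.vllm_plans.plans :
    "${p.name} (${p.sku_type}): ${p.unit_price} ${p.currency}"
  ]
}
```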
Supported Frameworks
Pass the left-hand name as the framework string, e.g. framework = "TRITON".
- TRITON = 'triton'
- LLAMA = 'llma'
- PYTORCH = 'pytorch'
- CODELAMA = 'codellama'
- STABLE_DIFFUSION = 'stable_diffusion'
- STABLE_DIFFUSION_XL = 'stable_diffusion_xl'
- MPT = 'mpt'
- CUSTOM = 'custom'
- MIXTRAL8X7B = 'mixtral-8x7b-instruct'
- MIXTRAL7B = 'mistral-7b-instruct'
- TENSOR_RT = 'tensorrt'
- GEMMA_2B = 'gemma-2b'
- GEMMA_2B_IT = 'gemma-2b-it'
- GEMMA_7B = 'gemma-7b'
- GEMMA_7B_IT = 'gemma-7b-it'
- LLAMA_3 = 'llama-3-8b-instruct'
- LLAMA_3_1 = 'llama-3_1-8b-instruct'
- LLAMA_3_2 = 'llama-3_2-3b-instruct'
- LLAMA_3_2_VISION = 'llama-3_2-11b-vision-instruct'
- VLLM = 'vllm'
- STARCODER = 'starcoder2-7b'
- PHI_3_MINI = 'Phi-3-mini-128k-instruct'
- NEMO = 'nemo-rag'
- STABLE_VIDEO_DIFFUSION = 'stable-video-diffusion-img2vid-xt'
- YOLO_V8 = 'yolov8'
- NEMOTRON = 'nemotron-3-8b-chat-4k-rlhf'
- NV_EMBED = 'nvidia-nv-embed-v1'
- BAAI_LARGE = 'bge-large-en-v1_5'
- BAAI_RERANKER = 'bge-reranker-large'
- PIXTRAL = 'pixtral-12b-2409'
- SGLANG = 'sglang'
- DYNAMO = 'dynamo'
Model Endpoint Resource
Example Usage
resource "tir_model_endpoint" "example" {
  name                      = "name"
  sku_name                  = "C3.2"
  sku_type                  = "hourly"
  committed_instance_policy = ""
  committed_days            = 0
  model_path                = ""
  framework                 = "PYTORCH"
  # model_id = tir_model_repository.<name as in state file>.id // choose either model_id or model_load_integration_id
  model_load_integration_id = tir_integration.name.id
  cluster_type              = "tir-cluster"
  storage_type              = "disk"
  disk_path                 = "/mnt/models"
  image_pull_policy         = "Always"
  is_auto_scale_enabled     = false
  replica                   = 1
  committed_replicas        = 0
  auto_scale_policy {
    rules {
    }
  }
  detailed_info {
    # commands = "[\"first\",\"second\"]"
    # args = ""
    # hugging_face_id = "BAAI/Aquila-7B" # supported models apply to VLLM, SGLANG, and DYNAMO
    server_version    = "v0.9.0" // used by TRITON, PYTORCH, NEMO, and TENSORRT
    tokenizer         = ""
    world_size        = 1
    error_log         = true
    info_log          = true
    warning_log       = true
    log_verbose_level = 1
    model_serve_type  = "" // required for VLLM; its value is "full-model" or "peft-model"
    engine_args = {
      # Define engine_args as needed.
    }
  }
  is_readiness_probe_enabled = false
  is_liveness_probe_enabled  = false
  readiness_probe {
    # Define the readiness probe as needed.
  }
  liveness_probe {
    # Define the liveness probe as needed.
  }
  resource_details {
    disk_size  = 100
    mount_path = ""
    env_variables {
      key      = "HF_HOME"
      value    = "ENV_VALUE"
      required = true
      disabled = {
        key   = true
        value = false
      }
    }
    env_variables {
      key      = "ENV_KEY"
      value    = "ENV_VALUE"
      required = true
      disabled = {
        key   = true
        value = false
      }
    }
  }
  container_type = "public"
  team_id        = <team_id : string>
  project_id     = <project_id : string>
  active_iam     = <active_iam : string>
  location       = "Delhi"
  currency       = "INR"
}
Schema
Required
- active_iam (String) : The IAM (Identity and Access Management) role associated with the resource.
- cluster_type (String) : The type of cluster the resource is deployed on.
- container_type (String) : The type of container used for the resource (e.g., public, private).
- currency (String) : The currency used for billing the resource.
- framework (String) : The framework used for the model. This could be TensorFlow, PyTorch, etc.
- location (String) : The location or region where the resource is deployed.
- name (String) : The name of the resource. This is a required field and must be unique within the project.
- project_id (String) : The ID of the project where the resource is deployed.
- sku_name (String) : The SKU (Stock Keeping Unit) name for the resource. This defines the type of resource being deployed.
- sku_type (String) : The SKU type for the resource. This defines the category or classification of the SKU.
- storage_type (String) : The type of storage used for the resource.
- team_id (String) : The ID of the team that owns the resource.
Optional
- auto_scale_policy (Block List) : The policy for auto-scaling the resource. This includes min/max replicas and scaling rules. (see below for nested schema)
- committed_days (Number) : The number of days the instance is committed for. This is used for billing and resource allocation.
- committed_instance_policy (String) : The policy for committed instances. This defines how committed instances are managed and billed.
- committed_replicas (Number) : The number of replicas that are committed for the resource.
- custom_sku (Map of Number) : A map of custom SKU configurations for the private cloud.
- dataset_id (String) : The ID of the dataset associated with the resource.
- dataset_path (String) : The path to the dataset used by the resource.
- detailed_info (Block List) : Detailed information about the resource, including commands, args, and logging settings. (see below for nested schema)
- disk_path (String) : The path where the disk is mounted. This is used to specify the location for model storage.
- image_pull_policy (String) : The policy for pulling container images. Options are 'Always' or 'IfNotPresent'.
- is_auto_scale_enabled (Boolean) : Indicates whether auto-scaling is enabled for the resource.
- is_liveness_probe_enabled (Boolean) : Enable or disable the liveness probe for the resource.
- is_readiness_probe_enabled (Boolean) : Enable or disable the readiness probe for the resource.
- liveness_probe (Block List) : Configuration for the liveness probe. (see below for nested schema)
- metric_port (Boolean) : Indicates whether a metric port is exposed for the resource.
- model_id (String) : The unique identifier for the model. This is used to reference the model in the system.
- model_load_integration_id (String) : The integration ID used for loading the model. This is typically used for custom model loading workflows.
- model_path (String) : The path to the model file or directory. This is used to specify the location of the model to be deployed.
- private_cloud_id (String) : The ID of the private cloud where the resource is deployed.
- public_ip (String) : Indicates whether a public IP address is assigned to the resource.
- readiness_probe (Block List) : Configuration for the readiness probe. (see below for nested schema)
- replica (Number) : The number of replicas to deploy for the resource.
- resource_details (Block List) : Additional details about the resource, such as disk size, mount path, and environment variables. (see below for nested schema)
- server_options (String) : Specifies the server options for the resource. This is typically used for server types like TRITON, PYTORCH, NEMO, and TENSORRT.
- service_port (Boolean) : Indicates whether a service port is exposed for the resource.
- sfs_id (String) : The ID of the shared file storage. This is used to reference the shared storage resource.
- sfs_path (String) : The path for shared file storage. This is used for caching and shared resources.
- stop_inference (String) : Indicates whether to stop or start inference for the resource. Default is 'start'.
Read Only
- container_name (String) : The name of the container associated with the resource. This is computed automatically.
- created_at (String) : The timestamp when the resource was created. This is computed automatically.
- id (String) : The ID of this resource.
- status (String) : The current status of the resource. This is computed automatically.
Nested Schema for auto_scale_policy
Optional:
- max_replicas (Number) : The maximum number of replicas to scale up to during auto-scaling.
- min_replicas (Number) : The minimum number of replicas to maintain during auto-scaling.
- rules (Block List) : The rules for auto-scaling based on metrics and conditions. (see below for nested schema)
- stability_period (Number) : The period (in seconds) to wait after scaling before scaling again.
Nested Schema for auto_scale_policy.rules
Optional:
- condition_type (String) : The type of condition to apply for scaling.
- custom_metric_name (String) : The name of a custom metric to use for scaling.
- granularity (Number) : The granularity of the metric data collection.
- metric (String) : The metric to monitor for auto-scaling
- value (Number) : The threshold value for the metric to trigger scaling.
- watch_period (Number) : The period (in seconds) to watch the metric before scaling.
- window (Number) : The time window (in seconds) for evaluating the metric.
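As an illustrative sketch of these two blocks together (the metric name "cpu", the condition type, and the numeric thresholds are assumptions, not documented values):

```hcl
# Hypothetical auto-scaling policy; metric, condition_type, and
# thresholds below are illustrative assumptions.
auto_scale_policy {
  min_replicas     = 1
  max_replicas     = 4
  stability_period = 300 # wait 300s after a scaling action before scaling again

  rules {
    metric         = "cpu"   # assumed metric name
    condition_type = "limit" # assumed condition type
    value          = 80      # threshold that triggers scaling
    watch_period   = 60      # watch the metric for 60s before acting
    window         = 120     # evaluate the metric over a 120s window
    granularity    = 1
  }
}
```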
Nested Schema for detailed_info
Optional:
- args (String) : Arguments to pass to the commands when the resource is deployed.
- commands (String) : Commands to execute when the resource is deployed.
- engine_args (Map of String) : Additional engine-specific arguments for the model. (see below for nested schema)
- error_log (Boolean) : Enable or disable error logging.
- hugging_face_id (String) : The Hugging Face model ID associated with the resource.
- info_log (Boolean) : Enable or disable info logging.
- log_verbose_level (Number) : The verbosity level for logging.
- model_serve_type (String) : The type of model serving (e.g., real-time, batch).
- server_version (String) : The version of the server being used.
- tokenizer (String) : The tokenizer to use for the model.
- warning_log (Boolean) : Enable or disable warning logging.
- world_size (Number) : The world size for distributed training or inference.
Nested Schema for detailed_info.engine_args
Optional:
- block_size
- chat_template
- data_type
- disable_custom_all_reduce
- disable_log_requests
- disable_log_stats
- disable_sliding_window
- distributed_executor_backend
- enable_auto_tool_choice
- enable_chunked_prefill
- enable_lora
- enable_lora_bias
- enable_prefix_caching
- enforce_eager
- fully_sharded_loras
- gpu_memory_utilization
- guided_decoding_backend
- kv_cache_data_type
- load_format
- long_lora_scaling_factors
- lora_data_type
- lora_extra_vocab_size
- max_cpu_loras
- max_log_len
- max_logprobs
- max_lora_rank
- max_loras
- max_model_length
- max_num_batched_tokens
- max_num_seqs
- max_parallel_loading_workers
- max_seq_len_to_capture
- model_loader_extra_config
- ngram_prompt_lookup_max
- ngram_prompt_lookup_min
- num_gpu_blocks_override
- num_lookahead_slots
- num_speculative_tokens
- preemption_mode
- quantization
- rope_scaling
- rope_theta
- scheduler_delay_factor
- seed
- skip_tokenizer_init
- spec_decoding_acceptance_method
- speculative_disable_by_batch_size
- speculative_draft_tensor_parallel_size
- speculative_max_model_len
- speculative_model
- swap_space
- tokenizer
- tokenizer_mode
- tokenizer_pool_extra_config
- tokenizer_pool_size
- tokenizer_pool_type
- tokenizer_revision
- tool_call_parser
- typical_acceptance_sampler_posterior_alpha
- typical_acceptance_sampler_posterior_threshold
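Since engine_args is a map of strings, numeric and boolean values are passed as quoted strings. A sketch for a VLLM endpoint (keys come from the list above; the values are illustrative assumptions, not recommendations):

```hcl
# Illustrative engine_args map; all values are example assumptions.
engine_args = {
  data_type              = "auto"
  gpu_memory_utilization = "0.90"
  max_model_length       = "8192"
  max_num_seqs           = "256"
  enforce_eager          = "false"
}
```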
Nested Schema for liveness_probe
Optional:
- commands (String) : Commands to execute for the liveness probe.
- failure_threshold (Number) : The number of failed probes before the resource is marked as not live.
- grpc_service (String) : The gRPC service to check for the liveness probe.
- initial_delay_seconds (Number) : The initial delay (in seconds) before the liveness probe starts.
- path (String) : The path to check for the liveness probe.
- period_seconds (Number) : The period (in seconds) between liveness probe checks.
- port (Number) : The port to use for the liveness probe.
- protocol (String) : The protocol to use for the liveness probe (e.g., http, tcp).
- success_threshold (Number) : The number of successful probes required to mark the resource as live.
- timeout_seconds (Number) : The timeout (in seconds) for the liveness probe.
Nested Schema for readiness_probe
Optional:
- commands (String) : Commands to execute for the readiness probe.
- failure_threshold (Number) : The number of failed probes before the resource is marked as not ready.
- grpc_service (String) : The gRPC service to check for the readiness probe.
- initial_delay_seconds (Number) : The initial delay (in seconds) before the readiness probe starts.
- path (String) : The path to check for the readiness probe.
- period_seconds (Number) : The period (in seconds) between readiness probe checks.
- port (Number) : The port to use for the readiness probe.
- protocol (String) : The protocol to use for the readiness probe (e.g., http, tcp).
- success_threshold (Number) : The number of successful probes required to mark the resource as ready.
- timeout_seconds (Number) : The timeout (in seconds) for the readiness probe.
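The two probe blocks accept the same fields. A sketch of an HTTP readiness probe (the health path and port are assumptions for a typical inference server, not documented defaults):

```hcl
# Illustrative readiness probe; path and port are assumed values.
is_readiness_probe_enabled = true
readiness_probe {
  protocol              = "http"
  path                  = "/v2/health/ready" # assumed health-check path
  port                  = 8080               # assumed container port
  initial_delay_seconds = 30
  period_seconds        = 10
  timeout_seconds       = 5
  failure_threshold     = 3
  success_threshold     = 1
}
```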
Nested Schema for resource_details
Optional:
- disk_size (Number) : The size of the disk (in GB) allocated for the resource.
- env_variables (Block List) : Environment variables to be set for the resource. (see below for nested schema)
- mount_path (String) : The path where the disk is mounted.
Nested Schema for resource_details.env_variables
Optional:
- disabled (Map of Boolean) : A map of disabled environment variables.
- key (String) : The key for the environment variable.
- required (Boolean) : Indicates whether the environment variable is required.
- value (String) : The value for the environment variable.
Supported Frameworks
- TRITON = 'triton'
- PYTORCH = 'pytorch'
- CODELAMA = 'codellama'
- STABLE_DIFFUSION = 'stable_diffusion'
- STABLE_DIFFUSION_XL = 'stable_diffusion_xl'
- MPT = 'mpt'
- CUSTOM = 'custom'
- MIXTRAL8X7B = 'mixtral-8x7b-instruct'
- MIXTRAL7B = 'mistral-7b-instruct'
- TENSOR_RT = 'tensorrt'
- GEMMA_2B = 'gemma-2b'
- GEMMA_2B_IT = 'gemma-2b-it'
- GEMMA_7B = 'gemma-7b'
- GEMMA_7B_IT = 'gemma-7b-it'
- LLAMA_3 = 'llama-3-8b-instruct'
- LLAMA_3_1 = 'llama-3_1-8b-instruct'
- LLAMA_3_2 = 'llama-3_2-3b-instruct'
- LLAMA_3_2_VISION = 'llama-3_2-11b-vision-instruct'
- VLLM = 'vllm'
- STARCODER = 'starcoder2-7b'
- PHI_3_MINI = 'Phi-3-mini-128k-instruct'
- NEMO = 'nemo-rag'
- STABLE_VIDEO_DIFFUSION = 'stable-video-diffusion-img2vid-xt'
- YOLO_V8 = 'yolov8'
- NEMOTRON = 'nemotron-3-8b-chat-4k-rlhf'
- NV_EMBED = 'nvidia-nv-embed-v1'
- BAAI_LARGE = 'bge-large-en-v1_5'
- BAAI_RERANKER = 'bge-reranker-large'
- PIXTRAL = 'pixtral-12b-2409'
- SGLANG = 'sglang'
- DYNAMO = 'dynamo'
Supported Models for SGLANG
- deepseek-ai/DeepSeek-R1
- google/gemma-2b
- deepseek-ai/DeepSeek-V3
- meta-llama/Llama-3.2-1B
- microsoft/Phi-3-small-8k-instruct
- meta-llama/Llama-3.2-1B-Instruct
- custom
Supported Models for VLLM
- custom
- BAAI/Aquila-7B
- BAAI/AquilaChat-7B
- Snowflake/snowflake-arctic-base
- Snowflake/snowflake-arctic-instruct
- baichuan-inc/Baichuan-7B
- baichuan-inc/Baichuan2-13B-Chat
- bigscience/bloom
- bigscience/bloomz
- THUDM/chatglm2-6b
- THUDM/chatglm3-6b
- CohereForAI/c4ai-command-r-v01
- databricks/dbrx-base
- databricks/dbrx-instruct
- Deci/DeciLM-7B
- Deci/DeciLM-7B-instruct
- tiiuae/falcon-7b
- tiiuae/falcon-40b
- tiiuae/falcon-rw-7b
- google/gemma-2b
- google/gemma-7b
- gpt2
- gpt2-xl
- bigcode/starcoder
- bigcode/gpt_bigcode-santacoder
- WizardLM/WizardCoder-15B-V1.0
- EleutherAI/gpt-j-6b
- nomic-ai/gpt4all-j
- EleutherAI/gpt-neox-20b
- EleutherAI/pythia-12b
- OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5
- databricks/dolly-v2-12b
- stabilityai/stablelm-tuned-alpha-7b
- internlm/internlm-7b
- internlm/internlm-chat-7b
- internlm/internlm2-7b
- internlm/internlm2-chat-7b
- core42/jais-13b
- core42/jais-13b-chat
- core42/jais-30b-v3
- core42/jais-30b-chat-v3
- openlm-research/open_llama_13b
- meta-llama/Llama-2-13b-hf
- meta-llama/Llama-2-70b-hf
- meta-llama/Meta-Llama-3-8B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3.1-8B-Instruct
- meta-llama/Meta-Llama-3.1-70B-Instruct
- meta-llama/Meta-Llama-3.1-405B-Instruct
- meta-llama/Meta-Llama-3.1-405B-Instruct-FP8
- meta-llama/Llama-3.2-1B
- meta-llama/Llama-3.2-3B
- meta-llama/Llama-3.2-1B-Instruct
- meta-llama/Llama-3.2-3B-Instruct
- meta-llama/Llama-Guard-3-1B
- meta-llama/Llama-3.2-11B-Vision
- meta-llama/Llama-3.2-11B-Vision-Instruct
- meta-llama/Llama-3.2-90B-Vision
- meta-llama/Llama-3.2-90B-Vision-Instruct
- meta-llama/Llama-Guard-3-11B-Vision
- meta-llama/Llama-3.3-70B-Instruct
- lmsys/vicuna-13b-v1.3
- 01-ai/Yi-6B
- 01-ai/Yi-34B
- llava-hf/llava-1.5-7b-hf
- llava-hf/llava-1.5-13b-hf
- openbmb/MiniCPM-2B-sft-bf16
- openbmb/MiniCPM-2B-dpo-bf16
- mistralai/Mistral-7B-v0.1
- mistralai/Mistral-7B-Instruct-v0.1
- mistralai/Mixtral-8x7B-v0.1
- mistralai/Mixtral-8x7B-Instruct-v0.1
- mistral-community/Mixtral-8x22B-v0.1
- mosaicml/mpt-7b
- mosaicml/mpt-30b
- mosaicml/mpt-7b-instruct
- mosaicml/mpt-30b-instruct
- mosaicml/mpt-7b-chat
- mosaicml/mpt-30b-chat
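To serve one of the models above with VLLM, the detailed_info block would reference the listed Hugging Face ID; a sketch (the chosen model is just one entry from the list):

```hcl
# Sketch: serving a listed model via VLLM from Hugging Face.
detailed_info {
  hugging_face_id  = "mistralai/Mistral-7B-Instruct-v0.1" # from the supported list
  model_serve_type = "full-model" # or "peft-model" for a PEFT adapter
  tokenizer        = ""
  world_size       = 1
}
```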
Supported Models for Dynamo
- custom: Custom (If model not present in list)
- BAAI/Aquila-7B: Aquila-7B
- BAAI/AquilaChat-7B: Aquila2-7B-Chat
- Snowflake/snowflake-arctic-base: Arctic-Base
- Snowflake/snowflake-arctic-instruct: Arctic-Instruct
- baichuan-inc/Baichuan-7B: Baichuan-7B
- baichuan-inc/Baichuan2-13B-Chat: Baichuan2-13B-Chat
- bigscience/bloom: BLOOM
- bigscience/bloomz: BLOOMZ
- THUDM/chatglm2-6b: ChatGLM2-6B
- THUDM/chatglm3-6b: ChatGLM3-6B
- CohereForAI/c4ai-command-r-v01: Command-R
- databricks/dbrx-base: DBRX-Base
- databricks/dbrx-instruct: DBRX-Instruct
- Deci/DeciLM-7B: DeciLM-7B
- Deci/DeciLM-7B-instruct: DeciLM-7B-Instruct
- tiiuae/falcon-7b: Falcon-7B
- tiiuae/falcon-40b: Falcon-40B
- tiiuae/falcon-rw-7b: Falcon-RW-7B
- google/gemma-2b: Gemma-2B
- google/gemma-7b: Gemma-7B
- gpt2: GPT-2
- gpt2-xl: GPT-2-XL
- bigcode/starcoder: StarCoder
- bigcode/gpt_bigcode-santacoder: SantaCoder
- WizardLM/WizardCoder-15B-V1.0: WizardCoder-15B
- EleutherAI/gpt-j-6b: GPT-J-6B
- nomic-ai/gpt4all-j: GPT-J
- EleutherAI/gpt-neox-20b: GPT-NeoX-20B
- EleutherAI/pythia-12b: Pythia-12B
- OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5: OpenAssistant-12B
- databricks/dolly-v2-12b: Dolly-V2-12B
- stabilityai/stablelm-tuned-alpha-7b: StableLM-Alpha-7B
- internlm/internlm-7b: InternLM-7B
- internlm/internlm-chat-7b: InternLM-7B-Chat
- internlm/internlm2-7b: InternLM2-7B
- internlm/internlm2-chat-7b: InternLM2-7B-Chat
- core42/jais-13b: Jais-13B
- core42/jais-13b-chat: Jais-13B-Chat
- core42/jais-30b-v3: Jais-V3-30B
- core42/jais-30b-chat-v3: Jais-V3-30B-Chat
- openlm-research/open_llama_13b: LLaMA-13B
- meta-llama/Llama-2-13b-hf: Llama-2-13B
- meta-llama/Llama-2-70b-hf: Llama-2-70B
- meta-llama/Meta-Llama-3-8B-Instruct: Llama-3-8B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct: Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3.1-8B-Instruct: Llama-3.1-8B-Instruct
- meta-llama/Meta-Llama-3.1-70B-Instruct: Llama-3.1-70B-Instruct
- meta-llama/Meta-Llama-3.1-405B-Instruct: Llama-3.1-405B-Instruct
- meta-llama/Meta-Llama-3.1-405B-Instruct-FP8: Llama-3.1-405B-Instruct-FP8
- meta-llama/Llama-3.2-1B: Llama-3.2-1B
- meta-llama/Llama-3.2-3B: Llama-3.2-3B
- meta-llama/Llama-3.2-1B-Instruct: Llama-3.2-1B-Instruct
- meta-llama/Llama-3.2-3B-Instruct: Llama-3.2-3B-Instruct
- meta-llama/Llama-Guard-3-1B: Llama-Guard-3-1B
- meta-llama/Llama-3.2-11B-Vision: Llama-3.2-11B-Vision
- meta-llama/Llama-3.2-11B-Vision-Instruct: Llama-3.2-11B-Vision-Instruct
- meta-llama/Llama-3.2-90B-Vision: Llama-3.2-90B-Vision
- meta-llama/Llama-3.2-90B-Vision-Instruct: Llama-3.2-90B-Vision-Instruct
- meta-llama/Llama-Guard-3-11B-Vision: Llama-Guard-3-11B-Vision
- meta-llama/Llama-3.3-70B-Instruct: Llama-3.3-70B-Instruct
- lmsys/vicuna-13b-v1.3: Vicuna-V1-13B
- 01-ai/Yi-6B: Yi-6B
- 01-ai/Yi-34B: Yi-34B
- llava-hf/llava-1.5-7b-hf: LLaVA-1.5-7B
- llava-hf/llava-1.5-13b-hf: LLaVA-1.5-13B
Versions
Versions for TensorrtServerOptions
- v24.02
- v24.01
- v23.12
- v23.11
- v23.10
- v0.10.0
- v0.9.0
- v0.7.2
- custom
Versions for PytorchServerOptions
- v0.9.0
- v0.8.2
- v0.8.1
- custom
Versions for TritonServerOptions
- v24.02
- v24.01
- v23.12
- v23.11
- v23.10
- custom
Versions for NemoServerOptions
- v0.9.0
- custom
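Tying a framework to its version list above: server_version in detailed_info should be one of the values for the matching server options. A minimal sketch for a PYTORCH endpoint:

```hcl
# Sketch: pin server_version to an entry from PytorchServerOptions.
framework = "PYTORCH"
detailed_info {
  server_version = "v0.9.0"
}
```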