Chat Completions
Generate chat completions using a deployed model endpoint. Supports any model deployed via TIR Inference Service, following the OpenAI-compatible chat completions API format.
/project/p-{Project_Id}/endpoint/is-{Endpoint_Id}/v1/chat/completions
Path parameters
Endpoint_Id (path, string, required)
Query parameters
project_id (query, integer, required): Project ID
Request body
application/json
The model identifier to use for completion.
Example: your-model-name
A list of messages comprising the conversation so far.
Maximum number of tokens to generate in the response.
Example: 512
Controls randomness. Lower values make output more deterministic, higher values more creative.
Example: 0.7
Nucleus sampling probability mass. Only tokens with cumulative probability up to top_p are considered.
Example: 1
Limits the number of highest-probability tokens considered at each step.
Example: 50
Penalizes tokens based on how frequently they have appeared in the text so far.
Example: 0
If true, responses are streamed as server-sent events.
Example: false
Responses
200: Successful chat completion response.
Unique identifier for the completion.
Example: chatcmpl-ab36a36c-2315-472f-9e58-0f68f19b8bf8
Object type, always chat.completion.
Example: chat.completion
Unix timestamp of when the completion was created.
Example: 1776750085
The model used for the completion.
Example: meta-llama/Llama-3.2-1B-Instruct
List of generated completion choices.
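The request and response descriptions above can be tied together with a small sketch. It is an illustration under several assumptions, not the service's documented client. The field names (model, messages, max_tokens, temperature, top_k, frequency_penalty, stream, choices) are not listed on this page and are taken from the OpenAI chat completions convention the endpoint says it follows; only top_p is named in the text. The host in BASE_URL is a hypothetical placeholder, and the POST method, the Bearer authorization header, and the message.content shape of each choice are likewise assumed.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical placeholder: this page does not state the API host.
BASE_URL = "https://example.invalid"


def build_chat_request(project_id, endpoint_id, api_token, messages,
                       model="your-model-name"):
    """Build (but do not send) a chat completion request.

    Body field names are assumed from the OpenAI-compatible format;
    the values mirror the examples listed in the request body section.
    """
    path = (f"/project/p-{project_id}/endpoint/is-{endpoint_id}"
            "/v1/chat/completions")
    # project_id is also passed as the required query parameter.
    url = f"{BASE_URL}{path}?{urlencode({'project_id': project_id})}"
    payload = {
        "model": model,
        "messages": messages,
        "max_tokens": 512,
        "temperature": 0.7,
        "top_p": 1,
        "top_k": 50,
        "frequency_penalty": 0,
        "stream": False,
    }
    headers = {
        "Content-Type": "application/json",
        # Assumed auth scheme; check your deployment's requirements.
        "Authorization": f"Bearer {api_token}",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",  # assumed; the page does not show the HTTP method
    )


def first_reply(response_body):
    """Return the text of the first choice, or None if it is absent.

    Assumes each choice carries a message object with a content string,
    as in the OpenAI response format.
    """
    choices = response_body.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    return message.get("content")
```

To send the request, pass the returned object to urllib.request.urlopen and decode the body with json.load before handing it to first_reply. With stream set to true the response would arrive as server-sent events instead of a single JSON document, so this parsing step would not apply.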