Generative AI API
→ Playground
Interactively test any GenAI model from the TIR UI with configurable parameters.
→ Bulk Prompts
Upload a CSV file to process multiple prompts in parallel. Track progress in Async History.
→ Authentication
Generate API tokens from the TIR UI to authenticate all GenAI API requests.
→ REST API
Access any GenAI model via direct HTTP requests using cURL or any HTTP client.
→ OpenAI SDK
Use the standard OpenAI Python or JS SDK with TIR LLMs by updating the base URL and API key.
→ TIR SDK
Use the TIR Python SDK for specialized models such as Whisper speech-to-text.
→ Usage & Pricing
View per-model consumption statistics and pricing from the model card in TIR.
Overview
TIR's Generative AI (GenAI) API provides ready-to-use inference endpoints for interacting with advanced AI models. Supported task types include:
• Text-to-Text — Conversational and instruction-following language models
• Speech-to-Text — Audio transcription and translation
• Embeddings — Vector representations of text for semantic search and similarity
• Text-to-Speech — Audio synthesis from text input
Billing parameters vary by model. Select a model and open the Usage tab to view detailed pricing.
Key Concepts
| Concept | Description |
|---|---|
| Pay-per-Request | You are billed only for the requests you make. There are no upfront costs, reserved capacity requirements, or long-term commitments. |
| Always Available | GenAI API endpoints are continuously available. No setup or deployment is required before use. |
| Playground | An interactive UI environment in TIR for selecting a model and testing prompts before API integration. |
| Bulk Prompts | A feature that processes multiple prompts in parallel via CSV upload. Progress is tracked in Async History. |
| API Token | A credential required to authenticate all requests to GenAI endpoints. Tokens are generated from the API Token section in TIR's UI. |
Playground
The Playground provides an interactive interface for testing GenAI models directly in the TIR UI. To access it, navigate to the GenAI section and select a model card. The Playground displays a chat interface along with inference configuration controls.
Advanced inference parameters are available in the Advanced Parameters panel, located at the bottom-right corner of the Playground view.
Advanced Parameters
| Parameter | Description |
|---|---|
| Top P (Nucleus Sampling) | Restricts token sampling to the top cumulative probability mass p. Lower values produce more focused responses; higher values increase variability. |
| Temperature | Controls response randomness. Values closer to 0 produce deterministic outputs; values closer to 1 produce more creative outputs. |
| Max Tokens | Maximum number of tokens the model may generate per response. Increase this value to allow longer outputs. |
| Presence Penalty | Discourages the model from repeating topics already covered. Higher values encourage introduction of new topics. |
| Frequency Penalty | Reduces repetition of specific words or phrases. Higher values promote lexical variety in the response. |
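These parameters map directly onto an OpenAI-style request body. A minimal sketch of such a payload, assuming an OpenAI-compatible chat completions schema; the model identifier is a placeholder to be copied from the model card's Get Code dialog:

```python
# Sketch of an OpenAI-style chat completion payload using the Advanced
# Parameters above. The model name is a placeholder assumption.
payload = {
    "model": "<model-id>",  # placeholder; copy from Get Code
    "messages": [{"role": "user", "content": "Explain nucleus sampling briefly."}],
    "temperature": 0.2,        # near 0: more deterministic output
    "top_p": 0.9,              # sample only from the top 90% probability mass
    "max_tokens": 512,         # cap on generated tokens per response
    "presence_penalty": 0.5,   # discourage revisiting covered topics
    "frequency_penalty": 0.5,  # discourage repeating words and phrases
}
```

The same fields apply whether the request is sent via the REST API or the OpenAI SDK.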
Bulk Prompts
The bulk prompts feature enables parallel processing of multiple prompts via CSV file upload. To access it, click Click here in the GenAI section. All bulk prompt activity can be monitored in Async History.
CSV File Requirements
| Requirement | Details |
|---|---|
| Column Header | The file must contain exactly one column with the header prompts. |
| Row Format | Each row must contain one prompt. |
| File Format | CSV only |
| Maximum File Size | 500 MB |
| Encoding | UTF-8 (required for special characters) |
Steps to upload a bulk prompt file:
- In the GenAI section, click Click here to open the bulk prompts interface.
- Select an existing dataset or create a new one.
- Upload a CSV file that meets the requirements listed above.
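As a sketch, a conforming file can be generated with Python's built-in csv module; the prompts below are arbitrary examples:

```python
import csv

# Build a bulk-prompt CSV that meets the requirements above: a single
# "prompts" column header, one prompt per row, UTF-8 encoding.
prompts = [
    "Summarise the benefits of vector embeddings in two sentences.",
    "Translate 'good morning' into French.",
    "List three use cases for speech-to-text models.",
]

with open("bulk_prompts.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["prompts"])             # required column header
    writer.writerows([p] for p in prompts)   # one prompt per row
```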
Async History
After upload, the system processes prompts asynchronously and tracks progress in the Async History section. Each entry displays its current processing status.
| Status | Description |
|---|---|
| Queued | The prompt batch is waiting to be processed. |
| Processing | Prompts are currently being processed. |
| Completed | All prompts have been processed successfully. |
| Failed | One or more prompts could not be processed. |
• Download and Delete actions are available per row and are enabled only when the row status is Completed.
• Use Clear All to remove all entries with a Failed or Completed status simultaneously.
Authentication
All GenAI API requests require an API token for authentication.
Generating an API Token
- In the TIR UI, open the API Token section using the side navigation bar.
- Click Create Token to generate a new token.
- Copy the Auth Token and API Key and store them securely.
When using the OpenAI-compatible SDK, use the Auth Token value as the OPENAI_API_KEY.
Integration
GenAI models can be accessed through three integration methods. Choose the method that best fits your application and the model type.
| Method | Description | Compatible Models |
|---|---|---|
| REST API | Direct HTTP requests using cURL or any HTTP client. | All models |
| OpenAI SDK | Standard OpenAI Python or JavaScript SDK, configured with a TIR-specific base URL. | LLM text generation models (e.g., Llama, Mistral, DeepSeek) |
| TIR SDK | Python SDK for the TIR platform, used with specialized model types. | Speech-to-text and other non-LLM models (e.g., Whisper) |
All integration methods require a valid API token. See Generating an API Token.
Using the REST API
This example demonstrates REST API access using the Llama 4 Scout 17B 16E Instruct model.
Prerequisite: An API token from the Authentication section.
Steps:
- In the GenAI section, select the Llama 4 Scout 17B 16E Instruct model card.
- Click Get Code, then open the HTTP tab to view the cURL request for the model endpoint.
- Copy the cURL command and open it in an HTTP client or API testing tool such as Postman.
- Add your Auth Token to the Authorization header using Bearer token format: Authorization: Bearer <your-auth-token>
- Modify the request payload as required by your use case.
- Send the request. The model response is returned in the response body.
All GenAI API requests use Bearer token authentication in the Authorization header.
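The cURL command from Get Code translates to any HTTP client. A minimal Python sketch using only the standard library, assuming an OpenAI-compatible chat completions body; the endpoint URL and model identifier are placeholders to be copied from the HTTP tab:

```python
import json
import urllib.request

ENDPOINT = "https://<tir-model-endpoint>/v1/chat/completions"  # placeholder; copy from Get Code
AUTH_TOKEN = "<your-auth-token>"  # Auth Token from the API Token section

payload = {
    "model": "<model-id>",  # placeholder model identifier
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Bearer token authentication in the Authorization header, as in step 4.
request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {AUTH_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# With real values in place, send the request and read the response body:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```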
Using the OpenAI SDK
All LLM text generation models on TIR GenAI are OpenAI-compatible. You can integrate them by updating the base URL and API key in your existing OpenAI SDK setup to point to TIR.
This example demonstrates integration using the DeepSeek V3 model.
Prerequisite: An API token from the Authentication section.
Steps:
- In the GenAI section, select the DeepSeek V3 model card.
- Click Get Code, then open the Python tab to view the sample code.
- Copy the sample Python script.
- Install the OpenAI package: pip install -U openai
- In the copied script, replace the api_key value with the Auth Token generated in the Authentication section.
- Run the script. The model response is printed to the console.
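In an existing OpenAI SDK setup, only two values change. A minimal sketch of those TIR-specific settings; both values are placeholders to be replaced with the ones shown in the Python tab:

```python
# The only TIR-specific changes to a standard OpenAI SDK setup are the
# base URL and the API key. Both values here are placeholders.
client_config = {
    "base_url": "https://<tir-genai-endpoint>/v1",  # copy from Get Code -> Python tab
    "api_key": "<your-auth-token>",                 # Auth Token, not the API Key
}

# With real values, the standard SDK call is unchanged:
# from openai import OpenAI
# client = OpenAI(**client_config)
# response = client.chat.completions.create(
#     model="deepseek_v3",  # placeholder model identifier
#     messages=[{"role": "user", "content": "Hello!"}],
# )
# print(response.choices[0].message.content)
```

Because the endpoints are OpenAI-compatible, existing application code built on the OpenAI SDK needs no other modification.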
Using the TIR SDK
The TIR SDK is a Python SDK for the TIR platform. It is designed for models that require non-standard inputs, such as audio files for speech-to-text tasks.
This example demonstrates integration using the Whisper Large V3 speech-to-text model.
Prerequisite: An API token from the Authentication section.
Steps:
- In the GenAI section, select the Whisper Large V3 model card.
- Click Get Code, then open the API tab to view the sample Python code.
- Copy the sample script. In the data dictionary, set the input field to the path of your audio file and adjust any additional parameters as needed.
- Export the required environment variables, replacing the placeholder values with your actual token credentials:
  export E2E_TIR_ACCESS_TOKEN=<your-access-token>
  export E2E_TIR_API_KEY=<your-api-key>
- Run the script. The transcription response is printed to the console.
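The environment setup and payload from the steps above can also be prepared inside the script itself. A sketch of only the generic parts — the TIR SDK calls themselves come from the copied sample, and every value below is a placeholder:

```python
import os

# Credentials from the API Token section (placeholders shown here).
os.environ["E2E_TIR_ACCESS_TOKEN"] = "<your-access-token>"
os.environ["E2E_TIR_API_KEY"] = "<your-api-key>"

# Request payload for the Whisper sample script: "input" points at the
# audio file to transcribe; other parameters follow the sample code.
data = {
    "input": "/path/to/audio.wav",  # placeholder audio file path
}
```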
Usage and Pricing
Usage statistics and per-model pricing are accessible directly in the TIR UI.
Steps:
- In the GenAI section, select any model card (for example, Whisper Large V3).
- Open the Usage tab to view consumption details, or click Check Pricing to view the pricing breakdown for the selected model.
Billing parameters vary by model. Confirm pricing for a specific model before integration.