Generative AI API

Overview

TIR's Generative AI (GenAI) API provides ready-to-use inference endpoints for interacting with advanced AI models. Supported task types include:

  • Text-to-Text — Conversational and instruction-following language models
  • Speech-to-Text — Audio transcription and translation
  • Embeddings — Vector representations of text for semantic search and similarity
  • Text-to-Speech — Audio synthesis from text input

info

Billing parameters vary by model. Select a model and open the Usage tab to view detailed pricing.


Key Concepts

| Concept | Description |
| --- | --- |
| Pay-per-Request | You are billed only for the requests you make. There are no upfront costs, reserved capacity requirements, or long-term commitments. |
| Always Available | GenAI API endpoints are continuously available. No setup or deployment is required before use. |
| Playground | An interactive UI environment in TIR for selecting a model and testing prompts before API integration. |
| Bulk Prompts | A feature that processes multiple prompts in parallel via CSV upload. Progress is tracked in Async History. |
| API Token | A credential required to authenticate all requests to GenAI endpoints. Tokens are generated from the API Token section in TIR's UI. |

Playground

The Playground provides an interactive interface for testing GenAI models directly in the TIR UI. To access it, navigate to the GenAI section and select a model card. The Playground displays a chat interface along with inference configuration controls.

Advanced inference parameters are available in the Advanced Parameters panel, located at the bottom-right corner of the Playground view.

Advanced Parameters

| Parameter | Description |
| --- | --- |
| Top P (Nucleus Sampling) | Restricts token sampling to the top cumulative probability mass p. Lower values produce more focused responses; higher values increase variability. |
| Temperature | Controls response randomness. Values closer to 0 produce deterministic outputs; values closer to 1 produce more creative outputs. |
| Max Tokens | Maximum number of tokens the model may generate per response. Increase this value to allow longer outputs. |
| Presence Penalty | Discourages the model from repeating topics already covered. Higher values encourage introduction of new topics. |
| Frequency Penalty | Reduces repetition of specific words or phrases. Higher values promote lexical variety in the response. |
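The sampling controls above map directly onto fields of an OpenAI-style chat-completion request. The following is a minimal sketch of collecting and validating them before a request; the value ranges follow the common OpenAI-compatible convention and the helper name is illustrative, not part of the TIR API. Check the model's Usage tab for any model-specific limits.

```python
def build_sampling_params(top_p=1.0, temperature=1.0, max_tokens=512,
                          presence_penalty=0.0, frequency_penalty=0.0):
    """Validate and collect the Advanced Parameters for a chat request."""
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0, 1]")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0, 2]")
    if max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    return {
        "top_p": top_p,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "presence_penalty": presence_penalty,
        "frequency_penalty": frequency_penalty,
    }

# A focused, low-randomness configuration for near-deterministic answers:
params = build_sampling_params(top_p=0.9, temperature=0.2, max_tokens=256)
```

The resulting dictionary can be merged into the JSON body of any of the integration methods described below.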

Bulk Prompts

The bulk prompts feature enables parallel processing of multiple prompts via CSV file upload. To access this feature, click the Click here option in the GenAI section. All bulk prompt activity can be monitored in Async History.

CSV File Requirements

| Requirement | Details |
| --- | --- |
| Column Header | The file must contain exactly one column with the header prompts. |
| Row Format | Each row must contain one prompt. |
| File Format | CSV only |
| Maximum File Size | 500 MB |
| Encoding | UTF-8 (required for special characters) |
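A file that satisfies these requirements can be produced with Python's standard csv module. Only the prompts header comes from the requirements above; the file name and helper function are illustrative.

```python
import csv

def write_bulk_prompts(path, prompts):
    """Write a single-column, UTF-8 CSV with the required 'prompts' header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["prompts"])      # exactly one column header
        for prompt in prompts:
            writer.writerow([prompt])     # one prompt per row

write_bulk_prompts("bulk_prompts.csv", [
    "Summarize the plot of Hamlet in two sentences.",
    "Translate 'good morning' into French.",
])
```

Using the csv module (rather than joining strings by hand) keeps prompts containing commas, quotes, or newlines valid.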

Steps to upload a bulk prompt file:

  1. In the GenAI section, click Click here to open the bulk prompts interface.
  2. Select an existing dataset or create a new one.
  3. Upload a CSV file that meets the requirements listed above.

Async History

After upload, the system processes prompts asynchronously and tracks progress in the Async History section. Each entry displays its current processing status.

| Status | Description |
| --- | --- |
| Queued | The prompt batch is waiting to be processed. |
| Processing | Prompts are currently being processed. |
| Completed | All prompts have been processed successfully. |
| Failed | One or more prompts could not be processed. |

  • Download and Delete actions are available per row and are enabled only when the row status is Completed.
  • Use Clear All to remove all entries with a Failed or Completed status simultaneously.


Authentication

All GenAI API requests require an API token for authentication.

Generating an API Token

  1. In the TIR UI, open the API Token section using the side navigation bar.
  2. Click Create Token to generate a new token.
  3. Copy the Auth Token and API Key and store them securely.
info

When using the OpenAI-compatible SDK, use the Auth Token value as the OPENAI_API_KEY.


Integration

GenAI models can be accessed through three integration methods. Choose the method that best fits your application and the model type.

| Method | Description | Compatible Models |
| --- | --- | --- |
| REST API | Direct HTTP requests using cURL or any HTTP client. | All models |
| OpenAI SDK | Standard OpenAI Python or JavaScript SDK, configured with a TIR-specific base URL. | LLM text generation models (e.g., Llama, Mistral, DeepSeek) |
| TIR SDK | Python SDK for the TIR platform, used with specialized model types. | Speech-to-text and other non-LLM models (e.g., Whisper) |

All integration methods require a valid API token. See Generating an API Token.


Using the REST API

This example demonstrates REST API access using the Llama 4 Scout 17B 16E Instruct model.

Prerequisite: An API token from the Authentication section.

Steps:

  1. In the GenAI section, select the Llama 4 Scout 17B 16E Instruct model card.

  2. Click Get Code, then open the HTTP tab to view the cURL request for the model endpoint.

  3. Copy the cURL command and open it in an HTTP client or API testing tool such as Postman.

  4. Add your Auth Token to the Authorization header using Bearer token format:

    Authorization: Bearer <your-auth-token>
  5. Modify the request payload as required by your use case.

  6. Send the request. The model response is returned in the response body.
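The request from steps 4 through 6 can also be assembled programmatically. The sketch below only builds the headers and JSON body; the endpoint URL, model name, and payload shape are placeholders to be copied from the model's Get Code HTTP tab, not fixed values.

```python
import json

AUTH_TOKEN = "<your-auth-token>"               # from the API Token section
ENDPOINT = "https://<endpoint-from-get-code>"  # copy from the HTTP tab

# Bearer token format, as in step 4:
headers = {
    "Authorization": f"Bearer {AUTH_TOKEN}",
    "Content-Type": "application/json",
}

# Example payload shape; adjust fields per the model's sample request (step 5):
body = json.dumps({
    "model": "<model-name>",
    "messages": [{"role": "user", "content": "Hello!"}],
})

# Send with any HTTP client, for example:
#   requests.post(ENDPOINT, headers=headers, data=body)
```

Building the body with json.dumps avoids the quoting mistakes that are easy to make when hand-editing a cURL command.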

info

All GenAI API requests use Bearer token authentication in the Authorization header.


Using the OpenAI SDK

All LLM text generation models on TIR GenAI are OpenAI-compatible. You can integrate them by updating the base URL and API key in your existing OpenAI SDK setup to point to TIR.

This example demonstrates integration using the DeepSeek V3 model.

Prerequisite: An API token from the Authentication section.

Steps:

  1. In the GenAI section, select the DeepSeek V3 model card.

  2. Click Get Code, then open the Python tab to view the sample code.

  3. Copy the sample Python script.

  4. Install the OpenAI package:

    pip install -U openai
  5. In the copied script, replace the api_key value with the Auth Token generated in the Authentication section.

  6. Run the script. The model response is printed to the console.
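Steps 4 through 6 typically reduce to the standard OpenAI client pattern. The sketch below prepares the client configuration using only the standard library; the base URL and model name are placeholders that come from the model's Get Code sample, and the commented lines show where the openai package would use them.

```python
import os

# The Auth Token doubles as the OpenAI API key (see the Authentication note).
client_config = {
    "api_key": os.environ.get("OPENAI_API_KEY", "<your-auth-token>"),
    "base_url": "https://<base-url-from-get-code>/v1",  # placeholder
}

# With the openai package installed (pip install -U openai):
#   from openai import OpenAI
#   client = OpenAI(**client_config)
#   resp = client.chat.completions.create(
#       model="<model-name-from-get-code>",
#       messages=[{"role": "user", "content": "Hello!"}],
#   )
#   print(resp.choices[0].message.content)
```

Reading the key from the OPENAI_API_KEY environment variable keeps the token out of source control.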


Using the TIR SDK

The TIR SDK is a Python SDK for the TIR platform. It is designed for models that require non-standard inputs, such as audio files for speech-to-text tasks.

This example demonstrates integration using the Whisper Large V3 speech-to-text model.

Prerequisite: An API token from the Authentication section.

Steps:

  1. In the GenAI section, select the Whisper Large V3 model card.

  2. Click Get Code, then open the API tab to view the sample Python code.

  3. Copy the sample script. In the data dictionary, set the input field to the path of your audio file and adjust any additional parameters as needed.

  4. Export the required environment variables. Replace the placeholder values with your actual token credentials:

    export E2E_TIR_ACCESS_TOKEN=<your-access-token>
    export E2E_TIR_API_KEY=<your-api-key>
  5. Run the script. The transcription response is printed to the console.
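Because the script in step 4 reads its credentials from the environment, a quick preflight check avoids opaque authentication failures. The variable names below are the ones exported in step 4; the helper function itself is a generic sketch, not part of the TIR SDK.

```python
import os

REQUIRED_VARS = ("E2E_TIR_ACCESS_TOKEN", "E2E_TIR_API_KEY")

def check_tir_credentials(environ=os.environ):
    """Return the names of required TIR credential variables that are unset."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

missing = check_tir_credentials()
if missing:
    print("Set these variables before running:", ", ".join(missing))
```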


Usage and Pricing

Usage statistics and per-model pricing are accessible directly in the TIR UI.

Steps:

  1. In the GenAI section, select any model card (for example, Whisper Large V3).
  2. Open the Usage tab to view consumption details, or click Check Pricing to view the pricing breakdown for the selected model.
info

Billing parameters vary by model. Confirm pricing for a specific model before integration.


How-To Guides

Vector Embeddings Generation