---
title: Generative AI API
---

import { GenAINav } from './GenAINavCards';

# Generative AI API

<GenAINav />

## Overview

TIR's Generative AI (GenAI) API provides ready-to-use inference endpoints for interacting with advanced AI models. Supported task types include:

• **Text-to-Text** — Conversational and instruction-following language models
• **Speech-to-Text** — Audio transcription and translation
• **Embeddings** — Vector representations of text for semantic search and similarity
• **Text-to-Speech** — Audio synthesis from text input

:::info
Billing parameters vary by model. Select a model and open the **Usage** tab to view detailed pricing.
:::

---

## Key Concepts

| Concept | Description |
|---------|-------------|
| **Pay-per-Request** | You are billed only for the requests you make. There are no upfront costs, reserved capacity requirements, or long-term commitments. |
| **Always Available** | GenAI API endpoints are continuously available. No setup or deployment is required before use. |
| **Playground** | An interactive UI environment in TIR for selecting a model and testing prompts before API integration. |
| **Bulk Prompts** | A feature that processes multiple prompts in parallel via CSV upload. Progress is tracked in **Async History**. |
| **API Token** | A credential required to authenticate all requests to GenAI endpoints. Tokens are generated from the **API Token** section in TIR's UI. |

---

## Playground

The Playground provides an interactive interface for testing GenAI models directly in the TIR UI. To access it, navigate to the GenAI section and select a model card. The Playground displays a chat interface along with inference configuration controls.

Advanced inference parameters are available in the **Advanced Parameters** panel, located at the bottom-right corner of the Playground view.

### Advanced Parameters

| Parameter | Description |
|-----------|-------------|
| **Top P (Nucleus Sampling)** | Restricts token sampling to the top cumulative probability mass *p*. Lower values produce more focused responses; higher values increase variability. |
| **Temperature** | Controls response randomness. Values closer to `0` produce deterministic outputs; values closer to `1` produce more creative outputs. |
| **Max Tokens** | Maximum number of tokens the model may generate per response. Increase this value to allow longer outputs. |
| **Presence Penalty** | Discourages the model from repeating topics already covered. Higher values encourage introduction of new topics. |
| **Frequency Penalty** | Reduces repetition of specific words or phrases. Higher values promote lexical variety in the response. |

---

## Bulk Prompts

The bulk prompts feature enables parallel processing of multiple prompts via CSV file upload. To access this feature, click the **Click here** option in the GenAI section. All bulk prompt activity can be monitored in **Async History**.

### CSV File Requirements

| Requirement | Details |
|-------------|---------|
| **Column Header** | The file must contain exactly one column with the header `prompts`. |
| **Row Format** | Each row must contain one prompt. |
| **File Format** | CSV only |
| **Maximum File Size** | 500 MB |
| **Encoding** | UTF-8 (required for special characters) |

**Steps to upload a bulk prompt file:**

1. In the GenAI section, click **Click here** to open the bulk prompts interface.
2. Select an existing dataset or create a new one.
3. Upload a CSV file that meets the requirements listed above.

### Async History

After upload, the system processes prompts asynchronously and tracks progress in the **Async History** section. Each entry displays its current processing status.

| Status | Description |
|--------|-------------|
| **Queued** | The prompt batch is waiting to be processed. |
| **Processing** | Prompts are currently being processed. |
| **Completed** | All prompts have been processed successfully. |
| **Failed** | One or more prompts could not be processed. |

• **Download** and **Delete** actions are available per row and are enabled only when the row status is **Completed**.
• Use **Clear All** to remove all entries with a **Failed** or **Completed** status simultaneously.

---

## Authentication

All GenAI API requests require an API token for authentication.

### Generating an API Token

1. In the TIR UI, open the **API Token** section using the side navigation bar.
2. Click **Create Token** to generate a new token.
3. Copy the **Auth Token** and **API Key** and store them securely.

:::info
When using the OpenAI-compatible SDK, use the **Auth Token** value as the `OPENAI_API_KEY`.
:::

---

## Integration

GenAI models can be accessed through three integration methods. Choose the method that best fits your application and the model type.

| Method | Description | Compatible Models |
|--------|-------------|-------------------|
| **REST API** | Direct HTTP requests using cURL or any HTTP client. | All models |
| **OpenAI SDK** | Standard OpenAI Python or JavaScript SDK, configured with a TIR-specific base URL. | LLM text generation models (e.g., Llama, Mistral, DeepSeek) |
| **TIR SDK** | Python SDK for the TIR platform, used with specialized model types. | Speech-to-text and other non-LLM models (e.g., Whisper) |

All integration methods require a valid API token. See [Generating an API Token](#generating-an-api-token).

---

## Using the REST API

This example demonstrates REST API access using the **Llama 4 Scout 17B 16E Instruct** model.

**Prerequisite:** An API token from the [Authentication](#authentication) section.

**Steps:**

1. In the GenAI section, select the **Llama 4 Scout 17B 16E Instruct** model card.
2. Click **Get Code**, then open the **HTTP** tab to view the cURL request for the model endpoint.
3. Copy the cURL command and open it in an HTTP client or API testing tool such as Postman.
4. Add your Auth Token to the `Authorization` header using Bearer token format:

   ```
   Authorization: Bearer <your-auth-token>
   ```

5. Modify the request payload as required by your use case.
6. Send the request. The model response is returned in the response body.

:::info
All GenAI API requests use Bearer token authentication in the `Authorization` header.
:::

---

## Using the OpenAI SDK

All LLM text generation models on TIR GenAI are OpenAI-compatible. You can integrate them by updating the base URL and API key in your existing OpenAI SDK setup to point to TIR.

This example demonstrates integration using the **DeepSeek V3** model.

**Prerequisite:** An API token from the [Authentication](#authentication) section.

**Steps:**

1. In the GenAI section, select the **DeepSeek V3** model card.
2. Click **Get Code**, then open the **Python** tab to view the sample code.
3. Copy the sample Python script.
4. Install the OpenAI package:

   ```bash
   pip install -U openai
   ```

5. In the copied script, replace the `api_key` value with the **Auth Token** generated in the Authentication section.
6. Run the script. The model response is printed to the console.

---

## Using the TIR SDK

The TIR SDK is a Python SDK for the TIR platform. It is designed for models that require non-standard inputs, such as audio files for speech-to-text tasks.

This example demonstrates integration using the **Whisper Large V3** speech-to-text model.

**Prerequisite:** An API token from the [Authentication](#authentication) section.

**Steps:**

1. In the GenAI section, select the **Whisper Large V3** model card.
2. Click **Get Code**, then open the **API** tab to view the sample Python code.
3. Copy the sample script. In the `data` dictionary, set the `input` field to the path of your audio file and adjust any additional parameters as needed.
4. Export the required environment variables. Replace the placeholder values with your actual token credentials:

   ```bash
   export E2E_TIR_ACCESS_TOKEN=<your-access-token>
   export E2E_TIR_API_KEY=<your-api-key>
   ```

5. Run the script. The transcription response is printed to the console.

---

## Usage and Pricing

Usage statistics and per-model pricing are accessible directly in the TIR UI.

**Steps:**

1. In the GenAI section, select any model card (for example, **Whisper Large V3**).
2. Open the **Usage** tab to view consumption details, or click **Check Pricing** to view the pricing breakdown for the selected model.

:::info
Billing parameters vary by model. Confirm pricing for a specific model before integration.
:::

---

## How-To Guides

• [Vector Embeddings Generation](vector_embedding_generation.md)


---