Generating Vector Embeddings using TIR GenAI API

Model Playground is a fully-managed, low-code environment for leveraging well-established open-source and NVIDIA models. Customers can directly access a ready-to-use API (provided by TIR) without having to plan or worry about infrastructure. One of the services it provides is the vector embeddings service. You can call the Vector Embeddings API to generate vector embeddings for your input text, then use the generated embeddings in your model inference or insert them into your vector database for similarity search.

Currently we provide the following models for generating vector embeddings:

  • e5-mistral-7b-instruct - Generates 4096-dimensional vector embeddings for any given text.

Steps to use the API:

  1. Install our python client package

  2. Set the environment variables

  3. Generate the vector embeddings

1. Install our python client package

We provide our own Python client to make it easy for you to use our APIs. You can install it with the following command:

$ pip install e2enetworks

2. Set the environment variables

You need to set the following variables in your Python script to use the GenAI API:

E2E_TIR_ACCESS_TOKEN = <paste_your_token_here>
E2E_TIR_API_KEY = <paste_your_apikey_here>
E2E_TIR_PROJECT_ID = <paste_your_project_id_here>
E2E_TIR_TEAM_ID = <paste_your_team_id_here>
  • You will find all of these values by generating a config file from the API Tokens section on the TIR website. See the API Token documentation for how to generate the config file.
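Rather than pasting the values directly into your script, one option (a sketch, not required by the API) is to export them in your shell and read them with Python's standard library. The variable names below simply mirror the ones above:

```python
import os

# Hypothetical sketch: read the TIR credentials from environment
# variables (e.g. exported in your shell) instead of hardcoding them.
# os.environ.get returns the empty-string default if a variable is unset.
E2E_TIR_ACCESS_TOKEN = os.environ.get("E2E_TIR_ACCESS_TOKEN", "")
E2E_TIR_API_KEY = os.environ.get("E2E_TIR_API_KEY", "")
E2E_TIR_PROJECT_ID = os.environ.get("E2E_TIR_PROJECT_ID", "")
E2E_TIR_TEAM_ID = os.environ.get("E2E_TIR_TEAM_ID", "")
```

This keeps credentials out of source control; the rest of the guide works the same whichever way you set the variables.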

3. Generate the vector embeddings

You can use the following code to generate vector embeddings using our API:

from e2enetworks.cloud import tir

tir.init(api_key=E2E_TIR_API_KEY, access_token=E2E_TIR_ACCESS_TOKEN)
tir_client = tir.ModelAPIClient(project=E2E_TIR_PROJECT_ID, team=E2E_TIR_TEAM_ID)

text = <enter_your_text_here>
data = {"prompt": text}
model_name = "vector-embedding-e5-mistral-7b-instruct"  # model name for generating vector embeddings

response = tir_client.infer(model_name=model_name, data=data)
generated_vector_embeddings = response.outputs[0].data

How to insert the generated vector embeddings into a Vector database

Currently TIR provides the following vector databases:

  • Qdrant


This guide will show you how to use our Qdrant service and insert the generated vector embeddings into a collection in the Qdrant service.


Installing the Python Client

We are going to install the client using pip.

pip install qdrant-client

Using the Qdrant Client

Now that we have installed the Python Qdrant client, let's see how we can use it.

Importing the Client

from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

These imports will be used to interact with the client in the following examples.

Connecting to the Qdrant Instance

To connect to the Qdrant instance, you need to create a client object and pass it the host and the API key.

To get the Host and the API key:

  1. Go to TIR and head to the Vector Database section.

  2. There, you will find all your Vector Databases.

  3. Click on the database you want to connect to.

  4. You will find the endpoint host and the API-Key in the Overview tab.


For HTTP (REST) requests, use port 6333.

For gRPC requests, use port 6334.

Creating an HTTP client object:

host = <your-qdrant-instance-url>
port = 6333
api_key = <your-api-key>

client = QdrantClient(host=host, port=port, api_key=api_key)

Creating a gRPC client object:

host = <your-qdrant-instance-url>
port = 6334
api_key = <your-api-key>

client = QdrantClient(host=host, port=port, api_key=api_key, prefer_grpc=True)

Creating a Collection

We will create a collection with the name “test_collection”.

  • We are going to set the vector size to 4096. This means that each vector will have 4096 dimensions.

  • We are going to set the distance metric to DOT. This means that the distance between vectors will be calculated using the dot product.

  • We are going to set the shard_number to 3. This means that your data will be sharded into 3 individual shards.

  • We are also going to set the replication_factor to 3. This means 3 shard replicas will be stored, i.e., two additional copies of your data will be maintained automatically.

For more detailed explanation on sharding and replication factor you can refer to the Basic Terminology section.
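As a quick illustration of what the DOT metric means (plain Python, independent of Qdrant and of the TIR API): points are scored by the dot product between the query vector and each stored vector, so higher scores mean closer matches. The 3-dimensional vectors and point names here are made up for readability; a real collection would use 4096 dimensions.

```python
# Toy illustration of dot-product scoring, as used by Distance.DOT.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

query = [1.0, 2.0, 0.5]
stored = {
    "point_1": [0.9, 1.8, 0.4],   # roughly the same direction as the query
    "point_2": [-1.0, 0.2, 2.0],  # a different direction
}

# Score every stored vector against the query; the best match is the
# one with the highest dot product.
scores = {pid: dot(query, vec) for pid, vec in stored.items()}
best_match = max(scores, key=scores.get)  # "point_1"
```

Note that the dot product is sensitive to vector magnitude as well as direction, which is why the choice of distance metric should match how your embedding model was trained.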

collection_name = "test_collection"

client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=4096, distance=Distance.DOT),
    shard_number=3,
    replication_factor=3,
)


Adding Vectors

Now that we have created a collection, let's add some vectors to it. Qdrant allows you to add vectors to the collection using the upsert method. The upsert method takes a list of PointStruct objects as input.

  • The PointStruct object has the following fields:
    • id : The id of the vector. This is a unique identifier for the vector.

    • vector : The vector embeddings generated from the GenAI API.

    • payload : This is a dictionary that can contain the text data from which the vector was generated and any additional data you want to store with the vector.

from qdrant_client.http.models import PointStruct

points = [
    PointStruct(id=1, vector=generated_vector_embeddings, payload={"data": text}),
]

client.upsert(collection_name=collection_name, points=points, wait=True)

If the vectors are added successfully, it will return:

UpdateResult(operation_id=<some_id>, status=<UpdateStatus.COMPLETED: 'completed'>)

If there is an error, it will raise an exception.

Getting detailed info on a Collection

Now that we have inserted a vector into the collection, let's look at its detailed information.

collection_info = client.get_collection(collection_name)

In the response, you will see that vectors_count=1, meaning there is 1 vector in the collection.

What's Next?

If you want to know more about the Qdrant service, you can check out the Qdrant Documentation.