DATASET SDK
Welcome to the E2E Networks Dataset SDK notebook! This guide takes you through the complete EOS dataset workflow using the E2E Networks SDK.
For detailed information about the available methods and how to use them, call the help() method of the Datasets class.
Overview
The Dataset SDK offers a user-friendly interface to:
Create Dataset: Create an EOS dataset, with or without encryption.
Delete Dataset: Remove a dataset that is no longer needed.
List Datasets: Retrieve a list of datasets or view the details of a specific dataset as needed.
Actions on Dataset: Upload data to a dataset, download files from it, and load a dataset file for further use.
Set Up the SDK and Initialize a Dataset
First, install the E2E Networks Python library, then import the tir module and the Datasets class:
from e2enetworks.cloud import tir
from e2enetworks.cloud.tir import Datasets
You can initialize the SDK with your credentials in either of two ways:
Using the load_config function:
CONFIG_FILE_PATH = "path_to_config_file"
tir_client = tir.load_config(CONFIG_FILE_PATH)
By defining the credentials and project details
# Define your credentials and project details
TIR_API_KEY = 'your_tir_api_key'
TIR_ACCESS_TOKEN = 'your_tir_access_token'
TIR_PROJECT_ID = 'your_tir_project_id'
TIR_TEAM_ID = 'your_tir_team_id'
# Initialize the SDK
tir.init(
    api_key=TIR_API_KEY,
    access_token=TIR_ACCESS_TOKEN,
    project=TIR_PROJECT_ID,
    team=TIR_TEAM_ID
)
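Hard-coding keys in a notebook makes them easy to leak. The sketch below shows one way to collect the same four tir.init() arguments (api_key, access_token, project, team) from environment variables instead. The helper credentials_from_env and the environment variable names it reads are hypothetical choices for illustration, not part of the SDK.

```python
import os
from typing import Mapping

# Hypothetical environment variable names, mapped to the tir.init() parameters
# used in the example above. Rename them to match your own setup.
ENV_VARS = {
    "api_key": "TIR_API_KEY",
    "access_token": "TIR_ACCESS_TOKEN",
    "project": "TIR_PROJECT_ID",
    "team": "TIR_TEAM_ID",
}


def credentials_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Build the keyword arguments for tir.init() from environment variables.

    Hypothetical helper: raises ValueError naming every variable that is
    missing or empty, so all problems are reported at once.
    """
    missing = [var for var in ENV_VARS.values() if not env.get(var)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {param: env[var] for param, var in ENV_VARS.items()}
```

With the SDK installed, you would then call tir.init(**credentials_from_env()). Passing a plain dictionary as env makes the helper easy to test without touching your real environment.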
Next, create a Datasets instance to start using its functionality.
# Initialize the Datasets
dataset = Datasets()
# Optionally, display the available methods and their usage
dataset.help()
Configure and Create Dataset
Adjust the parameters below to suit your requirements. The helper functions provided at the end of the notebook let you explore other options.
# Define your Dataset parameters
DATASET_NAME = "sample-dataset"
DESCRIPTION = "This is a sample description"
There are two ways to create a dataset:
- Dataset with encryption disabled: set the encryption_enable parameter to False.
- Dataset with encryption enabled: set the encryption_enable parameter to True.
Dataset with Encryption disabled
Create a dataset without encryption when your data has no encryption requirements.
# Create Dataset
is_success, data = dataset.create(
    name=DATASET_NAME,
    encryption_enable=False,  # Defaults to False
    description=DESCRIPTION,  # Optional
)
# Check success
if is_success:
    print("Your Dataset was created successfully.")
else:
    print("Failed to create Dataset.")
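The create() call returns an (is_success, data) pair, as the example above shows. In a longer script it can help to stop on failure rather than print and continue. The sketch below is a hypothetical helper, require_success, that unpacks such a pair and raises; including data in the error message assumes it carries some detail about the failure, which may not hold for every call.

```python
from typing import Any, Tuple


def require_success(result: Tuple[bool, Any], action: str = "Operation") -> Any:
    """Unpack an (is_success, data) pair and raise instead of failing silently.

    Hypothetical helper: returns data on success, otherwise raises
    RuntimeError with the action name and whatever data was returned.
    """
    is_success, data = result
    if not is_success:
        raise RuntimeError(f"{action} failed: {data!r}")
    return data
```

For example: data = require_success(dataset.create(name=DATASET_NAME), "Dataset creation").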
Dataset with Encryption enabled
To create an encrypted dataset, set encryption_enable to True and specify the desired encryption_type.
# Create Dataset
is_success, data = dataset.create(
    name=DATASET_NAME,
    encryption_enable=True,
    encryption_type="e2e-managed",  # Choose from ["e2e-managed", "user-managed"]
    description=DESCRIPTION,  # Optional
)
# Check success
if is_success:
    print("Your Dataset was created successfully.")
else:
    print("Failed to create Dataset.")
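The two create() examples differ only in their encryption arguments. The sketch below is a hypothetical helper, build_create_kwargs, that assembles those keyword arguments and rejects combinations that look inconsistent with the examples. It assumes that encryption_type must be given whenever encryption is enabled and should be omitted otherwise; the SDK's own validation rules may differ.

```python
from typing import Optional

# Encryption types listed in the create() example above.
ENCRYPTION_TYPES = ("e2e-managed", "user-managed")


def build_create_kwargs(name: str, encryption_enable: bool = False,
                        encryption_type: Optional[str] = None,
                        description: Optional[str] = None) -> dict:
    """Return keyword arguments for dataset.create() (hypothetical helper).

    Raises ValueError for a missing name, an enabled dataset without a valid
    encryption_type, or an encryption_type passed while encryption is off.
    """
    if not name:
        raise ValueError("name is required")
    kwargs = {"name": name, "encryption_enable": encryption_enable}
    if encryption_enable:
        if encryption_type not in ENCRYPTION_TYPES:
            raise ValueError(f"encryption_type must be one of {ENCRYPTION_TYPES}")
        kwargs["encryption_type"] = encryption_type
    elif encryption_type is not None:
        raise ValueError("encryption_type requires encryption_enable=True")
    if description:
        kwargs["description"] = description  # description is optional
    return kwargs
```

You would then pass the result straight through, for example dataset.create(**build_create_kwargs(DATASET_NAME, True, "e2e-managed", DESCRIPTION)).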
List Datasets
Retrieve a list of all datasets, or the full details of a specific dataset.
# Placeholder: replace with the ID of an existing dataset
DATASET_ID = "your_dataset_id"
# Get the list of available Datasets
dataset.list()
# Get all details of a specific Dataset
dataset.get(DATASET_ID)
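The remaining operations all need a DATASET_ID. If you have the listing data as Python records, a small lookup by name saves copying the ID by hand. This sketch is hypothetical: it assumes each record is a dictionary with "name" and "id" keys, which you should confirm against what dataset.list() actually returns in your SDK version.

```python
from typing import Iterable, Mapping, Optional


def find_dataset_id(records: Iterable[Mapping], name: str) -> Optional[object]:
    """Return the "id" of the first record whose "name" matches, else None.

    Hypothetical helper: the "name"/"id" keys are an assumed record shape,
    not a documented SDK format.
    """
    for record in records:
        if record.get("name") == name:
            return record.get("id")
    return None
```

For example, find_dataset_id(records, DATASET_NAME) returns the ID of "sample-dataset" if it appears in records.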
Delete Datasets
Delete a dataset that is no longer needed by passing its ID to delete().
# Delete a Dataset
dataset.delete(DATASET_ID)
Upload & Download Dataset
Once the dataset is created, you can upload data to it or download files from it.
# Path of the file on your local machine to upload to the Dataset
UPLOAD_DATASET_PATH = "sample-file.pdf"
# Path on your local machine where files downloaded from the Dataset are stored
LOCAL_PATH = "local_path"
# Upload the data from your machine to the Dataset
dataset.upload_dataset(dataset_id=DATASET_ID, upload_dataset_path=UPLOAD_DATASET_PATH)
# Download the data from the Dataset to your machine
dataset.download_dataset(dataset_id=DATASET_ID, local_path=LOCAL_PATH)
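Before calling upload_dataset() and download_dataset(), it can help to check the local side so that mistakes surface as clear local errors. The hypothetical helper prepare_transfer_paths below confirms that UPLOAD_DATASET_PATH exists and creates the LOCAL_PATH folder if needed. It assumes LOCAL_PATH is meant to be a directory; if the SDK expects something else there, drop the mkdir step.

```python
from pathlib import Path


def prepare_transfer_paths(upload_path: str, local_path: str) -> Path:
    """Validate the upload source and create the download directory.

    Hypothetical helper: raises FileNotFoundError if upload_path does not
    exist, creates local_path (and any parents) as a directory, and returns
    its absolute path.
    """
    source = Path(upload_path)
    if not source.exists():
        raise FileNotFoundError(f"Nothing to upload at {source}")
    target = Path(local_path)
    target.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return target.resolve()
```

Calling prepare_transfer_paths(UPLOAD_DATASET_PATH, LOCAL_PATH) once before the transfer calls is enough; it is safe to repeat because exist_ok=True tolerates an existing folder.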
Load Dataset
Once the dataset contains data, you can load a file from it directly.
# Path of the file inside the Dataset to load. Accepted formats: csv, json, jsonl, txt, parquet
FILE_PATH = "/test.json"
# Load a file in one of the accepted formats from the Dataset
dataset.load_dataset_file(dataset_id=DATASET_ID, file_path=FILE_PATH)
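Since load_dataset_file() accepts only csv, json, jsonl, txt, and parquet files, a quick local check on FILE_PATH avoids a round trip for an unsupported file. The sketch below is a hypothetical helper, check_loadable; it assumes the format is determined by the file extension and treats extensions case-insensitively, both of which are assumptions rather than documented SDK behavior.

```python
from pathlib import PurePosixPath

# Formats accepted by load_dataset_file(), as listed in the comment above.
LOADABLE_FORMATS = {"csv", "json", "jsonl", "txt", "parquet"}


def check_loadable(file_path: str) -> str:
    """Return the file's format if it is accepted, else raise ValueError.

    Hypothetical helper: uses PurePosixPath because dataset paths such as
    "/test.json" use forward slashes regardless of the local OS.
    """
    suffix = PurePosixPath(file_path).suffix.lower().lstrip(".")
    if suffix not in LOADABLE_FORMATS:
        raise ValueError(
            f"{file_path!r} has unsupported format {suffix or '(none)'!r}; "
            f"expected one of {sorted(LOADABLE_FORMATS)}"
        )
    return suffix
```

For example, check_loadable("/test.json") returns "json", while check_loadable("/sample-file.pdf") raises ValueError.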