Datasets

Datasets let you organize, share, and easily access your data from notebooks and training code. At the moment, TIR supports EOS (Object Storage) backed datasets, but we will soon introduce PVC (Disk) backed datasets as well.

How does it work?

Datasets allow you to mount and access EOS storage buckets as a local file system. This lets you read objects in your bucket using standard file system semantics.

With datasets, you can load training data from your local machine or from other cloud providers, and access it from your notebook as a mounted file system (under the /datasets directory).
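As a minimal sketch of what file system semantics means in practice, the Python snippet below lists the files of a mounted dataset with ordinary path operations. Only the /datasets mount root comes from the text above; the dataset name `paws` and the helper `list_dataset_files` are hypothetical names used for illustration.

```python
from pathlib import Path

# Mount root described above; dataset names passed in are hypothetical examples.
DATASETS_ROOT = Path("/datasets")


def list_dataset_files(dataset_name, root=DATASETS_ROOT):
    """Return the sorted relative paths of all files in a mounted dataset."""
    base = Path(root) / dataset_name
    if not base.is_dir():
        # Dataset is not mounted (or the name is wrong): nothing to list.
        return []
    return sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    # Assumes a dataset named "paws" is mounted in the notebook.
    for name in list_dataset_files("paws"):
        print(name)
```

Because the bucket appears as a directory, no object storage client or credentials are needed inside the notebook for this kind of read access.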

Note

Define datasets to manage and control your data, even if you don’t plan to mount them on notebooks or training jobs.

Benefits

Using datasets offers the following benefits:

  • You can create a shared EOS bucket for your team's training data. Using common data sources across the team improves the reproducibility of results.

  • Less configuration overhead. When you create a dataset through TIR, we automatically create a new storage bucket and access credentials. You can immediately copy the mc (MinIO CLI) commands shown in the UI and run them on your local desktop or hosted notebooks to upload data.

  • You can access your training data without having to set up access information on hosted notebooks or training jobs.

  • You can stream training data instead of downloading all of it to disk. This is also useful in distributed training jobs.
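To illustrate the streaming benefit, here is a minimal Python sketch that reads a file from a mounted dataset in fixed-size chunks rather than loading it whole. It is an illustration of incremental reads through the mount, not a description of how TIR implements streaming; the function names and the example path are hypothetical.

```python
def stream_file(path, chunk_size=1024 * 1024):
    """Yield a file's contents chunk by chunk instead of loading it all at once."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def count_bytes(path, chunk_size=1024 * 1024):
    """Consume the stream and return the total number of bytes read."""
    return sum(len(chunk) for chunk in stream_file(path, chunk_size))
```

In a notebook, a call such as `count_bytes("/datasets/paws/train.bin")` (hypothetical path) would walk through the file while holding only one chunk in memory at a time.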

Usage

WebUI

You can define a dataset through the TIR dashboard. The UI also allows you to browse the objects in the bucket and upload files. We recommend using the mc client or any S3-compliant CLI to upload large amounts of data. If you have data with other cloud providers, visit the Import Data from Other Cloud Providers section.

SDK

Using the TIR SDK, you can quickly set up the mc CLI and start importing or exporting data from EOS buckets.

Getting Started

Prerequisites

Install the MinIO CLI (mc) on your local desktop from MinIO. Skip this step if you already have mc installed on your local machine.
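One way to decide whether this step can be skipped is to check whether an `mc` executable is already on your PATH. The sketch below uses Python's standard `shutil.which`; the helper name `tool_available` is hypothetical. Note that on some Linux systems `mc` may also be Midnight Commander, so a positive result is only a hint.

```python
import shutil


def tool_available(name):
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if tool_available("mc"):
        print("mc found on PATH; verify it is the MinIO client, then skip the install step.")
    else:
        print("mc not found on PATH; install it from MinIO first.")
```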

Create a new dataset

  • Go to the TIR AI PLATFORM

  • Make sure you are in the right project, or create a new project

  • Go to Datasets

../_images/ds1.png
  • Click on CREATE DATASET

../_images/ds2.png
  • After clicking CREATE DATASET, you will be redirected to the screen below.

../_images/ds3.png
  • There are two storage type options for creating a dataset:

1. EOS Bucket

2. DISK

1. EOS Bucket

  • For the EOS BUCKET storage type, there are two bucket type options for creating a dataset:

1. New EOS Bucket

2. Existing EOS Bucket

1. New EOS Bucket: This creates a new EOS bucket tied to your account, along with access keys for it.

  • Enter a name for your dataset.

  • Click on the CREATE button.

../_images/neweosbucket.png
  • You will see a screen to configure the EOS bucket for uploading data. This screen contains the Setup Minio CLI, Setup s3cmd, and Dataset Details tabs.

../_images/ds6.png

Setup Minio CLI

The Setup Minio CLI tab provides a command to set up the host and a command to copy a folder to the bucket.

../_images/setupminio.png
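The two commands in this tab follow the usual MinIO client pattern: register the storage host under an alias, then copy a folder to the bucket. The Python sketch below only assembles such commands as argument lists so their shape is visible. It assumes the current `mc alias set` and `mc cp --recursive` syntax (older mc releases used `mc config host add`), and every value shown is a placeholder; always prefer the exact commands displayed in the UI.

```python
import shlex


def mc_alias_command(alias, endpoint, access_key, secret_key):
    """Build the `mc alias set` command that registers an S3-compatible host."""
    return ["mc", "alias", "set", alias, endpoint, access_key, secret_key]


def mc_copy_command(local_dir, alias, bucket):
    """Build the `mc cp --recursive` command that copies a folder to a bucket."""
    return ["mc", "cp", "--recursive", local_dir, f"{alias}/{bucket}/"]


if __name__ == "__main__":
    # Placeholders only: copy the real endpoint, keys, and bucket from the UI.
    alias_cmd = mc_alias_command(
        "paws", "https://<eos-endpoint>", "<access-key>", "<secret-key>"
    )
    copy_cmd = mc_copy_command("./data", "paws", "<bucket-name>")
    print(shlex.join(alias_cmd))
    print(shlex.join(copy_cmd))
```

Keeping the commands as lists makes them safe to pass to `subprocess.run` without shell quoting issues, should you want to script the upload.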

Setup S3cmd

The Setup S3cmd tab provides commands to set up endpoints, set up access keys, and enable S3 v4 signature APIs.

../_images/ds9.png
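For orientation, the three settings named in this tab (endpoint, access keys, v4 signatures) map onto standard s3cmd configuration keys. The Python sketch below renders an s3cmd-style configuration text from placeholder values. The key names (`host_base`, `host_bucket`, `access_key`, `secret_key`, `use_https`, `signature_v2`) are regular s3cmd options, but the helper `render_s3cfg`, the choice of path-style `host_bucket`, and all values are assumptions; the commands shown in the UI remain the authoritative source.

```python
import configparser
import io


def render_s3cfg(endpoint_host, access_key, secret_key):
    """Render s3cmd-style configuration text for an S3-compatible endpoint."""
    # interpolation=None keeps any "%" characters in keys from being interpreted.
    config = configparser.ConfigParser(interpolation=None)
    config["default"] = {
        "access_key": access_key,
        "secret_key": secret_key,
        "host_base": endpoint_host,
        "host_bucket": endpoint_host,  # assumption: path-style bucket addressing
        "use_https": "True",
        "signature_v2": "False",  # leave v2 off so requests are signed with v4
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholders only: use the endpoint and keys from the Setup S3cmd tab.
    print(render_s3cfg("<eos-endpoint-host>", "<access-key>", "<secret-key>"))
```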

Dataset Details

The Dataset Details tab shows the dataset details and bucket details.

../_images/ds10.png

2. Existing EOS Bucket:

With Existing EOS Bucket, you can select an existing bucket.

  • Enter a name for your dataset (e.g. paws)

  • Click on the CREATE button.

../_images/ds5.png
  • You will see a screen to configure the EOS bucket for uploading data. This screen shows Setup Minio CLI, Setup s3cmd, the bucket name, and Dataset Details.

../_images/ds6.png

Setup Minio CLI

The Setup Minio CLI tab provides a command to set up the host and a command to copy a folder to the bucket.

../_images/setupminio.png

Setup S3cmd

The Setup S3cmd tab provides commands to set up endpoints, set up access keys, and enable S3 v4 signature APIs.

../_images/ds9.png

Dataset Details

The Dataset Details tab shows the dataset details and bucket details.

../_images/ds10.png

2. DISK

If you select the DISK storage type, a Disk Size field appears where you can set the disk size you need.

  • Each GB will be charged at 5 rupees per month.

  • The disk size cannot be reduced later. However, it can be increased at any time.

  • Enter a name for your dataset (e.g. paws)

  • Click on the CREATE button.

AI_ML/images/disk.png
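The two disk rules above (a flat rate of 5 rupees per GB per month, and sizes that can grow but never shrink) can be sketched in a few lines of Python. This is only a worked illustration of the stated rules; the function names are hypothetical, and details such as taxes or partial-month billing are not covered by the text and are not modeled here.

```python
RUPEES_PER_GB_PER_MONTH = 5  # rate stated in the list above


def monthly_disk_cost(size_gb):
    """Return the monthly charge in rupees for a disk of the given size."""
    if size_gb <= 0:
        raise ValueError("disk size must be positive")
    return size_gb * RUPEES_PER_GB_PER_MONTH


def resize_disk(current_gb, new_gb):
    """Return the new size, rejecting any attempt to shrink the disk."""
    if new_gb < current_gb:
        raise ValueError("disk size cannot be reduced")
    return new_gb


if __name__ == "__main__":
    print(monthly_disk_cost(100))  # 100 GB * 5 rupees -> 500
    print(resize_disk(100, 200))   # growing from 100 GB to 200 GB is allowed -> 200
```

Since a disk can never be reduced, it is usually cheaper to start with a modest size and increase it as the data grows.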

After the dataset is created successfully, you will see the Setup, Overview, and Data Objects tabs.

Setup

The Setup tab shows how to configure the EOS bucket for uploading data, including how to set up the MinIO client and s3cmd.

AI_ML/images/setuptab.png

Setup minio client

../_images/ds7.png

Setup s3cmd

../_images/ds8.png

Overview

This section shows the dataset details (dataset name, created by, and created at) and storage details such as storage type, bucket name, bucket type, access key, secret key, and EOS endpoint.

../_images/overview.png

Data Objects

In Data Objects, you can upload data to your bucket by clicking the UPLOAD DATA button.

../_images/dataset.png

After clicking the UPLOAD DATA button, you can choose or drag files from your system and then click the UPLOAD button.

../_images/choosefile.png ../_images/upload.png

After the files are uploaded successfully, they appear in the list.

../_images/afteruploading.png

You can download a file by selecting it and then clicking the download button.

../_images/download.png

You can delete a file by selecting it and then clicking the delete button.

../_images/deletefile.png

Delete Dataset

Select the dataset from the list and click the Delete button to delete it.

../_images/deleteds.png

After you click the Delete button, a confirmation popup appears. Click the Delete button in the popup to delete the dataset.

../_images/deleteds2.png