---
title: "Features"
description: "Explore Dataset features and capabilities"
---

import { DatasetsFeaturesNav, DatasetsBestPracticesCard } from './DatasetsFeaturesCards'

# Features

<DatasetsFeaturesNav />

## Feature Overview

### 1. Unified Storage Access
Datasets can be mounted directly to your Instances (Nodes) and Training Jobs.
*   **Path:** `/datasets/<dataset-name>`
*   **Benefit:** Access cloud storage as if it were a local folder.

### 2. S3 Compatibility (EOS)
Seamlessly integrate with any tool that supports the S3 API.
*   **Tools:** `s3cmd`, MinIO Client (`mc`)

### 3. Encryption Options
Secure your data at rest with flexible encryption choices.
*   **E2E Managed:** Hassle-free, platform-managed keys.
*   **User Managed:** Full control over your keys. Caveat: if you lose the key, the encrypted data cannot be recovered.
    *   For detailed instructions on reading data encrypted with user-managed keys, see [User-Managed Encryption Guide](https://docs.e2enetworks.com/docs/myaccount/storage/object_storage/EOSEncryption/create_encrypted_eos/#option-2-encryption-through-user-managed-keys).

### 4. Data Importing
Easily migrate data from other cloud providers or local machines using the **Data Syncer** or CLI tools.

## How to Use Each Feature

### Mounting Datasets in Notebooks
When launching a Notebook or Training Job, select the datasets you want to mount. Each selected dataset appears under the `/datasets` directory as `/datasets/<dataset-name>`.

```python
# Example: Accessing a file in a mounted dataset
import pandas as pd

df = pd.read_csv('/datasets/my-dataset/train.csv')
print(df.head())
```
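
Before reading from a mount, it can help to confirm that the dataset is attached and to see which files it contains. The following is a minimal sketch using only the Python standard library. The helper name `list_dataset_files` and the dataset name `my-dataset` are illustrative assumptions, not part of the platform.

```python
from pathlib import Path


def list_dataset_files(dataset_root, pattern="*"):
    """Return sorted relative paths of files under a mounted dataset.

    `dataset_root` is assumed to be a mount such as /datasets/my-dataset.
    """
    root = Path(dataset_root)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset is not mounted at {root}")
    # rglob walks the mount recursively; keep regular files only.
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob(pattern) if p.is_file()
    )
```

For example, `list_dataset_files('/datasets/my-dataset', '*.csv')` would return the CSV files in the mount, and raises `FileNotFoundError` if the dataset was not selected at launch.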

### Uploading Data (Web UI)
1.  Go to the **Data Objects** tab of your dataset.
2.  Click **UPLOAD DATA**.
3.  Drag and drop files or select from your system.

<br/>

### Uploading Data (MinIO CLI)
The UI provides ready-to-use commands for configuration.

1.  **Configure Alias:**
    ```bash
    mc alias set <alias-name> <endpoint> <access-key> <secret-key>
    ```
2.  **Copy Files:**
    ```bash
    mc cp -r ./local-data/ <alias-name>/my-dataset/
    ```

![Setup Minio CLI](dataset_images/ds6.png)

### Uploading Data (s3cmd)
You can also use `s3cmd` to manage your datasets. For setup instructions, see the [s3cmd configuration guide](https://docs.e2enetworks.com/docs/myaccount/storage/object_storage/setting_up_s3cmd/).

**Upload Files:**
```bash
# Upload a single file
s3cmd put local-file.txt s3://my-dataset/

# Upload a directory
s3cmd put -r ./local-data/ s3://my-dataset/

# List bucket contents
s3cmd ls s3://my-dataset/
```

![Setup s3cmd](dataset_images/dtss3cmd.png)
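
Because the bucket is exposed through the S3 API, the same upload can also be scripted from Python. The sketch below assumes the third-party `boto3` package is installed; the helper names (`s3_client_kwargs`, `object_key`, `upload_directory`) are hypothetical, and the endpoint and keys are placeholders for the values shown in your dataset's setup instructions.

```python
import os


def s3_client_kwargs(endpoint, access_key, secret_key):
    """Build keyword arguments for boto3.client("s3", ...)."""
    return {
        "endpoint_url": endpoint,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def object_key(local_root, file_path, prefix=""):
    """Map a local file to an object key, mirroring the directory layout."""
    rel = os.path.relpath(file_path, local_root).replace(os.sep, "/")
    clean_prefix = prefix.strip("/")
    return f"{clean_prefix}/{rel}" if clean_prefix else rel


def upload_directory(local_root, bucket, prefix="", **client_kwargs):
    """Upload every file under local_root to the bucket (requires boto3)."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("s3", **client_kwargs)
    for dirpath, _dirs, files in os.walk(local_root):
        for name in files:
            path = os.path.join(dirpath, name)
            client.upload_file(path, bucket, object_key(local_root, path, prefix))
```

A call such as `upload_directory('./local-data', 'my-dataset', **s3_client_kwargs('<endpoint>', '<access-key>', '<secret-key>'))` would mirror the recursive `s3cmd put` above. Note that `boto3` expects the endpoint as a full URL, including the scheme.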


### Managing Lifecycle Rules
Lifecycle rules allow you to automatically delete objects in your EOS bucket after a specified period, helping you manage storage costs and maintain data hygiene.

#### What are Lifecycle Rules?
Lifecycle rules enable automatic deletion of objects based on:
- **Time-based expiration**: Objects are deleted after a specified number of days
- **Prefix-based filtering**: Apply rules to all objects or only those matching a specific prefix pattern

#### Creating a Lifecycle Rule

1. **Navigate to Bucket Lifecycle:**
   - Go to your dataset's **Bucket Lifecycle** tab
   - Click **Configure Lifecycle Rule**


2. **Configure the Rule:**
   - **Selected Dataset**: The EOS bucket for which the rule is being created (auto-populated)
   - **Apply To**: Choose the scope of the rule:
     - **All Objects**: Apply to every object in the dataset without filtering
     - **Objects with Prefix**: Apply only to objects matching a specific prefix pattern (e.g., `temp/`, `logs/`)
   - **Expiration Days**: Set the number of days before objects are automatically deleted (minimum: 1 day)

3. **Review and Create:**
   - Review the **Irreversible Action** warning: Objects will be permanently deleted after the specified period
   - Click **CREATE RULE** to activate the lifecycle policy

#### Important Notes

- Lifecycle deletion is **irreversible**. Deleted objects cannot be recovered.
- Rules apply to objects based on their creation/modification date.
- Multiple rules can be created with different prefixes to manage different data types.
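
Since deletion cannot be undone, it may be worth estimating which objects a rule would match before creating it. The following is a rough local model, not the platform's implementation: it assumes you already have a mapping of object keys to last-modified timestamps (for example, from a bucket listing), and the function name `objects_due_for_expiration` is hypothetical. The exact time at which the platform evaluates rules is not specified here, so treat the result as an approximation.

```python
from datetime import datetime, timedelta, timezone


def objects_due_for_expiration(objects, expiration_days, prefix="", now=None):
    """Return the keys that a rule would treat as expired.

    `objects` maps object key -> timezone-aware last-modified datetime.
    An empty prefix models the "All Objects" scope.
    """
    if expiration_days < 1:
        raise ValueError("expiration_days must be at least 1")
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=expiration_days)
    # Keep keys that match the prefix and are at least expiration_days old.
    return sorted(
        key
        for key, modified in objects.items()
        if key.startswith(prefix) and modified <= cutoff
    )
```

Running this with a prefix such as `logs/` and the intended number of days gives a list to review before you click **CREATE RULE**.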


#### Use Cases
- **Temporary Data**: Automatically clean up scratch files or intermediate processing results
- **Log Rotation**: Delete old log files after a retention period
- **Experiment Cleanup**: Remove outdated experiment data while preserving important results
- **Cost Optimization**: Reduce storage costs by removing data that's no longer needed

## Best Practices

#### Performance
TIR provides datasets through two storage options:
*   **EOS Bucket-based:** Cloud object storage ideal for large-scale training with high throughput and parallel data access.
*   **Disk-based:** Local storage for workloads requiring low-latency random access.

#### Cost Optimization
*   **Lifecycle Management:** Delete temporary datasets or intermediate checkpoints that are no longer needed.
*   **Choose the Right Storage:** Use Disk only when low-latency random access is strictly required, as it is generally more expensive per GB than object storage.

#### Security
*   **Least Privilege:** Share access keys only with those who need them.
*   **Encryption:** Always use encryption for sensitive data. Prefer **E2E Managed** for ease of use unless you have strict compliance requirements for key ownership.

---

<DatasetsBestPracticesCard />


---