---
title: Quick Start Guide
---

# Quick Start Guide

> Create your first fine-tuning job and train a model on your own dataset. This guide covers the complete flow from job creation through monitoring and accessing your trained model.

---

## What you need before you start

| Requirement | Details |
|-------------|---------|
| **Account** | Active E2E AI Cloud account with access to Foundation Studio |
| **Dataset** | A dataset in `.jsonl` format uploaded to EOS, or a Hugging Face dataset name |
| **Hugging Face token** | Required if using a gated model (e.g. Llama 3) or a private Hugging Face dataset |
| **GPU plan** | Decide which GPU to use (H100, A100) based on your model size and training budget |

---

:::info Hugging Face Integration
If you're fine-tuning a gated model (e.g. Llama 3, Mistral), you need a Hugging Face access token added as an integration. For setup instructions, see [External Integrations — Hugging Face](/docs/tir/external_integrations/intro#hugging-face).
:::

---

## Step 1: Navigate to Fine-Tune Models

1. In the TIR Dashboard sidebar, click **Foundation Studio** under Labs Experimental.
2. From the dropdown, select **Fine-Tune Models**.
3. You will land on the **Manage Fine-Tuning Jobs** page.

---

## Step 2: Create a fine-tuning job

1. Click **Create Fine-Tuning Job** or the **Click Here** button.
2. Select a base model from the available options.
3. Choose a GPU plan — H100 and A100 are available. Use the filter to narrow down by GPU type.
<br></br>
:::tip
- For LLMs with 7B+ parameters, choose A100 or H100 for acceptable training times.
- For Stable Diffusion models, A100 is typically sufficient.
:::

---

## Step 3: Configure the job model

On the **Job Model Configuration** page:

1. **Enter a name** for your fine-tuned model.
2. **Choose your training start point:**
   - **Start Training from Scratch** (default) — trains from the base model weights.
   - **Continue training from previous checkpoint** — resumes from an existing checkpoint.
3. If resuming, click **Choose** to select the model repository and checkpoint.
4. **Select a Hugging Face integration** from the dropdown, or click **Create New** to add your token.
<br></br>
:::info Note
Some models are gated, meaning the model's owner must approve access before you can use them. Visit the model card on Hugging Face to request access.
:::

---

## Step 4: Prepare your dataset

On the **Dataset Preparation** page:

1. **Select a task** that matches your training objective.
2. **Choose a dataset type:**
   - **CUSTOM** — Upload your own `.jsonl` files to an EOS bucket.
   - **HUGGING FACE** — Use a dataset from the Hugging Face Hub.
3. **Set a validation split ratio** (e.g. `0.1` for 10% validation).
4. **Configure prompt settings** as needed.
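
As a rough illustration of what the validation split ratio means, the sketch below shuffles a set of records and holds out a fraction for validation. How the platform actually performs the split (shuffling, seeding, rounding) is not described here, so `split_dataset` and its behaviour are assumptions for illustration only.

```python
import random

def split_dataset(records, validation_ratio=0.1, seed=42):
    """Hypothetical helper: hold out `validation_ratio` of the records for validation."""
    if not 0.0 <= validation_ratio < 1.0:
        raise ValueError("validation_ratio must be in [0, 1)")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for reproducibility
    n_val = round(len(shuffled) * validation_ratio)
    return shuffled[n_val:], shuffled[:n_val]

records = [{"id": i} for i in range(1000)]
train, val = split_dataset(records, validation_ratio=0.1)
print(len(train), len(val))  # 900 100
```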

### Using a CUSTOM dataset

Click **CHOOSE** to select an existing EOS dataset, or **click here** to create a new one. After creating a dataset, click **UPLOAD DATASET** to add your files, then click **SUBMIT**.

**For text models:**

Your dataset should contain records with fields that map to your selected task and prompt configuration. The exact fields depend on the task you choose — the UI shows an **Example Dataset** preview once a task is selected, which you can use as a reference for the expected structure.

```json
{"input": "Artificial Intelligence is a branch of computer science...", "output": "AI is a field focused on creating machines that mimic human intelligence.", "instruction": "Summarize the following text."}
```
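
A `.jsonl` file stores one JSON object per line. The sketch below writes records in that layout and checks each line for the expected fields before upload. The field names follow the example above, but the required set for your task is an assumption, so confirm it against the **Example Dataset** preview in the UI. `REQUIRED_FIELDS`, `write_jsonl`, and `validate_jsonl` are hypothetical names, not part of any platform SDK.

```python
import json
import os
import tempfile

# Assumed field set, taken from the example record; confirm against the UI preview.
REQUIRED_FIELDS = {"instruction", "input", "output"}

def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

def validate_jsonl(path, required=REQUIRED_FIELDS):
    """Return a list of (line_number, problem) for lines that fail the check."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_no, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_no, "line is not a JSON object"))
                continue
            missing = required - set(record)
            if missing:
                problems.append((line_no, f"missing fields: {sorted(missing)}"))
    return problems

records = [
    {
        "instruction": "Summarize the following text.",
        "input": "Artificial Intelligence is a branch of computer science...",
        "output": "AI is a field focused on creating machines that mimic human intelligence.",
    }
]
path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
write_jsonl(records, path)
print(validate_jsonl(path))  # [] when every line has the required fields
```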

The **Prompt Configuration** is auto-generated based on the selected task and defines how the fields are presented to the model during training.
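
To make the role of the prompt configuration concrete, the sketch below fills a template with the fields of one record. The template text is a made-up example, not the one the platform generates; check the **Prompt Configuration** shown in the UI for the actual wording.

```python
# Hypothetical template; the platform's auto-generated prompt may differ.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

def render_prompt(record, template=PROMPT_TEMPLATE):
    """Fill the template with a record's fields; raises KeyError if a field is missing."""
    return template.format(**record)

record = {
    "instruction": "Summarize the following text.",
    "input": "Artificial Intelligence is a branch of computer science...",
    "output": "AI is a field focused on creating machines that mimic human intelligence.",
}
print(render_prompt(record))
```

A record that lacks one of the template's fields fails at render time, which mirrors why mismatched field names cause a job to fail.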

**For image generation models (e.g. Stable Diffusion):**

Instead of a text schema, you configure dataset columns and validation settings directly in the UI:

| Field | Description | Example |
|-------|-------------|---------|
| **Target Image Column** | Column in your dataset containing the images | `image` |
| **Target Caption Column** | Column containing the text captions | `text` |
| **Validation Prompt** | A prompt used to generate sample images during training to track progress | `A photo of a man with green eyes` |
| **Num Validation Images** | Number of sample images to generate at each validation step | `2` |

:::info Note
Uploading a dataset with incorrect field names or structure will cause the fine-tuning job to fail. Use the **Example Dataset** shown in the UI as a reference for the expected format.
:::

### Using a Hugging Face dataset

Select **HUGGING FACE** as the dataset type and choose a dataset from the available collection.

---

## Step 5: Set hyperparameters

In this step, you set training hyperparameters such as the learning rate, batch size, and optimization algorithm. These values trade off training speed, accuracy, and resource usage, so expect to try a few combinations to find one that improves model accuracy without overfitting or underfitting.

On the **Hyperparameter Configuration** page, the following parameters are available:

| Parameter | Description |
|-----------|-------------|
| **Training Type** | The fine-tuning method to use (e.g. Parameter-Efficient Fine-Tuning, full fine-tuning) |
| **Stop Training When** | The condition that ends training (e.g. when epoch count has reached a set number) |
| **Learning Rate** | Step size during optimization — influences convergence speed and training stability |
| **Epochs** | Number of complete passes over the entire dataset during training |
| **Max Steps** | Maximum number of training steps; if set, epochs are ignored |
| **Max Context Length** | Maximum length of input sequences during training |
| **Peft Lora R** | LoRA attention dimension (rank) |
| **Peft Lora Alpha** | Alpha parameter for LoRA scaling |
| **Lora Dropout** | Dropout probability applied to LoRA layers to reduce overfitting |
| **Lora Bias** | Specifies which biases are updated during training (`none`, `all`, or `lora_only`) |
| **Target Module** | Specifies which model layers LoRA is applied to |
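
For intuition on the two main LoRA settings: in the standard LoRA formulation, each adapted weight matrix of shape `d_out × d_in` gets two small trainable matrices of rank `r`, which adds `r × (d_in + d_out)` trainable parameters, and the adapter's update is scaled by `alpha / r`. The sketch below works through that arithmetic. The layer sizes are illustrative assumptions (a 4096-wide model with 32 layers and two adapted projections per layer), not values taken from any specific base model on the platform.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)

def lora_scaling(alpha, rank):
    """Factor applied to the low-rank update in standard LoRA."""
    return alpha / rank

# Illustrative, assumed sizes
hidden = 4096
num_layers = 32
adapted_per_layer = 2  # e.g. two attention projections per layer
rank, alpha = 16, 32

per_matrix = lora_trainable_params(hidden, hidden, rank)
total = per_matrix * adapted_per_layer * num_layers

print(per_matrix)                 # 131072
print(total)                      # 8388608
print(lora_scaling(alpha, rank))  # 2.0
```

Doubling the rank doubles the adapter's trainable parameters. If alpha is left unchanged, the scaling factor halves.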

**Quantization (optional):** Reduce GPU memory usage during training. Options include Load in 4Bit and DoubleQuant.
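
As a back-of-the-envelope illustration of why 4-bit loading helps, the sketch below estimates the memory needed for model weights alone at two precisions. It ignores activations, gradients, optimizer state, and quantization metadata, so treat the numbers as a lower bound, not a prediction of actual GPU usage on the platform.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params_7b = 7_000_000_000  # a 7B-parameter model, for illustration

print(weight_memory_gb(params_7b, 16))  # 14.0 (16-bit weights)
print(weight_memory_gb(params_7b, 4))   # 3.5  (4-bit weights)
```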

**Advanced settings (optional):** Configure batch size and gradient accumulation steps.
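
Batch size and gradient accumulation together determine how many examples contribute to each optimizer step, which in turn determines roughly how many steps an epoch takes. The sketch below shows the usual arithmetic. Training frameworks differ in how they round a final partial batch, and whether the platform counts **Max Steps** in exactly these units is an assumption.

```python
import math

def effective_batch_size(batch_size, grad_accum_steps, num_gpus=1):
    """Examples consumed per optimizer step."""
    return batch_size * grad_accum_steps * num_gpus

def steps_per_epoch(num_examples, batch_size, grad_accum_steps, num_gpus=1):
    """Approximate optimizer steps needed for one pass over the data."""
    per_step = effective_batch_size(batch_size, grad_accum_steps, num_gpus)
    return math.ceil(num_examples / per_step)

n_train = 9000  # illustrative dataset size
print(effective_batch_size(4, 8))               # 32
print(steps_per_epoch(n_train, 4, 8))           # 282
print(3 * steps_per_epoch(n_train, 4, 8))       # 846 steps for 3 epochs
```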

**WandB tracking (optional):** Enable Weights & Biases (WandB) to monitor training metrics in real time. WandB is a platform for experiment tracking, model visualization, and team collaboration. To enable, add your WandB API key via **External Integrations** and select it in this step.

**Debug Options (optional):** Limit the amount of data used during training and evaluation runs. This is useful for a quick validation run before a full one.

---

## Step 6: Review and launch

Review your configuration on the **Summary** page, then click **Launch**.

The job appears in the **Manage Fine-Tuning Jobs** list. Training time depends on model size, dataset size, and GPU plan.

---

## Step 7: Monitor your job

Click on the job to view details:

| Tab | What it shows |
|-----|---------------|
| **Overview** | Job configuration, status, and resource details |
| **Events** | Pod scheduling, container start, and lifecycle events |
| **Logs** | Real-time training logs to diagnose errors or monitor progress |
| **Training Metrics** | Loss curves and other training metrics |
| **Metrics** | GPU utilization, GPU memory usage, and other resource metrics |

---

## Step 8: Access your fine-tuned model

When training completes, your fine-tuned model appears in the **Models** section at the bottom of the job page. The model repository contains:

- All training checkpoints
- Any LoRA adapters built during training

From here, navigate to the **Inference** section to deploy your fine-tuned model as an API endpoint.

---

## Next steps

- [Features](../Features) — Explore all fine-tuning capabilities in detail.
- [Pricing](../Pricing) — Understand GPU billing for fine-tuning jobs.
- [Guides](../guides/) — Model-specific tutorials for Llama, Mistral, Stable Diffusion, and more.
- [FAQs](../FAQs) — Troubleshoot common issues.


---