Quick Start Guide
Create your first fine-tuning job and train a model on your own dataset. This guide covers the complete flow from job creation through monitoring and accessing your trained model.
What you need before you start
| Requirement | Details |
|---|---|
| Account | Active E2E AI Cloud account with access to Foundation Studio |
| Dataset | A dataset in .jsonl format uploaded to EOS, or a Hugging Face dataset name |
| Hugging Face token | Required if using a gated model (e.g. Llama 3) or a private Hugging Face dataset |
| GPU plan | Decide which GPU to use (H100, A100) based on your model size and training budget |
If you're fine-tuning a gated model (e.g. Llama 3, Mistral), you need a Hugging Face access token added as an integration. For setup instructions, see External Integrations — Hugging Face.
Step 1: Navigate to Fine-Tune Models
- In the TIR Dashboard sidebar, click Foundation Studio under Labs Experimental.
- From the dropdown, select Fine-Tune Models.
- You will land on the Manage Fine-Tuning Jobs page.
Step 2: Create a fine-tuning job
- Click Create Fine-Tuning Job or the Click Here button.
- Select a base model from the available options.
- Choose a GPU plan — H100 and A100 are available. Use the filter to narrow down by GPU type.
- For LLMs with 7B+ parameters, choose A100 or H100 for acceptable training times.
- For Stable Diffusion models, A100 is typically sufficient.
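When choosing between plans, a rough back-of-envelope estimate of the memory needed just to hold the model weights can help. The numbers below are illustrative assumptions, not platform guarantees, and real training needs extra headroom for gradients, optimizer state, and activations:

```python
# Back-of-envelope GPU memory estimate for model weights alone.
# Training requires additional memory for gradients, optimizer state,
# and activations, so treat these figures as a lower bound.

def weight_memory_gb(num_params_billions: float, bytes_per_param: float) -> float:
    """Memory in GB needed to hold the raw weights at a given precision."""
    return num_params_billions * 1e9 * bytes_per_param / 1e9

# A 7B-parameter model (illustrative):
fp16 = weight_memory_gb(7, 2)    # 16-bit weights  -> 14.0 GB
int4 = weight_memory_gb(7, 0.5)  # 4-bit quantized ->  3.5 GB
print(f"7B model weights: {fp16:.1f} GB in fp16, {int4:.1f} GB in 4-bit")
```

This is why 7B+ models are paired with A100/H100 class GPUs, and why the 4-bit quantization option in the hyperparameter step can make a large model fit on a smaller plan.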
Step 3: Configure the job model
On the Job Model Configuration page:
- Enter a name for your fine-tuned model.
- Choose your training start point:
- Start Training from Scratch (default) — trains from the base model weights.
- Continue training from previous checkpoint — resumes from an existing checkpoint.
- If resuming, click Choose to select the model repository and checkpoint.
- Select a Hugging Face integration from the dropdown, or click Create New to add your token.
Some models require access granted by their administrator. Visit the model card on Hugging Face to request access.
Step 4: Prepare your dataset
On the Dataset Preparation page:
- Select a task that matches your training objective.
- Choose a dataset type:
  - CUSTOM — Upload your own .jsonl files to an EOS bucket.
  - HUGGING FACE — Use a dataset from the Hugging Face Hub.
- Set a validation split ratio (e.g. 0.1 for 10% validation).
- Configure prompt settings as needed.
Using a CUSTOM dataset
Click CHOOSE to select an existing EOS dataset, or click here to create a new one. After creating a dataset, click UPLOAD DATASET to add your files, then click SUBMIT.
For text models:
Your dataset should contain records with fields that map to your selected task and prompt configuration. The exact fields depend on the task you choose — the UI shows an Example Dataset preview once a task is selected, which you can use as a reference for the expected structure.
```json
[
  {
    "input": "Artificial Intelligence is a branch of computer science...",
    "output": "AI is a field focused on creating machines that mimic human intelligence.",
    "instruction": "Summarize the following text."
  }
]
```
The Prompt Configuration is auto-generated based on the selected task and defines how the fields are presented to the model during training.
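When building a CUSTOM dataset file, note that .jsonl stores one JSON object per line (rather than a single JSON array). A minimal sketch of writing and sanity-checking such a file, assuming the instruction/input/output fields from the example above (your fields depend on the task you select):

```python
import json

# Hypothetical records following the instruction/input/output shape
# shown in the Example Dataset preview. Adapt field names to your task.
records = [
    {
        "instruction": "Summarize the following text.",
        "input": "Artificial Intelligence is a branch of computer science...",
        "output": "AI is a field focused on creating machines that mimic human intelligence.",
    },
]

# Write one JSON object per line -- the .jsonl format expected for upload.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Sanity-check before uploading: every line must parse as a JSON object
# with the expected fields, otherwise the fine-tuning job will fail.
with open("train.jsonl", encoding="utf-8") as f:
    for line in f:
        rec = json.loads(line)
        assert {"instruction", "input", "output"} <= rec.keys()
```

Validating the file locally like this is cheaper than discovering a malformed record after the job has been scheduled on a GPU.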
For image generation models (e.g. Stable Diffusion):
Instead of a text schema, you configure dataset columns and validation settings directly in the UI:
| Field | Description | Example |
|---|---|---|
| Target Image Column | Column in your dataset containing the images | image |
| Target Caption Column | Column containing the text captions | text |
| Validation Prompt | A prompt used to generate sample images during training to track progress | A photo of a man with green eyes |
| Num Validation Images | Number of sample images to generate at each validation step | 2 |
Uploading a dataset with incorrect field names or structure will cause the fine-tuning job to fail. Use the Example Dataset shown in the UI as a reference for the expected format.
Using a Hugging Face dataset
Select HUGGING FACE as the dataset type and choose a dataset from the available collection.
Step 5: Set hyperparameters
During Hyperparameter Configuration, you tune settings such as the learning rate, batch size, and optimization options to improve model performance. These choices balance training speed, accuracy, and resource usage; experimenting with different combinations helps you find a configuration that improves accuracy while avoiding overfitting or underfitting.
On the Hyperparameter Configuration page, the following parameters are available:
| Parameter | Description |
|---|---|
| Training Type | The fine-tuning method to use (e.g. Parameter-Efficient Fine-Tuning, full fine-tuning) |
| Stop Training When | The condition that ends training (e.g. when epoch count has reached a set number) |
| Learning Rate | Step size during optimization — influences convergence speed and training stability |
| Epochs | Number of complete passes over the entire dataset during training |
| Max Steps | Maximum number of training steps; if set, epochs are ignored |
| Max Context Length | Maximum length of input sequences during training |
| Peft Lora R | LoRA attention dimension (rank) |
| Peft Lora Alpha | Alpha parameter for LoRA scaling |
| Lora Dropout | Dropout probability applied to LoRA layers to reduce overfitting |
| Lora Bias | Specifies which biases are updated during training (none, all, or lora_only) |
| Target Module | Specifies which model layers LoRA is applied to |
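The LoRA parameters in the table map onto a simple idea: instead of updating a full weight matrix W, training learns two small matrices A (r × d) and B (d × r) whose product, scaled by alpha / r, is added to W's output. A dependency-free toy sketch of that forward pass (illustrative sizes and values, not the platform's implementation):

```python
# Toy LoRA forward pass: y = W @ x + (alpha / r) * B @ (A @ x)
# r corresponds to "Peft Lora R" (the rank), alpha to "Peft Lora Alpha".
# Plain lists of lists keep the sketch dependency-free.

def matvec(m, v):
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

d, r = 4, 2        # model dim and LoRA rank (toy values)
alpha = 4          # LoRA scaling numerator
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base weights
A = [[0.1] * d for _ in range(r)]  # r x d, trained
B = [[0.1] * r for _ in range(d)]  # d x r, trained

x = [1.0, 2.0, 3.0, 4.0]
base = matvec(W, x)                    # frozen-path output
delta = matvec(B, matvec(A, x))        # low-rank update path
scale = alpha / r                      # "Peft Lora Alpha" / "Peft Lora R"
y = [b + scale * dv for b, dv in zip(base, delta)]
```

The memory saving comes from training 2·r·d values instead of d²: at toy size that is no saving, but at realistic dimensions (say d = 4096, r = 16) LoRA trains about 131k values per matrix versus 16.7M for full fine-tuning.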
Quantization (optional): Reduce GPU memory usage during training. Options include Load in 4Bit and DoubleQuant.
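The idea behind Load in 4Bit is that each weight is stored using 4 bits (16 levels) instead of 16 or 32, trading a little precision for roughly 4x–8x less memory. A toy sketch of symmetric 4-bit quantization to show the precision trade-off (illustration only, not the platform's actual kernel):

```python
# Toy symmetric 4-bit quantization: map floats to 16 integer levels
# (-8..7) via a per-tensor scale, then dequantize to see the error.

def quantize_4bit(ws):
    scale = max(abs(w) for w in ws) / 7  # 7 = largest positive level
    q = [max(-8, min(7, round(w / scale))) for w in ws]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.31, -0.12, 0.07, -0.44]   # illustrative values
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
# Each restored weight is close to, but not exactly, the original:
errors = [abs(w - r) for w, r in zip(weights, restored)]
```

DoubleQuant pushes the same idea one level further by also quantizing the per-block scale factors themselves, shaving off a bit more memory at a small additional cost in precision.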
Advanced settings (optional): Configure batch size and gradient accumulation steps.
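Batch size and gradient accumulation interact: the effective batch size is the per-device batch size times the accumulation steps, and together with dataset size and epochs they determine the total number of optimizer steps. A quick sanity-check of the arithmetic (all numbers illustrative):

```python
# Effective batch size = per-device batch * gradient accumulation steps.
# Gradient accumulation lets a memory-limited GPU mimic a larger batch
# by summing gradients over several forward/backward passes before each
# optimizer step.

dataset_size = 10_000      # training examples (illustrative)
per_device_batch = 4       # limited by GPU memory
grad_accum_steps = 8       # from Advanced settings

effective_batch = per_device_batch * grad_accum_steps   # 32
steps_per_epoch = -(-dataset_size // effective_batch)   # ceiling division -> 313
epochs = 3
total_steps = steps_per_epoch * epochs                  # 939
```

If Max Steps is set in the hyperparameter table, training stops at that count regardless of the computed epoch total.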
WandB tracking (optional): Enable Weights & Biases (WandB) to monitor training metrics in real time. WandB is a platform for experiment tracking, model visualization, and team collaboration. To enable, add your WandB API key via External Integrations and select it in this step.
Debug Options (optional): Limit the amount of data used during training and evaluation runs, which is useful for quickly validating a configuration before a full run.
Step 6: Review and launch
Review your configuration on the Summary page, then click Launch.
The job appears in the Manage Fine-Tuning Jobs list. Training time depends on model size, dataset size, and GPU plan.
Step 7: Monitor your job
Click on the job to view details:
| Tab | What it shows |
|---|---|
| Overview | Job configuration, status, and resource details |
| Events | Pod scheduling, container start, and lifecycle events |
| Logs | Real-time training logs to diagnose errors or monitor progress |
| Training Metrics | Loss curves and other training metrics |
| Metrics | GPU utilization, GPU memory usage, and other resource metrics |
Step 8: Access your fine-tuned model
When training completes, your fine-tuned model appears in the Models section at the bottom of the job page. The model repository contains:
- All training checkpoints
- Any LoRA adapters built during training
From here, navigate to the Inference section to deploy your fine-tuned model as an API endpoint.