---
title: FAQs
---

# Frequently Asked Questions

---

## Dataset and Data Preparation


**Q: Can I use a private Hugging Face dataset?**

A: Yes. Select **HUGGING FACE** as the dataset type, then add your Hugging Face token via **External Integration**. The token needs **Read** scope. If the dataset belongs to an organization, the token must belong to an account with access to that organization's datasets.

---

**Q: Why did my job fail with a dataset error?**

A: Common causes:
- Incorrect `.jsonl` format (missing required fields, invalid JSON syntax, or trailing commas)
- Wrong dataset type for the model (e.g. Stable Diffusion format used with a text model)
- Empty dataset file
- EOS bucket permissions blocking access

Validate your `.jsonl` file locally before uploading. Each line must be valid JSON with the expected fields.
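
A local check along these lines can catch most format problems before upload. This is a minimal sketch, not the platform's own validator: the `validate_jsonl` helper and the field names in `REQUIRED` are hypothetical, so replace them with the fields your dataset type actually expects.

```python
import json


def validate_jsonl(lines, required_fields):
    """Return a list of (line_number, message) problems found in JSONL lines."""
    problems = []
    seen_records = 0
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(text)
        except json.JSONDecodeError as err:
            # Covers syntax errors such as trailing commas.
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "line is not a JSON object"))
            continue
        seen_records += 1
        missing = [name for name in required_fields if name not in record]
        if missing:
            problems.append((number, "missing fields: " + ", ".join(missing)))
    if seen_records == 0:
        # Line number 0 marks a file-level problem (empty dataset).
        problems.append((0, "no valid records found"))
    return problems


# Hypothetical field names; replace with the ones your dataset type expects.
REQUIRED = ("input", "output")

sample = [
    '{"input": "Hi", "output": "Hello"}',
    '{"input": "Hi", "output": "Hello",}',  # trailing comma: invalid JSON
    '{"input": "Only input"}',              # missing "output"
]
for line_number, message in validate_jsonl(sample, REQUIRED):
    print(line_number, message)
```

To check a real file, pass an open file handle instead of the sample list, for example `validate_jsonl(open("train.jsonl", encoding="utf-8"), REQUIRED)`, where `train.jsonl` is a placeholder path.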

---

**Q: Where is my custom dataset stored?**

A: Custom datasets are stored in **EOS (E2E Object Storage) Buckets**. You can manage datasets directly from the dataset selection step during job creation.

---

**Q: How do I upload a dataset to EOS?**

A: During job creation, click **CHOOSE** to select an existing dataset, or **click here** to create a new EOS dataset. After creating the dataset, use the **UPLOAD DATASET** button to add your `.jsonl` files.

---

## Hugging Face Integration

**Q: Why can't I access a Hugging Face model even with a valid token?**

A: For gated models (e.g. Llama 3, some Mistral variants), you must:
1. Visit the model page on Hugging Face.
2. Accept the model license using the **same account that owns your token**.
3. Ensure the token has **Read** scope.

If the license has not been accepted, downloads fail even when the token itself is valid.

---

**Q: Can I use multiple Hugging Face integrations?**

A: Yes. You can create multiple integrations (one per token or organization) and select the appropriate one when creating each fine-tuning job.

---

**Q: Do I always need a Hugging Face token?**

A: Only if you are using:
- A **gated model** (e.g. Llama, some Mistral variants) that requires license acceptance
- A **private Hugging Face dataset**

For public models and datasets that are not gated, a token is not required.

---

## Training and Configuration

**Q: Can I resume training from a checkpoint?**

A: Yes. When creating a job, select **Continue training from previous checkpoint** and choose the repository and checkpoint to resume from. This is useful for extending training or recovering from an interrupted run.

---

**Q: What is quantization and should I use it?**

A: Quantization stores model weights at lower numerical precision (e.g. 4-bit) to reduce GPU memory requirements during training. Use it when:
- Your model is too large for the available GPU in full precision
- You want to reduce training cost by using a smaller or less expensive GPU

Quantization may slightly reduce model quality compared to full-precision training.
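
As a rough illustration of the memory savings, the weights alone need about `parameters × bits ÷ 8` bytes. The sketch below assumes a hypothetical 7-billion-parameter model and ignores activations, gradients, optimizer state, and any quantization overhead, so actual GPU usage during training will be higher.

```python
def weight_memory_gb(num_parameters, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_parameters * bits_per_weight / 8 / 1e9


params = 7_000_000_000  # hypothetical 7B-parameter model
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts weight memory by a factor of four.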

---

**Q: How many epochs should I train for?**

A: This depends on dataset size and task. General guidelines:

| Dataset size | Suggested epochs |
|-------------|-----------------|
| Small (< 1,000 samples) | 3–10 |
| Medium (1,000–50,000 samples) | 1–5 |
| Large (50,000+ samples) | 1–3 |

Monitor validation loss in the **Training Metrics** tab to detect overfitting early.
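
One simple way to read a validation-loss curve is a patience rule: treat the run as overfitting once the loss has failed to improve for a set number of consecutive epochs. The sketch below is an illustrative helper, not a platform feature; the function name, the `patience` default, and the loss values are all hypothetical.

```python
def best_epoch_before_overfitting(val_losses, patience=2):
    """Return the 1-based epoch with the lowest validation loss once the loss
    has failed to improve for `patience` consecutive epochs, else None."""
    best = float("inf")
    best_epoch = None
    stale = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch
    return None


# Hypothetical per-epoch validation losses read from a metrics view.
losses = [1.20, 0.95, 0.88, 0.91, 0.97]
print(best_epoch_before_overfitting(losses))
```

If the function returns an epoch number, a checkpoint from around that epoch is usually a better candidate than the final one. A return value of `None` means the loss was still improving, or had not stalled for long enough to trigger the rule.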

---

**Q: What is gradient accumulation and when should I use it?**

A: Gradient accumulation simulates a larger batch size by summing gradients over several smaller batches before each weight update. Use it when GPU memory limits your per-step batch size. The effective batch size is the per-step batch size multiplied by the number of accumulation steps, so you can train with larger effective batches without running out of memory.
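
The equivalence can be checked on a toy problem. The sketch below uses a one-parameter model `y = w * x` with mean squared error and made-up data; it is plain Python for illustration and does not reflect how any particular training framework implements accumulation.

```python
def mse_gradient(w, batch):
    """Gradient of mean squared error for the model y = w * x over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.0)]  # made-up (x, y) pairs
w = 0.5
micro_batch_size = 2
accumulation_steps = len(data) // micro_batch_size

# Scale each micro-batch gradient by 1/accumulation_steps and sum them,
# then compare with the gradient computed over the whole batch at once.
accumulated = 0.0
for step in range(accumulation_steps):
    start = step * micro_batch_size
    micro = data[start:start + micro_batch_size]
    accumulated += mse_gradient(w, micro) / accumulation_steps

full = mse_gradient(w, data)
effective_batch_size = micro_batch_size * accumulation_steps
print(effective_batch_size, abs(accumulated - full) < 1e-12)
```

With equally sized micro-batches, the accumulated gradient matches the full-batch gradient, while only one micro-batch needs to fit in memory at a time.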

---

## Job Management


**Q: Can I change hyperparameters after a job has been created?**

A: No. Hyperparameters are fixed at job creation time. Use the **Clone** feature to create a copy of an existing job and modify the desired parameters.

---

**Q: How long does fine-tuning take?**

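A: Training time depends on model size, dataset size, GPU type, and the number of epochs configured. Larger models, larger datasets, and more epochs all increase it, while a faster GPU shortens each training step.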
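
For a back-of-the-envelope estimate, count the optimizer steps and multiply by a per-step time measured on a short trial run. The helper and every number below are hypothetical; the per-step time in particular varies widely with model, GPU, and sequence length.

```python
import math


def estimate_training(num_samples, epochs, effective_batch_size, seconds_per_step):
    """Return (total_steps, total_seconds) for a simple step-count estimate."""
    steps_per_epoch = math.ceil(num_samples / effective_batch_size)
    total_steps = steps_per_epoch * epochs
    return total_steps, total_steps * seconds_per_step


# Hypothetical inputs: 10,000 samples, 3 epochs, effective batch size 32,
# and 1.5 seconds per step measured from a short trial run.
steps, seconds = estimate_training(10_000, 3, 32, 1.5)
print(f"{steps} steps, about {seconds / 60:.0f} minutes")
```

The estimate excludes model download, dataset loading, evaluation passes, and checkpoint saving, so treat it as a lower bound.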

---

**Q: What does the Clone action do?**

A: Clone creates a new fine-tuning job pre-filled with the same configuration as the selected job. You can modify any parameter before launching. This is useful for running hyperparameter experiments, trying different dataset versions, or continuing from a new checkpoint.

---

## Model Output

**Q: Where are the fine-tuned model files stored?**

A: In the **model repository** associated with the job. After training completes, go to the **Models** tab on the job detail page to access all checkpoints and adapters.

---

**Q: Can I deploy my fine-tuned model as an inference endpoint?**

A: Yes. Navigate from the model repository to **Inference → Model Endpoints** and select your fine-tuned model repository as the model source. See the [Inference documentation](/docs/tir/Inference/) for full deployment instructions.

---

**Q: What happens to my model if I delete the fine-tuning job?**

A: Deleting a job removes the job record and metadata. The **model repository** and checkpoints may persist depending on your storage configuration. Verify the state of your model repository before deleting a job if you need to retain the trained weights.


---