---
title: FAQs
---

# Frequently Asked Questions

---

## Dataset and Data Preparation

**Q: Can I use a private Hugging Face dataset?**

A: Yes. Select **HUGGING FACE** as the dataset type, then add your Hugging Face token via **External Integration**. The token needs **Read** scope. If the dataset belongs to an organization, the token must belong to an account with access to that organization's datasets.

---

**Q: Why did my job fail with a dataset error?**

A: Common causes:

- Incorrect `.jsonl` format (missing required fields, invalid JSON syntax, or trailing commas)
- Wrong dataset type for the model (e.g. Stable Diffusion format used with a text model)
- Empty dataset file
- EOS bucket permissions blocking access

Validate your `.jsonl` file locally before uploading. Each line must be valid JSON with the expected fields.

---

**Q: Where is my custom dataset stored?**

A: Custom datasets are stored in **EOS (E2E Object Storage) Buckets**. You can manage datasets directly from the dataset selection step during job creation.

---

**Q: How do I upload a dataset to EOS?**

A: During job creation, click **CHOOSE** to select an existing dataset, or **click here** to create a new EOS dataset. After creating the dataset, use the **UPLOAD DATASET** button to add your `.jsonl` files.

---

## Hugging Face Integration

**Q: Why can't I access a Hugging Face model even with a valid token?**

A: For gated models (e.g. Llama 3, Mistral), you must:

1. Visit the model page on Hugging Face.
2. Accept the model license using the **same account that owns your token**.
3. Ensure the token has **Read** scope.

Downloads will fail regardless of token validity if the license has not been accepted.

---

**Q: Can I use multiple Hugging Face integrations?**

A: Yes. You can create multiple integrations (one per token or organization) and select the appropriate one when creating each fine-tuning job.

---

**Q: Do I always need a Hugging Face token?**

A: Only if you are using:

- A **gated model** (e.g.
Llama, some Mistral variants) that requires license acceptance
- A **private Hugging Face dataset**

For public models and datasets that are not gated, a token is not required.

---

## Training and Configuration

**Q: Can I resume training from a checkpoint?**

A: Yes. When creating a job, select **Continue training from previous checkpoint** and choose the repository and checkpoint to resume from. This is useful for extending training or recovering from an interrupted run.

---

**Q: What is quantization and should I use it?**

A: Quantization reduces model precision (e.g. to 4-bit) to lower GPU memory requirements during training. Use it when:

- Your model is too large for the available GPU in full precision
- You want to reduce training cost by using a smaller or less expensive GPU

Quantization may slightly reduce model quality compared to full-precision training.

---

**Q: How many epochs should I train for?**

A: This depends on dataset size and task. General guidelines:

| Dataset size | Suggested epochs |
|-------------|-----------------|
| Small (< 1,000 samples) | 3–10 |
| Medium (1,000–50,000 samples) | 1–5 |
| Large (50,000+ samples) | 1–3 |

Monitor validation loss in the **Training Metrics** tab to detect overfitting early.

---

**Q: What is gradient accumulation and when should I use it?**

A: Gradient accumulation simulates larger batch sizes by accumulating gradients across multiple steps before updating model weights. Use it when GPU memory limits your per-step batch size. It lets you effectively train with larger batches without running out of memory.

---

## Job Management

**Q: Can I change hyperparameters after a job has been created?**

A: No. Hyperparameters are fixed at job creation time. Use the **Clone** feature to create a copy of an existing job and modify the desired parameters.

---

**Q: How long does fine-tuning take?**

A: Training time depends on model size, dataset size, GPU type, and the number of epochs configured.

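
Dataset size and epoch count combine with batch size and gradient accumulation to determine how many weight updates a run performs, which is one input to training time. The following is a rough sketch of that arithmetic only. It assumes a single GPU and that the last partial batch of each epoch is kept. The function and parameter names are illustrative and are not platform settings, and `seconds_per_step` is a figure you would have to measure on your own model and GPU.

```python
import math


def estimate_training(num_samples, epochs, per_device_batch_size,
                      gradient_accumulation_steps=1, seconds_per_step=None):
    """Rough estimate of weight updates (and optionally wall time) for a run.

    All names here are illustrative assumptions, not platform settings.
    Assumes one GPU and that the final partial batch of each epoch is kept.
    """
    # Samples consumed per weight update: micro-batch size x accumulation steps.
    effective_batch = per_device_batch_size * gradient_accumulation_steps
    # Weight updates needed to see every sample once.
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * epochs

    result = {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
    }
    if seconds_per_step is not None:
        # seconds_per_step is the measured time for one full weight update,
        # including all of its accumulation micro-steps.
        result["estimated_hours"] = total_steps * seconds_per_step / 3600
    return result
```

For example, `estimate_training(10_000, epochs=3, per_device_batch_size=4, gradient_accumulation_steps=8)` gives an effective batch size of 32, 313 steps per epoch, and 939 steps in total.
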

---

**Q: What does the Clone action do?**

A: Clone creates a new fine-tuning job pre-filled with the same configuration as the selected job. You can modify any parameters before launching. Useful for hyperparameter experiments, different dataset versions, or continuing from a new checkpoint.

---

## Model Output

**Q: Where are the fine-tuned model files stored?**

A: In the **model repository** associated with the job. After training completes, go to the **Models** tab on the job detail page to access all checkpoints and adapters.

---

**Q: Can I deploy my fine-tuned model as an inference endpoint?**

A: Yes. Navigate from the model repository to **Inference → Model Endpoints** and select your fine-tuned model repository as the model source. See the [Inference documentation](/docs/tir/Inference/) for full deployment instructions.

---

**Q: What happens to my model if I delete the fine-tuning job?**

A: Deleting a job removes the job record and metadata. The **model repository** and checkpoints may persist depending on your storage configuration. Verify the state of your model repository before deleting a job if you need to retain the trained weights.

---
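
Returning to the advice under Dataset and Data Preparation to validate a `.jsonl` file locally before uploading: the following is a minimal sketch of such a check, using only the Python standard library. It covers the causes listed there that can be detected offline (invalid JSON syntax including trailing commas, missing fields, and an empty file). The function name `validate_jsonl` is hypothetical, and the required field names are placeholders, since the expected fields depend on the dataset type you selected.

```python
import json


def validate_jsonl(path, required_fields):
    """Return a list of (line_number, message) problems found in a .jsonl file.

    `required_fields` is supplied by the caller; the field names your job
    expects are not defined here and depend on the dataset type.
    """
    problems = []
    records = 0
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                problems.append((line_number, "blank line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                # Covers syntax errors such as trailing commas.
                problems.append((line_number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_number, "expected a JSON object"))
                continue
            records += 1
            missing = [name for name in required_fields if name not in record]
            if missing:
                problems.append((line_number, f"missing fields: {missing}"))
    if records == 0:
        # Line number 0 marks a file-level problem rather than a specific line.
        problems.append((0, "no JSON object records found"))
    return problems
```

Calling `validate_jsonl("train.jsonl", ("input", "output"))`, with a placeholder file name and placeholder field names, returns an empty list when every line parses as a JSON object containing both fields.
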