# Fine-tuning LLaMA-3 with LLaMA Factory on TIR

**LLaMA Factory** is an easy-to-use platform for fine-tuning large language models. On an E2E GPU Node on TIR, you can fine-tune models such as LLaMA-3 through either the command-line interface (CLI) or the WebUI.

## Step 1: Setup Environment

Open JupyterLab (Python 3 Notebook) on your GPU Node and run:

```bash
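# Start from the home directory and remove any previous checkout
%cd ~/
!rm -rf LLaMA-Factory
!git clone https://github.com/hiyouga/LLaMA-Factory.git
%cd LLaMA-Factory
# Quote the extras so the shell does not treat the brackets as a glob pattern
!pip install -e ".[torch,bitsandbytes]"
!pip install bitsandbytes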
```

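Verify that PyTorch can see the GPU:

```python
import torch

# Fail early if no CUDA device is visible to PyTorch
assert torch.cuda.is_available(), "GPU not detected"
print(torch.cuda.get_device_name(0))
```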

## Step 2: Select GPUs

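Restrict training to specific GPUs by setting `CUDA_VISIBLE_DEVICES` in the notebook before launching any `llamafactory-cli` command; commands started with `!` inherit the variable.

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # or "0,1" for multi-GPU
```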

## Step 3: Create Training Config

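The LLaMA Factory CLI reads its settings from a YAML or JSON config file, so a dataset path alone is not enough.

Example: save the following as `train_llama3.yaml` in the `LLaMA-Factory` directory, and run the CLI from there so that dataset names resolve against `data/dataset_info.json`.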

```yaml
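### Model (pre-quantized 4-bit checkpoint)
model_name_or_path: unsloth/llama-3-8b-Instruct-bnb-4bit
template: llama3            # chat template matching the model
### Method
stage: sft
do_train: true              # needed for the training loop to run
finetuning_type: lora
### Data (the name must be registered in data/dataset_info.json)
dataset: alpaca_en_demo
### Training
output_dir: ./output/llama3-lora
per_device_train_batch_size: 2
num_train_epochs: 1
learning_rate: 2.0e-5       # decimal point so YAML parses it as a float, not a string
fp16: true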
```
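
Because the CLI also accepts JSON, the same settings can be generated from a notebook cell instead of being typed by hand. The sketch below is one way to do that, not the project's prescribed workflow. The file name `train_llama3.json` is hypothetical. The keys `stage`, `do_train`, and `template`, and the dataset name `alpaca_en_demo` (a demo dataset registered in the repository's `data/dataset_info.json`), are assumptions layered on the example above. It also assumes the CLI selects its parser from the file extension.

```python
import json
from pathlib import Path

# Settings mirroring the YAML example; entries marked "assumption"
# are additions that a complete training config is expected to need.
train_config = {
    "model_name_or_path": "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "template": "llama3",         # assumption: chat template for LLaMA-3
    "stage": "sft",               # assumption: supervised fine-tuning
    "do_train": True,             # assumption: enables the training loop
    "finetuning_type": "lora",
    "dataset": "alpaca_en_demo",  # assumption: name from data/dataset_info.json
    "output_dir": "./output/llama3-lora",
    "per_device_train_batch_size": 2,
    "num_train_epochs": 1,
    "learning_rate": 2e-5,        # JSON keeps this a number, avoiding YAML float quirks
    "fp16": True,
}

config_path = Path("train_llama3.json")  # hypothetical file name
config_path.write_text(json.dumps(train_config, indent=2) + "\n", encoding="utf-8")
print(f"Wrote {config_path} with {len(train_config)} keys")
```

With this file in place, training would be started with `llamafactory-cli train train_llama3.json` instead of the YAML path.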

## Step 4: Run Training

```bash
llamafactory-cli train train_llama3.yaml
```

:::info Note
For Meta's official LLaMA-3 models, you must request access on Hugging Face and then log in:
```bash
huggingface-cli login
```
:::

## Step 5: Inference / Chat

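`chat` expects an inference config rather than the training config: a file (here called `infer_llama3.yaml`) that sets `model_name_or_path`, `adapter_name_or_path` (the `output_dir` from Step 3), `template`, and `finetuning_type`. Training-only keys such as `learning_rate` do not belong in it.

```bash
llamafactory-cli chat infer_llama3.yaml
```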
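
Chat needs to know where the trained LoRA adapter lives, which the training config does not say. A minimal sketch of a separate inference config, written from a notebook cell, is shown below. The file name `infer_llama3.yaml` is hypothetical, and the adapter path assumes training wrote to the `output_dir` from Step 3.

```python
from pathlib import Path
from textwrap import dedent

# Inference settings: base model, trained adapter, and chat template.
# adapter_name_or_path is assumed to match output_dir from the training config.
infer_yaml = dedent(
    """\
    model_name_or_path: unsloth/llama-3-8b-Instruct-bnb-4bit
    adapter_name_or_path: ./output/llama3-lora
    template: llama3
    finetuning_type: lora
    """
)

infer_path = Path("infer_llama3.yaml")  # hypothetical file name
infer_path.write_text(infer_yaml, encoding="utf-8")
print(infer_path.read_text(encoding="utf-8"))
```

Passing this file to `llamafactory-cli chat` should load the base model together with the adapter, rather than the base model alone.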

## Step 6: Merge LoRA and Export (Optional)

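`merge_llama3.yaml` is an export config that sets `model_name_or_path`, `adapter_name_or_path`, `template`, `finetuning_type`, and `export_dir`. Point `model_name_or_path` at a non-quantized base model: LoRA adapters cannot be merged into a quantized checkpoint such as the 4-bit model used in Step 3.

```bash
llamafactory-cli export merge_llama3.yaml
```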
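
The export command refers to `merge_llama3.yaml`, which has to be created first. The sketch below writes one plausible version from a notebook cell. The base model `meta-llama/Meta-Llama-3-8B-Instruct` (a non-quantized, gated checkpoint), the output directory `./output/llama3-merged`, and the values for `export_size` and `export_device` are all assumptions chosen for illustration.

```python
from pathlib import Path
from textwrap import dedent

# Export settings: a full-precision base model plus the trained adapter.
# All paths and values below are illustrative assumptions.
merge_yaml = dedent(
    """\
    model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
    adapter_name_or_path: ./output/llama3-lora
    template: llama3
    finetuning_type: lora
    export_dir: ./output/llama3-merged
    export_size: 5
    export_device: cpu
    """
)

merge_path = Path("merge_llama3.yaml")
merge_path.write_text(merge_yaml, encoding="utf-8")
print(merge_path.read_text(encoding="utf-8"))
```

Here `export_size` is the maximum shard size in GB for the exported weights, and `export_device: cpu` keeps the merge in system RAM rather than on the GPU.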

:::info Note
Merging an 8B model requires roughly 18 GB of RAM.
:::
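
The memory figure in the note can be sanity-checked with simple arithmetic. The sketch below assumes about 8 billion parameters stored in 16-bit precision; the gap between the result and roughly 18 GB is assumed to be overhead (adapter weights, temporary buffers, the Python runtime), not a measured value.

```python
def weights_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 8e9  # assumption: about 8 billion parameters

fp16_gb = weights_size_gb(NUM_PARAMS, 2)  # 2 bytes per 16-bit weight
fp32_gb = weights_size_gb(NUM_PARAMS, 4)  # 4 bytes per 32-bit weight

print(f"16-bit weights: {fp16_gb:.1f} GB")  # 16.0 GB
print(f"32-bit weights: {fp32_gb:.1f} GB")  # 32.0 GB
```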

## Step 7: WebUI Option

You can also fine-tune via LlamaBoard (Gradio):

```bash
!GRADIO_SHARE=0 llamafactory-cli webui
```

## References

* [LLaMA-Factory GitHub](https://github.com/hiyouga/LLaMA-Factory)
* [LLaMA-Factory CLI examples](https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/README.md)

