Fine-tuning LLaMA-3 with LLaMA Factory on TIR
LLaMA Factory is an easy-to-use platform for fine-tuning large language models. With E2E GPU Nodes on TIR, you can fine-tune models such as LLaMA-3 using either the CLI or the WebUI.
Step 1: Setup Environment
Open JupyterLab (Python 3 Notebook) on your GPU Node and run:
%cd ~/
!rm -rf LLaMA-Factory
!git clone https://github.com/hiyouga/LLaMA-Factory.git
%cd LLaMA-Factory
!pip install -e ".[torch,bitsandbytes]"
!pip install bitsandbytes
Verify GPU:
import torch
assert torch.cuda.is_available(), "GPU not detected"
Step 2: Select GPUs
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # or "0,1" for multi-GPU; set this before CUDA is initialized (i.e., before importing torch)
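As a quick sanity check, you can set the variable and read it back before any CUDA library is imported; a minimal sketch (the device indices here are placeholders for your node's GPUs):

```python
import os

# CUDA_VISIBLE_DEVICES must be set before torch (or any CUDA library)
# initializes the driver; setting it afterwards has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # expose the first two GPUs

# Frameworks see only the listed devices, renumbered from zero:
visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(f"{len(visible)} GPU(s) will be visible, remapped to ids 0..{len(visible) - 1}")
```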
Step 3: Create Training Config
The LLaMA Factory CLI requires a YAML/JSON config file as its argument, not just a dataset path; dataset names must match entries defined in data/dataset_info.json.
Example: train_llama3.yaml
model_name_or_path: unsloth/llama-3-8b-Instruct-bnb-4bit
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
dataset: alpaca
template: llama3
output_dir: ./output/llama3-lora
per_device_train_batch_size: 2
num_train_epochs: 1
learning_rate: 2e-5
fp16: true
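From the notebook, the config can be written to disk with a %%writefile cell or plain Python; a minimal sketch that mirrors the example above (model, dataset, and hyperparameter values are the ones from this guide, not requirements):

```python
from pathlib import Path

# Write the training config from this guide to train_llama3.yaml.
config = """\
model_name_or_path: unsloth/llama-3-8b-Instruct-bnb-4bit
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
dataset: alpaca
template: llama3
output_dir: ./output/llama3-lora
per_device_train_batch_size: 2
num_train_epochs: 1
learning_rate: 2e-5
fp16: true
"""
path = Path("train_llama3.yaml")
path.write_text(config)
print(path.exists())  # → True
```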
Step 4: Run Training
!llamafactory-cli train train_llama3.yaml
Note
For Meta's official LLaMA-3 models, you must request access on Hugging Face and then log in:
!huggingface-cli login
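In a non-interactive notebook, the token can also be supplied via the HF_TOKEN environment variable, which huggingface_hub reads automatically when no cached login is present; a minimal sketch (the token value is a placeholder, not a real credential):

```python
import os

# Placeholder token: replace with the value from your Hugging Face
# account settings (Settings -> Access Tokens).
os.environ["HF_TOKEN"] = "hf_your_token_here"

# Downstream libraries (huggingface_hub, transformers) pick up HF_TOKEN
# for gated-model downloads when no cached login exists.
print(os.environ["HF_TOKEN"].startswith("hf_"))  # → True
```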
Step 5: Inference / Chat
!llamafactory-cli chat train_llama3.yaml
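To chat with the freshly trained adapter rather than reuse the training file, a separate inference config is typical; a sketch modeled on LLaMA Factory's example inference configs (the file name is hypothetical, and the paths assume the training output directory above):

```yaml
# chat_llama3.yaml -- hypothetical file name
model_name_or_path: unsloth/llama-3-8b-Instruct-bnb-4bit
adapter_name_or_path: ./output/llama3-lora
template: llama3
finetuning_type: lora
```

Then run: !llamafactory-cli chat chat_llama3.yaml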
Step 6: Merge LoRA and Export (Optional)
!llamafactory-cli export merge_llama3.yaml
Note
Merging an 8B model requires roughly 18 GB of CPU RAM.
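The merge config is not shown above; a sketch modeled on LLaMA Factory's export examples (the file name and paths are assumptions — note that a quantized base such as the bnb-4bit checkpoint cannot be merged, so a full-precision base model is used here):

```yaml
# merge_llama3.yaml -- hypothetical contents
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct  # full-precision base; quantized bases cannot be merged
adapter_name_or_path: ./output/llama3-lora
template: llama3
finetuning_type: lora
export_dir: ./output/llama3-merged
export_size: 2          # shard size in GB
export_device: cpu
export_legacy_format: false
```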
Step 7: WebUI Option
You can also fine-tune via LlamaBoard, LLaMA Factory's Gradio-based WebUI:
!GRADIO_SHARE=0 llamafactory-cli webui
GRADIO_SHARE=0 keeps the UI local (Gradio's default port is 7860), so access it through your node's exposed port or an SSH tunnel.