
Fine-tuning LLaMA-3 with LLaMA Factory on TIR

LLaMA Factory is an easy-to-use platform for fine-tuning large language models. With E2E GPU Nodes on TIR, you can train models such as LLaMA-3 through either the CLI or the WebUI.

Step 1: Setup Environment

Open JupyterLab (Python 3 Notebook) on your GPU Node and run:

%cd ~/
!rm -rf LLaMA-Factory
!git clone https://github.com/hiyouga/LLaMA-Factory.git
%cd LLaMA-Factory
!pip install -e ".[torch,bitsandbytes]"

Verify GPU:

import torch
assert torch.cuda.is_available(), "GPU not detected"

Step 2: Select GPUs

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # or "0,1" for multi-GPU

Step 3: Create Training Config

LLaMA Factory CLI requires a YAML/JSON config file, not just a dataset path.

Example: train_llama3.yaml

model_name_or_path: unsloth/llama-3-8b-Instruct-bnb-4bit
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
dataset: alpaca
template: llama3
output_dir: ./output/llama3-lora
per_device_train_batch_size: 2
num_train_epochs: 1
learning_rate: 2e-5
fp16: true
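Since the CLI accepts JSON as well as YAML, the config can also be generated programmatically. A minimal sketch using only the standard library (the exact set of accepted keys depends on your LLaMA Factory version):

```python
import json

# Write the training arguments from Step 3 as JSON, which
# llamafactory-cli also accepts in place of YAML.
config = {
    "model_name_or_path": "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "stage": "sft",
    "do_train": True,
    "finetuning_type": "lora",
    "lora_target": "all",
    "dataset": "alpaca",
    "template": "llama3",
    "output_dir": "./output/llama3-lora",
    "per_device_train_batch_size": 2,
    "num_train_epochs": 1,
    "learning_rate": 2e-5,
    "fp16": True,
}

with open("train_llama3.json", "w") as f:
    json.dump(config, f, indent=2)

print("wrote train_llama3.json")
```

You can then run `llamafactory-cli train train_llama3.json` exactly as with the YAML file.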

Step 4: Run Training

llamafactory-cli train train_llama3.yaml
Note

For Meta's official LLaMA-3 models, you must request access on Hugging Face and then log in:

huggingface-cli login

Step 5: Inference / Chat

llamafactory-cli chat train_llama3.yaml
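Passing the training config to chat loads only the base model. To chat with your fine-tuned weights, point the chat command at an inference config that also loads the LoRA adapter. A sketch, assuming the output path from Step 3 (file name chat_llama3.yaml is our choice):

```yaml
# chat_llama3.yaml -- hypothetical inference config
model_name_or_path: unsloth/llama-3-8b-Instruct-bnb-4bit
adapter_name_or_path: ./output/llama3-lora
template: llama3
finetuning_type: lora
```

Then run: llamafactory-cli chat chat_llama3.yaml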

Step 6: Merge LoRA and Export (Optional)

llamafactory-cli export merge_llama3.yaml
Note

Merging 8B models requires around 18GB RAM.
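LLaMA Factory does not merge adapters into a quantized checkpoint, so the export config should reference the full-precision base model (which for Meta's official weights requires the Hugging Face access noted above). A sketch of merge_llama3.yaml, with paths assumed from the earlier steps:

```yaml
# merge_llama3.yaml -- hypothetical export config
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: ./output/llama3-lora
template: llama3
finetuning_type: lora
export_dir: ./output/llama3-merged
```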

Step 7: WebUI Option

You can also fine-tune via LlamaBoard (Gradio):

!GRADIO_SHARE=0 llamafactory-cli webui
