
Features

Foundation Studio Fine-Tune Models gives you full control over every aspect of the training process — from dataset preparation and model selection to hyperparameter tuning, monitoring, and job lifecycle management.


1. Supported Models

Fine-tune a variety of open-source foundation models:

| Model | Hugging Face ID | Type | Notes |
| --- | --- | --- | --- |
| Llama 2 7B | meta-llama/Llama-2-7b-hf | LLM | Gated — requires Hugging Face token and license acceptance |
| Meta Llama 3 8B | meta-llama/Meta-Llama-3-8B | LLM | Gated — requires Hugging Face token and license acceptance |
| Meta Llama 3 8B Instruct | meta-llama/Meta-Llama-3-8B-Instruct | LLM | Gated — requires Hugging Face token and license acceptance |
| Meta Llama 3.1 8B | meta-llama/Llama-3.1-8B | LLM | Gated — requires Hugging Face token and license acceptance |
| Meta Llama 3.1 8B Instruct | meta-llama/Llama-3.1-8B-Instruct | LLM | Gated — requires Hugging Face token and license acceptance |
| Meta Llama 3.2 11B Vision | meta-llama/Llama-3.2-11B-Vision | Multimodal LLM | Vision-language fine-tuning; gated model |
| Meta Llama 3.2 11B Vision Instruct | meta-llama/Llama-3.2-11B-Vision-Instruct | Multimodal LLM | Vision-language fine-tuning; gated model |
| Gemma 7B | google/gemma-7b | LLM | Gated — requires Hugging Face token and license acceptance |
| Gemma 7B Instruct | google/gemma-7b-it | LLM | Instruction-tuned variant; gated model |
| Stable Diffusion 2.1 | stabilityai/stable-diffusion-2-1 | Image generation | Text-to-image fine-tuning |
| Stable Diffusion XL | stabilityai/stable-diffusion-xl-base-1.0 | Image generation | Higher-resolution text-to-image fine-tuning |

2. Hyperparameter Configuration

Foundation Studio gives you full control over the training process through a dedicated Hyperparameter Configuration step. You can tune core training settings such as learning rate, epochs, and max context length, as well as LoRA/PEFT-specific parameters for parameter-efficient fine-tuning. Advanced options like quantization, batch size, gradient accumulation, and debug limits are also available to help balance training speed, accuracy, and resource usage.

The right combination of hyperparameters directly impacts model quality — experimenting with these settings helps avoid overfitting or underfitting and ensures the model generalizes well to your task.

For a full list of available parameters, see the Quick Start Guide — Step 5.
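As an illustration, the settings from this step can be pictured as a plain configuration dictionary. The field names and values below are examples only, not Foundation Studio's exact parameter names:

```python
# Illustrative fine-tuning configuration; names and defaults are examples,
# not Foundation Studio's exact UI labels.
config = {
    # Core training settings
    "learning_rate": 2e-4,
    "epochs": 3,
    "max_context_length": 2048,
    # LoRA/PEFT settings for parameter-efficient fine-tuning
    "lora_rank": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    # Advanced options
    "batch_size": 4,
    "gradient_accumulation_steps": 8,
}

# The effective batch size seen by the optimizer is the per-step batch
# size multiplied by the number of gradient-accumulation steps.
effective_batch_size = config["batch_size"] * config["gradient_accumulation_steps"]
print(effective_batch_size)  # 32
```

Treat values like these as starting points for experimentation rather than universal defaults.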


3. Quantization

Quantization reduces model size and lowers GPU memory requirements during training. It is especially useful when fine-tuning large models on GPUs with limited VRAM.

| Option | Description |
| --- | --- |
| Load in 4Bit | Load model weights in 4-bit precision to reduce memory footprint |
| Compute Datatype | Data type for computations (e.g. float16, bfloat16) |
| QuantType | Quantization algorithm (e.g. NF4) |
| Use DoubleQuant | Apply a second quantization pass for further memory savings |
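For readers familiar with the Hugging Face ecosystem, the four options above correspond closely to the fields of `transformers`' `BitsAndBytesConfig`. This is a sketch under that assumption; in Foundation Studio you set these options through the UI, not code:

```python
import torch
from transformers import BitsAndBytesConfig

# Sketch only: how the four quantization options above would look if
# expressed as a Hugging Face BitsAndBytesConfig. Values are examples.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # Load in 4Bit
    bnb_4bit_compute_dtype=torch.bfloat16,  # Compute Datatype
    bnb_4bit_quant_type="nf4",              # QuantType
    bnb_4bit_use_double_quant=True,         # Use DoubleQuant
)
```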

4. Advanced Training Settings

Additional options for fine-grained performance tuning:

| Parameter | Description |
| --- | --- |
| Batch size | Number of samples processed in each training step |
| Gradient accumulation steps | Accumulate gradients over multiple steps before updating weights — simulates larger batch sizes when GPU memory is limited |
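Gradient accumulation can be sketched in a framework-agnostic way: gradients from several micro-batches are summed, and the weights are updated once per group, so each update reflects a larger effective batch. A minimal toy example (not Foundation Studio code):

```python
# Toy sketch of gradient accumulation on a 1-parameter model y = w * x,
# trained with mean squared error. Data and values are illustrative.
def grad(w, batch):
    # Gradient of MSE for one micro-batch of (x, y) pairs
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

w = 0.0
lr = 0.01
accum_steps = 4
micro_batches = [[(1.0, 2.0)], [(2.0, 4.0)], [(1.5, 3.0)], [(0.5, 1.0)]]

accumulated = 0.0
for step, batch in enumerate(micro_batches, start=1):
    accumulated += grad(w, batch)  # accumulate instead of updating
    if step % accum_steps == 0:
        # One weight update for 4 micro-batches = effective batch of 4
        w -= lr * (accumulated / accum_steps)
        accumulated = 0.0
```

Because memory is only needed for one micro-batch at a time, this trades extra steps for a lower peak VRAM footprint.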

5. Model Checkpoints

  • Start from scratch — Begin training from the base model weights (default behavior).
  • Resume from checkpoint — Continue training from a previously saved checkpoint. Useful for iterating on a partially trained model or recovering from an interrupted job.

All training checkpoints are automatically saved and accessible from the model repository after training completes.
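Checkpoints are conventionally stored as step-numbered directories (e.g. `checkpoint-500`), and resuming usually means picking the highest-numbered one. A small sketch of that convention; the directory names here are illustrative, not guaranteed to match the repository layout exactly:

```python
import re

# Illustrative checkpoint directory names as they might appear in a
# model repository after training.
checkpoints = ["checkpoint-500", "checkpoint-1000", "checkpoint-1500"]

def latest_checkpoint(names):
    """Return the checkpoint directory with the highest step number."""
    def step(name):
        match = re.search(r"checkpoint-(\d+)$", name)
        return int(match.group(1)) if match else -1
    return max(names, key=step)

print(latest_checkpoint(checkpoints))  # checkpoint-1500
```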


6. Experiment Tracking with WandB

Integrate with Weights & Biases (WandB) to track training runs in real time. WandB is a platform for experiment tracking, model visualization, and team collaboration that lets you monitor and compare your fine-tuning runs.

  • Monitor loss curves, learning rate schedules, and custom metrics in the WandB dashboard.
  • Compare multiple fine-tuning runs side by side.
  • Access full training history and model version records.

To enable, add your WandB API key via External Integrations and select it during the Hyperparameter Configuration step when creating a job.

Debug options are also available for additional runtime visibility into the training process.
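Once the integration is enabled, the training job reports to WandB the way any instrumented run does. A sketch using the standard `wandb` client API (project and metric names are examples; running it requires a WandB account):

```python
import wandb

# Sketch of WandB experiment tracking once the API key is configured.
# Project name, config fields, and metric names are illustrative.
run = wandb.init(
    project="foundation-studio-finetune",
    config={"learning_rate": 2e-4, "epochs": 3},
)

for step, loss in enumerate([2.1, 1.7, 1.4], start=1):  # dummy loss values
    wandb.log({"train/loss": loss, "step": step})

run.finish()
```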


7. Job Monitoring

Once a job is running, view the following tabs on the job detail page:

| Tab | What it shows |
| --- | --- |
| Overview | Job configuration, assigned GPU plan, current status, and resource summary |
| Events | Pod lifecycle events — scheduling, container start, and termination. Useful for diagnosing startup failures |
| Logs | Real-time streaming logs from the training process — monitor convergence, diagnose errors, and audit training steps |
| Training Metrics | Visual charts for training loss, validation loss, and other model-specific metrics |
| Metrics | Hardware resource utilization — GPU utilization (%), GPU memory usage, and CPU utilization |

8. Job Actions

Manage active and historical fine-tuning jobs with the following actions:

| Action | When to use | Notes |
| --- | --- | --- |
| Clone | Create a new job with the same configuration, with the option to modify parameters | Useful for hyperparameter sweeps and dataset iterations |
| Retry | Restart a job that ended in a failed state | Retains original configuration; no re-configuration needed |
| Terminate | Stop a job that is currently running | Use when you want to cancel training early |
| Delete | Remove a job and its metadata permanently | Does not automatically delete the model repository |

9. Model Repository

After a successful training run, the fine-tuned model is stored in a model repository containing:

  • All checkpoints saved during training
  • LoRA adapters (if applicable)
  • Model configuration and tokenizer files

Navigate to the Models tab on the job detail page to access the repository.


10. One-Click Deployment to Inference

Once your fine-tuned model is ready, you can deploy it as a live API endpoint directly from Foundation Studio — no additional setup required. The fine-tuned model repository is automatically linked to TIR Inference, so you can go from a completed training job to a running endpoint in just a few clicks.

  • Navigate to the Models tab on your fine-tuning job page.
  • Click Deploy to create an Inference endpoint using your fine-tuned model.
  • Select a serving framework, GPU plan, and scaling configuration.
  • Once deployed, you receive a live endpoint URL that your applications can call immediately.

The endpoint is OpenAI-compatible, so it works with any tool or SDK that supports the OpenAI API format.
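Because the endpoint speaks the OpenAI API format, calling it is an HTTP POST to the `/v1/chat/completions` path with the usual JSON body. A sketch using only the standard library; the endpoint URL, token, and model name are placeholders you must replace with the values shown on your endpoint page:

```python
import json
import urllib.request

# Placeholder values -- substitute the endpoint URL and API token from
# your Inference endpoint page.
ENDPOINT_URL = "https://<your-endpoint>/v1/chat/completions"
API_TOKEN = "<your-api-token>"

payload = {
    "model": "my-finetuned-model",  # hypothetical model name
    "messages": [{"role": "user", "content": "Summarize LoRA in one line."}],
    "max_tokens": 64,
}

request = urllib.request.Request(
    ENDPOINT_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_TOKEN}",
    },
)
# response = urllib.request.urlopen(request)  # uncomment with real values
```

Any OpenAI-compatible SDK can be pointed at the same URL by overriding its base URL.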

For full deployment instructions, see the Inference — Model Endpoints documentation.