Pricing

Fine-tuning job costs depend on the GPU plan you select and the duration of training. Pricing is based on compute usage — you are charged for the time your training job runs on the selected GPU.


How billing works

| Factor | Description |
| --- | --- |
| GPU type | The GPU selected (H100, A100, etc.) determines the per-hour rate |
| Training duration | You are charged for the actual time the job runs on that GPU |
| Billing start | Begins when the job starts executing on the GPU |
| Billing stop | Ends when the job completes, fails, or is terminated |

Fine-tuning jobs are billed hourly based on the GPU compute resources used. There are no charges while a job is queued or pending.
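The billing window described above can be sketched in a few lines. This is an illustrative sketch, not an official API: the function name and timestamps are hypothetical, and it simply measures GPU execution time while treating queued time as free.

```python
from datetime import datetime

def billable_hours(started_at, ended_at):
    """Hours charged for a job: only time executing on the GPU counts."""
    if started_at is None:  # job was cancelled while still queued: no charge
        return 0.0
    return (ended_at - started_at).total_seconds() / 3600

# A job queued at 09:00 that starts on the GPU at 09:30 and ends at 17:30
# is billed for 8 hours; the 30 minutes spent in the queue are free.
started = datetime(2024, 5, 1, 9, 30)
ended = datetime(2024, 5, 1, 17, 30)
print(billable_hours(started, ended))  # 8.0
```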


What affects cost

  • GPU type — H100 and A100 have different per-hour rates
  • Training duration — More epochs and larger datasets increase training time
  • Quantization — Enabling 4-bit quantization may reduce training time by lowering memory overhead, potentially allowing use of a smaller GPU

Pricing examples

Note: Values below are illustrative. Use the E2E Calculator for current rates.

Example: Fine-tuning Llama 3 13B on H100

Scenario: Fine-tune Llama 3 13B for 2 epochs on a 50,000-row dataset. Estimated training time: ~8 hours.

| Factor | Value |
| --- | --- |
| GPU | H100 |
| Training duration | ~8 hours |

Billing: Cost = 8 hours × (price per H100 hour)

Recommendation: Use H100 for larger models where training speed matters; the higher per-hour rate is often offset by shorter job duration.
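The trade-off above can be made concrete with a quick estimate. The per-hour rates below are placeholders, not real prices; substitute current rates from the E2E Calculator.

```python
# Hypothetical per-hour rates for illustration only -- check the
# E2E Calculator for actual pricing.
RATES = {"H100": 400.0, "A100": 200.0}

def job_cost(gpu, hours):
    """Cost = training duration in hours x per-hour GPU rate."""
    return hours * RATES[gpu]

# Same workload: the faster GPU finishes in fewer hours, so the higher
# per-hour rate can still yield a lower total cost.
print(job_cost("H100", 8))   # 3200.0
print(job_cost("A100", 20))  # 4000.0
```

With these illustrative numbers, the H100 job is cheaper overall despite the higher hourly rate, because it finishes in 8 hours instead of 20.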


For detailed pricing, visit the E2E Calculator.


Frequently Asked Questions

Am I charged if my job fails?

Yes. You are charged for the compute time used up to the point of failure. Use Retry to restart a failed job without losing configuration.


When does billing start?

Billing starts when the training job begins executing on the GPU — not when it enters the queue. There is no charge while the job is in a pending or queued state.


How can I reduce training costs?

  • Enable 4-bit quantization to reduce memory usage and potentially run on a smaller, lower-cost GPU.
  • Start with fewer epochs to validate your dataset and configuration before running the full job.
  • Use the Clone feature to reuse configurations and iterate without re-entering settings.
  • Terminate a job early if training metrics indicate overfitting or an incorrect configuration.

Is there a minimum charge per job?

No. You are charged only for the actual GPU compute time your job uses, with no minimum duration; partial hours are prorated.
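Proration means a job that runs for a fraction of an hour is charged that fraction of the hourly rate. The sketch below assumes per-second granularity, which is an assumption on my part; confirm the actual billing granularity with E2E. The rate shown is a placeholder.

```python
def prorated_cost(seconds_used, rate_per_hour):
    """Charge a fraction of the hourly rate, assuming per-second proration."""
    return (seconds_used / 3600) * rate_per_hour

# A job that runs for 30 minutes at a hypothetical rate of 400.0/hour:
print(prorated_cost(1800, 400.0))  # 200.0
```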