# Pricing
Fine-tuning job costs depend on the GPU plan you select and the duration of training. Pricing is based on compute usage — you are charged for the time your training job runs on the selected GPU.
## How billing works
| Factor | Description |
|---|---|
| GPU type | The GPU selected (H100, A100, etc.) determines the per-hour rate |
| Training duration | You are charged for the actual time the job runs on that GPU |
| Billing start | Begins when the job starts executing on the GPU |
| Billing stop | Ends when the job completes, fails, or is terminated |
Fine-tuning jobs are billed according to the GPU compute time they consume. There are no charges while a job is queued or pending.
## What affects cost
- GPU type — H100 and A100 have different per-hour rates
- Training duration — More epochs and larger datasets increase training time
- Quantization — Enabling 4-bit quantization lowers memory requirements, potentially allowing the job to run on a smaller, lower-cost GPU
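The factors above combine into a simple cost formula. The sketch below shows how, using placeholder per-hour rates that are purely hypothetical — check the E2E Calculator for current pricing:

```python
# Minimal sketch of how GPU type and training duration combine into cost.
# The rates below are hypothetical placeholders, NOT actual E2E pricing.
GPU_HOURLY_RATES = {
    "H100": 4.00,  # assumed illustrative rate (USD/hour)
    "A100": 2.50,  # assumed illustrative rate (USD/hour)
}

def estimate_cost(gpu: str, training_hours: float) -> float:
    """Cost = training duration x per-hour rate for the chosen GPU."""
    return training_hours * GPU_HOURLY_RATES[gpu]

# An 8-hour job on an H100 at the assumed rate above.
print(estimate_cost("H100", 8.0))
```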
## Pricing examples
Note: Values below are illustrative. Use the E2E Calculator for current rates.
### Example: Fine-tuning Llama 3 13B on H100
Scenario: Fine-tune Llama 3 13B for 2 epochs on a 50,000-row dataset. Estimated training time: ~8 hours.
| Factor | Value |
|---|---|
| GPU | H100 |
| Training duration | ~8 hours |
Billing: Cost = 8 hours × (price per H100 hour)
Recommendation: Use H100 for larger models where training speed matters; the higher per-hour rate is often offset by shorter job duration.
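The trade-off behind this recommendation can be sketched with illustrative numbers. The rates, and the assumption that the slower GPU takes roughly twice as long, are hypothetical — not measured E2E figures:

```python
# Compare total cost of the same fine-tuning job on two GPUs.
# All rates and durations are hypothetical placeholders, not E2E pricing.
scenarios = {
    "H100": {"rate": 4.00, "hours": 8.0},   # higher rate, faster training
    "A100": {"rate": 2.50, "hours": 16.0},  # lower rate, slower training (assumed)
}

for gpu, s in scenarios.items():
    total = s["rate"] * s["hours"]
    print(f"{gpu}: {s['hours']:g} h x ${s['rate']:.2f}/h = ${total:.2f}")

# With these assumed numbers, the H100 job ($32.00) comes out cheaper
# overall than the A100 job ($40.00) despite the higher per-hour rate.
```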
For detailed pricing, visit the E2E Calculator.
## Frequently Asked Questions
### Am I charged if my job fails?
Yes. You are charged for the compute time used up to the point of failure. Use Retry to restart a failed job without losing configuration.
### When does billing start?
Billing starts when the training job begins executing on the GPU — not when it enters the queue. There is no charge while the job is in a pending or queued state.
### How can I reduce training costs?
- Enable 4-bit quantization to reduce memory usage and potentially run on a smaller, lower-cost GPU.
- Start with fewer epochs to validate your dataset and configuration before running the full job.
- Use the Clone feature to reuse configurations and iterate without re-entering settings.
- Terminate a job early if training metrics indicate overfitting or an incorrect configuration.
### Is there a minimum charge per job?
No. Billing is prorated: you are charged only for the actual GPU compute time your job uses, with no minimum duration.
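As a sketch, proration can be modeled as charging the hourly rate for the exact fraction of an hour used. The per-second granularity and the rate below are assumptions for illustration, not documented E2E behavior:

```python
# Hypothetical proration: hourly rate x exact runtime fraction.
HOURLY_RATE = 4.00  # assumed illustrative GPU rate (USD/hour)

def prorated_cost(runtime_seconds: int, hourly_rate: float = HOURLY_RATE) -> float:
    """Charge for actual runtime only: (seconds / 3600) x hourly rate."""
    return round(runtime_seconds / 3600 * hourly_rate, 2)

# A job that ran 90 minutes is billed for 1.5 hours, not rounded up to 2.
print(prorated_cost(90 * 60))
```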