Pricing

Fine-tuning job costs depend on the GPU plan you select and the duration of training. Pricing is based on compute usage — you are charged for the time your training job runs on the selected GPU.


How billing works

| Factor | Description |
| --- | --- |
| GPU type | The GPU selected (H100, A100, etc.) determines the per-hour rate |
| Training duration | You are charged for the actual time the job runs on that GPU |
| Billing start | Begins when the job starts executing on the GPU |
| Billing stop | Ends when the job completes, fails, or is terminated |

Fine-tuning jobs are billed hourly based on the GPU compute resources used. There are no charges while a job is queued or pending.
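The billing window described above can be sketched in a few lines. This is an illustrative sketch, not an official API: the function name and timestamps are hypothetical, and it simply measures GPU execution time while treating queued time as free.

```python
from datetime import datetime

def billable_hours(started_at, ended_at):
    """Hours charged for a job: only time executing on the GPU counts."""
    if started_at is None:  # job was cancelled while still queued: no charge
        return 0.0
    return (ended_at - started_at).total_seconds() / 3600

# A job queued at 09:00 that starts on the GPU at 09:30 and ends at 17:30
# is billed for 8 hours; the 30 minutes spent in the queue are free.
started = datetime(2024, 5, 1, 9, 30)
ended = datetime(2024, 5, 1, 17, 30)
print(billable_hours(started, ended))  # 8.0
```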


What affects cost

  • GPU type — H100 and A100 have different per-hour rates
  • Training duration — More epochs and larger datasets increase training time
  • Quantization — Enabling 4-bit quantization may reduce training time by lowering memory overhead, potentially allowing use of a smaller GPU

Pricing examples

Note: Values below are illustrative. Use the E2E Calculator for current rates.

Example: Fine-tuning Llama 3 13B on H100

Scenario: Fine-tune Llama 3 13B for 2 epochs on a 50,000-row dataset. Estimated training time: ~8 hours.

| Factor | Value |
| --- | --- |
| GPU | H100 |
| Training duration | ~8 hours |

Billing: Cost = 8 hours × (price per H100 hour)

Recommendation: Use H100 for larger models where training speed matters; the higher per-hour rate is often offset by shorter job duration.
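The trade-off above can be made concrete with a quick estimate. The per-hour rates below are placeholders, not real prices; substitute current rates from the E2E Calculator.

```python
# Hypothetical per-hour rates for illustration only -- check the
# E2E Calculator for actual pricing.
RATES = {"H100": 400.0, "A100": 200.0}

def job_cost(gpu, hours):
    """Cost = training duration in hours x per-hour GPU rate."""
    return hours * RATES[gpu]

# Same workload: the faster GPU finishes in fewer hours, so the higher
# per-hour rate can still yield a lower total cost.
print(job_cost("H100", 8))   # 3200.0
print(job_cost("A100", 20))  # 4000.0
```

With these illustrative numbers, the H100 job is cheaper overall despite the higher hourly rate, because it finishes in 8 hours instead of 20.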


For detailed pricing, visit the E2E Calculator.


Frequently Asked Questions

Am I charged if my job fails?

Yes. You are charged for the compute time used up to the point of failure. Use Retry to restart a failed job without losing configuration.


When does billing start?

Billing starts when the training job begins executing on the GPU — not when it enters the queue. There is no charge while the job is in a pending or queued state.


How can I reduce training costs?

  • Enable 4-bit quantization to reduce memory usage and potentially run on a smaller, lower-cost GPU.
  • Start with fewer epochs to validate your dataset and configuration before running the full job.
  • Use the Clone feature to reuse configurations and iterate without re-entering settings.
  • Terminate a job early if training metrics indicate overfitting or an incorrect configuration.

Is there a minimum charge per job?

No. You are charged only for the actual GPU compute time your job uses, with no minimum duration; partial hours are prorated.
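Proration means a job that runs for a fraction of an hour is charged that fraction of the hourly rate. The sketch below assumes per-second granularity, which is an assumption on my part; confirm the actual billing granularity with E2E. The rate shown is a placeholder.

```python
def prorated_cost(seconds_used, rate_per_hour):
    """Charge a fraction of the hourly rate, assuming per-second proration."""
    return (seconds_used / 3600) * rate_per_hour

# A job that runs for 30 minutes at a hypothetical rate of 400.0/hour:
print(prorated_cost(1800, 400.0))  # 200.0
```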