---
title: Pricing
---

# Pricing

Fine-tuning job costs depend on the **GPU plan** you select and the **duration of training**. Pricing is based on compute usage: you are charged for the time your training job runs on the selected GPU.

---

## How billing works

| Factor | Description |
|--------|-------------|
| **GPU type** | The GPU selected (H100, A100, etc.) determines the per-hour rate |
| **Training duration** | You are charged for the actual time the job runs on that GPU |
| **Billing start** | Begins when the job starts executing on the GPU |
| **Billing stop** | Ends when the job completes, fails, or is terminated |

Fine-tuning jobs are billed at an **hourly rate** based on the GPU compute resources used. There are no charges while a job is queued or pending.

---

## What affects cost

- **GPU type**: H100 and A100 have different per-hour rates
- **Training duration**: More epochs and larger datasets increase training time
- **Quantization**: Enabling 4-bit quantization may reduce training time by lowering memory overhead, potentially allowing use of a smaller GPU

---

## Pricing examples

*Note: Values below are illustrative. Use the [E2E Calculator](https://calculator.e2enetworks.com/estimate-pricing) for current rates.*

### Example: Fine-tuning Llama 3 13B on H100

**Scenario:** Fine-tune Llama 3 13B for 2 epochs on a 50,000-row dataset. Estimated training time: ~8 hours.

| Factor | Value |
|--------|-------|
| GPU | H100 |
| Training duration | ~8 hours |

**Billing:** Cost = 8 hours × (price per H100 hour)

**Recommendation:** Use H100 for larger models where training speed matters; the higher per-hour rate is often offset by shorter job duration.

---

For detailed pricing, visit the [E2E Calculator](https://calculator.e2enetworks.com/estimate-pricing).

---

## Frequently Asked Questions

#### Am I charged if my job fails?

Yes. You are charged for the compute time used up to the point of failure. Use **Retry** to restart a failed job without losing configuration.

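
The billing window for a failed job can be sketched as follows. This is a minimal illustration, not the platform's billing implementation: the `JobTimeline` record, the function names, and the rate of 100.0 per hour are all hypothetical, and it assumes charges are prorated linearly over the time the job spends executing on the GPU.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical rate for illustration only; real rates come from the E2E Calculator.
EXAMPLE_RATE_PER_HOUR = 100.0


@dataclass
class JobTimeline:
    """Hypothetical record of a job's key timestamps (not a platform API)."""
    queued_at: datetime
    started_at: Optional[datetime]  # None if the job never left the queue
    ended_at: Optional[datetime]    # completion, failure, or termination


def billable_hours(job: JobTimeline) -> float:
    """Hours between execution start and end; queue time is excluded."""
    if job.started_at is None or job.ended_at is None:
        # Not started, or still running: no final billable duration yet.
        return 0.0
    seconds = (job.ended_at - job.started_at).total_seconds()
    return max(seconds, 0.0) / 3600.0


def job_cost(job: JobTimeline, rate_per_hour: float) -> float:
    """Cost = billable hours x per-hour GPU rate."""
    return billable_hours(job) * rate_per_hour


# Queued for 30 minutes, then failed after 1.5 hours on the GPU.
failed_job = JobTimeline(
    queued_at=datetime(2024, 1, 1, 9, 0),
    started_at=datetime(2024, 1, 1, 9, 30),
    ended_at=datetime(2024, 1, 1, 11, 0),
)
print(job_cost(failed_job, EXAMPLE_RATE_PER_HOUR))  # 150.0
```

In this sketch the 30 minutes spent in the queue add nothing to the cost, while the 1.5 hours of execution before the failure are charged in full.
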

---

#### When does billing start?

Billing starts when the training job begins executing on the GPU, not when it enters the queue. There is no charge while the job is in a pending or queued state.

---

#### How can I reduce training costs?

- Enable **4-bit quantization** to reduce memory usage and potentially run on a smaller, lower-cost GPU.
- Start with **fewer epochs** to validate your dataset and configuration before running the full job.
- Use the **Clone** feature to reuse configurations and iterate without re-entering settings.
- **Terminate** a job early if training metrics indicate overfitting or an incorrect configuration.

---

#### Is there a minimum charge per job?

No. Billing is prorated: you are charged only for the actual GPU compute time used, with no minimum duration.
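
Because charges follow actual GPU time with no minimum, a short validation run or an early termination costs proportionally less than the full job. The sketch below illustrates this using the 2-epoch, ~8-hour scenario from the pricing example. The function name and the rate of 100.0 per hour are hypothetical, and the estimate assumes training time grows linearly with the number of epochs, which may not hold exactly in practice.

```python
# Hypothetical per-hour rate for illustration only; check the E2E Calculator for real rates.
EXAMPLE_RATE_PER_HOUR = 100.0


def estimated_cost(hours_per_epoch: float, epochs: float, rate_per_hour: float) -> float:
    """Estimate cost, assuming training time grows linearly with epochs."""
    return hours_per_epoch * epochs * rate_per_hour


# From the pricing example: 2 epochs take about 8 hours, so about 4 hours per epoch.
HOURS_PER_EPOCH = 8.0 / 2

full_run = estimated_cost(HOURS_PER_EPOCH, 2, EXAMPLE_RATE_PER_HOUR)
validation_run = estimated_cost(HOURS_PER_EPOCH, 1, EXAMPLE_RATE_PER_HOUR)
# A job terminated halfway through its first epoch is charged only for the time used.
terminated_early = estimated_cost(HOURS_PER_EPOCH, 0.5, EXAMPLE_RATE_PER_HOUR)

print(full_run, validation_run, terminated_early)  # 800.0 400.0 200.0
```

Swapping in the current per-hour rate for your GPU plan gives a rough budget for each option before you launch the job.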