Step-by-Step Guide to Fine-Tuning Models

Introduction

Fine-tuning a model refers to the process of taking a pre-trained machine learning model and further training it on a specific task or dataset to adapt it to the nuances of that particular domain. The term is commonly used in the context of transfer learning, where a model trained on a large and diverse dataset (pre-training) is adjusted to perform a specific task or work with a specific dataset (fine-tuning).

What is Fine-Tuning?

Fine-tuning refers to the process of modifying a pre-existing, pre-trained model to cater to a new, specific task by training it on a smaller dataset related to the new task. This approach leverages the existing knowledge gained from the pre-training phase, thereby reducing the need for extensive data and resources.

In the context of neural networks and deep learning, fine-tuning is typically done by adjusting the parameters of a pre-trained model using a smaller, task-specific dataset. The pre-trained model, having already learned a set of features from a large dataset, is trained further on the new dataset to adapt these features to the new task.
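The idea of continuing training from existing parameters can be illustrated with a toy sketch. It is not how the platform trains models: the linear model, the "pre-trained" values, and the dataset are all hypothetical, chosen only to show parameters being adjusted on a small task-specific dataset.

```python
# Toy illustration: "pre-trained" parameters of a linear model y = w*x + b
# are adapted to a small task-specific dataset with plain gradient descent.

def mse(w, b, data):
    """Mean squared error of the model on the dataset."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fine_tune(w, b, data, lr=0.05, epochs=500):
    """Continue training (w, b) on `data`, starting from the given values."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical starting point, standing in for weights learned during pre-training.
pretrained_w, pretrained_b = 2.0, 0.0

# Small task-specific dataset generated from y = 2.5x + 1.
task_data = [(0.0, 1.0), (1.0, 3.5), (2.0, 6.0), (3.0, 8.5)]

tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, task_data)
print("loss before:", mse(pretrained_w, pretrained_b, task_data))
print("loss after: ", mse(tuned_w, tuned_b, task_data))
```

Because training starts from values that are already close to the target, only a small dataset and a modest number of steps are needed, which is the practical appeal of fine-tuning.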

How to Create a Fine-Tuning Job?

To start creating a fine-tuning job, navigate to the sidebar and select Foundation Studio. A dropdown menu will appear with an option labeled Fine-Tune Models.

Upon clicking the Fine-Tune Models option, the user will be directed to the “Manage Fine-Tuning Jobs” page.

../_images/Fine_Tuning.png

On the “Manage Fine-Tuning Jobs” page, click the Create Fine-Tuning Job button or the Click here button to create a fine-tuning job.

../_images/create_fine_tuning.png

After clicking the ‘Create Fine-Tuning Job’ button, the ‘Create Fine-Tuning Job’ page will open. This page has several options, such as Job Name, Model, and Hugging Face Token.

../_images/jobmodel_config.png

If the user already has a Hugging Face token integration, they can select it from the dropdown. If not, they can click the Create New link to open the Create Integration page. After adding a token, the user can move to the next stage by clicking the Next button.

Without a Hugging Face token, the user will not be able to access certain services. In that case, they need to sign up for a Hugging Face account and obtain an API token.

To obtain a Hugging Face token, you can follow these steps:

  1. Go to the Hugging Face website and create an account if you haven’t already.

  2. Once you have created an account, log in and go to your account settings.

  3. Click on the Tokens tab.

  4. Click on the “New Access Token” button.

  5. Give your token a name and select the permissions you want to grant to the token.

  6. Click on the “Create” button.

  7. Your new token will be displayed. Make sure to copy it and store it in a safe place, as you will not be able to see it again after you close the window.
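Since the token cannot be viewed again, one common way to keep it out of scripts and notebooks is to read it from an environment variable. The sketch below is only an illustration: the helper names are hypothetical, and the choice of the HF_TOKEN variable name is an assumption rather than something this guide requires.

```python
import os


def load_hf_token(env_var="HF_TOKEN"):
    """Read the access token from an environment variable instead of hard-coding it."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your access token")
    return token


def mask(token, visible=4):
    """Return a version of the token that is safe to print in logs."""
    if len(token) <= visible:
        return "*" * len(token)
    return token[:visible] + "*" * (len(token) - visible)


# Placeholder value for demonstration only, not a real token.
print(mask("hf_exampletoken123"))
```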

../_images/create_integration.png

Note

Some models are available for commercial use but require access to be granted by their custodian or administrator (the creator or maintainer of the model). You can visit the model card on Hugging Face to initiate the process.

How to Define Dataset Preparation?

After defining the job and model configuration, the user can move on to the Dataset Preparation section. The Dataset page will open with several options: Select Task, Dataset Type, Choose a Dataset, Validation Split Ratio, and Prompt Configuration. Once these options are filled in, the dataset preparation configuration is set and the user can move to the next section.
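To make the Validation Split Ratio concrete, here is a minimal sketch of what such a ratio usually means: a fraction of the records is held out for validation and the rest is used for training. The function name, the shuffling, and the rounding are assumptions for illustration; this guide does not describe how the platform performs the split internally.

```python
import random


def split_dataset(records, validation_split_ratio, seed=0):
    """Shuffle the records and hold out a fraction of them for validation."""
    if not 0.0 <= validation_split_ratio < 1.0:
        raise ValueError("validation_split_ratio must be in [0, 1)")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_val = int(len(shuffled) * validation_split_ratio)
    return shuffled[n_val:], shuffled[:n_val]


records = [{"input": f"question {i}", "output": f"answer {i}"} for i in range(10)]
train, val = split_dataset(records, 0.2)
print(len(train), len(val))  # 8 2
```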

../_images/dataset_preparation.png

Dataset Type

Under Dataset Type, you can select either CUSTOM or HUGGING FACE. The CUSTOM type lets you train models on your own data, which offers flexibility for unique tasks. The HUGGING FACE type lets you choose from a variety of existing datasets.

../_images/datasettype1.png

Choose a Dataset

CUSTOM

If you select CUSTOM as the dataset type, you have to choose a user-defined dataset by clicking the CHOOSE button.

../_images/customdataset.png

After clicking the CHOOSE button, you will see the screen below if the selected dataset already contains objects. To ensure compatibility, keep your data in the .jsonl file format, a line-oriented format in which each line is one JSON object. Verify your dataset and convert it to .jsonl before model training.
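If your data currently lives in Python objects or a regular JSON array, converting it to .jsonl takes only a few lines. The sketch below uses just the standard library; the helper names and the train.jsonl file name are illustrative choices.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a .jsonl file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


records = [
    {"input": "What color is the sky?", "output": "The sky is blue."},
    {"input": "Where is the best place to get cloud GPUs?", "output": "E2E Networks"},
]

# Round-trip through a temporary file to confirm the output parses cleanly.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.jsonl")
    write_jsonl(records, path)
    loaded = read_jsonl(path)

print(loaded == records)
```

Reading the file back before uploading is a cheap way to catch malformed lines early.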

Note

It is crucial to pass the appropriate labels during prompt configuration.

../_images/select_object.png

Alternatively, you can create a new dataset by clicking the click here link.

Note

The datasets listed here use an EOS bucket for data storage.

../_images/select_dataset1.png

After clicking the click here link, enter the details of the new dataset and click the CREATE button.

../_images/create_new_dataset.png
  • For Stable Diffusion models, the dataset should be in the format below: one metadata file that maps each image to its context text, plus the images themselves in PNG format.

folder_name/metadata.jsonl
folder_name/0001.png
folder_name/0002.png
folder_name/0003.png
  • Structure of the metadata.jsonl file:

{"file_name": "0001.png", "text": "This is a first value of a text feature you added to your image"}
{"file_name": "0002.png", "text": "This is a second value of a text feature you added to your image"}
{"file_name": "0003.png", "text": "This is a third value of a text feature you added to your image"}

Note

Uploading a dataset in an incorrect format will cause the fine-tuning run to fail.
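Since a badly formatted dataset fails the run, it can help to check the folder locally before uploading. The following is a minimal sketch based only on the folder layout and metadata.jsonl structure shown above; the function name and the specific checks are assumptions, not the platform's own validation.

```python
import json
import tempfile
from pathlib import Path


def check_image_dataset(folder):
    """Return a list of problems found in folder/metadata.jsonl (empty if none)."""
    folder = Path(folder)
    metadata = folder / "metadata.jsonl"
    if not metadata.is_file():
        return ["metadata.jsonl is missing"]

    problems = []
    with metadata.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                problems.append(f"line {lineno}: not valid JSON")
                continue
            if not isinstance(row, dict) or not {"file_name", "text"} <= set(row):
                problems.append(f"line {lineno}: needs 'file_name' and 'text' keys")
                continue
            name = str(row["file_name"])
            if not name.lower().endswith(".png"):
                problems.append(f"line {lineno}: {name} is not a .png file")
            elif not (folder / name).is_file():
                problems.append(f"line {lineno}: {name} not found in folder")
    return problems


# Demo on a throwaway folder: two metadata rows, but only one image on disk.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    rows = [
        {"file_name": "0001.png", "text": "first caption"},
        {"file_name": "0002.png", "text": "second caption"},
    ]
    (root / "metadata.jsonl").write_text(
        "\n".join(json.dumps(r) for r in rows) + "\n", encoding="utf-8"
    )
    (root / "0001.png").write_bytes(b"")  # empty placeholder, not a real image
    problems = check_image_dataset(root)

print(problems)
```

The check only confirms that files exist and that the metadata parses; it does not verify that the PNG files are valid images.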

  • For text models such as Llama and Mistral, the dataset should be in the format below: a .jsonl file in which each line is a JSON object holding one training example.

{"input": "What color is the sky?", "output": "The sky is blue."}
{"input": "Where is the best place to get cloud GPUs?", "output": "E2E Networks"}

Note

Uploading a dataset in an incorrect format will cause the fine-tuning run to fail.

Note

In this example, the labels “input” and “output” are the ones to provide in the prompt configuration.

Example:
Below is an instruction that describes a task. Write a response that appropriately completes the request.

###Instruction:[input]

###Response:[output]
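The relationship between the dataset labels and the prompt can be sketched as a simple substitution: each [label] placeholder is replaced by the matching field of a record. This is an assumption about how such templates are typically filled, written for illustration; the function name is hypothetical and the platform's actual substitution logic is not documented here.

```python
import re

TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "###Instruction:[input]\n\n"
    "###Response:[output]"
)


def render_prompt(template, record):
    """Replace each [label] in the template with the matching record field."""

    def substitute(match):
        label = match.group(1)
        if label not in record:
            # A label that is missing from the dataset is exactly the kind of
            # mismatch the note about passing appropriate labels warns about.
            raise KeyError(f"label {label!r} not found in record")
        return str(record[label])

    return re.sub(r"\[(\w+)\]", substitute, template)


record = {"input": "What color is the sky?", "output": "The sky is blue."}
prompt = render_prompt(TEMPLATE, record)
print(prompt)
```

Running the template against a few records before launching a job is a quick way to confirm that every label in the prompt configuration exists in the dataset.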

UPLOAD DATASET

After selecting a dataset, you can upload objects to it by clicking the UPLOAD DATASET button.

../_images/upload_dataset1.png

Click the UPLOAD DATASET button, select the objects to upload, and click the UPLOAD button.

../_images/click_upload.png

Click the OK button.

../_images/click_ok.png

After uploading objects to the dataset, choose the file to use and then click the SUBMIT button.

../_images/select_object.png

HUGGING FACE

When you select the HUGGING FACE dataset type, you can pick a dataset from the available collection and start model training with it.

../_images/hugging_face.png

How to Define a Hyperparameter Configuration?

Upon providing the dataset preparation details, users are directed to the Hyperparameter Configuration page. This interface allows users to customize the training process by specifying desired hyperparameters, thereby facilitating effective hyperparameter tuning. The form provided enables the selection of various hyperparameters, including but not limited to training type, epoch, learning rate, and max steps. Please fill out the form meticulously to optimize the model training process.

../_images/hyperparameter.png

In addition to the standard hyperparameters, the configuration page offers advanced options such as batch size and gradient accumulation steps. These settings can be utilized to further refine the training process. Users are encouraged to explore and employ these advanced options as needed to achieve optimal model performance.
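The way batch size, gradient accumulation steps, epochs, and max steps interact can be worked out with simple arithmetic. The sketch below makes two assumptions that may not match the platform's trainer: the last partial batch of an epoch still counts as a step, and max steps acts as an upper bound on the total. The function and parameter names are illustrative.

```python
import math


def training_schedule(num_examples, batch_size, grad_accum_steps, epochs, max_steps=None):
    """Estimate the effective batch size and the number of optimizer steps."""
    # Gradients from several small batches are accumulated before each update,
    # so one optimizer step sees batch_size * grad_accum_steps examples.
    effective_batch = batch_size * grad_accum_steps
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    if max_steps is not None and max_steps > 0:
        total_steps = min(total_steps, max_steps)  # assumption: max_steps is a cap
    return effective_batch, steps_per_epoch, total_steps


print(training_schedule(1000, batch_size=4, grad_accum_steps=8, epochs=3))
# (32, 32, 96)
```

Raising gradient accumulation steps is a common way to reach a larger effective batch size when GPU memory limits the per-step batch size.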

../_images/Advance.png

After specifying the advanced settings, users are advised to use the WandB Integration feature for job tracking by filling in the necessary details in the interface. This lets users monitor the model training process throughout the lifecycle of the job. They can also set the ‘debug’ option as desired.

../_images/Tranck-fine-tuning.png ../_images/create-fine-tuning2.png

After setting the debug option, users select their preferred machine configuration for the fine-tuning job. Clicking the LAUNCH button then initiates or schedules the job, depending on the chosen settings. A variety of high-performance GPUs, such as the Nvidia H100 and A100, are available to speed up training.

../_images/machine.png

Viewing Your Job Parameters and Fine-Tuned Models

On completion of the job, a fine-tuned model is created and shown in the Models section in the lower part of the page. The fine-tuned model repo contains all checkpoints from model training as well as the adapters built during training. Users can also go directly to the model repo page under Inference to view it.

Models

../_images/fine_tune_models.png

Overview

In the Overview section, you can see the fine-tuning job details.

../_images/fine_tuned_model_repo.png

Events

Under the Events section, you can view recent pod activities such as scheduling, container start, and more.

../_images/fine_tuning_events.png

Logs

Fine-tuning logs contain detailed information about the training process, enabling users to monitor progress, diagnose issues, and optimize performance effectively. They serve as a comprehensive record of the training process.

../_images/finetuning_logs.png

Metrics

Under the Metrics section, you can view the resource utilization of the pod, such as GPU utilization, GPU memory usage, and more.

../_images/fine_tuned_model_repo.png

Actions

Retry

If your fine-tuning job fails, you can use the Retry action to restart the fine-tuning process.

../_images/finetuning_click_retry.png

After you click the retry icon, a retry popup appears. Click the Continue button to restart the process.

../_images/fine_tunning_popup_retry.png

Terminate

Select the particular fine-tuning model from the list and click on the Terminate button to terminate the fine-tuning model.

../_images/fine_tuning_click_terminate.png

After you click the Terminate button, a confirmation popup appears. Click the Continue button to terminate the fine-tuning model.

../_images/fine_tuning_terminate_popup.png

Delete

Select the particular fine-tuning model from the list and click on the Delete button to delete the fine-tuning model.

../_images/fine_tuning_click_delete.png

After you click the Delete button, a confirmation popup appears. Click the Delete button to delete the fine-tuning model.

../_images/fine_tuning_delete_popup.png