Deploy Model Endpoint for Stable Video Diffusion XT
Stable Video Diffusion (SVD) is a powerful image-to-video generation model that can generate 2-4 second, high-resolution (576x1024) videos conditioned on an input image.
In this tutorial, we will create a model endpoint for Stability AI's Stable Video Diffusion XT model.
The tutorial will mainly focus on the following:
- A step-by-step guide to model endpoint creation and video generation using Stable Video Diffusion XT
- Creating a model endpoint with custom model weights
- A brief description of the supported parameters for video generation
For the scope of this tutorial, we will use the pre-built container (Stable Video Diffusion XT) for the model endpoint, but you may choose to create your own custom container by following this tutorial.
In most cases, the pre-built container will work for your use case. The advantage is that you won't have to worry about building an API handler; it will be created for you automatically.
So let's get started!
A Guide on Model Endpoint Creation and Video Generation
Step 1: Create a Model Endpoint for Stable Video Diffusion XT on TIR
- Go to TIR AI Platform.
- Choose a project.
- Go to the Model Endpoints section.
- Click on the Create Endpoint button on the top-right corner.
- Choose the Stable Video Diffusion XT model card in the Choose Framework section.
- Pick any suitable GPU plan of your choice. You can proceed with the default values for replicas, disk size, and endpoint details.
- Add your environment variables, if any. Else, proceed further.
- Model Details: For now, we will skip the model details and continue with the default model weights. If you wish to load your custom model weights (fine-tuned or not), select the appropriate model. (See the Creating Model Endpoint with Custom Model Weights section below.)
- Complete the endpoint creation.
Step 2: Generate your API TOKEN
The model endpoint API requires a valid auth token which you'll need to perform further steps. So, let's generate one.
- Go to the API Tokens section under the project.
- Create a new API Token by clicking on the Create Token button on the top-right corner. You can also use an existing token, if already created.
- Once created, you'll be able to see the list of API Tokens containing the API Key and Auth Token. You will need this Auth Token in the next step.
Step 3: Generate Videos using a Prompt Image
The final step is to send API requests to the created model endpoint and generate a video using an image prompt. We will use a TIR Notebook to do the same.
- Once your model endpoint is Ready, visit the Sample API Request section of that model endpoint and copy the Python code.
- Launch a TIR Notebook with the PyTorch image (or any appropriate image) on any basic machine plan. Once it is in the Running state, launch it and start a new notebook (e.g., untitled.ipynb) in Jupyter Labs.
- Paste the Sample API Request code (for Python) in the notebook cell. Below is the sample code:
import requests
import json
import base64

# mandatory fields
auth_token = "<your-auth-token>"
input_image_path = 'car.png'            # local path of the input image
video_output_path = 'video_output.avi'  # local path for storing the video output
video_fps = 10                          # fps of the output video

url = "https://jupyterlabs.e2enetworks.net/project/p-681/endpoint/is-2242/v1/models/stable-video-diffusion-img2vid-xt:predict"

# Read the image file and encode it to a base64 string
with open(input_image_path, 'rb') as image_file:
    image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

payload = json.dumps({
    "fps": video_fps,
    "image": image_base64
})

headers = {
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {auth_token}'
}

# The response contains the generated video as a base64 string
response = requests.request("POST", url, headers=headers, data=payload)

# Decode the video and save it to the output path
video_bytes = base64.b64decode(response.json().get('predictions'))
with open(video_output_path, "wb") as video_file:
    video_file.write(video_bytes)
- Copy the Auth Token generated in Step 2 and use it in place of the <your-auth-token> placeholder in the Sample API Request.
- Also set appropriate values for input_image_path, video_output_path, and video_fps in the above Python script. The video output file should have a .avi extension; as of now, only .avi files are supported as output.
- Execute the code and send the request.
- You can view the video, which is downloaded to the path you mentioned (a quick sanity check is sketched below).
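If you want to confirm the output without leaving the notebook, the minimal sketch below reads the saved file back and prints its frame count, frame rate, and duration. It assumes opencv-python is available in the notebook image; if it is not, install it with pip install opencv-python.

# Minimal sanity check for the downloaded .avi file (assumes opencv-python is installed).
import cv2

cap = cv2.VideoCapture("video_output.avi")   # same path as video_output_path above
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)
duration = frame_count / fps if fps else 0.0
print(f"Frames: {frame_count}, FPS: {fps:.1f}, Duration: {duration:.2f}s")
cap.release()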
That's it! Your Stable Video Diffusion XT model endpoint is up and ready for inference.
You can also try providing different prompt images and see the generated videos.
Besides the input image and fps, the model also supports various other parameters for video generation. Simply add the new keys and values to the payload of the above code; an illustrative example follows below. See the Supported Parameters for Video Generation section below.
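As an illustration, this is how extra keys could be added to the payload from the script above. Only image and fps are confirmed by this tutorial; the other keys (decode_chunk_size, motion_bucket_id, noise_aug_strength, seed) are taken from the underlying diffusers StableVideoDiffusionPipeline and are assumptions about what the endpoint may accept, so verify them against your endpoint before relying on them:

import json

# image_base64 and video_fps come from the sample script above.
payload = json.dumps({
    "image": image_base64,        # required: base64-encoded input image
    "fps": video_fps,             # required: frames per second of the output video
    # Hypothetical optional keys, based on the diffusers pipeline arguments:
    "decode_chunk_size": 8,       # frames decoded at a time (memory vs. speed trade-off)
    "motion_bucket_id": 127,      # higher values tend to produce more motion
    "noise_aug_strength": 0.02,   # noise added to the conditioning image
    "seed": 42                    # for reproducible generations
})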
Creating Model Endpoint with Custom Model Weights
Overview
To create an inference endpoint for the Stable Video Diffusion XT model with custom model weights, we will:
- Download the stable-video-diffusion-img2vid-xt model (by Stability AI) from Hugging Face.
- Upload the model to the Model Bucket (EOS).
- Create an inference endpoint (model endpoint) in TIR to serve API requests.
Step 1.1: Define a Model in TIR Dashboard
Before we proceed with downloading or fine-tuning (optional) the model weights, let us first define a model in the TIR dashboard.
Instructions
- Go to the TIR AI Platform.
- Choose a project.
- Navigate to the Model section.
- Click on Create Model.
- Enter a model name of your choosing (e.g., stable-video-diffusion).
- Select Model Type as Custom.
- Click on CREATE.
After completing these steps, you will see the details of the EOS (E2E Object Storage) bucket created for this model.
EOS provides an S3-compatible API to upload or download content. We will be using the MinIO CLI in this tutorial.
Next Steps
- Copy the Setup Host command from the Setup MinIO CLI tab to a notepad or leave it in the clipboard. We will use it shortly to set up the MinIO CLI.
Additional Information
In case you forget to copy the setup host command for MinIO CLI, don't worry. You can always go back to model details and get it again.
Steps to Work with the Stable Video Diffusion XT Model
Step 1.2: Start a New Notebook
To work with the model weights, we will need to first download them to a local machine or a notebook instance.
Instructions
- In the TIR Dashboard, go to Notebooks.
- Launch a new Notebook with the Diffusers Image and a suitable hardware plan (e.g., A100-80GB). We recommend a GPU plan if you plan to test or fine-tune the model.
- Click on the Notebook name or the Launch Notebook option to start the Jupyter Labs environment.
- In Jupyter Labs, click New Launcher and select Terminal.
- Now, paste and run the command for setting up the MinIO CLI host from Step 1.1.
- If the command works, you will have the mc CLI ready for uploading our model.
Step 1.3: Download the Stable Video Diffusion XT Model from the Notebook
Now, our EOS bucket will store the model weights. Let us download the weights from Hugging Face.
Instructions
- Start a new notebook named untitled.ipynb in Jupyter Labs.
- Run the commands below. The model will be downloaded by the Hugging Face SDK into the $HOME/.cache folder.

import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

# Load the conditioning image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

export_to_video(frames, "generated.mp4", fps=7)
If you face any issues running the above code in the notebook cell, you may be missing the required libraries. This can happen if you did not launch the notebook with the Diffusers image. In that situation, you can install the required libraries as shown below.
Install Required Packages
To use the Stable Video Diffusion XT model, you need to install the necessary packages. Run the following command in your notebook:
pip install diffusers transformers accelerate
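The Hugging Face cache stores the weights under a hash-named snapshot directory, which is the path used in Step 2 below. If you prefer downloading the weights to a known location instead, the minimal sketch below uses huggingface_hub (installed as a dependency of diffusers); the local_dir value is just an example path, not a TIR requirement:

# Optional: download the weights to an explicit directory instead of the HF cache.
# "svd-xt-weights" is an arbitrary example path.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="stabilityai/stable-video-diffusion-img2vid-xt",
    local_dir="svd-xt-weights",
)
print(f"Model files downloaded to: {local_path}")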
Step 2: Upload the Model to Model Bucket (EOS)
Now that the model works as expected, you can fine-tune it with your own data or choose to serve the model as-is. This tutorial assumes you are uploading the model as-is to create an inference endpoint. In case you fine-tune the model, you can follow similar steps to upload the model to the EOS bucket.
Instructions
- Go to the directory that has the Hugging Face model code and push its contents to the EOS bucket:

cd $HOME/.cache/huggingface/hub/models--stabilityai--stable-video-diffusion-img2vid-xt/snapshots

# Push the contents of the folder to the EOS bucket.
# Go to TIR Dashboard >> Models >> Select your model >> Copy the cp command from the Setup MinIO CLI tab.
# The copy command would look like this:
#   mc cp -r <MODEL_NAME> stable-video-diffusion/stable-video-diffusion-854588
# Here we replace <MODEL_NAME> with '*' to upload all the contents of the snapshots folder.
mc cp -r * stable-video-diffusion/stable-video-diffusion-854588
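If you would rather upload from a notebook cell in Python instead of the terminal, here is a minimal sketch using the minio SDK (pip install minio). The endpoint, access key, secret key, and bucket name below are placeholders; copy the real values from the Setup MinIO CLI tab of your model in the TIR dashboard:

# Python alternative to the mc cp command above. All credentials are placeholders.
import os
from minio import Minio

client = Minio(
    "<your-eos-endpoint>",           # e.g., the host shown in the Setup MinIO CLI tab
    access_key="<your-access-key>",
    secret_key="<your-secret-key>",
)

bucket = "stable-video-diffusion-854588"   # your model bucket name
local_dir = os.path.expanduser(
    "~/.cache/huggingface/hub/models--stabilityai--stable-video-diffusion-img2vid-xt/snapshots"
)

# Walk the snapshots directory and upload each file, preserving relative paths.
for root, _, files in os.walk(local_dir):
    for name in files:
        file_path = os.path.join(root, name)
        object_name = os.path.relpath(file_path, local_dir)
        client.fput_object(bucket, object_name, file_path)
        print(f"Uploaded {object_name}")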
Step 3: Create an Endpoint for Our Model
With the model weights uploaded to the TIR Model's EOS Bucket, what remains is to launch the endpoint and serve API requests.
Head back to the section A Guide on Model Endpoint Creation and Video Generation above and follow the steps to create the endpoint for your model.
While creating the endpoint, make sure you select the appropriate model in the Model Details sub-section, i.e., the EOS bucket containing your model weights. If your model is not in the root directory of the bucket, make sure to specify the path where the model is saved in the bucket.
Follow the Steps Below to Find the Model Path in the Bucket:
- Go to MyAccount Object Storage.
- Find your model bucket (in this case: stable-video-diffusion-854588) and click on its Objects tab.
- If the model_index.json file is present in the list of objects, then your model is in the root directory, and you need not provide any Model Path.
- Otherwise, navigate to the folder containing the model_index.json file, copy its path, and paste it into the Model Path field (a small lookup sketch is shown after this list).
- You can click the Validate button to check the existence of the model at the given path.
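Alternatively, you can locate model_index.json from the notebook with the same minio SDK and placeholder credentials used in the upload sketch above; the directory portion of the printed object name is the Model Path to enter in TIR:

# Locate model_index.json in the bucket to determine the Model Path (placeholder credentials).
from minio import Minio

client = Minio(
    "<your-eos-endpoint>",
    access_key="<your-access-key>",
    secret_key="<your-secret-key>",
)

for obj in client.list_objects("stable-video-diffusion-854588", recursive=True):
    if obj.object_name.endswith("model_index.json"):
        model_dir = obj.object_name.rsplit("/", 1)[0] if "/" in obj.object_name else "<bucket root>"
        print(f"model_index.json found; Model Path: {model_dir}")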
Supported Parameters for Video Generation
Below is a brief description of the supported parameters. These parameters are passed in the payload dictionary as shown in the above script.
Required Parameters
- image (base64): An image in base64 format.
- fps (int): Frames per second of the video to be generated.
Advanced Parameters
The model supports additional optional parameters that you can include in the request payload to generate video:
- vae (AutoencoderKLTemporalDecoder): Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations.
- image_encoder (CLIPVisionModelWithProjection): Frozen CLIP image encoder (laion/CLIP-ViT-H-14-laion2B-s32B-b79K).
- unet (UNetSpatioTemporalConditionModel): A UNetSpatioTemporalConditionModel to denoise the encoded image latents.
- scheduler (EulerDiscreteScheduler): A scheduler to be used in combination with unet to denoise the encoded image latents.
- feature_extractor (CLIPImageProcessor): A CLIPImageProcessor to extract features from generated images.