
Deploy model endpoint for Stable Video Diffusion XT

Stable Video Diffusion (SVD) is an image-to-video generation model that creates 2–4 second high-resolution (576x1024) videos conditioned on an input image. In this tutorial, we’ll deploy a model endpoint for Stable Video Diffusion XT.

(Example: an input image alongside the generated output video.)

The tutorial covers:

  • Step-by-step guide to create a model endpoint and generate videos using Stable Video Diffusion XT
  • Creating a model endpoint with custom model weights
  • Supported parameters for video generation

For this tutorial, we’ll use the Stable Video Diffusion XT pre-built container. The pre-built container automatically handles the API setup, so no custom API handler is required.


A guide on model endpoint creation and video generation

Step 1: Create a model endpoint

  1. Go to the AI Platform.
  2. Select your project.
  3. Navigate to Model Endpoints → Create Endpoint.
  4. Choose the Stable Video Diffusion XT model card.
  5. Select an appropriate GPU plan.
  6. You can keep default values for replicas, disk size, and endpoint details.
  7. Add environment variables if needed; otherwise, continue.
  8. For now, skip Model Details to use default model weights. (See Creating model endpoint with custom model weights for using custom ones.)
  9. Complete the endpoint creation.

Step 2: Generate your API token

You need an Auth Token to access the endpoint.

  1. Go to API Tokens under your project.
  2. Click Create Token or use an existing one.
  3. Copy the generated Auth Token for use in API requests.
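The copied token is sent as a Bearer credential on every API request. A minimal sketch of the header layout used by the examples below (the token value is a placeholder, not a real credential):

```python
# Sketch: how the Auth Token is attached to endpoint requests.
# "<your-auth-token>" is a placeholder; substitute your real token.
auth_token = "<your-auth-token>"

headers = {
    "Content-Type": "application/json",          # payload is JSON
    "Authorization": f"Bearer {auth_token}",     # token goes in the Authorization header
}
```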

Step 3: Generate videos using a prompt image

Once your endpoint is Ready, test it using the Sample API Request.

  1. Launch a Notebook with PyTorch or any suitable image.
  2. Open a new notebook (untitled.ipynb) in JupyterLab.
  3. Paste the sample code from the endpoint or use the example below:
import requests
import json
import base64

auth_token = "<your-auth-token>"
input_image_path = 'car.png'
video_output_path = 'video_output.avi'
video_fps = 10

url = "https://<your-endpoint-url>"

# Read the input image and base64-encode it for the JSON payload
with open(input_image_path, 'rb') as image_file:
    image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

payload = json.dumps({
    "fps": video_fps,
    "image": image_base64
})

headers = {
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {auth_token}'
}

response = requests.post(url, headers=headers, data=payload)

# The endpoint returns the video as a base64 string under 'predictions'
video_bytes = base64.b64decode(response.json().get('predictions'))
with open(video_output_path, "wb") as video_file:
    video_file.write(video_bytes)

Execute the script, and your generated video will be saved to the specified output path.
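The script above assumes the request succeeds. A hedged sketch of basic error handling you may want to wrap around the call, so an HTTP error body is never decoded as video (`save_video` is an illustrative helper name, not part of the platform API):

```python
import base64

def save_video(response, output_path):
    """Decode the endpoint response and write the video; raise on errors."""
    # Fail fast on HTTP errors instead of trying to decode an error body
    if response.status_code != 200:
        raise RuntimeError(f"Request failed ({response.status_code}): {response.text[:200]}")
    predictions = response.json().get("predictions")
    if predictions is None:
        raise ValueError("Response contained no 'predictions' field")
    with open(output_path, "wb") as f:
        f.write(base64.b64decode(predictions))
```

Call it as `save_video(response, video_output_path)` in place of the last three lines of the script.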

That’s it — your Stable Video Diffusion XT endpoint is ready!

You can experiment with different prompts or adjust the payload parameters for diverse results.


Creating model endpoint with custom model weights

To create an endpoint with custom weights for Stable Video Diffusion XT:

  1. Download the stable-video-diffusion-img2vid-xt model.
  2. Upload it to your Model Bucket (EOS).
  3. Deploy a new endpoint using those weights.

Step 1.1: Define a model in the dashboard

  1. Go to the AI Platform.
  2. Choose your project.
  3. Navigate to Models → Create Model.
  4. Name it (e.g., stable-video-diffusion).
  5. Select Model Type: Custom.
  6. Click CREATE.
  7. Copy the Setup Host command from Setup MinIO CLI.

Step 1.2: Start a new notebook

  1. Launch a Diffusers Image Notebook with GPU (recommended).
  2. In JupyterLab, open a Terminal.
  3. Run the MinIO setup command copied earlier.
  4. Confirm the mc CLI is configured successfully.

Step 1.3: Download the Stable Video Diffusion XT model

First, install the required libraries:

pip install diffusers transformers accelerate

Then run the script below to download and test the model (the weights are cached under $HOME/.cache/huggingface/hub):

import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Downloading the pipeline caches the model weights locally
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

export_to_video(frames, "generated.mp4", fps=7)

Step 2: Upload the model to EOS

cd $HOME/.cache/huggingface/hub/models--stabilityai--stable-video-diffusion-img2vid-xt/snapshots
mc cp -r * stable-video-diffusion/stable-video-diffusion-854588

If upload fails, verify model directory using:

ls $HOME/.cache/huggingface/hub

Step 3: Create an endpoint for your model

After the upload completes, return to the dashboard and create a new endpoint. Select the uploaded model in Model Details. If your model is not in the bucket's root directory, provide the path to the directory containing model_index.json.


Supported parameters for video generation

Below are the supported parameters accepted in the request payload:

Required parameters

  • image (base64): Base64-encoded input image.
  • fps (int): Frames per second of the output video.

Advanced parameters

  • vae: Variational Auto-Encoder (VAE) used to encode and decode images.
  • image_encoder: CLIP image encoder.
  • unet: UNetSpatioTemporalConditionModel for denoising image latents.
  • scheduler: EulerDiscreteScheduler used with UNet.
  • feature_extractor: CLIPImageProcessor for feature extraction.
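For most requests, only the two required fields are needed. A minimal sketch of how they combine into the JSON payload (placeholder image bytes; the advanced fields above are optional overrides and are omitted here):

```python
import base64
import json

# Placeholder bytes; in practice, read your real input image file
image_base64 = base64.b64encode(b"<raw image bytes>").decode("utf-8")

# Required fields only: "image" and "fps"
payload = json.dumps({
    "image": image_base64,  # base64-encoded input image (required)
    "fps": 10,              # frames per second of the output video (required)
})
```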

Troubleshooting and best practices

  • Ensure your Auth Token and endpoint URL are correct.
  • Use .avi as output format.
  • Validate GPU resources before large-scale runs.
  • Verify the EOS bucket path if model load fails.
  • Check API logs for troubleshooting failed requests.
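As a quick pre-flight check covering the first two points, you can validate the basics locally before a large run. A sketch (`preflight` is an illustrative helper, not part of the platform API):

```python
def preflight(url: str, auth_token: str) -> list:
    """Return a list of configuration problems; an empty list means the basics look OK."""
    problems = []
    if not url.startswith("https://"):
        problems.append("endpoint URL should use https://")
    if not auth_token or "<" in auth_token:
        problems.append("Auth Token looks empty or like a placeholder")
    return problems
```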