# Deploy model endpoint for Stable Video Diffusion XT

Stable Video Diffusion (SVD) is an image-to-video generation model that creates 2–4 second high-resolution (576x1024) videos conditioned on an input image. In this tutorial, we’ll deploy a model endpoint for **Stable Video Diffusion XT**.
Input Image
Output Video
The tutorial covers:

* Step-by-step guide to create a model endpoint and generate videos using **Stable Video Diffusion XT**
* Creating a model endpoint with custom model weights
* Supported parameters for video generation

For this tutorial, we’ll use the **Stable Video Diffusion XT** pre-built container. The pre-built container automatically handles the API setup, so no custom API handler is required.

---

## A guide on model endpoint creation and video generation

### Step 1: Create a model endpoint

1. Go to the [AI Platform](https://tir.e2enetworks.com).
2. Select your project.
3. Navigate to **Model Endpoints** → **Create Endpoint**.
4. Choose the **Stable Video Diffusion XT** model card.
5. Select an appropriate GPU plan.
6. You can keep default values for replicas, disk size, and endpoint details.
7. Add environment variables if needed, else continue.
8. For now, skip **Model Details** to use default model weights. (See [Creating model endpoint with custom model weights](#creating-model-endpoint-with-custom-model-weights) for using custom ones.)
9. Complete the endpoint creation.

---

### Step 2: Generate your API token

You need an **Auth Token** to access the endpoint.

1. Go to **API Tokens** under your project.
2. Click **Create Token** or use an existing one.
3. Copy the generated **Auth Token** for use in API requests.

---

### Step 3: Generate videos using a prompt image

Once your endpoint is **Ready**, test it using the **Sample API Request**.

1. Launch a **Notebook** with **PyTorch** or any suitable image.
2. Open a new notebook (`untitled.ipynb`) in Jupyter Labs.
3. Paste the sample code from the endpoint or use the example below:
```python
import base64
import json

import requests

auth_token = ""  # paste the Auth Token from Step 2
input_image_path = 'car.png'
video_output_path = 'video_output.avi'
video_fps = 10
url = "https://"  # paste the endpoint URL from the Sample API Request

# Encode the input image as base64 text
with open(input_image_path, 'rb') as image_file:
    image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

payload = json.dumps({
    "fps": video_fps,
    "image": image_base64
})
headers = {
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {auth_token}'
}

response = requests.post(url, headers=headers, data=payload)
response.raise_for_status()  # stop on an HTTP error instead of decoding an error body

# The generated video is returned base64-encoded under 'predictions'
video_bytes = base64.b64decode(response.json().get('predictions'))
with open(video_output_path, "wb") as video_file:
    video_file.write(video_bytes)
```
Execute the script, and your generated video will be saved to the specified output path.

That’s it: your **Stable Video Diffusion XT** endpoint is ready! You can experiment with different input images or adjust the payload parameters for diverse results.

---

## Creating model endpoint with custom model weights

To create an endpoint with custom weights for **Stable Video Diffusion XT**:

1. Download the [stable-video-diffusion-img2vid-xt model](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt).
2. Upload it to your **Model Bucket (EOS)**.
3. Deploy a new endpoint using those weights.

### Step 1.1: Define a model in the dashboard

1. Go to the [AI Platform](https://tir.e2enetworks.com).
2. Choose your project.
3. Navigate to **Models** → **Create Model**.
4. Name it (e.g., `stable-video-diffusion`).
5. Select **Model Type: Custom**.
6. Click **CREATE**.
7. Copy the **Setup Host** command from **Setup MinIO CLI**.

---

### Step 1.2: Start a new notebook

1. Launch a **Diffusers Image Notebook** with GPU (recommended).
2. In Jupyter Labs, open **Terminal**.
3. Run the **MinIO setup** command.
4. Confirm the `mc` CLI is configured successfully.

---

### Step 1.3: Download the Stable Video Diffusion XT model
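If the required libraries are not already installed in the notebook, install them first:

```bash
pip install diffusers transformers accelerate
```

Then run the following in a notebook cell. Loading the pipeline downloads the model weights into the Hugging Face cache under `$HOME/.cache/huggingface/hub`, and the remaining lines generate a short test video to confirm the weights work:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Download (or load from cache) the fp16 variant of the model
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe.enable_model_cpu_offload()

# Load a sample image and resize it to the model's 1024x576 input size
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png")
image = image.resize((1024, 576))

# Fix the seed so the test run is reproducible
generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```

---

## Step 2: Upload the model to EOS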
```bash
cd $HOME/.cache/huggingface/hub/models--stabilityai--stable-video-diffusion-img2vid-xt/snapshots
mc cp -r * stable-video-diffusion/stable-video-diffusion-854588
```
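Step 3 asks for the folder that holds `model_index.json`, so it can help to confirm where that file sits in the local cache. The following is a minimal sketch using only the standard library. `find_model_roots` is a hypothetical helper name, and the search path assumes the default Hugging Face cache location used in the command above.

```python
from pathlib import Path


def find_model_roots(cache_dir: Path) -> list[Path]:
    """Return every directory under cache_dir that contains model_index.json."""
    return sorted(p.parent for p in cache_dir.rglob("model_index.json"))


if __name__ == "__main__":
    # Assumption: the model was downloaded to the default Hugging Face cache
    hub = Path.home() / ".cache" / "huggingface" / "hub"
    if hub.is_dir():
        for root in find_model_roots(hub):
            print(root)
```

Each printed path is a snapshot folder. Its position relative to the folder you uploaded indicates the path to enter if the model is not at the bucket root.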
If the upload fails, verify the model directory using:

```bash
ls $HOME/.cache/huggingface/hub
```

---

## Step 3: Create an endpoint for your model

After the upload, return to the dashboard and create a new endpoint. Select the uploaded model in **Model Details**. If your model isn’t in the root directory, provide the correct path where `model_index.json` exists.

---

## Supported parameters for video generation

Below are the supported parameters accepted in the request payload:

### Required parameters

* **image** (base64): Base64-encoded input image.
* **fps** (int): Frames per second of the output video.

### Advanced parameters

* **vae**: Variational Auto-Encoder (VAE) used to encode and decode images.
* **image_encoder**: CLIP image encoder.
* **unet**: UNetSpatioTemporalConditionModel for denoising image latents.
* **scheduler**: EulerDiscreteScheduler used with UNet.
* **feature_extractor**: CLIPImageProcessor for feature extraction.

---

## Troubleshooting and best practices

* Ensure your Auth Token and endpoint URL are correct.
* Use `.avi` as the output format.
* Validate GPU resources before large-scale runs.
* Verify the EOS bucket path if the model fails to load.
* Check API logs when troubleshooting failed requests.

---
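The two required parameters and the checks listed under troubleshooting can be wrapped in small helpers that fail early with a readable message. This is a minimal sketch, not the platform's client: `build_payload` and `extract_video` are hypothetical names, and it assumes the response carries the base64 video under `predictions`, as in the sample request.

```python
import base64
import json


def build_payload(image_bytes: bytes, fps: int) -> str:
    """Return the JSON request body with the two required parameters."""
    if not isinstance(fps, int) or fps <= 0:
        raise ValueError("fps must be a positive integer")
    if not image_bytes:
        raise ValueError("image_bytes is empty")
    return json.dumps({
        "fps": fps,
        "image": base64.b64encode(image_bytes).decode("utf-8"),
    })


def extract_video(response_body: dict) -> bytes:
    """Decode the base64 video from a parsed JSON response."""
    encoded = response_body.get("predictions")
    if not isinstance(encoded, str):
        # Assumption: a failed request has no string under 'predictions'
        raise ValueError(f"no video in response: {response_body!r}")
    return base64.b64decode(encoded)
```

With these in place, the request becomes `requests.post(url, headers=headers, data=build_payload(image_bytes, 10))`, and `extract_video(response.json())` returns the bytes to write to the `.avi` file.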