# Deploy model endpoint for Stable Video Diffusion XT

Stable Video Diffusion (SVD) is an image-to-video generation model that creates 2–4 second high-resolution (576x1024) videos conditioned on an input image. In this tutorial, we’ll deploy a model endpoint for **Stable Video Diffusion XT**.

<div style={{ display: 'flex', alignItems: 'flex-start', padding: '20px' }}>
  <div style={{ textAlign: 'center', marginTop: '50px' }}>
    <img width="320" height="240" src="/img/car.png" />
    <div>Input Image</div>
  </div>
  <div style={{ textAlign: 'center', marginTop: '50px' }}>
    <video width="320" height="240" controls style={{ objectFit: 'cover' }}>
      <source src="/videos/video_output.mp4" type="video/mp4" />
      Your browser does not support the video tag.
    </video>
    <div>Output Video</div>
  </div>
</div>

The tutorial covers:

* Step-by-step guide to create a model endpoint and generate videos using **Stable Video Diffusion XT**
* Creating a model endpoint with custom model weights
* Supported parameters for video generation

For this tutorial, we’ll use the **Stable Video Diffusion XT** pre-built container. The pre-built container automatically handles the API setup, so no custom API handler is required.

---

## A guide on model endpoint creation and video generation

### Step 1: Create a model endpoint

1. Go to the [AI Platform](https://tir.e2enetworks.com).
2. Select your project.
3. Navigate to **Model Endpoints** → **Create Endpoint**.
4. Choose the **Stable Video Diffusion XT** model card.
5. Select an appropriate GPU plan.
6. Keep the default values for replicas, disk size, and endpoint details unless you need to change them.
7. Add environment variables if needed; otherwise, continue.
8. For now, skip **Model Details** to use the default model weights. (See [Creating model endpoint with custom model weights](#creating-model-endpoint-with-custom-model-weights) to use your own weights.)
9. Complete the endpoint creation.

---

### Step 2: Generate your API token

You need an **Auth Token** to access the endpoint.

1. Go to **API Tokens** under your project.
2. Click **Create Token** or use an existing one.
3. Copy the generated **Auth Token** for use in API requests.
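
To keep the token out of notebook code, one option is to read it from an environment variable. The sketch below assumes a variable named `E2E_TIR_AUTH_TOKEN` (a hypothetical name chosen for illustration, not a platform convention) and builds the same headers that the sample request in Step 3 sends.

```python
import os


def build_headers(env_var="E2E_TIR_AUTH_TOKEN"):
    """Build request headers from a token stored in an environment variable.

    The default variable name is a hypothetical choice for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} to your Auth Token before calling the endpoint."
        )
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
```

The returned dictionary can be passed as `headers=build_headers()` in place of a hard-coded token.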

---

### Step 3: Generate videos using a prompt image

Once your endpoint is **Ready**, test it using the **Sample API Request**.

1. Launch a **Notebook** with **PyTorch** or any suitable image.
2. Open a new notebook (`untitled.ipynb`) in JupyterLab.
3. Paste the sample code from the endpoint or use the example below:

<details>
<summary>Click to expand code</summary>

```python
import base64
import json

import requests

auth_token = "<your-auth-token>"
url = "https://<your-endpoint-url>"
input_image_path = "car.png"
video_output_path = "video_output.avi"
video_fps = 10

# Read the input image and encode it as a base64 string.
with open(input_image_path, "rb") as image_file:
    image_base64 = base64.b64encode(image_file.read()).decode("utf-8")

payload = json.dumps({
    "fps": video_fps,
    "image": image_base64,
})

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {auth_token}",
}

response = requests.post(url, headers=headers, data=payload)
# Stop with a clear HTTP error if the request failed.
response.raise_for_status()

# The generated video is returned base64-encoded under "predictions".
video_bytes = base64.b64decode(response.json()["predictions"])
with open(video_output_path, "wb") as video_file:
    video_file.write(video_bytes)
```

</details>

Execute the script, and your generated video will be saved to the specified output path.

That's it: your **Stable Video Diffusion XT** endpoint is ready.

You can experiment with different input images or adjust the payload parameters to get different results.
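
When experimenting, it helps to check the response before writing a file. The sample request expects the response JSON to carry the base64-encoded video under `predictions`; the sketch below checks for that shape and reports a readable error otherwise. The function name `decode_video` is illustrative, and the error response format of the endpoint is not assumed.

```python
import base64
import binascii


def decode_video(response_json):
    """Return raw video bytes from a parsed endpoint response.

    Assumes the video is a base64 string under "predictions", as in the
    sample request. Any other shape is reported as a ValueError.
    """
    encoded = response_json.get("predictions")
    if not isinstance(encoded, str) or not encoded:
        raise ValueError(f"No video in response; keys: {sorted(response_json)}")
    try:
        return base64.b64decode(encoded)
    except binascii.Error as exc:
        raise ValueError("The 'predictions' field is not valid base64") from exc
```

In the sample script, this would replace the direct `base64.b64decode(...)` call, for example `video_bytes = decode_video(response.json())`.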

---

## Creating model endpoint with custom model weights

To create an endpoint with custom weights for **Stable Video Diffusion XT**:

1. Download the [stable-video-diffusion-img2vid-xt model](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt).
2. Upload it to your **Model Bucket (EOS)**.
3. Deploy a new endpoint using those weights.

### Step 1.1: Define a model in the dashboard

1. Go to the [AI Platform](https://tir.e2enetworks.com).
2. Choose your project.
3. Navigate to **Models** → **Create Model**.
4. Name it (e.g., `stable-video-diffusion`).
5. Select **Model Type: Custom**.
6. Click **CREATE**.
7. Copy the **Setup Host** command from **Setup MinIO CLI**.

---

### Step 1.2: Start a new notebook

1. Launch a **Diffusers Image Notebook** with GPU (recommended).
2. In JupyterLab, open **Terminal**.
3. Run the **Setup Host** command that you copied in Step 1.1.
4. Confirm that the `mc` CLI is configured successfully.
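
As a quick sanity check from a notebook cell, the sketch below tests whether an `mc` executable is on the `PATH` and runs. It only confirms that the binary is present; it does not verify that the host alias from the setup command was saved, and it cannot tell the MinIO client apart from another program that happens to be named `mc`.

```python
import shutil
import subprocess


def mc_is_available():
    """Return True if an `mc` executable is on PATH and exits cleanly."""
    path = shutil.which("mc")
    if path is None:
        return False
    # `--version` only prints version information and exits.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return result.returncode == 0


print("mc found:", mc_is_available())
```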

---

### Step 1.3: Download the Stable Video Diffusion XT model

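Install the required libraries in the notebook, then run the script below. Loading the pipeline with `from_pretrained` downloads the model weights into the local Hugging Face cache (`$HOME/.cache/huggingface/hub`), and the script generates a short test video to confirm that the model works.

```bash
pip install diffusers transformers accelerate
```

<details>
<summary>Click to expand code</summary>

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Download the fp16 weights (cached locally) and build the pipeline.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
# Move submodules to the GPU only while they are in use, to reduce GPU memory usage.
pipe.enable_model_cpu_offload()

# Load the conditioning image and resize it to 1024x576 (width x height).
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png")
image = image.resize((1024, 576))

# Fix the seed so the result is reproducible.
generator = torch.manual_seed(42)
# Decode 8 frames at a time; smaller chunks use less memory.
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

export_to_video(frames, "generated.mp4", fps=7)
```

</details>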

---

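### Step 2: Upload the model to EOS

From the notebook terminal, copy the downloaded snapshot from the Hugging Face cache to your model bucket. The destination in the command below is an example; use the one shown for your own model in **Setup MinIO CLI**.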

<details>
<summary>Click to expand code</summary>

```bash
cd $HOME/.cache/huggingface/hub/models--stabilityai--stable-video-diffusion-img2vid-xt/snapshots
mc cp -r * stable-video-diffusion/stable-video-diffusion-854588
```

</details>

If the upload fails, verify that the model directory exists in the cache:

```bash
ls $HOME/.cache/huggingface/hub
```

---

### Step 3: Create an endpoint for your model

After the upload completes, return to the dashboard and create a new endpoint. In **Model Details**, select the uploaded model. If the model files are not in the root directory of the bucket, provide the path to the directory that contains `model_index.json`.
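
To work out that path, you can list the directories that contain `model_index.json` in the local cache. This is a minimal sketch, assuming the upload preserved the directory layout under `snapshots`, so that the path relative to `snapshots` matches the path inside the bucket. The helper name `find_model_roots` is illustrative.

```python
from pathlib import Path


def find_model_roots(base_dir):
    """Return the directories under base_dir that contain model_index.json."""
    base = Path(base_dir).expanduser()
    return sorted(p.parent for p in base.rglob("model_index.json"))


# Cache location used in the upload step of this tutorial.
snapshots = (
    "~/.cache/huggingface/hub/"
    "models--stabilityai--stable-video-diffusion-img2vid-xt/snapshots"
)
for root in find_model_roots(snapshots):
    # Print the path relative to the snapshots directory.
    print(root.relative_to(Path(snapshots).expanduser()))
```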

---

## Supported parameters for video generation

The request payload accepts the following parameters:

### Required parameters

* **image** (base64): Base64-encoded input image.
* **fps** (int): Frames per second of the output video.
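
The two required fields can be assembled and checked with a small helper. The sketch below also estimates the clip length from `fps`, assuming the endpoint returns the 25 frames that the SVD-XT model card describes; that frame count is a property of the published model, not something this endpoint is confirmed to guarantee. Both function names are illustrative.

```python
import base64
import json

# Frame count stated on the SVD-XT model card; assumed unchanged by the endpoint.
SVD_XT_FRAMES = 25


def build_payload(image_bytes, fps):
    """Return the JSON request body with the required `image` and `fps` fields."""
    if not image_bytes:
        raise ValueError("image_bytes is empty")
    if isinstance(fps, bool) or not isinstance(fps, int) or fps <= 0:
        raise ValueError("fps must be a positive integer")
    return json.dumps({
        "fps": fps,
        "image": base64.b64encode(image_bytes).decode("utf-8"),
    })


def clip_seconds(fps, num_frames=SVD_XT_FRAMES):
    """Estimate the video duration in seconds for a given frame rate."""
    return num_frames / fps
```

Under that assumption, `fps=10` gives a clip of about 2.5 seconds, and lowering `fps` stretches the same frames over a longer clip.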

### Advanced parameters

* **vae**: Variational Auto-Encoder (VAE) used to encode and decode images.
* **image_encoder**: CLIP image encoder.
* **unet**: UNetSpatioTemporalConditionModel for denoising image latents.
* **scheduler**: EulerDiscreteScheduler used with UNet.
* **feature_extractor**: CLIPImageProcessor for feature extraction.

---

## Troubleshooting and best practices

* Ensure your Auth Token and endpoint URL are correct.
* Save the output video with an `.avi` extension, as in the sample request.
* Validate GPU resources before large-scale runs.
* Verify the EOS bucket path if model load fails.
* Check API logs for troubleshooting failed requests.


---