# Custom Containers

Deploy your own custom container image for model inference using an API handler of your choice.

## Overview

Use a fully custom container when you need to:

* Control how inference requests are processed
* Add preprocessing/postprocessing logic
* Serve a model not supported by prebuilt containers

This guide covers:

1. Writing a custom API handler
2. Packaging it into a container
3. Launching it as a Model Endpoint
4. Using uploaded model weights for faster startup

---

## Step 1: Create an API Handler

Use the KServe protocol for the predict API structure.

```python
from typing import Dict, Optional

from kserve import Model, ModelServer


class MyCustomModel(Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.ready = False
        self.load()

    def load(self):
        self.model = ...  # Load from disk or remote
        self.ready = True  # Mark the model as ready to serve

    def predict(self, payload: Dict, headers: Optional[Dict] = None) -> Dict:
        inputs = payload["instances"]
        text = inputs[0]["text"]
        output = ...  # Inference
        return {"predictions": output}


if __name__ == "__main__":
    model = MyCustomModel("custom-model")
    ModelServer().start([model])
```

Save this file as:

```
model_server.py
```

---

## Step 2: Package Into Docker Image

Create a Dockerfile:

```dockerfile
FROM pytorch/torchserve-kfs:0.8.1-gpu

WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

COPY model_server.py ./

CMD ["python", "model_server.py"]
```

Build and push the image:

```bash
docker build -t <your-registry>/custom-kserve-model .
docker push <your-registry>/custom-kserve-model
```

---

## Step 3: Launch Custom Container Endpoint

1. Create a new Model Endpoint
2. Select **Custom Container**
3. Choose a compute plan (GPU recommended)
4. Enter the **Container Image URL**
5. Add environment variables if required
6. Launch and wait until the status is **Running**

When the endpoint is ready, call it using the standard REST API.
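
A call to the running endpoint can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: `build_predict_request` is a hypothetical helper (not part of any SDK), the endpoint URL and token are placeholders, and the route follows the KServe v1 predict path for the `custom-model` name used in Step 1.

```python
import json
import urllib.request


def build_predict_request(
    endpoint_url: str, model_name: str, token: str, text: str
) -> urllib.request.Request:
    """Build (but do not send) a KServe v1 predict request.

    Hypothetical helper for illustration only.
    """
    # KServe v1 predict route: /v1/models/<model-name>:predict
    url = f"{endpoint_url.rstrip('/')}/v1/models/{model_name}:predict"
    # The handler in Step 1 reads payload["instances"][0]["text"]
    body = json.dumps({"instances": [{"text": text}]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; replace with your own endpoint URL and token.
    req = build_predict_request(
        "https://your-endpoint-url", "custom-model", "YOUR_AUTH_TOKEN", "Hello!"
    )
    print(req.get_method(), req.full_url)
    # To actually send the request:
    # with urllib.request.urlopen(req, timeout=30) as resp:
    #     print(json.load(resp))
```

Building the request separately from sending it makes it easy to inspect the URL and payload before any network call is made.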

---

## Step 4 (Optional): Faster Startup Using Uploaded Weights

Upload model weights via Object Storage:

```bash
mc cp -r custom-model/
```

Modify `load()` to read the weights from `/mnt/models` instead of downloading them from remote hosting.

---

## Example: API Request

```bash
curl -X POST https://your-endpoint-url/v1/models/custom-model:predict \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"instances": [{"text": "Hello!"}]}'
```

---

## Notes

* Keep the image small to reduce startup time
* GPU instances are recommended for large transformer models
* Always check the endpoint logs to confirm readiness

---

Your custom container model server is now operational and ready for API integration!

## Deploy from GitHub Repository

AI Cloud now supports deploying model endpoints directly from GitHub repositories with custom frameworks. This feature enables you to:

* Deploy your own custom models directly from GitHub repositories
* Automatically restart inference when pull requests are merged

---

### Prerequisites

Before deploying from a GitHub repository, you need to set up GitHub integration:

1. Navigate to **Labs Experimental → External Integrations**
2. Create a GitHub integration with your Personal Access Token
3. For detailed instructions, see [External Integrations documentation](/docs/tir/external_integrations/intro)

---

### Step 1: Create Model Endpoint from GitHub Repository

Follow these steps to deploy a model endpoint directly from your GitHub repository:

1. **Navigate to Model Endpoints**

   Go to **Inference → Model Endpoints** from the sidebar.

2. **Initiate Endpoint Creation**

   Click **CREATE ENDPOINT** to begin the configuration process.

3. **Select Framework**

   Choose **Custom Framework** from the available options.

4. **Configure GitHub Integration**

   Set the download type to **Link with Github** to enable repository-based deployment.

   ![Github Integration](./../images/GithubIntegration.png)

5. **Specify Repository Details**

   Enter the complete GitHub repository URL, including the branch:

   ```
   https://github.com/<org>/<repo>/tree/<branch>
   ```

   Example: `https://github.com/myorg/ml-inference/tree/main`

6. **Select GitHub Integration**

   Choose your pre-configured GitHub integration from the dropdown menu.

7. **Select Image**

   Specify whether your container image is **Public** or **Private**.

8. **Define Startup Command**

   Configure the shell command to execute on container startup. This typically includes installing dependencies and launching your application:

   ```json
   ["sh", "-c", "pip3 install -r /mnt/models/<repo-name>/requirements.txt && python3 /mnt/models/<repo-name>/app.py"]
   ```

   **Example:**

   ```json
   ["sh", "-c", "pip3 install -r /mnt/models/flask-app-inference/requirements.txt && python3 /mnt/models/flask-app-inference/app.py"]
   ```

9. **Configure Resources and Scaling**

   Select appropriate compute resources (CPU/GPU) and configure scaling settings based on your workload requirements.

10. **Deploy Endpoint**

    Click **CREATE** and monitor the deployment status until the endpoint reaches the **Running** state.

---

### Step 2: Set Up Auto-Restart on Pull Request Merge

You can configure automatic inference restarts when pull requests are merged into your target branch. This keeps your model endpoint running the latest code.

#### Add GitHub Workflow

Create a workflow file in your repository:

```
.github/workflows/deploy.yaml
```

#### Workflow Configuration

```yaml
name: Auto-restart E2E Inference on PR Merge

on:
  pull_request:
    types: [closed]
    branches:
      - <branch-name>  # Replace with your target branch (e.g., main, master, develop)

jobs:
  restart-inference:
    # Run only when the pull request was merged, not merely closed
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - name: Restart inference via E2E API
        env:
          API_KEY: ${{ secrets.E2E_API_KEY }}
          AUTH_TOKEN: ${{ secrets.E2E_AUTH_TOKEN }}
        run: |
          echo "Restarting inference after merge..."
          URL="https://api.e2enetworks.com/myaccount/api/v1/gpu/github-inferences/restart/?apikey=${API_KEY}"
          curl --fail --show-error -X PUT "$URL" \
            -H "Authorization: Bearer ${AUTH_TOKEN}" \
            -H "Content-Type: application/json" \
            -d '{
              "inference_uuids": [
                "uuid-1",
                "uuid-2",
                "uuid-3"
              ]
            }'
```

#### Configuration Steps

1. **Add GitHub Secrets:**
   - Go to your GitHub repository → **Settings → Secrets and variables → Actions**
   - Add `E2E_API_KEY` with your E2E Cloud API key
   - Add `E2E_AUTH_TOKEN` with your E2E Cloud authentication token

2. **Update Inference UUIDs:**
   - Replace `"uuid-1"`, `"uuid-2"`, `"uuid-3"` with your actual model endpoint UUIDs
   - Find UUIDs in the E2E Cloud dashboard under **Inference → Model Endpoints**

3. **Commit and Push:**

   ```bash
   git add .github/workflows/deploy.yaml
   git commit -m "Add auto-restart workflow"
   git push origin main
   ```

---

### How It Works

1. When a pull request is merged into the target branch (for example, `main`), the GitHub Action triggers
2. The workflow calls the E2E Cloud API to restart the specified inference endpoints
3. The model endpoints automatically pull the latest code from your repository
4. Your inference service restarts with the updated code

---

### Benefits

* **Continuous Deployment:** Automatically deploy code changes without manual intervention
* **Version Control:** Track all changes through Git commits and pull requests
* **Team Collaboration:** Multiple developers can contribute with automated deployments
* **Custom Frameworks:** Use any framework or custom code your project requires

---
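
The restart call made by the workflow can also be sketched in Python, which may help when testing the request outside GitHub Actions. This is a rough sketch, not an official client: `build_restart_request` is a hypothetical helper, and the URL, HTTP method, headers, and body simply mirror the `curl` command in the workflow above. The check for an empty UUID list is an added client-side assumption, not documented API behavior.

```python
import json
import urllib.parse
import urllib.request

# Restart URL copied from the workflow's curl command
RESTART_URL = (
    "https://api.e2enetworks.com/myaccount/api/v1/gpu/github-inferences/restart/"
)


def build_restart_request(api_key, auth_token, inference_uuids):
    """Build (but do not send) the PUT request that restarts endpoints.

    Hypothetical helper for illustration only.
    """
    # Assumption: fail early on the client rather than send an empty list
    if not inference_uuids:
        raise ValueError("at least one inference UUID is required")
    query = urllib.parse.urlencode({"apikey": api_key})
    body = json.dumps({"inference_uuids": list(inference_uuids)}).encode("utf-8")
    return urllib.request.Request(
        f"{RESTART_URL}?{query}",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder credentials and UUIDs; replace with your own values.
    req = build_restart_request("YOUR_API_KEY", "YOUR_AUTH_TOKEN", ["uuid-1"])
    print(req.get_method(), req.full_url)
    # To actually send the request:
    # with urllib.request.urlopen(req, timeout=30) as resp:
    #     print(resp.status)
```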