Inference
Inference is how you use a trained AI model - you send it an input, and it returns a prediction or response. On E2E AI Cloud, inference lets you deploy models as live API endpoints that your applications can call.
You can serve models using popular frameworks like vLLM or SGLang, or bring your own container. Models can be sourced from Hugging Face or your own repository. Endpoints are OpenAI-compatible, so they work with tools you already use.
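Because the endpoints follow the OpenAI API shape, a client request is just an HTTP POST to a chat-completions route. The sketch below builds such a request with the Python standard library; the endpoint URL, token, and model name are placeholders, not real values from E2E AI Cloud.

```python
import json
import urllib.request

# Hypothetical values -- replace with your deployment's endpoint URL,
# API token, and the model name you deployed.
ENDPOINT = "https://your-endpoint.example.com/v1/chat/completions"
API_TOKEN = "YOUR_API_TOKEN"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
        method="POST",
    )

# Sending the request (uncomment once ENDPOINT and API_TOKEN are set):
# with urllib.request.urlopen(build_request("my-model", "Hello!")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request shape is the standard OpenAI one, you can also point existing OpenAI SDK clients at your endpoint by overriding their base URL.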
E2E AI Cloud handles the infrastructure - including automatic scaling and scale-to-zero for serverless deployments - so you can focus on your model, not the servers.
Model Repository
Model Endpoints
Inference Tutorials