TorchServe
TorchServe allows you to serve PyTorch models with REST and gRPC APIs. It provides a built-in web server that can serve one or multiple models, with configurable ports, hosts, and logging options.
Our platform simplifies deploying PyTorch models. You can upload your TorchServe model archive to E2E Object Storage (EOS), and E2E will automatically create and manage the deployment — handling container creation, model download, and web server launch with monitoring and scaling features through the dashboard.
Key features of TorchServe
- Automated deployments from EOS
- Automatic restart on failures
- Managed TLS certificates
- Token-based authentication
- Manual or auto scaling
- Optional persistent disks (for faster restarts)
- REST (HTTP) and gRPC support
- Health checks (readiness and liveness)
Quick start
This section focuses on deploying and serving TorchServe models.
Step 1: Install dependencies
Our platform uses MinIO CLI (mc) to upload model archives to EOS.
If you’re using an E2E Instance, skip this step; mc is pre-installed.
macOS installation
brew install minio/stable/mc
Step 2: Create directories for model and config
mkdir -p mnist/model-store mnist/config
cd mnist/model-store
Step 3: Download a trained model archive
wget https://objectstore.e2enetworks.net/iris/mnist/model-store/mnist.mar
Step 4: Download the TorchServe config file
cd ../config
wget https://objectstore.e2enetworks.net/iris/mnist/config/config.properties
Step 5: Create a model
- Go to the AI Platform.
- Navigate to Model Repository → Create Model.
- Name it `my-mnist` and select New E2E Object Store Bucket.
- Copy and run the `mc alias` command provided:
mc config host add my-mnist https://objectstore.e2enetworks.net <access-key> <secret-key>
mc ls my-mnist/
Step 6: Upload model and config to EOS
cd ..
mc cp -r * my-mnist/<your-bucket-name>
Step 7: Create an inference service
- Go to Deployments → Create Deployment.
- Choose framework TorchServe and select the `my-mnist` model.
- Use the Sample API request in the dashboard to test your service.
Developer workflow
Typical TorchServe workflow:
- Train and save the model (`.pt`).
- Optionally, write a custom handler.
- Create a model archive (`.mar`) using `torch-model-archiver`.
- Create a `config.properties` file.
- Run TorchServe (automated on our platform).
For examples, visit the TorchServe MNIST demo.
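The optional custom handler from the workflow above can be a module with a top-level `handle(data, context)` function, which TorchServe invokes with a batch of request rows. A minimal sketch (the inference step is a placeholder; a real handler would run the loaded PyTorch model):

```python
import base64


def preprocess(row):
    """Decode the base64-encoded image bytes from one request row."""
    payload = row.get("data") or row.get("body")
    if isinstance(payload, dict):  # JSON body wrapping {"data": ...}
        payload = payload.get("data")
    if isinstance(payload, (bytes, bytearray)):
        return bytes(payload)  # already raw bytes
    return base64.b64decode(payload)


def infer(image_bytes):
    # Placeholder inference: a real handler would run the PyTorch model here.
    return {"num_bytes": len(image_bytes)}


def handle(data, context):
    # TorchServe calls handle() with a list of request rows (one batch).
    if data is None:
        return None
    return [infer(preprocess(row)) for row in data]
```

Point `torch-model-archiver --handler` at this file instead of a pre-built handler name.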
Creating a model archive
Use the Torch Model Archiver utility to generate .mar files.
torch-model-archiver --model-name mnist --version 1.0 \
--model-file mnist.py --serialized-file mnist.pt --handler image_classifier \
--export-path model-store
Key parameters
- `--model-name`: Name for your model (used in the API endpoint).
- `--version`: Optional version tag.
- `--model-file`: Model definition file (e.g., `mnist.py`).
- `--serialized-file`: Model weights file (e.g., `mnist.pt`).
- `--handler`: Pre-built or custom handler.
See available handlers.
TorchServe config file
A sample config.properties file:
# Emit metrics in Prometheus text format
metrics_format=prometheus
# Number of frontend Netty worker threads
number_of_netty_threads=4
# Maximum number of requests queued per model
job_queue_size=10
# Allow configuration via environment variables
enable_envvars_config=true
# Install each model's Python dependencies on load
install_py_dep_per_model=true
Package and upload model updates
- Create a model structure:
mkdir -p my-model/config my-model/model-store
cp config.properties my-model/config/
- Upload to EOS:
mc cp -r my-model my-mnist/<bucket-name>
Connecting to the service endpoint
E2E secures TorchServe endpoints with authentication tokens.
Check endpoint status
curl -v -H "Authorization: Bearer $AUTH_TOKEN" -X GET \
https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist
Send prediction requests
curl -v -H "Authorization: Bearer $AUTH_TOKEN" -X POST \
https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist:predict \
-d '{"instances": [{"data": "<base64-image>", "target": 0}]}'
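The same calls can be made from Python. A sketch, assuming the `requests` library is installed and the `<project>` and `<inference>` placeholders are filled in with your own values:

```python
import base64
import os

# Placeholder endpoint: replace <project> and <inference> with real values.
BASE_URL = ("https://infer.e2enetworks.net/project/<project>"
            "/endpoint/<inference>/v1/models/mnist")


def predict_request(image_bytes, token):
    """Build the URL, headers, and JSON body for a :predict call."""
    headers = {"Authorization": f"Bearer {token}"}
    body = {"instances": [
        {"data": base64.b64encode(image_bytes).decode("utf-8")},
    ]}
    return BASE_URL + ":predict", headers, body


if __name__ == "__main__":
    import requests  # pip install requests

    with open("sample.png", "rb") as f:
        url, headers, body = predict_request(f.read(), os.environ["AUTH_TOKEN"])
    print(requests.post(url, headers=headers, json=body).json())
```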
Batch prediction example
import base64
import pathlib

from tir_inference import endpoint

# Encode every image in the directory and send them in a single batch.
data_dir = pathlib.Path("mnist-images-directory")
instances = []
for path in sorted(data_dir.glob("*.png")):
    with open(path, "rb") as f:
        instances.append({"data": base64.b64encode(f.read()).decode("utf-8")})
response = endpoint.predict(instances=instances)
Monitoring
The platform provides real-time logs and metrics for all TorchServe deployments.
Logs
View logs under Deployments → Logs.
Metrics
To enable Prometheus metrics:
metrics_format=prometheus
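With this setting, metrics are emitted in the Prometheus text exposition format. A small helper to turn scraped metrics text into name/value pairs (a sketch: it assumes series lines without trailing timestamps; the sample metric name in the usage below is illustrative):

```python
def parse_prometheus_text(text):
    """Parse Prometheus text-format metrics into a {series: value} dict."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        # Each remaining line is "name{labels} value" (no timestamp assumed).
        series, _, value = line.rpartition(" ")
        metrics[series] = float(value)
    return metrics
```

For example, `parse_prometheus_text('ts_inference_requests_total{model="mnist"} 5.0')` yields a dict mapping that series to `5.0`.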
Advanced use cases
Custom containers
The platform supports extending its built-in containers. Learn more in Custom Inference Containers.
Large models and multi-GPU setups
E2E fully supports TorchServe’s multi-GPU and large model capabilities. Refer to TorchServe Large Model Inference.
Examples
For additional examples, visit the official TorchServe repository.