
TorchServe

TorchServe allows you to serve PyTorch models with REST and gRPC APIs. It provides a built-in web server that can serve one or multiple models, with configurable ports, hosts, and logging options.

Our platform simplifies deploying PyTorch models. You can upload your TorchServe model archive to E2E Object Storage (EOS), and E2E will automatically create and manage the deployment — handling container creation, model download, and web server launch with monitoring and scaling features through the dashboard.


Key features of TorchServe

  • Automated deployments from EOS
  • Automatic restart on failures
  • Managed TLS certificates
  • Token-based authentication
  • Manual or auto scaling
  • Optional persistent disks (for faster restarts)
  • REST (HTTP) and gRPC support
  • Health checks (readiness and liveness)

Quick start

This section focuses on deploying and serving TorchServe models.

Step 1: Install dependencies

Our platform uses the MinIO Client (mc) to upload model archives to EOS.

If you’re using an E2E Instance, skip this step — mc comes pre-installed.

macOS installation

brew install minio/stable/mc

Step 2: Create directories for model and config

mkdir -p mnist/model-store mnist/config
cd mnist/model-store

Step 3: Download a trained model archive

wget https://objectstore.e2enetworks.net/iris/mnist/model-store/mnist.mar

Step 4: Download the TorchServe config file

cd ../config
wget https://objectstore.e2enetworks.net/iris/mnist/config/config.properties

Step 5: Create a model

  1. Go to the AI Platform.
  2. Navigate to Model Repository → Create Model.
  3. Name it my-mnist and select New E2E Object Store Bucket.
  4. Copy and run the mc alias command provided:
mc config host add my-mnist https://objectstore.e2enetworks.net <access-key> <secret-key>
mc ls my-mnist/

Step 6: Upload model and config to EOS

cd ..
mc cp -r * my-mnist/<your-bucket-name>

Step 7: Create an inference service

  1. Go to Deployments → Create Deployment.
  2. Choose framework TorchServe and select the my-mnist model.
  3. Use the Sample API request in the dashboard to test your service.

Developer workflow

Typical TorchServe workflow:

  1. Train and save the model (.pt).
  2. Optionally, write a custom handler.
  3. Create a model archive (.mar) using torch-model-archiver.
  4. Create a config.properties file.
  5. Run TorchServe (automated on our platform).

For examples, visit the TorchServe MNIST demo.
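Step 2 of the workflow (a custom handler) follows a fixed contract: TorchServe calls initialize once at model load, then preprocess → inference → postprocess for each request batch. The sketch below mirrors that contract in plain Python so the flow is visible without TorchServe installed; a real handler would subclass ts.torch_handler.base_handler.BaseHandler, and the class name and dummy model here are illustrative only.

```python
class MnistHandlerSketch:
    """Illustrative stand-in for a TorchServe custom handler."""

    def initialize(self, context):
        # Real handlers load the serialized model from the context's
        # model directory; a dummy callable stands in for the network here.
        self.model = lambda batch: [sum(x) % 10 for x in batch]

    def preprocess(self, requests):
        # Decode each request body into a model input
        # (e.g. a tensor built from image bytes).
        return [req["data"] for req in requests]

    def inference(self, inputs):
        return self.model(inputs)

    def postprocess(self, outputs):
        # TorchServe expects one response element per request in the batch.
        return [{"prediction": o} for o in outputs]

    def handle(self, requests, context):
        # Called once per batch, after initialize() has run at load time.
        return self.postprocess(self.inference(self.preprocess(requests)))
```

The same four-method shape applies whether the handler classifies images or serves text; only preprocess and postprocess usually need custom code.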


Creating a model archive

Use the Torch Model Archiver utility to generate .mar files.

torch-model-archiver --model-name mnist --version 1.0 \
--model-file mnist.py --serialized-file mnist.pt --handler image_classifier \
--export-path model-store

Key parameters

  • --model-name: Name for your model (used in API endpoint).
  • --version: Optional version tag.
  • --model-file: Model definition file (e.g., mnist.py).
  • --serialized-file: Model weights file (e.g., mnist.pt).
  • --handler: Pre-built or custom handler.

See available handlers.


TorchServe config file

A sample config.properties file:

# expose metrics in Prometheus format
metrics_format=prometheus
# number of frontend (Netty) worker threads
number_of_netty_threads=4
# maximum queued inference jobs per model
job_queue_size=10
# allow overriding configuration via environment variables
enable_envvars_config=true
# install each model's Python dependencies (requirements.txt inside the .mar)
install_py_dep_per_model=true

Package and upload model updates

  1. Create a model structure:
mkdir -p my-model/config my-model/model-store
cp config.properties my-model/config/
  2. Upload to EOS:
mc cp -r my-model my-mnist/<bucket-name>
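The two steps above can be sanity-checked locally before uploading. The snippet below recreates the expected layout with placeholder files (the real config.properties and .mar archive would go in their place):

```shell
# Recreate the expected bucket layout locally; file contents are placeholders.
mkdir -p my-model/config my-model/model-store
printf 'metrics_format=prometheus\n' > my-model/config/config.properties
: > my-model/model-store/mnist.mar   # stand-in for the real archive
find my-model -type f | sort
```

The listing should show exactly one config file under config/ and your .mar file(s) under model-store/; any other layout will not be picked up by the deployment.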

Connecting to the service endpoint

E2E secures TorchServe endpoints with authentication tokens.

Check endpoint status

curl -v -H "Authorization: Bearer $AUTH_TOKEN" -X GET \
https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist

Send prediction requests

curl -v -H "Authorization: Bearer $AUTH_TOKEN" -X POST \
https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist:predict \
-d '{"instances": [{"data": "<base64-image>", "target": 0}]}'
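The same request can be assembled programmatically. The helper below is a hypothetical sketch (function name and parameters are not part of the platform API) that builds the URL, headers, and JSON body matching the curl call above; project, inference, and token are placeholders you must fill in:

```python
import base64
import json

def build_predict_request(image_bytes, token, project, inference, model="mnist"):
    """Assemble URL, headers, and body for a :predict call (illustrative helper)."""
    url = (f"https://infer.e2enetworks.net/project/{project}"
           f"/endpoint/{inference}/v1/models/{model}:predict")
    headers = {"Authorization": f"Bearer {token}",
               "Content-Type": "application/json"}
    body = json.dumps({"instances": [
        {"data": base64.b64encode(image_bytes).decode("utf-8")}
    ]})
    return url, headers, body
```

The returned tuple can be fed straight into urllib.request.Request(url, body.encode(), headers) or any other HTTP client.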

Batch prediction example

import pathlib, base64
from tir_inference import endpoint

data_dir = pathlib.Path("mnist-images-directory")
files = list(data_dir.glob("*.png"))

# Encode every image so the whole directory is sent as one batch.
instances = []
for path in files:
    with open(path, "rb") as f:
        instances.append({"data": base64.b64encode(f.read()).decode("utf-8")})

response = endpoint.predict(instances=instances)

Monitoring

The platform provides real-time logs and metrics for all TorchServe deployments.

Logs

View logs under Deployments → Logs.

Metrics

To enable Prometheus metrics, set the following in config.properties:

metrics_format=prometheus

Advanced use cases

Custom containers

The platform supports extending the built-in inference containers. Learn more in Custom Inference Containers.

Large models and multi-GPU setups

E2E fully supports TorchServe’s multi-GPU and large model capabilities. Refer to TorchServe Large Model Inference.


Examples

For additional examples, visit the official TorchServe repository.