# TorchServe

TorchServe allows you to serve PyTorch models with REST and gRPC APIs. It provides a built-in web server that can serve one or multiple models, with configurable ports, hosts, and logging options.

Our platform simplifies deploying PyTorch models. You can upload your TorchServe model archive to **E2E Object Storage (EOS)**, and E2E will automatically create and manage the deployment — handling container creation, model download, and web server launch with monitoring and scaling features through the dashboard.

---

## Key features of TorchServe

* Automated deployments from EOS
* Automatic restart on failures
* Managed TLS certificates
* Token-based authentication
* Manual or auto scaling
* Optional persistent disks (for faster restarts)
* REST (HTTP) and gRPC support
* Health checks (readiness and liveness)

---

## Quick start

This section focuses on deploying and serving TorchServe models.

### Step 1: Install dependencies

Our platform uses **MinIO CLI (mc)** to upload model archives to EOS.

If you are working from an Instance on the platform, skip this step; `mc` is pre-installed there.

#### macOS installation

```bash
brew install minio/stable/mc
```

---

### Step 2: Create directories for model and config

```bash
mkdir -p mnist/model-store mnist/config
cd mnist/model-store
```

---

### Step 3: Download a trained model archive

```bash
wget https://objectstore.e2enetworks.net/iris/mnist/model-store/mnist.mar
```

---

### Step 4: Download the TorchServe config file

```bash
cd ../config
wget https://objectstore.e2enetworks.net/iris/mnist/config/config.properties
```

---

### Step 5: Create a model

1. Go to the [AI Platform](https://tir.e2enetworks.com).
2. Navigate to **Model Repository** → **Create Model**.
3. Name it `my-mnist` and select **New E2E Object Store Bucket**.
4. Copy and run the `mc alias` command provided. Recent `mc` releases use `mc alias set`, which replaces the older `mc config host add` form:

```bash
mc alias set my-mnist https://objectstore.e2enetworks.net <access-key> <secret-key>
```

```bash
mc ls my-mnist/
```

---

### Step 6: Upload model and config to EOS

```bash
cd ..
mc cp -r * my-mnist/<your-bucket-name>
```
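Before uploading, it can help to confirm that the local directory matches the layout built in Steps 2 to 4. The following is a minimal sketch, not part of the platform tooling: `check_layout` is a hypothetical helper, and it assumes the only requirements are at least one `.mar` file under `model-store/` and a `config.properties` file under `config/`.

```python
from pathlib import Path


def check_layout(root):
    """Return a list of problems found in a local TorchServe model directory.

    Assumed layout (taken from the quick-start steps):
        <root>/model-store/*.mar
        <root>/config/config.properties
    """
    root = Path(root)
    problems = []

    model_store = root / "model-store"
    if not model_store.is_dir():
        problems.append("missing directory: model-store")
    elif not any(model_store.glob("*.mar")):
        problems.append("no .mar file in model-store")

    config_file = root / "config" / "config.properties"
    if not config_file.is_file():
        problems.append("missing file: config/config.properties")

    return problems


if __name__ == "__main__":
    # Run from inside the mnist directory, the same place the upload runs from.
    for problem in check_layout("."):
        print(problem)
```

An empty result means the directory has the expected shape; it does not verify that the archive itself is valid.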

---

### Step 7: Create an inference service

1. Go to **Deployments** → **Create Deployment**.
2. Choose framework **TorchServe** and select the `my-mnist` model.
3. Use the **Sample API request** in the dashboard to test your service.

---

## Developer workflow

Typical TorchServe workflow:

1. Train and save the model (`.pt`).
2. Optionally, write a custom handler.
3. Create a model archive (`.mar`) using `torch-model-archiver`.
4. Create a `config.properties` file.
5. Run TorchServe (the platform automates this step).

For examples, visit the [TorchServe MNIST demo](https://github.com/pytorch/serve/tree/master/examples/image_classifier/mnist).

---

## Creating a model archive

Use the **Torch Model Archiver** utility to generate `.mar` files.

```bash
torch-model-archiver --model-name mnist --version 1.0 \
--model-file mnist.py --serialized-file mnist.pt --handler image_classifier \
--export-path model-store
```

### Key parameters

* **--model-name**: Name for your model (used in API endpoint).
* **--version**: Optional version tag.
* **--model-file**: Model definition file (e.g., `mnist.py`).
* **--serialized-file**: Model weights file (e.g., `mnist.pt`).
* **--handler**: Pre-built or custom handler.
* **--export-path**: Directory where the generated `.mar` file is written.

See [available handlers](https://github.com/pytorch/serve/tree/master/ts/torch_handler).
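When archiving is scripted, for example in a build step, the same command can be assembled in Python. This is a sketch under stated assumptions: `build_archiver_command` is a hypothetical helper, it uses only the flags shown in the command above, and running the result assumes `torch-model-archiver` is installed and on `PATH`.

```python
import shlex


def build_archiver_command(model_name, version, model_file, serialized_file,
                           handler, export_path):
    """Assemble the torch-model-archiver argument list used in this section."""
    return [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--model-file", model_file,
        "--serialized-file", serialized_file,
        "--handler", handler,
        "--export-path", export_path,
    ]


if __name__ == "__main__":
    cmd = build_archiver_command(
        "mnist", "1.0", "mnist.py", "mnist.pt", "image_classifier", "model-store"
    )
    # Print the shell-quoted command so it can be reviewed before running.
    print(shlex.join(cmd))
```

To execute the command rather than print it, pass the list to `subprocess.run(cmd, check=True)`; passing a list avoids shell-quoting problems with file names that contain spaces.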

---

## TorchServe config file

A sample `config.properties` file:

```properties
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
enable_envvars_config=true
install_py_dep_per_model=true
```
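To inspect or sanity-check such a file before uploading it, the key-value pairs can be read with a few lines of Python. This is a minimal sketch, assuming simple `key=value` lines with `#` comments; the full Java properties format (line continuations, `:` separators, escapes) is not handled, and `parse_properties` is a hypothetical helper name.

```python
def parse_properties(text):
    """Parse simple key=value lines into a dict, skipping blanks and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw_line!r}")
        settings[key.strip()] = value.strip()
    return settings


# The sample file from this section, inlined so the sketch is self-contained.
SAMPLE = """\
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
enable_envvars_config=true
install_py_dep_per_model=true
"""

if __name__ == "__main__":
    config = parse_properties(SAMPLE)
    # Values are strings; convert explicitly where a number is expected.
    print(int(config["job_queue_size"]))
```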

---

## Package and upload model updates

1. Create a model structure:

```bash
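mkdir -p my-model/config my-model/model-store
cp config.properties my-model/config/
# Assumes the archive was exported to model-store/ by torch-model-archiver.
cp model-store/mnist.mar my-model/model-store/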
```

2. Upload to EOS:

```bash
mc cp -r my-model my-mnist/<bucket-name>
```

---

## Connecting to the service endpoint

E2E secures TorchServe endpoints with authentication tokens.

### Check endpoint status

```bash
curl -v -H "Authorization: Bearer $AUTH_TOKEN" -X GET \
https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist
```

### Send prediction requests

```bash
curl -v -H "Authorization: Bearer $AUTH_TOKEN" -X POST \
https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist:predict \
-d '{"instances": [{"data": "<base64-image>", "target": 0}]}'
```
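The same prediction request can be prepared in Python with only the standard library. The sketch below follows the `curl` call: it base64-encodes the image bytes, wraps them in the `instances` payload, and attaches the bearer token. `build_predict_request` is a hypothetical helper, and the JSON `Content-Type` header is an assumption about what the endpoint expects rather than something stated in this guide.

```python
import base64
import json
import urllib.request


def build_predict_request(url, auth_token, image_bytes, target=0):
    """Build (but do not send) a POST request shaped like the curl example."""
    payload = {
        "instances": [
            {"data": base64.b64encode(image_bytes).decode("utf-8"), "target": target}
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {auth_token}",
            # Assumption: the endpoint accepts a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen`, which raises `urllib.error.HTTPError` on a 4xx or 5xx response. Replace the `<project>` and `<inference>` placeholders in the URL with your own values first.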

---

### Batch prediction example

```python
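import base64
import pathlib

from tir_inference import endpoint

# Directory containing the MNIST PNG images to classify.
data_dir = pathlib.Path("mnist-images-directory")
files = sorted(data_dir.glob("*.png"))

# Base64-encode every image so that all of them go out in a single request.
instances = [
    {"data": base64.b64encode(path.read_bytes()).decode("utf-8")}
    for path in files
]

response = endpoint.predict(instances=instances)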
```

---

## Monitoring

The platform provides real-time logs and metrics for all TorchServe deployments.

### Logs

View logs under **Deployments → Logs**.

### Metrics

To enable Prometheus metrics:

```properties
metrics_format=prometheus
```
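With this setting, metrics use the Prometheus text exposition format, in which each sample line has the form `name{labels} value`. As a rough sketch of reading such output, the hypothetical helper `parse_metric_samples` below extracts sample values from a block of text. The sample lines and metric names are illustrative placeholders, not captured TorchServe output, and the parser ignores labels and optional timestamps.

```python
def parse_metric_samples(text):
    """Map each metric name to the list of sample values found in the text.

    Labels and optional timestamps are ignored; comment lines start with '#'.
    """
    samples = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # name{label="..."} value [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            # name value [timestamp]
            name, _, rest = line.partition(" ")
        value = float(rest.split()[0])
        samples.setdefault(name, []).append(value)
    return samples


# Illustrative input only; real metric names depend on the deployment.
SAMPLE = """\
# TYPE example_requests_total counter
example_requests_total{model_name="mnist"} 5.0
example_latency_seconds 0.25
"""

if __name__ == "__main__":
    for name, values in parse_metric_samples(SAMPLE).items():
        print(name, values)
```

For anything beyond a quick check, a maintained Prometheus client library is a safer choice than a hand-written parser.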

---

## Advanced use cases

### Custom containers

The platform supports extending the built-in containers. Learn more in [Custom Inference Containers](custom_inference).

### Large models and multi-GPU setups

E2E fully supports TorchServe’s multi-GPU and large model capabilities. Refer to [TorchServe Large Model Inference](https://pytorch.org/serve/large_model_inference.html).

---

## Examples

For additional examples, visit the [official TorchServe repository](https://github.com/pytorch/serve/tree/master/examples).


---