TorchServe

TorchServe takes a PyTorch deep learning model and wraps it in a set of REST APIs. It comes with a built-in web server that you run from the command line. The server takes command-line arguments specifying the model or models you want to serve, along with optional parameters controlling port, host, and logging.

TIR makes deploying PyTorch models as easy as pushing code. You can upload your TorchServe model archive—more on this later—to E2E Object Storage (EOS), and that’s it. TIR can automatically launch containers, download the model from the EOS bucket to a local directory on the container, and start the TorchServe web server. What you get is not only automated containerized deployment but also monitoring and maintenance features in the TIR dashboard.

Features of TorchServe on TIR

  • Automated Deployments from E2E Object Storage (EOS bucket)
  • Automatic restart on failures
  • E2E Managed TLS certificates
  • Token-based Authentication
  • Manual or Automated Scaling
  • Optional Persistent Disks (to reduce boot time when the model downloads on restarts)
  • REST (HTTP) and gRPC
  • Readiness and Liveness Checks

Quick Start

This section focuses on serving (deploying) model files, without much discussion of the model files themselves: where they come from and how they are made. We will learn more about model development in later sections.

  1. Install Dependencies:

    TIR deployments require the MinIO CLI (mc) to upload the model archive to E2E Object Store.

    Follow the instructions below to install the MinIO CLI (mc) on your local machine. If you are using TIR Notebooks, you can skip this step, as they come pre-installed with mc (MinIO CLI).

Installing MinIO CLI

macOS

To install MinIO CLI on macOS, run the following command:

brew install minio/stable/mc
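
Linux

If you are on Linux instead, a typical installation looks like the sketch below (assuming an amd64 machine and a standard PATH location):

# Download the mc binary, make it executable, and move it onto the PATH
curl https://dl.min.io/client/mc/release/linux-amd64/mc -o mc
chmod +x mc
sudo mv mc /usr/local/bin/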
  2. Create a Directory to Store the Model Weights

    Create a model directory and subdirectories:

    # Make a model directory and subdirectories
    mkdir mnist && mkdir ./mnist/model-store && mkdir ./mnist/config

    # Move into the model-store directory
    cd mnist/model-store

Download a Trained Model (Torch Archive)

Use the following command to download the trained model:

wget https://objectstore.e2enetworks.net/iris/mnist/model-store/mnist.mar

Download a TorchServe Config File

Navigate to the config directory and download the config file using the following command:

# Go to config directory
cd ../config && wget https://objectstore.e2enetworks.net/iris/mnist/config/config.properties

Create a TIR Model

  1. Go to the TIR AI Platform.
  2. Navigate to Model Repository in the Inference section.
  3. Create a model with the name my-mnist. If prompted, select New E2E Object Store Bucket as the model type.
  4. Once the model is created, copy the mc alias command for the model.
  5. Run the mc alias command in your command line.
    mc config host add my-mnist https://objectstore.e2enetworks.net <access-key> <secret-key>
Note

You can also note down the bucket name (auto-created by TIR) using the following command, as we will need it in the next section:

     mc ls my-mnist/

Upload the Model and Config to TIR Model

Run the following commands from your command line to upload the model-store and config directories to the TIR model bucket:

# Return to model directory (top)
cd ..

# Run this command to get the bucket name
mc ls my-mnist/

# Copy contents to model bucket in E2E Object Store
mc cp -r * my-mnist/<enter-bucket-name-here>

# If all goes well, you should see the directories in the model bucket
mc ls my-mnist/<enter-bucket-name-here>

Create an Inference Service

We have our model weights and config in E2E Object Storage now. Follow these steps to create an inference service in TIR:

  1. Go to the TIR AI Platform.
  2. Navigate to Deployments.
  3. Create a new deployment. When prompted, select the torchserve framework and the my-mnist model.
  4. Follow the instructions from the Sample API request to test the service.

Developer Workflow

PyTorch is one of the most widely used model training toolkits. Once a model is trained, a typical developer workflow looks like this:

  • Save the model (.pt) to the file system.
  • Write a custom API handler (optional).
  • Make a model archive (.mar) using torch-model-archiver.
  • Prepare a config file to set the runtime behavior of the service. In TIR, this step is not optional.
  • Run the torchserve command to launch the service. For TIR users, this step is completely automated.
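
For reference, launching the server locally (outside TIR) typically looks like the sketch below. The archive name, directory layout, and config path are illustrative and assume the files from the Quick Start section.

# Start TorchServe with the mnist archive (illustrative paths)
torchserve --start \
  --model-store ./mnist/model-store \
  --ts-config ./mnist/config/config.properties \
  --models mnist=mnist.mar

# Stop the server when done
torchserve --stop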

For sample code, click here.


Make a Model Archive

Torch Model Archiver is a tool used for creating archives of trained PyTorch models that can be consumed by TorchServe for inference.

After you save a trained model on the file system, use the torch-model-archiver utility to generate an archive file (.mar). This file will need to be pushed to the EOS Bucket (more details on this in further sections).

$ torch-model-archiver -h
usage: torch-model-archiver [-h] --model-name MODEL_NAME --version MODEL_VERSION_NUMBER
--model-file MODEL_FILE_PATH --serialized-file MODEL_SERIALIZED_PATH
--handler HANDLER [--runtime {python,python3}]
[--export-path EXPORT_PATH] [-f] [--requirements-file] [--config-file]

Torch Model Archiver Parameters

--model-name

Enter a unique name for this model. This name is important as your API endpoint will depend on it. For example, if the model name is mnist, the endpoint will look like https://../mnist/infer.

--version

This is optional. You may choose to set a version number. However, you would also have to create a version in the EOS bucket. More on that in the Push Model Updates section.

--model-file

This is the file path for the model definition, e.g., mnist.py that defines the PyTorch model.

--serialized-file

This is the actual model weights file. The extension format will be .pt.

--handler

You can use built-in handlers like base_handler, image_classifier, text_classifier, object_detector, or more from the list here. You may also write your own custom handler as shown in this example.
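
Putting these parameters together, an archiver invocation for the mnist example might look like the sketch below. The model definition and weights file names are illustrative, and the built-in image_classifier handler is used here for simplicity.

torch-model-archiver \
  --model-name mnist \
  --version 1.0 \
  --model-file mnist.py \
  --serialized-file mnist_cnn.pt \
  --handler image_classifier \
  --export-path model-store -f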


TorchServe Config File

The default configuration of TorchServe can be overridden through a config file. In most cases, this will be necessary. For example, the default method is to write metrics in the log file, but you may want to push them to Prometheus.

A sample config.properties file is shown below. It is the same config you may have used in the Quick Start section.

# Modifications to the following parameters are supported.
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
enable_envvars_config=true
install_py_dep_per_model=true

# Below are certain defaults that TIR uses. We recommend not to change these.
# inference_address=http://0.0.0.0:8080
# management_address=http://0.0.0.0:8081
# metrics_address=http://0.0.0.0:8082
# grpc_inference_port=7070
# grpc_management_port=7071
# enable_metrics_api=true
# model_store=/mnt/models/model-store
Note

You can learn more about the config params here: https://pytorch.org/serve/configuration.html#config-properties-file

Package and Push Model Updates

Now that we have covered how to prepare the config file and the model archive, we can package them together.

  1. Create directories as shown below:
mkdir my-model && mkdir -p my-model/config && mkdir -p my-model/model-store

  2. Now move the config.properties file into the config folder:
cp /path/to/config.properties my-model/config/
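
  3. Finally, place the model archive in the model-store folder and push both folders to the TIR model bucket, just as in the Quick Start. The archive name below is illustrative; use the bucket name you noted earlier.

# Copy the model archive into the model-store folder (archive name is illustrative)
cp /path/to/mnist.mar my-model/model-store/

# Push the package to the model bucket in E2E Object Store
cd my-model
mc cp -r * my-mnist/<enter-bucket-name-here>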

Connecting to the Service Endpoint

TorchServe does not provide authentication for its endpoints, but TIR takes care of this for you.

All inference endpoints in TIR are secured with an auth token. You can create new API tokens by locating the API Tokens section in the TIR dashboard.

Once you have the API Token, firing requests is easy. Each Inference Service is equipped with the following endpoints:

Checking Status of Endpoint

# This request returns the status of the mnist endpoint
# we created in the Quick Start section. Note that
# the model name here must exactly match the
# model name passed to the torch-model-archiver utility.

curl -v -H "Authorization: Bearer $AUTH_TOKEN" -X GET \
https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist

# response: {Active: true}

Predict Endpoint

The format of the predict API:

Predict v1

  • Verb: POST

  • Request: {"instances": []}

  • Response: {"predictions": []}

When submitting a prediction or inference request, the important part to consider is the request format.

The request body for the predict API must be a JSON object formatted as follows:

{ 
"instances": <value>|<(nested)list>|<list-of-objects>
}

Sample Request to MNIST Endpoint

A sample request to the MNIST endpoint would look like this. Here, the image is converted to base64 format.

curl -v -H "Authorization: Bearer $AUTH_TOKEN" \
-X POST https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist:predict \
-d '{
"instances": [
{
"data": "iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAAAw0lEQVR4nGNgGFggVVj4/y8Q2GOR83n+58/fP0DwcSqmpNN7oOTJw6f+/H2pjUU2JCSEk0EWqN0cl828e/FIxvz9/9cCh1zS5z9/G9mwyzl/+PNnKQ45nyNAr9ThMHQ/UG4tDofuB4bQIhz6fIBenMWJQ+7Vn7+zeLCbKXv6z59NOPQVgsIcW4QA9YFi6wNQLrKwsBebW/68DJ388Nun5XFocrqvIFH59+XhBAxThTfeB0r+vP/QHbuDCgr2JmOXoSsAAKK7bU3vISS4AAAAAElFTkSuQmCC",
"target": 0
}
]
}'
Note

To run the above request, you will need to convert your image to base64. You can use OS utilities such as base64 (available on macOS and Linux) or a library in your language of choice (for example, the base64 module in Python).
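
As a rough sketch, the request body can be assembled from the shell as shown below. The image file name is illustrative, and the endpoint placeholders match the sample request above.

# Encode the image and strip any newlines (file name is illustrative)
IMAGE_B64=$(base64 < digit.png | tr -d '\n')

# Build the request body
cat > request.json <<EOF
{ "instances": [ { "data": "${IMAGE_B64}", "target": 0 } ] }
EOF

# Send the prediction request
curl -H "Authorization: Bearer $AUTH_TOKEN" \
  -X POST https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist:predict \
  -d @request.json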

Batching Images for Prediction Request

In the following example, a directory of images is batched together before sending a prediction request.

import base64
import pathlib

# Collect the MNIST image files and base64-encode each one
data_directory = pathlib.Path(data_dir)
digit_files = list(data_directory.glob("mnist-images-directory/*"))
instances = []
for digit_file in digit_files:
    with open(digit_file, "rb") as f:
        instances.append({"data": base64.b64encode(f.read()).decode("utf-8")})

# Send all instances in a single prediction request
response = endpoint.predict(instances=instances)
Attention

If your application is restricted to a specific request and response format and cannot adhere to the format above, you may consider writing a custom container.

Monitoring

The TorchServe containers in TIR provide detailed logging and metrics.

Logging

You can view detailed logs of the inference service by selecting the endpoint in the deployments section.

Metrics

By default, the metrics will be printed in the log. To view the metrics in the TIR dashboard, we recommend adding the following line to the config.properties file:

metrics_format=prometheus
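
With the metrics API enabled (see the commented defaults in the config section above), TorchServe serves Prometheus-format metrics on the metrics port. As a local sketch, assuming the default metrics_address, a scrape would look like this:

# Fetch Prometheus-format metrics from the default metrics port
curl http://127.0.0.1:8082/metrics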

Advanced Use Cases

Extending Built-in Container

TIR does not restrict you to pre-built frameworks. You can write your own container and publish it on TIR. To see an example, read Custom Containers in TIR.

Large Models and Multi-GPUs

TIR supports the full range of TorchServe functionality, including multi-GPU training and deployments. You can find multi-GPU development guidelines for PyTorch here.

Examples

For more samples on using TorchServe, visit the official TorchServe repo.