TorchServe
TorchServe takes a PyTorch deep learning model and wraps it in a set of REST APIs. It comes with a built-in web server that you run from the command line, which accepts arguments specifying the single or multiple models you want to serve, along with optional parameters controlling port, host, and logging.
TIR makes deploying PyTorch models as easy as pushing code. You can upload your TorchServe model archive—more on this later—to E2E Object Storage (EOS), and that’s it. TIR can automatically launch containers, download the model from the EOS bucket to a local directory on the container, and start the TorchServe web server. What you get is not only automated containerized deployment but also monitoring and maintenance features in the TIR dashboard.
Key Features of TorchServe on TIR
- Automated Deployments from E2E Object Storage (EOS bucket)
- Automatic restart on failures
- E2E Managed TLS certificates
- Token-based Authentication
- Manual or Automated Scaling
- Optional Persistent Disks (to reduce boot time by avoiding model downloads on restarts)
- REST (HTTP) and gRPC
- Readiness and Liveness Checks
Quick Start
This section focuses on serving model files (deployment) without much discussion on the model files themselves, where they come from, and how they are made. We will learn more about model development in later sections.
Install Dependencies:
TIR deployments require MinIO CLI (mc) to upload the model archive to E2E Object Store.
Follow the instructions below to download MinIO CLI (mc) on your local machine. If you are using TIR Notebooks, you can skip this step, as they come pre-installed with mc (MinIO CLI).
Installing MinIO CLI
macOS
To install MinIO CLI on macOS, run the following command:
```bash
brew install minio/stable/mc
```
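Linux
If you are on a Linux machine, a common approach is to download the standalone mc binary from MinIO (shown here for amd64; adjust the URL for your architecture):
```bash
# Download the mc binary and make it executable (amd64 assumed)
curl -o mc https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/
```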
Create a Directory to Store the Model Weights
Create a model directory and subdirectories:
```bash
# Make a model directory and subdirectories
mkdir mnist && mkdir ./mnist/model-store && mkdir ./mnist/config

# Go to the model-store directory
cd mnist/model-store
```
Download a Trained Model (Torch Archive)
Use the following command to download the trained model:
```bash
wget https://objectstore.e2enetworks.net/iris/mnist/model-store/mnist.mar
```
Download a TorchServe Config File
Navigate to the config directory and download the config file using the following command:
```bash
# Go to the config directory
cd ../config && wget https://objectstore.e2enetworks.net/iris/mnist/config/config.properties
```
Create a TIR Model
- Go to the TIR AI Platform.
- Navigate to Model Repository in the Inference section.
- Create a model with the name `my-mnist`. If prompted, select New E2E Object Store Bucket as the model type.
- Once the model is created, copy the `mc alias` command for the model.
- Run the `mc alias` command in your command line:
```bash
mc config host add my-mnist https://objectstore.e2enetworks.net <access-key> <secret-key>
```
You can also note down the bucket name (auto-created by TIR) with this command, as we will need it in the next section:
```bash
mc ls my-mnist/
```
Upload the Model and Config to TIR Model
Run the following commands from your command line to upload the `model-store` and `config` directories to the TIR model bucket:
```bash
# Return to the top of the model directory
cd ..

# Run this command to get the bucket name
mc ls my-mnist/

# Copy contents to the model bucket in E2E Object Store
mc cp -r * my-mnist/<enter-bucket-name-here>

# If all goes well, you should see the directories in the model bucket
mc ls my-mnist/<enter-bucket-name-here>
```
Create an Inference Service
We have our model weights and config in E2E Object Storage now. Follow these steps to create an inference service in TIR:
- Go to the TIR AI Platform.
- Navigate to Deployments.
- Create a new deployment. When prompted, select the `torchserve` framework and the `my-mnist` model.
- Follow the instructions from the Sample API request to test the service.
Developer Workflow
PyTorch is one of the most widely used model training toolkits. Once a model is trained, a typical developer workflow looks like this:
- Save the model (.pt) to the file system.
- Write a custom API handler (optional).
- Make a model archive (.mar) using `torch-model-archiver`.
- Prepare a config file to set the runtime behavior of the service. In TIR, this step is not optional.
- Run the `torchserve` command to launch the service. For TIR users, this step is completely automated.
For sample code, click here.
Make a Model Archive
Torch Model Archiver is a tool used for creating archives of trained PyTorch models that can be consumed by TorchServe for inference.
After you save a trained model on the file system, use the `torch-model-archiver` utility to generate an archive file (.mar). This file will need to be pushed to the EOS bucket (more details on this in further sections).
```bash
$ torch-model-archiver -h
usage: torch-model-archiver [-h] --model-name MODEL_NAME --version MODEL_VERSION_NUMBER
                            --model-file MODEL_FILE_PATH --serialized-file MODEL_SERIALIZED_PATH
                            --handler HANDLER [--runtime {python,python3}]
                            [--export-path EXPORT_PATH] [-f] [--requirements-file] [--config-file]
```
Torch Model Archiver Parameters
--model-name
Enter a unique name for this model. This name is important, as your API endpoint will depend on it. For example, if the model name is `mnist`, the endpoint will look like `https://../mnist/infer`.

--version
This is optional. You may choose to set a version number. However, you would also have to create a version in the EOS bucket. More on that in the Push Model Updates section.

--model-file
This is the file path for the model definition, e.g., `mnist.py`, which defines the PyTorch model.

--serialized-file
This is the actual model weights file. The extension format will be .pt.

--handler
You can use built-in handlers like `base_handler`, `image_classifier`, `text_classifier`, `object_detector`, or more from the list here. You may also write your own custom handler as shown in this example.
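Putting these parameters together, a minimal sketch of the archiver command for the MNIST example is shown below. The file names mnist.py and mnist_cnn.pt are assumptions for illustration; substitute your own model definition, weights file, and handler.
```bash
# Hypothetical file names; adjust to your own model definition, weights, and handler
torch-model-archiver \
  --model-name mnist \
  --version 1.0 \
  --model-file mnist.py \
  --serialized-file mnist_cnn.pt \
  --handler image_classifier \
  --export-path ./mnist/model-store -f
```
The resulting mnist.mar lands in the model-store directory created in the Quick Start.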
TorchServe Config File
The default configuration of TorchServe can be overridden through a config file. In most cases, this will be necessary. For example, the default method is to write metrics in the log file, but you may want to push them to Prometheus.
A sample `config.properties` file is shown below. It is the same config you may have used in the Quick Start section.
```properties
# Modifications to the following parameters are supported.
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
enable_envvars_config=true
install_py_dep_per_model=true

# Below are certain defaults that TIR uses. We recommend not to change these.
# inference_address=http://0.0.0.0:8080
# management_address=http://0.0.0.0:8081
# metrics_address=http://0.0.0.0:8082
# grpc_inference_port=7070
# grpc_management_port=7071
# enable_metrics_api=true
# model_store=/mnt/models/model-store
```
You can learn more about the config params here: https://pytorch.org/serve/configuration.html#config-properties-file
Package and Push Model Updates
Now that we have covered how to prepare the config file and model archive, we can package them together.
- Create directories as shown below:
```bash
mkdir my-model && mkdir -p my-model/config && mkdir -p my-model/model-store
```
- Now move the `config.properties` file into the config folder:
```bash
cp /path/to/config.properties my-model/config/
```
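To finish packaging and push the update, a sketch of the remaining steps is shown below, assuming your archive is named mnist.mar and your mc alias and bucket are set up as in the Quick Start section.
```bash
# Copy the model archive into the model-store directory (mnist.mar is assumed)
cp /path/to/mnist.mar my-model/model-store/

# Push both directories to the TIR model bucket
cd my-model
mc cp -r * my-mnist/<enter-bucket-name-here>
```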
Connecting to the Service Endpoint
TorchServe does not provide authentication for its endpoints, but TIR handles this for you.
All inference endpoints in TIR are secured with an auth token. You can create new API tokens by locating the API Tokens section in the TIR dashboard.
Once you have the API Token, firing requests is easy. Each Inference Service is equipped with the following endpoints:
Checking Status of Endpoint
```bash
# This request returns the status of the mnist endpoint
# we created in the Quick Start section. Also note,
# the model name here matches exactly the
# model name used in the torch-model-archiver utility.
curl -v -H "Authorization: Bearer $AUTH_TOKEN" -X GET \
    https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist

# response: {Active: true}
```
Predict Endpoint
The format of the predict API:
Predict v1
- Verb: POST
- Request payload: `{"instances": []}`
- Response: `{"predictions": []}`
When submitting a prediction or inference request, the important part to consider is the request format.
The request body for the `predict` API must be a JSON object formatted as follows:
```
{
  "instances": <value>|<(nested)list>|<list-of-objects>
}
```
Sample Request to MNIST Endpoint
A sample request to the MNIST endpoint would look like this. Here, the image is converted to base64 format.
```bash
curl -v -H "Authorization: Bearer $AUTH_TOKEN" \
    -X POST https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist:predict \
    -d '{
      "instances": [
        {
          "data": "iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAAAw0lEQVR4nGNgGFggVVj4/y8Q2GOR83n+58/fP0DwcSqmpNN7oOTJw6f+/H2pjUU2JCSEk0EWqN0cl828e/FIxvz9/9cCh1zS5z9/G9mwyzl/+PNnKQ45nyNAr9ThMHQ/UG4tDofuB4bQIhz6fIBenMWJQ+7Vn7+zeLCbKXv6z59NOPQVgsIcW4QA9YFi6wNQLrKwsBebW/68DJ388Nun5XFocrqvIFH59+XhBAxThTfeB0r+vP/QHbuDCgr2JmOXoSsAAKK7bU3vISS4AAAAAElFTkSuQmCC",
          "target": 0
        }
      ]
    }'
```
To run the above request, you will need to convert your image to base64 format. You can use OS utilities like base64 on macOS, or libraries available in most languages (e.g., the base64 module in Python).
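For example, a minimal shell sketch is shown below. The input file name digit.png is a placeholder; substitute your own image.
```bash
# digit.png is a hypothetical input file; encode it without line wrapping
IMG_B64=$(base64 < digit.png | tr -d '\n')

curl -H "Authorization: Bearer $AUTH_TOKEN" \
    -X POST https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist:predict \
    -d "{\"instances\": [{\"data\": \"${IMG_B64}\", \"target\": 0}]}"
```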
Batching Images for Prediction Request
In the following example, a directory of images is batched together before sending a prediction request.
```python
import base64
import pathlib

# data_dir is assumed to point at your dataset root
data_directory = pathlib.Path(data_dir)
digit_files = list(data_directory.glob("mnist-images-directory/*"))

# Base64-encode every image into one batch of instances
instances = [{"data": base64.b64encode(f.read_bytes()).decode("utf-8")} for f in digit_files]

# `endpoint` is assumed to be an inference client handle for the deployed service
response = endpoint.predict(instances=instances)
```
If your application requires a specific request and response format and cannot adhere to the format above, you may consider writing a custom container.
Monitoring
The TorchServe containers in TIR provide detailed logging and metrics.
Logging
You can view detailed logs of the inference service by selecting the endpoint in the deployments section.
Metrics
By default, the metrics will be printed in the log. To view the metrics in the TIR dashboard, we recommend adding the following line to the `config.properties` file:
```properties
metrics_format=prometheus
```
Advanced Use Cases
Extending Built-in Container
TIR does not restrict you to pre-built frameworks. You can write your own container and publish it on TIR. To see an example, read Custom Containers in TIR.
Large Models and Multi-GPUs
TIR supports the full functionality of TorchServe, including multi-GPU training and deployments. You can find multi-GPU development guidelines for PyTorch here.
Examples
For more samples on using TorchServe, visit the official TorchServe repo.