TorchServe

TorchServe takes a PyTorch deep learning model and wraps it in a set of REST APIs. It comes with a built-in web server that you run from the command line. This built-in server accepts command-line arguments such as the model(s) you want to serve, along with optional parameters controlling the port, host, and logging.

TIR makes deploying PyTorch models as easy as pushing code. You upload your TorchServe model archive - more on this later - to E2E Object Storage (EOS), and that's it. TIR automatically launches containers, downloads the model from the EOS bucket to a local directory on the container, and starts the TorchServe web server. You get not only automated, containerized deployment but also monitoring and maintenance features in the TIR dashboard.

Some features of TorchServe on TIR:

  • Automated Deployments from E2E Object Storage (EOS bucket)

  • Automatic restart on failures

  • E2E Managed TLS certificates

  • Token-based Authentication

  • Manual or Automated Scaling

  • Optional persistent disks (to reduce boot time by avoiding model downloads on restarts)

  • REST (HTTP) and gRPC endpoints

  • Readiness and Liveness Checks

Quick Start

This section focuses on serving model files (deployment), without much discussion of the model files themselves: where they come from and how they are made. We will cover model development in later sections.

  1. Install dependencies:

    TIR deployments require the MinIO CLI (mc) to upload the model archive to E2E Object Store.

    Follow the instructions below to install the MinIO CLI on your local machine. If you are using TIR Notebooks, you can skip this step, as mc comes pre-installed there.

    • macOS

      
      

      brew install minio/stable/mc

    • Windows: Follow the instructions here

    • Linux: Follow the instructions here

  2. Create a directory to store the model weights.

    # make a model directory
    mkdir mnist && mkdir ./mnist/model-store && mkdir ./mnist/config

    # move into the model-store directory
    cd mnist/model-store
    
  3. Download a trained model (torch model archive)

    wget https://objectstore.e2enetworks.net/iris/mnist/model-store/mnist.mar
    
  4. Download a TorchServe config file

    # go to config directory
    cd ../config && wget https://objectstore.e2enetworks.net/iris/mnist/config/config.properties
    
  5. Create a TIR Model

    • Go to TIR Dashboard

    • Go to Model Repository in the Inference section.

    • Create a model named my-mnist. If prompted, select New E2E Object Store Bucket as the model type.

    • When the model is created, copy the mc alias command shown for it.

    • Run the mc alias command on your command line:

      mc config host add my-mnist https://objectstore.e2enetworks.net <access-key> <secret-key>

    You can also note down the bucket name (auto-created by TIR) with this command, as we will need it in the next step:

    mc ls my-mnist/

  6. Upload the model and config to TIR Model

    Run the following commands from your command line to upload the model-store and config directories to the TIR model bucket.

    # return to model directory (top)
    cd ..
    
    # run this command to get bucket name
    mc ls my-mnist/
    
    # copy contents to model bucket in E2E Object Store
    mc cp -r * my-mnist/<enter-bucket-name-here>
    
    # if all goes well, you should see the directories in model bucket
    mc ls my-mnist/<enter-bucket-name-here>
    
  7. Create an Inference Service

    We now have our model weights and config in E2E Object Storage. Follow these steps to create an inference service in TIR.

    • Go to TIR Dashboard

    • Go to Deployments

    • Create a new deployment. When prompted, select the TorchServe framework and the my-mnist model.

    • Follow the instructions in Sample API request to test the service; a minimal status check in Python is sketched below.
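
    To quickly verify the deployment, you can also check the model status from Python. The snippet below is a minimal sketch: it assumes the requests library and uses placeholders for the project ID, endpoint name, and API token (see Connecting to the Service Endpoint for details).

    import requests

    # placeholders -- substitute your own project ID, endpoint name and API token
    STATUS_URL = "https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist"
    AUTH_TOKEN = "<your-api-token>"

    # a 200 response indicates the mnist model is being served
    response = requests.get(STATUS_URL, headers={"Authorization": f"Bearer {AUTH_TOKEN}"})
    print(response.status_code, response.text)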

Developer Workflow

PyTorch is one of the most widely used model training toolkits. Once a model is trained, a typical developer workflow looks like this:

  • Save the model weights (.pt) to the file system.

  • Write a custom API handler (optional).

  • Make a model archive (.mar) using torch-model-archiver.

  • Prepare a config file to set the runtime behaviour of the service. In TIR, this step is not optional.

  • Run the torchserve command to launch the service. For TIR users, this step is completely automated.

For sample code, click here
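
For instance, the first and third steps above might look like the sketch below. This is only a minimal illustration: the Net module is a stand-in for your own trained model, and the file names (mnist.pt, mnist.py) are placeholders.

import torch
import torch.nn as nn

# a stand-in for your trained network; replace it with your own model class
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.fc(x.flatten(1))

model = Net()
# ... training happens here ...

# step 1: save the trained weights (.pt) to the file system
torch.save(model.state_dict(), "mnist.pt")

# step 3: build the model archive (.mar) from the shell, for example:
#   torch-model-archiver --model-name mnist --version 1.0 \
#       --model-file mnist.py --serialized-file mnist.pt \
#       --handler image_classifier --export-path model-store/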

Make a Model Archive

Torch Model Archiver is a tool for creating archives of trained PyTorch models that can be consumed by TorchServe for inference.

After you save a trained model on the file system, use the torch-model-archiver utility to generate an archive file (.mar). This file will need to be pushed to the EOS bucket (more details on this in later sections).

$ torch-model-archiver -h
usage: torch-model-archiver [-h] --model-name MODEL_NAME  --version MODEL_VERSION_NUMBER
                  --model-file MODEL_FILE_PATH --serialized-file MODEL_SERIALIZED_PATH
                  --handler HANDLER [--runtime {python,python3}]
                  [--export-path EXPORT_PATH] [-f] [--requirements-file] [--config-file]

--model-name

Enter a unique name for this model. This name is important, as your API endpoint will depend on it. For example, if the model name is mnist, the endpoint will look like https://../mnist/infer

--version

This is optional. You may choose to set a version number, but you would then also have to create a matching version in the EOS bucket. More on that in the Package and Push Model Updates section.

--model-file

This is the file path of the model definition, e.g. mnist.py, which defines the PyTorch model.

--serialized-file

This is the actual model weights file. Its extension will be .pt

--handler

You can use built-in handlers like base_handler, image_classifier, text_classifier, object_detector, or others from the list here. You can also write your own custom handler, as in this example.
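
If the built-in handlers do not fit your model, a custom handler is simply a Python class that TorchServe loads from the file passed to --handler. The sketch below is a minimal, hypothetical example extending the built-in BaseHandler; it assumes the request carries a base64-encoded image and that the model accepts a batch of 28x28 grayscale tensors (PIL and torchvision are assumed to be installed).

# mnist_handler.py -- a minimal custom handler sketch (hypothetical)
import base64
import io

import torch
from PIL import Image
from torchvision import transforms
from ts.torch_handler.base_handler import BaseHandler


class MNISTDigitHandler(BaseHandler):
    """Decode base64-encoded images and return one predicted digit per image."""

    _transform = transforms.Compose(
        [transforms.Grayscale(), transforms.Resize((28, 28)), transforms.ToTensor()]
    )

    def preprocess(self, data):
        # each request row carries the payload under "data" or "body"
        images = []
        for row in data:
            payload = row.get("data") or row.get("body")
            if isinstance(payload, str):
                payload = base64.b64decode(payload)
            image = Image.open(io.BytesIO(payload))
            images.append(self._transform(image))
        return torch.stack(images).to(self.device)

    def postprocess(self, outputs):
        # return the predicted class index for each input image
        return outputs.argmax(dim=1).tolist()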

TorchServe Config File

The default configuration of TorchServe can be overridden through a config file. In most cases, this will be necessary. For example, by default metrics are written to the log file, but you may want to push them to Prometheus instead.

A sample config.properties file is shown below. It is the same config used in the Quick Start section.

# Modifications to the following parameters are supported.
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
enable_envvars_config=true
install_py_dep_per_model=true

# Below are certain defaults that TIR uses. We recommend not changing these.
# inference_address=http://0.0.0.0:8080
# management_address=http://0.0.0.0:8081
# metrics_address=http://0.0.0.0:8082
# grpc_inference_port=7070
# grpc_management_port=7071
# enable_metrics_api=true
# model_store=/mnt/models/model-store

Note

You can learn more about the config params here: https://pytorch.org/serve/configuration.html#config-properties-file

Package and Push Model Updates

Now that we have covered how to prepare the config file and the model archive, we can package them together.

  1. Create the directories as shown below:

    mkdir my-model && mkdir -p my-model/config && mkdir -p my-model/model-store

  2. Copy the config.properties file into the config folder:

    cp /path/to/config.properties my-model/config/

  3. Copy the model archive into model-store:

    cp /path/to/model.mar my-model/model-store

  4. Push the contents of the my-model directory to the TIR model bucket. You can find the steps by locating your model in the TIR dashboard and following the instructions on the Setup tab.

  5. Create a new Inference Service, or restart an existing one if you only intended to update the model version.

Connecting to the Service Endpoint

TorchServe does not provide authentication for its endpoints, but there is no need to worry: TIR handles this for you.

All inference endpoints in TIR are secured with an auth token. You can create new API tokens in the API Tokens section of the TIR dashboard.

Once you have an API token, firing requests is easy. Each Inference Service is equipped with the following endpoints:

Checking Status of Endpoint

# This request returns the status of the mnist endpoint
# we created in the Quick Start section. Note that the
# model name here must match exactly the model name
# used with the torch-model-archiver utility.

curl -v -H "Authorization: Bearer $AUTH_TOKEN" -X GET \
https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist

# response: {Active: true}

Predict Endpoint

The format of the predict API:

Predict v1

  • Verb: POST

  • Path: /v1/models/<modelname>:predict

  • Request payload: {"instances": []}

  • Response: {"predictions": []}

When submitting a prediction or inference request, the most important thing to get right is the request format.

The request body for the predict API must be a JSON object formatted as follows:

{
    "instances": <value>|<(nested)list>|<list-of-objects>
}

A sample request to the mnist endpoint is shown below. Here, the image has been converted to base64 format.

curl -v -H "Authorization: Bearer $AUTH_TOKEN" \
-X POST https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist:predict \
-d '{
    "instances": [
        {
            "data": "iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAAAw0lEQVR4nGNgGFggVVj4/y8Q2GOR83n+58/fP0DwcSqmpNN7oOTJw6f+/H2pjUU2JCSEk0EWqN0cl828e/FIxvz9/9cCh1zS5z9/G9mwyzl/+PNnKQ45nyNAr9ThMHQ/UG4tDofuB4bQIhz6fIBenMWJQ+7Vn7+zeLCbKXv6z59NOPQVgsIcW4QA9YFi6wNQLrKwsBebW/68DJ388Nun5XFocrqvIFH59+XhBAxThTfeB0r+vP/QHbuDCgr2JmOXoSsAAKK7bU3vISS4AAAAAElFTkSuQmCC",
            "target": 0
        }
    ]
}'

Note

To run the above request, you will need to convert your image to base64 format. You can use OS utilities like base64 on macOS (e.g. base64 -i <in-file> -o <outfile>) or the libraries available in your language of choice (e.g. the base64 module in Python).
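
The same request can also be made from Python. The snippet below is a minimal sketch assuming the requests library, with placeholders for the endpoint URL, API token, and input image file:

import base64

import requests

# placeholders -- substitute your own endpoint URL, API token and image file
PREDICT_URL = "https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist:predict"
AUTH_TOKEN = "<your-api-token>"

# base64-encode the input image
with open("digit.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    PREDICT_URL,
    headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
    json={"instances": [{"data": encoded, "target": 0}]},
)
print(response.json())  # e.g. {"predictions": [...]}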

In the following example, an image is read from a directory and base64-encoded before sending a prediction request.

import base64
import pathlib

# `data_dir` points to the directory that holds the input images
data_directory = pathlib.Path(data_dir)
digit_files = list(data_directory.glob("mnist-images-directory/*"))

# base64-encode the first image in the directory
with open(digit_files[0], "rb") as f:
    data = {"data": base64.b64encode(f.read()).decode("utf-8")}

# `endpoint` is a client object for the deployed inference service
response = endpoint.predict(instances=[data])

Attention

If your application restricts you to a specific request and response format and cannot adhere to the format above, then you may consider writing a custom container.

Monitoring

The TorchServe containers in TIR provide detailed logging and metrics.

Logging

You can view detailed logs of the inference service by selecting the endpoint in the Deployments section.

Metrics

By default, metrics are printed in the log. To view them in the TIR dashboard, we recommend adding the following line to the config.properties file.

metrics_format=prometheus

Advanced Use-cases

Extending built-in container

TIR does not restrict you to pre-built frameworks. You can write your own container and publish it on TIR. To see an example, read Custom Containers in TIR.

Large Models and Multi-GPUs

TIR supports the full functionality of TorchServe, including multi-GPU training and deployments. You can find multi-GPU development guidelines for PyTorch here.

Examples

For more samples on using TorchServe, visit the official TorchServe repo.