TIR makes it easy to deploy containers that serve your model over an API.
TIR offers two methods to create an inference service (API endpoint) for your AI model:
Deploy using pre-built (TIR provided) Containers
Before you launch a service with pre-built containers, you must first create a TIR Model and upload your model files to it. The pre-built containers are designed to auto-download the model files from an EOS (E2E Object Storage) bucket and launch the API server with them. Once an endpoint is ready, you can send synchronous requests to it for inference.
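To illustrate what a synchronous inference request looks like, here is a minimal sketch using only the Python standard library. The endpoint URL, token, and payload schema below are placeholders, not the actual values TIR issues; substitute the URL and auth token shown on your endpoint's detail page.

```python
import json
import urllib.request

# Placeholder values -- copy the real ones from your endpoint's detail page.
ENDPOINT_URL = "https://<your-endpoint-url>/"
API_TOKEN = "<your-api-token>"


def build_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request carrying the inference payload."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def infer(payload: dict) -> dict:
    """Send the request synchronously and return the decoded JSON response."""
    req = build_request(ENDPOINT_URL, API_TOKEN, payload)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # The payload shape depends on the framework serving your model.
    print(infer({"inputs": ["a sample prompt"]}))
```

The same call can be made with `curl` or any HTTP client; the only requirements are the endpoint URL, the auth header, and a payload in the format your inference server expects.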
Deploy using your own container
You can provide a public or private Docker image and launch an inference service with it. Once the endpoint is ready, you can make synchronous requests to it for inference. You may also attach a TIR Model to your service to automate downloading model files from an EOS bucket into the container.
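As a sketch of what a custom container might run, the example below implements a tiny HTTP inference server with the Python standard library. The route, port, and JSON schema are illustrative assumptions, not TIR requirements; your image simply needs to expose an HTTP server that answers inference requests.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class InferenceHandler(BaseHTTPRequestHandler):
    """Minimal JSON-in, JSON-out handler standing in for a real model server."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Placeholder logic: echo the inputs back. In a real container this
        # is where the model forward pass would run.
        result = {"outputs": payload.get("inputs", [])}
        data = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


if __name__ == "__main__":
    # Listen on all interfaces so the platform can route traffic to the pod.
    HTTPServer(("0.0.0.0", 8080), InferenceHandler).serve_forever()
```

In practice you would bake this server (or a framework server such as TorchServe or Triton) into your Docker image and point the service at the port it listens on.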
TIR provides Docker container images that you can run as pre-built containers. These containers run HTTP inference servers that can serve requests with minimal configuration. They can also connect to E2E Object Storage and download model files into the container at startup.
This section lists deployment guides for all the integrated frameworks that TIR supports.
Go through this complete guide for deploying a TorchServe service.
Go through this detailed guide to deploy a Triton service.
Go through this tutorial to deploy Llama 2.
Go through this tutorial to deploy a CodeLlama service.
Go through this tutorial to deploy a Stable Diffusion inference service.
Go through this detailed tutorial on building custom images for model deployments.