
Triton Inference Server on AIR-T

This tutorial walks you through setting up and running Triton Inference Server on your AIR-T and provides a minimal example of loading a model and requesting a prediction.

Triton Inference Server is open-source inference serving software that streamlines AI inference, i.e., running an AI application in production. Triton enables teams to deploy AI models from multiple deep learning and machine learning frameworks, including TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Further instructions for setup and optimization can be found in the NVIDIA Triton Documentation.

Triton on Jetson

Triton Installation

To install Triton from your AIR-T terminal:

  1. Update package lists and install necessary packages:

    sudo apt-get update
    sudo apt-get install -y --no-install-recommends \
        software-properties-common \
        libb64-dev \
        libre2-dev \
        libssl-dev \
        libboost-dev \
        libcurl4-openssl-dev \
        rapidjson-dev \
        patchelf \
        zlib1g-dev \
        g++-8 \
        clang-8 \
        lld-8
    
  2. Download Triton Server

    wget https://github.com/triton-inference-server/server/releases/download/v2.19.0/tritonserver2.19.0-jetpack4.6.1.tgz
    
  3. Install PyTorch

    pip3 install --upgrade https://developer.download.nvidia.com/compute/redist/jp/v461/pytorch/torch-1.11.0a0+17540c5+nv22.01-cp36-cp36m-linux_aarch64.whl
    
  4. Add LLVM to LD_LIBRARY_PATH

    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/lib/llvm-8/lib"
    
  5. Extract Triton Server

    sudo mkdir -p /opt/triton/
    sudo tar -xvf tritonserver2.19.0-jetpack4.6.1.tgz -C /opt/triton/
    cd /opt/triton/bin
    sudo chmod +x tritonserver
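
To confirm the PyTorch wheel installed correctly and can see the GPU, a quick check from Python can help. This is a minimal sketch; the version string printed will depend on the wheel you installed:

import torch

# Print the installed PyTorch version and whether CUDA can see the Jetson GPU
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())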
    

Set Up a Model Repository

  • Choose a folder on your AIR-T to hold your Triton inference models. Inside this folder, use the following layout:

     <model-repository-path>/
       <model-name>/
         [config.pbtxt]
         [<output-labels-file> ...]
         <version>/
           <model-definition-file>
         <version>/
           <model-definition-file>
         ...
       ...
    

We will download the model from the Inference on AIR-T tutorial and use it to request a prediction:

  1. Create the model repository directory

    mkdir triton_models
    cd triton_models
    
  2. Create directories

    mkdir avg_power_net
    mkdir avg_power_net/0
    
  3. Download the ONNX model file into the version folder

    cd avg_power_net/0
    wget https://github.com/deepwavedigital/airstack-examples/raw/master/inference/pytorch/avg_pow_net.onnx -O model.onnx
    
  4. Create the config.pbtxt file shown below. This file should live at the top level of the model directory (triton_models/avg_power_net/) and be named config.pbtxt

    name: "avg_power_net"
    platform: "onnxruntime_onnx"
    max_batch_size: 1
    input [
      {
        name: "input_buffer"
        data_type: TYPE_FP32
        dims: [ 4096 ]
      }
    ]
    output [
      {
        name: "output_buffer"
        data_type: TYPE_FP32
        dims: [ 1 ]
      }
    ]
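
After these steps, the model repository should look like this:

     triton_models/
       avg_power_net/
         config.pbtxt
         0/
           model.onnx

To double-check that the tensor names and shapes in config.pbtxt match the ONNX file, a short Python sketch like the one below can print the model's graph inputs and outputs. It assumes the onnx Python package is installed (for example with pip3 install onnx):

# Print the input/output tensor names and shapes of the downloaded ONNX model
# so they can be compared against config.pbtxt
import onnx

model = onnx.load("triton_models/avg_power_net/0/model.onnx")

for tensor in model.graph.input:
    # Dynamic dimensions print their symbolic name, fixed dimensions their size
    dims = [d.dim_param or d.dim_value for d in tensor.type.tensor_type.shape.dim]
    print("input:", tensor.name, dims)

for tensor in model.graph.output:
    dims = [d.dim_param or d.dim_value for d in tensor.type.tensor_type.shape.dim]
    print("output:", tensor.name, dims)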
    

Note: More information on model repository setup can be found in the model repository documentation.

Run the Triton Server

If everything is installed correctly (and the LD_LIBRARY_PATH export from step 4 of the installation is still set in your shell), you should be able to start the Triton server using:

/opt/triton/bin/tritonserver --model-repository {your model repository} --backend-directory /opt/triton/backends

Your Triton Inference Server is now available at the addresses listed in the server's startup output (by default, HTTP on port 8000, gRPC on port 8001, and metrics on port 8002). You should be able to check the average power net model using:

curl http://0.0.0.0:8000/v2/models/avg_power_net

and receive back:

{
    "name": "avg_power_net",
    "versions": ["0"],
    "platform": "onnxruntime_onnx",
    "inputs": [
        {
            "name": "input_buffer",
            "datatype": "FP32",
            "shape": [-1, 4096]
        }
    ],
    "outputs": [
        {
            "name": "output_buffer",
            "datatype": "FP32",
            "shape": [-1, 1]
        }
    ]
}
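
You can run the same checks from Python with the requests package. The sketch below polls the server's standard /v2/health/ready endpoint and then fetches the same model metadata shown above:

# Check that the server is ready and fetch the model metadata over HTTP
import requests

base_url = "http://localhost:8000"

# Returns HTTP 200 once the server is ready to accept inference requests
ready = requests.get(base_url + "/v2/health/ready")
print("Server ready:", ready.status_code == 200)

# Same metadata as the curl example above
metadata = requests.get(base_url + "/v2/models/avg_power_net")
print(metadata.json())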

To request an inference result using Python, you can follow the code below:

import requests
import numpy as np

# Setup POST Request
url = "http://localhost:8000/v2/models/avg_power_net/infer"
headers = {'content-type': 'application/json', 'Accept-Charset': 'UTF-8'}
session = requests.Session()

# Create input data array
input_data = np.random.uniform(-1, 1, 4096).astype(np.float32)
data = (
    {
        "inputs": [{
            "name": "input_buffer",
            "shape": [1, 4096],
            "datatype": "FP32",
            "data": input_data.tolist()
        }],
        "outputs": [{
            "name": "output_buffer"
        }]
    }
)

# Post request
r = session.post(url, json=data, headers=headers)

# Read results: the response body is JSON containing the output tensor data
result = r.json()
print(result["outputs"][0]["data"])
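
Alternatively, NVIDIA publishes a tritonclient Python package that wraps these HTTP/gRPC calls for you. The sketch below is a rough equivalent of the requests example above, assuming the HTTP client is installed (for example with pip3 install tritonclient[http]; client wheels may also be bundled with the Jetson release tarball):

# Equivalent inference request using NVIDIA's tritonclient HTTP client
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the input tensor: a batch of 1 with 4096 float32 samples
input_data = np.random.uniform(-1, 1, (1, 4096)).astype(np.float32)
infer_input = httpclient.InferInput("input_buffer", [1, 4096], "FP32")
infer_input.set_data_from_numpy(input_data)

# Run inference and read the output tensor back as a NumPy array
result = client.infer(
    model_name="avg_power_net",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("output_buffer")],
)
print(result.as_numpy("output_buffer"))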

For more information on the HTTP/REST and gRPC protocols, check out: Triton Protocols

