This repo demonstrates dynamic model reload using Triton Inference Server with DeepStream.
It runs the PeopleNet model in a DeepStream pipeline and shows how to load either of two available model versions without stopping the pipeline.
These steps were tested on an NVIDIA Jetson target running JetPack 6.2 and DeepStream 7.1.
- Connect a camera to the Jetson (or, optionally, change the DeepStream source file to use file input instead).
- Run `tao_download_and_convert_to_plan.sh`, following its instructions to install any necessary dependencies. This downloads the TAO files from NVIDIA and prepares a local directory with extracted and converted contents suitable for use with this demo.
- Run `setup_model_repo_with_version.sh $version` to set up the model repo with the desired PeopleNet version, based on the versions listed in the `environment.sh` file. Running it with no arguments prints a list of supported versions.
- Open a dedicated command window (or screen/tmux session) and run `start_triton_server.sh` to start the Triton server, configured for the model location set up in the previous step. You can leave this window open to monitor the Triton server.
- Open a command window on a session attached to a UI screen and run `start_deepstream_pipeline.sh` to start the DeepStream pipeline.
- Open a command window and run `get_model_stats.sh`. Note that the version printed here should match the version of the model loaded in the `setup_model_repo_with_version.sh` step.
- With the pipeline still running, set up the model repo with a new model version using the `setup_model_repo_with_version.sh` script.
- With the pipeline still running, reload the model on the inference server using the `reload_model_on_server.sh` script. You should notice:
  - The Triton server logs a version change.
  - The pipeline continues running, now with the updated model.
  - The version reported by `get_model_stats.sh` matches the version of the newly loaded model.
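Under the hood, the reload step can be done through Triton's model control HTTP API. A minimal sketch of the equivalent raw requests, assuming Triton's default HTTP port 8000 and a model named `peoplenet` (both assumptions, not taken from the repo scripts); note that the load endpoint only works when the server runs with explicit model control enabled (`--model-control-mode=explicit`):

```shell
#!/bin/sh
# Assumed values -- adjust to match your setup; the model name and port
# below are illustrative, not read from the repo scripts.
TRITON_URL="http://localhost:8000"
MODEL_NAME="peoplenet"

# POST to the repository load endpoint re-reads the model's directory in
# the model repo and loads the new version without restarting the server.
LOAD_URL="${TRITON_URL}/v2/repository/models/${MODEL_NAME}/load"

# GET on the model metadata endpoint reports the currently loaded versions.
META_URL="${TRITON_URL}/v2/models/${MODEL_NAME}"

# Print the requests to run against a live server:
echo "curl -s -X POST ${LOAD_URL}"
echo "curl -s ${META_URL}"
```

`reload_model_on_server.sh` and `get_model_stats.sh` presumably wrap requests along these lines.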
To show the effect of the model reload more obviously, you can use the `clobber_model.py` script to corrupt one of the input models.
- Run the `clobber_model.py` script to corrupt the model weights, passing the path to the `.onnx` file downloaded under `ngc_models`.
- Regenerate the `model.plan` file using the `trtexec` step in the `tao_download_and_convert_to_plan.sh` script.
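For reference, regenerating the plan file comes down to a `trtexec` invocation along these lines. The file paths below are hypothetical placeholders, not the actual names used by the repo scripts:

```shell
#!/bin/sh
# Hypothetical paths -- substitute the locations that
# tao_download_and_convert_to_plan.sh uses on your system.
ONNX_FILE="ngc_models/peoplenet.onnx"          # assumed filename
PLAN_FILE="model_repo/peoplenet/1/model.plan"  # assumed repo layout

# trtexec builds a TensorRT engine (.plan) from the ONNX file; the engine
# is specific to the GPU and TensorRT version it is built on.
CMD="trtexec --onnx=${ONNX_FILE} --saveEngine=${PLAN_FILE}"
echo "$CMD"
```

Because the corrupted weights are baked into the rebuilt engine, reloading it makes the before/after difference visible in the pipeline output.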