# DeepStream-Yolo

NVIDIA DeepStream SDK 5.1 configuration for YOLO models

##

### Improvements on this repository

* Darknet CFG params parser (no need to edit nvdsparsebbox_Yolo.cpp or any other file for native models)
* Support for the new_coords, beta_nms and scale_x_y params (see the cfg excerpt below)
* Support for models, layers and activations not supported in the official DeepStream SDK YOLO
* Support for convolutional groups
* **Support for INT8 calibration** (not available for YOLOv5 models)
* **Support for non-square models**
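For reference, these params appear in the [yolo] sections of newer Darknet cfg files and are read directly by the parser; the excerpt below is illustrative, with values in the style of the yolov4-csp/yolov4x-mish cfgs:

```
[yolo]
...
# scaled-YOLOv4 style box decoding
scale_x_y = 2.0
new_coords=1
# IoU threshold used for NMS instead of the 0.45 default
beta_nms=0.6
```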
##

### Tutorial

* [Basic usage](#basic-usage)
* [INT8 calibration](#int8-calibration)
* [Configuring to your custom model](https://github.com/marcoslucianops/DeepStream-Yolo/blob/master/customModels.md)
* [Multiple YOLO inferences](https://github.com/marcoslucianops/DeepStream-Yolo/blob/master/multipleInferences.md)

##

### TensorRT conversion

* Native (tested models below)
  * [YOLOv4x-Mish](https://github.com/AlexeyAB/darknet) [[cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4x-mish.cfg)] [[weights](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4x-mish.weights)]
  * [YOLOv4-CSP](https://github.com/WongKinYiu/ScaledYOLOv4/tree/yolov4-csp) [[cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-csp.cfg)] [[weights](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-csp.weights)]
  * [YOLOv4](https://github.com/AlexeyAB/darknet) [[cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg)] [[weights](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights)]
  * [YOLOv4-Tiny](https://github.com/AlexeyAB/darknet) [[cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-tiny.cfg)] [[weights](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights)]
  * [YOLOv3-SPP](https://github.com/pjreddie/darknet) [[cfg](https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3-spp.cfg)] [[weights](https://pjreddie.com/media/files/yolov3-spp.weights)]
  * [YOLOv3](https://github.com/pjreddie/darknet) [[cfg](https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg)] [[weights](https://pjreddie.com/media/files/yolov3.weights)]
  * [YOLOv3-Tiny-PRN](https://github.com/WongKinYiu/PartialResidualNetworks) [[cfg](https://raw.githubusercontent.com/WongKinYiu/PartialResidualNetworks/master/cfg/yolov3-tiny-prn.cfg)] [[weights](https://github.com/WongKinYiu/PartialResidualNetworks/raw/master/model/yolov3-tiny-prn.weights)]
  * [YOLOv3-Tiny](https://github.com/pjreddie/darknet) [[cfg](https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3-tiny.cfg)] [[weights](https://pjreddie.com/media/files/yolov3-tiny.weights)]
  * [YOLOv3-Lite](https://github.com/dog-qiuqiu/MobileNet-Yolo) [[cfg](https://raw.githubusercontent.com/dog-qiuqiu/MobileNet-Yolo/master/MobileNetV2-YOLOv3-Lite/COCO/MobileNetV2-YOLOv3-Lite-coco.cfg)] [[weights](https://github.com/dog-qiuqiu/MobileNet-Yolo/raw/master/MobileNetV2-YOLOv3-Lite/COCO/MobileNetV2-YOLOv3-Lite-coco.weights)]
  * [YOLOv3-Nano](https://github.com/dog-qiuqiu/MobileNet-Yolo) [[cfg](https://raw.githubusercontent.com/dog-qiuqiu/MobileNet-Yolo/master/MobileNetV2-YOLOv3-Nano/COCO/MobileNetV2-YOLOv3-Nano-coco.cfg)] [[weights](https://github.com/dog-qiuqiu/MobileNet-Yolo/raw/master/MobileNetV2-YOLOv3-Nano/COCO/MobileNetV2-YOLOv3-Nano-coco.weights)]
  * [YOLO-Fastest 1.1](https://github.com/dog-qiuqiu/Yolo-Fastest) [[cfg](https://raw.githubusercontent.com/dog-qiuqiu/Yolo-Fastest/master/ModelZoo/yolo-fastest-1.1_coco/yolo-fastest-1.1.cfg)] [[weights](https://github.com/dog-qiuqiu/Yolo-Fastest/raw/master/ModelZoo/yolo-fastest-1.1_coco/yolo-fastest-1.1.weights)]
  * [YOLO-Fastest-XL 1.1](https://github.com/dog-qiuqiu/Yolo-Fastest) [[cfg](https://raw.githubusercontent.com/dog-qiuqiu/Yolo-Fastest/master/ModelZoo/yolo-fastest-1.1_coco/yolo-fastest-1.1-xl.cfg)] [[weights](https://github.com/dog-qiuqiu/Yolo-Fastest/raw/master/ModelZoo/yolo-fastest-1.1_coco/yolo-fastest-1.1-xl.weights)]
  * [YOLOv2](https://github.com/pjreddie/darknet) [[cfg](https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov2.cfg)] [[weights](https://pjreddie.com/media/files/yolov2.weights)]
  * [YOLOv2-Tiny](https://github.com/pjreddie/darknet) [[cfg](https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov2-tiny.cfg)] [[weights](https://pjreddie.com/media/files/yolov2-tiny.weights)]
* External
  * [YOLOv5 5.0](https://github.com/marcoslucianops/DeepStream-Yolo/blob/master/YOLOv5-5.0.md)
  * [YOLOv5 4.0](https://github.com/marcoslucianops/DeepStream-Yolo/blob/master/YOLOv5-4.0.md)
  * [YOLOv5 3.X (3.0/3.1)](https://github.com/marcoslucianops/DeepStream-Yolo/blob/master/YOLOv5-3.X.md)

##

### Benchmark

* [mAP/FPS comparison between models](#mapfps-comparison-between-models)

##

### Requirements

* [NVIDIA DeepStream SDK 5.1](https://developer.nvidia.com/deepstream-sdk)
* [DeepStream-Yolo Native](https://github.com/marcoslucianops/DeepStream-Yolo/tree/master/native) (for Darknet YOLO based models)
* [DeepStream-Yolo External](https://github.com/marcoslucianops/DeepStream-Yolo/tree/master/external) (for PyTorch YOLOv5 based models)

##

### Basic usage

```
git clone https://github.com/marcoslucianops/DeepStream-Yolo.git
cd DeepStream-Yolo/native
```

Download the cfg and weights files of your model and move them to the DeepStream-Yolo/native folder.

Compile the lib

* x86 platform

```
CUDA_VER=11.1 make -C nvdsinfer_custom_impl_Yolo
```

* Jetson platform

```
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
```

Edit the config_infer_primary.txt file for your model (example for YOLOv4)

```
[property]
...
# 0=RGB, 1=BGR, 2=GRAYSCALE
model-color-format=0
# CFG
custom-network-config=yolov4.cfg
# Weights
model-file=yolov4.weights
# Generated TensorRT model (will be created if it doesn't exist)
model-engine-file=model_b1_gpu0_fp32.engine
# Model labels file
labelfile-path=labels.txt
# Batch size
batch-size=1
# 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
# Number of classes in label file
num-detected-classes=80
...
[class-attrs-all]
# CONF_THRESH
pre-cluster-threshold=0.25
```

Run

```
deepstream-app -c deepstream_app_config.txt
```

To use the YOLOv2 or YOLOv2-Tiny models, edit deepstream_app_config.txt before running, pointing the primary GIE to the YOLOv2 config file

```
[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV2.txt
```

Note: config_infer_primary.txt uses cluster-mode=4 and NMS = 0.45 (set in the code) when beta_nms isn't available in the cfg (when beta_nms is available, NMS = beta_nms), while config_infer_primary_yoloV2.txt uses cluster-mode=2 and nms-iou-threshold=0.45 to set the NMS, as the excerpt below illustrates.
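Illustrative excerpt of the two clustering setups described in the note (check the config files shipped in the repo for their exact contents). In DeepStream, cluster-mode=4 disables nvinfer clustering, so the NMS implemented in the custom lib applies, while cluster-mode=2 selects the built-in NMS:

```
# config_infer_primary.txt (YOLOv3/YOLOv4 models): clustering disabled,
# NMS runs inside nvdsinfer_custom_impl_Yolo (0.45, or beta_nms from the cfg)
[property]
cluster-mode=4

# config_infer_primary_yoloV2.txt (YOLOv2 models): DeepStream built-in NMS
[property]
cluster-mode=2

[class-attrs-all]
nms-iou-threshold=0.45
```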
##

### INT8 calibration

Install OpenCV

```
sudo apt-get install libopencv-dev
```

Compile/recompile the nvdsinfer_custom_impl_Yolo lib with OpenCV support

* x86 platform

```
cd DeepStream-Yolo/native
CUDA_VER=11.1 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
```

* Jetson platform

```
cd DeepStream-Yolo/native
CUDA_VER=10.2 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
```

For the COCO dataset, download [val2017](https://drive.google.com/file/d/1gbvfn7mcsGDRZ_luJwtITL-ru2kK99aK/view?usp=sharing), extract it, and move it to the DeepStream-Yolo/native folder.

Select 1000 random images from the COCO dataset to run the calibration (note that ls already outputs the val2017/ prefix, so the selected path is copied as-is)

```
mkdir calibration
for jpg in $(ls -1 val2017/*.jpg | sort -R | head -1000); do \
    cp ${jpg} calibration/; \
done
```

Create the calibration.txt file with the paths of all selected images

```
realpath calibration/*jpg > calibration.txt
```

Set the environment variables

```
export INT8_CALIB_IMG_PATH=calibration.txt
export INT8_CALIB_BATCH_SIZE=1
```

Change the config_infer_primary.txt file from

```
...
model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
...
network-mode=0
...
```

to

```
...
model-engine-file=model_b1_gpu0_int8.engine
int8-calib-file=calib.table
...
network-mode=1
...
```

Run

```
deepstream-app -c deepstream_app_config.txt
```

Note: NVIDIA recommends at least 500 images to get a good accuracy. This example uses 1000 images for better accuracy (more images = more accuracy). Higher INT8_CALIB_BATCH_SIZE values will increase the accuracy and calibration speed; set it according to your GPU memory. This process can take a long time. The calibration isn't available for YOLOv5 models.
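If GNU coreutils' shuf is available on your system, the same random selection can be written as a one-liner (an equivalent sketch of the loop above, not part of the original steps):

```
ls -1 val2017/*.jpg | shuf -n 1000 | xargs -I{} cp {} calibration/
```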
##

### mAP/FPS comparison between models

```
valid = val2017 (COCO)
NMS = 0.45 (changed to beta_nms when set in the Darknet cfg file) / 0.6 (YOLOv5 models)
pre-cluster-threshold = 0.001 (mAP eval) / 0.25 (FPS measurement)
batch-size = 1
FPS measurement display width = 1920
FPS measurement display height = 1080
NOTE: An NVIDIA GTX 1050 (4GB Mobile) was used for the evaluation.
      maintain-aspect-ratio=1 was set in the config_infer file for the YOLOv4 (with letter_box=1) and YOLOv5 models.
      For INT8 calibration, 1000 random images from val2017 (COCO) and INT8_CALIB_BATCH_SIZE=1 were used.
```

| Model | Precision | Resolution | IoU=0.5:0.95 | IoU=0.5 | IoU=0.75 | FPS (with display) | FPS (without display) |
|:---------------:|:---------:|:----------:|:------------:|:-------:|:--------:|:------------------:|:---------------------:|
| YOLOv5x 5.0 | FP32 | 640 | 0. | 0. | 0. | . | . |
| YOLOv5l 5.0 | FP32 | 640 | 0. | 0. | 0. | . | . |
| YOLOv5m 5.0 | FP32 | 640 | 0. | 0. | 0. | . | . |
| YOLOv5s 5.0 | FP32 | 640 | 0. | 0. | 0. | . | . |
| YOLOv5s 5.0 | FP32 | 416 | 0. | 0. | 0. | . | . |
| YOLOv4x-MISH | FP32 | 640 | 0.461 | 0.649 | 0.499 | . | . |
| YOLOv4x-MISH | **INT8** | 640 | 0.443 | 0.629 | 0.479 | . | . |
| YOLOv4x-MISH | FP32 | 608 | 0.461 | 0.650 | 0.496 | . | . |
| YOLOv4-CSP | FP32 | 640 | 0.443 | 0.632 | 0.477 | . | . |
| YOLOv4-CSP | FP32 | 608 | 0.443 | 0.632 | 0.477 | . | . |
| YOLOv4-CSP | FP32 | 512 | 0.437 | 0.625 | 0.471 | . | . |
| YOLOv4-CSP | **INT8** | 512 | 0.414 | 0.601 | 0.447 | . | . |
| YOLOv4 | FP32 | 640 | 0.492 | 0.729 | 0.547 | . | . |
| YOLOv4 | FP32 | 608 | 0.499 | 0.739 | 0.551 | . | . |
| YOLOv4 | **INT8** | 608 | 0.483 | 0.728 | 0.534 | . | . |
| YOLOv4 | FP32 | 512 | 0.492 | 0.730 | 0.542 | . | . |
| YOLOv4 | FP32 | 416 | 0.468 | 0.702 | 0.507 | . | . |
| YOLOv3-SPP | FP32 | 608 | 0.412 | 0.687 | 0.434 | . | . |
| YOLOv3 | FP32 | 608 | 0.378 | 0.674 | 0.389 | . | . |
| YOLOv3 | **INT8** | 608 | 0.381 | 0.677 | 0.388 | . | . |
| YOLOv3 | FP32 | 416 | 0.373 | 0.669 | 0.379 | . | . |
| YOLOv2 | FP32 | 608 | 0.211 | 0.365 | 0.220 | . | . |
| YOLOv2 | FP32 | 416 | 0.207 | 0.362 | 0.211 | . | . |
| YOLOv4-Tiny | FP32 | 416 | 0.216 | 0.403 | 0.207 | . | . |
| YOLOv4-Tiny | **INT8** | 416 | 0.203 | 0.385 | 0.192 | . | . |
| YOLOv3-Tiny-PRN | FP32 | 416 | 0.168 | 0.381 | 0.126 | . | . |
| YOLOv3-Tiny-PRN | **INT8** | 416 | 0.155 | 0.358 | 0.113 | . | . |
| YOLOv3-Tiny | FP32 | 416 | 0.096 | 0.203 | 0.080 | . | . |
| YOLOv2-Tiny | FP32 | 416 | 0.084 | 0.194 | 0.062 | . | . |
| YOLOv3-Lite | FP32 | 416 | 0.169 | 0.356 | 0.137 | . | . |
| YOLOv3-Lite | FP32 | 320 | 0.158 | 0.328 | 0.132 | . | . |
| YOLOv3-Nano | FP32 | 416 | 0.128 | 0.278 | 0.099 | . | . |
| YOLOv3-Nano | FP32 | 320 | 0.122 | 0.260 | 0.099 | . | . |
| YOLO-Fastest-XL | FP32 | 416 | 0.160 | 0.342 | 0.130 | . | . |
| YOLO-Fastest-XL | FP32 | 320 | 0.158 | 0.329 | 0.135 | . | . |
| YOLO-Fastest | FP32 | 416 | 0.101 | 0.230 | 0.072 | . | . |
| YOLO-Fastest | FP32 | 320 | 0.102 | 0.232 | 0.073 | . | . |
##

### Extract metadata

You can get metadata from DeepStream in Python and C++. For C++, you need to edit the deepstream-app or deepstream-test code. For Python, you need to install and edit [deepstream_python_apps](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps).

You need to manipulate NvDsObjectMeta ([Python](https://docs.nvidia.com/metropolis/deepstream/python-api/PYTHON_API/NvDsMeta/NvDsObjectMeta.html)/[C++](https://docs.nvidia.com/metropolis/deepstream/sdk-api/Meta/_NvDsObjectMeta.html)), NvDsFrameMeta ([Python](https://docs.nvidia.com/metropolis/deepstream/python-api/PYTHON_API/NvDsMeta/NvDsFrameMeta.html)/[C++](https://docs.nvidia.com/metropolis/deepstream/sdk-api/Meta/_NvDsFrameMeta.html)) and NvOSD_RectParams ([Python](https://docs.nvidia.com/metropolis/deepstream/python-api/PYTHON_API/NvDsOSD/NvOSD_RectParams.html)/[C++](https://docs.nvidia.com/metropolis/deepstream/sdk-api/OSD/Data_Structures/_NvOSD_FrameRectParams.html)) to get the label, position, etc. of the bounding boxes.

In the C++ deepstream-app application, your code needs to be in the analytics_done_buf_prob function. In the C++/Python deepstream-test applications, your code needs to be in the osd_sink_pad_buffer_probe/tiler_src_pad_buffer_probe function. A minimal Python sketch is given at the end of this README.

Python is slightly slower than C (about 5-10%).

##

This code is open source; you can use it as you want. :)

If you want me to create commercial DeepStream SDK projects for you, contact me at the email address available on my GitHub profile.

My projects: https://www.youtube.com/MarcosLucianoTV
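##

### Example: Python metadata probe (sketch)

A minimal sketch of the buffer probe described in the Extract metadata section, following the patterns used in deepstream_python_apps. It assumes the pyds bindings are installed and that you attach the probe to a pad (e.g. the OSD sink pad) in your pipeline code; error handling is omitted.

```
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst
import pyds

def osd_sink_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    # Retrieve the batch metadata attached upstream by the DeepStream pipeline
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        # NvDsFrameMeta: per-frame metadata (frame number, object list, ...)
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            # NvDsObjectMeta: label, confidence and NvOSD_RectParams (bbox)
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            rect = obj_meta.rect_params
            print(frame_meta.frame_num, obj_meta.obj_label, obj_meta.confidence,
                  rect.left, rect.top, rect.width, rect.height)
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```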