New features and fixes

Marcos Luciano
2023-06-05 14:48:23 -03:00
parent 3f14b0d95d
commit 66a6754b77
57 changed files with 2137 additions and 1534 deletions

README.md

@@ -3,7 +3,7 @@
NVIDIA DeepStream SDK 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 configuration for YOLO models
--------------------------------------------------------------------------------------------------
-### Important: please generate the ONNX model and the TensorRT engine again with the updated files
+### Important: please export the ONNX model with the new export file, generate the TensorRT engine again with the updated files, and use the new config_infer_primary file according to your model
--------------------------------------------------------------------------------------------------
### Future updates
@@ -19,11 +19,14 @@ NVIDIA DeepStream SDK 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 configuration for Y
* Support for INT8 calibration
* Support for non-square models
* Model benchmarks
-* **Support for Darknet YOLO models (YOLOv4, etc) using cfg and weights conversion with GPU post-processing**
+* Support for Darknet models (YOLOv4, etc) using cfg and weights conversion with GPU post-processing
-* **Support for YOLO-NAS, PPYOLOE+, PPYOLOE, DAMO-YOLO, YOLOX, YOLOR, YOLOv8, YOLOv7, YOLOv6 and YOLOv5 using ONNX conversion with GPU post-processing**
+* Support for YOLO-NAS, PPYOLOE+, PPYOLOE, DAMO-YOLO, YOLOX, YOLOR, YOLOv8, YOLOv7, YOLOv6 and YOLOv5 using ONNX conversion with GPU post-processing
-* **GPU bbox parser (it is slightly slower than the CPU bbox parser on V100 GPU tests)**
+* GPU bbox parser (it is slightly slower than the CPU bbox parser on V100 GPU tests)
-* **Dynamic batch-size for ONNX exported models (YOLO-NAS, PPYOLOE+, PPYOLOE, DAMO-YOLO, YOLOX, YOLOR, YOLOv8, YOLOv7, YOLOv6 and YOLOv5)**
* **Support for DeepStream 5.1**
+* **Custom ONNX model parser (`NvDsInferYoloCudaEngineGet`)**
+* **Dynamic batch-size for Darknet and ONNX exported models**
+* **INT8 calibration (PTQ) for Darknet and ONNX exported models**
+* **New output structure (fixes wrong output on DeepStream < 6.2) - it requires exporting the ONNX model with the new export file, generating the TensorRT engine again with the updated files, and using the new config_infer_primary file according to your model**
##
@@ -31,12 +34,12 @@ NVIDIA DeepStream SDK 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 configuration for Y
* [Requirements](#requirements)
* [Supported models](#supported-models)
-* [Benchmarks](#benchmarks)
+* [Benchmarks](docs/benchmarks.md)
-* [dGPU installation](#dgpu-installation)
+* [dGPU installation](docs/dGPUInstalation.md)
* [Basic usage](#basic-usage)
* [Docker usage](#docker-usage)
* [NMS configuration](#nms-configuration)
-* [INT8 calibration](#int8-calibration)
+* [INT8 calibration](docs/INT8Calibration.md)
* [YOLOv5 usage](docs/YOLOv5.md)
* [YOLOv6 usage](docs/YOLOv6.md)
* [YOLOv7 usage](docs/YOLOv7.md)
@@ -137,7 +140,7 @@ NVIDIA DeepStream SDK 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 configuration for Y
### Supported models
-* [Darknet YOLO](https://github.com/AlexeyAB/darknet)
+* [Darknet](https://github.com/AlexeyAB/darknet)
* [MobileNet-YOLO](https://github.com/dog-qiuqiu/MobileNet-Yolo)
* [YOLO-Fastest](https://github.com/dog-qiuqiu/Yolo-Fastest)
* [YOLOv5](https://github.com/ultralytics/yolov5)
@@ -152,784 +155,6 @@ NVIDIA DeepStream SDK 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 configuration for Y
##
### Benchmarks
#### Config
```
board = NVIDIA Tesla V100 16GB (AWS: p3.2xlarge)
batch-size = 1
eval = val2017 (COCO)
sample = 1920x1080 video
```
**NOTE**: maintain-aspect-ratio=1 was used in the config_infer file for Darknet (with letter_box=1) and PyTorch models.
#### NMS config
- Eval
```
nms-iou-threshold = 0.6 (Darknet) / 0.65 (YOLOv5, YOLOv6, YOLOv7, YOLOR and YOLOX) / 0.7 (Paddle, YOLO-NAS, DAMO-YOLO, YOLOv8 and YOLOv7-u6)
pre-cluster-threshold = 0.001
topk = 300
```
- Test
```
nms-iou-threshold = 0.45
pre-cluster-threshold = 0.25
topk = 300
```
#### Results
**NOTE**: * = PyTorch.
**NOTE**: ** = YOLOv4 is trained on the trainvalno5k set, so its mAP is high on the val2017 test.
**NOTE**: star = DAMO-YOLO model trained with distillation.
**NOTE**: The V100 GPU decoder maxes out at 625-635 FPS on DeepStream, even with lighter models.
**NOTE**: The GPU bbox parser is a bit slower than the CPU bbox parser on V100 GPU tests.
|       Model        | Precision | Resolution | IoU=0.5:0.95 | IoU=0.5 | IoU=0.75 | FPS<br />(without display) |
|:------------------:|:---------:|:----------:|:------------:|:-------:|:--------:|:--------------------------:|
| YOLO-NAS L | FP16 | 640 | 0.484 | 0.658 | 0.532 | 235.27 |
| YOLO-NAS M | FP16 | 640 | 0.480 | 0.651 | 0.524 | 287.39 |
| YOLO-NAS S | FP16 | 640 | 0.442 | 0.614 | 0.485 | 478.52 |
| PP-YOLOE+_x | FP16 | 640 | 0.528 | 0.705 | 0.579 | 121.17 |
| PP-YOLOE+_l | FP16 | 640 | 0.511 | 0.686 | 0.557 | 191.82 |
| PP-YOLOE+_m | FP16 | 640 | 0.483 | 0.658 | 0.528 | 264.39 |
| PP-YOLOE+_s | FP16 | 640 | 0.424 | 0.594 | 0.464 | 476.13 |
| PP-YOLOE-s (400) | FP16 | 640 | 0.423 | 0.589 | 0.463 | 461.23 |
| DAMO-YOLO-L star | FP16 | 640 | 0.502 | 0.674 | 0.551 | 176.93 |
| DAMO-YOLO-M star | FP16 | 640 | 0.485 | 0.656 | 0.530 | 242.24 |
| DAMO-YOLO-S star | FP16 | 640 | 0.460 | 0.631 | 0.502 | 385.09 |
| DAMO-YOLO-S | FP16 | 640 | 0.445 | 0.611 | 0.486 | 378.68 |
| DAMO-YOLO-T star | FP16 | 640 | 0.419 | 0.586 | 0.455 | 492.24 |
| DAMO-YOLO-Nl | FP16 | 416 | 0.392 | 0.559 | 0.423 | 483.73 |
| DAMO-YOLO-Nm | FP16 | 416 | 0.371 | 0.532 | 0.402 | 555.94 |
| DAMO-YOLO-Ns | FP16 | 416 | 0.312 | 0.460 | 0.335 | 627.67 |
| YOLOX-x | FP16 | 640 | 0.447 | 0.616 | 0.483 | 125.40 |
| YOLOX-l | FP16 | 640 | 0.430 | 0.598 | 0.466 | 193.10 |
| YOLOX-m | FP16 | 640 | 0.397 | 0.566 | 0.431 | 298.61 |
| YOLOX-s | FP16 | 640 | 0.335 | 0.502 | 0.365 | 522.05 |
| YOLOX-s legacy | FP16 | 640 | 0.375 | 0.569 | 0.407 | 518.52 |
| YOLOX-Darknet | FP16 | 640 | 0.414 | 0.595 | 0.453 | 212.88 |
| YOLOX-Tiny | FP16 | 640 | 0.274 | 0.427 | 0.292 | 633.95 |
| YOLOX-Nano | FP16 | 640 | 0.212 | 0.342 | 0.222 | 633.04 |
| YOLOv8x | FP16 | 640 | 0.499 | 0.669 | 0.545 | 130.49 |
| YOLOv8l | FP16 | 640 | 0.491 | 0.660 | 0.535 | 180.75 |
| YOLOv8m | FP16 | 640 | 0.468 | 0.637 | 0.510 | 278.08 |
| YOLOv8s | FP16 | 640 | 0.415 | 0.578 | 0.453 | 493.45 |
| YOLOv8n | FP16 | 640 | 0.343 | 0.492 | 0.373 | 627.43 |
| YOLOv7-u6 | FP16 | 640 | 0.484 | 0.652 | 0.530 | 193.54 |
| YOLOv7x* | FP16 | 640 | 0.496 | 0.679 | 0.536 | 155.07 |
| YOLOv7* | FP16 | 640 | 0.476 | 0.660 | 0.518 | 226.01 |
| YOLOv7-Tiny Leaky* | FP16 | 640 | 0.345 | 0.516 | 0.372 | 626.23 |
| YOLOv7-Tiny Leaky* | FP16 | 416 | 0.328 | 0.493 | 0.349 | 633.90 |
| YOLOv6-L 4.0 | FP16 | 640 | 0.490 | 0.671 | 0.535 | 178.41 |
| YOLOv6-M 4.0 | FP16 | 640 | 0.460 | 0.635 | 0.502 | 293.39 |
| YOLOv6-S 4.0 | FP16 | 640 | 0.416 | 0.585 | 0.453 | 513.90 |
| YOLOv6-N 4.0 | FP16 | 640 | 0.349 | 0.503 | 0.378 | 633.37 |
| YOLOv5x 7.0 | FP16 | 640 | 0.471 | 0.652 | 0.513 | 149.93 |
| YOLOv5l 7.0 | FP16 | 640 | 0.455 | 0.637 | 0.497 | 235.55 |
| YOLOv5m 7.0 | FP16 | 640 | 0.421 | 0.604 | 0.459 | 351.69 |
| YOLOv5s 7.0 | FP16 | 640 | 0.344 | 0.529 | 0.372 | 618.13 |
| YOLOv5n 7.0 | FP16 | 640 | 0.247 | 0.414 | 0.257 | 629.66 |
##
### dGPU installation
To install DeepStream on dGPU (x86 platform) without Docker, we need to take a few steps to prepare the computer.
<details><summary>DeepStream 6.2</summary>
#### 1. Disable Secure Boot in BIOS
#### 2. Install dependencies
```
sudo apt-get update
sudo apt-get install gcc make git libtool autoconf autogen pkg-config cmake
sudo apt-get install python3 python3-dev python3-pip
sudo apt-get install dkms
sudo apt install libssl1.1 libgstreamer1.0-0 gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav libgstreamer-plugins-base1.0-dev libgstrtspserver-1.0-0 libjansson4 libyaml-cpp-dev libjsoncpp-dev protobuf-compiler
sudo apt-get install linux-headers-$(uname -r)
```
**NOTE**: Purge all NVIDIA drivers, CUDA, etc. (replace $CUDA_PATH with your CUDA path)
```
sudo nvidia-uninstall
sudo $CUDA_PATH/bin/cuda-uninstaller
sudo apt-get remove --purge '*nvidia*'
sudo apt-get remove --purge '*cuda*'
sudo apt-get remove --purge '*cudnn*'
sudo apt-get remove --purge '*tensorrt*'
sudo apt autoremove --purge && sudo apt autoclean && sudo apt clean
```
#### 3. Install CUDA Keyring
```
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
```
#### 4. Download and install NVIDIA Driver
<details><summary>TITAN, GeForce RTX / GTX series and RTX / Quadro series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/525.105.17/NVIDIA-Linux-x86_64-525.105.17.run
```
<blockquote><details><summary>Laptop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-525.105.17.run --no-cc-version-check --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: This step will disable the nouveau drivers.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-525.105.17.run --no-cc-version-check --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: If you are using a laptop with NVIDIA Optimus, run
```
sudo apt-get install nvidia-prime
sudo prime-select nvidia
```
</details></blockquote>
<blockquote><details><summary>Desktop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-525.105.17.run --no-cc-version-check --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
**NOTE**: This step will disable the nouveau drivers.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-525.105.17.run --no-cc-version-check --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
</details></blockquote>
</blockquote></details>
<details><summary>Data center / Tesla series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/525.105.17/NVIDIA-Linux-x86_64-525.105.17.run
```
* Run
```
sudo sh NVIDIA-Linux-x86_64-525.105.17.run --no-cc-version-check --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
</blockquote></details>
#### 5. Download and install CUDA
```
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run --silent --toolkit
```
* Export environment variables
```
echo $'export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}\nexport LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc && source ~/.bashrc
```
#### 6. Install TensorRT
```
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get install libnvinfer8=8.5.2-1+cuda11.8 libnvinfer-plugin8=8.5.2-1+cuda11.8 libnvparsers8=8.5.2-1+cuda11.8 libnvonnxparsers8=8.5.2-1+cuda11.8 libnvinfer-bin=8.5.2-1+cuda11.8 libnvinfer-dev=8.5.2-1+cuda11.8 libnvinfer-plugin-dev=8.5.2-1+cuda11.8 libnvparsers-dev=8.5.2-1+cuda11.8 libnvonnxparsers-dev=8.5.2-1+cuda11.8 libnvinfer-samples=8.5.2-1+cuda11.8 libcudnn8=8.7.0.84-1+cuda11.8 libcudnn8-dev=8.7.0.84-1+cuda11.8 python3-libnvinfer=8.5.2-1+cuda11.8 python3-libnvinfer-dev=8.5.2-1+cuda11.8
sudo apt-mark hold libnvinfer* libnvparsers* libnvonnxparsers* libcudnn8* python3-libnvinfer* tensorrt
```
#### 7. Download from [NVIDIA website](https://developer.nvidia.com/deepstream-getting-started) and install the DeepStream SDK
DeepStream 6.2 for Servers and Workstations (.deb)
```
sudo apt-get install ./deepstream-6.2_6.2.0-1_amd64.deb
rm ${HOME}/.cache/gstreamer-1.0/registry.x86_64.bin
sudo ln -snf /usr/local/cuda-11.8 /usr/local/cuda
```
#### 8. Reboot the computer
```
sudo reboot
```
</details>
<details><summary>DeepStream 6.1.1</summary>
#### 1. Disable Secure Boot in BIOS
#### 2. Install dependencies
```
sudo apt-get update
sudo apt-get install gcc make git libtool autoconf autogen pkg-config cmake
sudo apt-get install python3 python3-dev python3-pip
sudo apt-get install dkms
sudo apt-get install libssl1.1 libgstreamer1.0-0 gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav libgstreamer-plugins-base1.0-dev libgstrtspserver-1.0-0 libjansson4 libyaml-cpp-dev
sudo apt-get install linux-headers-$(uname -r)
```
**NOTE**: Purge all NVIDIA drivers, CUDA, etc. (replace $CUDA_PATH with your CUDA path)
```
sudo nvidia-uninstall
sudo $CUDA_PATH/bin/cuda-uninstaller
sudo apt-get remove --purge '*nvidia*'
sudo apt-get remove --purge '*cuda*'
sudo apt-get remove --purge '*cudnn*'
sudo apt-get remove --purge '*tensorrt*'
sudo apt autoremove --purge && sudo apt autoclean && sudo apt clean
```
#### 3. Install CUDA Keyring
```
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
```
#### 4. Download and install NVIDIA Driver
<details><summary>TITAN, GeForce RTX / GTX series and RTX / Quadro series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/515.65.01/NVIDIA-Linux-x86_64-515.65.01.run
```
<blockquote><details><summary>Laptop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-515.65.01.run --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: This step will disable the nouveau drivers.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-515.65.01.run --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: If you are using a laptop with NVIDIA Optimus, run
```
sudo apt-get install nvidia-prime
sudo prime-select nvidia
```
</details></blockquote>
<blockquote><details><summary>Desktop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-515.65.01.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
**NOTE**: This step will disable the nouveau drivers.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-515.65.01.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
</details></blockquote>
</blockquote></details>
<details><summary>Data center / Tesla series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/tesla/515.65.01/NVIDIA-Linux-x86_64-515.65.01.run
```
* Run
```
sudo sh NVIDIA-Linux-x86_64-515.65.01.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
</blockquote></details>
#### 5. Download and install CUDA
```
wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run
sudo sh cuda_11.7.1_515.65.01_linux.run --silent --toolkit
```
* Export environment variables
```
echo $'export PATH=/usr/local/cuda-11.7/bin${PATH:+:${PATH}}\nexport LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc && source ~/.bashrc
```
#### 6. Download from [NVIDIA website](https://developer.nvidia.com/nvidia-tensorrt-8x-download) and install TensorRT
TensorRT 8.4 GA for Ubuntu 20.04 and CUDA 11.0, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6 and 11.7 DEB local repo Package
```
sudo dpkg -i nv-tensorrt-repo-ubuntu2004-cuda11.6-trt8.4.1.5-ga-20220604_1-1_amd64.deb
sudo apt-key add /var/nv-tensorrt-repo-ubuntu2004-cuda11.6-trt8.4.1.5-ga-20220604/9a60d8bf.pub
sudo apt-get update
sudo apt-get install libnvinfer8=8.4.1-1+cuda11.6 libnvinfer-plugin8=8.4.1-1+cuda11.6 libnvparsers8=8.4.1-1+cuda11.6 libnvonnxparsers8=8.4.1-1+cuda11.6 libnvinfer-bin=8.4.1-1+cuda11.6 libnvinfer-dev=8.4.1-1+cuda11.6 libnvinfer-plugin-dev=8.4.1-1+cuda11.6 libnvparsers-dev=8.4.1-1+cuda11.6 libnvonnxparsers-dev=8.4.1-1+cuda11.6 libnvinfer-samples=8.4.1-1+cuda11.6 libcudnn8=8.4.1.50-1+cuda11.6 libcudnn8-dev=8.4.1.50-1+cuda11.6 python3-libnvinfer=8.4.1-1+cuda11.6 python3-libnvinfer-dev=8.4.1-1+cuda11.6
sudo apt-mark hold libnvinfer* libnvparsers* libnvonnxparsers* libcudnn8* tensorrt
```
#### 7. Download from [NVIDIA website](https://developer.nvidia.com/deepstream-getting-started) and install the DeepStream SDK
DeepStream 6.1.1 for Servers and Workstations (.deb)
```
sudo apt-get install ./deepstream-6.1_6.1.1-1_amd64.deb
rm ${HOME}/.cache/gstreamer-1.0/registry.x86_64.bin
sudo ln -snf /usr/local/cuda-11.7 /usr/local/cuda
```
#### 8. Reboot the computer
```
sudo reboot
```
</details>
<details><summary>DeepStream 6.1</summary>
#### 1. Disable Secure Boot in BIOS
#### 2. Install dependencies
```
sudo apt-get update
sudo apt-get install gcc make git libtool autoconf autogen pkg-config cmake
sudo apt-get install python3 python3-dev python3-pip
sudo apt-get install dkms
sudo apt-get install libssl1.1 libgstreamer1.0-0 gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav libgstrtspserver-1.0-0 libjansson4 libyaml-cpp-dev
sudo apt-get install linux-headers-$(uname -r)
```
**NOTE**: Purge all NVIDIA drivers, CUDA, etc. (replace $CUDA_PATH with your CUDA path)
```
sudo nvidia-uninstall
sudo $CUDA_PATH/bin/cuda-uninstaller
sudo apt-get remove --purge '*nvidia*'
sudo apt-get remove --purge '*cuda*'
sudo apt-get remove --purge '*cudnn*'
sudo apt-get remove --purge '*tensorrt*'
sudo apt autoremove --purge && sudo apt autoclean && sudo apt clean
```
#### 3. Install CUDA Keyring
```
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
```
#### 4. Download and install NVIDIA Driver
<details><summary>TITAN, GeForce RTX / GTX series and RTX / Quadro series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/510.47.03/NVIDIA-Linux-x86_64-510.47.03.run
```
<blockquote><details><summary>Laptop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-510.47.03.run --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: This step will disable the nouveau drivers.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-510.47.03.run --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: If you are using a laptop with NVIDIA Optimus, run
```
sudo apt-get install nvidia-prime
sudo prime-select nvidia
```
</details></blockquote>
<blockquote><details><summary>Desktop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-510.47.03.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
**NOTE**: This step will disable the nouveau drivers.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-510.47.03.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
</details></blockquote>
</blockquote></details>
<details><summary>Data center / Tesla series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/tesla/510.47.03/NVIDIA-Linux-x86_64-510.47.03.run
```
* Run
```
sudo sh NVIDIA-Linux-x86_64-510.47.03.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
</blockquote></details>
#### 5. Download and install CUDA
```
wget https://developer.download.nvidia.com/compute/cuda/11.6.1/local_installers/cuda_11.6.1_510.47.03_linux.run
sudo sh cuda_11.6.1_510.47.03_linux.run --silent --toolkit
```
* Export environment variables
```
echo $'export PATH=/usr/local/cuda-11.6/bin${PATH:+:${PATH}}\nexport LD_LIBRARY_PATH=/usr/local/cuda-11.6/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc && source ~/.bashrc
```
#### 6. Download from [NVIDIA website](https://developer.nvidia.com/nvidia-tensorrt-8x-download) and install TensorRT
TensorRT 8.2 GA Update 4 for Ubuntu 20.04 and CUDA 11.0, 11.1, 11.2, 11.3, 11.4 and 11.5 DEB local repo Package
```
sudo dpkg -i nv-tensorrt-repo-ubuntu2004-cuda11.4-trt8.2.5.1-ga-20220505_1-1_amd64.deb
sudo apt-key add /var/nv-tensorrt-repo-ubuntu2004-cuda11.4-trt8.2.5.1-ga-20220505/82307095.pub
sudo apt-get update
sudo apt-get install libnvinfer8=8.2.5-1+cuda11.4 libnvinfer-plugin8=8.2.5-1+cuda11.4 libnvparsers8=8.2.5-1+cuda11.4 libnvonnxparsers8=8.2.5-1+cuda11.4 libnvinfer-bin=8.2.5-1+cuda11.4 libnvinfer-dev=8.2.5-1+cuda11.4 libnvinfer-plugin-dev=8.2.5-1+cuda11.4 libnvparsers-dev=8.2.5-1+cuda11.4 libnvonnxparsers-dev=8.2.5-1+cuda11.4 libnvinfer-samples=8.2.5-1+cuda11.4 libnvinfer-doc=8.2.5-1+cuda11.4 libcudnn8-dev=8.4.0.27-1+cuda11.6 libcudnn8=8.4.0.27-1+cuda11.6
sudo apt-mark hold libnvinfer* libnvparsers* libnvonnxparsers* libcudnn8* tensorrt
```
#### 7. Download from [NVIDIA website](https://developer.nvidia.com/deepstream-sdk-download-tesla-archived) and install the DeepStream SDK
DeepStream 6.1 for Servers and Workstations (.deb)
```
sudo apt-get install ./deepstream-6.1_6.1.0-1_amd64.deb
rm ${HOME}/.cache/gstreamer-1.0/registry.x86_64.bin
sudo ln -snf /usr/local/cuda-11.6 /usr/local/cuda
```
#### 8. Reboot the computer
```
sudo reboot
```
</details>
<details><summary>DeepStream 6.0.1 / 6.0</summary>
#### 1. Disable Secure Boot in BIOS
<details><summary>If you are using a laptop with a newer Intel/AMD processor and the Graphics entry in the Settings->Details->About tab shows llvmpipe, please update the kernel.</summary>
```
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11/amd64/linux-headers-5.11.0-051100_5.11.0-051100.202102142330_all.deb
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11/amd64/linux-headers-5.11.0-051100-generic_5.11.0-051100.202102142330_amd64.deb
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11/amd64/linux-image-unsigned-5.11.0-051100-generic_5.11.0-051100.202102142330_amd64.deb
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11/amd64/linux-modules-5.11.0-051100-generic_5.11.0-051100.202102142330_amd64.deb
sudo dpkg -i *.deb
sudo reboot
```
</details>
#### 2. Install dependencies
```
sudo apt-get update
sudo apt-get install gcc make git libtool autoconf autogen pkg-config cmake
sudo apt-get install python3 python3-dev python3-pip
sudo apt-get install libssl1.0.0 libgstreamer1.0-0 gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav libgstrtspserver-1.0-0 libjansson4
sudo apt-get install linux-headers-$(uname -r)
```
**NOTE**: Install DKMS only if you are using the default Ubuntu kernel
```
sudo apt-get install dkms
```
**NOTE**: Purge all NVIDIA drivers, CUDA, etc. (replace $CUDA_PATH with your CUDA path)
```
sudo nvidia-uninstall
sudo $CUDA_PATH/bin/cuda-uninstaller
sudo apt-get remove --purge '*nvidia*'
sudo apt-get remove --purge '*cuda*'
sudo apt-get remove --purge '*cudnn*'
sudo apt-get remove --purge '*tensorrt*'
sudo apt autoremove --purge && sudo apt autoclean && sudo apt clean
```
#### 3. Install CUDA Keyring
```
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
```
#### 4. Download and install NVIDIA Driver
<details><summary>TITAN, GeForce RTX / GTX series and RTX / Quadro series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/470.129.06/NVIDIA-Linux-x86_64-470.129.06.run
```
<blockquote><details><summary>Laptop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-470.129.06.run --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: This step will disable the nouveau drivers.
**NOTE**: Remove the --dkms flag if you installed the 5.11.0 kernel.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-470.129.06.run --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: Remove the --dkms flag if you installed the 5.11.0 kernel.
**NOTE**: If you are using a laptop with NVIDIA Optimus, run
```
sudo apt-get install nvidia-prime
sudo prime-select nvidia
```
</details></blockquote>
<blockquote><details><summary>Desktop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-470.129.06.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
**NOTE**: This step will disable the nouveau drivers.
**NOTE**: Remove the --dkms flag if you installed the 5.11.0 kernel.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-470.129.06.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
**NOTE**: Remove the --dkms flag if you installed the 5.11.0 kernel.
</details></blockquote>
</blockquote></details>
<details><summary>Data center / Tesla series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/tesla/470.129.06/NVIDIA-Linux-x86_64-470.129.06.run
```
* Run
```
sudo sh NVIDIA-Linux-x86_64-470.129.06.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
**NOTE**: Remove the --dkms flag if you installed the 5.11.0 kernel.
</blockquote></details>
#### 5. Download and install CUDA
```
wget https://developer.download.nvidia.com/compute/cuda/11.4.1/local_installers/cuda_11.4.1_470.57.02_linux.run
sudo sh cuda_11.4.1_470.57.02_linux.run --silent --toolkit
```
* Export environment variables
```
echo $'export PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}\nexport LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc && source ~/.bashrc
```
#### 6. Download from [NVIDIA website](https://developer.nvidia.com/nvidia-tensorrt-8x-download) and install TensorRT
TensorRT 8.0.1 GA for Ubuntu 18.04 and CUDA 11.3 DEB local repo package
```
sudo dpkg -i nv-tensorrt-repo-ubuntu1804-cuda11.3-trt8.0.1.6-ga-20210626_1-1_amd64.deb
sudo apt-key add /var/nv-tensorrt-repo-ubuntu1804-cuda11.3-trt8.0.1.6-ga-20210626/7fa2af80.pub
sudo apt-get update
sudo apt-get install libnvinfer8=8.0.1-1+cuda11.3 libnvinfer-plugin8=8.0.1-1+cuda11.3 libnvparsers8=8.0.1-1+cuda11.3 libnvonnxparsers8=8.0.1-1+cuda11.3 libnvinfer-bin=8.0.1-1+cuda11.3 libnvinfer-dev=8.0.1-1+cuda11.3 libnvinfer-plugin-dev=8.0.1-1+cuda11.3 libnvparsers-dev=8.0.1-1+cuda11.3 libnvonnxparsers-dev=8.0.1-1+cuda11.3 libnvinfer-samples=8.0.1-1+cuda11.3 libnvinfer-doc=8.0.1-1+cuda11.3 libcudnn8-dev=8.2.1.32-1+cuda11.3 libcudnn8=8.2.1.32-1+cuda11.3
sudo apt-mark hold libnvinfer* libnvparsers* libnvonnxparsers* libcudnn8* tensorrt
```
#### 7. Download from [NVIDIA website](https://developer.nvidia.com/deepstream-sdk-download-tesla-archived) and install the DeepStream SDK
* DeepStream 6.0.1 for Servers and Workstations (.deb)
```
sudo apt-get install ./deepstream-6.0_6.0.1-1_amd64.deb
```
* DeepStream 6.0 for Servers and Workstations (.deb)
```
sudo apt-get install ./deepstream-6.0_6.0.0-1_amd64.deb
```
* Run
```
rm ${HOME}/.cache/gstreamer-1.0/registry.x86_64.bin
sudo ln -snf /usr/local/cuda-11.4 /usr/local/cuda
```
#### 8. Reboot the computer
```
sudo reboot
```
</details>
##
### Basic usage
#### 1. Download the repo
@@ -970,7 +195,7 @@ cd DeepStream-Yolo
* DeepStream 5.1 on x86 platform
```
-CUDA_VER=11.1 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
+CUDA_VER=11.1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.2 / 6.1.1 / 6.1 on Jetson platform
@@ -979,18 +204,12 @@ cd DeepStream-Yolo
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 6.0.1 / 6.0 on Jetson platform
+* DeepStream 6.0.1 / 6.0 / 5.1 on Jetson platform
```
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 5.1 on Jetson platform
-```
-CUDA_VER=10.2 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
-```
#### 4. Edit the `config_infer_primary.txt` file according to your model (example for YOLOv4)
```
@@ -1001,6 +220,14 @@ model-file=yolov4.weights
...
```
+**NOTE**: By default, the dynamic batch-size is set. To use implicit batch-size, uncomment the line
+```
+...
+force-implicit-batch-dim=1
+...
+```
#### 5. Run
```
@@ -1066,125 +293,6 @@ topk=300
##
### INT8 calibration
**NOTE**: For now, only for the Darknet YOLO model.
#### 1. Install OpenCV
```
sudo apt-get install libopencv-dev
```
#### 2. Compile/recompile the `nvdsinfer_custom_impl_Yolo` lib with OpenCV support
* DeepStream 6.2 on x86 platform
```
CUDA_VER=11.8 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.1.1 on x86 platform
```
CUDA_VER=11.7 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.1 on x86 platform
```
CUDA_VER=11.6 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.0.1 / 6.0 on x86 platform
```
CUDA_VER=11.4 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 5.1 on x86 platform
```
CUDA_VER=11.1 OPENCV=1 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.2 / 6.1.1 / 6.1 on Jetson platform
```
CUDA_VER=11.4 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.0.1 / 6.0 on Jetson platform
```
CUDA_VER=10.2 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 5.1 on Jetson platform
```
CUDA_VER=10.2 OPENCV=1 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
```
#### 3. For the COCO dataset, download [val2017](https://drive.google.com/file/d/1gbvfn7mcsGDRZ_luJwtITL-ru2kK99aK/view?usp=sharing), extract it, and move it to the DeepStream-Yolo folder
* Select 1000 random images from COCO dataset to run calibration
```
mkdir calibration
```
```
for jpg in $(ls -1 val2017/*.jpg | sort -R | head -1000); do \
cp ${jpg} calibration/; \
done
```
* Create the `calibration.txt` file with all selected images
```
realpath calibration/*jpg > calibration.txt
```
* Set environment variables
```
export INT8_CALIB_IMG_PATH=calibration.txt
export INT8_CALIB_BATCH_SIZE=1
```
* Edit the `config_infer` file
```
...
model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
...
network-mode=0
...
```
To
```
...
model-engine-file=model_b1_gpu0_int8.engine
int8-calib-file=calib.table
...
network-mode=1
...
```
* Run
```
deepstream-app -c deepstream_app_config.txt
```
**NOTE**: NVIDIA recommends at least 500 images to get good accuracy. In this example, I recommend using 1000 images to get better accuracy (more images = more accuracy). Higher `INT8_CALIB_BATCH_SIZE` values will result in more accuracy and faster calibration speed. Set it according to your GPU memory. This process may take a long time.
##
### Extract metadata
You can get metadata from DeepStream using Python and C/C++. For C/C++, you can edit the `deepstream-app` or `deepstream-test` codes. For Python, you can install and edit [deepstream_python_apps](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps).
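As a minimal sketch of the Python route (assuming the `pyds` bindings from deepstream_python_apps are installed, and that the probe is attached to the OSD sink pad as in those samples), a buffer probe can walk the batch, frame and object metadata produced by the YOLO parser:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def osd_sink_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    # NvDsMeta hierarchy: batch -> frames -> objects
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            rect = obj_meta.rect_params
            # class id, confidence and bbox of each detection
            print(frame_meta.frame_num, obj_meta.class_id, obj_meta.confidence,
                  rect.left, rect.top, rect.width, rect.height)
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```

The probe is registered with `osd_sink_pad.add_probe(Gst.PadProbeType.BUFFER, osd_sink_pad_buffer_probe, 0)`, following the deepstream_python_apps samples.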


@@ -17,7 +17,9 @@ network-type=0
cluster-mode=2
maintain-aspect-ratio=0
symmetric-padding=1
+#force-implicit-batch-dim=1
parse-bbox-func-name=NvDsInferParseYolo
+#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet


@@ -3,7 +3,7 @@ gpu-id=0
net-scale-factor=1
model-color-format=0
onnx-file=damoyolo_tinynasL25_S.onnx
-model-engine-file=damoyolo_tinynasL25_S.onnx_b1_gpu0_fp32.engine
+model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
@@ -15,8 +15,11 @@ process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=0
+#force-implicit-batch-dim=1
parse-bbox-func-name=NvDsInferParseYoloE
+#parse-bbox-func-name=NvDsInferParseYoloECuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
+engine-create-func-name=NvDsInferYoloCudaEngineGet
[class-attrs-all]
nms-iou-threshold=0.45


@@ -4,7 +4,7 @@ net-scale-factor=0.0173520735727919486
offsets=123.675;116.28;103.53
model-color-format=0
onnx-file=ppyoloe_crn_s_400e_coco.onnx
-model-engine-file=ppyoloe_crn_s_400e_coco.onnx_b1_gpu0_fp32.engine
+model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
@@ -16,8 +16,11 @@ process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=0
+#force-implicit-batch-dim=1
parse-bbox-func-name=NvDsInferParseYoloE
+#parse-bbox-func-name=NvDsInferParseYoloECuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
+engine-create-func-name=NvDsInferYoloCudaEngineGet
[class-attrs-all]
nms-iou-threshold=0.45


@@ -3,7 +3,7 @@ gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=ppyoloe_plus_crn_s_80e_coco.onnx
-model-engine-file=ppyoloe_plus_crn_s_80e_coco.onnx_b1_gpu0_fp32.engine
+model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
@@ -15,8 +15,11 @@ process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=0
+#force-implicit-batch-dim=1
parse-bbox-func-name=NvDsInferParseYoloE
+#parse-bbox-func-name=NvDsInferParseYoloECuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
+engine-create-func-name=NvDsInferYoloCudaEngineGet
[class-attrs-all]
nms-iou-threshold=0.45


@@ -16,7 +16,9 @@ process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=0
+#force-implicit-batch-dim=1
parse-bbox-func-name=NvDsInferParseYolo
+#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet


@@ -3,7 +3,7 @@ gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=yolov5s.onnx
-model-engine-file=yolov5s.onnx_b1_gpu0_fp32.engine
+model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
@@ -16,8 +16,11 @@ network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
+#force-implicit-batch-dim=1
parse-bbox-func-name=NvDsInferParseYolo
+#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
+engine-create-func-name=NvDsInferYoloCudaEngineGet
[class-attrs-all]
nms-iou-threshold=0.45


@@ -3,7 +3,7 @@ gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=yolov6s.onnx
-model-engine-file=yolov6s.onnx_b1_gpu0_fp32.engine
+model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
@@ -16,8 +16,11 @@ network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
+#force-implicit-batch-dim=1
parse-bbox-func-name=NvDsInferParseYolo
+#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
+engine-create-func-name=NvDsInferYoloCudaEngineGet
[class-attrs-all]
nms-iou-threshold=0.45


@@ -3,7 +3,7 @@ gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=yolov7.onnx
-model-engine-file=yolov7.onnx_b1_gpu0_fp32.engine
+model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
@@ -16,8 +16,11 @@ network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
+#force-implicit-batch-dim=1
parse-bbox-func-name=NvDsInferParseYolo
+#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
+engine-create-func-name=NvDsInferYoloCudaEngineGet
[class-attrs-all]
nms-iou-threshold=0.45


@@ -3,7 +3,7 @@ gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=yolov8s.onnx
-model-engine-file=yolov8s.onnx_b1_gpu0_fp32.engine
+model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
@@ -16,8 +16,11 @@ network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
+#force-implicit-batch-dim=1
parse-bbox-func-name=NvDsInferParseYolo
+#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
+engine-create-func-name=NvDsInferYoloCudaEngineGet
[class-attrs-all]
nms-iou-threshold=0.45


@@ -3,7 +3,7 @@ gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=yolo_nas_s_coco.onnx
-model-engine-file=yolo_nas_s_coco.onnx_b1_gpu0_fp32.engine
+model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
@@ -16,8 +16,11 @@ network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=0
+#force-implicit-batch-dim=1
parse-bbox-func-name=NvDsInferParseYoloE
+#parse-bbox-func-name=NvDsInferParseYoloECuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
+engine-create-func-name=NvDsInferYoloCudaEngineGet
[class-attrs-all]
nms-iou-threshold=0.45


@@ -3,7 +3,7 @@ gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=yolor_csp.onnx
-model-engine-file=yolor_csp.onnx_b1_gpu0_fp32.engine
+model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
@@ -16,8 +16,11 @@ network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
+#force-implicit-batch-dim=1
parse-bbox-func-name=NvDsInferParseYolo
+#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
+engine-create-func-name=NvDsInferYoloCudaEngineGet
[class-attrs-all]
nms-iou-threshold=0.45


@@ -3,7 +3,7 @@ gpu-id=0
net-scale-factor=1
model-color-format=0
onnx-file=yolox_s.onnx
-model-engine-file=yolox_s.onnx_b1_gpu0_fp32.engine
+model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
@@ -16,8 +16,11 @@ network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=0
+#force-implicit-batch-dim=1
parse-bbox-func-name=NvDsInferParseYolo
+#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
+engine-create-func-name=NvDsInferYoloCudaEngineGet
[class-attrs-all]
nms-iou-threshold=0.45


@@ -4,7 +4,7 @@ net-scale-factor=0.0173520735727919486
offsets=123.675;116.28;103.53
model-color-format=0
onnx-file=yolox_s.onnx
-model-engine-file=yolox_s.onnx_b1_gpu0_fp32.engine
+model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
@@ -17,8 +17,11 @@ network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=0
+#force-implicit-batch-dim=1
parse-bbox-func-name=NvDsInferParseYolo
+#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
+engine-create-func-name=NvDsInferYoloCudaEngineGet
[class-attrs-all]
nms-iou-threshold=0.45


@@ -43,6 +43,24 @@ Generate the ONNX model file (example for DAMO-YOLO-S*)
python3 export_damoyolo.py -w damoyolo_tinynasL25_S_477.pth -c configs/damoyolo_tinynasL25_S.py --simplify --dynamic
```
+**NOTE**: To simplify the ONNX model
+```
+--simplify
+```
+**NOTE**: To use dynamic batch-size
+```
+--dynamic
+```
+**NOTE**: To use implicit batch-size (example for batch-size = 4)
+```
+--batch 4
+```
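A quick way to confirm that an export really produced a dynamic batch axis is to inspect the input shape of the resulting ONNX file: a dynamic axis carries a symbolic `dim_param`, a fixed one an integer `dim_value` (the exact symbolic name depends on the export script). A minimal sketch, assuming the `onnx` Python package is installed:

```python
import onnx

model = onnx.load("damoyolo_tinynasL25_S.onnx")
for inp in model.graph.input:
    dims = inp.type.tensor_type.shape.dim
    # a dynamic batch prints a name on the first axis; a fixed one prints an integer
    print(inp.name, [d.dim_param or d.dim_value for d in dims])
```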
**NOTE**: If you are using DeepStream 5.1, use opset 11 or lower.
```
@@ -107,7 +125,7 @@ Open the `DeepStream-Yolo` folder and compile the lib
* DeepStream 5.1 on x86 platform
```
-CUDA_VER=11.1 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
+CUDA_VER=11.1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.2 / 6.1.1 / 6.1 on Jetson platform
@@ -116,18 +134,12 @@ Open the `DeepStream-Yolo` folder and compile the lib
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 6.0.1 / 6.0 on Jetson platform
+* DeepStream 6.0.1 / 6.0 / 5.1 on Jetson platform
```
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 5.1 on Jetson platform
-```
-CUDA_VER=10.2 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
-```
##
### Edit the config_infer_primary_damoyolo file
@@ -138,7 +150,6 @@ Edit the `config_infer_primary_damoyolo.txt` file according to your model (examp
[property]
...
onnx-file=damoyolo_tinynasL25_S.onnx
-model-engine-file=damoyolo_tinynasL25_S.onnx_b1_gpu0_fp32.engine
...
num-detected-classes=80
...
@@ -149,7 +160,17 @@ parse-bbox-func-name=NvDsInferParseYoloE
**NOTE**: The **DAMO-YOLO** does not resize the input with padding. To get better accuracy, use
```
+...
maintain-aspect-ratio=0
+...
+```
+**NOTE**: By default, the dynamic batch-size is set. To use implicit batch-size, uncomment the line
+```
+...
+force-implicit-batch-dim=1
+...
```
##

docs/INT8Calibration.md

@@ -0,0 +1,108 @@
# INT8 calibration (PTQ)
### 1. Install OpenCV
```
sudo apt-get install libopencv-dev
```
### 2. Compile/recompile the `nvdsinfer_custom_impl_Yolo` lib with OpenCV support
* DeepStream 6.2 on x86 platform
```
CUDA_VER=11.8 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.1.1 on x86 platform
```
CUDA_VER=11.7 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.1 on x86 platform
```
CUDA_VER=11.6 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.0.1 / 6.0 on x86 platform
```
CUDA_VER=11.4 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 5.1 on x86 platform
```
CUDA_VER=11.1 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.2 / 6.1.1 / 6.1 on Jetson platform
```
CUDA_VER=11.4 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.0.1 / 6.0 / 5.1 on Jetson platform
```
CUDA_VER=10.2 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
```
### 3. For the COCO dataset, download [val2017](https://drive.google.com/file/d/1gbvfn7mcsGDRZ_luJwtITL-ru2kK99aK/view?usp=sharing), extract it, and move it to the DeepStream-Yolo folder
* Select 1000 random images from COCO dataset to run calibration
```
mkdir calibration
```
```
for jpg in $(ls -1 val2017/*.jpg | sort -R | head -1000); do \
cp ${jpg} calibration/; \
done
```
* Create the `calibration.txt` file with all selected images
```
realpath calibration/*jpg > calibration.txt
```
* Set environment variables
```
export INT8_CALIB_IMG_PATH=calibration.txt
export INT8_CALIB_BATCH_SIZE=1
```
* Edit the `config_infer` file
```
...
model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
...
network-mode=0
...
```
To
```
...
model-engine-file=model_b1_gpu0_int8.engine
int8-calib-file=calib.table
...
network-mode=1
...
```
* Run
```
deepstream-app -c deepstream_app_config.txt
```
**NOTE**: NVIDIA recommends at least 500 images to get good accuracy. In this example, I recommend using 1000 images to get better accuracy (more images = more accuracy). Higher `INT8_CALIB_BATCH_SIZE` values will result in more accuracy and faster calibration speed. Set it according to your GPU memory. This process may take a long time.


@@ -38,7 +38,25 @@ Generate the ONNX model file (example for PP-YOLOE+_s)
```
pip3 install onnx onnxsim onnxruntime
-python3 export_ppyoloe.py -w ppyoloe_plus_crn_s_80e_coco.pdparams -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml --simplify
+python3 export_ppyoloe.py -w ppyoloe_plus_crn_s_80e_coco.pdparams -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml --simplify --dynamic
+```
+**NOTE**: To simplify the ONNX model
+```
+--simplify
+```
+**NOTE**: To use dynamic batch-size
+```
+--dynamic
+```
+**NOTE**: To use implicit batch-size (example for batch-size = 4)
+```
+--batch 4
```
**NOTE**: If you are using DeepStream 5.1, use opset 12 or lower. The default opset is 11.
@@ -84,7 +102,7 @@ Open the `DeepStream-Yolo` folder and compile the lib
* DeepStream 5.1 on x86 platform
```
-CUDA_VER=11.1 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
+CUDA_VER=11.1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.2 / 6.1.1 / 6.1 on Jetson platform
@@ -93,18 +111,12 @@ Open the `DeepStream-Yolo` folder and compile the lib
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 6.0.1 / 6.0 on Jetson platform
+* DeepStream 6.0.1 / 6.0 / 5.1 on Jetson platform
```
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 5.1 on Jetson platform
-```
-CUDA_VER=10.2 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
-```
##
### Edit the config_infer_primary_ppyoloe_plus file
@@ -115,7 +127,6 @@ Edit the `config_infer_primary_ppyoloe_plus.txt` file according to your model (e
[property]
...
onnx-file=ppyoloe_plus_crn_s_80e_coco.onnx
-model-engine-file=ppyoloe_plus_crn_s_80e_coco.onnx_b1_gpu0_fp32.engine
...
num-detected-classes=80
...
@@ -128,13 +139,17 @@ parse-bbox-func-name=NvDsInferParseYoloE
**NOTE**: The **PP-YOLOE+ and PP-YOLOE legacy** do not resize the input with padding. To get better accuracy, use
```
+...
maintain-aspect-ratio=0
+...
```
**NOTE**: The **PP-YOLOE+** uses zero mean normalization in the image preprocessing. It is important to change the `net-scale-factor` according to the trained values.
```
+...
net-scale-factor=0.0039215697906911373
+...
```
**NOTE**: The **PP-YOLOE legacy** uses normalization in the image preprocessing. It is important to change the `net-scale-factor` and `offsets` according to the trained values.
@@ -142,8 +157,18 @@ net-scale-factor=0.0039215697906911373
Default: `mean = 0.485, 0.456, 0.406` and `std = 0.229, 0.224, 0.225`
```
+...
net-scale-factor=0.0173520735727919486
offsets=123.675;116.28;103.53
+...
+```
+**NOTE**: By default, the dynamic batch-size is set. To use implicit batch-size, uncomment the line
+```
+...
+force-implicit-batch-dim=1
+...
```
##


@@ -46,6 +46,24 @@ Generate the ONNX model file (example for YOLO-NAS S)
python3 export_yolonas.py -m yolo_nas_s -w yolo_nas_s_coco.pth --simplify --dynamic
```
+**NOTE**: To simplify the ONNX model
+```
+--simplify
+```
+**NOTE**: To use dynamic batch-size
+```
+--dynamic
+```
+**NOTE**: To use implicit batch-size (example for batch-size = 4)
+```
+--batch 4
+```
**NOTE**: If you are using DeepStream 5.1, use opset 12 or lower. The default opset is 14.
```
@@ -128,7 +146,7 @@ Open the `DeepStream-Yolo` folder and compile the lib
* DeepStream 5.1 on x86 platform
```
-CUDA_VER=11.1 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
+CUDA_VER=11.1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.2 / 6.1.1 / 6.1 on Jetson platform
@@ -137,18 +155,12 @@ Open the `DeepStream-Yolo` folder and compile the lib
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 6.0.1 / 6.0 on Jetson platform
+* DeepStream 6.0.1 / 6.0 / 5.1 on Jetson platform
```
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 5.1 on Jetson platform
-```
-CUDA_VER=10.2 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
-```
##
### Edit the config_infer_primary_yolonas file
@@ -159,7 +171,6 @@ Edit the `config_infer_primary_yolonas.txt` file according to your model (exampl
[property]
...
onnx-file=yolo_nas_s_coco.onnx
-model-engine-file=yolo_nas_s_coco.onnx_b1_gpu0_fp32.engine
...
num-detected-classes=80
...
@@ -170,8 +181,18 @@ parse-bbox-func-name=NvDsInferParseYoloE
**NOTE**: The **YOLO-NAS** resizes the input with left/top padding. To get better accuracy, use
```
+...
maintain-aspect-ratio=1
symmetric-padding=0
+...
+```
+**NOTE**: By default, the dynamic batch-size is set. To use implicit batch-size, uncomment the line
+```
+...
+force-implicit-batch-dim=1
+...
```
##


@@ -55,6 +55,24 @@ Generate the ONNX model file
python3 export_yolor.py -w yolor-p6.pt --simplify --dynamic
```
+**NOTE**: To simplify the ONNX model
+```
+--simplify
+```
+**NOTE**: To use dynamic batch-size
+```
+--dynamic
+```
+**NOTE**: To use implicit batch-size (example for batch-size = 4)
+```
+--batch 4
+```
**NOTE**: If you are using DeepStream 5.1, use opset 12 or lower. The default opset is 12.
```
@@ -125,7 +143,7 @@ Open the `DeepStream-Yolo` folder and compile the lib
* DeepStream 5.1 on x86 platform
```
-CUDA_VER=11.1 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
+CUDA_VER=11.1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.2 / 6.1.1 / 6.1 on Jetson platform
@@ -134,18 +152,12 @@ Open the `DeepStream-Yolo` folder and compile the lib
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 6.0.1 / 6.0 on Jetson platform
+* DeepStream 6.0.1 / 6.0 / 5.1 on Jetson platform
```
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 5.1 on Jetson platform
-```
-CUDA_VER=10.2 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
-```
##
### Edit the config_infer_primary_yolor file
@@ -156,7 +168,6 @@ Edit the `config_infer_primary_yolor.txt` file according to your model (example
[property] [property]
... ...
onnx-file=yolor_csp.onnx onnx-file=yolor_csp.onnx
model-engine-file=yolor_csp.onnx_b1_gpu0_fp32.engine
... ...
num-detected-classes=80 num-detected-classes=80
... ...
@@ -167,8 +178,18 @@ parse-bbox-func-name=NvDsInferParseYolo
**NOTE**: The **YOLOR** resizes the input with center padding. To get better accuracy, use **NOTE**: The **YOLOR** resizes the input with center padding. To get better accuracy, use
``` ```
...
maintain-aspect-ratio=1 maintain-aspect-ratio=1
symmetric-padding=1 symmetric-padding=1
...
```
**NOTE**: By default, the dynamic batch-size is set. To use implicit batch-size, uncomment the line
```
...
force-implicit-batch-dim=1
...
``` ```
## ##

View File

@@ -46,6 +46,24 @@ Generate the ONNX model file (example for YOLOX-s)
python3 export_yolox.py -w yolox_s.pth -c exps/default/yolox_s.py --simplify --dynamic
```
+**NOTE**: To simplify the ONNX model
+```
+--simplify
+```
+**NOTE**: To use dynamic batch-size
+```
+--dynamic
+```
+**NOTE**: To use implicit batch-size (example for batch-size = 4)
+```
+--batch 4
+```
**NOTE**: If you are using DeepStream 5.1, use opset 12 or lower. The default opset is 11.
```
@@ -89,7 +107,7 @@ Open the `DeepStream-Yolo` folder and compile the lib
* DeepStream 5.1 on x86 platform
```
-CUDA_VER=11.1 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
+CUDA_VER=11.1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.2 / 6.1.1 / 6.1 on Jetson platform
@@ -98,18 +116,12 @@ Open the `DeepStream-Yolo` folder and compile the lib
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 6.0.1 / 6.0 on Jetson platform
+* DeepStream 6.0.1 / 6.0 / 5.1 on Jetson platform
```
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 5.1 on Jetson platform
-```
-CUDA_VER=10.2 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
-```
##
### Edit the config_infer_primary_yolox file
@@ -120,7 +132,6 @@ Edit the `config_infer_primary_yolox.txt` file according to your model (example
[property]
...
onnx-file=yolox_s.onnx
-model-engine-file=yolox_s.onnx_b1_gpu0_fp32.engine
...
num-detected-classes=80
...
@@ -133,14 +144,18 @@ parse-bbox-func-name=NvDsInferParseYolo
**NOTE**: The **YOLOX and YOLOX legacy** resize the input with left/top padding. To get better accuracy, use
```
+...
maintain-aspect-ratio=1
symmetric-padding=0
+...
```
**NOTE**: The **YOLOX** uses no normalization on the image preprocess. It is important to change the `net-scale-factor` according to the trained values.
```
+...
net-scale-factor=1
+...
```
**NOTE**: The **YOLOX legacy** uses normalization on the image preprocess. It is important to change the `net-scale-factor` and `offsets` according to the trained values.
@@ -148,8 +163,18 @@ net-scale-factor=1
Default: `mean = 0.485, 0.456, 0.406` and `std = 0.229, 0.224, 0.225`
```
+...
net-scale-factor=0.0173520735727919486
offsets=123.675;116.28;103.53
+...
+```
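As a quick sanity check on those values (plain arithmetic, using the nvinfer preprocessing formula `y = net-scale-factor * (x - offsets)`):
```
offsets = 255 * mean = 123.675;116.28;103.53
net-scale-factor = 1 / (255 * avg(std)) = 1 / (255 * (0.229 + 0.224 + 0.225) / 3) ≈ 0.0173520736
```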
+**NOTE**: By default, the dynamic batch-size is set. To use implicit batch-size, uncomment the line
+```
+...
+force-implicit-batch-dim=1
+...
```
##

View File

@@ -47,6 +47,24 @@ Generate the ONNX model file (example for YOLOv5s)
python3 export_yoloV5.py -w yolov5s.pt --simplify --dynamic
```
+**NOTE**: To simplify the ONNX model
+```
+--simplify
+```
+**NOTE**: To use dynamic batch-size
+```
+--dynamic
+```
+**NOTE**: To use implicit batch-size (example for batch-size = 4)
+```
+--batch 4
+```
**NOTE**: If you are using DeepStream 5.1, use opset 12 or lower. The default opset is 17.
```
@@ -117,7 +135,7 @@ Open the `DeepStream-Yolo` folder and compile the lib
* DeepStream 5.1 on x86 platform
```
-CUDA_VER=11.1 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
+CUDA_VER=11.1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.2 / 6.1.1 / 6.1 on Jetson platform
@@ -126,18 +144,12 @@ Open the `DeepStream-Yolo` folder and compile the lib
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 6.0.1 / 6.0 on Jetson platform
+* DeepStream 6.0.1 / 6.0 / 5.1 on Jetson platform
```
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 5.1 on Jetson platform
-```
-CUDA_VER=10.2 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
-```
##
### Edit the config_infer_primary_yoloV5 file
@@ -148,7 +160,6 @@ Edit the `config_infer_primary_yoloV5.txt` file according to your model (example
[property]
...
onnx-file=yolov5s.onnx
-model-engine-file=yolov5s.onnx_b1_gpu0_fp32.engine
...
num-detected-classes=80
...
@@ -159,8 +170,18 @@ parse-bbox-func-name=NvDsInferParseYolo
**NOTE**: The **YOLOv5** resizes the input with center padding. To get better accuracy, use
```
+...
maintain-aspect-ratio=1
symmetric-padding=1
+...
+```
+**NOTE**: By default, the dynamic batch-size is set. To use implicit batch-size, uncomment the line
+```
+...
+force-implicit-batch-dim=1
+...
```
##

View File

@@ -47,6 +47,24 @@ Generate the ONNX model file (example for YOLOv6-S 4.0)
python3 export_yoloV6.py -w yolov6s.pt --simplify --dynamic
```
+**NOTE**: To simplify the ONNX model
+```
+--simplify
+```
+**NOTE**: To use dynamic batch-size
+```
+--dynamic
+```
+**NOTE**: To use implicit batch-size (example for batch-size = 4)
+```
+--batch 4
+```
**NOTE**: If you are using DeepStream 5.1, use opset 12 or lower. The default opset is 13.
```
@@ -117,7 +135,7 @@ Open the `DeepStream-Yolo` folder and compile the lib
* DeepStream 5.1 on x86 platform
```
-CUDA_VER=11.1 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
+CUDA_VER=11.1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.2 / 6.1.1 / 6.1 on Jetson platform
@@ -126,18 +144,12 @@ Open the `DeepStream-Yolo` folder and compile the lib
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 6.0.1 / 6.0 on Jetson platform
+* DeepStream 6.0.1 / 6.0 / 5.1 on Jetson platform
```
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 5.1 on Jetson platform
-```
-CUDA_VER=10.2 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
-```
##
### Edit the config_infer_primary_yoloV6 file
@@ -148,7 +160,6 @@ Edit the `config_infer_primary_yoloV6.txt` file according to your model (example
[property]
...
onnx-file=yolov6s.onnx
-model-engine-file=yolov6s.onnx_b1_gpu0_fp32.engine
...
num-detected-classes=80
...
@@ -159,8 +170,18 @@ parse-bbox-func-name=NvDsInferParseYolo
**NOTE**: The **YOLOv6** resizes the input with center padding. To get better accuracy, use
```
+...
maintain-aspect-ratio=1
symmetric-padding=1
+...
+```
+**NOTE**: By default, the dynamic batch-size is set. To use implicit batch-size, uncomment the line
+```
+...
+force-implicit-batch-dim=1
+...
```
##

View File

@@ -49,6 +49,24 @@ Generate the ONNX model file (example for YOLOv7)
python3 export_yoloV7.py -w yolov7.pt --simplify --dynamic
```
+**NOTE**: To simplify the ONNX model
+```
+--simplify
+```
+**NOTE**: To use dynamic batch-size
+```
+--dynamic
+```
+**NOTE**: To use implicit batch-size (example for batch-size = 4)
+```
+--batch 4
+```
**NOTE**: If you are using DeepStream 5.1, use opset 12 or lower. The default opset is 12.
```
@@ -119,7 +137,7 @@ Open the `DeepStream-Yolo` folder and compile the lib
* DeepStream 5.1 on x86 platform
```
-CUDA_VER=11.1 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
+CUDA_VER=11.1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.2 / 6.1.1 / 6.1 on Jetson platform
@@ -128,18 +146,12 @@ Open the `DeepStream-Yolo` folder and compile the lib
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 6.0.1 / 6.0 on Jetson platform
+* DeepStream 6.0.1 / 6.0 / 5.1 on Jetson platform
```
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 5.1 on Jetson platform
-```
-CUDA_VER=10.2 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
-```
##
### Edit the config_infer_primary_yoloV7 file
@@ -150,7 +162,6 @@ Edit the `config_infer_primary_yoloV7.txt` file according to your model (example
[property]
...
onnx-file=yolov7.onnx
-model-engine-file=yolov7.onnx_b1_gpu0_fp32.engine
...
num-detected-classes=80
...
@@ -161,8 +172,18 @@ parse-bbox-func-name=NvDsInferParseYolo
**NOTE**: The **YOLOv7** resizes the input with center padding. To get better accuracy, use
```
+...
maintain-aspect-ratio=1
symmetric-padding=1
+...
+```
+**NOTE**: By default, the dynamic batch-size is set. To use implicit batch-size, uncomment the line
+```
+...
+force-implicit-batch-dim=1
+...
```
##

View File

@@ -46,6 +46,24 @@ Generate the ONNX model file (example for YOLOv8s)
python3 export_yoloV8.py -w yolov8s.pt --simplify --dynamic
```
+**NOTE**: To simplify the ONNX model
+```
+--simplify
+```
+**NOTE**: To use dynamic batch-size
+```
+--dynamic
+```
+**NOTE**: To use implicit batch-size (example for batch-size = 4)
+```
+--batch 4
+```
**NOTE**: If you are using DeepStream 5.1, use opset 12 or lower. The default opset is 16.
```
@@ -110,7 +128,7 @@ Open the `DeepStream-Yolo` folder and compile the lib
* DeepStream 5.1 on x86 platform
```
-CUDA_VER=11.1 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
+CUDA_VER=11.1 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.2 / 6.1.1 / 6.1 on Jetson platform
@@ -119,18 +137,12 @@ Open the `DeepStream-Yolo` folder and compile the lib
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 6.0.1 / 6.0 on Jetson platform
+* DeepStream 6.0.1 / 6.0 / 5.1 on Jetson platform
```
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
```
-* DeepStream 5.1 on Jetson platform
-```
-CUDA_VER=10.2 LEGACY=1 make -C nvdsinfer_custom_impl_Yolo
-```
##
### Edit the config_infer_primary_yoloV8 file
@@ -141,7 +153,6 @@ Edit the `config_infer_primary_yoloV8.txt` file according to your model (example
[property]
...
onnx-file=yolov8s.onnx
-model-engine-file=yolov8s.onnx_b1_gpu0_fp32.engine
...
num-detected-classes=80
...
@@ -152,8 +163,18 @@ parse-bbox-func-name=NvDsInferParseYolo
**NOTE**: The **YOLOv8** resizes the input with center padding. To get better accuracy, use
```
+...
maintain-aspect-ratio=1
symmetric-padding=1
+...
+```
+**NOTE**: By default, the dynamic batch-size is set. To use implicit batch-size, uncomment the line
+```
+...
+force-implicit-batch-dim=1
+...
```
##

88
docs/benchmarks.md Normal file
View File

@@ -0,0 +1,88 @@
# Benchmarks
### Config
```
board = NVIDIA Tesla V100 16GB (AWS: p3.2xlarge)
batch-size = 1
eval = val2017 (COCO)
sample = 1920x1080 video
```
**NOTE**: `maintain-aspect-ratio=1` was used in the config_infer file for Darknet (with `letter_box=1`) and PyTorch models.
### NMS config
- Eval
```
nms-iou-threshold = 0.6 (Darknet) / 0.65 (YOLOv5, YOLOv6, YOLOv7, YOLOR and YOLOX) / 0.7 (Paddle, YOLO-NAS, DAMO-YOLO, YOLOv8 and YOLOv7-u6)
pre-cluster-threshold = 0.001
topk = 300
```
- Test
```
nms-iou-threshold = 0.45
pre-cluster-threshold = 0.25
topk = 300
```
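These values map to the `[class-attrs-all]` section of the config_infer file; a minimal sketch for the eval settings above (taking the Darknet IoU value as the example):
```
[class-attrs-all]
nms-iou-threshold=0.6
pre-cluster-threshold=0.001
topk=300
```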
### Results
**NOTE**: * = PyTorch.
**NOTE**: ** = YOLOv4 was trained on the trainvalno5k set, so its mAP is inflated on the val2017 eval.
**NOTE**: star = DAMO-YOLO model trained with distillation.
**NOTE**: The V100 GPU decoder maxes out at 625-635 FPS on DeepStream, even with lighter models.
**NOTE**: The GPU bbox parser is a bit slower than the CPU bbox parser in the V100 GPU tests.
| Model | Precision | Resolution | IoU=0.5:0.95 | IoU=0.5 | IoU=0.75 | FPS<br />(without display) |
|:------------------:|:---------:|:----------:|:------------:|:-------:|:--------:|:--------------------------:|
| YOLO-NAS L | FP16 | 640 | 0.484 | 0.658 | 0.532 | 235.27 |
| YOLO-NAS M | FP16 | 640 | 0.480 | 0.651 | 0.524 | 287.39 |
| YOLO-NAS S | FP16 | 640 | 0.442 | 0.614 | 0.485 | 478.52 |
| PP-YOLOE+_x | FP16 | 640 | 0.528 | 0.705 | 0.579 | 121.17 |
| PP-YOLOE+_l | FP16 | 640 | 0.511 | 0.686 | 0.557 | 191.82 |
| PP-YOLOE+_m | FP16 | 640 | 0.483 | 0.658 | 0.528 | 264.39 |
| PP-YOLOE+_s | FP16 | 640 | 0.424 | 0.594 | 0.464 | 476.13 |
| PP-YOLOE-s (400) | FP16 | 640 | 0.423 | 0.589 | 0.463 | 461.23 |
| DAMO-YOLO-L star | FP16 | 640 | 0.502 | 0.674 | 0.551 | 176.93 |
| DAMO-YOLO-M star | FP16 | 640 | 0.485 | 0.656 | 0.530 | 242.24 |
| DAMO-YOLO-S star | FP16 | 640 | 0.460 | 0.631 | 0.502 | 385.09 |
| DAMO-YOLO-S | FP16 | 640 | 0.445 | 0.611 | 0.486 | 378.68 |
| DAMO-YOLO-T star | FP16 | 640 | 0.419 | 0.586 | 0.455 | 492.24 |
| DAMO-YOLO-Nl | FP16 | 416 | 0.392 | 0.559 | 0.423 | 483.73 |
| DAMO-YOLO-Nm | FP16 | 416 | 0.371 | 0.532 | 0.402 | 555.94 |
| DAMO-YOLO-Ns | FP16 | 416 | 0.312 | 0.460 | 0.335 | 627.67 |
| YOLOX-x | FP16 | 640 | 0.447 | 0.616 | 0.483 | 125.40 |
| YOLOX-l | FP16 | 640 | 0.430 | 0.598 | 0.466 | 193.10 |
| YOLOX-m | FP16 | 640 | 0.397 | 0.566 | 0.431 | 298.61 |
| YOLOX-s | FP16 | 640 | 0.335 | 0.502 | 0.365 | 522.05 |
| YOLOX-s legacy | FP16 | 640 | 0.375 | 0.569 | 0.407 | 518.52 |
| YOLOX-Darknet | FP16 | 640 | 0.414 | 0.595 | 0.453 | 212.88 |
| YOLOX-Tiny | FP16 | 640 | 0.274 | 0.427 | 0.292 | 633.95 |
| YOLOX-Nano | FP16 | 640 | 0.212 | 0.342 | 0.222 | 633.04 |
| YOLOv8x | FP16 | 640 | 0.499 | 0.669 | 0.545 | 130.49 |
| YOLOv8l | FP16 | 640 | 0.491 | 0.660 | 0.535 | 180.75 |
| YOLOv8m | FP16 | 640 | 0.468 | 0.637 | 0.510 | 278.08 |
| YOLOv8s | FP16 | 640 | 0.415 | 0.578 | 0.453 | 493.45 |
| YOLOv8n | FP16 | 640 | 0.343 | 0.492 | 0.373 | 627.43 |
| YOLOv7-u6 | FP16 | 640 | 0.484 | 0.652 | 0.530 | 193.54 |
| YOLOv7x* | FP16 | 640 | 0.496 | 0.679 | 0.536 | 155.07 |
| YOLOv7* | FP16 | 640 | 0.476 | 0.660 | 0.518 | 226.01 |
| YOLOv7-Tiny Leaky* | FP16 | 640 | 0.345 | 0.516 | 0.372 | 626.23 |
| YOLOv7-Tiny Leaky* | FP16 | 416 | 0.328 | 0.493 | 0.349 | 633.90 |
| YOLOv6-L 4.0 | FP16 | 640 | 0.490 | 0.671 | 0.535 | 178.41 |
| YOLOv6-M 4.0 | FP16 | 640 | 0.460 | 0.635 | 0.502 | 293.39 |
| YOLOv6-S 4.0 | FP16 | 640 | 0.416 | 0.585 | 0.453 | 513.90 |
| YOLOv6-N 4.0 | FP16 | 640 | 0.349 | 0.503 | 0.378 | 633.37 |
| YOLOv5x 7.0 | FP16 | 640 | 0.471 | 0.652 | 0.513 | 149.93 |
| YOLOv5l 7.0 | FP16 | 640 | 0.455 | 0.637 | 0.497 | 235.55 |
| YOLOv5m 7.0 | FP16 | 640 | 0.421 | 0.604 | 0.459 | 351.69 |
| YOLOv5s 7.0 | FP16 | 640 | 0.344 | 0.529 | 0.372 | 618.13 |
| YOLOv5n 7.0 | FP16 | 640 | 0.247 | 0.414 | 0.257 | 629.66 |

684
docs/dGPUInstalation.md Normal file
View File

@@ -0,0 +1,684 @@
# dGPU installation
To install DeepStream on dGPU (x86 platform) without Docker, a few steps are needed to prepare the computer.
<details><summary>DeepStream 6.2</summary>
### 1. Disable Secure Boot in BIOS
### 2. Install dependencies
```
sudo apt-get update
sudo apt-get install gcc make git libtool autoconf autogen pkg-config cmake
sudo apt-get install python3 python3-dev python3-pip
sudo apt-get install dkms
sudo apt install libssl1.1 libgstreamer1.0-0 gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav libgstreamer-plugins-base1.0-dev libgstrtspserver-1.0-0 libjansson4 libyaml-cpp-dev libjsoncpp-dev protobuf-compiler
sudo apt-get install linux-headers-$(uname -r)
```
**NOTE**: Purge all NVIDIA drivers, CUDA, etc. (replace $CUDA_PATH with your CUDA path)
```
sudo nvidia-uninstall
sudo $CUDA_PATH/bin/cuda-uninstaller
sudo apt-get remove --purge '*nvidia*'
sudo apt-get remove --purge '*cuda*'
sudo apt-get remove --purge '*cudnn*'
sudo apt-get remove --purge '*tensorrt*'
sudo apt autoremove --purge && sudo apt autoclean && sudo apt clean
```
### 3. Install CUDA Keyring
```
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
```
### 4. Download and install NVIDIA Driver
<details><summary>TITAN, GeForce RTX / GTX series and RTX / Quadro series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/525.105.17/NVIDIA-Linux-x86_64-525.105.17.run
```
<blockquote><details><summary>Laptop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-525.105.17.run --no-cc-version-check --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: This step will disable the nouveau drivers.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-525.105.17.run --no-cc-version-check --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: If you are using a laptop with NVIDIA Optimus, run
```
sudo apt-get install nvidia-prime
sudo prime-select nvidia
```
</details></blockquote>
<blockquote><details><summary>Desktop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-525.105.17.run --no-cc-version-check --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
**NOTE**: This step will disable the nouveau drivers.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-525.105.17.run --no-cc-version-check --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
</details></blockquote>
</blockquote></details>
<details><summary>Data center / Tesla series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/525.105.17/NVIDIA-Linux-x86_64-525.105.17.run
```
* Run
```
sudo sh NVIDIA-Linux-x86_64-525.105.17.run --no-cc-version-check --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
</blockquote></details>
### 5. Download and install CUDA
```
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run --silent --toolkit
```
* Export environment variables
```
echo $'export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}\nexport LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc && source ~/.bashrc
```
### 6. Install TensorRT
```
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get install libnvinfer8=8.5.2-1+cuda11.8 libnvinfer-plugin8=8.5.2-1+cuda11.8 libnvparsers8=8.5.2-1+cuda11.8 libnvonnxparsers8=8.5.2-1+cuda11.8 libnvinfer-bin=8.5.2-1+cuda11.8 libnvinfer-dev=8.5.2-1+cuda11.8 libnvinfer-plugin-dev=8.5.2-1+cuda11.8 libnvparsers-dev=8.5.2-1+cuda11.8 libnvonnxparsers-dev=8.5.2-1+cuda11.8 libnvinfer-samples=8.5.2-1+cuda11.8 libcudnn8=8.7.0.84-1+cuda11.8 libcudnn8-dev=8.7.0.84-1+cuda11.8 python3-libnvinfer=8.5.2-1+cuda11.8 python3-libnvinfer-dev=8.5.2-1+cuda11.8
sudo apt-mark hold libnvinfer* libnvparsers* libnvonnxparsers* libcudnn8* python3-libnvinfer* tensorrt
```
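To confirm the version pins took effect, the held packages can be listed (standard apt command):
```
apt-mark showhold
```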
### 7. Download from [NVIDIA website](https://developer.nvidia.com/deepstream-getting-started) and install the DeepStream SDK
DeepStream 6.2 for Servers and Workstations (.deb)
```
sudo apt-get install ./deepstream-6.2_6.2.0-1_amd64.deb
rm ${HOME}/.cache/gstreamer-1.0/registry.x86_64.bin
sudo ln -snf /usr/local/cuda-11.8 /usr/local/cuda
```
### 8. Reboot the computer
```
sudo reboot
```
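After the reboot, a quick sanity check of the stack (a sketch; `deepstream-app` is installed by the SDK package and `nvcc` is on the PATH from step 5):
```
nvidia-smi
nvcc --version
deepstream-app --version-all
```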
</details>
<details><summary>DeepStream 6.1.1</summary>
### 1. Disable Secure Boot in BIOS
### 2. Install dependencies
```
sudo apt-get update
sudo apt-get install gcc make git libtool autoconf autogen pkg-config cmake
sudo apt-get install python3 python3-dev python3-pip
sudo apt-get install dkms
sudo apt-get install libssl1.1 libgstreamer1.0-0 gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav libgstreamer-plugins-base1.0-dev libgstrtspserver-1.0-0 libjansson4 libyaml-cpp-dev
sudo apt-get install linux-headers-$(uname -r)
```
**NOTE**: Purge all NVIDIA drivers, CUDA, etc. (replace $CUDA_PATH with your CUDA path)
```
sudo nvidia-uninstall
sudo $CUDA_PATH/bin/cuda-uninstaller
sudo apt-get remove --purge '*nvidia*'
sudo apt-get remove --purge '*cuda*'
sudo apt-get remove --purge '*cudnn*'
sudo apt-get remove --purge '*tensorrt*'
sudo apt autoremove --purge && sudo apt autoclean && sudo apt clean
```
### 3. Install CUDA Keyring
```
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
```
### 4. Download and install NVIDIA Driver
<details><summary>TITAN, GeForce RTX / GTX series and RTX / Quadro series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/515.65.01/NVIDIA-Linux-x86_64-515.65.01.run
```
<blockquote><details><summary>Laptop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-515.65.01.run --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: This step will disable the nouveau drivers.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-515.65.01.run --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: If you are using a laptop with NVIDIA Optimus, run
```
sudo apt-get install nvidia-prime
sudo prime-select nvidia
```
</details></blockquote>
<blockquote><details><summary>Desktop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-515.65.01.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
**NOTE**: This step will disable the nouveau drivers.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-515.65.01.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
</details></blockquote>
</blockquote></details>
<details><summary>Data center / Tesla series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/tesla/515.65.01/NVIDIA-Linux-x86_64-515.65.01.run
```
* Run
```
sudo sh NVIDIA-Linux-x86_64-515.65.01.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
</blockquote></details>
### 5. Download and install CUDA
```
wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run
sudo sh cuda_11.7.1_515.65.01_linux.run --silent --toolkit
```
* Export environment variables
```
echo $'export PATH=/usr/local/cuda-11.7/bin${PATH:+:${PATH}}\nexport LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc && source ~/.bashrc
```
### 6. Download from [NVIDIA website](https://developer.nvidia.com/nvidia-tensorrt-8x-download) and install the TensorRT
TensorRT 8.4 GA for Ubuntu 20.04 and CUDA 11.0, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6 and 11.7 DEB local repo Package
```
sudo dpkg -i nv-tensorrt-repo-ubuntu2004-cuda11.6-trt8.4.1.5-ga-20220604_1-1_amd64.deb
sudo apt-key add /var/nv-tensorrt-repo-ubuntu2004-cuda11.6-trt8.4.1.5-ga-20220604/9a60d8bf.pub
sudo apt-get update
sudo apt-get install libnvinfer8=8.4.1-1+cuda11.6 libnvinfer-plugin8=8.4.1-1+cuda11.6 libnvparsers8=8.4.1-1+cuda11.6 libnvonnxparsers8=8.4.1-1+cuda11.6 libnvinfer-bin=8.4.1-1+cuda11.6 libnvinfer-dev=8.4.1-1+cuda11.6 libnvinfer-plugin-dev=8.4.1-1+cuda11.6 libnvparsers-dev=8.4.1-1+cuda11.6 libnvonnxparsers-dev=8.4.1-1+cuda11.6 libnvinfer-samples=8.4.1-1+cuda11.6 libcudnn8=8.4.1.50-1+cuda11.6 libcudnn8-dev=8.4.1.50-1+cuda11.6 python3-libnvinfer=8.4.1-1+cuda11.6 python3-libnvinfer-dev=8.4.1-1+cuda11.6
sudo apt-mark hold libnvinfer* libnvparsers* libnvonnxparsers* libcudnn8* tensorrt
```
### 7. Download from [NVIDIA website](https://developer.nvidia.com/deepstream-getting-started) and install the DeepStream SDK
DeepStream 6.1.1 for Servers and Workstations (.deb)
```
sudo apt-get install ./deepstream-6.1_6.1.1-1_amd64.deb
rm ${HOME}/.cache/gstreamer-1.0/registry.x86_64.bin
sudo ln -snf /usr/local/cuda-11.7 /usr/local/cuda
```
### 8. Reboot the computer
```
sudo reboot
```
</details>
<details><summary>DeepStream 6.1</summary>
### 1. Disable Secure Boot in BIOS
### 2. Install dependencies
```
sudo apt-get update
sudo apt-get install gcc make git libtool autoconf autogen pkg-config cmake
sudo apt-get install python3 python3-dev python3-pip
sudo apt-get install dkms
sudo apt-get install libssl1.1 libgstreamer1.0-0 gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav libgstrtspserver-1.0-0 libjansson4 libyaml-cpp-dev
sudo apt-get install linux-headers-$(uname -r)
```
**NOTE**: Purge all NVIDIA drivers, CUDA, etc. (replace $CUDA_PATH with your CUDA path)
```
sudo nvidia-uninstall
sudo $CUDA_PATH/bin/cuda-uninstaller
sudo apt-get remove --purge '*nvidia*'
sudo apt-get remove --purge '*cuda*'
sudo apt-get remove --purge '*cudnn*'
sudo apt-get remove --purge '*tensorrt*'
sudo apt autoremove --purge && sudo apt autoclean && sudo apt clean
```
### 3. Install CUDA Keyring
```
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
```
### 4. Download and install NVIDIA Driver
<details><summary>TITAN, GeForce RTX / GTX series and RTX / Quadro series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/510.47.03/NVIDIA-Linux-x86_64-510.47.03.run
```
<blockquote><details><summary>Laptop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-510.47.03.run --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: This step will disable the nouveau drivers.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-510.47.03.run --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: If you are using a laptop with NVIDIA Optimus, run
```
sudo apt-get install nvidia-prime
sudo prime-select nvidia
```
</details></blockquote>
<blockquote><details><summary>Desktop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-510.47.03.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
**NOTE**: This step will disable the nouveau drivers.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-510.47.03.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
</details></blockquote>
</blockquote></details>
<details><summary>Data center / Tesla series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/tesla/510.47.03/NVIDIA-Linux-x86_64-510.47.03.run
```
* Run
```
sudo sh NVIDIA-Linux-x86_64-510.47.03.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
</blockquote></details>
### 5. Download and install CUDA
```
wget https://developer.download.nvidia.com/compute/cuda/11.6.1/local_installers/cuda_11.6.1_510.47.03_linux.run
sudo sh cuda_11.6.1_510.47.03_linux.run --silent --toolkit
```
* Export environment variables
```
echo $'export PATH=/usr/local/cuda-11.6/bin${PATH:+:${PATH}}\nexport LD_LIBRARY_PATH=/usr/local/cuda-11.6/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc && source ~/.bashrc
```
### 6. Download from [NVIDIA website](https://developer.nvidia.com/nvidia-tensorrt-8x-download) and install the TensorRT
TensorRT 8.2 GA Update 4 for Ubuntu 20.04 and CUDA 11.0, 11.1, 11.2, 11.3, 11.4 and 11.5 DEB local repo Package
```
sudo dpkg -i nv-tensorrt-repo-ubuntu2004-cuda11.4-trt8.2.5.1-ga-20220505_1-1_amd64.deb
sudo apt-key add /var/nv-tensorrt-repo-ubuntu2004-cuda11.4-trt8.2.5.1-ga-20220505/82307095.pub
sudo apt-get update
sudo apt-get install libnvinfer8=8.2.5-1+cuda11.4 libnvinfer-plugin8=8.2.5-1+cuda11.4 libnvparsers8=8.2.5-1+cuda11.4 libnvonnxparsers8=8.2.5-1+cuda11.4 libnvinfer-bin=8.2.5-1+cuda11.4 libnvinfer-dev=8.2.5-1+cuda11.4 libnvinfer-plugin-dev=8.2.5-1+cuda11.4 libnvparsers-dev=8.2.5-1+cuda11.4 libnvonnxparsers-dev=8.2.5-1+cuda11.4 libnvinfer-samples=8.2.5-1+cuda11.4 libnvinfer-doc=8.2.5-1+cuda11.4 libcudnn8-dev=8.4.0.27-1+cuda11.6 libcudnn8=8.4.0.27-1+cuda11.6
sudo apt-mark hold libnvinfer* libnvparsers* libnvonnxparsers* libcudnn8* tensorrt
```
### 7. Download from [NVIDIA website](https://developer.nvidia.com/deepstream-sdk-download-tesla-archived) and install the DeepStream SDK
DeepStream 6.1 for Servers and Workstations (.deb)
```
sudo apt-get install ./deepstream-6.1_6.1.0-1_amd64.deb
rm ${HOME}/.cache/gstreamer-1.0/registry.x86_64.bin
sudo ln -snf /usr/local/cuda-11.6 /usr/local/cuda
```
### 8. Reboot the computer
```
sudo reboot
```
</details>
<details><summary>DeepStream 6.0.1 / 6.0</summary>
### 1. Disable Secure Boot in BIOS
<details><summary>If you are using a laptop with a newer Intel/AMD processor and Settings->Details->About shows the Graphics as llvmpipe, update the kernel.</summary>
```
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11/amd64/linux-headers-5.11.0-051100_5.11.0-051100.202102142330_all.deb
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11/amd64/linux-headers-5.11.0-051100-generic_5.11.0-051100.202102142330_amd64.deb
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11/amd64/linux-image-unsigned-5.11.0-051100-generic_5.11.0-051100.202102142330_amd64.deb
wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11/amd64/linux-modules-5.11.0-051100-generic_5.11.0-051100.202102142330_amd64.deb
sudo dpkg -i *.deb
sudo reboot
```
</details>
### 2. Install dependencies
```
sudo apt-get update
sudo apt-get install gcc make git libtool autoconf autogen pkg-config cmake
sudo apt-get install python3 python3-dev python3-pip
sudo apt-get install libssl1.0.0 libgstreamer1.0-0 gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav libgstrtspserver-1.0-0 libjansson4
sudo apt-get install linux-headers-$(uname -r)
```
**NOTE**: Install DKMS only if you are using the default Ubuntu kernel
```
sudo apt-get install dkms
```
**NOTE**: Purge all NVIDIA drivers, CUDA, etc. (replace $CUDA_PATH with your CUDA path)
```
sudo nvidia-uninstall
sudo $CUDA_PATH/bin/cuda-uninstaller
sudo apt-get remove --purge '*nvidia*'
sudo apt-get remove --purge '*cuda*'
sudo apt-get remove --purge '*cudnn*'
sudo apt-get remove --purge '*tensorrt*'
sudo apt autoremove --purge && sudo apt autoclean && sudo apt clean
```
### 3. Install CUDA Keyring
```
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
```
### 4. Download and install NVIDIA Driver
<details><summary>TITAN, GeForce RTX / GTX series and RTX / Quadro series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/470.129.06/NVIDIA-Linux-x86_64-470.129.06.run
```
<blockquote><details><summary>Laptop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-470.129.06.run --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: This step will disable the nouveau drivers.
**NOTE**: Remove the --dkms flag if you installed the 5.11.0 kernel.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-470.129.06.run --silent --disable-nouveau --dkms --install-libglvnd
```
**NOTE**: Remove the --dkms flag if you installed the 5.11.0 kernel.
**NOTE**: If you are using a laptop with NVIDIA Optimus, run
```
sudo apt-get install nvidia-prime
sudo prime-select nvidia
```
</details></blockquote>
<blockquote><details><summary>Desktop</summary>
* Run
```
sudo sh NVIDIA-Linux-x86_64-470.129.06.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
**NOTE**: This step will disable the nouveau drivers.
**NOTE**: Remove the --dkms flag if you installed the 5.11.0 kernel.
* Reboot
```
sudo reboot
```
* Install
```
sudo sh NVIDIA-Linux-x86_64-470.129.06.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
**NOTE**: Remove the --dkms flag if you installed the 5.11.0 kernel.
</details></blockquote>
</blockquote></details>
<details><summary>Data center / Tesla series</summary><blockquote>
- Download
```
wget https://us.download.nvidia.com/tesla/470.129.06/NVIDIA-Linux-x86_64-470.129.06.run
```
* Run
```
sudo sh NVIDIA-Linux-x86_64-470.129.06.run --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
```
**NOTE**: Remove the --dkms flag if you installed the 5.11.0 kernel.
</blockquote></details>
### 5. Download and install CUDA
```
wget https://developer.download.nvidia.com/compute/cuda/11.4.1/local_installers/cuda_11.4.1_470.57.02_linux.run
sudo sh cuda_11.4.1_470.57.02_linux.run --silent --toolkit
```
* Export environment variables
```
echo $'export PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}\nexport LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc && source ~/.bashrc
```
### 6. Download from [NVIDIA website](https://developer.nvidia.com/nvidia-tensorrt-8x-download) and install the TensorRT
TensorRT 8.0.1 GA for Ubuntu 18.04 and CUDA 11.3 DEB local repo package
```
sudo dpkg -i nv-tensorrt-repo-ubuntu1804-cuda11.3-trt8.0.1.6-ga-20210626_1-1_amd64.deb
sudo apt-key add /var/nv-tensorrt-repo-ubuntu1804-cuda11.3-trt8.0.1.6-ga-20210626/7fa2af80.pub
sudo apt-get update
sudo apt-get install libnvinfer8=8.0.1-1+cuda11.3 libnvinfer-plugin8=8.0.1-1+cuda11.3 libnvparsers8=8.0.1-1+cuda11.3 libnvonnxparsers8=8.0.1-1+cuda11.3 libnvinfer-bin=8.0.1-1+cuda11.3 libnvinfer-dev=8.0.1-1+cuda11.3 libnvinfer-plugin-dev=8.0.1-1+cuda11.3 libnvparsers-dev=8.0.1-1+cuda11.3 libnvonnxparsers-dev=8.0.1-1+cuda11.3 libnvinfer-samples=8.0.1-1+cuda11.3 libnvinfer-doc=8.0.1-1+cuda11.3 libcudnn8-dev=8.2.1.32-1+cuda11.3 libcudnn8=8.2.1.32-1+cuda11.3
sudo apt-mark hold libnvinfer* libnvparsers* libnvonnxparsers* libcudnn8* tensorrt
```
### 7. Download from [NVIDIA website](https://developer.nvidia.com/deepstream-sdk-download-tesla-archived) and install the DeepStream SDK
* DeepStream 6.0.1 for Servers and Workstations (.deb)
```
sudo apt-get install ./deepstream-6.0_6.0.1-1_amd64.deb
```
* DeepStream 6.0 for Servers and Workstations (.deb)
```
sudo apt-get install ./deepstream-6.0_6.0.0-1_amd64.deb
```
* Run
```
rm ${HOME}/.cache/gstreamer-1.0/registry.x86_64.bin
sudo ln -snf /usr/local/cuda-11.4 /usr/local/cuda
```
### 8. Reboot the computer
```
sudo reboot
```
</details>

View File

@@ -33,9 +33,9 @@ ifeq ($(OPENCV),)
OPENCV=0
endif
-LEGACY?=
-ifeq ($(LEGACY),)
-LEGACY=0
+GRAPH?=
+ifeq ($(GRAPH),)
+GRAPH=0
endif
CC:= g++
@@ -50,13 +50,13 @@ ifeq ($(OPENCV), 1)
LIBS+= $(shell pkg-config --libs opencv4 2> /dev/null || pkg-config --libs opencv)
endif
-ifeq ($(LEGACY), 1)
-COMMON+= -DLEGACY
+ifeq ($(GRAPH), 1)
+COMMON+= -DGRAPH
endif
CUFLAGS:= -I/opt/nvidia/deepstream/deepstream/sources/includes -I/usr/local/cuda-$(CUDA_VER)/include
-LIBS+= -lnvinfer_plugin -lnvinfer -lnvparsers -L/usr/local/cuda-$(CUDA_VER)/lib64 -lcudart -lcublas -lstdc++fs
+LIBS+= -lnvinfer_plugin -lnvinfer -lnvparsers -lnvonnxparser -L/usr/local/cuda-$(CUDA_VER)/lib64 -lcudart -lcublas -lstdc++fs
LFLAGS:= -shared -Wl,--start-group $(LIBS) -Wl,--end-group
INCS:= $(wildcard *.h)
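With `LEGACY` renamed to `GRAPH`, the switch is passed the same way as before (a sketch; `CUDA_VER=11.8` assumes DeepStream 6.2 on x86):
```
CUDA_VER=11.8 GRAPH=1 make -C nvdsinfer_custom_impl_Yolo
```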

View File

@@ -8,17 +8,18 @@
#include <fstream>
#include <iterator>

-Int8EntropyCalibrator2::Int8EntropyCalibrator2(const int& batchsize, const int& channels, const int& height,
-    const int& width, const int& letterbox, const std::string& imgPath,
-    const std::string& calibTablePath) : batchSize(batchsize), inputC(channels), inputH(height), inputW(width),
-    letterBox(letterbox), calibTablePath(calibTablePath), imageIndex(0)
+Int8EntropyCalibrator2::Int8EntropyCalibrator2(const int& batchSize, const int& channels, const int& height, const int& width,
+    const float& scaleFactor, const float* offsets, const std::string& imgPath, const std::string& calibTablePath) :
+    batchSize(batchSize), inputC(channels), inputH(height), inputW(width), scaleFactor(scaleFactor), offsets(offsets),
+    calibTablePath(calibTablePath), imageIndex(0)
{
-  inputCount = batchsize * channels * height * width;
+  inputCount = batchSize * channels * height * width;
  std::fstream f(imgPath);
  if (f.is_open()) {
    std::string temp;
-   while (std::getline(f, temp))
+   while (std::getline(f, temp)) {
      imgPaths.push_back(temp);
+   }
  }
  batchData = new float[inputCount];
  CUDA_CHECK(cudaMalloc(&deviceInput, inputCount * sizeof(float)));
@@ -27,8 +28,9 @@ Int8EntropyCalibrator2::Int8EntropyCalibrator2(const int& batchsize, const int&
Int8EntropyCalibrator2::~Int8EntropyCalibrator2()
{
  CUDA_CHECK(cudaFree(deviceInput));
-  if (batchData)
+  if (batchData) {
    delete[] batchData;
+  }
}

int
@@ -40,24 +42,33 @@ Int8EntropyCalibrator2::getBatchSize() const noexcept
bool
Int8EntropyCalibrator2::getBatch(void** bindings, const char** names, int nbBindings) noexcept
{
-  if (imageIndex + batchSize > uint(imgPaths.size()))
+  if (imageIndex + batchSize > uint(imgPaths.size())) {
    return false;
+  }
  float* ptr = batchData;
  for (size_t i = imageIndex; i < imageIndex + batchSize; ++i) {
-   cv::Mat img = cv::imread(imgPaths[i], cv::IMREAD_COLOR);
-   std::vector<float> inputData = prepareImage(img, inputC, inputH, inputW, letterBox);
-   int len = (int) (inputData.size());
+   cv::Mat img = cv::imread(imgPaths[i]);
+   if (img.empty()) {
+     std::cerr << "Failed to read image for calibration" << std::endl;
+     return false;
+   }
+   std::vector<float> inputData = prepareImage(img, inputC, inputH, inputW, scaleFactor, offsets);
+   size_t len = inputData.size();
    memcpy(ptr, inputData.data(), len * sizeof(float));
    ptr += inputData.size();
    std::cout << "Load image: " << imgPaths[i] << std::endl;
-   std::cout << "Progress: " << (i + 1)*100. / imgPaths.size() << "%" << std::endl;
+   std::cout << "Progress: " << (i + 1) * 100. / imgPaths.size() << "%" << std::endl;
  }
  imageIndex += batchSize;
  CUDA_CHECK(cudaMemcpy(deviceInput, batchData, inputCount * sizeof(float), cudaMemcpyHostToDevice));
  bindings[0] = deviceInput;
  return true;
}
@@ -67,8 +78,9 @@ Int8EntropyCalibrator2::readCalibrationCache(std::size_t &length) noexcept
  calibrationCache.clear();
  std::ifstream input(calibTablePath, std::ios::binary);
  input >> std::noskipws;
-  if (readCache && input.good())
+  if (readCache && input.good()) {
    std::copy(std::istream_iterator<char>(input), std::istream_iterator<char>(), std::back_inserter(calibrationCache));
+  }
  length = calibrationCache.size();
  return length ? calibrationCache.data() : nullptr;
}
@@ -81,43 +93,24 @@ Int8EntropyCalibrator2::writeCalibrationCache(const void* cache, std::size_t len
}

std::vector<float>
-prepareImage(cv::Mat& img, int input_c, int input_h, int input_w, int letter_box)
+prepareImage(cv::Mat& img, int input_c, int input_h, int input_w, float scaleFactor, const float* offsets)
{
  cv::Mat out;
+ cv::cvtColor(img, out, cv::COLOR_BGR2RGB);
  int image_w = img.cols;
  int image_h = img.rows;
-  if (image_w != input_w || image_h != input_h) {
-    if (letter_box == 1) {
-      float ratio_w = (float) image_w / (float) input_w;
-      float ratio_h = (float) image_h / (float) input_h;
-      if (ratio_w > ratio_h) {
-        int new_width = input_w * ratio_h;
-        int x = (image_w - new_width) / 2;
-        cv::Rect roi(abs(x), 0, new_width, image_h);
-        out = img(roi);
-      }
-      else if (ratio_w < ratio_h) {
-        int new_height = input_h * ratio_w;
-        int y = (image_h - new_height) / 2;
-        cv::Rect roi(0, abs(y), image_w, new_height);
-        out = img(roi);
-      }
-      else
-        out = img;
-      cv::resize(out, out, cv::Size(input_w, input_h), 0, 0, cv::INTER_CUBIC);
-    }
-    else {
-      cv::resize(img, out, cv::Size(input_w, input_h), 0, 0, cv::INTER_CUBIC);
-    }
-    cv::cvtColor(out, out, cv::COLOR_BGR2RGB);
-  }
-  else
-    cv::cvtColor(img, out, cv::COLOR_BGR2RGB);
-  if (input_c == 3)
-    out.convertTo(out, CV_32FC3, 1.0 / 255.0);
-  else
-    out.convertTo(out, CV_32FC1, 1.0 / 255.0);
+  if (image_w != input_w || image_h != input_h) {
+    float resizeFactor = std::max(input_w / (float) image_w, input_h / (float) img.rows);
+    cv::resize(out, out, cv::Size(0, 0), resizeFactor, resizeFactor, cv::INTER_CUBIC);
+    cv::Rect crop(cv::Point(0.5 * (out.cols - input_w), 0.5 * (out.rows - input_h)), cv::Size(input_w, input_h));
+    out = out(crop);
+  }
+  out.convertTo(out, CV_32F, scaleFactor);
+  cv::subtract(out, cv::Scalar(offsets[2] / 255, offsets[1] / 255, offsets[0] / 255), out, cv::noArray(), -1);
  std::vector<cv::Mat> input_channels(input_c);
  cv::split(out, input_channels);
View File

@@ -22,8 +22,8 @@
class Int8EntropyCalibrator2 : public nvinfer1::IInt8EntropyCalibrator2 {
  public:
-   Int8EntropyCalibrator2(const int& batchsize, const int& channels, const int& height, const int& width,
-       const int& letterbox, const std::string& imgPath, const std::string& calibTablePath);
+   Int8EntropyCalibrator2(const int& batchSize, const int& channels, const int& height, const int& width,
+       const float& scaleFactor, const float* offsets, const std::string& imgPath, const std::string& calibTablePath);

    virtual ~Int8EntropyCalibrator2();
@@ -41,6 +41,8 @@ class Int8EntropyCalibrator2 : public nvinfer1::IInt8EntropyCalibrator2 {
    int inputH;
    int inputW;
    int letterBox;
+   float scaleFactor;
+   const float* offsets;
    std::string calibTablePath;
    size_t imageIndex;
    size_t inputCount;
@@ -51,6 +53,7 @@ class Int8EntropyCalibrator2 : public nvinfer1::IInt8EntropyCalibrator2 {
    std::vector<char> calibrationCache;
};

-std::vector<float> prepareImage(cv::Mat& img, int input_c, int input_h, int input_w, int letter_box);
+std::vector<float> prepareImage(cv::Mat& img, int input_c, int input_h, int input_w, float scaleFactor,
+    const float* offsets);

#endif //CALIBRATOR_H

View File

@@ -28,7 +28,7 @@ implicitLayer(int layerIdx, std::map<std::string, std::string>& block, std::vect
  convWt.values = val;
  trtWeights.push_back(convWt);
- nvinfer1::IConstantLayer* implicit = network->addConstant(nvinfer1::Dims{3, {filters, 1, 1}}, convWt);
+ nvinfer1::IConstantLayer* implicit = network->addConstant(nvinfer1::Dims{4, {1, filters, 1, 1}}, convWt);
  assert(implicit != nullptr);
  std::string implicitLayerName = block.at("type") + "_" + std::to_string(layerIdx);
  implicit->setName(implicitLayerName.c_str());

View File

@@ -14,46 +14,100 @@ reorgLayer(int layerIdx, std::map<std::string, std::string>& block, nvinfer1::IT
{
  nvinfer1::ITensor* output;

- assert(block.at("type") == "reorg3d");
+ assert(block.at("type") == "reorg" || block.at("type") == "reorg3d");
+
+ int stride = 1;
+ if (block.find("stride") != block.end()) {
+   stride = std::stoi(block.at("stride"));
+ }

  nvinfer1::Dims inputDims = input->getDimensions();

+ if (block.at("type") == "reorg3d") {
-   nvinfer1::ISliceLayer* slice1 = network->addSlice(*input, nvinfer1::Dims{3, {0, 0, 0}},
-       nvinfer1::Dims{3, {inputDims.d[0], inputDims.d[1] / 2, inputDims.d[2] / 2}}, nvinfer1::Dims{3, {1, 2, 2}});
+   nvinfer1::ISliceLayer* slice1 = network->addSlice(*input, nvinfer1::Dims{4, {0, 0, 0, 0}},
+       nvinfer1::Dims{4, {inputDims.d[0], inputDims.d[1], inputDims.d[2] / stride, inputDims.d[3] / stride}},
+       nvinfer1::Dims{4, {1, 1, stride, stride}});
    assert(slice1 != nullptr);
    std::string slice1LayerName = "slice1_" + std::to_string(layerIdx);
    slice1->setName(slice1LayerName.c_str());

-   nvinfer1::ISliceLayer* slice2 = network->addSlice(*input, nvinfer1::Dims{3, {0, 1, 0}},
-       nvinfer1::Dims{3, {inputDims.d[0], inputDims.d[1] / 2, inputDims.d[2] / 2}}, nvinfer1::Dims{3, {1, 2, 2}});
+   nvinfer1::ISliceLayer* slice2 = network->addSlice(*input, nvinfer1::Dims{4, {0, 0, 0, 1}},
+       nvinfer1::Dims{4, {inputDims.d[0], inputDims.d[1], inputDims.d[2] / stride, inputDims.d[3] / stride}},
+       nvinfer1::Dims{4, {1, 1, stride, stride}});
    assert(slice2 != nullptr);
    std::string slice2LayerName = "slice2_" + std::to_string(layerIdx);
    slice2->setName(slice2LayerName.c_str());

-   nvinfer1::ISliceLayer* slice3 = network->addSlice(*input, nvinfer1::Dims{3, {0, 0, 1}},
-       nvinfer1::Dims{3, {inputDims.d[0], inputDims.d[1] / 2, inputDims.d[2] / 2}}, nvinfer1::Dims{3, {1, 2, 2}});
+   nvinfer1::ISliceLayer* slice3 = network->addSlice(*input, nvinfer1::Dims{4, {0, 0, 1, 0}},
+       nvinfer1::Dims{4, {inputDims.d[0], inputDims.d[1], inputDims.d[2] / stride, inputDims.d[3] / stride}},
+       nvinfer1::Dims{4, {1, 1, stride, stride}});
    assert(slice3 != nullptr);
    std::string slice3LayerName = "slice3_" + std::to_string(layerIdx);
    slice3->setName(slice3LayerName.c_str());

-   nvinfer1::ISliceLayer* slice4 = network->addSlice(*input, nvinfer1::Dims{3, {0, 1, 1}},
-       nvinfer1::Dims{3, {inputDims.d[0], inputDims.d[1] / 2, inputDims.d[2] / 2}}, nvinfer1::Dims{3, {1, 2, 2}});
+   nvinfer1::ISliceLayer* slice4 = network->addSlice(*input, nvinfer1::Dims{4, {0, 0, 1, 1}},
+       nvinfer1::Dims{4, {inputDims.d[0], inputDims.d[1], inputDims.d[2] / stride, inputDims.d[3] / stride}},
+       nvinfer1::Dims{4, {1, 1, stride, stride}});
    assert(slice4 != nullptr);
    std::string slice4LayerName = "slice4_" + std::to_string(layerIdx);
    slice4->setName(slice4LayerName.c_str());

    std::vector<nvinfer1::ITensor*> concatInputs;
    concatInputs.push_back(slice1->getOutput(0));
    concatInputs.push_back(slice2->getOutput(0));
    concatInputs.push_back(slice3->getOutput(0));
    concatInputs.push_back(slice4->getOutput(0));

    nvinfer1::IConcatenationLayer* concat = network->addConcatenation(concatInputs.data(), concatInputs.size());
    assert(concat != nullptr);
    std::string concatLayerName = "concat_" + std::to_string(layerIdx);
    concat->setName(concatLayerName.c_str());
    concat->setAxis(0);
    output = concat->getOutput(0);
+ }
+ else {
+   nvinfer1::IShuffleLayer* shuffle1 = network->addShuffle(*input);
+   assert(shuffle1 != nullptr);
+   std::string shuffle1LayerName = "shuffle1_" + std::to_string(layerIdx);
+   shuffle1->setName(shuffle1LayerName.c_str());
+   nvinfer1::Dims reshapeDims1{6, {inputDims.d[0], inputDims.d[1] / (stride * stride), inputDims.d[2], stride,
+       inputDims.d[3], stride}};
+   shuffle1->setReshapeDimensions(reshapeDims1);
+   nvinfer1::Permutation permutation1{{0, 1, 2, 4, 3, 5}};
+   shuffle1->setSecondTranspose(permutation1);
+   output = shuffle1->getOutput(0);
+
+   nvinfer1::IShuffleLayer* shuffle2 = network->addShuffle(*output);
+   assert(shuffle2 != nullptr);
+   std::string shuffle2LayerName = "shuffle2_" + std::to_string(layerIdx);
+   shuffle2->setName(shuffle2LayerName.c_str());
+   nvinfer1::Dims reshapeDims2{4, {inputDims.d[0], inputDims.d[1] / (stride * stride), inputDims.d[2] * inputDims.d[3],
+       stride * stride}};
+   shuffle2->setReshapeDimensions(reshapeDims2);
+   nvinfer1::Permutation permutation2{{0, 1, 3, 2}};
+   shuffle2->setSecondTranspose(permutation2);
+   output = shuffle2->getOutput(0);
+
+   nvinfer1::IShuffleLayer* shuffle3 = network->addShuffle(*output);
+   assert(shuffle3 != nullptr);
+   std::string shuffle3LayerName = "shuffle3_" + std::to_string(layerIdx);
+   shuffle3->setName(shuffle3LayerName.c_str());
+   nvinfer1::Dims reshapeDims3{4, {inputDims.d[0], inputDims.d[1] / (stride * stride), stride * stride,
+       inputDims.d[2] * inputDims.d[3]}};
+   shuffle3->setReshapeDimensions(reshapeDims3);
+   nvinfer1::Permutation permutation3{{0, 2, 1, 3}};
+   shuffle3->setSecondTranspose(permutation3);
+   output = shuffle3->getOutput(0);
+
+   nvinfer1::IShuffleLayer* shuffle4 = network->addShuffle(*output);
+   assert(shuffle4 != nullptr);
+   std::string shuffle4LayerName = "shuffle4_" + std::to_string(layerIdx);
+   shuffle4->setName(shuffle4LayerName.c_str());
+   nvinfer1::Dims reshapeDims4{4, {inputDims.d[0], inputDims.d[1] * stride * stride, inputDims.d[2] / stride,
+       inputDims.d[3] / stride}};
+   shuffle4->setReshapeDimensions(reshapeDims4);
+   output = shuffle4->getOutput(0);
+ }

  return output;
}
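As a worked example of the shuffle path: with stride=2, a {1, 64, 26, 26} input is reshaped to {1, 16, 26, 2, 26, 2}, permuted and reshaped three more times, and lands at {1, 256, 13, 13}; each 2x2 spatial block ends up in the channel dimension.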

View File

@@ -24,29 +24,36 @@ routeLayer(int layerIdx, std::string& layers, std::map<std::string, std::string>
  }
  if (lastPos < strLayers.length()) {
    std::string lastV = trim(strLayers.substr(lastPos));
-   if (!lastV.empty())
+   if (!lastV.empty()) {
      idxLayers.push_back(std::stoi(lastV));
+   }
  }
- assert (!idxLayers.empty());
+ assert(!idxLayers.empty());
  std::vector<nvinfer1::ITensor*> concatInputs;
  for (uint i = 0; i < idxLayers.size(); ++i) {
-   if (idxLayers[i] < 0)
+   if (idxLayers[i] < 0) {
      idxLayers[i] = tensorOutputs.size() + idxLayers[i];
-   assert (idxLayers[i] >= 0 && idxLayers[i] < (int)tensorOutputs.size());
+   }
+   assert(idxLayers[i] >= 0 && idxLayers[i] < (int)tensorOutputs.size());
    concatInputs.push_back(tensorOutputs[idxLayers[i]]);
-   if (i < idxLayers.size() - 1)
+   if (i < idxLayers.size() - 1) {
      layers += std::to_string(idxLayers[i]) + ", ";
+   }
  }
  layers += std::to_string(idxLayers[idxLayers.size() - 1]);
- if (concatInputs.size() == 1)
+ if (concatInputs.size() == 1) {
    output = concatInputs[0];
+ }
  else {
-   int axis = 0;
-   if (block.find("axis") != block.end())
-     axis = std::stoi(block.at("axis"));
-   if (axis < 0)
-     axis = concatInputs[0]->getDimensions().nbDims + axis;
+   int axis = 1;
+   if (block.find("axis") != block.end()) {
+     axis += std::stoi(block.at("axis"));
+     std::cout << axis << std::endl;
+   }
+   if (axis < 0) {
+     axis += concatInputs[0]->getDimensions().nbDims;
+   }

    nvinfer1::IConcatenationLayer* concat = network->addConcatenation(concatInputs.data(), concatInputs.size());
    assert(concat != nullptr);
@@ -60,10 +67,11 @@ routeLayer(int layerIdx, std::string& layers, std::map<std::string, std::string>
  nvinfer1::Dims prevTensorDims = output->getDimensions();
  int groups = stoi(block.at("groups"));
  int group_id = stoi(block.at("group_id"));
- int startSlice = (prevTensorDims.d[0] / groups) * group_id;
- int channelSlice = (prevTensorDims.d[0] / groups);
- nvinfer1::ISliceLayer* slice = network->addSlice(*output, nvinfer1::Dims{3, {startSlice, 0, 0}},
-     nvinfer1::Dims{3, {channelSlice, prevTensorDims.d[1], prevTensorDims.d[2]}}, nvinfer1::Dims{3, {1, 1, 1}});
+ int startSlice = (prevTensorDims.d[1] / groups) * group_id;
+ int channelSlice = (prevTensorDims.d[1] / groups);
+ nvinfer1::ISliceLayer* slice = network->addSlice(*output, nvinfer1::Dims{4, {0, startSlice, 0, 0}},
+     nvinfer1::Dims{4, {prevTensorDims.d[0], channelSlice, prevTensorDims.d[2], prevTensorDims.d[3]}},
+     nvinfer1::Dims{4, {1, 1, 1, 1}});
  assert(slice != nullptr);
  std::string sliceLayerName = "slice_" + std::to_string(layerIdx);
  slice->setName(sliceLayerName.c_str());
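As a concrete check of the new NCHW indexing: a grouped route with groups=2 and group_id=1 on a {1, 128, 26, 26} tensor gives startSlice = 64 and channelSlice = 64, i.e. the slice keeps channels 64 to 127.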

View File

@@ -17,8 +17,8 @@ shortcutLayer(int layerIdx, std::string activation, std::string inputVol, std::s
assert(block.at("type") == "shortcut"); assert(block.at("type") == "shortcut");
if (inputVol != shortcutVol) { if (inputVol != shortcutVol) {
nvinfer1::ISliceLayer* slice = network->addSlice(*shortcutInput, nvinfer1::Dims{3, {0, 0, 0}}, input->getDimensions(), nvinfer1::ISliceLayer* slice = network->addSlice(*shortcutInput, nvinfer1::Dims{4, {0, 0, 0, 0}}, input->getDimensions(),
nvinfer1::Dims{3, {1, 1, 1}}); nvinfer1::Dims{4, {1, 1, 1, 1}});
assert(slice != nullptr); assert(slice != nullptr);
std::string sliceLayerName = "slice_" + std::to_string(layerIdx); std::string sliceLayerName = "slice_" + std::to_string(layerIdx);
slice->setName(sliceLayerName.c_str()); slice->setName(sliceLayerName.c_str());


@@ -18,14 +18,14 @@ upsampleLayer(int layerIdx, std::map<std::string, std::string>& block, nvinfer1:
   int stride = std::stoi(block.at("stride"));
 
-  float scale[3] = {1, static_cast<float>(stride), static_cast<float>(stride)};
+  float scale[4] = {1, 1, static_cast<float>(stride), static_cast<float>(stride)};
 
   nvinfer1::IResizeLayer* resize = network->addResize(*input);
   assert(resize != nullptr);
   std::string resizeLayerName = "upsample_" + std::to_string(layerIdx);
   resize->setName(resizeLayerName.c_str());
   resize->setResizeMode(nvinfer1::ResizeMode::kNEAREST);
-  resize->setScales(scale, 3);
+  resize->setScales(scale, 4);
 
   output = resize->getOutput(0);
   return output;
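The resize scales follow the same convention: one scale per tensor dimension, so batch and channel are pinned to 1 and only H and W get the stride factor. A minimal sketch under the same assumptions (illustrative helper, not repo code):

```cpp
#include <cassert>
#include "NvInfer.h"

// Illustrative sketch: nearest-neighbor upsample of an explicit-batch NCHW
// tensor. One scale entry per dimension, so N and C use a factor of 1.
nvinfer1::ITensor* upsampleNCHW(nvinfer1::INetworkDefinition* network,
    nvinfer1::ITensor* input, int stride)
{
  assert(input->getDimensions().nbDims == 4);
  float scale[4] = {1.0f, 1.0f, static_cast<float>(stride), static_cast<float>(stride)};
  nvinfer1::IResizeLayer* resize = network->addResize(*input);
  assert(resize != nullptr);
  resize->setResizeMode(nvinfer1::ResizeMode::kNEAREST);
  resize->setScales(scale, 4);  // 4 scales for a 4-D tensor
  return resize->getOutput(0);
}
```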


@@ -35,39 +35,56 @@
 static bool
 getYoloNetworkInfo(NetworkInfo& networkInfo, const NvDsInferContextInitParams* initParams)
 {
-  std::string yoloCfg = initParams->customNetworkConfigFilePath;
-  std::string yoloType;
+  std::string onnxWtsFilePath = initParams->onnxFilePath;
+  std::string darknetWtsFilePath = initParams->modelFilePath;
+  std::string darknetCfgFilePath = initParams->customNetworkConfigFilePath;
 
-  std::transform(yoloCfg.begin(), yoloCfg.end(), yoloCfg.begin(), [] (uint8_t c) {
+  std::string yoloType = onnxWtsFilePath != "" ? "onnx" : "darknet";
+  std::string modelName = yoloType == "onnx" ?
+      onnxWtsFilePath.substr(0, onnxWtsFilePath.find(".onnx")).substr(onnxWtsFilePath.rfind("/") + 1) :
+      darknetWtsFilePath.substr(0, darknetWtsFilePath.find(".weights")).substr(darknetWtsFilePath.rfind("/") + 1);
+
+  std::transform(modelName.begin(), modelName.end(), modelName.begin(), [] (uint8_t c) {
     return std::tolower(c);
   });
-  yoloType = yoloCfg.substr(0, yoloCfg.find(".cfg"));
 
   networkInfo.inputBlobName = "input";
   networkInfo.networkType = yoloType;
-  networkInfo.configFilePath = initParams->customNetworkConfigFilePath;
-  networkInfo.wtsFilePath = initParams->modelFilePath;
+  networkInfo.modelName = modelName;
+  networkInfo.onnxWtsFilePath = onnxWtsFilePath;
+  networkInfo.darknetWtsFilePath = darknetWtsFilePath;
+  networkInfo.darknetCfgFilePath = darknetCfgFilePath;
+  networkInfo.batchSize = initParams->maxBatchSize;
+  networkInfo.implicitBatch = initParams->forceImplicitBatchDimension;
   networkInfo.int8CalibPath = initParams->int8CalibrationFilePath;
-  networkInfo.deviceType = (initParams->useDLA ? "kDLA" : "kGPU");
+  networkInfo.deviceType = initParams->useDLA ? "kDLA" : "kGPU";
   networkInfo.numDetectedClasses = initParams->numDetectedClasses;
   networkInfo.clusterMode = initParams->clusterMode;
+  networkInfo.scaleFactor = initParams->networkScaleFactor;
+  networkInfo.offsets = initParams->offsets;
 
-  if (initParams->networkMode == 0)
+  if (initParams->networkMode == NvDsInferNetworkMode_FP32)
     networkInfo.networkMode = "FP32";
-  else if (initParams->networkMode == 1)
+  else if (initParams->networkMode == NvDsInferNetworkMode_INT8)
     networkInfo.networkMode = "INT8";
-  else if (initParams->networkMode == 2)
+  else if (initParams->networkMode == NvDsInferNetworkMode_FP16)
     networkInfo.networkMode = "FP16";
 
-  if (networkInfo.configFilePath.empty() || networkInfo.wtsFilePath.empty()) {
-    std::cerr << "YOLO config file or weights file is not specified\n" << std::endl;
-    return false;
-  }
-
-  if (!fileExists(networkInfo.configFilePath) || !fileExists(networkInfo.wtsFilePath)) {
-    std::cerr << "YOLO config file or weights file is not exist\n" << std::endl;
-    return false;
+  if (yoloType == "onnx") {
+    if (!fileExists(networkInfo.onnxWtsFilePath)) {
+      std::cerr << "ONNX model file does not exist\n" << std::endl;
+      return false;
+    }
+  }
+  else {
+    if (!fileExists(networkInfo.darknetWtsFilePath)) {
+      std::cerr << "Darknet weights file does not exist\n" << std::endl;
+      return false;
+    }
+    else if (!fileExists(networkInfo.darknetCfgFilePath)) {
+      std::cerr << "Darknet cfg file does not exist\n" << std::endl;
+      return false;
+    }
   }
 
   return true;

@@ -99,7 +116,7 @@ NvDsInferYoloCudaEngineGet(nvinfer1::IBuilder* const builder, nvinfer1::IBuilder
   Yolo yolo(networkInfo);
   cudaEngine = yolo.createEngine(builder, builderConfig);
   if (cudaEngine == nullptr) {
-    std::cerr << "Failed to build CUDA engine on " << networkInfo.configFilePath << std::endl;
+    std::cerr << "Failed to build CUDA engine" << std::endl;
     return false;
   }
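The model name is now derived from the weights file path instead of the cfg name. A standalone illustration of the substr chain used above, with a hypothetical path:

```cpp
#include <cctype>
#include <iostream>
#include <string>

// Standalone illustration (not repo code): strip the extension, then strip
// the directories, then lowercase, exactly as getYoloNetworkInfo does.
int main() {
  std::string onnxWtsFilePath = "/opt/models/YOLOv8s.onnx";      // hypothetical path
  std::string modelName =
      onnxWtsFilePath.substr(0, onnxWtsFilePath.find(".onnx"))   // "/opt/models/YOLOv8s"
                     .substr(onnxWtsFilePath.rfind("/") + 1);    // "YOLOv8s"
  for (char& c : modelName)
    c = std::tolower(c);                                         // "yolov8s"
  std::cout << modelName << std::endl;
  return 0;
}
```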


@@ -26,10 +26,10 @@
 #include "nvdsinfer_custom_impl.h"
 
 bool
-NvDsInferInitializeInputLayers(std::vector<NvDsInferLayerInfo> const &inputLayersInfo,
-    NvDsInferNetworkInfo const &networkInfo, unsigned int maxBatchSize)
+NvDsInferInitializeInputLayers(std::vector<NvDsInferLayerInfo> const& inputLayersInfo,
+    NvDsInferNetworkInfo const& networkInfo, unsigned int maxBatchSize)
 {
-  float *scaleFactor = (float *) inputLayersInfo[0].buffer;
+  float* scaleFactor = (float*) inputLayersInfo[0].buffer;
   for (unsigned int i = 0; i < maxBatchSize; i++) {
     scaleFactor[i * 2 + 0] = 1.0;
     scaleFactor[i * 2 + 1] = 1.0;


@@ -73,22 +73,22 @@ addBBoxProposal(const float bx1, const float by1, const float bx2, const float b
 }
 
 static std::vector<NvDsInferParseObjectInfo>
-decodeTensorYolo(const float* detection, const uint& outputSize, const uint& netW, const uint& netH,
-    const std::vector<float>& preclusterThreshold)
+decodeTensorYolo(const float* boxes, const float* scores, const int* classes, const uint& outputSize, const uint& netW,
+    const uint& netH, const std::vector<float>& preclusterThreshold)
 {
   std::vector<NvDsInferParseObjectInfo> binfo;
   for (uint b = 0; b < outputSize; ++b) {
-    float maxProb = detection[b * 6 + 4];
-    int maxIndex = (int) detection[b * 6 + 5];
+    float maxProb = scores[b];
+    int maxIndex = classes[b];
 
     if (maxProb < preclusterThreshold[maxIndex])
       continue;
 
-    float bxc = detection[b * 6 + 0];
-    float byc = detection[b * 6 + 1];
-    float bw = detection[b * 6 + 2];
-    float bh = detection[b * 6 + 3];
+    float bxc = boxes[b * 4 + 0];
+    float byc = boxes[b * 4 + 1];
+    float bw = boxes[b * 4 + 2];
+    float bh = boxes[b * 4 + 3];
 
     float bx1 = bxc - bw / 2;
     float by1 = byc - bh / 2;

@@ -102,22 +102,22 @@ decodeTensorYolo(const float* detection, const uint& outputSize, const uint& net
 }
 
 static std::vector<NvDsInferParseObjectInfo>
-decodeTensorYoloE(const float* detection, const uint& outputSize, const uint& netW, const uint& netH,
-    const std::vector<float>& preclusterThreshold)
+decodeTensorYoloE(const float* boxes, const float* scores, const int* classes, const uint& outputSize, const uint& netW,
+    const uint& netH, const std::vector<float>& preclusterThreshold)
 {
   std::vector<NvDsInferParseObjectInfo> binfo;
   for (uint b = 0; b < outputSize; ++b) {
-    float maxProb = detection[b * 6 + 4];
-    int maxIndex = (int) detection[b * 6 + 5];
+    float maxProb = scores[b];
+    int maxIndex = classes[b];
 
     if (maxProb < preclusterThreshold[maxIndex])
       continue;
 
-    float bx1 = detection[b * 6 + 0];
-    float by1 = detection[b * 6 + 1];
-    float bx2 = detection[b * 6 + 2];
-    float by2 = detection[b * 6 + 3];
+    float bx1 = boxes[b * 4 + 0];
+    float by1 = boxes[b * 4 + 1];
+    float bx2 = boxes[b * 4 + 2];
+    float by2 = boxes[b * 4 + 3];
 
     addBBoxProposal(bx1, by1, bx2, by2, netW, netH, maxIndex, maxProb, binfo);
   }

@@ -136,12 +136,27 @@ NvDsInferParseCustomYolo(std::vector<NvDsInferLayerInfo> const& outputLayersInfo
   std::vector<NvDsInferParseObjectInfo> objects;
 
-  const NvDsInferLayerInfo& layer = outputLayersInfo[0];
+  NvDsInferLayerInfo* boxes;
+  NvDsInferLayerInfo* scores;
+  NvDsInferLayerInfo* classes;
 
-  const uint outputSize = layer.inferDims.d[0];
+  for (uint i = 0; i < 3; ++i) {
+    if (outputLayersInfo[i].dataType == NvDsInferDataType::INT32) {
+      classes = (NvDsInferLayerInfo*) &outputLayersInfo[i];
+    }
+    else if (outputLayersInfo[i].inferDims.d[1] == 4) {
+      boxes = (NvDsInferLayerInfo*) &outputLayersInfo[i];
+    }
+    else {
+      scores = (NvDsInferLayerInfo*) &outputLayersInfo[i];
+    }
+  }
+
+  const uint outputSize = boxes->inferDims.d[0];
 
-  std::vector<NvDsInferParseObjectInfo> outObjs = decodeTensorYolo((const float*) (layer.buffer), outputSize,
-      networkInfo.width, networkInfo.height, detectionParams.perClassPreclusterThreshold);
+  std::vector<NvDsInferParseObjectInfo> outObjs = decodeTensorYolo((const float*) (boxes->buffer),
+      (const float*) (scores->buffer), (const int*) (classes->buffer), outputSize, networkInfo.width, networkInfo.height,
+      detectionParams.perClassPreclusterThreshold);
 
   objects.insert(objects.end(), outObjs.begin(), outObjs.end());

@@ -161,12 +176,27 @@ NvDsInferParseCustomYoloE(std::vector<NvDsInferLayerInfo> const& outputLayersInf
   std::vector<NvDsInferParseObjectInfo> objects;
 
-  const NvDsInferLayerInfo& layer = outputLayersInfo[0];
+  NvDsInferLayerInfo* boxes;
+  NvDsInferLayerInfo* scores;
+  NvDsInferLayerInfo* classes;
 
-  const uint outputSize = layer.inferDims.d[0];
+  for (uint i = 0; i < 3; ++i) {
+    if (outputLayersInfo[i].dataType == NvDsInferDataType::INT32) {
+      classes = (NvDsInferLayerInfo*) &outputLayersInfo[i];
+    }
+    else if (outputLayersInfo[i].inferDims.d[1] == 4) {
+      boxes = (NvDsInferLayerInfo*) &outputLayersInfo[i];
+    }
+    else {
+      scores = (NvDsInferLayerInfo*) &outputLayersInfo[i];
+    }
+  }
+
+  const uint outputSize = boxes->inferDims.d[0];
 
-  std::vector<NvDsInferParseObjectInfo> outObjs = decodeTensorYoloE((const float*) (layer.buffer), outputSize,
-      networkInfo.width, networkInfo.height, detectionParams.perClassPreclusterThreshold);
+  std::vector<NvDsInferParseObjectInfo> outObjs = decodeTensorYoloE((const float*) (boxes->buffer),
+      (const float*) (scores->buffer), (const int*) (classes->buffer), outputSize, networkInfo.width, networkInfo.height,
+      detectionParams.perClassPreclusterThreshold);
 
   objects.insert(objects.end(), outObjs.begin(), outObjs.end());
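Because TensorRT does not guarantee output binding order, the parser identifies the three tensors by their properties instead of their index: the INT32 tensor is classes, the FP32 tensor with a second dimension of 4 is boxes, and the remaining one is scores. A simplified, self-contained sketch of that rule (mock struct and hypothetical shapes, not the DeepStream types):

```cpp
#include <iostream>

// Mock layer descriptor (not NvDsInferLayerInfo) just to show the rule.
struct Layer { bool isInt32; int dims[2]; };

const char* identify(const Layer& l) {
  if (l.isInt32) return "classes";     // only classes is INT32
  if (l.dims[1] == 4) return "boxes";  // only boxes has 4 values per entry
  return "scores";                     // the remaining FP32 tensor
}

int main() {
  Layer outputs[3] = {{false, {8400, 4}}, {false, {8400, 1}}, {true, {8400, 1}}};
  for (const Layer& l : outputs)
    std::cout << identify(l) << std::endl;  // boxes, scores, classes
  return 0;
}
```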


@@ -30,33 +30,33 @@
 #include "nvdsinfer_custom_impl.h"
 
 extern "C" bool
-NvDsInferParseYolo_cuda(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
+NvDsInferParseYoloCuda(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
     NvDsInferParseDetectionParams const& detectionParams, std::vector<NvDsInferParseObjectInfo>& objectList);
 
 extern "C" bool
-NvDsInferParseYoloE_cuda(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
+NvDsInferParseYoloECuda(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
     NvDsInferParseDetectionParams const& detectionParams, std::vector<NvDsInferParseObjectInfo>& objectList);
 
-__global__ void decodeTensorYolo_cuda(NvDsInferParseObjectInfo *binfo, float* input, int outputSize, int netW, int netH,
-    float minPreclusterThreshold)
+__global__ void decodeTensorYoloCuda(NvDsInferParseObjectInfo *binfo, float* boxes, float* scores, int* classes,
+    int outputSize, int netW, int netH, float minPreclusterThreshold)
 {
   int x_id = blockIdx.x * blockDim.x + threadIdx.x;
   if (x_id >= outputSize)
     return;
 
-  float maxProb = input[x_id * 6 + 4];
-  int maxIndex = (int) input[x_id * 6 + 5];
+  float maxProb = scores[x_id];
+  int maxIndex = classes[x_id];
 
   if (maxProb < minPreclusterThreshold) {
     binfo[x_id].detectionConfidence = 0.0;
     return;
   }
 
-  float bxc = input[x_id * 6 + 0];
-  float byc = input[x_id * 6 + 1];
-  float bw = input[x_id * 6 + 2];
-  float bh = input[x_id * 6 + 3];
+  float bxc = boxes[x_id * 4 + 0];
+  float byc = boxes[x_id * 4 + 1];
+  float bw = boxes[x_id * 4 + 2];
+  float bh = boxes[x_id * 4 + 3];
 
   float x0 = bxc - bw / 2;
   float y0 = byc - bh / 2;

@@ -76,26 +76,26 @@ __global__ void decodeTensorYolo_cuda(NvDsInferParseObjectInfo *binfo, float* in
   binfo[x_id].classId = maxIndex;
 }
 
-__global__ void decodeTensorYoloE_cuda(NvDsInferParseObjectInfo *binfo, float* input, int outputSize, int netW, int netH,
-    float minPreclusterThreshold)
+__global__ void decodeTensorYoloECuda(NvDsInferParseObjectInfo *binfo, float* boxes, float* scores, int* classes,
+    int outputSize, int netW, int netH, float minPreclusterThreshold)
 {
   int x_id = blockIdx.x * blockDim.x + threadIdx.x;
   if (x_id >= outputSize)
     return;
 
-  float maxProb = input[x_id * 6 + 4];
-  int maxIndex = (int) input[x_id * 6 + 5];
+  float maxProb = scores[x_id];
+  int maxIndex = classes[x_id];
 
   if (maxProb < minPreclusterThreshold) {
     binfo[x_id].detectionConfidence = 0.0;
     return;
   }
 
-  float x0 = input[x_id * 6 + 0];
-  float y0 = input[x_id * 6 + 1];
-  float x1 = input[x_id * 6 + 2];
-  float y1 = input[x_id * 6 + 3];
+  float x0 = boxes[x_id * 4 + 0];
+  float y0 = boxes[x_id * 4 + 1];
+  float x1 = boxes[x_id * 4 + 2];
+  float y1 = boxes[x_id * 4 + 3];
 
   x0 = fminf(float(netW), fmaxf(float(0.0), x0));
   y0 = fminf(float(netH), fmaxf(float(0.0), y0));

@@ -110,7 +110,7 @@ __global__ void decodeTensorYoloE_cuda(NvDsInferParseObjectInfo *binfo, float* i
   binfo[x_id].classId = maxIndex;
 }
 
-static bool NvDsInferParseCustomYolo_cuda(std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
+static bool NvDsInferParseCustomYoloCuda(std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
     NvDsInferNetworkInfo const& networkInfo, NvDsInferParseDetectionParams const& detectionParams,
     std::vector<NvDsInferParseObjectInfo>& objectList)
 {

@@ -119,9 +119,23 @@ static bool NvDsInferParseCustomYolo_cuda(std::vector<NvDsInferLayerInfo> const&
     return false;
   }
 
-  const NvDsInferLayerInfo &layer = outputLayersInfo[0];
+  NvDsInferLayerInfo* boxes;
+  NvDsInferLayerInfo* scores;
+  NvDsInferLayerInfo* classes;
 
-  const int outputSize = layer.inferDims.d[0];
+  for (uint i = 0; i < 3; ++i) {
+    if (outputLayersInfo[i].dataType == NvDsInferDataType::INT32) {
+      classes = (NvDsInferLayerInfo*) &outputLayersInfo[i];
+    }
+    else if (outputLayersInfo[i].inferDims.d[1] == 4) {
+      boxes = (NvDsInferLayerInfo*) &outputLayersInfo[i];
+    }
+    else {
+      scores = (NvDsInferLayerInfo*) &outputLayersInfo[i];
+    }
+  }
+
+  const int outputSize = boxes->inferDims.d[0];
 
   thrust::device_vector<NvDsInferParseObjectInfo> objects(outputSize);

@@ -131,9 +145,9 @@ static bool NvDsInferParseCustomYolo_cuda(std::vector<NvDsInferLayerInfo> const&
   int threads_per_block = 1024;
   int number_of_blocks = ((outputSize - 1) / threads_per_block) + 1;
 
-  decodeTensorYolo_cuda<<<number_of_blocks, threads_per_block>>>(
-      thrust::raw_pointer_cast(objects.data()), (float*) layer.buffer, outputSize, networkInfo.width, networkInfo.height,
-      minPreclusterThreshold);
+  decodeTensorYoloCuda<<<number_of_blocks, threads_per_block>>>(
+      thrust::raw_pointer_cast(objects.data()), (float*) (boxes->buffer), (float*) (scores->buffer),
+      (int*) (classes->buffer), outputSize, networkInfo.width, networkInfo.height, minPreclusterThreshold);
 
   objectList.resize(outputSize);
   thrust::copy(objects.begin(), objects.end(), objectList.begin());

@@ -141,7 +155,7 @@ static bool NvDsInferParseCustomYolo_cuda(std::vector<NvDsInferLayerInfo> const&
   return true;
 }
 
-static bool NvDsInferParseCustomYoloE_cuda(std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
+static bool NvDsInferParseCustomYoloECuda(std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
     NvDsInferNetworkInfo const& networkInfo, NvDsInferParseDetectionParams const& detectionParams,
     std::vector<NvDsInferParseObjectInfo>& objectList)
 {

@@ -150,9 +164,23 @@ static bool NvDsInferParseCustomYoloE_cuda(std::vector<NvDsInferLayerInfo> const
     return false;
   }
 
-  const NvDsInferLayerInfo &layer = outputLayersInfo[0];
+  NvDsInferLayerInfo* boxes;
+  NvDsInferLayerInfo* scores;
+  NvDsInferLayerInfo* classes;
 
-  const int outputSize = layer.inferDims.d[0];
+  for (uint i = 0; i < 3; ++i) {
+    if (outputLayersInfo[i].dataType == NvDsInferDataType::INT32) {
+      classes = (NvDsInferLayerInfo*) &outputLayersInfo[i];
+    }
+    else if (outputLayersInfo[i].inferDims.d[1] == 4) {
+      boxes = (NvDsInferLayerInfo*) &outputLayersInfo[i];
+    }
+    else {
+      scores = (NvDsInferLayerInfo*) &outputLayersInfo[i];
+    }
+  }
+
+  const int outputSize = boxes->inferDims.d[0];
 
   thrust::device_vector<NvDsInferParseObjectInfo> objects(outputSize);

@@ -162,9 +190,9 @@ static bool NvDsInferParseCustomYoloE_cuda(std::vector<NvDsInferLayerInfo> const
   int threads_per_block = 1024;
   int number_of_blocks = ((outputSize - 1) / threads_per_block) + 1;
 
-  decodeTensorYoloE_cuda<<<number_of_blocks, threads_per_block>>>(
-      thrust::raw_pointer_cast(objects.data()), (float*) layer.buffer, outputSize, networkInfo.width, networkInfo.height,
-      minPreclusterThreshold);
+  decodeTensorYoloECuda<<<number_of_blocks, threads_per_block>>>(
+      thrust::raw_pointer_cast(objects.data()), (float*) (boxes->buffer), (float*) (scores->buffer),
+      (int*) (classes->buffer), outputSize, networkInfo.width, networkInfo.height, minPreclusterThreshold);
 
   objectList.resize(outputSize);
   thrust::copy(objects.begin(), objects.end(), objectList.begin());

@@ -173,18 +201,18 @@ static bool NvDsInferParseCustomYoloE_cuda(std::vector<NvDsInferLayerInfo> const
 }
 
 extern "C" bool
-NvDsInferParseYolo_cuda(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
+NvDsInferParseYoloCuda(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
     NvDsInferParseDetectionParams const& detectionParams, std::vector<NvDsInferParseObjectInfo>& objectList)
 {
-  return NvDsInferParseCustomYolo_cuda(outputLayersInfo, networkInfo, detectionParams, objectList);
+  return NvDsInferParseCustomYoloCuda(outputLayersInfo, networkInfo, detectionParams, objectList);
 }
 
 extern "C" bool
-NvDsInferParseYoloE_cuda(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
+NvDsInferParseYoloECuda(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
     NvDsInferParseDetectionParams const& detectionParams, std::vector<NvDsInferParseObjectInfo>& objectList)
 {
-  return NvDsInferParseCustomYoloE_cuda(outputLayersInfo, networkInfo, detectionParams, objectList);
+  return NvDsInferParseCustomYoloECuda(outputLayersInfo, networkInfo, detectionParams, objectList);
 }
 
-CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseYolo_cuda);
-CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseYoloE_cuda);
+CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseYoloCuda);
+CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseYoloECuda);
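The CUDA parsers launch one thread per candidate box, with the usual ceiling-division block count and an in-kernel x_id >= outputSize guard for the padded tail. A quick standalone check of that arithmetic (the outputSize value is hypothetical):

```cpp
#include <iostream>

// Standalone illustration of the 1-D launch geometry used by the parsers.
int main() {
  int outputSize = 8400;              // hypothetical number of candidate boxes
  int threads_per_block = 1024;
  int number_of_blocks = ((outputSize - 1) / threads_per_block) + 1;
  std::cout << number_of_blocks << " blocks, "
            << number_of_blocks * threads_per_block - outputSize
            << " padded threads" << std::endl;  // 9 blocks, 816 padded threads
  return 0;
}
```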


@@ -60,15 +60,16 @@ bool
 fileExists(const std::string fileName, bool verbose)
 {
   if (!std::experimental::filesystem::exists(std::experimental::filesystem::path(fileName))) {
-    if (verbose)
+    if (verbose) {
       std::cout << "\nFile does not exist: " << fileName << std::endl;
+    }
     return false;
   }
   return true;
 }
 
 std::vector<float>
-loadWeights(const std::string weightsFilePath, const std::string& networkType)
+loadWeights(const std::string weightsFilePath, const std::string& modelName)
 {
   assert(fileExists(weightsFilePath));
   std::cout << "\nLoading pre-trained weights" << std::endl;

@@ -80,7 +81,7 @@ loadWeights(const std::string weightsFilePath, const std::string& networkType)
   assert(file.good());
 
   std::string line;
 
-  if (networkType.find("yolov2") != std::string::npos && networkType.find("yolov2-tiny") == std::string::npos) {
+  if (modelName.find("yolov2") != std::string::npos && modelName.find("yolov2-tiny") == std::string::npos) {
     // Remove 4 int32 bytes of data from the stream belonging to the header
     file.ignore(4 * 4);
   }

@@ -94,8 +95,9 @@ loadWeights(const std::string weightsFilePath, const std::string& networkType)
       file.read(floatWeight, 4);
       assert(file.gcount() == 4);
       weights.push_back(*reinterpret_cast<float*>(floatWeight));
-      if (file.peek() == std::istream::traits_type::eof())
+      if (file.peek() == std::istream::traits_type::eof()) {
         break;
+      }
     }
   }
   else {

@@ -103,7 +105,7 @@ loadWeights(const std::string weightsFilePath, const std::string& networkType)
     assert(0);
   }
 
-  std::cout << "Loading weights of " << networkType << " complete" << std::endl;
+  std::cout << "Loading weights of " << modelName << " complete" << std::endl;
   std::cout << "Total weights read: " << weights.size() << std::endl;
 
   return weights;

@@ -116,8 +118,9 @@ dimsToString(const nvinfer1::Dims d)
   std::stringstream s;
   s << "[";
-  for (int i = 0; i < d.nbDims - 1; ++i)
+  for (int i = 1; i < d.nbDims - 1; ++i) {
     s << d.d[i] << ", ";
+  }
   s << d.d[d.nbDims - 1] << "]";
 
   return s.str();

@@ -127,16 +130,15 @@ int
 getNumChannels(nvinfer1::ITensor* t)
 {
   nvinfer1::Dims d = t->getDimensions();
-  assert(d.nbDims == 3);
+  assert(d.nbDims == 4);
 
-  return d.d[0];
+  return d.d[1];
 }
 
 void
 printLayerInfo(std::string layerIndex, std::string layerName, std::string layerInput, std::string layerOutput,
     std::string weightPtr)
 {
-  std::cout << std::setw(8) << std::left << layerIndex << std::setw(30) << std::left << layerName;
-  std::cout << std::setw(20) << std::left << layerInput << std::setw(20) << std::left << layerOutput;
+  std::cout << std::setw(7) << std::left << layerIndex << std::setw(40) << std::left << layerName;
+  std::cout << std::setw(19) << std::left << layerInput << std::setw(19) << std::left << layerOutput;
   std::cout << weightPtr << std::endl;
 }
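dimsToString now starts the loop at index 1, so layer logs keep showing C, H, W even though every tensor now carries a batch dimension. A standalone illustration with a simplified stand-in for nvinfer1::Dims:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Simplified Dims stand-in (not nvinfer1::Dims), just to show the behavior.
struct Dims { int nbDims; int d[8]; };

std::string dimsToString(const Dims d) {
  std::stringstream s;
  s << "[";
  for (int i = 1; i < d.nbDims - 1; ++i) {  // start at 1: skip the batch dim
    s << d.d[i] << ", ";
  }
  s << d.d[d.nbDims - 1] << "]";
  return s.str();
}

int main() {
  Dims d{4, {-1, 255, 20, 20}};              // dynamic batch, C,H,W = 255,20,20
  std::cout << dimsToString(d) << std::endl; // prints [255, 20, 20]
  return 0;
}
```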


@@ -40,7 +40,7 @@ float clamp(const float val, const float minVal, const float maxVal);
 
 bool fileExists(const std::string fileName, bool verbose = true);
 
-std::vector<float> loadWeights(const std::string weightsFilePath, const std::string& networkType);
+std::vector<float> loadWeights(const std::string weightsFilePath, const std::string& modelName);
 
 std::string dimsToString(const nvinfer1::Dims d);


@@ -23,6 +23,8 @@
  * https://www.github.com/marcoslucianops
  */
 
+#include "NvOnnxParser.h"
+
 #include "yolo.h"
 #include "yoloPlugins.h"

@@ -31,11 +33,14 @@
 #endif
 
 Yolo::Yolo(const NetworkInfo& networkInfo) : m_InputBlobName(networkInfo.inputBlobName),
-    m_NetworkType(networkInfo.networkType), m_ConfigFilePath(networkInfo.configFilePath),
-    m_WtsFilePath(networkInfo.wtsFilePath), m_Int8CalibPath(networkInfo.int8CalibPath), m_DeviceType(networkInfo.deviceType),
-    m_NumDetectedClasses(networkInfo.numDetectedClasses), m_ClusterMode(networkInfo.clusterMode),
-    m_NetworkMode(networkInfo.networkMode), m_InputH(0), m_InputW(0), m_InputC(0), m_InputSize(0), m_NumClasses(0),
-    m_LetterBox(0), m_NewCoords(0), m_YoloCount(0)
+    m_NetworkType(networkInfo.networkType), m_ModelName(networkInfo.modelName),
+    m_OnnxWtsFilePath(networkInfo.onnxWtsFilePath), m_DarknetWtsFilePath(networkInfo.darknetWtsFilePath),
+    m_DarknetCfgFilePath(networkInfo.darknetCfgFilePath), m_BatchSize(networkInfo.batchSize),
+    m_ImplicitBatch(networkInfo.implicitBatch), m_Int8CalibPath(networkInfo.int8CalibPath),
+    m_DeviceType(networkInfo.deviceType), m_NumDetectedClasses(networkInfo.numDetectedClasses),
+    m_ClusterMode(networkInfo.clusterMode), m_NetworkMode(networkInfo.networkMode), m_ScaleFactor(networkInfo.scaleFactor),
+    m_Offsets(networkInfo.offsets), m_InputC(0), m_InputH(0), m_InputW(0), m_InputSize(0), m_NumClasses(0), m_LetterBox(0),
+    m_NewCoords(0), m_YoloCount(0)
 {
 }
@@ -47,74 +52,175 @@ Yolo::~Yolo()
 nvinfer1::ICudaEngine*
 Yolo::createEngine(nvinfer1::IBuilder* builder, nvinfer1::IBuilderConfig* config)
 {
-  assert (builder);
+  assert(builder);
 
-  m_ConfigBlocks = parseConfigFile(m_ConfigFilePath);
-  parseConfigBlocks();
+  nvinfer1::NetworkDefinitionCreationFlags flags =
+      (1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
+  nvinfer1::INetworkDefinition* network = builder->createNetworkV2(flags);
+  assert(network);
 
-  nvinfer1::INetworkDefinition *network = builder->createNetworkV2(0);
-  if (parseModel(*network) != NVDSINFER_SUCCESS) {
-#ifdef LEGACY
-    network->destroy();
+  nvonnxparser::IParser* parser;
+
+  if (m_NetworkType == "onnx") {
+    parser = nvonnxparser::createParser(*network, *builder->getLogger());
+    if (!parser->parseFromFile(m_OnnxWtsFilePath.c_str(), static_cast<int32_t>(nvinfer1::ILogger::Severity::kWARNING))) {
+      std::cerr << "\nCould not parse the ONNX model\n" << std::endl;
+#if NV_TENSORRT_MAJOR >= 8
+      delete parser;
+      delete network;
 #else
-    delete network;
+      parser->destroy();
+      network->destroy();
 #endif
-    return nullptr;
+      return nullptr;
+    }
+
+    m_InputC = network->getInput(0)->getDimensions().d[1];
+    m_InputH = network->getInput(0)->getDimensions().d[2];
+    m_InputW = network->getInput(0)->getDimensions().d[3];
+  }
+  else {
+    m_ConfigBlocks = parseConfigFile(m_DarknetCfgFilePath);
+    parseConfigBlocks();
+    if (parseModel(*network) != NVDSINFER_SUCCESS) {
+#if NV_TENSORRT_MAJOR >= 8
+      delete network;
+#else
+      network->destroy();
+#endif
+      return nullptr;
+    }
+  }
+
+  if (!m_ImplicitBatch && network->getInput(0)->getDimensions().d[0] == -1) {
+    nvinfer1::IOptimizationProfile* profile = builder->createOptimizationProfile();
+    assert(profile);
+    for (int32_t i = 0; i < network->getNbInputs(); ++i) {
+      nvinfer1::ITensor* input = network->getInput(i);
+      nvinfer1::Dims inputDims = input->getDimensions();
+      nvinfer1::Dims dims = inputDims;
+      dims.d[0] = 1;
+      profile->setDimensions(input->getName(), nvinfer1::OptProfileSelector::kMIN, dims);
+      dims.d[0] = m_BatchSize;
+      profile->setDimensions(input->getName(), nvinfer1::OptProfileSelector::kOPT, dims);
+      dims.d[0] = m_BatchSize;
+      profile->setDimensions(input->getName(), nvinfer1::OptProfileSelector::kMAX, dims);
+    }
+    config->addOptimizationProfile(profile);
   }
 
-  std::cout << "Building the TensorRT Engine\n" << std::endl;
+  std::cout << "\nBuilding the TensorRT Engine\n" << std::endl;
 
-  if (m_NumClasses != m_NumDetectedClasses) {
-    std::cout << "NOTE: Number of classes mismatch, make sure to set num-detected-classes=" << m_NumClasses
-        << " in config_infer file\n" << std::endl;
-  }
-  if (m_LetterBox == 1) {
-    std::cout << "NOTE: letter_box is set in cfg file, make sure to set maintain-aspect-ratio=1 in config_infer file"
-        << " to get better accuracy\n" << std::endl;
-  }
+  if (m_NetworkType == "darknet") {
+    if (m_NumClasses != m_NumDetectedClasses) {
+      std::cout << "NOTE: Number of classes mismatch, make sure to set num-detected-classes=" << m_NumClasses
+          << " in config_infer file\n" << std::endl;
+    }
+    if (m_LetterBox == 1) {
+      std::cout << "NOTE: letter_box is set in cfg file, make sure to set maintain-aspect-ratio=1 in config_infer file"
+          << " to get better accuracy\n" << std::endl;
+    }
+  }
   if (m_ClusterMode != 2) {
     std::cout << "NOTE: Wrong cluster-mode is set, make sure to set cluster-mode=2 in config_infer file\n" << std::endl;
   }
 
-  if (m_NetworkMode == "INT8" && !fileExists(m_Int8CalibPath)) {
+  if (m_NetworkMode == "FP16") {
+    assert(builder->platformHasFastFp16());
+    config->setFlag(nvinfer1::BuilderFlag::kFP16);
+  }
+  else if (m_NetworkMode == "INT8") {
     assert(builder->platformHasFastInt8());
+    config->setFlag(nvinfer1::BuilderFlag::kINT8);
+    if (m_Int8CalibPath != "" && !fileExists(m_Int8CalibPath)) {
 #ifdef OPENCV
-    std::string calib_image_list;
-    int calib_batch_size;
-    if (getenv("INT8_CALIB_IMG_PATH"))
-      calib_image_list = getenv("INT8_CALIB_IMG_PATH");
-    else {
-      std::cerr << "INT8_CALIB_IMG_PATH not set" << std::endl;
-      assert(0);
-    }
-    if (getenv("INT8_CALIB_BATCH_SIZE"))
-      calib_batch_size = std::stoi(getenv("INT8_CALIB_BATCH_SIZE"));
-    else {
-      std::cerr << "INT8_CALIB_BATCH_SIZE not set" << std::endl;
-      assert(0);
-    }
-    nvinfer1::IInt8EntropyCalibrator2 *calibrator = new Int8EntropyCalibrator2(calib_batch_size, m_InputC, m_InputH,
-        m_InputW, m_LetterBox, calib_image_list, m_Int8CalibPath);
-    config->setFlag(nvinfer1::BuilderFlag::kINT8);
-    config->setInt8Calibrator(calibrator);
+      std::string calib_image_list;
+      int calib_batch_size;
+      if (getenv("INT8_CALIB_IMG_PATH")) {
+        calib_image_list = getenv("INT8_CALIB_IMG_PATH");
+      }
+      else {
+        std::cerr << "INT8_CALIB_IMG_PATH not set" << std::endl;
+        assert(0);
+      }
+      if (getenv("INT8_CALIB_BATCH_SIZE")) {
+        calib_batch_size = std::stoi(getenv("INT8_CALIB_BATCH_SIZE"));
+      }
+      else {
+        std::cerr << "INT8_CALIB_BATCH_SIZE not set" << std::endl;
+        assert(0);
+      }
+      nvinfer1::IInt8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(calib_batch_size, m_InputC, m_InputH,
+          m_InputW, m_ScaleFactor, m_Offsets, calib_image_list, m_Int8CalibPath);
+      config->setInt8Calibrator(calibrator);
 #else
       std::cerr << "OpenCV is required to run INT8 calibrator\n" << std::endl;
-    assert(0);
+
+#if NV_TENSORRT_MAJOR >= 8
+      if (m_NetworkType == "onnx") {
+        delete parser;
+      }
+      delete network;
+#else
+      if (m_NetworkType == "onnx") {
+        parser->destroy();
+      }
+      network->destroy();
+#endif
+
+      return nullptr;
 #endif
+    }
   }
 
-  nvinfer1::ICudaEngine *engine = builder->buildEngineWithConfig(*network, *config);
-  if (engine)
-    std::cout << "Building complete\n" << std::endl;
-  else
-    std::cerr << "Building engine failed\n" << std::endl;
+#ifdef GRAPH
+  config->setProfilingVerbosity(nvinfer1::ProfilingVerbosity::kDETAILED);
+#endif
 
-#ifdef LEGACY
-  network->destroy();
+  nvinfer1::ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
+  if (engine) {
+    std::cout << "Building complete\n" << std::endl;
+  }
+  else {
+    std::cerr << "Building engine failed\n" << std::endl;
+  }
+
+#ifdef GRAPH
+  nvinfer1::IExecutionContext *context = engine->createExecutionContext();
+  nvinfer1::IEngineInspector *inpector = engine->createEngineInspector();
+  inpector->setExecutionContext(context);
+  std::ofstream graph;
+  graph.open("graph.json");
+  graph << inpector->getEngineInformation(nvinfer1::LayerInformationFormat::kJSON);
+  graph.close();
+  std::cout << "Network graph saved to graph.json\n" << std::endl;
+#if NV_TENSORRT_MAJOR >= 8
+  delete inpector;
+  delete context;
 #else
-  delete network;
+  inpector->destroy();
+  context->destroy();
+#endif
+#endif
+
+#if NV_TENSORRT_MAJOR >= 8
+  if (m_NetworkType == "onnx") {
+    delete parser;
+  }
+  delete network;
+#else
+  if (m_NetworkType == "onnx") {
+    parser->destroy();
+  }
+  network->destroy();
 #endif
 
   return engine;
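When the network is built without an implicit batch, TensorRT needs an optimization profile that bounds the dynamic batch dimension. A minimal sketch of the min/opt/max setup used above, assuming input 0 has batch = -1 (helper name is illustrative, not repo code):

```cpp
#include <cassert>
#include "NvInfer.h"

// Illustrative sketch: register a profile covering batch sizes 1..maxBatchSize.
void addBatchProfile(nvinfer1::IBuilder* builder, nvinfer1::IBuilderConfig* config,
    nvinfer1::INetworkDefinition* network, int maxBatchSize)
{
  nvinfer1::ITensor* input = network->getInput(0);
  nvinfer1::Dims dims = input->getDimensions();
  assert(dims.d[0] == -1);   // -1 marks the dynamic batch dimension

  nvinfer1::IOptimizationProfile* profile = builder->createOptimizationProfile();
  dims.d[0] = 1;             // smallest batch the engine must accept
  profile->setDimensions(input->getName(), nvinfer1::OptProfileSelector::kMIN, dims);
  dims.d[0] = maxBatchSize;  // batch TensorRT optimizes for, and the ceiling
  profile->setDimensions(input->getName(), nvinfer1::OptProfileSelector::kOPT, dims);
  profile->setDimensions(input->getName(), nvinfer1::OptProfileSelector::kMAX, dims);
  config->addOptimizationProfile(profile);
}
```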
@@ -124,14 +230,16 @@ NvDsInferStatus
 Yolo::parseModel(nvinfer1::INetworkDefinition& network) {
   destroyNetworkUtils();
 
-  std::vector<float> weights = loadWeights(m_WtsFilePath, m_NetworkType);
+  std::vector<float> weights = loadWeights(m_DarknetWtsFilePath, m_ModelName);
   std::cout << "Building YOLO network\n" << std::endl;
   NvDsInferStatus status = buildYoloNetwork(weights, network);
 
-  if (status == NVDSINFER_SUCCESS)
+  if (status == NVDSINFER_SUCCESS) {
     std::cout << "Building YOLO network complete" << std::endl;
-  else
+  }
+  else {
     std::cerr << "Building YOLO network failed" << std::endl;
+  }
 
   return status;
 }
@@ -141,8 +249,11 @@ Yolo::buildYoloNetwork(std::vector<float>& weights, nvinfer1::INetworkDefinition
 {
   int weightPtr = 0;
 
+  uint batchSize = m_ImplicitBatch ? m_BatchSize : -1;
   nvinfer1::ITensor* data = network.addInput(m_InputBlobName.c_str(), nvinfer1::DataType::kFLOAT,
-      nvinfer1::Dims{3, {static_cast<int>(m_InputC), static_cast<int>(m_InputH), static_cast<int>(m_InputW)}});
+      nvinfer1::Dims{4, {static_cast<int>(batchSize), static_cast<int>(m_InputC), static_cast<int>(m_InputH),
+      static_cast<int>(m_InputW)}});
   assert(data != nullptr && data->getDimensions().nbDims > 0);
 
   nvinfer1::ITensor* previous = data;
@@ -287,28 +398,13 @@ Yolo::buildYoloNetwork(std::vector<float>& weights, nvinfer1::INetworkDefinition
     std::string layerName = m_ConfigBlocks.at(i).at("type");
     printLayerInfo(layerIndex, layerName, inputVol, outputVol, "-");
   }
-  else if (m_ConfigBlocks.at(i).at("type") == "reorg3d") {
+  else if (m_ConfigBlocks.at(i).at("type") == "reorg" || m_ConfigBlocks.at(i).at("type") == "reorg3d") {
     std::string inputVol = dimsToString(previous->getDimensions());
     previous = reorgLayer(i, m_ConfigBlocks.at(i), previous, &network);
     assert(previous != nullptr);
     std::string outputVol = dimsToString(previous->getDimensions());
     tensorOutputs.push_back(previous);
-    std::string layerName = "reorg3d";
-    printLayerInfo(layerIndex, layerName, inputVol, outputVol, "-");
-  }
-  else if (m_ConfigBlocks.at(i).at("type") == "reorg") {
-    std::string inputVol = dimsToString(previous->getDimensions());
-    nvinfer1::IPluginV2* reorgPlugin = createReorgPlugin(2);
-    assert(reorgPlugin != nullptr);
-    nvinfer1::IPluginV2Layer* reorg = network.addPluginV2(&previous, 1, *reorgPlugin);
-    assert(reorg != nullptr);
-    std::string reorglayerName = "reorg_" + std::to_string(i);
-    reorg->setName(reorglayerName.c_str());
-    previous = reorg->getOutput(0);
-    assert(previous != nullptr);
-    std::string outputVol = dimsToString(previous->getDimensions());
-    tensorOutputs.push_back(previous);
-    std::string layerName = "reorg";
+    std::string layerName = m_ConfigBlocks.at(i).at("type");
     printLayerInfo(layerIndex, layerName, inputVol, outputVol, "-");
   }
   else if (m_ConfigBlocks.at(i).at("type") == "yolo" || m_ConfigBlocks.at(i).at("type") == "region") {
@@ -317,9 +413,8 @@ Yolo::buildYoloNetwork(std::vector<float>& weights, nvinfer1::INetworkDefinition
     nvinfer1::Dims prevTensorDims = previous->getDimensions();
     TensorInfo& curYoloTensor = m_YoloTensors.at(yoloCountInputs);
     curYoloTensor.blobName = blobName;
-    curYoloTensor.gridSizeX = prevTensorDims.d[2];
-    curYoloTensor.gridSizeY = prevTensorDims.d[1];
+    curYoloTensor.gridSizeY = prevTensorDims.d[2];
+    curYoloTensor.gridSizeX = prevTensorDims.d[3];
     std::string inputVol = dimsToString(previous->getDimensions());
     tensorOutputs.push_back(previous);
     yoloTensorInputs[yoloCountInputs] = previous;
@@ -345,10 +440,10 @@ Yolo::buildYoloNetwork(std::vector<float>& weights, nvinfer1::INetworkDefinition
     uint64_t outputSize = 0;
     for (uint j = 0; j < yoloCountInputs; ++j) {
       TensorInfo& curYoloTensor = m_YoloTensors.at(j);
-      outputSize += curYoloTensor.gridSizeX * curYoloTensor.gridSizeY * curYoloTensor.numBBoxes;
+      outputSize += curYoloTensor.numBBoxes * curYoloTensor.gridSizeY * curYoloTensor.gridSizeX;
     }
 
-    nvinfer1::IPluginV2* yoloPlugin = new YoloLayer(m_InputW, m_InputH, m_NumClasses, m_NewCoords, m_YoloTensors,
+    nvinfer1::IPluginV2DynamicExt* yoloPlugin = new YoloLayer(m_InputW, m_InputH, m_NumClasses, m_NewCoords, m_YoloTensors,
         outputSize);
     assert(yoloPlugin != nullptr);
     nvinfer1::IPluginV2Layer* yolo = network.addPluginV2(yoloTensorInputs, m_YoloCount, *yoloPlugin);
@@ -356,10 +451,19 @@ Yolo::buildYoloNetwork(std::vector<float>& weights, nvinfer1::INetworkDefinition
     std::string yoloLayerName = "yolo";
     yolo->setName(yoloLayerName.c_str());
 
-    nvinfer1::ITensor* outputYolo = yolo->getOutput(0);
-    std::string outputYoloLayerName = "output";
-    outputYolo->setName(outputYoloLayerName.c_str());
-    network.markOutput(*outputYolo);
+    std::string outputlayerName;
+    nvinfer1::ITensor* detection_boxes = yolo->getOutput(0);
+    outputlayerName = "boxes";
+    detection_boxes->setName(outputlayerName.c_str());
+    nvinfer1::ITensor* detection_scores = yolo->getOutput(1);
+    outputlayerName = "scores";
+    detection_scores->setName(outputlayerName.c_str());
+    nvinfer1::ITensor* detection_classes = yolo->getOutput(2);
+    outputlayerName = "classes";
+    detection_classes->setName(outputlayerName.c_str());
+    network.markOutput(*detection_boxes);
+    network.markOutput(*detection_scores);
+    network.markOutput(*detection_classes);
   }
   else {
     std::cerr << "\nError in yolo cfg file" << std::endl;
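The single interleaved output (6 floats per detection) is replaced by three named tensors: boxes with 4 floats per candidate, scores with one FP32 value, and classes with one INT32 value. A back-of-the-envelope sizing check per batch (the outputSize value is hypothetical):

```cpp
#include <cstdint>
#include <iostream>

// Illustration only: buffer sizes implied by the new three-tensor output.
int main() {
  uint64_t outputSize = 25200;  // hypothetical: sum over heads of numBBoxes * gridY * gridX
  int batchSize = 4;
  uint64_t boxesFloats = (uint64_t) batchSize * outputSize * 4;  // x, y, w, h
  uint64_t scoresFloats = (uint64_t) batchSize * outputSize;     // FP32 confidence
  uint64_t classesInts = (uint64_t) batchSize * outputSize;      // INT32 class id
  std::cout << "boxes: " << boxesFloats * sizeof(float) << " bytes, "
            << "scores: " << scoresFloats * sizeof(float) << " bytes, "
            << "classes: " << classesInts * sizeof(int32_t) << " bytes" << std::endl;
  return 0;
}
```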


@@ -45,13 +45,19 @@ struct NetworkInfo
 {
   std::string inputBlobName;
   std::string networkType;
-  std::string configFilePath;
-  std::string wtsFilePath;
+  std::string modelName;
+  std::string onnxWtsFilePath;
+  std::string darknetWtsFilePath;
+  std::string darknetCfgFilePath;
+  uint batchSize;
+  int implicitBatch;
   std::string int8CalibPath;
   std::string deviceType;
   uint numDetectedClasses;
   int clusterMode;
   std::string networkMode;
+  float scaleFactor;
+  const float* offsets;
 };
 
 struct TensorInfo

@@ -74,7 +80,8 @@ class Yolo : public IModelParser {
   bool hasFullDimsSupported() const override { return false; }
 
   const char* getModelName() const override {
-    return m_ConfigFilePath.empty() ? m_NetworkType.c_str() : m_ConfigFilePath.c_str();
+    return m_NetworkType == "onnx" ? m_OnnxWtsFilePath.substr(0, m_OnnxWtsFilePath.find(".onnx")).c_str() :
+        m_DarknetCfgFilePath.substr(0, m_DarknetCfgFilePath.find(".cfg")).c_str();
   }
 
   NvDsInferStatus parseModel(nvinfer1::INetworkDefinition& network) override;

@@ -84,17 +91,23 @@ class Yolo : public IModelParser {
 protected:
   const std::string m_InputBlobName;
   const std::string m_NetworkType;
-  const std::string m_ConfigFilePath;
-  const std::string m_WtsFilePath;
+  const std::string m_ModelName;
+  const std::string m_OnnxWtsFilePath;
+  const std::string m_DarknetWtsFilePath;
+  const std::string m_DarknetCfgFilePath;
+  const uint m_BatchSize;
+  const int m_ImplicitBatch;
   const std::string m_Int8CalibPath;
   const std::string m_DeviceType;
   const uint m_NumDetectedClasses;
   const int m_ClusterMode;
   const std::string m_NetworkMode;
+  const float m_ScaleFactor;
+  const float* m_Offsets;
 
+  uint m_InputC;
   uint m_InputH;
   uint m_InputW;
-  uint m_InputC;
   uint64_t m_InputSize;
   uint m_NumClasses;
   uint m_LetterBox;


@@ -4,13 +4,12 @@
  */
 
 #include <stdint.h>
-#include <stdio.h>
 
 inline __device__ float sigmoidGPU(const float& x) { return 1.0f / (1.0f + __expf(-x)); }
 
-__global__ void gpuYoloLayer(const float* input, float* output, int* count, const uint netWidth, const uint netHeight,
-    const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes, const float scaleXY,
-    const float* anchors, const int* mask)
+__global__ void gpuYoloLayer(const float* input, float* boxes, float* scores, int* classes, const uint netWidth,
+    const uint netHeight, const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes,
+    const uint64_t lastInputSize, const float scaleXY, const float* anchors, const int* mask)
 {
   uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
   uint y_id = blockIdx.y * blockDim.y + threadIdx.y;

@@ -22,8 +21,6 @@ __global__ void gpuYoloLayer(const float* input, float* output, int* count, cons
   const int numGridCells = gridSizeX * gridSizeY;
   const int bbindex = y_id * gridSizeX + x_id;
 
-  const float objectness = sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 4)]);
-
   const float alpha = scaleXY;
   const float beta = -0.5 * (scaleXY - 1);

@@ -37,6 +34,8 @@ __global__ void gpuYoloLayer(const float* input, float* output, int* count, cons
   float h = __expf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)]) * anchors[mask[z_id] * 2 + 1];
 
+  const float objectness = sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 4)]);
+
   float maxProb = 0.0f;
   int maxIndex = -1;

@@ -48,25 +47,25 @@ __global__ void gpuYoloLayer(const float* input, float* output, int* count, cons
     }
   }
 
-  int _count = (int)atomicAdd(count, 1);
-  output[_count * 6 + 0] = xc;
-  output[_count * 6 + 1] = yc;
-  output[_count * 6 + 2] = w;
-  output[_count * 6 + 3] = h;
-  output[_count * 6 + 4] = maxProb * objectness;
-  output[_count * 6 + 5] = maxIndex;
+  int count = z_id * gridSizeX * gridSizeY + y_id * gridSizeY + x_id + lastInputSize;
+  boxes[count * 4 + 0] = xc;
+  boxes[count * 4 + 1] = yc;
+  boxes[count * 4 + 2] = w;
+  boxes[count * 4 + 3] = h;
+  scores[count] = maxProb * objectness;
+  classes[count] = maxIndex;
 }
 
-cudaError_t cudaYoloLayer(const void* input, void* output, void* count, const uint& batchSize, uint64_t& inputSize,
-    uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY,
-    const uint& numOutputClasses, const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask,
-    cudaStream_t stream);
+cudaError_t cudaYoloLayer(const void* input, void* boxes, void* scores, void* classes, const uint& batchSize,
+    const uint64_t& inputSize, const uint64_t& outputSize, const uint64_t& lastInputSize, const uint& netWidth,
+    const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes,
+    const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);
 
-cudaError_t cudaYoloLayer(const void* input, void* output, void* count, const uint& batchSize, uint64_t& inputSize,
-    uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY,
-    const uint& numOutputClasses, const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask,
-    cudaStream_t stream)
+cudaError_t cudaYoloLayer(const void* input, void* boxes, void* scores, void* classes, const uint& batchSize,
+    const uint64_t& inputSize, const uint64_t& outputSize, const uint64_t& lastInputSize, const uint& netWidth,
+    const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes,
+    const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream)
 {
   dim3 threads_per_block(16, 16, 4);
   dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1, (gridSizeY / threads_per_block.y) + 1,

@@ -75,9 +74,10 @@ cudaError_t cudaYoloLayer(const void* input, void* output, void* count, const ui
   for (unsigned int batch = 0; batch < batchSize; ++batch) {
     gpuYoloLayer<<<number_of_blocks, threads_per_block, 0, stream>>>(
         reinterpret_cast<const float*> (input) + (batch * inputSize),
-        reinterpret_cast<float*> (output) + (batch * 6 * outputSize),
-        reinterpret_cast<int*> (count) + (batch),
-        netWidth, netHeight, gridSizeX, gridSizeY, numOutputClasses, numBBoxes, scaleXY,
+        reinterpret_cast<float*> (boxes) + (batch * 4 * outputSize),
+        reinterpret_cast<float*> (scores) + (batch * 1 * outputSize),
+        reinterpret_cast<int*> (classes) + (batch * 1 * outputSize),
+        netWidth, netHeight, gridSizeX, gridSizeY, numOutputClasses, numBBoxes, lastInputSize, scaleXY,
        reinterpret_cast<const float*> (anchors), reinterpret_cast<const int*> (mask));
   }
   return cudaGetLastError();
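The kernels no longer compact detections through atomicAdd; each thread writes to a fixed slot derived from its (anchor, row, column) position plus lastInputSize, which places each detection head in a disjoint range of the shared boxes/scores/classes buffers. A standalone illustration of the slot computation (note the kernel multiplies y_id by gridSizeY, which equals gridSizeX on the square grids used here):

```cpp
#include <cstdint>
#include <iostream>

// Illustration only: the deterministic output index used by the new kernels.
int main() {
  uint32_t gridSizeX = 13, gridSizeY = 13;   // square grid, as in these models
  uint64_t lastInputSize = 0;                // running offset over previous heads
  uint32_t z_id = 2, y_id = 5, x_id = 7;     // hypothetical thread coordinates
  uint64_t count = z_id * gridSizeX * gridSizeY + y_id * gridSizeY + x_id + lastInputSize;
  std::cout << "slot " << count << std::endl;  // 2*169 + 5*13 + 7 = 410
  return 0;
}
```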


@@ -5,9 +5,9 @@
 #include <stdint.h>
 
-__global__ void gpuYoloLayer_nc(const float* input, float* output, int* count, const uint netWidth, const uint netHeight,
-    const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes, const float scaleXY,
-    const float* anchors, const int* mask)
+__global__ void gpuYoloLayer_nc(const float* input, float* boxes, float* scores, int* classes, const uint netWidth,
+    const uint netHeight, const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes,
+    const uint64_t lastInputSize, const float scaleXY, const float* anchors, const int* mask)
 {
   uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
   uint y_id = blockIdx.y * blockDim.y + threadIdx.y;

@@ -19,8 +19,6 @@ __global__ void gpuYoloLayer_nc(const float* input, float* output, int* count, c
   const int numGridCells = gridSizeX * gridSizeY;
   const int bbindex = y_id * gridSizeX + x_id;
 
-  const float objectness = input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 4)];
-
   const float alpha = scaleXY;
   const float beta = -0.5 * (scaleXY - 1);

@@ -34,6 +32,8 @@ __global__ void gpuYoloLayer_nc(const float* input, float* output, int* count, c
   float h = __powf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)] * 2, 2) * anchors[mask[z_id] * 2 + 1];
 
+  const float objectness = input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 4)];
+
   float maxProb = 0.0f;
   int maxIndex = -1;

@@ -45,25 +45,25 @@ __global__ void gpuYoloLayer_nc(const float* input, float* output, int* count, c
     }
   }
 
-  int _count = (int)atomicAdd(count, 1);
-  output[_count * 6 + 0] = xc;
-  output[_count * 6 + 1] = yc;
-  output[_count * 6 + 2] = w;
-  output[_count * 6 + 3] = h;
-  output[_count * 6 + 4] = maxProb * objectness;
-  output[_count * 6 + 5] = maxIndex;
+  int count = z_id * gridSizeX * gridSizeY + y_id * gridSizeY + x_id + lastInputSize;
+  boxes[count * 4 + 0] = xc;
+  boxes[count * 4 + 1] = yc;
+  boxes[count * 4 + 2] = w;
+  boxes[count * 4 + 3] = h;
+  scores[count] = maxProb * objectness;
+  classes[count] = maxIndex;
 }
 
-cudaError_t cudaYoloLayer_nc(const void* input, void* output, void* count, const uint& batchSize, uint64_t& inputSize,
-    uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY,
-    const uint& numOutputClasses, const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask,
-    cudaStream_t stream);
+cudaError_t cudaYoloLayer_nc(const void* input, void* boxes, void* scores, void* classes, const uint& batchSize,
+    const uint64_t& inputSize, const uint64_t& outputSize, const uint64_t& lastInputSize, const uint& netWidth,
+    const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes,
+    const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);
 
-cudaError_t cudaYoloLayer_nc(const void* input, void* output, void* count, const uint& batchSize, uint64_t& inputSize,
-    uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY,
-    const uint& numOutputClasses, const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask,
-    cudaStream_t stream)
+cudaError_t cudaYoloLayer_nc(const void* input, void* boxes, void* scores, void* classes, const uint& batchSize,
+    const uint64_t& inputSize, const uint64_t& outputSize, const uint64_t& lastInputSize, const uint& netWidth,
+    const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes,
+    const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream)
 {
   dim3 threads_per_block(16, 16, 4);
   dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1, (gridSizeY / threads_per_block.y) + 1,

@@ -72,9 +72,10 @@ cudaError_t cudaYoloLayer_nc(const void* input, void* output, void* count, const
   for (unsigned int batch = 0; batch < batchSize; ++batch) {
     gpuYoloLayer_nc<<<number_of_blocks, threads_per_block, 0, stream>>>(
         reinterpret_cast<const float*> (input) + (batch * inputSize),
-        reinterpret_cast<float*> (output) + (batch * 6 * outputSize),
-        reinterpret_cast<int*> (count) + (batch),
-        netWidth, netHeight, gridSizeX, gridSizeY, numOutputClasses, numBBoxes, scaleXY,
+        reinterpret_cast<float*> (boxes) + (batch * 4 * outputSize),
+        reinterpret_cast<float*> (scores) + (batch * 1 * outputSize),
+        reinterpret_cast<int*> (classes) + (batch * 1 * outputSize),
+        netWidth, netHeight, gridSizeX, gridSizeY, numOutputClasses, numBBoxes, lastInputSize, scaleXY,
         reinterpret_cast<const float*> (anchors), reinterpret_cast<const int*> (mask));
   }
   return cudaGetLastError();


@@ -27,9 +27,9 @@ __device__ void softmaxGPU(const float* input, const int bbindex, const int numG
   }
 }
 
-__global__ void gpuRegionLayer(const float* input, float* softmax, float* output, int* count, const uint netWidth,
-  const uint netHeight, const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes,
-  const float* anchors)
+__global__ void gpuRegionLayer(const float* input, float* softmax, float* boxes, float* scores, int* classes,
+  const uint netWidth, const uint netHeight, const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses,
+  const uint numBBoxes, const uint64_t lastInputSize, const float* anchors)
 {
   uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
   uint y_id = blockIdx.y * blockDim.y + threadIdx.y;
@@ -41,8 +41,6 @@ __global__ void gpuRegionLayer(const float* input, float* softmax, float* output
   const int numGridCells = gridSizeX * gridSizeY;
   const int bbindex = y_id * gridSizeX + x_id;
 
-  const float objectness = sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 4)]);
   float xc = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]) + x_id) * netWidth / gridSizeX;
   float yc = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]) + y_id) * netHeight / gridSizeY;
@@ -53,6 +51,8 @@ __global__ void gpuRegionLayer(const float* input, float* softmax, float* output
   float h = __expf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)]) * anchors[z_id * 2 + 1] * netHeight /
     gridSizeY;
 
+  const float objectness = sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 4)]);
   softmaxGPU(input, bbindex, numGridCells, z_id, numOutputClasses, 1.0, softmax);
   float maxProb = 0.0f;
@@ -66,23 +66,25 @@ __global__ void gpuRegionLayer(const float* input, float* softmax, float* output
     }
   }
 
-  int _count = (int)atomicAdd(count, 1);
-  output[_count * 6 + 0] = xc;
-  output[_count * 6 + 1] = yc;
-  output[_count * 6 + 2] = w;
-  output[_count * 6 + 3] = h;
-  output[_count * 6 + 4] = maxProb * objectness;
-  output[_count * 6 + 5] = maxIndex;
+  int count = z_id * gridSizeX * gridSizeY + y_id * gridSizeY + x_id + lastInputSize;
+  boxes[count * 4 + 0] = xc;
+  boxes[count * 4 + 1] = yc;
+  boxes[count * 4 + 2] = w;
+  boxes[count * 4 + 3] = h;
+  scores[count] = maxProb * objectness;
+  classes[count] = maxIndex;
 }
 
-cudaError_t cudaRegionLayer(const void* input, void* softmax, void* output, void* count, const uint& batchSize,
-  uint64_t& inputSize, uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX,
-  const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes, const void* anchors, cudaStream_t stream);
+cudaError_t cudaRegionLayer(const void* input, void* softmax, void* boxes, void* scores, void* classes,
+  const uint& batchSize, const uint64_t& inputSize, const uint64_t& outputSize, const uint64_t& lastInputSize,
+  const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
+  const uint& numBBoxes, const void* anchors, cudaStream_t stream);
 
-cudaError_t cudaRegionLayer(const void* input, void* softmax, void* output, void* count, const uint& batchSize,
-  uint64_t& inputSize, uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX,
-  const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes, const void* anchors, cudaStream_t stream)
+cudaError_t cudaRegionLayer(const void* input, void* softmax, void* boxes, void* scores, void* classes,
+  const uint& batchSize, const uint64_t& inputSize, const uint64_t& outputSize, const uint64_t& lastInputSize,
+  const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
+  const uint& numBBoxes, const void* anchors, cudaStream_t stream)
 {
   dim3 threads_per_block(16, 16, 4);
   dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1, (gridSizeY / threads_per_block.y) + 1,
@@ -92,9 +94,10 @@ cudaError_t cudaRegionLayer(const void* input, void* softmax, void* output, void
     gpuRegionLayer<<<number_of_blocks, threads_per_block, 0, stream>>>(
       reinterpret_cast<const float*> (input) + (batch * inputSize),
       reinterpret_cast<float*> (softmax) + (batch * inputSize),
-      reinterpret_cast<float*> (output) + (batch * 6 * outputSize),
-      reinterpret_cast<int*> (count) + (batch),
-      netWidth, netHeight, gridSizeX, gridSizeY, numOutputClasses, numBBoxes,
+      reinterpret_cast<float*> (boxes) + (batch * 4 * outputSize),
+      reinterpret_cast<float*> (scores) + (batch * 1 * outputSize),
+      reinterpret_cast<int*> (classes) + (batch * 1 * outputSize),
+      netWidth, netHeight, gridSizeX, gridSizeY, numOutputClasses, numBBoxes, lastInputSize,
      reinterpret_cast<const float*> (anchors));
   }
   return cudaGetLastError();
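The old kernel reserved an output slot with `atomicAdd`, so slot order varied from run to run; the new code derives the slot from the thread coordinates plus the offset of all previous heads, which is what fixes the wrong output on DeepStream < 6.2. A small sketch of that index arithmetic (pure Python, variable names mirror the kernel):

```python
def output_slot(x_id, y_id, z_id, grid_size_x, grid_size_y, last_input_size):
    # Mirrors: count = z_id * gridSizeX * gridSizeY + y_id * gridSizeY + x_id + lastInputSize
    return z_id * grid_size_x * grid_size_y + y_id * grid_size_y + x_id + last_input_size

# Every (x, y, anchor) thread now owns a fixed slot, so no atomics are needed
# and the output ordering is deterministic across runs.
print(output_slot(x_id=5, y_id=2, z_id=1, grid_size_x=13, grid_size_y=13, last_input_size=0))
```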


@@ -38,19 +38,20 @@ namespace {
   }
 }
 
-cudaError_t cudaYoloLayer_nc(const void* input, void* output, void* count, const uint& batchSize, uint64_t& inputSize,
-  uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY,
-  const uint& numOutputClasses, const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask,
-  cudaStream_t stream);
+cudaError_t cudaYoloLayer_nc(const void* input, void* boxes, void* scores, void* classes, const uint& batchSize,
+  const uint64_t& inputSize, const uint64_t& outputSize, const uint64_t& lastInputSize, const uint& netWidth,
+  const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes,
+  const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);
 
-cudaError_t cudaYoloLayer(const void* input, void* output, void* count, const uint& batchSize, uint64_t& inputSize,
-  uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY,
-  const uint& numOutputClasses, const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask,
-  cudaStream_t stream);
+cudaError_t cudaYoloLayer(const void* input, void* boxes, void* scores, void* classes, const uint& batchSize,
+  const uint64_t& inputSize, const uint64_t& outputSize, const uint64_t& lastInputSize, const uint& netWidth,
+  const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes,
+  const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);
 
-cudaError_t cudaRegionLayer(const void* input, void* softmax, void* output, void* count, const uint& batchSize,
-  uint64_t& inputSize, uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX,
-  const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes, const void* anchors, cudaStream_t stream);
+cudaError_t cudaRegionLayer(const void* input, void* softmax, void* boxes, void* scores, void* classes,
+  const uint& batchSize, const uint64_t& inputSize, const uint64_t& outputSize, const uint64_t& lastInputSize,
+  const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
+  const uint& numBBoxes, const void* anchors, cudaStream_t stream);
 
 YoloLayer::YoloLayer(const void* data, size_t length) {
   const char* d = static_cast<const char*>(data);
@@ -99,96 +100,10 @@ YoloLayer::YoloLayer(const uint& netWidth, const uint& netHeight, const uint& nu
   assert(m_NetHeight > 0);
 };
 
-nvinfer1::Dims
-YoloLayer::getOutputDimensions(int index, const nvinfer1::Dims* inputs, int nbInputDims) noexcept
-{
-  assert(index == 0);
-  return nvinfer1::Dims{2, {static_cast<int>(m_OutputSize), 6}};
-}
-
-bool
-YoloLayer::supportsFormat(nvinfer1::DataType type, nvinfer1::PluginFormat format) const noexcept {
-  return (type == nvinfer1::DataType::kFLOAT && format == nvinfer1::PluginFormat::kLINEAR);
-}
-
-void
-YoloLayer::configureWithFormat(const nvinfer1::Dims* inputDims, int nbInputs, const nvinfer1::Dims* outputDims,
-  int nbOutputs, nvinfer1::DataType type, nvinfer1::PluginFormat format, int maxBatchSize) noexcept
-{
-  assert(nbInputs > 0);
-  assert(format == nvinfer1::PluginFormat::kLINEAR);
-  assert(inputDims != nullptr);
-}
-
-#ifdef LEGACY
-int
-YoloLayer::enqueue(int batchSize, const void* const* inputs, void** outputs, void* workspace, cudaStream_t stream)
-#else
-int32_t
-YoloLayer::enqueue(int batchSize, void const* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream)
-  noexcept
-#endif
-{
-  void* output = outputs[0];
-  CUDA_CHECK(cudaMemsetAsync((float*) output, 0, sizeof(float) * m_OutputSize * 6 * batchSize, stream));
-  void* count = workspace;
-  CUDA_CHECK(cudaMemsetAsync((int*) count, 0, sizeof(int) * batchSize, stream));
-  uint yoloTensorsSize = m_YoloTensors.size();
-  for (uint i = 0; i < yoloTensorsSize; ++i) {
-    TensorInfo& curYoloTensor = m_YoloTensors.at(i);
-    uint numBBoxes = curYoloTensor.numBBoxes;
-    float scaleXY = curYoloTensor.scaleXY;
-    uint gridSizeX = curYoloTensor.gridSizeX;
-    uint gridSizeY = curYoloTensor.gridSizeY;
-    std::vector<float> anchors = curYoloTensor.anchors;
-    std::vector<int> mask = curYoloTensor.mask;
-    void* v_anchors;
-    void* v_mask;
-    if (anchors.size() > 0) {
-      CUDA_CHECK(cudaMalloc(&v_anchors, sizeof(float) * anchors.size()));
-      CUDA_CHECK(cudaMemcpyAsync(v_anchors, anchors.data(), sizeof(float) * anchors.size(), cudaMemcpyHostToDevice, stream));
-    }
-    if (mask.size() > 0) {
-      CUDA_CHECK(cudaMalloc(&v_mask, sizeof(int) * mask.size()));
-      CUDA_CHECK(cudaMemcpyAsync(v_mask, mask.data(), sizeof(int) * mask.size(), cudaMemcpyHostToDevice, stream));
-    }
-    uint64_t inputSize = gridSizeX * gridSizeY * (numBBoxes * (4 + 1 + m_NumClasses));
-    if (mask.size() > 0) {
-      if (m_NewCoords) {
-        CUDA_CHECK(cudaYoloLayer_nc(inputs[i], output, count, batchSize, inputSize, m_OutputSize, m_NetWidth, m_NetHeight,
-          gridSizeX, gridSizeY, m_NumClasses, numBBoxes, scaleXY, v_anchors, v_mask, stream));
-      }
-      else {
-        CUDA_CHECK(cudaYoloLayer(inputs[i], output, count, batchSize, inputSize, m_OutputSize, m_NetWidth, m_NetHeight,
-          gridSizeX, gridSizeY, m_NumClasses, numBBoxes, scaleXY, v_anchors, v_mask, stream));
-      }
-    }
-    else {
-      void* softmax;
-      CUDA_CHECK(cudaMalloc(&softmax, sizeof(float) * inputSize * batchSize));
-      CUDA_CHECK(cudaMemsetAsync((float*)softmax, 0, sizeof(float) * inputSize * batchSize, stream));
-      CUDA_CHECK(cudaRegionLayer(inputs[i], softmax, output, count, batchSize, inputSize, m_OutputSize, m_NetWidth,
-        m_NetHeight, gridSizeX, gridSizeY, m_NumClasses, numBBoxes, v_anchors, stream));
-      CUDA_CHECK(cudaFree(softmax));
-    }
-    if (anchors.size() > 0) {
-      CUDA_CHECK(cudaFree(v_anchors));
-    }
-    if (mask.size() > 0) {
-      CUDA_CHECK(cudaFree(v_mask));
-    }
-  }
-  return 0;
-}
+nvinfer1::IPluginV2DynamicExt*
+YoloLayer::clone() const noexcept
+{
+  return new YoloLayer(m_NetWidth, m_NetHeight, m_NumClasses, m_NewCoords, m_YoloTensors, m_OutputSize);
+}
 
 size_t
@@ -250,10 +165,113 @@ YoloLayer::serialize(void* buffer) const noexcept
   }
 }
 
-nvinfer1::IPluginV2*
-YoloLayer::clone() const noexcept
-{
-  return new YoloLayer(m_NetWidth, m_NetHeight, m_NumClasses, m_NewCoords, m_YoloTensors, m_OutputSize);
-}
+nvinfer1::DimsExprs
+YoloLayer::getOutputDimensions(INT index, const nvinfer1::DimsExprs* inputs, INT nbInputDims,
+  nvinfer1::IExprBuilder& exprBuilder) noexcept
+{
+  assert(index < 3);
+  if (index == 0) {
+    return nvinfer1::DimsExprs{3, {inputs->d[0], exprBuilder.constant(static_cast<int>(m_OutputSize)),
+        exprBuilder.constant(4)}};
+  }
+  return nvinfer1::DimsExprs{3, {inputs->d[0], exprBuilder.constant(static_cast<int>(m_OutputSize)),
+      exprBuilder.constant(1)}};
+}
+
+bool
+YoloLayer::supportsFormatCombination(INT pos, const nvinfer1::PluginTensorDesc* inOut, INT nbInputs, INT nbOutputs) noexcept
+{
+  return inOut[pos].format == nvinfer1::TensorFormat::kLINEAR && (inOut[pos].type == nvinfer1::DataType::kFLOAT ||
+      inOut[pos].type == nvinfer1::DataType::kINT32);
+}
+
+nvinfer1::DataType
+YoloLayer::getOutputDataType(INT index, const nvinfer1::DataType* inputTypes, INT nbInputs) const noexcept
+{
+  assert(index < 3);
+  if (index == 2) {
+    return nvinfer1::DataType::kINT32;
+  }
+  return nvinfer1::DataType::kFLOAT;
+}
+
+void
+YoloLayer::configurePlugin(const nvinfer1::DynamicPluginTensorDesc* in, INT nbInput,
+  const nvinfer1::DynamicPluginTensorDesc* out, INT nbOutput) noexcept
+{
+  assert(nbInput > 0);
+  assert(in->desc.format == nvinfer1::PluginFormat::kLINEAR);
+  assert(in->desc.dims.d != nullptr);
+}
+
+INT
+YoloLayer::enqueue(const nvinfer1::PluginTensorDesc* inputDesc, const nvinfer1::PluginTensorDesc* outputDesc,
+  void const* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream) noexcept
+{
+  INT batchSize = inputDesc[0].dims.d[0];
+  void* boxes = outputs[0];
+  void* scores = outputs[1];
+  void* classes = outputs[2];
+  uint64_t lastInputSize = 0;
+  uint yoloTensorsSize = m_YoloTensors.size();
+  for (uint i = 0; i < yoloTensorsSize; ++i) {
+    TensorInfo& curYoloTensor = m_YoloTensors.at(i);
+    const uint numBBoxes = curYoloTensor.numBBoxes;
+    const float scaleXY = curYoloTensor.scaleXY;
+    const uint gridSizeX = curYoloTensor.gridSizeX;
+    const uint gridSizeY = curYoloTensor.gridSizeY;
+    const std::vector<float> anchors = curYoloTensor.anchors;
+    const std::vector<int> mask = curYoloTensor.mask;
+    void* v_anchors;
+    void* v_mask;
+    if (anchors.size() > 0) {
+      CUDA_CHECK(cudaMalloc(&v_anchors, sizeof(float) * anchors.size()));
+      CUDA_CHECK(cudaMemcpyAsync(v_anchors, anchors.data(), sizeof(float) * anchors.size(), cudaMemcpyHostToDevice, stream));
+    }
+    if (mask.size() > 0) {
+      CUDA_CHECK(cudaMalloc(&v_mask, sizeof(int) * mask.size()));
+      CUDA_CHECK(cudaMemcpyAsync(v_mask, mask.data(), sizeof(int) * mask.size(), cudaMemcpyHostToDevice, stream));
+    }
+    const uint64_t inputSize = (numBBoxes * (4 + 1 + m_NumClasses)) * gridSizeY * gridSizeX;
+    if (mask.size() > 0) {
+      if (m_NewCoords) {
+        CUDA_CHECK(cudaYoloLayer_nc(inputs[i], boxes, scores, classes, batchSize, inputSize, m_OutputSize, lastInputSize,
+          m_NetWidth, m_NetHeight, gridSizeX, gridSizeY, m_NumClasses, numBBoxes, scaleXY, v_anchors, v_mask, stream));
+      }
+      else {
+        CUDA_CHECK(cudaYoloLayer(inputs[i], boxes, scores, classes, batchSize, inputSize, m_OutputSize, lastInputSize,
+          m_NetWidth, m_NetHeight, gridSizeX, gridSizeY, m_NumClasses, numBBoxes, scaleXY, v_anchors, v_mask, stream));
+      }
+    }
+    else {
+      void* softmax;
+      CUDA_CHECK(cudaMalloc(&softmax, sizeof(float) * inputSize * batchSize));
+      CUDA_CHECK(cudaMemsetAsync((float*)softmax, 0, sizeof(float) * inputSize * batchSize, stream));
+      CUDA_CHECK(cudaRegionLayer(inputs[i], softmax, boxes, scores, classes, batchSize, inputSize, m_OutputSize,
+        lastInputSize, m_NetWidth, m_NetHeight, gridSizeX, gridSizeY, m_NumClasses, numBBoxes, v_anchors, stream));
+      CUDA_CHECK(cudaFree(softmax));
+    }
+    if (anchors.size() > 0) {
+      CUDA_CHECK(cudaFree(v_anchors));
+    }
+    if (mask.size() > 0) {
+      CUDA_CHECK(cudaFree(v_mask));
+    }
+    lastInputSize += numBBoxes * gridSizeY * gridSizeX;
+  }
+  return 0;
+}
 
 REGISTER_TENSORRT_PLUGIN(YoloLayerPluginCreator);
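Because all detection heads now write into the same three output tensors, `enqueue` threads a running `lastInputSize` offset through the loop so each head lands in its own contiguous slice. A sketch of that bookkeeping under assumed head sizes (the grid sizes and anchor counts below are made up):

```python
# Hypothetical three-head model; only the offset arithmetic is taken from the plugin.
heads = [
    {'grid_x': 80, 'grid_y': 80, 'num_bboxes': 3},
    {'grid_x': 40, 'grid_y': 40, 'num_bboxes': 3},
    {'grid_x': 20, 'grid_y': 20, 'num_bboxes': 3},
]

last_input_size = 0
for i, head in enumerate(heads):
    n = head['num_bboxes'] * head['grid_y'] * head['grid_x']
    print(f"head {i}: slots [{last_input_size}, {last_input_size + n})")
    last_input_size += n  # same as: lastInputSize += numBBoxes * gridSizeY * gridSizeX
```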


@@ -38,57 +38,68 @@
   } \
 }
 
+#if NV_TENSORRT_MAJOR >= 8
+#define INT int32_t
+#else
+#define INT int
+#endif
 
 namespace {
   const char* YOLOLAYER_PLUGIN_VERSION {"1"};
   const char* YOLOLAYER_PLUGIN_NAME {"YoloLayer_TRT"};
 } // namespace
 
-class YoloLayer : public nvinfer1::IPluginV2 {
+class YoloLayer : public nvinfer1::IPluginV2DynamicExt {
 public:
   YoloLayer(const void* data, size_t length);
   YoloLayer(const uint& netWidth, const uint& netHeight, const uint& numClasses, const uint& newCoords,
     const std::vector<TensorInfo>& yoloTensors, const uint64_t& outputSize);
 
-  const char* getPluginType() const noexcept override { return YOLOLAYER_PLUGIN_NAME; }
-  const char* getPluginVersion() const noexcept override { return YOLOLAYER_PLUGIN_VERSION; }
-  int getNbOutputs() const noexcept override { return 1; }
-  nvinfer1::Dims getOutputDimensions(int index, const nvinfer1::Dims* inputs, int nbInputDims) noexcept override;
-  bool supportsFormat(nvinfer1::DataType type, nvinfer1::PluginFormat format) const noexcept override;
-  void configureWithFormat(const nvinfer1::Dims* inputDims, int nbInputs, const nvinfer1::Dims* outputDims, int nbOutputs,
-    nvinfer1::DataType type, nvinfer1::PluginFormat format, int maxBatchSize) noexcept override;
+  nvinfer1::IPluginV2DynamicExt* clone() const noexcept override;
   int initialize() noexcept override { return 0; }
   void terminate() noexcept override {}
-  size_t getWorkspaceSize(int maxBatchSize) const noexcept override {
-    return maxBatchSize * sizeof(int);
-  }
-#ifdef LEGACY
-  int enqueue(int batchSize, const void* const* inputs, void** outputs, void* workspace, cudaStream_t stream) override;
-#else
-  int32_t enqueue(int batchSize, void const* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream)
-    noexcept override;
-#endif
+  void destroy() noexcept override { delete this; }
   size_t getSerializationSize() const noexcept override;
   void serialize(void* buffer) const noexcept override;
-  void destroy() noexcept override { delete this; }
-  nvinfer1::IPluginV2* clone() const noexcept override;
+  int getNbOutputs() const noexcept override { return 3; }
+  nvinfer1::DimsExprs getOutputDimensions(INT index, const nvinfer1::DimsExprs* inputs, INT nbInputDims,
+    nvinfer1::IExprBuilder& exprBuilder) noexcept override;
+  size_t getWorkspaceSize(const nvinfer1::PluginTensorDesc* inputs, INT nbInputs,
+    const nvinfer1::PluginTensorDesc* outputs, INT nbOutputs) const noexcept override { return 0; }
+  bool supportsFormatCombination(INT pos, const nvinfer1::PluginTensorDesc* inOut, INT nbInputs, INT nbOutputs) noexcept
+    override;
+  const char* getPluginType() const noexcept override { return YOLOLAYER_PLUGIN_NAME; }
+  const char* getPluginVersion() const noexcept override { return YOLOLAYER_PLUGIN_VERSION; }
   void setPluginNamespace(const char* pluginNamespace) noexcept override { m_Namespace = pluginNamespace; }
-  virtual const char* getPluginNamespace() const noexcept override { return m_Namespace.c_str(); }
+  const char* getPluginNamespace() const noexcept override { return m_Namespace.c_str(); }
+  nvinfer1::DataType getOutputDataType(INT index, const nvinfer1::DataType* inputTypes, INT nbInputs) const noexcept
+    override;
+  void attachToContext(cudnnContext* cudnnContext, cublasContext* cublasContext, nvinfer1::IGpuAllocator* gpuAllocator)
+    noexcept override {}
+  void configurePlugin(const nvinfer1::DynamicPluginTensorDesc* in, INT nbInput,
+    const nvinfer1::DynamicPluginTensorDesc* out, INT nbOutput) noexcept override;
+  void detachFromContext() noexcept override {}
+  INT enqueue(const nvinfer1::PluginTensorDesc* inputDesc, const nvinfer1::PluginTensorDesc* outputDesc,
+    void const* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream) noexcept override;
 
 private:
   std::string m_Namespace {""};
@@ -115,12 +126,14 @@ class YoloLayerPluginCreator : public nvinfer1::IPluginCreator {
     return nullptr;
   }
 
-  nvinfer1::IPluginV2* createPlugin(const char* name, const nvinfer1::PluginFieldCollection* fc) noexcept override {
+  nvinfer1::IPluginV2DynamicExt* createPlugin(const char* name, const nvinfer1::PluginFieldCollection* fc) noexcept
+    override {
    std::cerr << "YoloLayerPluginCreator::getFieldNames is not implemented";
    return nullptr;
  }
 
-  nvinfer1::IPluginV2* deserializePlugin(const char* name, const void* serialData, size_t serialLength) noexcept override {
+  nvinfer1::IPluginV2DynamicExt* deserializePlugin(const char* name, const void* serialData, size_t serialLength) noexcept
+    override {
    std::cout << "Deserialize yoloLayer plugin: " << name << std::endl;
    return new YoloLayer(serialData, serialLength);
  }
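With the plugin migrated from `IPluginV2` to `IPluginV2DynamicExt` (the `INT` macro resolves to `int32_t` on TensorRT 8+), an engine built from it exposes three named outputs instead of one. One way to check is to list the engine bindings with the TensorRT Python API; this is a sketch that assumes the TensorRT 8.x binding-style calls, the repo's plugin library name, and a placeholder engine path:

```python
import ctypes

import tensorrt as trt

ctypes.CDLL('libnvdsinfer_custom_impl_Yolo.so')  # custom plugin must be loaded before deserializing
logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, '')
runtime = trt.Runtime(logger)

with open('model_b1_gpu0_fp32.engine', 'rb') as f:  # placeholder engine file
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    kind = 'input' if engine.binding_is_input(i) else 'output'
    print(kind, engine.get_binding_name(i), engine.get_binding_shape(i))
# Expected: boxes (N, outputSize, 4), scores (N, outputSize, 1), classes (N, outputSize, 1)
```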


@@ -18,7 +18,7 @@ class DeepStreamOutput(nn.Module):
     def forward(self, x):
         boxes = x[1]
         scores, classes = torch.max(x[0], 2, keepdim=True)
-        return torch.cat((boxes, scores, classes.float()), dim=2)
+        return boxes, scores, classes
 
 def suppress_warnings():
@@ -65,21 +65,27 @@ def main(args):
     img_size = args.size * 2 if len(args.size) == 1 else args.size
 
-    onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
+    onnx_input_im = torch.zeros(args.batch, 3, *img_size).to(device)
     onnx_output_file = cfg.miscs['exp_name'] + '.onnx'
 
     dynamic_axes = {
         'input': {
             0: 'batch'
         },
-        'output': {
+        'boxes': {
+            0: 'batch'
+        },
+        'scores': {
+            0: 'batch'
+        },
+        'classes': {
             0: 'batch'
         }
     }
 
     print('Exporting the model to ONNX')
     torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
-                      do_constant_folding=True, input_names=['input'], output_names=['output'],
+                      do_constant_folding=True, input_names=['input'], output_names=['boxes', 'scores', 'classes'],
                       dynamic_axes=dynamic_axes if args.dynamic else None)
 
     if args.simplify:
@@ -100,11 +106,14 @@ def parse_args():
     parser.add_argument('--opset', type=int, default=11, help='ONNX opset version')
     parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
     parser.add_argument('--dynamic', action='store_true', help='Dynamic batch-size')
+    parser.add_argument('--batch', type=int, default=1, help='Implicit batch-size')
     args = parser.parse_args()
     if not os.path.isfile(args.weights):
         raise SystemExit('Invalid weights file')
     if not os.path.isfile(args.config):
         raise SystemExit('Invalid config file')
+    if args.dynamic and args.batch > 1:
+        raise SystemExit('Cannot set dynamic batch-size and implicit batch-size at same time')
     return args
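Each export script now accepts either `--dynamic` (batch dimension exported as a named dynamic axis) or `--batch N` (fixed implicit batch), but not both. A condensed, runnable sketch of that shared contract, with the model and the actual export call stubbed out:

```python
import argparse

def parse_args():
    parser = argparse.ArgumentParser('Minimal sketch of the shared export flags')
    parser.add_argument('--dynamic', action='store_true', help='Dynamic batch-size')
    parser.add_argument('--batch', type=int, default=1, help='Implicit batch-size')
    args = parser.parse_args()
    if args.dynamic and args.batch > 1:
        raise SystemExit('Cannot set dynamic batch-size and implicit batch-size at same time')
    return args

args = parse_args()
# dynamic_axes is only passed when --dynamic is set; otherwise args.batch is baked into the graph.
dynamic_axes = {name: {0: 'batch'} for name in ('input', 'boxes', 'scores', 'classes')}
print(dynamic_axes if args.dynamic else f'fixed batch = {args.batch}')
```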


@@ -19,8 +19,8 @@ class DeepStreamOutput(nn.Layer):
         boxes = x['bbox']
         x['bbox_num'] = x['bbox_num'].transpose([0, 2, 1])
         scores = paddle.max(x['bbox_num'], 2, keepdim=True)
-        classes = paddle.cast(paddle.argmax(x['bbox_num'], 2, keepdim=True), dtype='float32')
-        return paddle.concat((boxes, scores, classes), axis=2)
+        classes = paddle.argmax(x['bbox_num'], 2, keepdim=True)
+        return boxes, scores, classes
 
 def ppyoloe_export(FLAGS):
@@ -65,8 +65,8 @@ def main(FLAGS):
     img_size = [cfg.eval_height, cfg.eval_width]
 
     onnx_input_im = {}
-    onnx_input_im['image'] = paddle.static.InputSpec(shape=[None, 3, *img_size], dtype='float32', name='image')
-    onnx_input_im['scale_factor'] = paddle.static.InputSpec(shape=[None, 2], dtype='float32', name='scale_factor')
+    onnx_input_im['image'] = paddle.static.InputSpec(shape=[FLAGS.batch, 3, *img_size], dtype='float32', name='image')
+    onnx_input_im['scale_factor'] = paddle.static.InputSpec(shape=[FLAGS.batch, 2], dtype='float32', name='scale_factor')
     onnx_output_file = cfg.filename + '.onnx'
 
     print('\nExporting the model to ONNX\n')
@@ -88,7 +88,15 @@ def parse_args():
     parser.add_argument('--slim_config', default=None, type=str, help='Slim configuration file of slim method')
     parser.add_argument('--opset', type=int, default=11, help='ONNX opset version')
     parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
+    parser.add_argument('--dynamic', action='store_true', help='Dynamic batch-size')
+    parser.add_argument('--batch', type=int, default=1, help='Implicit batch-size')
     args = parser.parse_args()
+    if not os.path.isfile(args.weights):
+        raise SystemExit('\nInvalid weights file')
+    if args.dynamic and args.batch > 1:
+        raise SystemExit('\nCannot set dynamic batch-size and implicit batch-size at same time')
+    elif args.dynamic:
+        args.batch = None
     return args
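The Paddle exporter handles the dynamic case differently from the PyTorch scripts: instead of `dynamic_axes`, it sets `args.batch = None`, since a `None` dimension in `paddle.static.InputSpec` marks that axis as variable. A sketch of the idea (assumes PaddlePaddle is installed; the 640x640 size is illustrative):

```python
import paddle

batch = None  # what the script assigns when --dynamic is set
image_spec = paddle.static.InputSpec(shape=[batch, 3, 640, 640], dtype='float32', name='image')
scale_spec = paddle.static.InputSpec(shape=[batch, 2], dtype='float32', name='scale_factor')
print(image_spec.shape, scale_spec.shape)  # the leading axis is left dynamic
```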


@@ -19,7 +19,8 @@ class DeepStreamOutput(nn.Module):
         boxes = x[:, :, :4]
         objectness = x[:, :, 4:5]
         scores, classes = torch.max(x[:, :, 5:], 2, keepdim=True)
-        return torch.cat((boxes, scores * objectness, classes.float()), dim=2)
+        scores *= objectness
+        return boxes, scores, classes
 
 def suppress_warnings():
@@ -63,21 +64,27 @@ def main(args):
     if img_size == [640, 640] and args.p6:
         img_size = [1280] * 2
 
-    onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
+    onnx_input_im = torch.zeros(args.batch, 3, *img_size).to(device)
     onnx_output_file = os.path.basename(args.weights).split('.pt')[0] + '.onnx'
 
     dynamic_axes = {
         'input': {
             0: 'batch'
         },
-        'output': {
+        'boxes': {
+            0: 'batch'
+        },
+        'scores': {
+            0: 'batch'
+        },
+        'classes': {
             0: 'batch'
         }
     }
 
     print('\nExporting the model to ONNX')
     torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
-                      do_constant_folding=True, input_names=['input'], output_names=['output'],
+                      do_constant_folding=True, input_names=['input'], output_names=['boxes', 'scores', 'classes'],
                       dynamic_axes=dynamic_axes if args.dynamic else None)
 
     if args.simplify:
@@ -98,9 +105,12 @@ def parse_args():
     parser.add_argument('--opset', type=int, default=17, help='ONNX opset version')
     parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
     parser.add_argument('--dynamic', action='store_true', help='Dynamic batch-size')
+    parser.add_argument('--batch', type=int, default=1, help='Implicit batch-size')
     args = parser.parse_args()
     if not os.path.isfile(args.weights):
         raise SystemExit('Invalid weights file')
+    if args.dynamic and args.batch > 1:
+        raise SystemExit('Cannot set dynamic batch-size and implicit batch-size at same time')
     return args


@@ -23,7 +23,8 @@ class DeepStreamOutput(nn.Module):
         boxes = x[:, :, :4]
         objectness = x[:, :, 4:5]
         scores, classes = torch.max(x[:, :, 5:], 2, keepdim=True)
-        return torch.cat((boxes, scores * objectness, classes.float()), dim=2)
+        scores *= objectness
+        return boxes, scores, classes
 
 def suppress_warnings():
@@ -66,21 +67,27 @@ def main(args):
     if img_size == [640, 640] and args.p6:
         img_size = [1280] * 2
 
-    onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
+    onnx_input_im = torch.zeros(args.batch, 3, *img_size).to(device)
     onnx_output_file = os.path.basename(args.weights).split('.pt')[0] + '.onnx'
 
    dynamic_axes = {
         'input': {
             0: 'batch'
         },
-        'output': {
+        'boxes': {
+            0: 'batch'
+        },
+        'scores': {
+            0: 'batch'
+        },
+        'classes': {
             0: 'batch'
         }
     }
 
     print('\nExporting the model to ONNX')
     torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
-                      do_constant_folding=True, input_names=['input'], output_names=['output'],
+                      do_constant_folding=True, input_names=['input'], output_names=['boxes', 'scores', 'classes'],
                       dynamic_axes=dynamic_axes if args.dynamic else None)
 
     if args.simplify:
@@ -101,9 +108,12 @@ def parse_args():
     parser.add_argument('--opset', type=int, default=13, help='ONNX opset version')
     parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
     parser.add_argument('--dynamic', action='store_true', help='Dynamic batch-size')
+    parser.add_argument('--batch', type=int, default=1, help='Implicit batch-size')
     args = parser.parse_args()
     if not os.path.isfile(args.weights):
         raise SystemExit('Invalid weights file')
+    if args.dynamic and args.batch > 1:
+        raise SystemExit('Cannot set dynamic batch-size and implicit batch-size at same time')
     return args


@@ -19,7 +19,8 @@ class DeepStreamOutput(nn.Module):
         boxes = x[:, :, :4]
         objectness = x[:, :, 4:5]
         scores, classes = torch.max(x[:, :, 5:], 2, keepdim=True)
-        return torch.cat((boxes, scores * objectness, classes.float()), dim=2)
+        scores *= objectness
+        return boxes, scores, classes
 
 def suppress_warnings():
@@ -67,21 +68,27 @@ def main(args):
     if img_size == [640, 640] and args.p6:
         img_size = [1280] * 2
 
-    onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
+    onnx_input_im = torch.zeros(args.batch, 3, *img_size).to(device)
     onnx_output_file = os.path.basename(args.weights).split('.pt')[0] + '.onnx'
 
     dynamic_axes = {
         'input': {
             0: 'batch'
         },
-        'output': {
+        'boxes': {
+            0: 'batch'
+        },
+        'scores': {
+            0: 'batch'
+        },
+        'classes': {
             0: 'batch'
         }
     }
 
     print('\nExporting the model to ONNX')
     torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
-                      do_constant_folding=True, input_names=['input'], output_names=['output'],
+                      do_constant_folding=True, input_names=['input'], output_names=['boxes', 'scores', 'classes'],
                       dynamic_axes=dynamic_axes if args.dynamic else None)
 
     if args.simplify:
@@ -102,9 +109,12 @@ def parse_args():
     parser.add_argument('--opset', type=int, default=12, help='ONNX opset version')
     parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
     parser.add_argument('--dynamic', action='store_true', help='Dynamic batch-size')
+    parser.add_argument('--batch', type=int, default=1, help='Implicit batch-size')
     args = parser.parse_args()
     if not os.path.isfile(args.weights):
         raise SystemExit('Invalid weights file')
+    if args.dynamic and args.batch > 1:
+        raise SystemExit('Cannot set dynamic batch-size and implicit batch-size at same time')
     return args


@@ -18,7 +18,7 @@ class DeepStreamOutput(nn.Module):
         x = x.transpose(1, 2)
         boxes = x[:, :, :4]
         scores, classes = torch.max(x[:, :, 4:], 2, keepdim=True)
-        return torch.cat((boxes, scores, classes.float()), dim=2)
+        return boxes, scores, classes
 
 def suppress_warnings():
@@ -59,21 +59,27 @@ def main(args):
     img_size = args.size * 2 if len(args.size) == 1 else args.size
 
-    onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
+    onnx_input_im = torch.zeros(args.batch, 3, *img_size).to(device)
     onnx_output_file = os.path.basename(args.weights).split('.pt')[0] + '.onnx'
 
     dynamic_axes = {
         'input': {
             0: 'batch'
         },
-        'output': {
+        'boxes': {
+            0: 'batch'
+        },
+        'scores': {
+            0: 'batch'
+        },
+        'classes': {
             0: 'batch'
         }
     }
 
     print('\nExporting the model to ONNX')
     torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
-                      do_constant_folding=True, input_names=['input'], output_names=['output'],
+                      do_constant_folding=True, input_names=['input'], output_names=['boxes', 'scores', 'classes'],
                       dynamic_axes=dynamic_axes if args.dynamic else None)
 
     if args.simplify:
@@ -93,9 +99,12 @@ def parse_args():
     parser.add_argument('--opset', type=int, default=12, help='ONNX opset version')
     parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
     parser.add_argument('--dynamic', action='store_true', help='Dynamic batch-size')
+    parser.add_argument('--batch', type=int, default=1, help='Implicit batch-size')
     args = parser.parse_args()
     if not os.path.isfile(args.weights):
         raise SystemExit('Invalid weights file')
+    if args.dynamic and args.batch > 1:
+        raise SystemExit('Cannot set dynamic batch-size and implicit batch-size at same time')
     return args
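Anchor-free heads of this family emit `(batch, 4 + num_classes, num_anchors)`, so the wrapper transposes to `(batch, num_anchors, 4 + num_classes)` before slicing boxes and class scores. A runnable torch sketch with a random tensor standing in for the model output (80 classes and 8400 anchors are illustrative values):

```python
import torch

num_classes, num_anchors = 80, 8400              # illustrative values
x = torch.rand(1, 4 + num_classes, num_anchors)  # stand-in for the raw head output

x = x.transpose(1, 2)                            # -> (batch, num_anchors, 4 + num_classes)
boxes = x[:, :, :4]
scores, classes = torch.max(x[:, :, 4:], 2, keepdim=True)
print(boxes.shape, scores.shape, classes.shape)
# torch.Size([1, 8400, 4]) torch.Size([1, 8400, 1]) torch.Size([1, 8400, 1])
```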


@@ -19,7 +19,7 @@ class DeepStreamOutput(nn.Module):
         x = x.transpose(1, 2)
         boxes = x[:, :, :4]
         scores, classes = torch.max(x[:, :, 4:], 2, keepdim=True)
-        return torch.cat((boxes, scores, classes.float()), dim=2)
+        return boxes, scores, classes
 
 def suppress_warnings():
@@ -67,21 +67,27 @@ def main(args):
     img_size = args.size * 2 if len(args.size) == 1 else args.size
 
-    onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
+    onnx_input_im = torch.zeros(args.batch, 3, *img_size).to(device)
     onnx_output_file = os.path.basename(args.weights).split('.pt')[0] + '.onnx'
 
     dynamic_axes = {
         'input': {
             0: 'batch'
         },
-        'output': {
+        'boxes': {
+            0: 'batch'
+        },
+        'scores': {
+            0: 'batch'
+        },
+        'classes': {
             0: 'batch'
         }
     }
 
     print('\nExporting the model to ONNX')
     torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
-                      do_constant_folding=True, input_names=['input'], output_names=['output'],
+                      do_constant_folding=True, input_names=['input'], output_names=['boxes', 'scores', 'classes'],
                       dynamic_axes=dynamic_axes if args.dynamic else None)
 
     if args.simplify:
@@ -101,9 +107,12 @@ def parse_args():
     parser.add_argument('--opset', type=int, default=16, help='ONNX opset version')
     parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
     parser.add_argument('--dynamic', action='store_true', help='Dynamic batch-size')
+    parser.add_argument('--batch', type=int, default=1, help='Implicit batch-size')
     args = parser.parse_args()
     if not os.path.isfile(args.weights):
         raise SystemExit('Invalid weights file')
+    if args.dynamic and args.batch > 1:
+        raise SystemExit('Cannot set dynamic batch-size and implicit batch-size at same time')
     return args


@@ -15,7 +15,7 @@ class DeepStreamOutput(nn.Module):
     def forward(self, x):
         boxes = x[0]
         scores, classes = torch.max(x[1], 2, keepdim=True)
-        return torch.cat((boxes, scores, classes.float()), dim=2)
+        return boxes, scores, classes
 
 def suppress_warnings():
@@ -46,21 +46,27 @@ def main(args):
     img_size = args.size * 2 if len(args.size) == 1 else args.size
 
-    onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
+    onnx_input_im = torch.zeros(args.batch, 3, *img_size).to(device)
     onnx_output_file = os.path.basename(args.weights).split('.pt')[0] + '.onnx'
 
     dynamic_axes = {
         'input': {
             0: 'batch'
         },
-        'output': {
+        'boxes': {
+            0: 'batch'
+        },
+        'scores': {
+            0: 'batch'
+        },
+        'classes': {
             0: 'batch'
         }
     }
 
     print('\nExporting the model to ONNX')
     torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
-                      do_constant_folding=True, input_names=['input'], output_names=['output'],
+                      do_constant_folding=True, input_names=['input'], output_names=['boxes', 'scores', 'classes'],
                       dynamic_axes=dynamic_axes if args.dynamic else None)
 
     if args.simplify:
@@ -82,11 +88,14 @@ def parse_args():
     parser.add_argument('--opset', type=int, default=14, help='ONNX opset version')
     parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
     parser.add_argument('--dynamic', action='store_true', help='Dynamic batch-size')
+    parser.add_argument('--batch', type=int, default=1, help='Implicit batch-size')
     args = parser.parse_args()
     if args.model == '':
         raise SystemExit('Invalid model name')
     if not os.path.isfile(args.weights):
         raise SystemExit('Invalid weights file')
+    if args.dynamic and args.batch > 1:
+        raise SystemExit('Cannot set dynamic batch-size and implicit batch-size at same time')
     return args
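This wrapper relies on `torch.max` over the class dimension returning both the best score and its index in a single pass; since classes are now a separate integer output tensor, the old `classes.float()` cast for concatenation is no longer needed. A small runnable sketch (the anchor and class counts are illustrative):

```python
import torch

class_probs = torch.rand(1, 1000, 80)  # stand-in for x[1]: per-anchor class probabilities
scores, classes = torch.max(class_probs, 2, keepdim=True)  # max value and argmax together
print(scores.shape, classes.shape, classes.dtype)
# torch.Size([1, 1000, 1]) torch.Size([1, 1000, 1]) torch.int64
```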


@@ -16,7 +16,8 @@ class DeepStreamOutput(nn.Module):
         boxes = x[:, :, :4]
         objectness = x[:, :, 4:5]
         scores, classes = torch.max(x[:, :, 5:], 2, keepdim=True)
-        return torch.cat((boxes, scores * objectness, classes.float()), dim=2)
+        scores *= objectness
+        return boxes, scores, classes
 
 def suppress_warnings():
@@ -79,21 +80,27 @@ def main(args):
     if img_size == [640, 640] and args.p6:
         img_size = [1280] * 2
 
-    onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
+    onnx_input_im = torch.zeros(args.batch, 3, *img_size).to(device)
     onnx_output_file = os.path.basename(args.weights).split('.pt')[0] + '.onnx'
 
     dynamic_axes = {
         'input': {
             0: 'batch'
         },
-        'output': {
+        'boxes': {
+            0: 'batch'
+        },
+        'scores': {
+            0: 'batch'
+        },
+        'classes': {
             0: 'batch'
         }
     }
 
     print('\nExporting the model to ONNX')
     torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
-                      do_constant_folding=True, input_names=['input'], output_names=['output'],
+                      do_constant_folding=True, input_names=['input'], output_names=['boxes', 'scores', 'classes'],
                       dynamic_axes=dynamic_axes if args.dynamic else None)
 
     if args.simplify:
@@ -115,9 +122,12 @@ def parse_args():
     parser.add_argument('--opset', type=int, default=12, help='ONNX opset version')
     parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
     parser.add_argument('--dynamic', action='store_true', help='Dynamic batch-size')
+    parser.add_argument('--batch', type=int, default=1, help='Implicit batch-size')
     args = parser.parse_args()
     if not os.path.isfile(args.weights):
         raise SystemExit('Invalid weights file')
+    if args.dynamic and args.batch > 1:
+        raise SystemExit('Cannot set dynamic batch-size and implicit batch-size at same time')
     return args


@@ -18,7 +18,8 @@ class DeepStreamOutput(nn.Module):
         boxes = x[:, :, :4]
         objectness = x[:, :, 4:5]
         scores, classes = torch.max(x[:, :, 5:], 2, keepdim=True)
-        return torch.cat((boxes, scores * objectness, classes.float()), dim=2)
+        scores *= objectness
+        return boxes, scores, classes
 
 def suppress_warnings():
@@ -54,21 +55,27 @@ def main(args):
     img_size = [exp.input_size[1], exp.input_size[0]]
 
-    onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
+    onnx_input_im = torch.zeros(args.batch, 3, *img_size).to(device)
     onnx_output_file = os.path.basename(args.weights).split('.pt')[0] + '.onnx'
 
     dynamic_axes = {
         'input': {
             0: 'batch'
         },
-        'output': {
+        'boxes': {
+            0: 'batch'
+        },
+        'scores': {
+            0: 'batch'
+        },
+        'classes': {
             0: 'batch'
         }
     }
 
     print('Exporting the model to ONNX')
     torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
-                      do_constant_folding=True, input_names=['input'], output_names=['output'],
+                      do_constant_folding=True, input_names=['input'], output_names=['boxes', 'scores', 'classes'],
                       dynamic_axes=dynamic_axes if args.dynamic else None)
 
     if args.simplify:
@@ -88,11 +95,14 @@ def parse_args():
     parser.add_argument('--opset', type=int, default=11, help='ONNX opset version')
     parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
     parser.add_argument('--dynamic', action='store_true', help='Dynamic batch-size')
+    parser.add_argument('--batch', type=int, default=1, help='Implicit batch-size')
     args = parser.parse_args()
     if not os.path.isfile(args.weights):
         raise SystemExit('Invalid weights file')
    if not os.path.isfile(args.exp):
        raise SystemExit('Invalid exp file')
+    if args.dynamic and args.batch > 1:
+        raise SystemExit('Cannot set dynamic batch-size and implicit batch-size at same time')
     return args
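After exporting with any of these scripts, the new three-output contract can be sanity-checked before generating the TensorRT engine. A sketch using onnxruntime (the file name is a placeholder for whatever the script produced):

```python
import onnxruntime as ort

session = ort.InferenceSession('model.onnx', providers=['CPUExecutionProvider'])
for out in session.get_outputs():
    print(out.name, out.shape)
# Expected names: boxes, scores, classes; with --dynamic the leading dim prints as 'batch'.
```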