Big update

This commit is contained in:
Marcos Luciano
2023-05-19 03:05:43 -03:00
parent 68f762d5bd
commit 07feae9509
86 changed files with 1523 additions and 5223 deletions


@@ -2,39 +2,25 @@
 NVIDIA DeepStream SDK 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 configuration for YOLO models
-### **I will be back with updates soon, I'm full of work from my jobs right now. Sorry for the delay.**
+-------------------------------------
+### **Big update on DeepStream-Yolo**
+-------------------------------------
 ### Future updates
-* Models benchmarks
 * DeepStream tutorials
 * Dynamic batch-size
-* Segmentation model support
-* Classification model support
+* Updated INT8 calibration
+* Support for segmentation models
+* Support for classification models
 ### Improvements on this repository
-* Darknet cfg params parser (no need to edit `nvdsparsebbox_Yolo.cpp` or other files)
-* Support for `new_coords` and `scale_x_y` params
-* Support for new models
-* Support for new layers
-* Support for new activations
-* Support for convolutional groups
 * Support for INT8 calibration
 * Support for non square models
-* New documentation for multiple models
-* YOLOv5 >= 2.0 support
-* YOLOR support
-* GPU YOLO Decoder [#138](https://github.com/marcoslucianops/DeepStream-Yolo/issues/138)
-* PP-YOLOE support
-* YOLOv7 support
-* Optimized NMS [#142](https://github.com/marcoslucianops/DeepStream-Yolo/issues/142)
-* Models benchmarks
-* YOLOv8 support
-* YOLOX support
-* PP-YOLOE+ support
-* YOLOv6 >= 2.0 support
-* **ONNX model support with GPU post-processing**
-* **YOLO-NAS support (ONNX)**
+* **Support for Darknet YOLO models (YOLOv4, etc) using cfg and weights conversion with GPU post-processing**
+* **Support for YOLO-NAS, PPYOLOE+, PPYOLOE, YOLOX, YOLOR, YOLOv8, YOLOv7, YOLOv6 and YOLOv5 using ONNX conversion with GPU post-processing**
 ##
@@ -55,6 +41,7 @@ NVIDIA DeepStream SDK 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 configuration for YOLO mod
 * [YOLOR usage](docs/YOLOR.md)
 * [YOLOX usage](docs/YOLOX.md)
 * [PP-YOLOE / PP-YOLOE+ usage](docs/PPYOLOE.md)
+* [YOLO-NAS usage](docs/YOLONAS.md)
 * [Using your custom model](docs/customModels.md)
 * [Multiple YOLO GIEs](docs/multipleGIEs.md)
@@ -133,13 +120,14 @@ NVIDIA DeepStream SDK 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 configuration for YOLO mod
 * [Darknet YOLO](https://github.com/AlexeyAB/darknet)
 * [MobileNet-YOLO](https://github.com/dog-qiuqiu/MobileNet-Yolo)
 * [YOLO-Fastest](https://github.com/dog-qiuqiu/Yolo-Fastest)
-* [YOLOv5 >= 2.0](https://github.com/ultralytics/yolov5)
+* [YOLOv5](https://github.com/ultralytics/yolov5)
-* [YOLOv6 >= 2.0](https://github.com/meituan/YOLOv6)
+* [YOLOv6](https://github.com/meituan/YOLOv6)
 * [YOLOv7](https://github.com/WongKinYiu/yolov7)
 * [YOLOv8](https://github.com/ultralytics/ultralytics)
 * [YOLOR](https://github.com/WongKinYiu/yolor)
 * [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX)
-* [PP-YOLOE / PP-YOLOE+](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/ppyoloe)
+* [PP-YOLOE / PP-YOLOE+](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyoloe)
+* [YOLO-NAS](https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS.md)
 ##
@@ -161,7 +149,7 @@ sample = 1920x1080 video
 - Eval
 ```
-nms-iou-threshold = 0.6 (Darknet and YOLOv8) / 0.65 (YOLOv5, YOLOv6, YOLOv7, YOLOR and YOLOX) / 0.7 (Paddle)
+nms-iou-threshold = 0.6 (Darknet) / 0.65 (YOLOv5, YOLOv6, YOLOv7, YOLOR and YOLOX) / 0.7 (Paddle, YOLO-NAS and YOLOv8)
 pre-cluster-threshold = 0.001
 topk = 300
 ```
@@ -169,7 +157,7 @@ topk = 300
 - Test
 ```
-nms-iou-threshold = 0.45 / 0.7 (Paddle)
+nms-iou-threshold = 0.45
 pre-cluster-threshold = 0.25
 topk = 300
 ```
@@ -182,30 +170,7 @@ topk = 300
 | DeepStream | Precision | Resolution | IoU=0.5:0.95 | IoU=0.5 | IoU=0.75 | FPS<br />(without display) |
 |:------------------:|:---------:|:----------:|:------------:|:-------:|:--------:|:--------------------------:|
-| PP-YOLOE-x | FP16 | 640 | 0.506 | 0.681 | 0.551 | 116.54 |
-| PP-YOLOE-l | FP16 | 640 | 0.498 | 0.674 | 0.545 | 187.93 |
-| PP-YOLOE-m | FP16 | 640 | 0.476 | 0.646 | 0.522 | 257.42 |
-| PP-YOLOE-s (400) | FP16 | 640 | 0.422 | 0.589 | 0.463 | 465.23 |
-| YOLOv7-E6E | FP16 | 1280 | 0.476 | 0.648 | 0.521 | 47.82 |
-| YOLOv7-D6 | FP16 | 1280 | 0.479 | 0.648 | 0.520 | 60.66 |
-| YOLOv7-E6 | FP16 | 1280 | 0.471 | 0.640 | 0.516 | 73.05 |
-| YOLOv7-W6 | FP16 | 1280 | 0.444 | 0.610 | 0.483 | 110.29 |
-| YOLOv7-X* | FP16 | 640 | 0.496 | 0.679 | 0.536 | 162.31 |
-| YOLOv7* | FP16 | 640 | 0.476 | 0.660 | 0.518 | 237.79 |
-| YOLOv7-Tiny Leaky* | FP16 | 640 | 0.345 | 0.516 | 0.372 | 611.36 |
-| YOLOv7-Tiny Leaky* | FP16 | 416 | 0.328 | 0.493 | 0.348 | 633.73 |
-| YOLOv5x6 6.1 | FP16 | 1280 | 0.508 | 0.683 | 0.554 | 54.88 |
-| YOLOv5l6 6.1 | FP16 | 1280 | 0.494 | 0.668 | 0.540 | 87.86 |
-| YOLOv5m6 6.1 | FP16 | 1280 | 0.469 | 0.644 | 0.514 | 142.68 |
-| YOLOv5s6 6.1 | FP16 | 1280 | 0.399 | 0.581 | 0.438 | 271.19 |
-| YOLOv5n6 6.1 | FP16 | 1280 | 0.317 | 0.487 | 0.344 | 392.20 |
-| YOLOv5x 6.1 | FP16 | 640 | 0.470 | 0.652 | 0.513 | 152.99 |
-| YOLOv5l 6.1 | FP16 | 640 | 0.454 | 0.636 | 0.496 | 247.60 |
-| YOLOv5m 6.1 | FP16 | 640 | 0.421 | 0.604 | 0.458 | 375.06 |
-| YOLOv5s 6.1 | FP16 | 640 | 0.344 | 0.528 | 0.371 | 602.44 |
-| YOLOv5n 6.1 | FP16 | 640 | 0.247 | 0.413 | 0.256 | 629.04 |
-| YOLOv4** | FP16 | 608 | 0.497 | 0.739 | 0.549 | 206.23 |
-| YOLOv4-Tiny | FP16 | 416 | 0.215 | 0.402 | 0.205 | 634.69 |
+| Coming soon | FP16 | 640 | | | | |
 ##
@@ -326,7 +291,7 @@ sudo prime-select nvidia
 * Run
 ```
-sudo sh NVIDIA-Linux-x86_64-510.47.03.run --no-cc-version-check --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
+sudo sh NVIDIA-Linux-x86_64-525.105.17.run --no-cc-version-check --silent --disable-nouveau --dkms --install-libglvnd --run-nvidia-xconfig
 ```
 </blockquote></details>
@@ -1005,7 +970,7 @@ config-file=config_infer_primary_yoloV2.txt
 ### NMS Configuration
-To change the `nms-iou-threshold`, `pre-cluster-threshold` and `topk` values, modify the config_infer file and regenerate the model engine file
+To change the `nms-iou-threshold`, `pre-cluster-threshold` and `topk` values, modify the config_infer file
 ```
 [class-attrs-all]
@@ -1014,16 +979,14 @@ pre-cluster-threshold=0.25
 topk=300
 ```
-**NOTE**: It is important to regenerate the engine to get the max detection speed based on `pre-cluster-threshold` you set.
-**NOTE**: Lower `topk` values will result in more performance.
 **NOTE**: Make sure to set `cluster-mode=2` in the config_infer file.
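The three values above work together during clustering: `pre-cluster-threshold` filters low-confidence boxes before clustering, `nms-iou-threshold` decides which overlapping boxes are suppressed, and `topk` caps the number of surviving detections. A minimal NumPy sketch of that interaction (illustration only, not the repo's GPU implementation):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45, pre_thresh=0.25, topk=300):
    """Greedy IoU-based NMS over [x1, y1, x2, y2] boxes."""
    mask = scores >= pre_thresh                 # pre-cluster-threshold
    boxes, scores = boxes[mask], scores[mask]
    order = np.argsort(scores)[::-1]            # highest score first
    keep = []
    while order.size > 0 and len(keep) < topk:  # topk caps detections
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]    # drop overlaps above nms-iou-threshold
    return boxes[keep], scores[keep]

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.3])
kept, _ = nms(boxes, scores)  # the two overlapping boxes collapse to one
```

Lowering `pre_thresh` or `iou_thresh` changes which boxes survive, which is why the eval and test settings above use different values.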
 ##
 ### INT8 calibration
-**NOTE**: For now, only for Darknet YOLO model.
 #### 1. Install OpenCV
 ```
@@ -1123,7 +1086,7 @@ sudo apt-get install libopencv-dev
 deepstream-app -c deepstream_app_config.txt
 ```
-**NOTE**: NVIDIA recommends at least 500 images to get a good accuracy. On this example, I used 1000 images to get better accuracy (more images = more accuracy). Higher `INT8_CALIB_BATCH_SIZE` values will result in more accuracy and faster calibration speed. Set it according to you GPU memory. This process can take a long time.
+**NOTE**: NVIDIA recommends at least 500 images to get good accuracy. In this example, I recommend using 1000 images to get better accuracy (more images = more accuracy). Higher `INT8_CALIB_BATCH_SIZE` values will result in more accuracy and faster calibration speed. Set it according to your GPU memory. This process may take a long time.
 ##
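The batch-size tradeoff in the note above is easy to quantify: the calibrator makes one pass per batch, so a larger `INT8_CALIB_BATCH_SIZE` means fewer passes (at the cost of more GPU memory per pass). A quick sketch with assumed example numbers:

```python
import math

num_images = 1000   # calibration set size suggested in the note above
batch_size = 8      # hypothetical INT8_CALIB_BATCH_SIZE; raise it if GPU memory allows
num_batches = math.ceil(num_images / batch_size)
# Doubling batch_size roughly halves the number of calibrator passes,
# which is why larger values calibrate faster.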


@@ -3,9 +3,8 @@ gpu-id=0
 net-scale-factor=0.0173520735727919486
 offsets=123.675;116.28;103.53
 model-color-format=0
-custom-network-config=ppyoloe_crn_s_400e_coco.cfg
-model-file=ppyoloe_crn_s_400e_coco.wts
-model-engine-file=model_b1_gpu0_fp32.engine
+onnx-file=ppyoloe_crn_s_400e_coco.onnx
+model-engine-file=ppyoloe_crn_s_400e_coco.onnx_b1_gpu0_fp32.engine
 #int8-calib-file=calib.table
 labelfile-path=labels.txt
 batch-size=1
@@ -17,11 +16,10 @@ process-mode=1
 network-type=0
 cluster-mode=2
 maintain-aspect-ratio=0
-parse-bbox-func-name=NvDsInferParseYolo
+parse-bbox-func-name=NvDsInferParseYoloE
 custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
-engine-create-func-name=NvDsInferYoloCudaEngineGet
 [class-attrs-all]
-nms-iou-threshold=0.7
+nms-iou-threshold=0.45
 pre-cluster-threshold=0.25
 topk=300
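The `net-scale-factor` and `offsets` values in this config appear to come from the standard ImageNet normalization (mean 0.485/0.456/0.406, std 0.229/0.224/0.225, scaled to the 0-255 pixel range), with the single scalar scale taken as the reciprocal of the averaged per-channel std since `net-scale-factor` accepts only one value. A small sketch verifying that assumption:

```python
mean = [0.485, 0.456, 0.406]   # ImageNet channel means
std = [0.229, 0.224, 0.225]    # ImageNet channel stds

offsets = [m * 255 for m in mean]            # -> 123.675; 116.28; 103.53
avg_std_255 = sum(s * 255 for s in std) / 3  # one scalar for all channels
net_scale_factor = 1 / avg_std_255           # -> ~0.0173520735727919486
```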


@@ -1,25 +0,0 @@
[property]
gpu-id=0
net-scale-factor=0.0173520735727919486
offsets=123.675;116.28;103.53
model-color-format=0
onnx-file=ppyoloe_crn_s_400e_coco.onnx
model-engine-file=ppyoloe_crn_s_400e_coco.onnx_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=0
num-detected-classes=80
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=0
parse-bbox-func-name=NvDsInferParse_PPYOLOE_ONNX
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
[class-attrs-all]
nms-iou-threshold=0.7
pre-cluster-threshold=0.25
topk=300


@@ -2,9 +2,8 @@
 gpu-id=0
 net-scale-factor=0.0039215697906911373
 model-color-format=0
-custom-network-config=ppyoloe_plus_crn_s_80e_coco.cfg
-model-file=ppyoloe_plus_crn_s_80e_coco.wts
-model-engine-file=model_b1_gpu0_fp32.engine
+onnx-file=ppyoloe_plus_crn_s_80e_coco.onnx
+model-engine-file=ppyoloe_plus_crn_s_80e_coco.onnx_b1_gpu0_fp32.engine
 #int8-calib-file=calib.table
 labelfile-path=labels.txt
 batch-size=1
@@ -16,11 +15,10 @@ process-mode=1
 network-type=0
 cluster-mode=2
 maintain-aspect-ratio=0
-parse-bbox-func-name=NvDsInferParseYolo
+parse-bbox-func-name=NvDsInferParseYoloE
 custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
-engine-create-func-name=NvDsInferYoloCudaEngineGet
 [class-attrs-all]
-nms-iou-threshold=0.7
+nms-iou-threshold=0.45
 pre-cluster-threshold=0.25
 topk=300


@@ -1,24 +0,0 @@
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=ppyoloe_plus_crn_s_80e_coco.onnx
model-engine-file=ppyoloe_plus_crn_s_80e_coco.onnx_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=0
num-detected-classes=80
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=0
parse-bbox-func-name=NvDsInferParse_PPYOLOE_ONNX
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
[class-attrs-all]
nms-iou-threshold=0.7
pre-cluster-threshold=0.25
topk=300


@@ -2,9 +2,8 @@
 gpu-id=0
 net-scale-factor=0.0039215697906911373
 model-color-format=0
-custom-network-config=yolov5s.cfg
-model-file=yolov5s.wts
-model-engine-file=model_b1_gpu0_fp32.engine
+onnx-file=yolov5s.onnx
+model-engine-file=yolov5s.onnx_b1_gpu0_fp32.engine
 #int8-calib-file=calib.table
 labelfile-path=labels.txt
 batch-size=1
@@ -19,7 +18,6 @@ maintain-aspect-ratio=1
 symmetric-padding=1
 parse-bbox-func-name=NvDsInferParseYolo
 custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
-engine-create-func-name=NvDsInferYoloCudaEngineGet
 [class-attrs-all]
 nms-iou-threshold=0.45
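The `net-scale-factor=0.0039215697906911373` used by the YOLOv5-style configs is, to within floating-point noise, simply 1/255: pixels are rescaled from [0, 255] to [0, 1] with no mean subtraction (no `offsets` key). A one-line check of that reading:

```python
net_scale_factor = 0.0039215697906911373  # value from the config above
approx = 1 / 255                          # scale [0, 255] pixels to [0, 1]
diff = abs(net_scale_factor - approx)     # tiny; the config value is effectively 1/255
```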


@@ -1,25 +0,0 @@
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=yolov5s.onnx
model-engine-file=yolov5s.onnx_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=0
num-detected-classes=80
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
parse-bbox-func-name=NvDsInferParse_YOLO_ONNX
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300


@@ -2,9 +2,8 @@
 gpu-id=0
 net-scale-factor=0.0039215697906911373
 model-color-format=0
-custom-network-config=yolov6s.cfg
-model-file=yolov6s.wts
-model-engine-file=model_b1_gpu0_fp32.engine
+onnx-file=yolov6s.onnx
+model-engine-file=yolov6s.onnx_b1_gpu0_fp32.engine
 #int8-calib-file=calib.table
 labelfile-path=labels.txt
 batch-size=1
@@ -19,7 +18,6 @@ maintain-aspect-ratio=1
 symmetric-padding=1
 parse-bbox-func-name=NvDsInferParseYolo
 custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
-engine-create-func-name=NvDsInferYoloCudaEngineGet
 [class-attrs-all]
 nms-iou-threshold=0.45


@@ -1,25 +0,0 @@
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=yolov6s.onnx
model-engine-file=yolov6s.onnx_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=0
num-detected-classes=80
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
parse-bbox-func-name=NvDsInferParse_YOLO_ONNX
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300


@@ -2,9 +2,8 @@
 gpu-id=0
 net-scale-factor=0.0039215697906911373
 model-color-format=0
-custom-network-config=yolov7.cfg
-model-file=yolov7.wts
-model-engine-file=model_b1_gpu0_fp32.engine
+onnx-file=yolov7.onnx
+model-engine-file=yolov7.onnx_b1_gpu0_fp32.engine
 #int8-calib-file=calib.table
 labelfile-path=labels.txt
 batch-size=1
@@ -19,7 +18,6 @@ maintain-aspect-ratio=1
 symmetric-padding=1
 parse-bbox-func-name=NvDsInferParseYolo
 custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
-engine-create-func-name=NvDsInferYoloCudaEngineGet
 [class-attrs-all]
 nms-iou-threshold=0.45


@@ -2,9 +2,8 @@
 gpu-id=0
 net-scale-factor=0.0039215697906911373
 model-color-format=0
-custom-network-config=yolov8s.cfg
-model-file=yolov8s.wts
-model-engine-file=model_b1_gpu0_fp32.engine
+onnx-file=yolov8s.onnx
+model-engine-file=yolov8s.onnx_b1_gpu0_fp32.engine
 #int8-calib-file=calib.table
 labelfile-path=labels.txt
 batch-size=1
@@ -19,7 +18,6 @@ maintain-aspect-ratio=1
 symmetric-padding=1
 parse-bbox-func-name=NvDsInferParseYolo
 custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
-engine-create-func-name=NvDsInferYoloCudaEngineGet
 [class-attrs-all]
 nms-iou-threshold=0.45


@@ -1,25 +0,0 @@
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=yolov8s.onnx
model-engine-file=yolov8s.onnx_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=0
num-detected-classes=80
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
parse-bbox-func-name=NvDsInferParse_YOLOV8_ONNX
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300


@@ -1,25 +0,0 @@
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=yolo_nas_s.onnx
model-engine-file=yolo_nas_s.onnx_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=0
num-detected-classes=80
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
parse-bbox-func-name=NvDsInferParse_YOLO_NAS_ONNX
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300
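Across all of the new ONNX configs, `model-engine-file` follows one naming pattern: `<onnx-file>_b<batch-size>_gpu<gpu-id>_<precision>.engine`, where the precision suffix tracks `network-mode` (0 = FP32, 1 = INT8, 2 = FP16 in nvinfer). A small hypothetical helper, for illustration, that reproduces the names seen in these configs:

```python
def engine_name(onnx_file: str, batch_size: int = 1, gpu_id: int = 0,
                network_mode: int = 0) -> str:
    """Build the serialized-engine filename matching the configs above."""
    precision = {0: "fp32", 1: "int8", 2: "fp16"}[network_mode]  # nvinfer network-mode
    return f"{onnx_file}_b{batch_size}_gpu{gpu_id}_{precision}.engine"

name = engine_name("yolo_nas_s.onnx")  # matches model-engine-file in the config above
```

Keeping `model-engine-file` in sync with `batch-size` and `network-mode` avoids a rebuild of the engine on every run.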


@@ -2,8 +2,8 @@
 gpu-id=0
 net-scale-factor=0.0039215697906911373
 model-color-format=0
-onnx-file=yolov7.onnx
-model-engine-file=yolov7.onnx_b1_gpu0_fp32.engine
+onnx-file=yolo_nas_s_coco.onnx
+model-engine-file=yolo_nas_s_coco.onnx_b1_gpu0_fp32.engine
 #int8-calib-file=calib.table
 labelfile-path=labels.txt
 batch-size=1
@@ -15,8 +15,8 @@ process-mode=1
 network-type=0
 cluster-mode=2
 maintain-aspect-ratio=1
-symmetric-padding=1
+symmetric-padding=0
-parse-bbox-func-name=NvDsInferParse_YOLO_ONNX
+parse-bbox-func-name=NvDsInferParseYoloE
 custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
 [class-attrs-all]


@@ -2,9 +2,8 @@
 gpu-id=0
 net-scale-factor=0.0039215697906911373
 model-color-format=0
-custom-network-config=yolor_csp.cfg
-model-file=yolor_csp.wts
-model-engine-file=model_b1_gpu0_fp32.engine
+onnx-file=yolor_csp.onnx
+model-engine-file=yolor_csp.onnx_b1_gpu0_fp32.engine
 #int8-calib-file=calib.table
 labelfile-path=labels.txt
 batch-size=1
@@ -19,7 +18,6 @@ maintain-aspect-ratio=1
 symmetric-padding=1
 parse-bbox-func-name=NvDsInferParseYolo
 custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
-engine-create-func-name=NvDsInferYoloCudaEngineGet
 [class-attrs-all]
 nms-iou-threshold=0.45


@@ -2,9 +2,8 @@
 gpu-id=0
 net-scale-factor=0
 model-color-format=0
-custom-network-config=yolox_s.cfg
-model-file=yolox_s.wts
-model-engine-file=model_b1_gpu0_fp32.engine
+onnx-file=yolox_s.onnx
+model-engine-file=yolox_s.onnx_b1_gpu0_fp32.engine
 #int8-calib-file=calib.table
 labelfile-path=labels.txt
 batch-size=1
@@ -19,7 +18,6 @@ maintain-aspect-ratio=1
 symmetric-padding=0
 parse-bbox-func-name=NvDsInferParseYolo
 custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
-engine-create-func-name=NvDsInferYoloCudaEngineGet
 [class-attrs-all]
 nms-iou-threshold=0.45


@@ -3,9 +3,8 @@ gpu-id=0
 net-scale-factor=0.0173520735727919486
 offsets=123.675;116.28;103.53
 model-color-format=0
-custom-network-config=yolox_s.cfg
-model-file=yolox_s.wts
-model-engine-file=model_b1_gpu0_fp32.engine
+onnx-file=yolox_s.onnx
+model-engine-file=yolox_s.onnx_b1_gpu0_fp32.engine
 #int8-calib-file=calib.table
 labelfile-path=labels.txt
 batch-size=1
@@ -20,7 +19,6 @@ maintain-aspect-ratio=1
 symmetric-padding=0
 parse-bbox-func-name=NvDsInferParseYolo
 custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
-engine-create-func-name=NvDsInferYoloCudaEngineGet
 [class-attrs-all]
 nms-iou-threshold=0.45


@@ -1,26 +0,0 @@
[property]
gpu-id=0
net-scale-factor=0.0173520735727919486
offsets=123.675;116.28;103.53
model-color-format=0
onnx-file=yolox_s.onnx
model-engine-file=yolox_s.onnx_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=0
num-detected-classes=80
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=0
parse-bbox-func-name=NvDsInferParse_YOLOX_ONNX
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300


@@ -1,25 +0,0 @@
[property]
gpu-id=0
net-scale-factor=0
model-color-format=0
onnx-file=yolox_s.onnx
model-engine-file=yolox_s.onnx_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=0
num-detected-classes=80
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=0
parse-bbox-func-name=NvDsInferParse_YOLOX_ONNX
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300


@@ -1,5 +1,7 @@
 # PP-YOLOE / PP-YOLOE+ usage
+**NOTE**: You can use the release/2.6 branch of the PPYOLOE repo to convert all model versions.
 * [Convert model](#convert-model)
 * [Compile the lib](#compile-the-lib)
 * [Edit the config_infer_primary_ppyoloe_plus file](#edit-the-config_infer_primary_ppyoloe_plus-file)
@@ -12,35 +14,36 @@
 #### 1. Download the PaddleDetection repo and install the requirements
-https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/docs/tutorials/INSTALL.md
+https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/docs/tutorials/INSTALL.md
 **NOTE**: It is recommended to use a Python virtualenv.
 #### 2. Copy the converter
-Copy the `gen_wts_ppyoloe.py` file from the `DeepStream-Yolo/utils` directory to the `PaddleDetection` folder.
+Copy the `export_ppyoloe.py` file from the `DeepStream-Yolo/utils` directory to the `PaddleDetection` folder.
 #### 3. Download the model
-Download the `pdparams` file from [PP-YOLOE](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/ppyoloe) releases (example for PP-YOLOE+_s)
+Download the `pdparams` file from [PP-YOLOE](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.6/configs/ppyoloe) releases (example for PP-YOLOE+_s)
 ```
 wget https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams
 ```
-**NOTE**: You can use your custom model, but it is important to keep the YOLO model reference (`ppyoloe_`) in your `cfg` and `weights`/`wts` filenames to generate the engine correctly.
+**NOTE**: You can use your custom model.
 #### 4. Convert model
-Generate the `cfg` and `wts` files (example for PP-YOLOE+_s)
+Generate the ONNX model file (example for PP-YOLOE+_s)
 ```
-python3 gen_wts_ppyoloe.py -w ppyoloe_plus_crn_s_80e_coco.pdparams -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml
+pip3 install onnx onnxsim onnxruntime
+python3 export_ppyoloe.py -w ppyoloe_plus_crn_s_80e_coco.pdparams -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml --simplify
 ```
 #### 5. Copy generated files
-Copy the generated `cfg` and `wts` files to the `DeepStream-Yolo` folder.
+Copy the generated ONNX model file to the `DeepStream-Yolo` folder.
 ##
@@ -93,11 +96,13 @@ Edit the `config_infer_primary_ppyoloe_plus.txt` file according to your model (e
 ```
 [property]
 ...
-custom-network-config=ppyoloe_plus_crn_s_80e_coco.cfg
-model-file=ppyoloe_plus_crn_s_80e_coco.wts
+onnx-file=ppyoloe_plus_crn_s_80e_coco.onnx
+model-engine-file=ppyoloe_plus_crn_s_80e_coco.onnx_b1_gpu0_fp32.engine
 ...
 num-detected-classes=80
 ...
+parse-bbox-func-name=NvDsInferParseYoloE
+...
 ```
 **NOTE**: If you use the **legacy** model, you should edit the `config_infer_primary_ppyoloe.txt` file.

docs/YOLONAS.md (new file, 171 lines)

@@ -0,0 +1,171 @@
# YOLONAS usage
**NOTE**: The yaml file is not required.
* [Convert model](#convert-model)
* [Compile the lib](#compile-the-lib)
* [Edit the config_infer_primary_yolonas file](#edit-the-config_infer_primary_yolonas-file)
* [Edit the deepstream_app_config file](#edit-the-deepstream_app_config-file)
* [Testing the model](#testing-the-model)
##
### Convert model
#### 1. Download the YOLO-NAS repo and install the requirements
```
git clone https://github.com/Deci-AI/super-gradients.git
cd super-gradients
pip3 install -r requirements.txt
python3 setup.py install
pip3 install onnx onnxsim onnxruntime
```
**NOTE**: It is recommended to use Python virtualenv.
#### 2. Copy the converter
Copy the `export_yolonas.py` file from the `DeepStream-Yolo/utils` directory to the `super-gradients` folder.
#### 3. Download the model
Download the `pth` file from the [YOLO-NAS](https://sghub.deci.ai/) website (example for YOLO-NAS S)
```
wget https://sghub.deci.ai/models/yolo_nas_s_coco.pth
```
**NOTE**: You can use your custom model.
#### 4. Convert model
Generate the ONNX model file (example for YOLO-NAS S)
```
python3 export_yolonas.py -m yolo_nas_s -w yolo_nas_s_coco.pth --simplify
```
**NOTE**: Model names
```
-m yolo_nas_s
```
or
```
-m yolo_nas_m
```
or
```
-m yolo_nas_l
```
**NOTE**: To change the inference size (default: 640)
```
-s SIZE
--size SIZE
-s HEIGHT WIDTH
--size HEIGHT WIDTH
```
Example for 1280
```
-s 1280
```
or
```
-s 1280 1280
```
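The `-s`/`--size` flag above accepts either one value (a square input) or two values (height and width). A sketch of how those documented semantics could be parsed; this is an assumption for illustration, not the actual `export_yolonas.py` code:

```python
import argparse

parser = argparse.ArgumentParser()
# nargs="+" lets the flag take one value (-s 1280) or two (-s 1280 736)
parser.add_argument("-s", "--size", nargs="+", type=int, default=[640])

def infer_hw(size):
    """One value -> square input; two values -> HEIGHT WIDTH."""
    return (size[0], size[0]) if len(size) == 1 else (size[0], size[1])

hw = infer_hw(parser.parse_args(["-s", "1280"]).size)
hw2 = infer_hw(parser.parse_args(["-s", "1280", "736"]).size)
```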
#### 5. Copy generated files
Copy the generated ONNX model file to the `DeepStream-Yolo` folder.
##
### Compile the lib
Open the `DeepStream-Yolo` folder and compile the lib
* DeepStream 6.2 on x86 platform
```
CUDA_VER=11.8 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.1.1 on x86 platform
```
CUDA_VER=11.7 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.1 on x86 platform
```
CUDA_VER=11.6 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.0.1 / 6.0 on x86 platform
```
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.2 / 6.1.1 / 6.1 on Jetson platform
```
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo
```
* DeepStream 6.0.1 / 6.0 on Jetson platform
```
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
```
##
### Edit the config_infer_primary_yolonas file
Edit the `config_infer_primary_yolonas.txt` file according to your model (example for YOLO-NAS S with 80 classes)
```
[property]
...
onnx-file=yolo_nas_s_coco.onnx
model-engine-file=yolo_nas_s_coco.onnx_b1_gpu0_fp32.engine
...
num-detected-classes=80
...
parse-bbox-func-name=NvDsInferParseYoloE
...
```
##
### Edit the deepstream_app_config file
```
...
[primary-gie]
...
config-file=config_infer_primary_yolonas.txt
```
##
### Testing the model
```
deepstream-app -c deepstream_app_config.txt
```
**NOTE**: For more information about custom models configuration (`batch-size`, `network-mode`, etc), please check the [`docs/customModels.md`](customModels.md) file.


@@ -1,8 +1,8 @@
 # YOLOR usage
-**NOTE**: You need to use the main branch of the YOLOR repo to convert the model.
+**NOTE**: Select the correct branch of the YOLOR repo before the conversion.
-**NOTE**: The cfg file is required.
+**NOTE**: The cfg file is required for the main branch.
 * [Convert model](#convert-model)
 * [Compile the lib](#compile-the-lib)
@@ -20,31 +20,71 @@
 git clone https://github.com/WongKinYiu/yolor.git
 cd yolor
 pip3 install -r requirements.txt
+pip3 install onnx onnxsim onnxruntime
 ```
 **NOTE**: It is recommended to use a Python virtualenv.
 #### 2. Copy the converter
-Copy the `gen_wts_yolor.py` file from the `DeepStream-Yolo/utils` directory to the `yolor` folder.
+Copy the `export_yolor.py` file from the `DeepStream-Yolo/utils` directory to the `yolor` folder.
 #### 3. Download the model
 Download the `pt` file from the [YOLOR](https://github.com/WongKinYiu/yolor) repo.
-**NOTE**: You can use your custom model, but it is important to keep the YOLO model reference (`yolor_`) in your `cfg` and `weights`/`wts` filenames to generate the engine correctly.
+**NOTE**: You can use your custom model.
 #### 4. Convert model
-Generate the `cfg` and `wts` files (example for YOLOR-CSP)
+Generate the ONNX model file
+- Main branch
+Example for YOLOR-CSP
 ```
-python3 gen_wts_yolor.py -w yolor_csp.pt -c cfg/yolor_csp.cfg
+python3 export_yolor.py -w yolor_csp.pt -c cfg/yolor_csp.cfg --simplify
+```
+- Paper branch
+Example for YOLOR-P6
+```
+python3 export_yolor.py -w yolor-p6.pt --simplify
+```
+**NOTE**: To convert a P6 model
+```
+--p6
+```
+**NOTE**: To change the inference size (default: 640)
+```
+-s SIZE
+--size SIZE
+-s HEIGHT WIDTH
+--size HEIGHT WIDTH
+```
+Example for 1280
+```
+-s 1280
+```
+or
+```
+-s 1280 1280
 ```
 #### 5. Copy generated files
-Copy the generated `cfg` and `wts` files to the `DeepStream-Yolo` folder.
+Copy the generated ONNX model file to the `DeepStream-Yolo` folder.
 ##
@@ -97,11 +137,13 @@ Edit the `config_infer_primary_yolor.txt` file according to your model (example
``` ```
[property] [property]
... ...
custom-network-config=yolor_csp.cfg onnx-file=yolor_csp.onnx
model-file=yolor_csp.wts model-engine-file=yolor_csp.onnx_b1_gpu0_fp32.engine
... ...
num-detected-classes=80 num-detected-classes=80
... ...
parse-bbox-func-name=NvDsInferParseYolo
...
``` ```
##
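A mistyped path or key in the `[property]` group is a common source of engine-build failures. Below is a minimal sanity-check sketch that parses the INI-style config with Python's `configparser`; the `check_config` helper and its rules are illustrative, not part of DeepStream:

```python
import configparser
import os

def check_config(path):
    """Return a list of problems found in a config_infer_primary-style file."""
    cfg = configparser.ConfigParser()
    cfg.read(path)
    prop = cfg["property"]
    problems = []
    # If an ONNX model is configured, the file should actually exist.
    onnx = prop.get("onnx-file")
    if onnx and not os.path.isfile(onnx):
        problems.append(f"onnx-file not found: {onnx}")
    # The custom parser from this repo must be selected.
    if prop.get("parse-bbox-func-name") != "NvDsInferParseYolo":
        problems.append("parse-bbox-func-name should be NvDsInferParseYolo")
    # num-detected-classes must be a plain integer.
    if not prop.get("num-detected-classes", "").isdigit():
        problems.append("num-detected-classes missing or not a number")
    return problems
```

Running it against the file before launching deepstream-app catches the typos early instead of at engine-build time.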
View File
@@ -1,5 +1,7 @@
# YOLOX usage
**NOTE**: You can use the main branch of the YOLOX repo to convert all model versions.
**NOTE**: The yaml file is not required.
* [Convert model](#convert-model)
@@ -18,13 +20,15 @@
git clone https://github.com/Megvii-BaseDetection/YOLOX.git
cd YOLOX
pip3 install -r requirements.txt
python3 setup.py develop
pip3 install onnx onnxsim onnxruntime
```
**NOTE**: It is recommended to use a Python virtualenv.
#### 2. Copy the converter
Copy the `export_yolox.py` file from the `DeepStream-Yolo/utils` directory to the `YOLOX` folder.
#### 3. Download the model
@@ -34,19 +38,19 @@ Download the `pth` file from [YOLOX](https://github.com/Megvii-BaseDetection/YOL
wget https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s.pth
```
**NOTE**: You can use your custom model.
#### 4. Convert model
Generate the ONNX model file (example for YOLOX-s standard)
```
python3 export_yolox.py -w yolox_s.pth -c exps/default/yolox_s.py --simplify
```
#### 5. Copy generated files
Copy the generated ONNX model file to the `DeepStream-Yolo` folder.
##
@@ -99,11 +103,13 @@ Edit the `config_infer_primary_yolox.txt` file according to your model (example
```
[property]
...
onnx-file=yolox_s.onnx
model-engine-file=yolox_s.onnx_b1_gpu0_fp32.engine
...
num-detected-classes=80
...
parse-bbox-func-name=NvDsInferParseYolo
...
```
**NOTE**: If you use the **legacy** model, you should edit the `config_infer_primary_yolox_legacy.txt` file.
View File
@@ -1,6 +1,6 @@
# YOLOv5 usage
**NOTE**: You can use the master branch of the YOLOv5 repo to convert all model versions.
**NOTE**: The yaml file is not required.
@@ -20,30 +20,31 @@
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip3 install -r requirements.txt
pip3 install onnx onnxsim onnxruntime
```
**NOTE**: It is recommended to use a Python virtualenv.
#### 2. Copy the converter
Copy the `export_yoloV5.py` file from the `DeepStream-Yolo/utils` directory to the `yolov5` folder.
#### 3. Download the model
Download the `pt` file from [YOLOv5](https://github.com/ultralytics/yolov5/releases/) releases (example for YOLOv5s 7.0)
```
wget https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt
```
**NOTE**: You can use your custom model.
#### 4. Convert model
Generate the ONNX model file (example for YOLOv5s)
```
python3 export_yoloV5.py -w yolov5s.pt --simplify
```
**NOTE**: To convert a P6 model
@@ -75,7 +76,7 @@ or
#### 5. Copy generated files
Copy the generated ONNX model file to the `DeepStream-Yolo` folder.
##
@@ -128,11 +129,13 @@ Edit the `config_infer_primary_yoloV5.txt` file according to your model (example
```
[property]
...
onnx-file=yolov5s.onnx
model-engine-file=yolov5s.onnx_b1_gpu0_fp32.engine
...
num-detected-classes=80
...
parse-bbox-func-name=NvDsInferParseYolo
...
```
##
View File
@@ -18,13 +18,14 @@
git clone https://github.com/meituan/YOLOv6.git
cd YOLOv6
pip3 install -r requirements.txt
pip3 install onnx onnxsim onnxruntime
```
**NOTE**: It is recommended to use a Python virtualenv.
#### 2. Copy the converter
Copy the `export_yoloV6.py` file from the `DeepStream-Yolo/utils` directory to the `YOLOv6` folder.
#### 3. Download the model
@@ -34,14 +35,14 @@ Download the `pt` file from [YOLOv6](https://github.com/meituan/YOLOv6/releases/
wget https://github.com/meituan/YOLOv6/releases/download/0.3.0/yolov6s.pt
```
**NOTE**: You can use your custom model.
#### 4. Convert model
Generate the ONNX model file (example for YOLOv6-S 3.0)
```
python3 export_yoloV6.py -w yolov6s.pt --simplify
```
**NOTE**: To convert a P6 model
@@ -73,7 +74,7 @@ or
#### 5. Copy generated files
Copy the generated ONNX model file to the `DeepStream-Yolo` folder.
##
@@ -126,11 +127,13 @@ Edit the `config_infer_primary_yoloV6.txt` file according to your model (example
```
[property]
...
onnx-file=yolov6s.onnx
model-engine-file=yolov6s.onnx_b1_gpu0_fp32.engine
...
num-detected-classes=80
...
parse-bbox-func-name=NvDsInferParseYolo
...
```
##
View File
@@ -18,13 +18,14 @@
git clone https://github.com/WongKinYiu/yolov7.git
cd yolov7
pip3 install -r requirements.txt
pip3 install onnx onnxsim onnxruntime
```
**NOTE**: It is recommended to use a Python virtualenv.
#### 2. Copy the converter
Copy the `export_yoloV7.py` file from the `DeepStream-Yolo/utils` directory to the `yolov7` folder.
#### 3. Download the model
@@ -34,18 +35,18 @@ Download the `pt` file from [YOLOv7](https://github.com/WongKinYiu/yolov7/releas
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt
```
**NOTE**: You can use your custom model.
#### 4. Reparameterize your model
[YOLOv7](https://github.com/WongKinYiu/yolov7/releases/) and its variants cannot be directly converted to an engine file. Therefore, you will have to reparameterize your model using the code [here](https://github.com/WongKinYiu/yolov7/blob/main/tools/reparameterization.ipynb). Make sure to convert your custom checkpoints in the yolov7 repository, and then save your reparameterized checkpoints for conversion in the next step.
#### 5. Convert model
Generate the ONNX model file (example for YOLOv7)
```
python3 export_yoloV7.py -w yolov7.pt --simplify
```
**NOTE**: To convert a P6 model
@@ -77,7 +78,7 @@ or
#### 6. Copy generated files
Copy the generated ONNX model file to the `DeepStream-Yolo` folder.
##
@@ -130,11 +131,13 @@ Edit the `config_infer_primary_yoloV7.txt` file according to your model (example
```
[property]
...
onnx-file=yolov7.onnx
model-engine-file=yolov7.onnx_b1_gpu0_fp32.engine
...
num-detected-classes=80
...
parse-bbox-func-name=NvDsInferParseYolo
...
```
##
View File
@@ -18,13 +18,15 @@
git clone https://github.com/ultralytics/ultralytics.git
cd ultralytics
pip3 install -r requirements.txt
python3 setup.py install
pip3 install onnx onnxsim onnxruntime
```
**NOTE**: It is recommended to use a Python virtualenv.
#### 2. Copy the converter
Copy the `export_yoloV8.py` file from the `DeepStream-Yolo/utils` directory to the `ultralytics` folder.
#### 3. Download the model
@@ -34,14 +36,14 @@ Download the `pt` file from [YOLOv8](https://github.com/ultralytics/assets/relea
wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s.pt
```
**NOTE**: You can use your custom model.
#### 4. Convert model
Generate the ONNX model file (example for YOLOv8s)
```
python3 export_yoloV8.py -w yolov8s.pt --simplify
```
**NOTE**: To change the inference size (default: 640)
@@ -67,7 +69,7 @@ or
#### 5. Copy generated files
Copy the generated ONNX model file to the `DeepStream-Yolo` folder.
##
@@ -120,11 +122,13 @@ Edit the `config_infer_primary_yoloV8.txt` file according to your model (example
```
[property]
...
onnx-file=yolov8s.onnx
model-engine-file=yolov8s.onnx_b1_gpu0_fp32.engine
...
num-detected-classes=80
...
parse-bbox-func-name=NvDsInferParseYolo
...
```
##
View File
@@ -19,9 +19,7 @@ cd DeepStream-Yolo
#### 2. Copy the class names file to the DeepStream-Yolo folder and rename it to `labels.txt`
#### 3. Copy the `onnx` or `cfg` and `weights` files to the DeepStream-Yolo folder
##
@@ -189,24 +187,25 @@ To understand and edit `config_infer_primary.txt` file, read the [DeepStream Plu
model-color-format=0
```
**NOTE**: Set it according to the number of channels in the `cfg` file (1=GRAYSCALE, 3=RGB for Darknet YOLO) or your model configuration (ONNX).
* custom-network-config and model-file (Darknet YOLO)
* Example for custom YOLOv4 model
```
custom-network-config=yolov4_custom.cfg
model-file=yolov4_custom.weights
```
* onnx-file (ONNX)
* Example for custom YOLOv8 model
```
onnx-file=yolov8s_custom.onnx
```
* model-engine-file
* Example for `batch-size=1` and `network-mode=2`
@@ -233,7 +232,7 @@ To understand and edit `config_infer_primary.txt` file, read the [DeepStream Plu
model-engine-file=model_b2_gpu0_fp32.engine
```
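The engine filename follows a fixed pattern built from the config values, as the examples above show. A small illustrative helper (not a DeepStream API) that reproduces the convention, assuming `network-mode` maps 0=FP32, 1=INT8, 2=FP16:

```python
# Assumed mapping of the network-mode config key to the precision tag
# that appears in the generated engine filename.
PRECISION = {0: "fp32", 1: "int8", 2: "fp16"}

def engine_name(model="model", batch_size=1, gpu_id=0, network_mode=0):
    """Build <model>_b<batch>_gpu<gpu>_<precision>.engine, as in the examples."""
    return f"{model}_b{batch_size}_gpu{gpu_id}_{PRECISION[network_mode]}.engine"
```

This makes it easy to predict which `model-engine-file` value to set before the engine has ever been built.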
**NOTE**: To change the generated engine filename (Darknet YOLO), you need to edit and rebuild the `nvdsinfer_model_builder.cpp` file (`/opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/nvdsinfer_model_builder.cpp`, lines 825-827)
```
suggestedPathName =
@@ -260,7 +259,7 @@ To understand and edit `config_infer_primary.txt` file, read the [DeepStream Plu
num-detected-classes=80
```
**NOTE**: Set it according to the number of classes in the `cfg` file (Darknet YOLO) or your model configuration (ONNX).
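Since `num-detected-classes` must also agree with the class names in `labels.txt` (one name per line), a quick way to derive the value is to count the non-empty lines; this snippet is illustrative, not part of the repo:

```python
def count_classes(labels_path):
    """Return the number of non-empty lines in a labels.txt file."""
    with open(labels_path) as f:
        return sum(1 for line in f if line.strip())
```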
* interval
View File
@@ -26,9 +26,7 @@ cd DeepStream-Yolo
#### 3. Copy the class names file to each GIE folder and rename it to `labels.txt`
#### 4. Copy the `onnx` or `cfg` and `weights` files to each GIE folder
##
@@ -92,22 +90,36 @@ const char* YOLOLAYER_PLUGIN_VERSION {"2"};
### Edit the config_infer_primary files
**NOTE**: Edit the files according to the model you will use (YOLOv8, YOLOv5, YOLOv4, etc).
**NOTE**: Do it for each GIE folder.
* Edit the path of the `cfg` file
Example for gie1 (Darknet YOLO)
```
custom-network-config=gie1/yolo.cfg
model-file=yolo.weights
```
Example for gie2 (Darknet YOLO)
```
custom-network-config=gie2/yolo.cfg
model-file=yolo.weights
```
Example for gie1 (ONNX)
```
onnx-file=yolo.onnx
```
Example for gie2 (ONNX)
```
onnx-file=yolo.onnx
```
* Edit the gie-unique-id
View File
@@ -10,7 +10,7 @@
nvinfer1::ITensor*
batchnormLayer(int layerIdx, std::map<std::string, std::string>& block, std::vector<float>& weights,
std::vector<nvinfer1::Weights>& trtWeights, int& weightPtr, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network)
{
nvinfer1::ITensor* output;
@@ -26,7 +26,6 @@ batchnormLayer(int layerIdx, std::map<std::string, std::string>& block, std::vec
std::vector<float> bnRunningMean;
std::vector<float> bnRunningVar;
for (int i = 0; i < filters; ++i) {
bnBiases.push_back(weights[weightPtr]);
++weightPtr;
@@ -43,25 +42,6 @@ batchnormLayer(int layerIdx, std::map<std::string, std::string>& block, std::vec
bnRunningVar.push_back(sqrt(weights[weightPtr] + 1.0e-5));
++weightPtr;
}
int size = filters;
nvinfer1::Weights shift {nvinfer1::DataType::kFLOAT, nullptr, size};
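The loops above load gamma, beta, the running mean, and `sqrt(running_var + eps)` per channel so batchnorm can be folded into a per-channel scale/shift layer. The arithmetic, sketched in Python under the usual BN-folding assumption (the helper name is illustrative):

```python
import math

def bn_fold(gamma, beta, mean, var, eps=1.0e-5):
    """Fold batchnorm into a scale/shift pair for one channel."""
    std = math.sqrt(var + eps)   # matches bnRunningVar.push_back(sqrt(... + 1.0e-5))
    scale = gamma / std          # multiplies the convolution output
    shift = beta - mean * scale  # added after scaling
    return scale, shift
```

With unit variance, zero mean, gamma=1, and beta=0, the fold is (almost exactly) the identity, which is a handy sanity check.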
View File
@@ -14,7 +14,7 @@
#include "activation_layer.h"
nvinfer1::ITensor* batchnormLayer(int layerIdx, std::map<std::string, std::string>& block, std::vector<float>& weights,
std::vector<nvinfer1::Weights>& trtWeights, int& weightPtr, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network);
#endif
View File
@@ -1,82 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#include "c2f_layer.h"
#include <cassert>
#include "convolutional_layer.h"
nvinfer1::ITensor*
c2fLayer(int layerIdx, std::map<std::string, std::string>& block, std::vector<float>& weights,
std::vector<nvinfer1::Weights>& trtWeights, int& weightPtr, std::string weightsType, float eps, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network)
{
nvinfer1::ITensor* output;
assert(block.at("type") == "c2f");
assert(block.find("n") != block.end());
assert(block.find("shortcut") != block.end());
assert(block.find("filters") != block.end());
int n = std::stoi(block.at("n"));
bool shortcut = (block.at("shortcut") == "1");
int filters = std::stoi(block.at("filters"));
nvinfer1::Dims inputDims = input->getDimensions();
nvinfer1::ISliceLayer* sliceLt = network->addSlice(*input,nvinfer1::Dims{3, {0, 0, 0}},
nvinfer1::Dims{3, {inputDims.d[0] / 2, inputDims.d[1], inputDims.d[2]}}, nvinfer1::Dims{3, {1, 1, 1}});
assert(sliceLt != nullptr);
std::string sliceLtLayerName = "slice_lt_" + std::to_string(layerIdx);
sliceLt->setName(sliceLtLayerName.c_str());
nvinfer1::ITensor* lt = sliceLt->getOutput(0);
nvinfer1::ISliceLayer* sliceRb = network->addSlice(*input,nvinfer1::Dims{3, {inputDims.d[0] / 2, 0, 0}},
nvinfer1::Dims{3, {inputDims.d[0] / 2, inputDims.d[1], inputDims.d[2]}}, nvinfer1::Dims{3, {1, 1, 1}});
assert(sliceRb != nullptr);
std::string sliceRbLayerName = "slice_rb_" + std::to_string(layerIdx);
sliceRb->setName(sliceRbLayerName.c_str());
nvinfer1::ITensor* rb = sliceRb->getOutput(0);
std::vector<nvinfer1::ITensor*> concatInputs;
concatInputs.push_back(lt);
concatInputs.push_back(rb);
output = rb;
for (int i = 0; i < n; ++i) {
std::string cv1MlayerName = "c2f_1_" + std::to_string(i + 1) + "_";
nvinfer1::ITensor* cv1M = convolutionalLayer(layerIdx, block, weights, trtWeights, weightPtr, weightsType, filters, eps,
output, network, cv1MlayerName);
assert(cv1M != nullptr);
std::string cv2MlayerName = "c2f_2_" + std::to_string(i + 1) + "_";
nvinfer1::ITensor* cv2M = convolutionalLayer(layerIdx, block, weights, trtWeights, weightPtr, weightsType, filters, eps,
cv1M, network, cv2MlayerName);
assert(cv2M != nullptr);
if (shortcut) {
nvinfer1::IElementWiseLayer* ew = network->addElementWise(*output, *cv2M, nvinfer1::ElementWiseOperation::kSUM);
assert(ew != nullptr);
std::string ewLayerName = "shortcut_c2f_" + std::to_string(i + 1) + "_" + std::to_string(layerIdx);
ew->setName(ewLayerName.c_str());
output = ew->getOutput(0);
concatInputs.push_back(output);
}
else {
output = cv2M;
concatInputs.push_back(output);
}
}
nvinfer1::IConcatenationLayer* concat = network->addConcatenation(concatInputs.data(), concatInputs.size());
assert(concat != nullptr);
std::string concatLayerName = "route_" + std::to_string(layerIdx);
concat->setName(concatLayerName.c_str());
concat->setAxis(0);
output = concat->getOutput(0);
return output;
}
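For reference, the dataflow of the removed c2f layer can be sketched in plain Python, using lists as stand-ins for channel slices (the real layer works on TensorRT `ITensor`s, and `conv` here is a placeholder for the two `convolutionalLayer` calls):

```python
def c2f(x, n, shortcut, conv):
    """Split x in half, run n bottlenecks on the second half, concat everything."""
    half = len(x) // 2
    lt, rb = x[:half], x[half:]        # mirrors sliceLt / sliceRb above
    outputs = [lt, rb]                 # every intermediate is kept for the concat
    y = rb
    for _ in range(n):
        m = conv(conv(y))              # two stacked convolutions per bottleneck
        # optional residual add, as in the addElementWise(kSUM) branch
        y = [a + b for a, b in zip(y, m)] if shortcut else m
        outputs.append(y)
    return [c for part in outputs for c in part]   # channel-wise concatenation
```

This mirrors the C++ above: `concatInputs` collects the two input slices plus each bottleneck output before the final `addConcatenation`.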
View File
@@ -1,18 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#ifndef __C2F_LAYER_H__
#define __C2F_LAYER_H__
#include <map>
#include <vector>
#include "NvInfer.h"
nvinfer1::ITensor* c2fLayer(int layerIdx, std::map<std::string, std::string>& block, std::vector<float>& weights,
std::vector<nvinfer1::Weights>& trtWeights, int& weightPtr, std::string weightsType, float eps, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network);
#endif
View File
@@ -1,29 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#include "cls_layer.h"
#include <cassert>
nvinfer1::ITensor*
clsLayer(int layerIdx, std::map<std::string, std::string>& block, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network)
{
nvinfer1::ITensor* output;
assert(block.at("type") == "cls");
nvinfer1::IShuffleLayer* shuffle = network->addShuffle(*input);
assert(shuffle != nullptr);
std::string shuffleLayerName = "shuffle_" + std::to_string(layerIdx);
shuffle->setName(shuffleLayerName.c_str());
nvinfer1::Permutation permutation;
permutation.order[0] = 1;
permutation.order[1] = 0;
shuffle->setFirstTranspose(permutation);
output = shuffle->getOutput(0);
return output;
}
View File
@@ -1,16 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#ifndef __CLS_LAYER_H__
#define __CLS_LAYER_H__
#include <map>
#include "NvInfer.h"
nvinfer1::ITensor* clsLayer(int layerIdx, std::map<std::string, std::string>& block, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network);
#endif
View File
@@ -10,8 +10,8 @@
nvinfer1::ITensor*
convolutionalLayer(int layerIdx, std::map<std::string, std::string>& block, std::vector<float>& weights,
std::vector<nvinfer1::Weights>& trtWeights, int& weightPtr, int& inputChannels, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network, std::string layerName)
{
nvinfer1::ITensor* output;
@@ -58,7 +58,6 @@ convolutionalLayer(int layerIdx, std::map<std::string, std::string>& block, std:
nvinfer1::Weights convWt {nvinfer1::DataType::kFLOAT, nullptr, size};
nvinfer1::Weights convBias {nvinfer1::DataType::kFLOAT, nullptr, bias};
if (batchNormalize == false) {
float* val;
if (bias != 0) {
@@ -114,62 +113,6 @@ convolutionalLayer(int layerIdx, std::map<std::string, std::string>& block, std:
if (bias != 0)
trtWeights.push_back(convBias);
}
nvinfer1::IConvolutionLayer* conv = network->addConvolutionNd(*input, filters, nvinfer1::Dims{2, {kernelSize, kernelSize}},
convWt, convBias);
View File
@@ -14,7 +14,7 @@
#include "activation_layer.h"
nvinfer1::ITensor* convolutionalLayer(int layerIdx, std::map<std::string, std::string>& block, std::vector<float>& weights,
std::vector<nvinfer1::Weights>& trtWeights, int& weightPtr, int& inputChannels, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network, std::string layerName = "");
#endif
View File
@@ -9,8 +9,8 @@
nvinfer1::ITensor*
deconvolutionalLayer(int layerIdx, std::map<std::string, std::string>& block, std::vector<float>& weights,
std::vector<nvinfer1::Weights>& trtWeights, int& weightPtr, int& inputChannels, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network, std::string layerName)
{
nvinfer1::ITensor* output;
@@ -47,7 +47,6 @@ deconvolutionalLayer(int layerIdx, std::map<std::string, std::string>& block, st
nvinfer1::Weights convWt {nvinfer1::DataType::kFLOAT, nullptr, size};
nvinfer1::Weights convBias {nvinfer1::DataType::kFLOAT, nullptr, bias};
float* val;
if (bias != 0) {
val = new float[filters];
@@ -65,25 +64,6 @@ deconvolutionalLayer(int layerIdx, std::map<std::string, std::string>& block, st
}
convWt.values = val;
trtWeights.push_back(convWt);
nvinfer1::IDeconvolutionLayer* conv = network->addDeconvolutionNd(*input, filters,
nvinfer1::Dims{2, {kernelSize, kernelSize}}, convWt, convBias);
View File
@@ -12,7 +12,7 @@
#include "NvInfer.h"
nvinfer1::ITensor* deconvolutionalLayer(int layerIdx, std::map<std::string, std::string>& block, std::vector<float>& weights,
std::vector<nvinfer1::Weights>& trtWeights, int& weightPtr, int& inputChannels, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network, std::string layerName = "");
#endif
View File
@@ -1,196 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#include "detect_v8_layer.h"
#include <cassert>
nvinfer1::ITensor*
detectV8Layer(int layerIdx, std::map<std::string, std::string>& block, std::vector<float>& weights,
std::vector<nvinfer1::Weights>& trtWeights, int& weightPtr, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network)
{
nvinfer1::ITensor* output;
assert(block.at("type") == "detect_v8");
assert(block.find("num") != block.end());
assert(block.find("classes") != block.end());
int num = std::stoi(block.at("num"));
int classes = std::stoi(block.at("classes"));
int reg_max = num / 4;
nvinfer1::Dims inputDims = input->getDimensions();
nvinfer1::ISliceLayer* sliceBox = network->addSlice(*input, nvinfer1::Dims{2, {0, 0}},
nvinfer1::Dims{2, {num, inputDims.d[1]}}, nvinfer1::Dims{2, {1, 1}});
assert(sliceBox != nullptr);
std::string sliceBoxLayerName = "slice_box_" + std::to_string(layerIdx);
sliceBox->setName(sliceBoxLayerName.c_str());
nvinfer1::ITensor* box = sliceBox->getOutput(0);
nvinfer1::ISliceLayer* sliceCls = network->addSlice(*input, nvinfer1::Dims{2, {num, 0}},
nvinfer1::Dims{2, {classes, inputDims.d[1]}}, nvinfer1::Dims{2, {1, 1}});
assert(sliceCls != nullptr);
std::string sliceClsLayerName = "slice_cls_" + std::to_string(layerIdx);
sliceCls->setName(sliceClsLayerName.c_str());
nvinfer1::ITensor* cls = sliceCls->getOutput(0);
nvinfer1::IShuffleLayer* shuffle1Box = network->addShuffle(*box);
assert(shuffle1Box != nullptr);
std::string shuffle1BoxLayerName = "shuffle1_box_" + std::to_string(layerIdx);
shuffle1Box->setName(shuffle1BoxLayerName.c_str());
nvinfer1::Dims reshape1Dims = {3, {4, reg_max, inputDims.d[1]}};
shuffle1Box->setReshapeDimensions(reshape1Dims);
nvinfer1::Permutation permutation1Box;
permutation1Box.order[0] = 1;
permutation1Box.order[1] = 0;
permutation1Box.order[2] = 2;
shuffle1Box->setSecondTranspose(permutation1Box);
box = shuffle1Box->getOutput(0);
nvinfer1::ISoftMaxLayer* softmax = network->addSoftMax(*box);
assert(softmax != nullptr);
std::string softmaxLayerName = "softmax_box_" + std::to_string(layerIdx);
softmax->setName(softmaxLayerName.c_str());
softmax->setAxes(1 << 0);
box = softmax->getOutput(0);
nvinfer1::Weights dflWt {nvinfer1::DataType::kFLOAT, nullptr, reg_max};
float* val = new float[reg_max];
for (int i = 0; i < reg_max; ++i) {
val[i] = i;
}
dflWt.values = val;
nvinfer1::IConvolutionLayer* conv = network->addConvolutionNd(*box, 1, nvinfer1::Dims{2, {1, 1}}, dflWt,
nvinfer1::Weights{});
assert(conv != nullptr);
std::string convLayerName = "conv_box_" + std::to_string(layerIdx);
conv->setName(convLayerName.c_str());
conv->setStrideNd(nvinfer1::Dims{2, {1, 1}});
conv->setPaddingNd(nvinfer1::Dims{2, {0, 0}});
box = conv->getOutput(0);
nvinfer1::IShuffleLayer* shuffle2Box = network->addShuffle(*box);
assert(shuffle2Box != nullptr);
std::string shuffle2BoxLayerName = "shuffle2_box_" + std::to_string(layerIdx);
shuffle2Box->setName(shuffle2BoxLayerName.c_str());
nvinfer1::Dims reshape2Dims = {2, {4, inputDims.d[1]}};
shuffle2Box->setReshapeDimensions(reshape2Dims);
box = shuffle2Box->getOutput(0);
nvinfer1::Dims shuffle2BoxDims = box->getDimensions();
nvinfer1::ISliceLayer* sliceLtBox = network->addSlice(*box, nvinfer1::Dims{2, {0, 0}},
nvinfer1::Dims{2, {2, shuffle2BoxDims.d[1]}}, nvinfer1::Dims{2, {1, 1}});
assert(sliceLtBox != nullptr);
std::string sliceLtBoxLayerName = "slice_lt_box_" + std::to_string(layerIdx);
sliceLtBox->setName(sliceLtBoxLayerName.c_str());
nvinfer1::ITensor* lt = sliceLtBox->getOutput(0);
nvinfer1::ISliceLayer* sliceRbBox = network->addSlice(*box, nvinfer1::Dims{2, {2, 0}},
nvinfer1::Dims{2, {2, shuffle2BoxDims.d[1]}}, nvinfer1::Dims{2, {1, 1}});
assert(sliceRbBox != nullptr);
std::string sliceRbBoxLayerName = "slice_rb_box_" + std::to_string(layerIdx);
sliceRbBox->setName(sliceRbBoxLayerName.c_str());
nvinfer1::ITensor* rb = sliceRbBox->getOutput(0);
int channels = 2 * shuffle2BoxDims.d[1];
nvinfer1::Weights anchorPointsWt {nvinfer1::DataType::kFLOAT, nullptr, channels};
val = new float[channels];
for (int i = 0; i < channels; ++i) {
val[i] = weights[weightPtr];
++weightPtr;
}
anchorPointsWt.values = val;
trtWeights.push_back(anchorPointsWt);
nvinfer1::IConstantLayer* anchorPoints = network->addConstant(nvinfer1::Dims{2, {2, shuffle2BoxDims.d[1]}},
anchorPointsWt);
assert(anchorPoints != nullptr);
std::string anchorPointsLayerName = "anchor_points_" + std::to_string(layerIdx);
anchorPoints->setName(anchorPointsLayerName.c_str());
nvinfer1::ITensor* anchorPointsTensor = anchorPoints->getOutput(0);
nvinfer1::IElementWiseLayer* x1y1 = network->addElementWise(*anchorPointsTensor, *lt,
nvinfer1::ElementWiseOperation::kSUB);
assert(x1y1 != nullptr);
std::string x1y1LayerName = "x1y1_" + std::to_string(layerIdx);
x1y1->setName(x1y1LayerName.c_str());
nvinfer1::ITensor* x1y1Tensor = x1y1->getOutput(0);
nvinfer1::IElementWiseLayer* x2y2 = network->addElementWise(*rb, *anchorPointsTensor,
nvinfer1::ElementWiseOperation::kSUM);
assert(x2y2 != nullptr);
std::string x2y2LayerName = "x2y2_" + std::to_string(layerIdx);
x2y2->setName(x2y2LayerName.c_str());
nvinfer1::ITensor* x2y2Tensor = x2y2->getOutput(0);
std::vector<nvinfer1::ITensor*> concatBoxInputs;
concatBoxInputs.push_back(x1y1Tensor);
concatBoxInputs.push_back(x2y2Tensor);
nvinfer1::IConcatenationLayer* concatBox = network->addConcatenation(concatBoxInputs.data(), concatBoxInputs.size());
assert(concatBox != nullptr);
std::string concatBoxLayerName = "concat_box_" + std::to_string(layerIdx);
concatBox->setName(concatBoxLayerName.c_str());
concatBox->setAxis(0);
box = concatBox->getOutput(0);
channels = shuffle2BoxDims.d[1];
nvinfer1::Weights stridePointsWt {nvinfer1::DataType::kFLOAT, nullptr, channels};
val = new float[channels];
for (int i = 0; i < channels; ++i) {
val[i] = weights[weightPtr];
++weightPtr;
}
stridePointsWt.values = val;
trtWeights.push_back(stridePointsWt);
nvinfer1::IConstantLayer* stridePoints = network->addConstant(nvinfer1::Dims{2, {1, shuffle2BoxDims.d[1]}},
stridePointsWt);
assert(stridePoints != nullptr);
std::string stridePointsLayerName = "stride_points_" + std::to_string(layerIdx);
stridePoints->setName(stridePointsLayerName.c_str());
nvinfer1::ITensor* stridePointsTensor = stridePoints->getOutput(0);
nvinfer1::IElementWiseLayer* pred = network->addElementWise(*box, *stridePointsTensor,
nvinfer1::ElementWiseOperation::kPROD);
assert(pred != nullptr);
std::string predLayerName = "pred_" + std::to_string(layerIdx);
pred->setName(predLayerName.c_str());
box = pred->getOutput(0);
nvinfer1::IActivationLayer* sigmoid = network->addActivation(*cls, nvinfer1::ActivationType::kSIGMOID);
assert(sigmoid != nullptr);
std::string sigmoidLayerName = "sigmoid_cls_" + std::to_string(layerIdx);
sigmoid->setName(sigmoidLayerName.c_str());
cls = sigmoid->getOutput(0);
std::vector<nvinfer1::ITensor*> concatInputs;
concatInputs.push_back(box);
concatInputs.push_back(cls);
nvinfer1::IConcatenationLayer* concat = network->addConcatenation(concatInputs.data(), concatInputs.size());
assert(concat != nullptr);
std::string concatLayerName = "concat_" + std::to_string(layerIdx);
concat->setName(concatLayerName.c_str());
concat->setAxis(0);
output = concat->getOutput(0);
nvinfer1::IShuffleLayer* shuffle = network->addShuffle(*output);
assert(shuffle != nullptr);
std::string shuffleLayerName = "shuffle_" + std::to_string(layerIdx);
shuffle->setName(shuffleLayerName.c_str());
nvinfer1::Permutation permutation;
permutation.order[0] = 1;
permutation.order[1] = 0;
shuffle->setFirstTranspose(permutation);
output = shuffle->getOutput(0);
return output;
}


@@ -1,18 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#ifndef __DETECT_V8_LAYER_H__
#define __DETECT_V8_LAYER_H__
#include <map>
#include <vector>
#include "NvInfer.h"
nvinfer1::ITensor* detectV8Layer(int layerIdx, std::map<std::string, std::string>& block, std::vector<float>& weights,
std::vector<nvinfer1::Weights>& trtWeights, int& weightPtr, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network);
#endif


@@ -13,7 +13,7 @@ implicitLayer(int layerIdx, std::map<std::string, std::string>& block, std::vect
 {
     nvinfer1::ITensor* output;
-    assert(block.at("type") == "implicit_add" || block.at("type") == "implicit_mul");
+    assert(block.at("type") == "implicit" || block.at("type") == "implicit_add" || block.at("type") == "implicit_mul");
     assert(block.find("filters") != block.end());
     int filters = std::stoi(block.at("filters"));


@@ -14,9 +14,10 @@ poolingLayer(int layerIdx, std::map<std::string, std::string>& block, nvinfer1::
 {
     nvinfer1::ITensor* output;
-    assert(block.at("type") == "maxpool" || block.at("type") == "avgpool");
+    assert(block.at("type") == "max" || block.at("type") == "maxpool" || block.at("type") == "avg" ||
+        block.at("type") == "avgpool");
-    if (block.at("type") == "maxpool") {
+    if (block.at("type") == "max" || block.at("type") == "maxpool") {
         assert(block.find("size") != block.end());
         assert(block.find("stride") != block.end());
@@ -36,7 +37,7 @@ poolingLayer(int layerIdx, std::map<std::string, std::string>& block, nvinfer1::
         }
         output = maxpool->getOutput(0);
     }
-    else if (block.at("type") == "avgpool") {
+    else if (block.at("type") == "avg" || block.at("type") == "avgpool") {
         nvinfer1::Dims inputDims = input->getDimensions();
         nvinfer1::IPoolingLayer* avgpool = network->addPoolingNd(*input, nvinfer1::PoolingType::kAVERAGE,
             nvinfer1::Dims{2, {inputDims.d[1], inputDims.d[2]}});


@@ -1,54 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#include "reduce_layer.h"
nvinfer1::ITensor*
reduceLayer(int layerIdx, std::map<std::string, std::string>& block, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network)
{
nvinfer1::ITensor* output;
assert(block.at("type") == "reduce");
assert(block.find("mode") != block.end());
assert(block.find("axes") != block.end());
std::string mode = block.at("mode");
    // Only "mean" is supported; initialize the operation so it is never read uninitialized.
    nvinfer1::ReduceOperation operation = nvinfer1::ReduceOperation::kAVG;
    if (mode == "mean")
        operation = nvinfer1::ReduceOperation::kAVG;
std::string strAxes = block.at("axes");
std::vector<int32_t> axes;
size_t lastPos = 0, pos = 0;
while ((pos = strAxes.find(',', lastPos)) != std::string::npos) {
int vL = std::stoi(trim(strAxes.substr(lastPos, pos - lastPos)));
axes.push_back(vL);
lastPos = pos + 1;
}
if (lastPos < strAxes.length()) {
std::string lastV = trim(strAxes.substr(lastPos));
if (!lastV.empty())
axes.push_back(std::stoi(lastV));
}
assert(!axes.empty());
uint32_t axisMask = 0;
for (int axis : axes)
axisMask |= 1 << axis;
bool keepDims = false;
if (block.find("keep") != block.end())
keepDims = std::stoi(block.at("keep")) == 1 ? true : false;
nvinfer1::IReduceLayer* reduce = network->addReduce(*input, operation, axisMask, keepDims);
assert(reduce != nullptr);
std::string reduceLayerName = "reduce_" + std::to_string(layerIdx);
reduce->setName(reduceLayerName.c_str());
output = reduce->getOutput(0);
return output;
}


@@ -1,14 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#ifndef __REDUCE_LAYER_H__
#define __REDUCE_LAYER_H__
#include "../utils.h"
nvinfer1::ITensor* reduceLayer(int layerIdx, std::map<std::string, std::string>& block, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network);
#endif


@@ -1,109 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#include "reg_layer.h"
#include <cassert>
nvinfer1::ITensor*
regLayer(int layerIdx, std::map<std::string, std::string>& block, std::vector<float>& weights,
std::vector<nvinfer1::Weights>& trtWeights, int& weightPtr, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network)
{
nvinfer1::ITensor* output;
assert(block.at("type") == "reg");
nvinfer1::IShuffleLayer* shuffle = network->addShuffle(*input);
assert(shuffle != nullptr);
std::string shuffleLayerName = "shuffle_" + std::to_string(layerIdx);
shuffle->setName(shuffleLayerName.c_str());
nvinfer1::Permutation permutation;
permutation.order[0] = 1;
permutation.order[1] = 0;
shuffle->setFirstTranspose(permutation);
output = shuffle->getOutput(0);
nvinfer1::Dims shuffleDims = output->getDimensions();
nvinfer1::ISliceLayer* sliceLt = network->addSlice(*output, nvinfer1::Dims{2, {0, 0}},
nvinfer1::Dims{2, {shuffleDims.d[0], 2}}, nvinfer1::Dims{2, {1, 1}});
assert(sliceLt != nullptr);
std::string sliceLtLayerName = "slice_lt_" + std::to_string(layerIdx);
sliceLt->setName(sliceLtLayerName.c_str());
nvinfer1::ITensor* lt = sliceLt->getOutput(0);
nvinfer1::ISliceLayer* sliceRb = network->addSlice(*output, nvinfer1::Dims{2, {0, 2}},
nvinfer1::Dims{2, {shuffleDims.d[0], 2}}, nvinfer1::Dims{2, {1, 1}});
assert(sliceRb != nullptr);
std::string sliceRbLayerName = "slice_rb_" + std::to_string(layerIdx);
sliceRb->setName(sliceRbLayerName.c_str());
nvinfer1::ITensor* rb = sliceRb->getOutput(0);
int channels = shuffleDims.d[0] * 2;
nvinfer1::Weights anchorPointsWt {nvinfer1::DataType::kFLOAT, nullptr, channels};
float* val = new float[channels];
for (int i = 0; i < channels; ++i) {
val[i] = weights[weightPtr];
++weightPtr;
}
anchorPointsWt.values = val;
trtWeights.push_back(anchorPointsWt);
nvinfer1::IConstantLayer* anchorPoints = network->addConstant(nvinfer1::Dims{2, {shuffleDims.d[0], 2}}, anchorPointsWt);
assert(anchorPoints != nullptr);
std::string anchorPointsLayerName = "anchor_points_" + std::to_string(layerIdx);
anchorPoints->setName(anchorPointsLayerName.c_str());
nvinfer1::ITensor* anchorPointsTensor = anchorPoints->getOutput(0);
nvinfer1::IElementWiseLayer* x1y1 = network->addElementWise(*anchorPointsTensor, *lt,
nvinfer1::ElementWiseOperation::kSUB);
assert(x1y1 != nullptr);
std::string x1y1LayerName = "x1y1_" + std::to_string(layerIdx);
x1y1->setName(x1y1LayerName.c_str());
nvinfer1::ITensor* x1y1Tensor = x1y1->getOutput(0);
nvinfer1::IElementWiseLayer* x2y2 = network->addElementWise(*rb, *anchorPointsTensor,
nvinfer1::ElementWiseOperation::kSUM);
assert(x2y2 != nullptr);
std::string x2y2LayerName = "x2y2_" + std::to_string(layerIdx);
x2y2->setName(x2y2LayerName.c_str());
nvinfer1::ITensor* x2y2Tensor = x2y2->getOutput(0);
std::vector<nvinfer1::ITensor*> concatInputs;
concatInputs.push_back(x1y1Tensor);
concatInputs.push_back(x2y2Tensor);
nvinfer1::IConcatenationLayer* concat = network->addConcatenation(concatInputs.data(), concatInputs.size());
assert(concat != nullptr);
std::string concatLayerName = "concat_" + std::to_string(layerIdx);
concat->setName(concatLayerName.c_str());
concat->setAxis(1);
output = concat->getOutput(0);
channels = shuffleDims.d[0];
nvinfer1::Weights stridePointsWt {nvinfer1::DataType::kFLOAT, nullptr, channels};
val = new float[channels];
for (int i = 0; i < channels; ++i) {
val[i] = weights[weightPtr];
++weightPtr;
}
stridePointsWt.values = val;
trtWeights.push_back(stridePointsWt);
nvinfer1::IConstantLayer* stridePoints = network->addConstant(nvinfer1::Dims{2, {shuffleDims.d[0], 1}}, stridePointsWt);
assert(stridePoints != nullptr);
std::string stridePointsLayerName = "stride_points_" + std::to_string(layerIdx);
stridePoints->setName(stridePointsLayerName.c_str());
nvinfer1::ITensor* stridePointsTensor = stridePoints->getOutput(0);
nvinfer1::IElementWiseLayer* pred = network->addElementWise(*output, *stridePointsTensor,
nvinfer1::ElementWiseOperation::kPROD);
assert(pred != nullptr);
std::string predLayerName = "pred_" + std::to_string(layerIdx);
pred->setName(predLayerName.c_str());
output = pred->getOutput(0);
return output;
}


@@ -1,18 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#ifndef __REG_LAYER_H__
#define __REG_LAYER_H__
#include <map>
#include <vector>
#include "NvInfer.h"
nvinfer1::ITensor* regLayer(int layerIdx, std::map<std::string, std::string>& block, std::vector<float>& weights,
std::vector<nvinfer1::Weights>& trtWeights, int& weightPtr, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network);
#endif


@@ -14,7 +14,7 @@ reorgLayer(int layerIdx, std::map<std::string, std::string>& block, nvinfer1::IT
 {
     nvinfer1::ITensor* output;
-    assert(block.at("type") == "reorg");
+    assert(block.at("type") == "reorg3d");
     nvinfer1::Dims inputDims = input->getDimensions();


@@ -0,0 +1,28 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#include "sam_layer.h"
#include <cassert>
nvinfer1::ITensor*
samLayer(int layerIdx, std::string activation, std::map<std::string, std::string>& block, nvinfer1::ITensor* input,
nvinfer1::ITensor* samInput, nvinfer1::INetworkDefinition* network)
{
nvinfer1::ITensor* output;
assert(block.at("type") == "sam");
nvinfer1::IElementWiseLayer* sam = network->addElementWise(*input, *samInput, nvinfer1::ElementWiseOperation::kPROD);
assert(sam != nullptr);
std::string samLayerName = "sam_" + std::to_string(layerIdx);
sam->setName(samLayerName.c_str());
output = sam->getOutput(0);
output = activationLayer(layerIdx, activation, output, network);
assert(output != nullptr);
return output;
}


@@ -0,0 +1,18 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#ifndef __SAM_LAYER_H__
#define __SAM_LAYER_H__
#include <map>
#include "NvInfer.h"
#include "activation_layer.h"
nvinfer1::ITensor* samLayer(int layerIdx, std::string activation, std::map<std::string, std::string>& block,
nvinfer1::ITensor* input, nvinfer1::ITensor* samInput, nvinfer1::INetworkDefinition* network);
#endif


@@ -8,7 +8,7 @@
 #include <cassert>
 nvinfer1::ITensor*
-shortcutLayer(int layerIdx, std::string mode, std::string activation, std::string inputVol, std::string shortcutVol,
+shortcutLayer(int layerIdx, std::string activation, std::string inputVol, std::string shortcutVol,
     std::map<std::string, std::string>& block, nvinfer1::ITensor* input, nvinfer1::ITensor* shortcutInput,
     nvinfer1::INetworkDefinition* network)
 {
@@ -16,12 +16,7 @@ shortcutLayer(int layerIdx, std::string activation, std::string inputVol, std::s
     assert(block.at("type") == "shortcut");
-    nvinfer1::ElementWiseOperation operation = nvinfer1::ElementWiseOperation::kSUM;
-    if (mode == "mul")
-        operation = nvinfer1::ElementWiseOperation::kPROD;
-    if (mode == "add" && inputVol != shortcutVol) {
+    if (inputVol != shortcutVol) {
         nvinfer1::ISliceLayer* slice = network->addSlice(*shortcutInput, nvinfer1::Dims{3, {0, 0, 0}}, input->getDimensions(),
             nvinfer1::Dims{3, {1, 1, 1}});
         assert(slice != nullptr);
@@ -32,7 +27,7 @@ shortcutLayer(int layerIdx, std::string activation, std::string inputVol, std::s
     else
         output = shortcutInput;
-    nvinfer1::IElementWiseLayer* shortcut = network->addElementWise(*input, *output, operation);
+    nvinfer1::IElementWiseLayer* shortcut = network->addElementWise(*input, *output, nvinfer1::ElementWiseOperation::kSUM);
     assert(shortcut != nullptr);
     std::string shortcutLayerName = "shortcut_" + std::to_string(layerIdx);
     shortcut->setName(shortcutLayerName.c_str());


@@ -12,8 +12,8 @@
 #include "activation_layer.h"
-nvinfer1::ITensor* shortcutLayer(int layerIdx, std::string mode, std::string activation, std::string inputVol,
-    std::string shortcutVol, std::map<std::string, std::string>& block, nvinfer1::ITensor* input,
-    nvinfer1::ITensor* shortcut, nvinfer1::INetworkDefinition* network);
+nvinfer1::ITensor* shortcutLayer(int layerIdx, std::string activation, std::string inputVol, std::string shortcutVol,
+    std::map<std::string, std::string>& block, nvinfer1::ITensor* input, nvinfer1::ITensor* shortcut,
+    nvinfer1::INetworkDefinition* network);
 #endif


@@ -1,128 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#include "shuffle_layer.h"
nvinfer1::ITensor*
shuffleLayer(int layerIdx, std::map<std::string, std::string>& block, nvinfer1::ITensor* input,
std::vector<nvinfer1::ITensor*> tensorOutputs, nvinfer1::INetworkDefinition* network)
{
nvinfer1::ITensor* output;
assert(block.at("type") == "shuffle");
nvinfer1::IShuffleLayer* shuffle = network->addShuffle(*input);
assert(shuffle != nullptr);
std::string shuffleLayerName = "shuffle_" + std::to_string(layerIdx);
shuffle->setName(shuffleLayerName.c_str());
if (block.find("reshape") != block.end()) {
nvinfer1::Dims inputTensorDims = input->getDimensions();
std::string strReshape = block.at("reshape");
std::vector<int32_t> reshape;
size_t lastPos = 0, pos = 0;
while ((pos = strReshape.find(',', lastPos)) != std::string::npos) {
std::string V = trim(strReshape.substr(lastPos, pos - lastPos));
if (V == "c")
reshape.push_back(inputTensorDims.d[0]);
else if (V == "ch")
reshape.push_back(inputTensorDims.d[0] * inputTensorDims.d[1]);
else if (V == "cw")
reshape.push_back(inputTensorDims.d[0] * inputTensorDims.d[2]);
else if (V == "h")
reshape.push_back(inputTensorDims.d[1]);
else if (V == "hw")
reshape.push_back(inputTensorDims.d[1] * inputTensorDims.d[2]);
else if (V == "w")
reshape.push_back(inputTensorDims.d[2]);
else if (V == "chw")
reshape.push_back(inputTensorDims.d[0] * inputTensorDims.d[1] * inputTensorDims.d[2]);
else
reshape.push_back(std::stoi(V));
lastPos = pos + 1;
}
if (lastPos < strReshape.length()) {
std::string lastV = trim(strReshape.substr(lastPos));
if (!lastV.empty()) {
if (lastV == "c")
reshape.push_back(inputTensorDims.d[0]);
else if (lastV == "ch")
reshape.push_back(inputTensorDims.d[0] * inputTensorDims.d[1]);
else if (lastV == "cw")
reshape.push_back(inputTensorDims.d[0] * inputTensorDims.d[2]);
else if (lastV == "h")
reshape.push_back(inputTensorDims.d[1]);
else if (lastV == "hw")
reshape.push_back(inputTensorDims.d[1] * inputTensorDims.d[2]);
else if (lastV == "w")
reshape.push_back(inputTensorDims.d[2]);
else if (lastV == "chw")
reshape.push_back(inputTensorDims.d[0] * inputTensorDims.d[1] * inputTensorDims.d[2]);
else
reshape.push_back(std::stoi(lastV));
}
}
assert(!reshape.empty());
nvinfer1::Dims reshapeDims;
reshapeDims.nbDims = reshape.size();
for (uint i = 0; i < reshape.size(); ++i)
reshapeDims.d[i] = reshape[i];
shuffle->setReshapeDimensions(reshapeDims);
}
if (block.find("transpose1") != block.end()) {
std::string strTranspose1 = block.at("transpose1");
std::vector<int32_t> transpose1;
size_t lastPos = 0, pos = 0;
while ((pos = strTranspose1.find(',', lastPos)) != std::string::npos) {
int vL = std::stoi(trim(strTranspose1.substr(lastPos, pos - lastPos)));
transpose1.push_back(vL);
lastPos = pos + 1;
}
if (lastPos < strTranspose1.length()) {
std::string lastV = trim(strTranspose1.substr(lastPos));
if (!lastV.empty())
transpose1.push_back(std::stoi(lastV));
}
assert(!transpose1.empty());
nvinfer1::Permutation permutation1;
for (uint i = 0; i < transpose1.size(); ++i)
permutation1.order[i] = transpose1[i];
shuffle->setFirstTranspose(permutation1);
}
if (block.find("transpose2") != block.end()) {
std::string strTranspose2 = block.at("transpose2");
std::vector<int32_t> transpose2;
size_t lastPos = 0, pos = 0;
while ((pos = strTranspose2.find(',', lastPos)) != std::string::npos) {
int vL = std::stoi(trim(strTranspose2.substr(lastPos, pos - lastPos)));
transpose2.push_back(vL);
lastPos = pos + 1;
}
if (lastPos < strTranspose2.length()) {
std::string lastV = trim(strTranspose2.substr(lastPos));
if (!lastV.empty())
transpose2.push_back(std::stoi(lastV));
}
assert(!transpose2.empty());
nvinfer1::Permutation permutation2;
for (uint i = 0; i < transpose2.size(); ++i)
permutation2.order[i] = transpose2[i];
shuffle->setSecondTranspose(permutation2);
}
output = shuffle->getOutput(0);
return output;
}


@@ -1,14 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#ifndef __SHUFFLE_LAYER_H__
#define __SHUFFLE_LAYER_H__
#include "../utils.h"
nvinfer1::ITensor* shuffleLayer(int layerIdx, std::map<std::string, std::string>& block, nvinfer1::ITensor* input,
std::vector<nvinfer1::ITensor*> tensorOutputs, nvinfer1::INetworkDefinition* network);
#endif


@@ -1,29 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#include "softmax_layer.h"
#include <cassert>
nvinfer1::ITensor*
softmaxLayer(int layerIdx, std::map<std::string, std::string>& block, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network)
{
nvinfer1::ITensor* output;
assert(block.at("type") == "softmax");
assert(block.find("axes") != block.end());
int axes = std::stoi(block.at("axes"));
nvinfer1::ISoftMaxLayer* softmax = network->addSoftMax(*input);
assert(softmax != nullptr);
std::string softmaxLayerName = "softmax_" + std::to_string(layerIdx);
softmax->setName(softmaxLayerName.c_str());
softmax->setAxes(1 << axes);
output = softmax->getOutput(0);
return output;
}


@@ -1,16 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#ifndef __SOFTMAX_LAYER_H__
#define __SOFTMAX_LAYER_H__
#include <map>
#include "NvInfer.h"
nvinfer1::ITensor* softmaxLayer(int layerIdx, std::map<std::string, std::string>& block, nvinfer1::ITensor* input,
nvinfer1::INetworkDefinition* network);
#endif


@@ -44,7 +44,7 @@ getYoloNetworkInfo(NetworkInfo& networkInfo, const NvDsInferContextInitParams* i
     yoloType = yoloCfg.substr(0, yoloCfg.find(".cfg"));
-    networkInfo.inputBlobName = "data";
+    networkInfo.inputBlobName = "input";
     networkInfo.networkType = yoloType;
     networkInfo.configFilePath = initParams->customNetworkConfigFilePath;
     networkInfo.wtsFilePath = initParams->modelFilePath;
@@ -52,7 +52,6 @@ getYoloNetworkInfo(NetworkInfo& networkInfo, const NvDsInferContextInitParams* i
     networkInfo.deviceType = (initParams->useDLA ? "kDLA" : "kGPU");
     networkInfo.numDetectedClasses = initParams->numDetectedClasses;
     networkInfo.clusterMode = initParams->clusterMode;
-    networkInfo.scoreThreshold = initParams->perClassDetectionParams->preClusterThreshold;
     if (initParams->networkMode == 0)
         networkInfo.networkMode = "FP32";


@@ -26,12 +26,15 @@
 #include "nvdsinfer_custom_impl.h"
 #include "utils.h"
-#include "yoloPlugins.h"
 extern "C" bool
 NvDsInferParseYolo(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
     NvDsInferParseDetectionParams const& detectionParams, std::vector<NvDsInferParseObjectInfo>& objectList);
+extern "C" bool
+NvDsInferParseYoloE(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
+    NvDsInferParseDetectionParams const& detectionParams, std::vector<NvDsInferParseObjectInfo>& objectList);
 static NvDsInferParseObjectInfo
 convertBBox(const float& bx1, const float& by1, const float& bx2, const float& by2, const uint& netW, const uint& netH)
 {
@@ -60,7 +63,9 @@ addBBoxProposal(const float bx1, const float by1, const float bx2, const float b
     const int maxIndex, const float maxProb, std::vector<NvDsInferParseObjectInfo>& binfo)
 {
     NvDsInferParseObjectInfo bbi = convertBBox(bx1, by1, bx2, by2, netW, netH);
-    if (bbi.width < 1 || bbi.height < 1) return;
+    if (bbi.width < 1 || bbi.height < 1)
+        return;
     bbi.detectionConfidence = maxProb;
     bbi.classId = maxIndex;
@@ -68,23 +73,55 @@ addBBoxProposal(const float bx1, const float by1, const float bx2, const float b
 }
 static std::vector<NvDsInferParseObjectInfo>
-decodeYoloTensor(const int* counts, const float* boxes, const float* scores, const int* classes, const uint& netW,
-    const uint& netH)
+decodeTensorYolo(const float* detection, const uint& outputSize, const uint& count, const uint& netW, const uint& netH,
+    const std::vector<float>& preclusterThreshold)
 {
     std::vector<NvDsInferParseObjectInfo> binfo;
-    uint numBoxes = counts[0];
-    for (uint b = 0; b < numBoxes; ++b) {
-        float bx1 = boxes[b * 4 + 0];
-        float by1 = boxes[b * 4 + 1];
-        float bx2 = boxes[b * 4 + 2];
-        float by2 = boxes[b * 4 + 3];
-        float maxProb = scores[b];
-        int maxIndex = classes[b];
+    for (uint b = 0; b < outputSize; ++b) {
+        float maxProb = count == 6 ? detection[b * count + 4] : detection[b * count + 4] * detection[b * count + 6];
+        int maxIndex = (int) detection[b * count + 5];
+        if (maxProb < preclusterThreshold[maxIndex])
+            continue;
+        float bxc = detection[b * count + 0];
+        float byc = detection[b * count + 1];
+        float bw = detection[b * count + 2];
+        float bh = detection[b * count + 3];
+        float bx1 = bxc - bw / 2;
+        float by1 = byc - bh / 2;
+        float bx2 = bx1 + bw;
+        float by2 = by1 + bh;
         addBBoxProposal(bx1, by1, bx2, by2, netW, netH, maxIndex, maxProb, binfo);
     }
     return binfo;
 }
+static std::vector<NvDsInferParseObjectInfo>
+decodeTensorYoloE(const float* detection, const uint& outputSize, const uint& count, const uint& netW, const uint& netH,
+    const std::vector<float>& preclusterThreshold)
+{
+    std::vector<NvDsInferParseObjectInfo> binfo;
+    for (uint b = 0; b < outputSize; ++b) {
+        float maxProb = count == 6 ? detection[b * count + 4] : detection[b * count + 4] * detection[b * count + 6];
+        int maxIndex = (int) detection[b * count + 5];
+        if (maxProb < preclusterThreshold[maxIndex])
+            continue;
+        float bx1 = detection[b * count + 0];
+        float by1 = detection[b * count + 1];
+        float bx2 = detection[b * count + 2];
+        float by2 = detection[b * count + 3];
+        addBBoxProposal(bx1, by1, bx2, by2, netW, netH, maxIndex, maxProb, binfo);
+    }
+    return binfo;
+}
@@ -99,14 +136,39 @@ NvDsInferParseCustomYolo(std::vector<NvDsInferLayerInfo> const& outputLayersInfo
     std::vector<NvDsInferParseObjectInfo> objects;
-    const NvDsInferLayerInfo& counts = outputLayersInfo[0];
-    const NvDsInferLayerInfo& boxes = outputLayersInfo[1];
-    const NvDsInferLayerInfo& scores = outputLayersInfo[2];
-    const NvDsInferLayerInfo& classes = outputLayersInfo[3];
-    std::vector<NvDsInferParseObjectInfo> outObjs = decodeYoloTensor((const int*) (counts.buffer),
-        (const float*) (boxes.buffer), (const float*) (scores.buffer), (const int*) (classes.buffer), networkInfo.width,
-        networkInfo.height);
+    const NvDsInferLayerInfo& layer = outputLayersInfo[0];
+    const uint outputSize = layer.inferDims.d[0];
+    const uint count = layer.inferDims.d[1];
+    std::vector<NvDsInferParseObjectInfo> outObjs = decodeTensorYolo((const float*) (layer.buffer), outputSize, count,
+        networkInfo.width, networkInfo.height, detectionParams.perClassPreclusterThreshold);
+    objects.insert(objects.end(), outObjs.begin(), outObjs.end());
+    objectList = objects;
+    return true;
+}
+static bool
+NvDsInferParseCustomYoloE(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
+    NvDsInferParseDetectionParams const& detectionParams, std::vector<NvDsInferParseObjectInfo>& objectList)
+{
+    if (outputLayersInfo.empty()) {
+        std::cerr << "ERROR: Could not find output layer in bbox parsing" << std::endl;
+        return false;
+    }
+    std::vector<NvDsInferParseObjectInfo> objects;
+    const NvDsInferLayerInfo& layer = outputLayersInfo[0];
+    const uint outputSize = layer.inferDims.d[0];
+    const uint count = layer.inferDims.d[1];
+    std::vector<NvDsInferParseObjectInfo> outObjs = decodeTensorYoloE((const float*) (layer.buffer), outputSize, count,
+        networkInfo.width, networkInfo.height, detectionParams.perClassPreclusterThreshold);
     objects.insert(objects.end(), outObjs.begin(), outObjs.end());
@@ -122,4 +184,11 @@ NvDsInferParseYolo(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDs
     return NvDsInferParseCustomYolo(outputLayersInfo, networkInfo, detectionParams, objectList);
 }
+extern "C" bool
+NvDsInferParseYoloE(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
+    NvDsInferParseDetectionParams const& detectionParams, std::vector<NvDsInferParseObjectInfo>& objectList)
+{
+    return NvDsInferParseCustomYoloE(outputLayersInfo, networkInfo, detectionParams, objectList);
+}
 CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseYolo);


@@ -1,530 +0,0 @@
/*
* Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
*
* Edited by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include "nvdsinfer_custom_impl.h"
#include "utils.h"
#include "yoloPlugins.h"
__global__ void decodeTensor_YOLO_ONNX(NvDsInferParseObjectInfo *binfo, const float* detections, const int numClasses,
const int outputSize, float netW, float netH, const float* preclusterThreshold, int* numDetections)
{
uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
if (x_id >= outputSize)
return;
float maxProb = 0.0f;
int maxIndex = -1;
for (uint i = 0; i < numClasses; ++i) {
float prob = detections[x_id * (5 + numClasses) + 5 + i];
if (prob > maxProb) {
maxProb = prob;
maxIndex = i;
}
}
const float objectness = detections[x_id * (5 + numClasses) + 4];
if (objectness * maxProb < preclusterThreshold[maxIndex])
return;
int count = (int)atomicAdd(numDetections, 1);
const float bxc = detections[x_id * (5 + numClasses) + 0];
const float byc = detections[x_id * (5 + numClasses) + 1];
const float bw = detections[x_id * (5 + numClasses) + 2];
const float bh = detections[x_id * (5 + numClasses) + 3];
float x0 = bxc - bw / 2;
float y0 = byc - bh / 2;
float x1 = x0 + bw;
float y1 = y0 + bh;
x0 = fminf(float(netW), fmaxf(float(0.0), x0));
y0 = fminf(float(netH), fmaxf(float(0.0), y0));
x1 = fminf(float(netW), fmaxf(float(0.0), x1));
y1 = fminf(float(netH), fmaxf(float(0.0), y1));
binfo[count].left = x0;
binfo[count].top = y0;
binfo[count].width = fminf(float(netW), fmaxf(float(0.0), x1 - x0));
binfo[count].height = fminf(float(netH), fmaxf(float(0.0), y1 - y0));
binfo[count].detectionConfidence = objectness * maxProb;
binfo[count].classId = maxIndex;
}
__global__ void decodeTensor_YOLOV8_ONNX(NvDsInferParseObjectInfo* binfo, const float* detections, const int numClasses,
const int outputSize, float netW, float netH, const float* preclusterThreshold, int* numDetections)
{
uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
if (x_id >= outputSize)
return;
float maxProb = 0.0f;
int maxIndex = -1;
for (uint i = 0; i < numClasses; ++i) {
float prob = detections[x_id + outputSize * (i + 4)];
if (prob > maxProb) {
maxProb = prob;
maxIndex = i;
}
}
if (maxProb < preclusterThreshold[maxIndex])
return;
int count = (int)atomicAdd(numDetections, 1);
const float bxc = detections[x_id + outputSize * 0];
const float byc = detections[x_id + outputSize * 1];
const float bw = detections[x_id + outputSize * 2];
const float bh = detections[x_id + outputSize * 3];
float x0 = bxc - bw / 2;
float y0 = byc - bh / 2;
float x1 = x0 + bw;
float y1 = y0 + bh;
x0 = fminf(float(netW), fmaxf(float(0.0), x0));
y0 = fminf(float(netH), fmaxf(float(0.0), y0));
x1 = fminf(float(netW), fmaxf(float(0.0), x1));
y1 = fminf(float(netH), fmaxf(float(0.0), y1));
binfo[count].left = x0;
binfo[count].top = y0;
binfo[count].width = fminf(float(netW), fmaxf(float(0.0), x1 - x0));
binfo[count].height = fminf(float(netH), fmaxf(float(0.0), y1 - y0));
binfo[count].detectionConfidence = maxProb;
binfo[count].classId = maxIndex;
}
__global__ void decodeTensor_YOLOX_ONNX(NvDsInferParseObjectInfo *binfo, const float* detections, const int numClasses,
const int outputSize, float netW, float netH, const int *grid0, const int *grid1, const int *strides,
const float* preclusterThreshold, int* numDetections)
{
uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
if (x_id >= outputSize)
return;
float maxProb = 0.0f;
int maxIndex = -1;
for (uint i = 0; i < numClasses; ++i) {
float prob = detections[x_id * (5 + numClasses) + 5 + i];
if (prob > maxProb) {
maxProb = prob;
maxIndex = i;
}
}
const float objectness = detections[x_id * (5 + numClasses) + 4];
if (objectness * maxProb < preclusterThreshold[maxIndex])
return;
int count = (int)atomicAdd(numDetections, 1);
const float bxc = (detections[x_id * (5 + numClasses) + 0] + grid0[x_id]) * strides[x_id];
const float byc = (detections[x_id * (5 + numClasses) + 1] + grid1[x_id]) * strides[x_id];
const float bw = __expf(detections[x_id * (5 + numClasses) + 2]) * strides[x_id];
const float bh = __expf(detections[x_id * (5 + numClasses) + 3]) * strides[x_id];
float x0 = bxc - bw / 2;
float y0 = byc - bh / 2;
float x1 = x0 + bw;
float y1 = y0 + bh;
x0 = fminf(float(netW), fmaxf(float(0.0), x0));
y0 = fminf(float(netH), fmaxf(float(0.0), y0));
x1 = fminf(float(netW), fmaxf(float(0.0), x1));
y1 = fminf(float(netH), fmaxf(float(0.0), y1));
binfo[count].left = x0;
binfo[count].top = y0;
binfo[count].width = fminf(float(netW), fmaxf(float(0.0), x1 - x0));
binfo[count].height = fminf(float(netH), fmaxf(float(0.0), y1 - y0));
binfo[count].detectionConfidence = objectness * maxProb;
binfo[count].classId = maxIndex;
}
__global__ void decodeTensor_YOLO_NAS_ONNX(NvDsInferParseObjectInfo *binfo, const float* scores, const float* boxes,
const int numClasses, const int outputSize, float netW, float netH, const float* preclusterThreshold, int* numDetections)
{
uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
if (x_id >= outputSize)
return;
float maxProb = 0.0f;
int maxIndex = -1;
for (uint i = 0; i < numClasses; ++i) {
float prob = scores[x_id * numClasses + i];
if (prob > maxProb) {
maxProb = prob;
maxIndex = i;
}
}
if (maxProb < preclusterThreshold[maxIndex])
return;
int count = (int)atomicAdd(numDetections, 1);
float x0 = boxes[x_id * 4 + 0];
float y0 = boxes[x_id * 4 + 1];
float x1 = boxes[x_id * 4 + 2];
float y1 = boxes[x_id * 4 + 3];
x0 = fminf(float(netW), fmaxf(float(0.0), x0));
y0 = fminf(float(netH), fmaxf(float(0.0), y0));
x1 = fminf(float(netW), fmaxf(float(0.0), x1));
y1 = fminf(float(netH), fmaxf(float(0.0), y1));
binfo[count].left = x0;
binfo[count].top = y0;
binfo[count].width = fminf(float(netW), fmaxf(float(0.0), x1 - x0));
binfo[count].height = fminf(float(netH), fmaxf(float(0.0), y1 - y0));
binfo[count].detectionConfidence = maxProb;
binfo[count].classId = maxIndex;
}
__global__ void decodeTensor_PPYOLOE_ONNX(NvDsInferParseObjectInfo *binfo, const float* scores, const float* boxes,
const int numClasses, const int outputSize, float netW, float netH, const float* preclusterThreshold, int* numDetections)
{
uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
if (x_id >= outputSize)
return;
float maxProb = 0.0f;
int maxIndex = -1;
for (uint i = 0; i < numClasses; ++i) {
float prob = scores[x_id + outputSize * i];
if (prob > maxProb) {
maxProb = prob;
maxIndex = i;
}
}
if (maxProb < preclusterThreshold[maxIndex])
return;
int count = (int)atomicAdd(numDetections, 1);
float x0 = boxes[x_id * 4 + 0];
float y0 = boxes[x_id * 4 + 1];
float x1 = boxes[x_id * 4 + 2];
float y1 = boxes[x_id * 4 + 3];
x0 = fminf(float(netW), fmaxf(float(0.0), x0));
y0 = fminf(float(netH), fmaxf(float(0.0), y0));
x1 = fminf(float(netW), fmaxf(float(0.0), x1));
y1 = fminf(float(netH), fmaxf(float(0.0), y1));
binfo[count].left = x0;
binfo[count].top = y0;
binfo[count].width = fminf(float(netW), fmaxf(float(0.0), x1 - x0));
binfo[count].height = fminf(float(netH), fmaxf(float(0.0), y1 - y0));
binfo[count].detectionConfidence = maxProb;
binfo[count].classId = maxIndex;
}
static bool
NvDsInferParseCustom_YOLO_ONNX(std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
NvDsInferNetworkInfo const& networkInfo, NvDsInferParseDetectionParams const& detectionParams,
std::vector<NvDsInferParseObjectInfo>& objectList)
{
if (outputLayersInfo.empty()) {
std::cerr << "ERROR: Could not find output layer in bbox parsing" << std::endl;
return false;
}
const NvDsInferLayerInfo& layer = outputLayersInfo[0];
const uint outputSize = layer.inferDims.d[0];
const uint numClasses = layer.inferDims.d[1] - 5;
if (numClasses != detectionParams.numClassesConfigured) {
std::cerr << "WARNING: Number of classes mismatch, make sure to set num-detected-classes=" << numClasses
<< " in config_infer file\n" << std::endl;
}
thrust::device_vector<NvDsInferParseObjectInfo> objects(outputSize);
std::vector<int> numDetections = { 0 };
thrust::device_vector<int> d_numDetections(numDetections);
thrust::device_vector<float> preclusterThreshold(detectionParams.perClassPreclusterThreshold);
int threads_per_block = 1024;
int number_of_blocks = ((outputSize - 1) / threads_per_block) + 1;
decodeTensor_YOLO_ONNX<<<threads_per_block, number_of_blocks>>>(
thrust::raw_pointer_cast(objects.data()), (const float*) (layer.buffer), numClasses, outputSize,
static_cast<float>(networkInfo.width), static_cast<float>(networkInfo.height),
thrust::raw_pointer_cast(preclusterThreshold.data()), thrust::raw_pointer_cast(d_numDetections.data()));
thrust::copy(d_numDetections.begin(), d_numDetections.end(), numDetections.begin());
objectList.resize(numDetections[0]);
thrust::copy(objects.begin(), objects.begin() + numDetections[0], objectList.begin());
return true;
}
static bool
NvDsInferParseCustom_YOLOV8_ONNX(std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
NvDsInferNetworkInfo const& networkInfo, NvDsInferParseDetectionParams const& detectionParams,
std::vector<NvDsInferParseObjectInfo>& objectList)
{
if (outputLayersInfo.empty()) {
std::cerr << "ERROR: Could not find output layer in bbox parsing" << std::endl;
return false;
}
const NvDsInferLayerInfo& layer = outputLayersInfo[0];
const uint numClasses = layer.inferDims.d[0] - 4;
const uint outputSize = layer.inferDims.d[1];
if (numClasses != detectionParams.numClassesConfigured) {
std::cerr << "WARNING: Number of classes mismatch, make sure to set num-detected-classes=" << numClasses
<< " in config_infer file\n" << std::endl;
}
thrust::device_vector<NvDsInferParseObjectInfo> objects(outputSize);
std::vector<int> numDetections = { 0 };
thrust::device_vector<int> d_numDetections(numDetections);
thrust::device_vector<float> preclusterThreshold(detectionParams.perClassPreclusterThreshold);
int threads_per_block = 1024;
int number_of_blocks = ((outputSize - 1) / threads_per_block) + 1;
decodeTensor_YOLOV8_ONNX<<<threads_per_block, number_of_blocks>>>(
thrust::raw_pointer_cast(objects.data()), (const float*) (layer.buffer), numClasses, outputSize,
static_cast<float>(networkInfo.width), static_cast<float>(networkInfo.height),
thrust::raw_pointer_cast(preclusterThreshold.data()), thrust::raw_pointer_cast(d_numDetections.data()));
thrust::copy(d_numDetections.begin(), d_numDetections.end(), numDetections.begin());
objectList.resize(numDetections[0]);
thrust::copy(objects.begin(), objects.begin() + numDetections[0], objectList.begin());
return true;
}
static bool
NvDsInferParseCustom_YOLOX_ONNX(std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
NvDsInferNetworkInfo const& networkInfo, NvDsInferParseDetectionParams const& detectionParams,
std::vector<NvDsInferParseObjectInfo>& objectList)
{
if (outputLayersInfo.empty()) {
std::cerr << "ERROR: Could not find output layer in bbox parsing" << std::endl;
return false;
}
const NvDsInferLayerInfo& layer = outputLayersInfo[0];
const uint outputSize = layer.inferDims.d[0];
const uint numClasses = layer.inferDims.d[1] - 5;
if (numClasses != detectionParams.numClassesConfigured) {
std::cerr << "WARNING: Number of classes mismatch, make sure to set num-detected-classes=" << numClasses
<< " in config_infer file\n" << std::endl;
}
thrust::device_vector<NvDsInferParseObjectInfo> objects(outputSize);
std::vector<int> numDetections = { 0 };
thrust::device_vector<int> d_numDetections(numDetections);
thrust::device_vector<float> preclusterThreshold(detectionParams.perClassPreclusterThreshold);
std::vector<int> strides = {8, 16, 32};
std::vector<int> grid0;
std::vector<int> grid1;
std::vector<int> gridStrides;
for (uint s = 0; s < strides.size(); ++s) {
int num_grid_y = networkInfo.height / strides[s];
int num_grid_x = networkInfo.width / strides[s];
for (int g1 = 0; g1 < num_grid_y; ++g1) {
for (int g0 = 0; g0 < num_grid_x; ++g0) {
grid0.push_back(g0);
grid1.push_back(g1);
gridStrides.push_back(strides[s]);
}
}
}
thrust::device_vector<int> d_grid0(grid0);
thrust::device_vector<int> d_grid1(grid1);
thrust::device_vector<int> d_gridStrides(gridStrides);
int threads_per_block = 1024;
int number_of_blocks = ((outputSize - 1) / threads_per_block) + 1;
decodeTensor_YOLOX_ONNX<<<threads_per_block, number_of_blocks>>>(
thrust::raw_pointer_cast(objects.data()), (const float*) (layer.buffer), numClasses, outputSize,
static_cast<float>(networkInfo.width), static_cast<float>(networkInfo.height),
thrust::raw_pointer_cast(d_grid0.data()), thrust::raw_pointer_cast(d_grid1.data()),
thrust::raw_pointer_cast(d_gridStrides.data()), thrust::raw_pointer_cast(preclusterThreshold.data()),
thrust::raw_pointer_cast(d_numDetections.data()));
thrust::copy(d_numDetections.begin(), d_numDetections.end(), numDetections.begin());
objectList.resize(numDetections[0]);
thrust::copy(objects.begin(), objects.begin() + numDetections[0], objectList.begin());
return true;
}
static bool
NvDsInferParseCustom_YOLO_NAS_ONNX(std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
NvDsInferNetworkInfo const& networkInfo, NvDsInferParseDetectionParams const& detectionParams,
std::vector<NvDsInferParseObjectInfo>& objectList)
{
if (outputLayersInfo.empty()) {
std::cerr << "ERROR: Could not find output layer in bbox parsing" << std::endl;
return false;
}
const NvDsInferLayerInfo& scores = outputLayersInfo[0];
const NvDsInferLayerInfo& boxes = outputLayersInfo[1];
const uint outputSize = scores.inferDims.d[0];
const uint numClasses = scores.inferDims.d[1];
if (numClasses != detectionParams.numClassesConfigured) {
std::cerr << "WARNING: Number of classes mismatch, make sure to set num-detected-classes=" << numClasses
<< " in config_infer file\n" << std::endl;
}
thrust::device_vector<NvDsInferParseObjectInfo> objects(outputSize);
std::vector<int> numDetections = { 0 };
thrust::device_vector<int> d_numDetections(numDetections);
thrust::device_vector<float> preclusterThreshold(detectionParams.perClassPreclusterThreshold);
int threads_per_block = 1024;
int number_of_blocks = ((outputSize - 1) / threads_per_block) + 1;
decodeTensor_YOLO_NAS_ONNX<<<threads_per_block, number_of_blocks>>>(
thrust::raw_pointer_cast(objects.data()), (const float*) (scores.buffer), (const float*) (boxes.buffer), numClasses,
outputSize, static_cast<float>(networkInfo.width), static_cast<float>(networkInfo.height),
thrust::raw_pointer_cast(preclusterThreshold.data()), thrust::raw_pointer_cast(d_numDetections.data()));
thrust::copy(d_numDetections.begin(), d_numDetections.end(), numDetections.begin());
objectList.resize(numDetections[0]);
thrust::copy(objects.begin(), objects.begin() + numDetections[0], objectList.begin());
return true;
}
static bool
NvDsInferParseCustom_PPYOLOE_ONNX(std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
NvDsInferNetworkInfo const& networkInfo, NvDsInferParseDetectionParams const& detectionParams,
std::vector<NvDsInferParseObjectInfo>& objectList)
{
if (outputLayersInfo.empty()) {
std::cerr << "ERROR: Could not find output layer in bbox parsing" << std::endl;
return false;
}
const NvDsInferLayerInfo& scores = outputLayersInfo[0];
const NvDsInferLayerInfo& boxes = outputLayersInfo[1];
const uint numClasses = scores.inferDims.d[0];
const uint outputSize = scores.inferDims.d[1];
if (numClasses != detectionParams.numClassesConfigured) {
std::cerr << "WARNING: Number of classes mismatch, make sure to set num-detected-classes=" << numClasses
<< " in config_infer file\n" << std::endl;
}
thrust::device_vector<NvDsInferParseObjectInfo> objects(outputSize);
std::vector<int> numDetections = { 0 };
thrust::device_vector<int> d_numDetections(numDetections);
thrust::device_vector<float> preclusterThreshold(detectionParams.perClassPreclusterThreshold);
int threads_per_block = 1024;
int number_of_blocks = ((outputSize - 1) / threads_per_block) + 1;
decodeTensor_PPYOLOE_ONNX<<<threads_per_block, number_of_blocks>>>(
thrust::raw_pointer_cast(objects.data()), (const float*) (scores.buffer), (const float*) (boxes.buffer), numClasses,
outputSize, static_cast<float>(networkInfo.width), static_cast<float>(networkInfo.height),
thrust::raw_pointer_cast(preclusterThreshold.data()), thrust::raw_pointer_cast(d_numDetections.data()));
thrust::copy(d_numDetections.begin(), d_numDetections.end(), numDetections.begin());
objectList.resize(numDetections[0]);
thrust::copy(objects.begin(), objects.begin() + numDetections[0], objectList.begin());
return true;
}
extern "C" bool
NvDsInferParse_YOLO_ONNX(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
NvDsInferParseDetectionParams const& detectionParams, std::vector<NvDsInferParseObjectInfo>& objectList)
{
return NvDsInferParseCustom_YOLO_ONNX(outputLayersInfo, networkInfo, detectionParams, objectList);
}
extern "C" bool
NvDsInferParse_YOLOV8_ONNX(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
NvDsInferParseDetectionParams const& detectionParams, std::vector<NvDsInferParseObjectInfo>& objectList)
{
return NvDsInferParseCustom_YOLOV8_ONNX(outputLayersInfo, networkInfo, detectionParams, objectList);
}
extern "C" bool
NvDsInferParse_YOLOX_ONNX(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
NvDsInferParseDetectionParams const& detectionParams, std::vector<NvDsInferParseObjectInfo>& objectList)
{
return NvDsInferParseCustom_YOLOX_ONNX(outputLayersInfo, networkInfo, detectionParams, objectList);
}
extern "C" bool
NvDsInferParse_YOLO_NAS_ONNX(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
NvDsInferParseDetectionParams const& detectionParams, std::vector<NvDsInferParseObjectInfo>& objectList)
{
return NvDsInferParseCustom_YOLO_NAS_ONNX(outputLayersInfo, networkInfo, detectionParams, objectList);
}
extern "C" bool
NvDsInferParse_PPYOLOE_ONNX(std::vector<NvDsInferLayerInfo> const& outputLayersInfo, NvDsInferNetworkInfo const& networkInfo,
NvDsInferParseDetectionParams const& detectionParams, std::vector<NvDsInferParseObjectInfo>& objectList)
{
return NvDsInferParseCustom_PPYOLOE_ONNX(outputLayersInfo, networkInfo, detectionParams, objectList);
}


@@ -98,25 +98,6 @@ loadWeights(const std::string weightsFilePath, const std::string& networkType)
         break;
       }
     }
-    else if (weightsFilePath.find(".wts") != std::string::npos) {
-      std::ifstream file(weightsFilePath);
-      assert(file.good());
-      int32_t count;
-      file >> count;
-      assert(count > 0 && "\nInvalid .wts file.");
-      uint32_t floatWeight;
-      std::string name;
-      uint32_t size;
-      while (count--) {
-        file >> name >> std::dec >> size;
-        for (uint32_t x = 0, y = size; x < y; ++x) {
-          file >> std::hex >> floatWeight;
-          weights.push_back(*reinterpret_cast<float*>(&floatWeight));
-        };
-      }
-    }
     else {
       std::cerr << "\nFile " << weightsFilePath << " is not supported" << std::endl;
       assert(0);


@@ -34,8 +34,8 @@ Yolo::Yolo(const NetworkInfo& networkInfo) : m_InputBlobName(networkInfo.inputBl
     m_NetworkType(networkInfo.networkType), m_ConfigFilePath(networkInfo.configFilePath),
     m_WtsFilePath(networkInfo.wtsFilePath), m_Int8CalibPath(networkInfo.int8CalibPath), m_DeviceType(networkInfo.deviceType),
     m_NumDetectedClasses(networkInfo.numDetectedClasses), m_ClusterMode(networkInfo.clusterMode),
-    m_NetworkMode(networkInfo.networkMode), m_ScoreThreshold(networkInfo.scoreThreshold), m_InputH(0), m_InputW(0),
-    m_InputC(0), m_InputSize(0), m_NumClasses(0), m_LetterBox(0), m_NewCoords(0), m_YoloCount(0)
+    m_NetworkMode(networkInfo.networkMode), m_InputH(0), m_InputW(0), m_InputC(0), m_InputSize(0), m_NumClasses(0),
+    m_LetterBox(0), m_NewCoords(0), m_YoloCount(0)
 {
 }
@@ -130,18 +130,6 @@ Yolo::buildYoloNetwork(std::vector<float>& weights, nvinfer1::INetworkDefinition
 {
   int weightPtr = 0;

-  std::string weightsType = "wts";
-  if (m_WtsFilePath.find(".weights") != std::string::npos)
-    weightsType = "weights";
-
-  float eps = 1.0e-5;
-  if (m_NetworkType.find("yolov5") != std::string::npos || m_NetworkType.find("yolov6") != std::string::npos ||
-      m_NetworkType.find("yolov7") != std::string::npos || m_NetworkType.find("yolov8") != std::string::npos ||
-      m_NetworkType.find("yolox") != std::string::npos)
-    eps = 1.0e-3;
-  else if (m_NetworkType.find("yolor") != std::string::npos)
-    eps = 1.0e-4;
-
   nvinfer1::ITensor* data = network.addInput(m_InputBlobName.c_str(), nvinfer1::DataType::kFLOAT,
       nvinfer1::Dims{3, {static_cast<int>(m_InputC), static_cast<int>(m_InputH), static_cast<int>(m_InputW)}});
   assert(data != nullptr && data->getDimensions().nbDims > 0);
@@ -152,18 +140,15 @@ Yolo::buildYoloNetwork(std::vector<float>& weights, nvinfer1::INetworkDefinition
   nvinfer1::ITensor* yoloTensorInputs[m_YoloCount];
   uint yoloCountInputs = 0;

-  int modelType = -1;
-
   for (uint i = 0; i < m_ConfigBlocks.size(); ++i) {
     std::string layerIndex = "(" + std::to_string(tensorOutputs.size()) + ")";

     if (m_ConfigBlocks.at(i).at("type") == "net")
       printLayerInfo("", "Layer", "Input Shape", "Output Shape", "WeightPtr");
-    else if (m_ConfigBlocks.at(i).at("type") == "convolutional") {
+    else if (m_ConfigBlocks.at(i).at("type") == "conv" || m_ConfigBlocks.at(i).at("type") == "convolutional") {
       int channels = getNumChannels(previous);
       std::string inputVol = dimsToString(previous->getDimensions());
-      previous = convolutionalLayer(i, m_ConfigBlocks.at(i), weights, m_TrtWeights, weightPtr, weightsType, channels, eps,
-          previous, &network);
+      previous = convolutionalLayer(i, m_ConfigBlocks.at(i), weights, m_TrtWeights, weightPtr, channels, previous, &network);
       assert(previous != nullptr);
       std::string outputVol = dimsToString(previous->getDimensions());
       tensorOutputs.push_back(previous);
@@ -173,39 +158,30 @@ Yolo::buildYoloNetwork(std::vector<float>& weights, nvinfer1::INetworkDefinition
     else if (m_ConfigBlocks.at(i).at("type") == "deconvolutional") {
       int channels = getNumChannels(previous);
       std::string inputVol = dimsToString(previous->getDimensions());
-      previous = deconvolutionalLayer(i, m_ConfigBlocks.at(i), weights, m_TrtWeights, weightPtr, weightsType, channels,
-          previous, &network);
+      previous = deconvolutionalLayer(i, m_ConfigBlocks.at(i), weights, m_TrtWeights, weightPtr, channels, previous,
+          &network);
       assert(previous != nullptr);
       std::string outputVol = dimsToString(previous->getDimensions());
       tensorOutputs.push_back(previous);
       std::string layerName = "deconv";
       printLayerInfo(layerIndex, layerName, inputVol, outputVol, std::to_string(weightPtr));
     }
-    else if (m_ConfigBlocks.at(i).at("type") == "c2f") {
-      std::string inputVol = dimsToString(previous->getDimensions());
-      previous = c2fLayer(i, m_ConfigBlocks.at(i), weights, m_TrtWeights, weightPtr, weightsType, eps, previous, &network);
-      assert(previous != nullptr);
-      std::string outputVol = dimsToString(previous->getDimensions());
-      tensorOutputs.push_back(previous);
-      std::string layerName = "c2f_" + m_ConfigBlocks.at(i).at("activation");
-      printLayerInfo(layerIndex, layerName, inputVol, outputVol, std::to_string(weightPtr));
-    }
     else if (m_ConfigBlocks.at(i).at("type") == "batchnorm") {
       std::string inputVol = dimsToString(previous->getDimensions());
-      previous = batchnormLayer(i, m_ConfigBlocks.at(i), weights, m_TrtWeights, weightPtr, weightsType, eps, previous,
-          &network);
+      previous = batchnormLayer(i, m_ConfigBlocks.at(i), weights, m_TrtWeights, weightPtr, previous, &network);
       assert(previous != nullptr);
       std::string outputVol = dimsToString(previous->getDimensions());
       tensorOutputs.push_back(previous);
       std::string layerName = "batchnorm_" + m_ConfigBlocks.at(i).at("activation");
       printLayerInfo(layerIndex, layerName, inputVol, outputVol, std::to_string(weightPtr));
     }
-    else if (m_ConfigBlocks.at(i).at("type") == "implicit_add" || m_ConfigBlocks.at(i).at("type") == "implicit_mul") {
+    else if (m_ConfigBlocks.at(i).at("type") == "implicit" || m_ConfigBlocks.at(i).at("type") == "implicit_add" ||
+        m_ConfigBlocks.at(i).at("type") == "implicit_mul") {
       previous = implicitLayer(i, m_ConfigBlocks.at(i), weights, m_TrtWeights, weightPtr, &network);
       assert(previous != nullptr);
       std::string outputVol = dimsToString(previous->getDimensions());
       tensorOutputs.push_back(previous);
-      std::string layerName = m_ConfigBlocks.at(i).at("type");
+      std::string layerName = "implicit";
       printLayerInfo(layerIndex, layerName, "-", outputVol, std::to_string(weightPtr));
     }
     else if (m_ConfigBlocks.at(i).at("type") == "shift_channels" || m_ConfigBlocks.at(i).at("type") == "control_channels") {
@@ -234,27 +210,44 @@ Yolo::buildYoloNetwork(std::vector<float>& weights, nvinfer1::INetworkDefinition
       assert((i + from - 1 >= 0) && (i + from - 1 < tensorOutputs.size()));
       assert(i + from - 1 < i - 2);

-      std::string mode = "add";
-      if (m_ConfigBlocks.at(i).find("mode") != m_ConfigBlocks.at(i).end())
-        mode = m_ConfigBlocks.at(i).at("mode");
-
       std::string activation = "linear";
       if (m_ConfigBlocks.at(i).find("activation") != m_ConfigBlocks.at(i).end())
         activation = m_ConfigBlocks.at(i).at("activation");

       std::string inputVol = dimsToString(previous->getDimensions());
       std::string shortcutVol = dimsToString(tensorOutputs[i + from - 1]->getDimensions());
-      previous = shortcutLayer(i, mode, activation, inputVol, shortcutVol, m_ConfigBlocks.at(i), previous,
+      previous = shortcutLayer(i, activation, inputVol, shortcutVol, m_ConfigBlocks.at(i), previous,
           tensorOutputs[i + from - 1], &network);
       assert(previous != nullptr);
       std::string outputVol = dimsToString(previous->getDimensions());
       tensorOutputs.push_back(previous);
-      std::string layerName = "shortcut_" + mode + "_" + activation + ": " + std::to_string(i + from - 1);
+      std::string layerName = "shortcut_" + activation + ": " + std::to_string(i + from - 1);
       printLayerInfo(layerIndex, layerName, inputVol, outputVol, "-");

-      if (mode == "add" && inputVol != shortcutVol)
+      if (inputVol != shortcutVol)
         std::cout << inputVol << " +" << shortcutVol << std::endl;
     }
+    else if (m_ConfigBlocks.at(i).at("type") == "sam") {
+      assert(m_ConfigBlocks.at(i).find("from") != m_ConfigBlocks.at(i).end());
+      int from = stoi(m_ConfigBlocks.at(i).at("from"));
+      if (from > 0)
+        from = from - i + 1;
+      assert((i - 2 >= 0) && (i - 2 < tensorOutputs.size()));
+      assert((i + from - 1 >= 0) && (i + from - 1 < tensorOutputs.size()));
+      assert(i + from - 1 < i - 2);
+
+      std::string activation = "linear";
+      if (m_ConfigBlocks.at(i).find("activation") != m_ConfigBlocks.at(i).end())
+        activation = m_ConfigBlocks.at(i).at("activation");
+
+      std::string inputVol = dimsToString(previous->getDimensions());
+      previous = samLayer(i, activation, m_ConfigBlocks.at(i), previous, tensorOutputs[i + from - 1], &network);
+      assert(previous != nullptr);
+      std::string outputVol = dimsToString(previous->getDimensions());
+      tensorOutputs.push_back(previous);
+      std::string layerName = "sam_" + activation + ": " + std::to_string(i + from - 1);
+      printLayerInfo(layerIndex, layerName, inputVol, outputVol, "-");
+    }
     else if (m_ConfigBlocks.at(i).at("type") == "route") {
       std::string layers;
       previous = routeLayer(i, layers, m_ConfigBlocks.at(i), tensorOutputs, &network);
@@ -273,7 +266,8 @@ Yolo::buildYoloNetwork(std::vector<float>& weights, nvinfer1::INetworkDefinition
       std::string layerName = "upsample";
       printLayerInfo(layerIndex, layerName, inputVol, outputVol, "-");
     }
-    else if (m_ConfigBlocks.at(i).at("type") == "maxpool" || m_ConfigBlocks.at(i).at("type") == "avgpool") {
+    else if (m_ConfigBlocks.at(i).at("type") == "max" || m_ConfigBlocks.at(i).at("type") == "maxpool" ||
+        m_ConfigBlocks.at(i).at("type") == "avg" || m_ConfigBlocks.at(i).at("type") == "avgpool") {
       std::string inputVol = dimsToString(previous->getDimensions());
       previous = poolingLayer(i, m_ConfigBlocks.at(i), previous, &network);
       assert(previous != nullptr);
@@ -282,9 +276,17 @@ Yolo::buildYoloNetwork(std::vector<float>& weights, nvinfer1::INetworkDefinition
   std::string layerName = m_ConfigBlocks.at(i).at("type");
   printLayerInfo(layerIndex, layerName, inputVol, outputVol, "-");
 }
+else if (m_ConfigBlocks.at(i).at("type") == "reorg3d") {
+  std::string inputVol = dimsToString(previous->getDimensions());
+  previous = reorgLayer(i, m_ConfigBlocks.at(i), previous, &network);
+  assert(previous != nullptr);
+  std::string outputVol = dimsToString(previous->getDimensions());
+  tensorOutputs.push_back(previous);
+  std::string layerName = "reorg3d";
+  printLayerInfo(layerIndex, layerName, inputVol, outputVol, "-");
+}
 else if (m_ConfigBlocks.at(i).at("type") == "reorg") {
   std::string inputVol = dimsToString(previous->getDimensions());
-  if (m_NetworkType.find("yolov2") != std::string::npos) {
   nvinfer1::IPluginV2* reorgPlugin = createReorgPlugin(2);
   assert(reorgPlugin != nullptr);
   nvinfer1::IPluginV2Layer* reorg = network.addPluginV2(&previous, 1, *reorgPlugin);
@@ -292,52 +294,15 @@ Yolo::buildYoloNetwork(std::vector<float>& weights, nvinfer1::INetworkDefinition
   std::string reorglayerName = "reorg_" + std::to_string(i);
   reorg->setName(reorglayerName.c_str());
   previous = reorg->getOutput(0);
-  }
-  else
-    previous = reorgLayer(i, m_ConfigBlocks.at(i), previous, &network);
   assert(previous != nullptr);
   std::string outputVol = dimsToString(previous->getDimensions());
   tensorOutputs.push_back(previous);
   std::string layerName = "reorg";
   printLayerInfo(layerIndex, layerName, inputVol, outputVol, "-");
 }
-else if (m_ConfigBlocks.at(i).at("type") == "reduce") {
-  std::string inputVol = dimsToString(previous->getDimensions());
-  previous = reduceLayer(i, m_ConfigBlocks.at(i), previous, &network);
-  assert(previous != nullptr);
-  std::string outputVol = dimsToString(previous->getDimensions());
-  tensorOutputs.push_back(previous);
-  std::string layerName = "reduce";
-  printLayerInfo(layerIndex, layerName, inputVol, outputVol, "-");
-}
-else if (m_ConfigBlocks.at(i).at("type") == "shuffle") {
-  std::string inputVol = dimsToString(previous->getDimensions());
-  previous = shuffleLayer(i, m_ConfigBlocks.at(i), previous, tensorOutputs, &network);
-  assert(previous != nullptr);
-  std::string outputVol = dimsToString(previous->getDimensions());
-  tensorOutputs.push_back(previous);
-  std::string layerName = "shuffle";
-  printLayerInfo(layerIndex, layerName, inputVol, outputVol, "-");
-}
-else if (m_ConfigBlocks.at(i).at("type") == "softmax") {
-  std::string inputVol = dimsToString(previous->getDimensions());
-  previous = softmaxLayer(i, m_ConfigBlocks.at(i), previous, &network);
-  assert(previous != nullptr);
-  std::string outputVol = dimsToString(previous->getDimensions());
-  tensorOutputs.push_back(previous);
-  std::string layerName = "softmax";
-  printLayerInfo(layerIndex, layerName, inputVol, outputVol, "-");
-}
 else if (m_ConfigBlocks.at(i).at("type") == "yolo" || m_ConfigBlocks.at(i).at("type") == "region") {
-  if (m_ConfigBlocks.at(i).at("type") == "yolo")
-    if (m_NetworkType.find("yolor") != std::string::npos)
-      modelType = 2;
-    else
-      modelType = 1;
-  else
-    modelType = 0;
-  std::string blobName = modelType != 0 ? "yolo_" + std::to_string(i) : "region_" + std::to_string(i);
+  std::string blobName = m_ConfigBlocks.at(i).at("type") == "yolo" ? "yolo_" + std::to_string(i) :
+      "region_" + std::to_string(i);
   nvinfer1::Dims prevTensorDims = previous->getDimensions();
   TensorInfo& curYoloTensor = m_YoloTensors.at(yoloCountInputs);
   curYoloTensor.blobName = blobName;
@@ -348,83 +313,11 @@ Yolo::buildYoloNetwork(std::vector<float>& weights, nvinfer1::INetworkDefinition
   tensorOutputs.push_back(previous);
   yoloTensorInputs[yoloCountInputs] = previous;
   ++yoloCountInputs;
-  std::string layerName = modelType != 0 ? "yolo" : "region";
+  std::string layerName = m_ConfigBlocks.at(i).at("type") == "yolo" ? "yolo" : "region";
   printLayerInfo(layerIndex, layerName, inputVol, "-", "-");
 }
-else if (m_ConfigBlocks.at(i).at("type") == "cls") {
-  modelType = 3;
-  std::string blobName = "cls_" + std::to_string(i);
-  nvinfer1::Dims prevTensorDims = previous->getDimensions();
-  TensorInfo& curYoloTensor = m_YoloTensors.at(yoloCountInputs);
-  curYoloTensor.blobName = blobName;
-  curYoloTensor.numBBoxes = prevTensorDims.d[1];
-  m_NumClasses = prevTensorDims.d[0];
-  std::string inputVol = dimsToString(previous->getDimensions());
-  previous = clsLayer(i, m_ConfigBlocks.at(i), previous, &network);
-  assert(previous != nullptr);
-  std::string outputVol = dimsToString(previous->getDimensions());
-  tensorOutputs.push_back(previous);
-  yoloTensorInputs[yoloCountInputs] = previous;
-  ++yoloCountInputs;
-  std::string layerName = "cls";
-  printLayerInfo(layerIndex, layerName, inputVol, outputVol, "-");
-}
-else if (m_ConfigBlocks.at(i).at("type") == "reg") {
-  modelType = 3;
-  std::string blobName = "reg_" + std::to_string(i);
-  nvinfer1::Dims prevTensorDims = previous->getDimensions();
-  TensorInfo& curYoloTensor = m_YoloTensors.at(yoloCountInputs);
-  curYoloTensor.blobName = blobName;
-  curYoloTensor.numBBoxes = prevTensorDims.d[1];
-  std::string inputVol = dimsToString(previous->getDimensions());
-  previous = regLayer(i, m_ConfigBlocks.at(i), weights, m_TrtWeights, weightPtr, previous, &network);
-  assert(previous != nullptr);
-  std::string outputVol = dimsToString(previous->getDimensions());
-  tensorOutputs.push_back(previous);
-  yoloTensorInputs[yoloCountInputs] = previous;
-  ++yoloCountInputs;
-  std::string layerName = "reg";
-  printLayerInfo(layerIndex, layerName, inputVol, outputVol, std::to_string(weightPtr));
-}
-else if (m_ConfigBlocks.at(i).at("type") == "detect_v8") {
-  modelType = 4;
-  std::string blobName = "detect_v8_" + std::to_string(i);
-  nvinfer1::Dims prevTensorDims = previous->getDimensions();
-  TensorInfo& curYoloTensor = m_YoloTensors.at(yoloCountInputs);
-  curYoloTensor.blobName = blobName;
-  curYoloTensor.numBBoxes = prevTensorDims.d[1];
-  std::string inputVol = dimsToString(previous->getDimensions());
-  previous = detectV8Layer(i, m_ConfigBlocks.at(i), weights, m_TrtWeights, weightPtr, previous, &network);
-  assert(previous != nullptr);
-  std::string outputVol = dimsToString(previous->getDimensions());
-  tensorOutputs.push_back(previous);
-  yoloTensorInputs[yoloCountInputs] = previous;
-  ++yoloCountInputs;
-  std::string layerName = "detect_v8";
-  printLayerInfo(layerIndex, layerName, inputVol, outputVol, std::to_string(weightPtr));
-}
-else if (m_ConfigBlocks.at(i).at("type") == "detect_x") {
-  modelType = 5;
-  std::string blobName = "detect_x_" + std::to_string(i);
-  nvinfer1::Dims prevTensorDims = previous->getDimensions();
-  TensorInfo& curYoloTensor = m_YoloTensors.at(yoloCountInputs);
-  curYoloTensor.blobName = blobName;
-  curYoloTensor.numBBoxes = prevTensorDims.d[0];
-  m_NumClasses = prevTensorDims.d[1] - 5;
-  std::string outputVol = dimsToString(previous->getDimensions());
-  tensorOutputs.push_back(previous);
-  yoloTensorInputs[yoloCountInputs] = previous;
-  ++yoloCountInputs;
-  std::string layerName = "detect_x";
-  printLayerInfo(layerIndex, layerName, "-", outputVol, std::to_string(weightPtr));
+else if (m_ConfigBlocks.at(i).at("type") == "dropout") {
+  // pass
 }
 else {
   std::cerr << "\nUnsupported layer type --> \"" << m_ConfigBlocks.at(i).at("type") << "\"" << std::endl;
@@ -438,42 +331,24 @@ Yolo::buildYoloNetwork(std::vector<float>& weights, nvinfer1::INetworkDefinition
 }
 if (m_YoloCount == yoloCountInputs) {
-  assert((modelType != -1) && "\nCould not determine model type");
   uint64_t outputSize = 0;
   for (uint j = 0; j < yoloCountInputs; ++j) {
     TensorInfo& curYoloTensor = m_YoloTensors.at(j);
-    if (modelType == 3 || modelType == 4 || modelType == 5)
-      outputSize = curYoloTensor.numBBoxes;
-    else
     outputSize += curYoloTensor.gridSizeX * curYoloTensor.gridSizeY * curYoloTensor.numBBoxes;
   }
-  nvinfer1::IPluginV2* yoloPlugin = new YoloLayer(m_InputW, m_InputH, m_NumClasses, m_NewCoords, m_YoloTensors, outputSize,
-      modelType, m_ScoreThreshold);
+  nvinfer1::IPluginV2* yoloPlugin = new YoloLayer(m_InputW, m_InputH, m_NumClasses, m_NewCoords, m_YoloTensors,
+      outputSize);
   assert(yoloPlugin != nullptr);
   nvinfer1::IPluginV2Layer* yolo = network.addPluginV2(yoloTensorInputs, m_YoloCount, *yoloPlugin);
   assert(yolo != nullptr);
   std::string yoloLayerName = "yolo";
   yolo->setName(yoloLayerName.c_str());
-  std::string outputlayerName;
-  nvinfer1::ITensor* num_detections = yolo->getOutput(0);
-  outputlayerName = "num_detections";
-  num_detections->setName(outputlayerName.c_str());
-  nvinfer1::ITensor* detection_boxes = yolo->getOutput(1);
-  outputlayerName = "detection_boxes";
-  detection_boxes->setName(outputlayerName.c_str());
-  nvinfer1::ITensor* detection_scores = yolo->getOutput(2);
-  outputlayerName = "detection_scores";
-  detection_scores->setName(outputlayerName.c_str());
-  nvinfer1::ITensor* detection_classes = yolo->getOutput(3);
-  outputlayerName = "detection_classes";
-  detection_classes->setName(outputlayerName.c_str());
-  network.markOutput(*num_detections);
-  network.markOutput(*detection_boxes);
-  network.markOutput(*detection_scores);
-  network.markOutput(*detection_classes);
+  nvinfer1::ITensor* outputYolo = yolo->getOutput(0);
+  std::string outputYoloLayerName = "output";
+  outputYolo->setName(outputYoloLayerName.c_str());
+  network.markOutput(*outputYolo);
 }
 else {
   std::cerr << "\nError in yolo cfg file" << std::endl;
@@ -600,54 +475,6 @@ Yolo::parseConfigBlocks()
   outputTensor.numBBoxes = outputTensor.mask.size() > 0 ? outputTensor.mask.size() : std::stoul(trim(block.at("num")));
-  m_YoloTensors.push_back(outputTensor);
-}
-else if ((block.at("type") == "cls") || (block.at("type") == "reg")) {
-  ++m_YoloCount;
-  TensorInfo outputTensor;
-  m_YoloTensors.push_back(outputTensor);
-}
-else if (block.at("type") == "detect_v8") {
-  ++m_YoloCount;
-  m_NumClasses = std::stoul(block.at("classes"));
-  TensorInfo outputTensor;
-  m_YoloTensors.push_back(outputTensor);
-}
-else if (block.at("type") == "detect_x") {
-  ++m_YoloCount;
-  TensorInfo outputTensor;
-  std::vector<int> strides;
-  std::string stridesString = block.at("strides");
-  while (!stridesString.empty()) {
-    int npos = stridesString.find_first_of(',');
-    if (npos != -1) {
-      int stride = std::stof(trim(stridesString.substr(0, npos)));
-      strides.push_back(stride);
-      stridesString.erase(0, npos + 1);
-    }
-    else {
-      int stride = std::stof(trim(stridesString));
-      strides.push_back(stride);
-      break;
-    }
-  }
-  for (uint i = 0; i < strides.size(); ++i) {
-    int num_grid_y = m_InputH / strides[i];
-    int num_grid_x = m_InputW / strides[i];
-    for (int g1 = 0; g1 < num_grid_y; ++g1) {
-      for (int g0 = 0; g0 < num_grid_x; ++g0) {
-        outputTensor.anchors.push_back((float) g0);
-        outputTensor.anchors.push_back((float) g1);
-        outputTensor.mask.push_back(strides[i]);
-      }
-    }
-  }
   m_YoloTensors.push_back(outputTensor);
 }
 }
@@ -31,21 +31,15 @@
 #include "layers/convolutional_layer.h"
 #include "layers/deconvolutional_layer.h"
-#include "layers/c2f_layer.h"
 #include "layers/batchnorm_layer.h"
 #include "layers/implicit_layer.h"
 #include "layers/channels_layer.h"
 #include "layers/shortcut_layer.h"
-#include "layers/sam_layer.h"
 #include "layers/route_layer.h"
 #include "layers/upsample_layer.h"
 #include "layers/pooling_layer.h"
 #include "layers/reorg_layer.h"
-#include "layers/reduce_layer.h"
-#include "layers/shuffle_layer.h"
-#include "layers/softmax_layer.h"
-#include "layers/cls_layer.h"
-#include "layers/reg_layer.h"
-#include "layers/detect_v8_layer.h"

 struct NetworkInfo
 {
@@ -57,7 +51,6 @@ struct NetworkInfo
   std::string deviceType;
   uint numDetectedClasses;
   int clusterMode;
-  float scoreThreshold;
   std::string networkMode;
 };

@@ -98,7 +91,6 @@ class Yolo : public IModelParser {
   const uint m_NumDetectedClasses;
   const int m_ClusterMode;
   const std::string m_NetworkMode;
-  const float m_ScoreThreshold;
   uint m_InputH;
   uint m_InputW;
@@ -4,13 +4,13 @@
  */

 #include <stdint.h>
+#include <stdio.h>

 inline __device__ float sigmoidGPU(const float& x) { return 1.0f / (1.0f + __expf(-x)); }

-__global__ void gpuYoloLayer(const float* input, int* num_detections, float* detection_boxes, float* detection_scores,
-    int* detection_classes, const float scoreThreshold, const uint netWidth, const uint netHeight, const uint gridSizeX,
-    const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes, const float scaleXY, const float* anchors,
-    const int* mask)
+__global__ void gpuYoloLayer(const float* input, float* output, int* count, const uint netWidth, const uint netHeight,
+    const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes, const float scaleXY,
+    const float* anchors, const int* mask)
 {
   uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
   uint y_id = blockIdx.y * blockDim.y + threadIdx.y;
@@ -24,18 +24,13 @@ __global__ void gpuYoloLayer(const float* input, int* num_detections, float* det
   const float objectness = sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 4)]);

-  if (objectness < scoreThreshold)
-    return;
-
-  int count = (int)atomicAdd(num_detections, 1);
-
   const float alpha = scaleXY;
   const float beta = -0.5 * (scaleXY - 1);

-  float x = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]) * alpha + beta + x_id)
+  float xc = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]) * alpha + beta + x_id)
       * netWidth / gridSizeX;
-  float y = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]) * alpha + beta + y_id)
+  float yc = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]) * alpha + beta + y_id)
       * netHeight / gridSizeY;
   float w = __expf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)]) * anchors[mask[z_id] * 2];
@@ -53,23 +48,26 @@ __global__ void gpuYoloLayer(const float* input, int* num_detections, float* det
     }
   }

-  detection_boxes[count * 4 + 0] = x - 0.5 * w;
-  detection_boxes[count * 4 + 1] = y - 0.5 * h;
-  detection_boxes[count * 4 + 2] = x + 0.5 * w;
-  detection_boxes[count * 4 + 3] = y + 0.5 * h;
-  detection_scores[count] = objectness * maxProb;
-  detection_classes[count] = maxIndex;
+  int _count = (int)atomicAdd(count, 1);
+
+  output[_count * 7 + 0] = xc;
+  output[_count * 7 + 1] = yc;
+  output[_count * 7 + 2] = w;
+  output[_count * 7 + 3] = h;
+  output[_count * 7 + 4] = maxProb;
+  output[_count * 7 + 5] = maxIndex;
+  output[_count * 7 + 6] = objectness;
 }

-cudaError_t cudaYoloLayer(const void* input, void* num_detections, void* detection_boxes, void* detection_scores,
-    void* detection_classes, const uint& batchSize, uint64_t& inputSize, uint64_t& outputSize, const float& scoreThreshold,
-    const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
-    const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);
+cudaError_t cudaYoloLayer(const void* input, void* output, void* count, const uint& batchSize, uint64_t& inputSize,
+    uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY,
+    const uint& numOutputClasses, const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask,
+    cudaStream_t stream);

-cudaError_t cudaYoloLayer(const void* input, void* num_detections, void* detection_boxes, void* detection_scores,
-    void* detection_classes, const uint& batchSize, uint64_t& inputSize, uint64_t& outputSize, const float& scoreThreshold,
-    const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
-    const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream)
+cudaError_t cudaYoloLayer(const void* input, void* output, void* count, const uint& batchSize, uint64_t& inputSize,
+    uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY,
+    const uint& numOutputClasses, const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask,
+    cudaStream_t stream)
 {
   dim3 threads_per_block(16, 16, 4);
   dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1, (gridSizeY / threads_per_block.y) + 1,
@@ -77,12 +75,11 @@ cudaError_t cudaYoloLayer(const void* input, void* num_detections, void* detecti
   for (unsigned int batch = 0; batch < batchSize; ++batch) {
     gpuYoloLayer<<<number_of_blocks, threads_per_block, 0, stream>>>(
-        reinterpret_cast<const float*>(input) + (batch * inputSize), reinterpret_cast<int*>(num_detections) + (batch),
-        reinterpret_cast<float*>(detection_boxes) + (batch * 4 * outputSize),
-        reinterpret_cast<float*>(detection_scores) + (batch * outputSize),
-        reinterpret_cast<int*>(detection_classes) + (batch * outputSize), scoreThreshold, netWidth, netHeight, gridSizeX,
-        gridSizeY, numOutputClasses, numBBoxes, scaleXY, reinterpret_cast<const float*>(anchors),
-        reinterpret_cast<const int*>(mask));
+        reinterpret_cast<const float*> (input) + (batch * inputSize),
+        reinterpret_cast<float*> (output) + (batch * 7 * outputSize),
+        reinterpret_cast<int*> (count) + (batch),
+        netWidth, netHeight, gridSizeX, gridSizeY, numOutputClasses, numBBoxes, scaleXY,
+        reinterpret_cast<const float*> (anchors), reinterpret_cast<const int*> (mask));
   }
   return cudaGetLastError();
 }
@@ -1,64 +0,0 @@
-/*
- * Created by Marcos Luciano
- * https://www.github.com/marcoslucianops
- */
-
-#include <stdint.h>
-
-__global__ void gpuYoloLayer_e(const float* cls, const float* reg, int* num_detections, float* detection_boxes,
-    float* detection_scores, int* detection_classes, const float scoreThreshold, const uint netWidth, const uint netHeight,
-    const uint numOutputClasses, const uint64_t outputSize)
-{
-  uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
-
-  if (x_id >= outputSize)
-    return;
-
-  float maxProb = 0.0f;
-  int maxIndex = -1;
-
-  for (uint i = 0; i < numOutputClasses; ++i) {
-    float prob = cls[x_id * numOutputClasses + i];
-    if (prob > maxProb) {
-      maxProb = prob;
-      maxIndex = i;
-    }
-  }
-
-  if (maxProb < scoreThreshold)
-    return;
-
-  int count = (int)atomicAdd(num_detections, 1);
-
-  detection_boxes[count * 4 + 0] = reg[x_id * 4 + 0];
-  detection_boxes[count * 4 + 1] = reg[x_id * 4 + 1];
-  detection_boxes[count * 4 + 2] = reg[x_id * 4 + 2];
-  detection_boxes[count * 4 + 3] = reg[x_id * 4 + 3];
-  detection_scores[count] = maxProb;
-  detection_classes[count] = maxIndex;
-}
-
-cudaError_t cudaYoloLayer_e(const void* cls, const void* reg, void* num_detections, void* detection_boxes,
-    void* detection_scores, void* detection_classes, const uint& batchSize, uint64_t& outputSize,
-    const float& scoreThreshold, const uint& netWidth, const uint& netHeight, const uint& numOutputClasses,
-    cudaStream_t stream);
-
-cudaError_t cudaYoloLayer_e(const void* cls, const void* reg, void* num_detections, void* detection_boxes,
-    void* detection_scores, void* detection_classes, const uint& batchSize, uint64_t& outputSize,
-    const float& scoreThreshold, const uint& netWidth, const uint& netHeight, const uint& numOutputClasses,
-    cudaStream_t stream)
-{
-  int threads_per_block = 16;
-  int number_of_blocks = (outputSize / threads_per_block) + 1;
-
-  for (unsigned int batch = 0; batch < batchSize; ++batch) {
-    gpuYoloLayer_e<<<number_of_blocks, threads_per_block, 0, stream>>>(
-        reinterpret_cast<const float*>(cls) + (batch * numOutputClasses * outputSize),
-        reinterpret_cast<const float*>(reg) + (batch * 4 * outputSize), reinterpret_cast<int*>(num_detections) + (batch),
-        reinterpret_cast<float*>(detection_boxes) + (batch * 4 * outputSize),
-        reinterpret_cast<float*>(detection_scores) + (batch * outputSize),
-        reinterpret_cast<int*>(detection_classes) + (batch * outputSize), scoreThreshold, netWidth, netHeight,
-        numOutputClasses, outputSize);
-  }
-  return cudaGetLastError();
-}
@@ -5,10 +5,9 @@
 #include <stdint.h>

-__global__ void gpuYoloLayer_nc(const float* input, int* num_detections, float* detection_boxes, float* detection_scores,
-    int* detection_classes, const float scoreThreshold, const uint netWidth, const uint netHeight, const uint gridSizeX,
-    const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes, const float scaleXY, const float* anchors,
-    const int* mask)
+__global__ void gpuYoloLayer_nc(const float* input, float* output, int* count, const uint netWidth, const uint netHeight,
+    const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes, const float scaleXY,
+    const float* anchors, const int* mask)
 {
   uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
   uint y_id = blockIdx.y * blockDim.y + threadIdx.y;
@@ -22,18 +21,13 @@ __global__ void gpuYoloLayer_nc(const float* input, int* num_detections, float*
   const float objectness = input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 4)];

-  if (objectness < scoreThreshold)
-    return;
-
-  int count = (int)atomicAdd(num_detections, 1);
-
   const float alpha = scaleXY;
   const float beta = -0.5 * (scaleXY - 1);

-  float x = (input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)] * alpha + beta + x_id) * netWidth /
+  float xc = (input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)] * alpha + beta + x_id) * netWidth /
       gridSizeX;
-  float y = (input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)] * alpha + beta + y_id) * netHeight /
+  float yc = (input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)] * alpha + beta + y_id) * netHeight /
       gridSizeY;
   float w = __powf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)] * 2, 2) * anchors[mask[z_id] * 2];
@@ -51,23 +45,26 @@ __global__ void gpuYoloLayer_nc(const float* input, int* num_detections, float*
     }
   }

-  detection_boxes[count * 4 + 0] = x - 0.5 * w;
-  detection_boxes[count * 4 + 1] = y - 0.5 * h;
-  detection_boxes[count * 4 + 2] = x + 0.5 * w;
-  detection_boxes[count * 4 + 3] = y + 0.5 * h;
-  detection_scores[count] = objectness * maxProb;
-  detection_classes[count] = maxIndex;
+  int _count = (int)atomicAdd(count, 1);
+
+  output[_count * 7 + 0] = xc;
+  output[_count * 7 + 1] = yc;
+  output[_count * 7 + 2] = w;
+  output[_count * 7 + 3] = h;
+  output[_count * 7 + 4] = maxProb;
+  output[_count * 7 + 5] = maxIndex;
+  output[_count * 7 + 6] = objectness;
 }

-cudaError_t cudaYoloLayer_nc(const void* input, void* num_detections, void* detection_boxes, void* detection_scores,
-    void* detection_classes, const uint& batchSize, uint64_t& inputSize, uint64_t& outputSize, const float& scoreThreshold,
-    const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
-    const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);
+cudaError_t cudaYoloLayer_nc(const void* input, void* output, void* count, const uint& batchSize, uint64_t& inputSize,
+    uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY,
+    const uint& numOutputClasses, const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask,
+    cudaStream_t stream);

-cudaError_t cudaYoloLayer_nc(const void* input, void* num_detections, void* detection_boxes, void* detection_scores,
-    void* detection_classes, const uint& batchSize, uint64_t& inputSize, uint64_t& outputSize, const float& scoreThreshold,
-    const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
-    const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream)
+cudaError_t cudaYoloLayer_nc(const void* input, void* output, void* count, const uint& batchSize, uint64_t& inputSize,
+    uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY,
+    const uint& numOutputClasses, const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask,
+    cudaStream_t stream)
 {
   dim3 threads_per_block(16, 16, 4);
   dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1, (gridSizeY / threads_per_block.y) + 1,
@@ -75,12 +72,11 @@ cudaError_t cudaYoloLayer_nc(const void* input, void* num_detections, void* dete
   for (unsigned int batch = 0; batch < batchSize; ++batch) {
     gpuYoloLayer_nc<<<number_of_blocks, threads_per_block, 0, stream>>>(
-        reinterpret_cast<const float*>(input) + (batch * inputSize), reinterpret_cast<int*>(num_detections) + (batch),
-        reinterpret_cast<float*>(detection_boxes) + (batch * 4 * outputSize),
-        reinterpret_cast<float*>(detection_scores) + (batch * outputSize),
-        reinterpret_cast<int*>(detection_classes) + (batch * outputSize), scoreThreshold, netWidth, netHeight, gridSizeX,
-        gridSizeY, numOutputClasses, numBBoxes, scaleXY, reinterpret_cast<const float*>(anchors),
-        reinterpret_cast<const int*>(mask));
+        reinterpret_cast<const float*> (input) + (batch * inputSize),
+        reinterpret_cast<float*> (output) + (batch * 7 * outputSize),
+        reinterpret_cast<int*> (count) + (batch),
+        netWidth, netHeight, gridSizeX, gridSizeY, numOutputClasses, numBBoxes, scaleXY,
+        reinterpret_cast<const float*> (anchors), reinterpret_cast<const int*> (mask));
  }
   return cudaGetLastError();
 }
@@ -1,90 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#include <stdint.h>
inline __device__ float sigmoidGPU(const float& x) { return 1.0f / (1.0f + __expf(-x)); }
__global__ void gpuYoloLayer_r(const float* input, int* num_detections, float* detection_boxes, float* detection_scores,
int* detection_classes, const float scoreThreshold, const uint netWidth, const uint netHeight, const uint gridSizeX,
const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes, const float scaleXY, const float* anchors,
const int* mask)
{
uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
uint y_id = blockIdx.y * blockDim.y + threadIdx.y;
uint z_id = blockIdx.z * blockDim.z + threadIdx.z;
if (x_id >= gridSizeX || y_id >= gridSizeY || z_id >= numBBoxes)
return;
const int numGridCells = gridSizeX * gridSizeY;
const int bbindex = y_id * gridSizeX + x_id;
const float objectness = sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 4)]);
if (objectness < scoreThreshold)
return;
int count = (int)atomicAdd(num_detections, 1);
const float alpha = scaleXY;
const float beta = -0.5 * (scaleXY - 1);
float x = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]) * alpha + beta + x_id)
* netWidth / gridSizeX;
float y = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]) * alpha + beta + y_id)
* netHeight / gridSizeY;
float w = __powf(sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)]) * 2, 2)
* anchors[mask[z_id] * 2];
float h = __powf(sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)]) * 2, 2)
* anchors[mask[z_id] * 2 + 1];
float maxProb = 0.0f;
int maxIndex = -1;
for (uint i = 0; i < numOutputClasses; ++i) {
float prob = sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + (5 + i))]);
if (prob > maxProb) {
maxProb = prob;
maxIndex = i;
}
}
detection_boxes[count * 4 + 0] = x - 0.5 * w;
detection_boxes[count * 4 + 1] = y - 0.5 * h;
detection_boxes[count * 4 + 2] = x + 0.5 * w;
detection_boxes[count * 4 + 3] = y + 0.5 * h;
detection_scores[count] = objectness * maxProb;
detection_classes[count] = maxIndex;
}
cudaError_t cudaYoloLayer_r(const void* input, void* num_detections, void* detection_boxes, void* detection_scores,
void* detection_classes, const uint& batchSize, uint64_t& inputSize, uint64_t& outputSize, const float& scoreThreshold,
const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);
cudaError_t cudaYoloLayer_r(const void* input, void* num_detections, void* detection_boxes, void* detection_scores,
void* detection_classes, const uint& batchSize, uint64_t& inputSize, uint64_t& outputSize, const float& scoreThreshold,
const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream)
{
dim3 threads_per_block(16, 16, 4);
dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1, (gridSizeY / threads_per_block.y) + 1,
(numBBoxes / threads_per_block.z) + 1);
for (unsigned int batch = 0; batch < batchSize; ++batch) {
gpuYoloLayer_r<<<number_of_blocks, threads_per_block, 0, stream>>>(
reinterpret_cast<const float*>(input) + (batch * inputSize), reinterpret_cast<int*>(num_detections) + (batch),
reinterpret_cast<float*>(detection_boxes) + (batch * 4 * outputSize),
reinterpret_cast<float*>(detection_scores) + (batch * outputSize),
reinterpret_cast<int*>(detection_classes) + (batch * outputSize), scoreThreshold, netWidth, netHeight, gridSizeX,
gridSizeY, numOutputClasses, numBBoxes, scaleXY, reinterpret_cast<const float*>(anchors),
reinterpret_cast<const int*>(mask));
}
return cudaGetLastError();
}
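For readers cross-checking the kernel above, the per-cell box decode in `gpuYoloLayer_r` can be mirrored on the host. This is a hedged, pure-Python sketch of the same arithmetic (the function name `decode_box` and its scalar interface are illustrative, not part of the plugin):

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, anchor_w, anchor_h,
               grid_w, grid_h, net_w, net_h, scale_xy=2.0):
    """Mirror of the gpuYoloLayer_r box decode (scale_x_y variant).

    tx..th are the raw network outputs for one anchor at grid cell
    (cx, cy); anchor_w/anchor_h come from anchors[mask[z_id] * 2].
    """
    alpha = scale_xy
    beta = -0.5 * (scale_xy - 1)
    # center, scaled back to network input resolution
    x = (sigmoid(tx) * alpha + beta + cx) * net_w / grid_w
    y = (sigmoid(ty) * alpha + beta + cy) * net_h / grid_h
    # (sigmoid * 2)^2 * anchor, as in the __powf(..., 2) terms
    w = (sigmoid(tw) * 2) ** 2 * anchor_w
    h = (sigmoid(th) * 2) ** 2 * anchor_h
    # x1, y1, x2, y2 exactly as the kernel writes detection_boxes
    return x - 0.5 * w, y - 0.5 * h, x + 0.5 * w, y + 0.5 * h
```

For raw logits of 0 the sigmoid terms become 0.5, so the center lands half a stride past the cell origin and the box takes exactly the anchor size, which is a quick sanity check on the kernel math.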



@@ -27,9 +27,9 @@ __device__ void softmaxGPU(const float* input, const int bbindex, const int numG
     }
   }
 
-__global__ void gpuRegionLayer(const float* input, float* softmax, int* num_detections, float* detection_boxes,
-    float* detection_scores, int* detection_classes, const float scoreThreshold, const uint netWidth, const uint netHeight,
-    const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes, const float* anchors)
+__global__ void gpuRegionLayer(const float* input, float* softmax, float* output, int* count, const uint netWidth,
+    const uint netHeight, const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes,
+    const float* anchors)
 {
   uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
   uint y_id = blockIdx.y * blockDim.y + threadIdx.y;
@@ -43,14 +43,9 @@ __global__ void gpuRegionLayer(const float* input, float* softmax, int* num_dete
   const float objectness = sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 4)]);
 
-  if (objectness < scoreThreshold)
-    return;
-
-  int count = (int)atomicAdd(num_detections, 1);
-
-  float x = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]) + x_id) * netWidth / gridSizeX;
-  float y = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]) + y_id) * netHeight / gridSizeY;
+  float xc = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]) + x_id) * netWidth / gridSizeX;
+  float yc = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]) + y_id) * netHeight / gridSizeY;
 
   float w = __expf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)]) * anchors[z_id * 2] * netWidth /
       gridSizeX;
@@ -71,23 +66,24 @@ __global__ void gpuRegionLayer(const float* input, float* softmax, int* num_dete
     }
   }
 
-  detection_boxes[count * 4 + 0] = x - 0.5 * w;
-  detection_boxes[count * 4 + 1] = y - 0.5 * h;
-  detection_boxes[count * 4 + 2] = x + 0.5 * w;
-  detection_boxes[count * 4 + 3] = y + 0.5 * h;
-  detection_scores[count] = objectness * maxProb;
-  detection_classes[count] = maxIndex;
+  int _count = (int)atomicAdd(count, 1);
+
+  output[_count * 7 + 0] = xc;
+  output[_count * 7 + 1] = yc;
+  output[_count * 7 + 2] = w;
+  output[_count * 7 + 3] = h;
+  output[_count * 7 + 4] = maxProb;
+  output[_count * 7 + 5] = maxIndex;
+  output[_count * 7 + 6] = objectness;
 }
 
-cudaError_t cudaRegionLayer(const void* input, void* softmax, void* num_detections, void* detection_boxes,
-    void* detection_scores, void* detection_classes, const uint& batchSize, uint64_t& inputSize, uint64_t& outputSize,
-    const float& scoreThreshold, const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY,
-    const uint& numOutputClasses, const uint& numBBoxes, const void* anchors, cudaStream_t stream);
+cudaError_t cudaRegionLayer(const void* input, void* softmax, void* output, void* count, const uint& batchSize,
+    uint64_t& inputSize, uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX,
+    const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes, const void* anchors, cudaStream_t stream);
 
-cudaError_t cudaRegionLayer(const void* input, void* softmax, void* num_detections, void* detection_boxes,
-    void* detection_scores, void* detection_classes, const uint& batchSize, uint64_t& inputSize, uint64_t& outputSize,
-    const float& scoreThreshold, const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY,
-    const uint& numOutputClasses, const uint& numBBoxes, const void* anchors, cudaStream_t stream)
+cudaError_t cudaRegionLayer(const void* input, void* softmax, void* output, void* count, const uint& batchSize,
+    uint64_t& inputSize, uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX,
+    const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes, const void* anchors, cudaStream_t stream)
 {
   dim3 threads_per_block(16, 16, 4);
   dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1, (gridSizeY / threads_per_block.y) + 1,
@@ -95,12 +91,12 @@ cudaError_t cudaRegionLayer(const void* input, void* softmax, void* num_detectio
   for (unsigned int batch = 0; batch < batchSize; ++batch) {
     gpuRegionLayer<<<number_of_blocks, threads_per_block, 0, stream>>>(
-        reinterpret_cast<const float*>(input) + (batch * inputSize), reinterpret_cast<float*>(softmax) + (batch * inputSize),
-        reinterpret_cast<int*>(num_detections) + (batch),
-        reinterpret_cast<float*>(detection_boxes) + (batch * 4 * outputSize),
-        reinterpret_cast<float*>(detection_scores) + (batch * outputSize),
-        reinterpret_cast<int*>(detection_classes) + (batch * outputSize), scoreThreshold, netWidth, netHeight, gridSizeX,
-        gridSizeY, numOutputClasses, numBBoxes, reinterpret_cast<const float*>(anchors));
+        reinterpret_cast<const float*> (input) + (batch * inputSize),
+        reinterpret_cast<float*> (softmax) + (batch * inputSize),
+        reinterpret_cast<float*> (output) + (batch * 7 * outputSize),
+        reinterpret_cast<int*> (count) + (batch),
+        netWidth, netHeight, gridSizeX, gridSizeY, numOutputClasses, numBBoxes,
+        reinterpret_cast<const float*> (anchors));
   }
 
   return cudaGetLastError();
 }


@@ -1,62 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#include <stdint.h>
__global__ void gpuYoloLayer_v8(const float* input, int* num_detections, float* detection_boxes, float* detection_scores,
int* detection_classes, const float scoreThreshold, const uint netWidth, const uint netHeight,
const uint numOutputClasses, const uint64_t outputSize)
{
uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
if (x_id >= outputSize)
return;
float maxProb = 0.0f;
int maxIndex = -1;
for (uint i = 0; i < numOutputClasses; ++i) {
float prob = input[x_id * (4 + numOutputClasses) + 4 + i];
if (prob > maxProb) {
maxProb = prob;
maxIndex = i;
}
}
if (maxProb < scoreThreshold)
return;
int count = (int)atomicAdd(num_detections, 1);
detection_boxes[count * 4 + 0] = input[x_id * (4 + numOutputClasses) + 0];
detection_boxes[count * 4 + 1] = input[x_id * (4 + numOutputClasses) + 1];
detection_boxes[count * 4 + 2] = input[x_id * (4 + numOutputClasses) + 2];
detection_boxes[count * 4 + 3] = input[x_id * (4 + numOutputClasses) + 3];
detection_scores[count] = maxProb;
detection_classes[count] = maxIndex;
}
cudaError_t cudaYoloLayer_v8(const void* input, void* num_detections, void* detection_boxes, void* detection_scores,
void* detection_classes, const uint& batchSize, uint64_t& outputSize, const float& scoreThreshold, const uint& netWidth,
const uint& netHeight, const uint& numOutputClasses, cudaStream_t stream);
cudaError_t cudaYoloLayer_v8(const void* input, void* num_detections, void* detection_boxes, void* detection_scores,
void* detection_classes, const uint& batchSize, uint64_t& outputSize, const float& scoreThreshold, const uint& netWidth,
const uint& netHeight, const uint& numOutputClasses, cudaStream_t stream)
{
int threads_per_block = 16;
int number_of_blocks = (outputSize / threads_per_block) + 1;
for (unsigned int batch = 0; batch < batchSize; ++batch) {
gpuYoloLayer_v8<<<number_of_blocks, threads_per_block, 0, stream>>>(
reinterpret_cast<const float*>(input) + (batch * (4 + numOutputClasses) * outputSize),
reinterpret_cast<int*>(num_detections) + (batch),
reinterpret_cast<float*>(detection_boxes) + (batch * 4 * outputSize),
reinterpret_cast<float*>(detection_scores) + (batch * outputSize),
reinterpret_cast<int*>(detection_classes) + (batch * outputSize),
scoreThreshold, netWidth, netHeight, numOutputClasses, outputSize);
}
return cudaGetLastError();
}
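The anchor-free `gpuYoloLayer_v8` kernel above does no box math at all: it only keeps, per prediction row, the four box values from the ONNX export plus the best class score and its index, skipping rows below the threshold. A hedged pure-Python mirror of that selection logic (the function name and list-based interface are illustrative):

```python
def filter_v8(predictions, score_threshold):
    """Mirror of gpuYoloLayer_v8's selection: each row is
    [4 box values (layout as produced by the ONNX export),
     class_0_score, ..., class_N_score].
    Keep the best class per row; drop rows under the threshold."""
    kept = []
    for row in predictions:
        box, scores = row[:4], row[4:]
        max_prob = max(scores)
        if max_prob < score_threshold:
            continue  # the kernel returns here without atomicAdd
        kept.append((box, max_prob, scores.index(max_prob)))
    return kept
```

On the GPU the surviving rows are compacted with `atomicAdd(num_detections, 1)`, so their order in the output buffer is nondeterministic; the host-side list here just preserves input order.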


@@ -1,73 +0,0 @@
/*
* Created by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
#include <stdint.h>
__global__ void gpuYoloLayer_x(const float* input, int* num_detections, float* detection_boxes, float* detection_scores,
int* detection_classes, const float scoreThreshold, const uint netWidth, const uint netHeight,
const uint numOutputClasses, const uint64_t outputSize, const float* anchors, const int* mask)
{
uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
if (x_id >= outputSize)
return;
const float objectness = input[x_id * (5 + numOutputClasses) + 4];
if (objectness < scoreThreshold)
return;
int count = (int)atomicAdd(num_detections, 1);
float x = (input[x_id * (5 + numOutputClasses) + 0] + anchors[x_id * 2]) * mask[x_id];
float y = (input[x_id * (5 + numOutputClasses) + 1] + anchors[x_id * 2 + 1]) * mask[x_id];
float w = __expf(input[x_id * (5 + numOutputClasses) + 2]) * mask[x_id];
float h = __expf(input[x_id * (5 + numOutputClasses) + 3]) * mask[x_id];
float maxProb = 0.0f;
int maxIndex = -1;
for (uint i = 0; i < numOutputClasses; ++i) {
float prob = input[x_id * (5 + numOutputClasses) + 5 + i];
if (prob > maxProb) {
maxProb = prob;
maxIndex = i;
}
}
detection_boxes[count * 4 + 0] = x - 0.5 * w;
detection_boxes[count * 4 + 1] = y - 0.5 * h;
detection_boxes[count * 4 + 2] = x + 0.5 * w;
detection_boxes[count * 4 + 3] = y + 0.5 * h;
detection_scores[count] = objectness * maxProb;
detection_classes[count] = maxIndex;
}
cudaError_t cudaYoloLayer_x(const void* input, void* num_detections, void* detection_boxes, void* detection_scores,
void* detection_classes, const uint& batchSize, uint64_t& outputSize, const float& scoreThreshold, const uint& netWidth,
const uint& netHeight, const uint& numOutputClasses, const void* anchors, const void* mask, cudaStream_t stream);
cudaError_t cudaYoloLayer_x(const void* input, void* num_detections, void* detection_boxes, void* detection_scores,
void* detection_classes, const uint& batchSize, uint64_t& outputSize, const float& scoreThreshold, const uint& netWidth,
const uint& netHeight, const uint& numOutputClasses, const void* anchors, const void* mask, cudaStream_t stream)
{
int threads_per_block = 16;
int number_of_blocks = (outputSize / threads_per_block) + 1;
for (unsigned int batch = 0; batch < batchSize; ++batch) {
gpuYoloLayer_x<<<number_of_blocks, threads_per_block, 0, stream>>>(
reinterpret_cast<const float*>(input) + (batch * (5 + numOutputClasses) * outputSize),
reinterpret_cast<int*>(num_detections) + (batch),
reinterpret_cast<float*>(detection_boxes) + (batch * 4 * outputSize),
reinterpret_cast<float*>(detection_scores) + (batch * outputSize),
reinterpret_cast<int*>(detection_classes) + (batch * outputSize),
scoreThreshold, netWidth, netHeight, numOutputClasses, outputSize, reinterpret_cast<const float*>(anchors),
reinterpret_cast<const int*>(mask));
}
return cudaGetLastError();
}


@@ -38,38 +38,19 @@ namespace {
     }
   }
 
-cudaError_t cudaYoloLayer_x(const void* input, void* num_detections, void* detection_boxes, void* detection_scores,
-    void* detection_classes, const uint& batchSize, uint64_t& outputSize, const float& scoreThreshold, const uint& netWidth,
-    const uint& netHeight, const uint& numOutputClasses, const void* anchors, const void* mask, cudaStream_t stream);
-
-cudaError_t cudaYoloLayer_v8(const void* input, void* num_detections, void* detection_boxes, void* detection_scores,
-    void* detection_classes, const uint& batchSize, uint64_t& outputSize, const float& scoreThreshold, const uint& netWidth,
-    const uint& netHeight, const uint& numOutputClasses, cudaStream_t stream);
-
-cudaError_t cudaYoloLayer_e(const void* cls, const void* reg, void* num_detections, void* detection_boxes,
-    void* detection_scores, void* detection_classes, const uint& batchSize, uint64_t& outputSize,
-    const float& scoreThreshold, const uint& netWidth, const uint& netHeight, const uint& numOutputClasses,
-    cudaStream_t stream);
-
-cudaError_t cudaYoloLayer_r(const void* input, void* num_detections, void* detection_boxes, void* detection_scores,
-    void* detection_classes, const uint& batchSize, uint64_t& inputSize, uint64_t& outputSize, const float& scoreThreshold,
-    const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
-    const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);
-
-cudaError_t cudaYoloLayer_nc(const void* input, void* num_detections, void* detection_boxes, void* detection_scores,
-    void* detection_classes, const uint& batchSize, uint64_t& inputSize, uint64_t& outputSize, const float& scoreThreshold,
-    const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
-    const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);
-
-cudaError_t cudaYoloLayer(const void* input, void* num_detections, void* detection_boxes, void* detection_scores,
-    void* detection_classes, const uint& batchSize, uint64_t& inputSize, uint64_t& outputSize, const float& scoreThreshold,
-    const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
-    const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);
-
-cudaError_t cudaRegionLayer(const void* input, void* softmax, void* num_detections, void* detection_boxes,
-    void* detection_scores, void* detection_classes, const uint& batchSize, uint64_t& inputSize, uint64_t& outputSize,
-    const float& scoreThreshold, const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY,
-    const uint& numOutputClasses, const uint& numBBoxes, const void* anchors, cudaStream_t stream);
+cudaError_t cudaYoloLayer_nc(const void* input, void* output, void* count, const uint& batchSize, uint64_t& inputSize,
+    uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY,
+    const uint& numOutputClasses, const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask,
+    cudaStream_t stream);
+
+cudaError_t cudaYoloLayer(const void* input, void* output, void* count, const uint& batchSize, uint64_t& inputSize,
+    uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY,
+    const uint& numOutputClasses, const uint& numBBoxes, const float& scaleXY, const void* anchors, const void* mask,
+    cudaStream_t stream);
+
+cudaError_t cudaRegionLayer(const void* input, void* softmax, void* output, void* count, const uint& batchSize,
+    uint64_t& inputSize, uint64_t& outputSize, const uint& netWidth, const uint& netHeight, const uint& gridSizeX,
+    const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes, const void* anchors, cudaStream_t stream);
 
 YoloLayer::YoloLayer(const void* data, size_t length) {
   const char* d = static_cast<const char*>(data);
@@ -79,10 +60,7 @@ YoloLayer::YoloLayer(const void* data, size_t length) {
   read(d, m_NumClasses);
   read(d, m_NewCoords);
   read(d, m_OutputSize);
-  read(d, m_Type);
-  read(d, m_ScoreThreshold);
 
-  if (m_Type != 3 && m_Type != 4) {
   uint yoloTensorsSize;
   read(d, yoloTensorsSize);
   for (uint i = 0; i < yoloTensorsSize; ++i) {
@@ -110,14 +88,12 @@ YoloLayer::YoloLayer(const void* data, size_t length) {
     m_YoloTensors.push_back(curYoloTensor);
   }
-  }
 };
 
 YoloLayer::YoloLayer(const uint& netWidth, const uint& netHeight, const uint& numClasses, const uint& newCoords,
-    const std::vector<TensorInfo>& yoloTensors, const uint64_t& outputSize, const uint& modelType,
-    const float& scoreThreshold) : m_NetWidth(netWidth), m_NetHeight(netHeight), m_NumClasses(numClasses),
-    m_NewCoords(newCoords), m_YoloTensors(yoloTensors), m_OutputSize(outputSize), m_Type(modelType),
-    m_ScoreThreshold(scoreThreshold)
+    const std::vector<TensorInfo>& yoloTensors, const uint64_t& outputSize) : m_NetWidth(netWidth),
+    m_NetHeight(netHeight), m_NumClasses(numClasses), m_NewCoords(newCoords), m_YoloTensors(yoloTensors),
+    m_OutputSize(outputSize)
 {
   assert(m_NetWidth > 0);
   assert(m_NetHeight > 0);
@@ -126,12 +102,8 @@ YoloLayer::YoloLayer(const uint& netWidth, const uint& netHeight, const uint& nu
 nvinfer1::Dims
 YoloLayer::getOutputDimensions(int index, const nvinfer1::Dims* inputs, int nbInputDims) noexcept
 {
-  assert(index <= 4);
-  if (index == 0)
-    return nvinfer1::Dims{1, {1}};
-  else if (index == 1)
-    return nvinfer1::Dims{2, {static_cast<int>(m_OutputSize), 4}};
-  return nvinfer1::Dims{1, {static_cast<int>(m_OutputSize)}};
+  assert(index == 0);
+  return nvinfer1::Dims{2, {static_cast<int>(m_OutputSize), 7}};
 }
 
 bool
@@ -152,53 +124,12 @@ int32_t
 YoloLayer::enqueue(int batchSize, void const* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream)
     noexcept
 {
-  void* num_detections = outputs[0];
-  void* detection_boxes = outputs[1];
-  void* detection_scores = outputs[2];
-  void* detection_classes = outputs[3];
-
-  CUDA_CHECK(cudaMemsetAsync((int*)num_detections, 0, sizeof(int) * batchSize, stream));
-  CUDA_CHECK(cudaMemsetAsync((float*)detection_boxes, 0, sizeof(float) * m_OutputSize * 4 * batchSize, stream));
-  CUDA_CHECK(cudaMemsetAsync((float*)detection_scores, 0, sizeof(float) * m_OutputSize * batchSize, stream));
-  CUDA_CHECK(cudaMemsetAsync((int*)detection_classes, 0, sizeof(int) * m_OutputSize * batchSize, stream));
-
-  if (m_Type == 5) {
-    TensorInfo& curYoloTensor = m_YoloTensors.at(0);
-    std::vector<float> anchors = curYoloTensor.anchors;
-    std::vector<int> mask = curYoloTensor.mask;
-    void* v_anchors;
-    void* v_mask;
-    if (anchors.size() > 0) {
-      float* f_anchors = anchors.data();
-      CUDA_CHECK(cudaMalloc(&v_anchors, sizeof(float) * anchors.size()));
-      CUDA_CHECK(cudaMemcpyAsync(v_anchors, f_anchors, sizeof(float) * anchors.size(), cudaMemcpyHostToDevice, stream));
-    }
-    if (mask.size() > 0) {
-      int* f_mask = mask.data();
-      CUDA_CHECK(cudaMalloc(&v_mask, sizeof(int) * mask.size()));
-      CUDA_CHECK(cudaMemcpyAsync(v_mask, f_mask, sizeof(int) * mask.size(), cudaMemcpyHostToDevice, stream));
-    }
-    CUDA_CHECK(cudaYoloLayer_x(inputs[0], num_detections, detection_boxes, detection_scores, detection_classes, batchSize,
-        m_OutputSize, m_ScoreThreshold, m_NetWidth, m_NetHeight, m_NumClasses, v_anchors, v_mask, stream));
-    if (anchors.size() > 0) {
-      CUDA_CHECK(cudaFree(v_anchors));
-    }
-    if (mask.size() > 0) {
-      CUDA_CHECK(cudaFree(v_mask));
-    }
-  }
-  else if (m_Type == 4) {
-    CUDA_CHECK(cudaYoloLayer_v8(inputs[0], num_detections, detection_boxes, detection_scores, detection_classes, batchSize,
-        m_OutputSize, m_ScoreThreshold, m_NetWidth, m_NetHeight, m_NumClasses, stream));
-  }
-  else if (m_Type == 3) {
-    CUDA_CHECK(cudaYoloLayer_e(inputs[0], inputs[1], num_detections, detection_boxes, detection_scores, detection_classes,
-        batchSize, m_OutputSize, m_ScoreThreshold, m_NetWidth, m_NetHeight, m_NumClasses, stream));
-  }
-  else {
+  void* output = outputs[0];
+  CUDA_CHECK(cudaMemsetAsync((float*) output, 0, sizeof(float) * m_OutputSize * 7 * batchSize, stream));
+
+  void* count = workspace;
+  CUDA_CHECK(cudaMemsetAsync((int*) count, 0, sizeof(int) * batchSize, stream));
+
   uint yoloTensorsSize = m_YoloTensors.size();
   for (uint i = 0; i < yoloTensorsSize; ++i) {
     TensorInfo& curYoloTensor = m_YoloTensors.at(i);
@@ -213,33 +144,24 @@ YoloLayer::enqueue(int batchSize, void const* const* inputs, void* const* output
     void* v_anchors;
     void* v_mask;
     if (anchors.size() > 0) {
-      float* f_anchors = anchors.data();
       CUDA_CHECK(cudaMalloc(&v_anchors, sizeof(float) * anchors.size()));
-      CUDA_CHECK(cudaMemcpyAsync(v_anchors, f_anchors, sizeof(float) * anchors.size(), cudaMemcpyHostToDevice, stream));
+      CUDA_CHECK(cudaMemcpyAsync(v_anchors, anchors.data(), sizeof(float) * anchors.size(), cudaMemcpyHostToDevice, stream));
     }
     if (mask.size() > 0) {
-      int* f_mask = mask.data();
       CUDA_CHECK(cudaMalloc(&v_mask, sizeof(int) * mask.size()));
-      CUDA_CHECK(cudaMemcpyAsync(v_mask, f_mask, sizeof(int) * mask.size(), cudaMemcpyHostToDevice, stream));
+      CUDA_CHECK(cudaMemcpyAsync(v_mask, mask.data(), sizeof(int) * mask.size(), cudaMemcpyHostToDevice, stream));
     }
 
     uint64_t inputSize = gridSizeX * gridSizeY * (numBBoxes * (4 + 1 + m_NumClasses));
 
-    if (m_Type == 2) { // YOLOR incorrect param: scale_x_y = 2.0
-      CUDA_CHECK(cudaYoloLayer_r(inputs[i], num_detections, detection_boxes, detection_scores, detection_classes,
-          batchSize, inputSize, m_OutputSize, m_ScoreThreshold, m_NetWidth, m_NetHeight, gridSizeX, gridSizeY,
-          m_NumClasses, numBBoxes, 2.0, v_anchors, v_mask, stream));
-    }
-    else if (m_Type == 1) {
+    if (mask.size() > 0) {
       if (m_NewCoords) {
-        CUDA_CHECK(cudaYoloLayer_nc(inputs[i], num_detections, detection_boxes, detection_scores, detection_classes,
-            batchSize, inputSize, m_OutputSize, m_ScoreThreshold, m_NetWidth, m_NetHeight, gridSizeX, gridSizeY,
-            m_NumClasses, numBBoxes, scaleXY, v_anchors, v_mask, stream));
+        CUDA_CHECK(cudaYoloLayer_nc(inputs[i], output, count, batchSize, inputSize, m_OutputSize, m_NetWidth, m_NetHeight,
+            gridSizeX, gridSizeY, m_NumClasses, numBBoxes, scaleXY, v_anchors, v_mask, stream));
       }
       else {
-        CUDA_CHECK(cudaYoloLayer(inputs[i], num_detections, detection_boxes, detection_scores, detection_classes,
-            batchSize, inputSize, m_OutputSize, m_ScoreThreshold, m_NetWidth, m_NetHeight, gridSizeX, gridSizeY,
-            m_NumClasses, numBBoxes, scaleXY, v_anchors, v_mask, stream));
+        CUDA_CHECK(cudaYoloLayer(inputs[i], output, count, batchSize, inputSize, m_OutputSize, m_NetWidth, m_NetHeight,
+            gridSizeX, gridSizeY, m_NumClasses, numBBoxes, scaleXY, v_anchors, v_mask, stream));
       }
     }
     else {
@@ -247,9 +169,8 @@ YoloLayer::enqueue(int batchSize, void const* const* inputs, void* const* output
       CUDA_CHECK(cudaMalloc(&softmax, sizeof(float) * inputSize * batchSize));
       CUDA_CHECK(cudaMemsetAsync((float*)softmax, 0, sizeof(float) * inputSize * batchSize, stream));
-      CUDA_CHECK(cudaRegionLayer(inputs[i], softmax, num_detections, detection_boxes, detection_scores, detection_classes,
-          batchSize, inputSize, m_OutputSize, m_ScoreThreshold, m_NetWidth, m_NetHeight, gridSizeX, gridSizeY,
-          m_NumClasses, numBBoxes, v_anchors, stream));
+      CUDA_CHECK(cudaRegionLayer(inputs[i], softmax, output, count, batchSize, inputSize, m_OutputSize, m_NetWidth,
+          m_NetHeight, gridSizeX, gridSizeY, m_NumClasses, numBBoxes, v_anchors, stream));
       CUDA_CHECK(cudaFree(softmax));
     }
@@ -261,7 +182,6 @@ YoloLayer::enqueue(int batchSize, void const* const* inputs, void* const* output
       CUDA_CHECK(cudaFree(v_mask));
     }
   }
-  }
 
   return 0;
 }
@@ -276,10 +196,7 @@ YoloLayer::getSerializationSize() const noexcept
   totalSize += sizeof(m_NumClasses);
   totalSize += sizeof(m_NewCoords);
   totalSize += sizeof(m_OutputSize);
-  totalSize += sizeof(m_Type);
-  totalSize += sizeof(m_ScoreThreshold);
 
-  if (m_Type != 3 && m_Type != 4) {
   uint yoloTensorsSize = m_YoloTensors.size();
   totalSize += sizeof(yoloTensorsSize);
@@ -292,7 +209,6 @@ YoloLayer::getSerializationSize() const noexcept
     totalSize += sizeof(uint) + sizeof(curYoloTensor.anchors[0]) * curYoloTensor.anchors.size();
     totalSize += sizeof(uint) + sizeof(curYoloTensor.mask[0]) * curYoloTensor.mask.size();
   }
-  }
 
   return totalSize;
 }
@@ -307,10 +223,7 @@ YoloLayer::serialize(void* buffer) const noexcept
   write(d, m_NumClasses);
   write(d, m_NewCoords);
   write(d, m_OutputSize);
-  write(d, m_Type);
-  write(d, m_ScoreThreshold);
 
-  if (m_Type != 3 && m_Type != 4) {
   uint yoloTensorsSize = m_YoloTensors.size();
   write(d, yoloTensorsSize);
   for (uint i = 0; i < yoloTensorsSize; ++i) {
@@ -331,13 +244,11 @@ YoloLayer::serialize(void* buffer) const noexcept
       write(d, curYoloTensor.mask[j]);
     }
   }
-  }
 
 nvinfer1::IPluginV2*
 YoloLayer::clone() const noexcept
 {
-  return new YoloLayer(m_NetWidth, m_NetHeight, m_NumClasses, m_NewCoords, m_YoloTensors, m_OutputSize, m_Type,
-      m_ScoreThreshold);
+  return new YoloLayer(m_NetWidth, m_NetHeight, m_NumClasses, m_NewCoords, m_YoloTensors, m_OutputSize);
 }
 
 REGISTER_TENSORRT_PLUGIN(YoloLayerPluginCreator);


@@ -48,14 +48,13 @@ class YoloLayer : public nvinfer1::IPluginV2 {
     YoloLayer(const void* data, size_t length);
 
     YoloLayer(const uint& netWidth, const uint& netHeight, const uint& numClasses, const uint& newCoords,
-        const std::vector<TensorInfo>& yoloTensors, const uint64_t& outputSize, const uint& modelType,
-        const float& scoreThreshold);
+        const std::vector<TensorInfo>& yoloTensors, const uint64_t& outputSize);
 
     const char* getPluginType() const noexcept override { return YOLOLAYER_PLUGIN_NAME; }
 
     const char* getPluginVersion() const noexcept override { return YOLOLAYER_PLUGIN_VERSION; }
 
-    int getNbOutputs() const noexcept override { return 4; }
+    int getNbOutputs() const noexcept override { return 1; }
 
     nvinfer1::Dims getOutputDimensions(int index, const nvinfer1::Dims* inputs, int nbInputDims) noexcept override;
@@ -68,7 +67,9 @@ class YoloLayer : public nvinfer1::IPluginV2 {
     void terminate() noexcept override {}
 
-    size_t getWorkspaceSize(int maxBatchSize) const noexcept override { return 0; }
+    size_t getWorkspaceSize(int maxBatchSize) const noexcept override {
+      return maxBatchSize * sizeof(int);
+    }
 
     int32_t enqueue(int batchSize, void const* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream)
         noexcept override;
@@ -93,8 +94,6 @@ class YoloLayer : public nvinfer1::IPluginV2 {
     uint m_NewCoords {0};
     std::vector<TensorInfo> m_YoloTensors;
     uint64_t m_OutputSize {0};
-    uint m_Type {0};
-    float m_ScoreThreshold {0};
 };
 
 class YoloLayerPluginCreator : public nvinfer1::IPluginCreator {

utils/export_ppyoloe.py (new file)

@@ -0,0 +1,79 @@
import os
import sys
import onnx
import paddle
import paddle.nn as nn
from ppdet.core.workspace import load_config, merge_config
from ppdet.utils.check import check_version, check_config
from ppdet.utils.cli import ArgsParser
from ppdet.engine import Trainer
from ppdet.slim import build_slim_model
class DeepStreamOutput(nn.Layer):
def __init__(self):
super().__init__()
def forward(self, x):
boxes = x['bbox']
x['bbox_num'] = x['bbox_num'].transpose([0, 2, 1])
scores = paddle.max(x['bbox_num'], 2, keepdim=True)
classes = paddle.cast(paddle.argmax(x['bbox_num'], 2, keepdim=True), dtype='float32')
return paddle.concat((boxes, scores, classes), axis=2)
def ppyoloe_export(FLAGS):
cfg = load_config(FLAGS.config)
FLAGS.opt['weights'] = FLAGS.weights
FLAGS.opt['exclude_nms'] = True
merge_config(FLAGS.opt)
if FLAGS.slim_config:
cfg = build_slim_model(cfg, FLAGS.slim_config, mode='test')
merge_config(FLAGS.opt)
check_config(cfg)
check_version()
trainer = Trainer(cfg, mode='test')
trainer.load_weights(cfg.weights)
trainer.model.eval()
if not os.path.exists('.tmp'):
os.makedirs('.tmp')
static_model, _ = trainer._get_infer_cfg_and_input_spec('.tmp')
os.system('rm -r .tmp')
return cfg, static_model
def main(FLAGS):
paddle.set_device('cpu')
cfg, model = ppyoloe_export(FLAGS)
model = nn.Sequential(model, DeepStreamOutput())
img_size = [cfg.eval_height, cfg.eval_width]
onnx_input_im = {}
onnx_input_im['image'] = paddle.static.InputSpec(shape=[None, 3, *img_size], dtype='float32', name='image')
onnx_input_im['scale_factor'] = paddle.static.InputSpec(shape=[None, 2], dtype='float32', name='scale_factor')
onnx_output_file = cfg.filename + '.onnx'
paddle.onnx.export(model, cfg.filename, input_spec=[onnx_input_im], opset_version=FLAGS.opset)
if FLAGS.simplify:
import onnxsim
model_onnx = onnx.load(onnx_output_file)
model_onnx, _ = onnxsim.simplify(model_onnx)
onnx.save(model_onnx, onnx_output_file)
def parse_args():
parser = ArgsParser()
parser.add_argument('-w', '--weights', required=True, help='Input weights (.pdparams) file path (required)')
parser.add_argument('--slim_config', default=None, type=str, help='Slim configuration file of slim method')
parser.add_argument('--opset', type=int, default=11, help='ONNX opset version')
parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
args = parser.parse_args()
return args
if __name__ == '__main__':
FLAGS = parse_args()
sys.exit(main(FLAGS))
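The `DeepStreamOutput` layer above reduces PaddleDetection's NMS-free output (`bbox` boxes plus per-class scores) to one `[x1, y1, x2, y2, score, class]` row per prediction. A minimal pure-Python sketch of that reduction, with a hypothetical helper name `deepstream_rows` and plain lists standing in for the paddle tensors (batch dimension omitted):

```python
def deepstream_rows(boxes, class_scores):
    """boxes: N x 4 list, class_scores: N x num_classes list -> N x 6 rows.

    Mirrors the paddle.max / paddle.argmax reduction in DeepStreamOutput:
    for each box, keep only the best class score and its index.
    """
    rows = []
    for box, scores in zip(boxes, class_scores):
        best = max(range(len(scores)), key=lambda c: scores[c])
        rows.append(box + [scores[best], float(best)])
    return rows
```

For one box with class scores `[0.1, 0.9]`, the row keeps score `0.9` and class id `1.0`, which is the layout the DeepStream GPU post-processing parser consumes.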

utils/export_yoloV5.py Normal file

@@ -0,0 +1,82 @@
import os
import sys
import argparse
import warnings
import onnx
import torch
import torch.nn as nn
from models.experimental import attempt_load
from utils.torch_utils import select_device
from models.yolo import Detect
class DeepStreamOutput(nn.Module):
def __init__(self):
super().__init__()
def forward(self, x):
x = x[0]
boxes = x[:, :, :4]
objectness = x[:, :, 4:5]
scores, classes = torch.max(x[:, :, 5:], 2, keepdim=True)
return torch.cat((boxes, scores, classes, objectness), dim=2)
def suppress_warnings():
warnings.filterwarnings('ignore', category=torch.jit.TracerWarning)
warnings.filterwarnings('ignore', category=UserWarning)
warnings.filterwarnings('ignore', category=DeprecationWarning)
def yolov5_export(weights, device):
model = attempt_load(weights, device=device, inplace=True, fuse=True)
model.eval()
for k, m in model.named_modules():
if isinstance(m, Detect):
m.inplace = False
m.dynamic = False
m.export = True
return model
def main(args):
suppress_warnings()
device = select_device('cpu')
model = yolov5_export(args.weights, device)
model = nn.Sequential(model, DeepStreamOutput())
img_size = args.size * 2 if len(args.size) == 1 else args.size
if img_size == [640, 640] and args.p6:
img_size = [1280] * 2
onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
onnx_output_file = os.path.basename(args.weights).split('.pt')[0] + '.onnx'
torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
do_constant_folding=True, input_names=['input'], output_names=['output'], dynamic_axes=None)
if args.simplify:
import onnxsim
model_onnx = onnx.load(onnx_output_file)
model_onnx, _ = onnxsim.simplify(model_onnx)
onnx.save(model_onnx, onnx_output_file)
def parse_args():
parser = argparse.ArgumentParser(description='DeepStream YOLOv5 conversion')
parser.add_argument('-w', '--weights', required=True, help='Input weights (.pt) file path (required)')
parser.add_argument('-s', '--size', nargs='+', type=int, default=[640], help='Inference size [H,W] (default [640])')
parser.add_argument('--p6', action='store_true', help='P6 model')
parser.add_argument('--opset', type=int, default=17, help='ONNX opset version')
parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
args = parser.parse_args()
if not os.path.isfile(args.weights):
raise SystemExit('Invalid weights file')
return args
if __name__ == '__main__':
args = parse_args()
sys.exit(main(args))
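YOLOv5 (and the YOLOv6/YOLOv7/YOLOR/YOLOX exporters below, which share the same head layout) predicts rows of `[cx, cy, w, h, objectness, cls_0, ..., cls_N]`; `DeepStreamOutput.forward` splits that row and keeps only the best class. A sketch of the per-row logic with a hypothetical helper name `split_prediction`:

```python
def split_prediction(row):
    """row: [cx, cy, w, h, obj, cls_0, ..., cls_N]
    -> [cx, cy, w, h, best_score, best_class, obj],
    matching the torch.max reduction in DeepStreamOutput.
    """
    boxes = row[:4]
    objectness = row[4]
    class_scores = row[5:]
    best = max(range(len(class_scores)), key=lambda c: class_scores[c])
    return boxes + [class_scores[best], float(best), objectness]
```

Note the objectness term is carried through as a separate trailing column rather than multiplied into the score; combining the two is left to the downstream parser.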

utils/export_yoloV6.py Normal file

@@ -0,0 +1,88 @@
import os
import sys
import argparse
import warnings
import onnx
import torch
import torch.nn as nn
from yolov6.utils.checkpoint import load_checkpoint
from yolov6.layers.common import RepVGGBlock, ConvModule, SiLU
from yolov6.models.effidehead import Detect
class DeepStreamOutput(nn.Module):
def __init__(self):
super().__init__()
def forward(self, x):
boxes = x[:, :, :4]
objectness = x[:, :, 4:5]
scores, classes = torch.max(x[:, :, 5:], 2, keepdim=True)
return torch.cat((boxes, scores, classes, objectness), dim=2)
def suppress_warnings():
warnings.filterwarnings('ignore', category=torch.jit.TracerWarning)
warnings.filterwarnings('ignore', category=UserWarning)
warnings.filterwarnings('ignore', category=DeprecationWarning)
def yolov6_export(weights, device):
model = load_checkpoint(weights, map_location=device, inplace=True, fuse=True)
for layer in model.modules():
if isinstance(layer, RepVGGBlock):
layer.switch_to_deploy()
elif isinstance(layer, nn.Upsample) and not hasattr(layer, 'recompute_scale_factor'):
layer.recompute_scale_factor = None
model.eval()
for k, m in model.named_modules():
if isinstance(m, ConvModule):
if hasattr(m, 'act') and isinstance(m.act, nn.SiLU):
m.act = SiLU()
elif isinstance(m, Detect):
m.inplace = False
return model
def main(args):
suppress_warnings()
device = torch.device('cpu')
model = yolov6_export(args.weights, device)
model = nn.Sequential(model, DeepStreamOutput())
img_size = args.size * 2 if len(args.size) == 1 else args.size
if img_size == [640, 640] and args.p6:
img_size = [1280] * 2
onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
onnx_output_file = os.path.basename(args.weights).split('.pt')[0] + '.onnx'
torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
do_constant_folding=True, input_names=['input'], output_names=['output'], dynamic_axes=None)
if args.simplify:
import onnxsim
model_onnx = onnx.load(onnx_output_file)
model_onnx, _ = onnxsim.simplify(model_onnx)
onnx.save(model_onnx, onnx_output_file)
def parse_args():
parser = argparse.ArgumentParser(description='DeepStream YOLOv6 conversion')
parser.add_argument('-w', '--weights', required=True, help='Input weights (.pt) file path (required)')
parser.add_argument('-s', '--size', nargs='+', type=int, default=[640], help='Inference size [H,W] (default [640])')
parser.add_argument('--p6', action='store_true', help='P6 model')
parser.add_argument('--opset', type=int, default=13, help='ONNX opset version')
parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
args = parser.parse_args()
if not os.path.isfile(args.weights):
raise SystemExit('Invalid weights file')
return args
if __name__ == '__main__':
args = parse_args()
sys.exit(main(args))

utils/export_yoloV7.py Normal file

@@ -0,0 +1,86 @@
import os
import sys
import argparse
import warnings
import onnx
import torch
import torch.nn as nn
import models
from models.experimental import attempt_load
from utils.torch_utils import select_device
from utils.activations import Hardswish, SiLU
class DeepStreamOutput(nn.Module):
def __init__(self):
super().__init__()
def forward(self, x):
boxes = x[:, :, :4]
objectness = x[:, :, 4:5]
scores, classes = torch.max(x[:, :, 5:], 2, keepdim=True)
return torch.cat((boxes, scores, classes, objectness), dim=2)
def suppress_warnings():
warnings.filterwarnings('ignore', category=torch.jit.TracerWarning)
warnings.filterwarnings('ignore', category=UserWarning)
warnings.filterwarnings('ignore', category=DeprecationWarning)
def yolov7_export(weights, device):
model = attempt_load(weights, map_location=device)
for k, m in model.named_modules():
m._non_persistent_buffers_set = set()
if isinstance(m, models.common.Conv):
if isinstance(m.act, nn.Hardswish):
m.act = Hardswish()
elif isinstance(m.act, nn.SiLU):
m.act = SiLU()
model.model[-1].export = False
model.model[-1].concat = True
model.eval()
return model
def main(args):
suppress_warnings()
device = select_device('cpu')
model = yolov7_export(args.weights, device)
model = nn.Sequential(model, DeepStreamOutput())
img_size = args.size * 2 if len(args.size) == 1 else args.size
if img_size == [640, 640] and args.p6:
img_size = [1280] * 2
onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
onnx_output_file = os.path.basename(args.weights).split('.pt')[0] + '.onnx'
torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
do_constant_folding=True, input_names=['input'], output_names=['output'], dynamic_axes=None)
if args.simplify:
import onnxsim
model_onnx = onnx.load(onnx_output_file)
model_onnx, _ = onnxsim.simplify(model_onnx)
onnx.save(model_onnx, onnx_output_file)
def parse_args():
parser = argparse.ArgumentParser(description='DeepStream YOLOv7 conversion')
parser.add_argument('-w', '--weights', required=True, help='Input weights (.pt) file path (required)')
parser.add_argument('-s', '--size', nargs='+', type=int, default=[640], help='Inference size [H,W] (default [640])')
parser.add_argument('--p6', action='store_true', help='P6 model')
parser.add_argument('--opset', type=int, default=12, help='ONNX opset version')
parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
args = parser.parse_args()
if not os.path.isfile(args.weights):
raise SystemExit('Invalid weights file')
return args
if __name__ == '__main__':
args = parse_args()
sys.exit(main(args))

utils/export_yoloV8.py Normal file

@@ -0,0 +1,85 @@
import os
import sys
import argparse
import warnings
import onnx
import torch
import torch.nn as nn
from copy import deepcopy
from ultralytics import YOLO
from ultralytics.yolo.utils.torch_utils import select_device
from ultralytics.nn.modules import C2f, Detect
class DeepStreamOutput(nn.Module):
def __init__(self):
super().__init__()
def forward(self, x):
x = x.transpose(1, 2)
boxes = x[:, :, :4]
scores, classes = torch.max(x[:, :, 4:], 2, keepdim=True)
return torch.cat((boxes, scores, classes), dim=2)
def suppress_warnings():
warnings.filterwarnings('ignore', category=torch.jit.TracerWarning)
warnings.filterwarnings('ignore', category=UserWarning)
warnings.filterwarnings('ignore', category=DeprecationWarning)
def yolov8_export(weights, device):
model = YOLO(weights)
model = deepcopy(model.model).to(device)
for p in model.parameters():
p.requires_grad = False
model.eval()
model.float()
model = model.fuse()
for k, m in model.named_modules():
if isinstance(m, Detect):
m.dynamic = False
m.export = True
m.format = 'onnx'
elif isinstance(m, C2f):
m.forward = m.forward_split
return model
def main(args):
suppress_warnings()
device = select_device('cpu')
model = yolov8_export(args.weights, device)
model = nn.Sequential(model, DeepStreamOutput())
img_size = args.size * 2 if len(args.size) == 1 else args.size
onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
onnx_output_file = os.path.basename(args.weights).split('.pt')[0] + '.onnx'
torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
do_constant_folding=True, input_names=['input'], output_names=['output'], dynamic_axes=None)
if args.simplify:
import onnxsim
model_onnx = onnx.load(onnx_output_file)
model_onnx, _ = onnxsim.simplify(model_onnx)
onnx.save(model_onnx, onnx_output_file)
def parse_args():
parser = argparse.ArgumentParser(description='DeepStream YOLOv8 conversion')
parser.add_argument('-w', '--weights', required=True, help='Input weights (.pt) file path (required)')
parser.add_argument('-s', '--size', nargs='+', type=int, default=[640], help='Inference size [H,W] (default [640])')
parser.add_argument('--opset', type=int, default=16, help='ONNX opset version')
parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
args = parser.parse_args()
if not os.path.isfile(args.weights):
raise SystemExit('Invalid weights file')
return args
if __name__ == '__main__':
args = parse_args()
sys.exit(main(args))
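Unlike the YOLOv5-family heads, YOLOv8 emits a channel-major `(4 + num_classes) x num_anchors` tensor with no objectness term, so this `DeepStreamOutput` first does `transpose(1, 2)` and then reduces the class channels. A pure-Python sketch (hypothetical helper name `yolov8_rows`, nested lists in place of the torch tensor, batch dimension omitted):

```python
def yolov8_rows(raw):
    """raw: (4 + num_classes) x num_anchors channel-major list of lists.

    zip(*raw) plays the role of transpose(1, 2); each anchor-major row
    is then split into box coordinates and class scores, keeping only
    the best class -- no objectness column in YOLOv8.
    """
    rows = []
    for anchor in zip(*raw):
        boxes, scores = list(anchor[:4]), anchor[4:]
        best = max(range(len(scores)), key=lambda c: scores[c])
        rows.append(boxes + [scores[best], float(best)])
    return rows
```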

utils/export_yolonas.py Normal file

@@ -0,0 +1,75 @@
import os
import sys
import argparse
import warnings
import onnx
import torch
import torch.nn as nn
from super_gradients.training import models
class DeepStreamOutput(nn.Module):
def __init__(self):
super().__init__()
def forward(self, x):
boxes = x[0]
scores, classes = torch.max(x[1], 2, keepdim=True)
return torch.cat((boxes, scores, classes), dim=2)
def suppress_warnings():
warnings.filterwarnings('ignore', category=torch.jit.TracerWarning)
warnings.filterwarnings('ignore', category=UserWarning)
warnings.filterwarnings('ignore', category=DeprecationWarning)
def yolonas_export(model_name, weights, num_classes, size):
img_size = size * 2 if len(size) == 1 else size
model = models.get(model_name, num_classes=num_classes, checkpoint_path=weights)
model.eval()
model.prep_model_for_conversion(input_size=[1, 3, *img_size])
return model
def main(args):
suppress_warnings()
device = torch.device('cpu')
model = yolonas_export(args.model, args.weights, args.classes, args.size)
model = nn.Sequential(model, DeepStreamOutput())
img_size = args.size * 2 if len(args.size) == 1 else args.size
onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
onnx_output_file = os.path.basename(args.weights).split('.pt')[0] + '.onnx'
torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
do_constant_folding=True, input_names=['input'], output_names=['output'], dynamic_axes=None)
if args.simplify:
import onnxsim
model_onnx = onnx.load(onnx_output_file)
model_onnx, _ = onnxsim.simplify(model_onnx)
onnx.save(model_onnx, onnx_output_file)
def parse_args():
parser = argparse.ArgumentParser(description='DeepStream YOLO-NAS conversion')
parser.add_argument('-m', '--model', required=True, help='Model name (required)')
parser.add_argument('-w', '--weights', required=True, help='Input weights (.pth) file path (required)')
parser.add_argument('-n', '--classes', type=int, default=80, help='Number of trained classes (default 80)')
parser.add_argument('-s', '--size', nargs='+', type=int, default=[640], help='Inference size [H,W] (default [640])')
parser.add_argument('--opset', type=int, default=14, help='ONNX opset version')
parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
args = parser.parse_args()
if args.model == '':
raise SystemExit('Invalid model name')
if not os.path.isfile(args.weights):
raise SystemExit('Invalid weights file')
return args
if __name__ == '__main__':
args = parse_args()
sys.exit(main(args))

utils/export_yolor.py Normal file

@@ -0,0 +1,99 @@
import os
import sys
import argparse
import warnings
import onnx
import torch
import torch.nn as nn
class DeepStreamOutput(nn.Module):
def __init__(self):
super().__init__()
def forward(self, x):
x = x[0]
boxes = x[:, :, :4]
objectness = x[:, :, 4:5]
scores, classes = torch.max(x[:, :, 5:], 2, keepdim=True)
return torch.cat((boxes, scores, classes, objectness), dim=2)
def suppress_warnings():
warnings.filterwarnings('ignore', category=torch.jit.TracerWarning)
warnings.filterwarnings('ignore', category=UserWarning)
warnings.filterwarnings('ignore', category=DeprecationWarning)
def yolor_export(weights, cfg, size, device):
if os.path.isfile('models/experimental.py'):
import models
from models.experimental import attempt_load
from utils.activations import Hardswish
model = attempt_load(weights, map_location=device)
for k, m in model.named_modules():
m._non_persistent_buffers_set = set()
if isinstance(m, models.common.Conv) and isinstance(m.act, nn.Hardswish):
m.act = Hardswish()
elif isinstance(m, nn.Upsample) and not hasattr(m, 'recompute_scale_factor'):
m.recompute_scale_factor = None
model.model[-1].training = False
model.model[-1].export = False
else:
from models.models import Darknet
model_name = os.path.basename(weights).split('.pt')[0]
if cfg == '':
cfg = 'cfg/' + model_name + '.cfg'
if not os.path.isfile(cfg):
raise SystemExit('CFG file not found')
model = Darknet(cfg, img_size=size[::-1]).to(device)
model.load_state_dict(torch.load(weights, map_location=device)['model'])
model.float()
model.fuse()
model.eval()
model.module_list[-1].training = False
return model
def main(args):
suppress_warnings()
device = torch.device('cpu')
model = yolor_export(args.weights, args.cfg, args.size, device)
model = nn.Sequential(model, DeepStreamOutput())
img_size = args.size * 2 if len(args.size) == 1 else args.size
if img_size == [640, 640] and args.p6:
img_size = [1280] * 2
onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
onnx_output_file = os.path.basename(args.weights).split('.pt')[0] + '.onnx'
torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
do_constant_folding=True, input_names=['input'], output_names=['output'], dynamic_axes=None)
if args.simplify:
import onnxsim
model_onnx = onnx.load(onnx_output_file)
model_onnx, _ = onnxsim.simplify(model_onnx)
onnx.save(model_onnx, onnx_output_file)
def parse_args():
parser = argparse.ArgumentParser(description='DeepStream YOLOR conversion')
parser.add_argument('-w', '--weights', required=True, help='Input weights (.pt) file path (required)')
parser.add_argument('-c', '--cfg', default='', help='Input cfg (.cfg) file path')
parser.add_argument('-s', '--size', nargs='+', type=int, default=[640], help='Inference size [H,W] (default [640])')
parser.add_argument('--p6', action='store_true', help='P6 model')
parser.add_argument('--opset', type=int, default=12, help='ONNX opset version')
parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
args = parser.parse_args()
if not os.path.isfile(args.weights):
raise SystemExit('Invalid weights file')
return args
if __name__ == '__main__':
args = parse_args()
sys.exit(main(args))

utils/export_yolox.py Normal file

@@ -0,0 +1,81 @@
import os
import sys
import argparse
import warnings
import onnx
import torch
import torch.nn as nn
from yolox.exp import get_exp
from yolox.utils import replace_module
from yolox.models.network_blocks import SiLU
class DeepStreamOutput(nn.Module):
def __init__(self):
super().__init__()
def forward(self, x):
boxes = x[:, :, :4]
objectness = x[:, :, 4:5]
scores, classes = torch.max(x[:, :, 5:], 2, keepdim=True)
return torch.cat((boxes, scores, classes, objectness), dim=2)
def suppress_warnings():
warnings.filterwarnings('ignore', category=torch.jit.TracerWarning)
warnings.filterwarnings('ignore', category=UserWarning)
warnings.filterwarnings('ignore', category=DeprecationWarning)
def yolox_export(weights, exp_file):
exp = get_exp(exp_file)
model = exp.get_model()
ckpt = torch.load(weights, map_location='cpu')
model.eval()
if 'model' in ckpt:
ckpt = ckpt['model']
model.load_state_dict(ckpt)
model = replace_module(model, nn.SiLU, SiLU)
model.head.decode_in_inference = True
return model, exp
def main(args):
suppress_warnings()
device = torch.device('cpu')
model, exp = yolox_export(args.weights, args.exp)
model = nn.Sequential(model, DeepStreamOutput())
img_size = [exp.input_size[1], exp.input_size[0]]
onnx_input_im = torch.zeros(1, 3, *img_size).to(device)
onnx_output_file = os.path.basename(args.weights).split('.pt')[0] + '.onnx'
torch.onnx.export(model, onnx_input_im, onnx_output_file, verbose=False, opset_version=args.opset,
do_constant_folding=True, input_names=['input'], output_names=['output'], dynamic_axes=None)
if args.simplify:
import onnxsim
model_onnx = onnx.load(onnx_output_file)
model_onnx, _ = onnxsim.simplify(model_onnx)
onnx.save(model_onnx, onnx_output_file)
def parse_args():
parser = argparse.ArgumentParser(description='DeepStream YOLOX conversion')
parser.add_argument('-w', '--weights', required=True, help='Input weights (.pth) file path (required)')
parser.add_argument('-c', '--exp', required=True, help='Input exp (.py) file path (required)')
parser.add_argument('--opset', type=int, default=11, help='ONNX opset version')
parser.add_argument('--simplify', action='store_true', help='ONNX simplify model')
args = parser.parse_args()
if not os.path.isfile(args.weights):
raise SystemExit('Invalid weights file')
if not os.path.isfile(args.exp):
raise SystemExit('Invalid exp file')
return args
if __name__ == '__main__':
args = parse_args()
sys.exit(main(args))


@@ -1,432 +0,0 @@
import os
import struct
import paddle
import numpy as np
from ppdet.core.workspace import load_config, merge_config
from ppdet.utils.check import check_version, check_config
from ppdet.utils.cli import ArgsParser
from ppdet.engine import Trainer
from ppdet.slim import build_slim_model
class Layers(object):
def __init__(self, size, fw, fc, letter_box):
self.blocks = [0 for _ in range(300)]
self.current = -1
self.backbone_outs = []
self.neck_fpn_feats = []
self.neck_pan_feats = []
self.yolo_head_cls = []
self.yolo_head_reg = []
self.width = size[0] if len(size) == 1 else size[1]
self.height = size[0]
self.letter_box = letter_box
self.fw = fw
self.fc = fc
self.wc = 0
self.net()
def ConvBNLayer(self, child):
self.current += 1
self.convolutional(child, act='swish')
def CSPResStage(self, child, ret):
self.current += 1
if child.conv_down is not None:
self.convolutional(child.conv_down, act='swish')
self.convolutional(child.conv1, act='swish')
self.route('-2')
self.convolutional(child.conv2, act='swish')
idx = -3
for m in child.blocks:
self.convolutional(m.conv1, act='swish')
self.convolutional(m.conv2, act='swish')
self.shortcut(-3)
idx -= 3
self.route('%d, -1' % idx)
if child.attn is not None:
self.reduce((1, 2), mode='mean', keepdim=True)
self.convolutional(child.attn.fc, act='hardsigmoid')
self.shortcut(-3, ew='mul')
self.convolutional(child.conv3, act='swish')
if ret is True:
self.backbone_outs.append(self.current)
def CSPStage(self, child, stage):
self.current += 1
self.convolutional(child.conv1, act='swish')
self.route('-2')
self.convolutional(child.conv2, act='swish')
idx = -3
for m in child.convs:
if m.__class__.__name__ == 'BasicBlock':
self.convolutional(m.conv1, act='swish')
self.convolutional(m.conv2, act='swish')
idx -= 2
elif m.__class__.__name__ == 'SPP':
self.maxpool(m.pool0)
self.route('-2')
self.maxpool(m.pool1)
self.route('-4')
self.maxpool(m.pool2)
self.route('-6, -5, -3, -1')
self.convolutional(m.conv, act='swish')
idx -= 7
self.route('%d, -1' % idx)
self.convolutional(child.conv3, act='swish')
if stage == 'fpn':
self.neck_fpn_feats.append(self.current)
elif stage == 'pan':
self.neck_pan_feats.append(self.current)
def Concat(self, route):
self.current += 1
r = self.get_route(route)
self.route('-1, %d' % r)
def Upsample(self):
self.current += 1
self.upsample()
def AvgPool2d(self, route=None):
self.current += 1
if route is not None:
r = self.get_route(route)
self.route('%d' % r)
self.avgpool()
def ESEAttn(self, child, route=0):
self.current += 1
if route < 0:
self.route('%d' % route)
self.convolutional(child.fc, act='sigmoid')
self.shortcut(route - 3, ew='mul')
self.convolutional(child.conv, act='swish')
if route == 0:
self.shortcut(-5)
def Conv2D(self, child, act='linear'):
self.current += 1
self.convolutional(child, act=act)
def Shuffle(self, reshape=None, transpose1=None, transpose2=None, output=''):
self.current += 1
self.shuffle(reshape=reshape, transpose1=transpose1, transpose2=transpose2)
if output == 'cls':
self.yolo_head_cls.append(self.current)
elif output == 'reg':
self.yolo_head_reg.append(self.current)
def SoftMax(self, axes):
self.current += 1
self.softmax(axes)
def Detect(self, output):
self.current += 1
routes = self.yolo_head_cls if output == 'cls' else self.yolo_head_reg
for i, route in enumerate(routes):
routes[i] = self.get_route(route)
self.route(str(routes)[1:-1], axis=-1)
self.yolo(output)
def net(self):
lb = 'letter_box=1\n' if self.letter_box else ''
self.fc.write('[net]\n' +
'width=%d\n' % self.width +
'height=%d\n' % self.height +
'channels=3\n' +
lb)
def convolutional(self, cv, act='linear'):
self.blocks[self.current] += 1
self.get_state_dict(cv.state_dict())
if cv.__class__.__name__ == 'Conv2D':
filters = cv._out_channels
size = cv._kernel_size
stride = cv._stride
pad = cv._padding
groups = cv._groups
bias = cv.bias
bn = False
else:
filters = cv.conv._out_channels
size = cv.conv._kernel_size
stride = cv.conv._stride
pad = cv.conv._padding
groups = cv.conv._groups
bias = cv.conv.bias
bn = True if hasattr(cv, 'bn') else False
b = 'batch_normalize=1\n' if bn is True else ''
g = 'groups=%d\n' % groups if groups > 1 else ''
w = 'bias=1\n' if bias is not None and bn is not False else 'bias=0\n' if bias is None and bn is False else ''
self.fc.write('\n[convolutional]\n' +
b +
'filters=%d\n' % filters +
'size=%s\n' % self.get_value(size) +
'stride=%s\n' % self.get_value(stride) +
'pad=%s\n' % self.get_value(pad) +
g +
w +
'activation=%s\n' % act)
def route(self, layers, axis=0):
self.blocks[self.current] += 1
a = 'axis=%d\n' % axis if axis != 0 else ''
self.fc.write('\n[route]\n' +
'layers=%s\n' % layers +
a)
def shortcut(self, r, ew='add', act='linear'):
self.blocks[self.current] += 1
m = 'mode=mul\n' if ew == 'mul' else ''
self.fc.write('\n[shortcut]\n' +
'from=%d\n' % r +
m +
'activation=%s\n' % act)
def reduce(self, dim, mode='mean', keepdim=False):
self.blocks[self.current] += 1
self.fc.write('\n[reduce]\n' +
'mode=%s\n' % mode +
'axes=%s\n' % str(dim)[1:-1] +
'keep=%d\n' % keepdim)
def maxpool(self, m):
self.blocks[self.current] += 1
stride = m.stride
size = m.ksize
mode = m.ceil_mode
m = 'maxpool_up' if mode else 'maxpool'
self.fc.write('\n[%s]\n' % m +
'stride=%d\n' % stride +
'size=%d\n' % size)
def upsample(self):
self.blocks[self.current] += 1
stride = 2
self.fc.write('\n[upsample]\n' +
'stride=%d\n' % stride)
def avgpool(self):
self.blocks[self.current] += 1
self.fc.write('\n[avgpool]\n')
def shuffle(self, reshape=None, transpose1=None, transpose2=None):
self.blocks[self.current] += 1
r = 'reshape=%s\n' % ', '.join(str(x) for x in reshape) if reshape is not None else ''
t1 = 'transpose1=%s\n' % ', '.join(str(x) for x in transpose1) if transpose1 is not None else ''
t2 = 'transpose2=%s\n' % ', '.join(str(x) for x in transpose2) if transpose2 is not None else ''
self.fc.write('\n[shuffle]\n' +
r +
t1 +
t2)
def softmax(self, axes):
self.blocks[self.current] += 1
self.fc.write('\n[softmax]\n' +
'axes=%d\n' % axes)
def yolo(self, output):
self.blocks[self.current] += 1
self.fc.write('\n[%s]\n' % output)
def get_state_dict(self, state_dict):
for k, v in state_dict.items():
if 'alpha' not in k:
vr = v.reshape([-1]).numpy()
self.fw.write('{} {} '.format(k, len(vr)))
for vv in vr:
self.fw.write(' ')
self.fw.write(struct.pack('>f', float(vv)).hex())
self.fw.write('\n')
self.wc += 1
def get_anchors(self, anchor_points, stride_tensor):
vr = anchor_points.numpy()
self.fw.write('{} {} '.format('anchor_points', len(vr)))
for vv in vr:
self.fw.write(' ')
self.fw.write(struct.pack('>f', float(vv)).hex())
self.fw.write('\n')
self.wc += 1
vr = stride_tensor.numpy()
self.fw.write('{} {} '.format('stride_tensor', len(vr)))
for vv in vr:
self.fw.write(' ')
self.fw.write(struct.pack('>f', float(vv)).hex())
self.fw.write('\n')
self.wc += 1
def get_value(self, key):
if type(key) == int:
return key
return key[0] if key[0] == key[1] else str(key)[1:-1]
def get_route(self, n):
r = 0
for i, b in enumerate(self.blocks):
if i <= n:
r += b
else:
break
return r - 1
def export_model():
paddle.set_device('cpu')
FLAGS = parse_args()
cfg = load_config(FLAGS.config)
FLAGS.opt['weights'] = FLAGS.weights
FLAGS.opt['exclude_nms'] = True
if 'norm_type' in cfg and cfg['norm_type'] == 'sync_bn':
FLAGS.opt['norm_type'] = 'bn'
merge_config(FLAGS.opt)
if FLAGS.slim_config:
cfg = build_slim_model(cfg, FLAGS.slim_config, mode='test')
merge_config(FLAGS.opt)
check_config(cfg)
check_version()
trainer = Trainer(cfg, mode='test')
trainer.load_weights(cfg.weights)
trainer.model.eval()
if not os.path.exists('.tmp'):
os.makedirs('.tmp')
static_model, _ = trainer._get_infer_cfg_and_input_spec('.tmp')
os.system('rm -r .tmp')
return cfg, static_model
def parse_args():
parser = ArgsParser()
parser.add_argument('-w', '--weights', required=True, type=str, help='Input weights (.pdparams) file path (required)')
parser.add_argument('--slim_config', default=None, type=str, help='Slim configuration file of slim method')
args = parser.parse_args()
return args
cfg, model = export_model()
model_name = cfg.filename
inference_size = (cfg.eval_height, cfg.eval_width)
letter_box = False
for sample_transforms in cfg['EvalReader']['sample_transforms']:
if 'Resize' in sample_transforms:
letter_box = sample_transforms['Resize']['keep_ratio']
backbone = cfg[cfg.architecture]['backbone']
neck = cfg[cfg.architecture]['neck']
yolo_head = cfg[cfg.architecture]['yolo_head']
wts_file = model_name + '.wts' if 'ppyoloe' in model_name else 'ppyoloe_' + model_name + '.wts'
cfg_file = model_name + '.cfg' if 'ppyoloe' in model_name else 'ppyoloe_' + model_name + '.cfg'
with open(wts_file, 'w') as fw, open(cfg_file, 'w') as fc:
layers = Layers(inference_size, fw, fc, letter_box)
if backbone == 'CSPResNet':
layers.fc.write('\n# CSPResNet\n')
for child in model.backbone.stem:
layers.ConvBNLayer(child)
for i, child in enumerate(model.backbone.stages):
ret = True if i in model.backbone.return_idx else False
layers.CSPResStage(child, ret)
else:
raise SystemExit('Model not supported')
if neck == 'CustomCSPPAN':
layers.fc.write('\n# CustomCSPPAN\n')
blocks = layers.backbone_outs[::-1]
for i, block in enumerate(blocks):
if i > 0:
layers.Concat(block)
layers.CSPStage(model.neck.fpn_stages[i][0], 'fpn')
if i < model.neck.num_blocks - 1:
layers.ConvBNLayer(model.neck.fpn_routes[i])
layers.Upsample()
layers.neck_pan_feats = [layers.neck_fpn_feats[-1], ]
for i in reversed(range(model.neck.num_blocks - 1)):
layers.ConvBNLayer(model.neck.pan_routes[i])
layers.Concat(layers.neck_fpn_feats[i])
layers.CSPStage(model.neck.pan_stages[i][0], 'pan')
layers.neck_pan_feats = layers.neck_pan_feats[::-1]
else:
raise SystemExit('Model not supported')
if yolo_head == 'PPYOLOEHead':
layers.fc.write('\n# PPYOLOEHead\n')
reg_max = model.yolo_head.reg_max + 1 if hasattr(model.yolo_head, 'reg_max') else model.yolo_head.reg_range[1]
for i, feat in enumerate(layers.neck_pan_feats):
if i > 0:
layers.AvgPool2d(route=feat)
else:
layers.AvgPool2d()
layers.ESEAttn(model.yolo_head.stem_cls[i])
layers.Conv2D(model.yolo_head.pred_cls[i], act='sigmoid')
layers.Shuffle(reshape=[model.yolo_head.num_classes, 'hw'], output='cls')
layers.ESEAttn(model.yolo_head.stem_reg[i], route=-7)
layers.Conv2D(model.yolo_head.pred_reg[i])
layers.Shuffle(reshape=[4, reg_max, 'hw'], transpose2=[1, 0, 2])
layers.SoftMax(0)
layers.Conv2D(model.yolo_head.proj_conv)
layers.Shuffle(reshape=['h', 'w'], output='reg')
layers.Detect('cls')
layers.Detect('reg')
layers.get_anchors(model.yolo_head.anchor_points.reshape([-1]), model.yolo_head.stride_tensor)
else:
raise SystemExit('Model not supported')
os.system('echo "%d" | cat - %s > temp && mv temp %s' % (layers.wc, wts_file, wts_file))
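The `get_route` method in the removed converter above maps a high-level module index to a Darknet layer index: each module `i` emits `blocks[i]` cfg layers, so the route target for module `n` is the prefix sum of emitted blocks up to and including `n`, minus one for 0-based indexing. A standalone sketch of that lookup (hypothetical function name `darknet_route_index`):

```python
def darknet_route_index(blocks, n):
    """blocks[i] = number of Darknet layers emitted by module i.
    Returns the 0-based index of the last layer emitted by module n,
    i.e. the value the converter writes into [route] / [shortcut] lines.
    """
    return sum(blocks[:n + 1]) - 1
```

For example, if module 0 emitted 2 layers and module 1 emitted 3, routing to module 1 resolves to Darknet layer 4.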


@@ -1,394 +0,0 @@
import argparse
import os
import struct
import torch
from utils.torch_utils import select_device
class Layers(object):
def __init__(self, n, size, fw, fc):
self.blocks = [0 for _ in range(n)]
self.current = 0
self.width = size[0] if len(size) == 1 else size[1]
self.height = size[0]
self.num = 0
self.nc = 0
self.anchors = ''
self.masks = []
self.fw = fw
self.fc = fc
self.wc = 0
self.net()
def Focus(self, child):
self.current = child.i
self.fc.write('\n# Focus\n')
self.reorg()
self.convolutional(child.conv)
def Conv(self, child):
self.current = child.i
self.fc.write('\n# Conv\n')
self.convolutional(child)
def BottleneckCSP(self, child):
self.current = child.i
self.fc.write('\n# BottleneckCSP\n')
self.convolutional(child.cv2)
self.route('-2')
self.convolutional(child.cv1)
idx = -3
for m in child.m:
if m.add:
self.convolutional(m.cv1)
self.convolutional(m.cv2)
self.shortcut(-3)
idx -= 3
else:
self.convolutional(m.cv1)
self.convolutional(m.cv2)
idx -= 2
self.convolutional(child.cv3)
self.route('-1, %d' % (idx - 1))
self.batchnorm(child.bn, child.act)
self.convolutional(child.cv4)
def C3(self, child):
self.current = child.i
self.fc.write('\n# C3\n')
self.convolutional(child.cv2)
self.route('-2')
self.convolutional(child.cv1)
idx = -3
for m in child.m:
if m.add:
self.convolutional(m.cv1)
self.convolutional(m.cv2)
self.shortcut(-3)
idx -= 3
else:
self.convolutional(m.cv1)
self.convolutional(m.cv2)
idx -= 2
self.route('-1, %d' % idx)
self.convolutional(child.cv3)
def SPP(self, child):
self.current = child.i
self.fc.write('\n# SPP\n')
self.convolutional(child.cv1)
self.maxpool(child.m[0])
self.route('-2')
self.maxpool(child.m[1])
self.route('-4')
self.maxpool(child.m[2])
self.route('-6, -5, -3, -1')
self.convolutional(child.cv2)
def SPPF(self, child):
self.current = child.i
self.fc.write('\n# SPPF\n')
self.convolutional(child.cv1)
self.maxpool(child.m)
self.maxpool(child.m)
self.maxpool(child.m)
self.route('-4, -3, -2, -1')
self.convolutional(child.cv2)
def Upsample(self, child):
self.current = child.i
self.fc.write('\n# Upsample\n')
self.upsample(child)
def Concat(self, child):
self.current = child.i
self.fc.write('\n# Concat\n')
r = []
for i in range(1, len(child.f)):
r.append(self.get_route(child.f[i]))
self.route('-1, %s' % str(r)[1:-1])
def Detect(self, child):
self.current = child.i
self.fc.write('\n# Detect\n')
self.get_anchors(child.state_dict(), child.m[0].out_channels)
for i, m in enumerate(child.m):
r = self.get_route(child.f[i])
self.route('%d' % r)
self.convolutional(m, detect=True)
self.yolo(i)
def net(self):
self.fc.write('[net]\n' +
'width=%d\n' % self.width +
'height=%d\n' % self.height +
'channels=3\n' +
'letter_box=1\n')
def CBH(self, child):
self.current = child.i
self.fc.write('\n# CBH\n')
self.convolutional(child.conv, act='hardswish')
def LC_Block(self, child):
self.current = child.i
self.fc.write('\n# LC_Block\n')
self.convolutional(child.dw_conv, act='hardswish')
if child.use_se:
self.avgpool()
self.convolutional(child.se.conv1, act='relu')
self.convolutional(child.se.conv2, act='silu')
self.shortcut(-4, ew='mul')
self.convolutional(child.pw_conv, act='hardswish')
def Dense(self, child):
self.current = child.i
self.fc.write('\n# Dense\n')
self.convolutional(child.dense_conv, act='hardswish')
def reorg(self):
self.blocks[self.current] += 1
self.fc.write('\n[reorg]\n')
def convolutional(self, cv, act=None, detect=False):
self.blocks[self.current] += 1
self.get_state_dict(cv.state_dict())
if cv._get_name() == 'Conv2d':
filters = cv.out_channels
size = cv.kernel_size
stride = cv.stride
pad = cv.padding
groups = cv.groups
bias = cv.bias
bn = False
act = 'linear' if not detect else 'logistic'
else:
filters = cv.conv.out_channels
size = cv.conv.kernel_size
stride = cv.conv.stride
pad = cv.conv.padding
groups = cv.conv.groups
bias = cv.conv.bias
bn = True if hasattr(cv, 'bn') else False
if act is None:
act = self.get_activation(cv.act._get_name()) if hasattr(cv, 'act') else 'linear'
b = 'batch_normalize=1\n' if bn is True else ''
g = 'groups=%d\n' % groups if groups > 1 else ''
w = 'bias=1\n' if bias is not None and bn is not False else 'bias=0\n' if bias is None and bn is False else ''
self.fc.write('\n[convolutional]\n' +
b +
'filters=%d\n' % filters +
'size=%s\n' % self.get_value(size) +
'stride=%s\n' % self.get_value(stride) +
'pad=%s\n' % self.get_value(pad) +
g +
w +
'activation=%s\n' % act)
def batchnorm(self, bn, act):
self.blocks[self.current] += 1
self.get_state_dict(bn.state_dict())
filters = bn.num_features
act = self.get_activation(act._get_name())
self.fc.write('\n[batchnorm]\n' +
'filters=%d\n' % filters +
'activation=%s\n' % act)
def route(self, layers):
self.blocks[self.current] += 1
self.fc.write('\n[route]\n' +
'layers=%s\n' % layers)
def shortcut(self, r, ew='add', act='linear'):
self.blocks[self.current] += 1
m = 'mode=mul\n' if ew == 'mul' else ''
self.fc.write('\n[shortcut]\n' +
'from=%d\n' % r +
m +
'activation=%s\n' % act)
def maxpool(self, m):
self.blocks[self.current] += 1
stride = m.stride
size = m.kernel_size
mode = m.ceil_mode
m = 'maxpool_up' if mode else 'maxpool'
self.fc.write('\n[%s]\n' % m +
'stride=%d\n' % stride +
'size=%d\n' % size)
def upsample(self, child):
self.blocks[self.current] += 1
stride = child.scale_factor
self.fc.write('\n[upsample]\n' +
'stride=%d\n' % stride)
def avgpool(self):
self.blocks[self.current] += 1
self.fc.write('\n[avgpool]\n')
def yolo(self, i):
self.blocks[self.current] += 1
self.fc.write('\n[yolo]\n' +
'mask=%s\n' % self.masks[i] +
'anchors=%s\n' % self.anchors +
'classes=%d\n' % self.nc +
'num=%d\n' % self.num +
'scale_x_y=2.0\n' +
'new_coords=1\n')
def get_state_dict(self, state_dict):
for k, v in state_dict.items():
if 'num_batches_tracked' not in k:
vr = v.reshape(-1).numpy()
self.fw.write('{} {} '.format(k, len(vr)))
for vv in vr:
self.fw.write(' ')
self.fw.write(struct.pack('>f', float(vv)).hex())
self.fw.write('\n')
self.wc += 1
def get_anchors(self, state_dict, out_channels):
anchor_grid = state_dict['anchor_grid']
aa = anchor_grid.reshape(-1).tolist()
am = anchor_grid.tolist()
self.num = (len(aa) / 2)
self.nc = int((out_channels / (self.num / len(am))) - 5)
self.anchors = str(aa)[1:-1]
n = 0
for m in am:
mask = []
for _ in range(len(m)):
mask.append(n)
n += 1
self.masks.append(str(mask)[1:-1])
def get_value(self, key):
if type(key) == int:
return key
return key[0] if key[0] == key[1] else str(key)[1:-1]
def get_route(self, n):
r = 0
if n < 0:
for i, b in enumerate(self.blocks[self.current-1::-1]):
if i < abs(n) - 1:
r -= b
else:
break
else:
for i, b in enumerate(self.blocks):
if i <= n:
r += b
else:
break
return r - 1
def get_activation(self, act):
if act == 'Hardswish':
return 'hardswish'
elif act == 'LeakyReLU':
return 'leaky'
elif act == 'SiLU':
return 'silu'
return 'linear'
def parse_args():
parser = argparse.ArgumentParser(description='PyTorch YOLOv5 conversion')
parser.add_argument('-w', '--weights', required=True, help='Input weights (.pt) file path (required)')
parser.add_argument(
'-s', '--size', nargs='+', type=int, help='Inference size [H,W] (default [640])')
parser.add_argument("--p6", action="store_true", help="P6 model")
args = parser.parse_args()
if not os.path.isfile(args.weights):
raise SystemExit('Invalid weights file')
if not args.size:
args.size = [1280] if args.p6 else [640]
return args.weights, args.size
pt_file, inference_size = parse_args()
model_name = os.path.basename(pt_file).split('.pt')[0]
wts_file = model_name + '.wts' if 'yolov5' in model_name else 'yolov5_' + model_name + '.wts'
cfg_file = model_name + '.cfg' if 'yolov5' in model_name else 'yolov5_' + model_name + '.cfg'
device = select_device('cpu')
model = torch.load(pt_file, map_location=device)['model'].float()
anchor_grid = model.model[-1].anchors * model.model[-1].stride[..., None, None]
delattr(model.model[-1], 'anchor_grid')
model.model[-1].register_buffer('anchor_grid', anchor_grid)
model.to(device).eval()
with open(wts_file, 'w') as fw, open(cfg_file, 'w') as fc:
layers = Layers(len(model.model), inference_size, fw, fc)
for child in model.model.children():
if child._get_name() == 'Focus':
layers.Focus(child)
elif child._get_name() == 'Conv':
layers.Conv(child)
elif child._get_name() == 'BottleneckCSP':
layers.BottleneckCSP(child)
elif child._get_name() == 'C3':
layers.C3(child)
elif child._get_name() == 'SPP':
layers.SPP(child)
elif child._get_name() == 'SPPF':
layers.SPPF(child)
elif child._get_name() == 'Upsample':
layers.Upsample(child)
elif child._get_name() == 'Concat':
layers.Concat(child)
elif child._get_name() == 'Detect':
layers.Detect(child)
elif child._get_name() == 'CBH':
layers.CBH(child)
elif child._get_name() == 'LC_Block':
layers.LC_Block(child)
elif child._get_name() == 'Dense':
layers.Dense(child)
else:
raise SystemExit('Model not supported')
os.system('echo "%d" | cat - %s > temp && mv temp %s' % (layers.wc, wts_file, wts_file))
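The `get_state_dict` method in the script above serializes every tensor value as an 8-character big-endian float32 hex string via `struct.pack('>f', v).hex()`. A minimal round-trip sketch of that encoding, useful for inspecting a `.wts` file by hand:

```python
import struct

# Encode a single weight value the same way get_state_dict does:
# big-endian float32, rendered as hex text.
encoded = struct.pack('>f', 1.0).hex()

# Decoding reverses the two steps: hex text -> bytes -> float32.
decoded = struct.unpack('>f', bytes.fromhex(encoded))[0]
```

Here `1.0` encodes to `3f800000`, the IEEE 754 single-precision representation of one, and decodes back exactly.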


@@ -1,588 +0,0 @@
import argparse
import os
import struct
import torch
from yolov6.assigners.anchor_generator import generate_anchors
class Layers(object):
def __init__(self, size, fw, fc):
self.blocks = [0 for _ in range(300)]
self.current = -1
self.width = size[0] if len(size) == 1 else size[1]
self.height = size[0]
self.backbone_outs = []
self.fpn_feats = []
self.pan_feats = []
self.yolo_head_cls = []
self.yolo_head_reg = []
self.fw = fw
self.fc = fc
self.wc = 0
self.net()
def BaseConv(self, child):
self.current += 1
if child._get_name() == 'RepVGGBlock':
self.convolutional(child.rbr_reparam, act=self.get_activation(child.nonlinearity._get_name()))
elif child._get_name() == 'ConvWrapper' or child._get_name() == 'SimConvWrapper':
self.convolutional(child.block)
else:
raise SystemExit('Model not supported')
def RepBlock(self, child, stage=''):
self.current += 1
if child.conv1._get_name() == 'RepVGGBlock':
self.convolutional(child.conv1.rbr_reparam, act=self.get_activation(child.conv1.nonlinearity._get_name()))
if child.block is not None:
for m in child.block:
self.convolutional(m.rbr_reparam, act=self.get_activation(m.nonlinearity._get_name()))
elif child.conv1._get_name() == 'ConvWrapper' or child.conv1._get_name() == 'SimConvWrapper':
self.convolutional(child.conv1.block)
if child.block is not None:
for m in child.block:
self.convolutional(m.block)
else:
raise SystemExit('Model not supported')
if stage == 'backbone':
self.backbone_outs.append(self.current)
elif stage == 'pan':
self.pan_feats.append(self.current)
def BepC3(self, child, stage=''):
self.current += 1
if child.concat is True:
self.convolutional(child.cv2)
self.route('-2')
self.convolutional(child.cv1)
idx = -3
if child.m.conv1.conv1._get_name() == 'RepVGGBlock':
self.convolutional(child.m.conv1.conv1.rbr_reparam,
act=self.get_activation(child.m.conv1.conv1.nonlinearity._get_name()))
self.convolutional(child.m.conv1.conv2.rbr_reparam,
act=self.get_activation(child.m.conv1.conv2.nonlinearity._get_name()))
idx -= 2
if child.m.conv1.shortcut:
self.shortcut(-3)
idx -= 1
if child.m.block is not None:
for m in child.m.block:
self.convolutional(m.conv1.rbr_reparam, act=self.get_activation(m.conv1.nonlinearity._get_name()))
self.convolutional(m.conv2.rbr_reparam, act=self.get_activation(m.conv2.nonlinearity._get_name()))
idx -= 2
if m.shortcut:
self.shortcut(-3)
idx -= 1
elif child.m.conv1.conv1._get_name() == 'ConvWrapper' or child.m.conv1.conv1._get_name() == 'SimConvWrapper':
self.convolutional(child.m.conv1.conv1.block)
self.convolutional(child.m.conv1.conv2.block)
idx -= 2
if child.m.conv1.shortcut:
self.shortcut(-3)
idx -= 1
if child.m.block is not None:
for m in child.m.block:
self.convolutional(m.conv1.block)
self.convolutional(m.conv2.block)
idx -= 2
if m.shortcut:
self.shortcut(-3)
idx -= 1
else:
raise SystemExit('Model not supported')
if child.concat is True:
self.route('-1, %d' % idx)
self.convolutional(child.cv3)
if stage == 'backbone':
self.backbone_outs.append(self.current)
elif stage == 'pan':
self.pan_feats.append(self.current)
def CSPSPPF(self, child):
self.current += 1
self.convolutional(child.cv2)
self.route('-2')
self.convolutional(child.cv1)
self.convolutional(child.cv3)
self.convolutional(child.cv4)
self.maxpool(child.m)
self.maxpool(child.m)
self.maxpool(child.m)
self.route('-4, -3, -2, -1')
self.convolutional(child.cv5)
self.convolutional(child.cv6)
self.route('-11, -1')
self.convolutional(child.cv7)
self.backbone_outs.append(self.current)
def SPPF(self, child):
self.current += 1
self.convolutional(child.cv1)
self.maxpool(child.m)
self.maxpool(child.m)
self.maxpool(child.m)
self.route('-4, -3, -2, -1')
self.convolutional(child.cv2)
self.backbone_outs.append(self.current)
def SimConv(self, child, stage=''):
self.current += 1
self.convolutional(child)
if stage == 'fpn':
self.fpn_feats.append(self.current)
def BiFusion(self, child, idx):
self.current += 1
self.deconvolutional(child.upsample.upsample_transpose)
        r = self.get_route(self.backbone_outs[-idx - 2])
self.route('%d' % r)
self.convolutional(child.cv1)
        r = self.get_route(self.backbone_outs[-idx - 3])
self.route('%d' % r)
self.convolutional(child.cv2)
self.convolutional(child.downsample)
self.route('-6, -4, -1')
self.convolutional(child.cv3)
def Upsample(self, child):
self.current += 1
self.deconvolutional(child.upsample_transpose)
def Conv(self, child, act=None):
self.current += 1
self.convolutional(child, act=act)
def Concat(self, route):
self.current += 1
r = self.get_route(route)
self.route('-1, %d' % r)
def Route(self, route):
self.current += 1
if route > 0:
r = self.get_route(route)
self.route('%d' % r)
else:
self.route('%d' % route)
def Shuffle(self, reshape=None, transpose1=None, transpose2=None, output=''):
self.current += 1
self.shuffle(reshape=reshape, transpose1=transpose1, transpose2=transpose2)
if output == 'cls':
self.yolo_head_cls.append(self.current)
elif output == 'reg':
self.yolo_head_reg.append(self.current)
def SoftMax(self, axes):
self.current += 1
self.softmax(axes)
def Detect(self, output):
self.current += 1
routes = self.yolo_head_cls if output == 'cls' else self.yolo_head_reg
for i, route in enumerate(routes):
routes[i] = self.get_route(route)
self.route(str(routes)[1:-1], axis=-1)
self.yolo(output)
def net(self):
self.fc.write('[net]\n' +
'width=%d\n' % self.width +
'height=%d\n' % self.height +
'channels=3\n' +
'letter_box=1\n')
def convolutional(self, cv, act=None, detect=False):
self.blocks[self.current] += 1
self.get_state_dict(cv.state_dict())
if cv._get_name() == 'Conv2d':
filters = cv.out_channels
size = cv.kernel_size
stride = cv.stride
pad = cv.padding
groups = cv.groups
bias = cv.bias
bn = False
act = act if act is not None else 'linear'
else:
filters = cv.conv.out_channels
size = cv.conv.kernel_size
stride = cv.conv.stride
pad = cv.conv.padding
groups = cv.conv.groups
bias = cv.conv.bias
bn = True if hasattr(cv, 'bn') else False
if act is None:
act = self.get_activation(cv.act._get_name()) if hasattr(cv, 'act') else 'linear'
b = 'batch_normalize=1\n' if bn is True else ''
g = 'groups=%d\n' % groups if groups > 1 else ''
w = 'bias=1\n' if bias is not None and bn is not False else 'bias=0\n' if bias is None and bn is False else ''
self.fc.write('\n[convolutional]\n' +
b +
'filters=%d\n' % filters +
'size=%s\n' % self.get_value(size) +
'stride=%s\n' % self.get_value(stride) +
'pad=%s\n' % self.get_value(pad) +
g +
w +
'activation=%s\n' % act)
def deconvolutional(self, cv):
self.blocks[self.current] += 1
self.get_state_dict(cv.state_dict())
filters = cv.out_channels
size = cv.kernel_size
stride = cv.stride
pad = cv.padding
groups = cv.groups
bias = cv.bias
g = 'groups=%d\n' % groups if groups > 1 else ''
w = 'bias=0\n' if bias is None else ''
self.fc.write('\n[deconvolutional]\n' +
'filters=%d\n' % filters +
'size=%s\n' % self.get_value(size) +
'stride=%s\n' % self.get_value(stride) +
'pad=%s\n' % self.get_value(pad) +
g +
w)
def route(self, layers, axis=0):
self.blocks[self.current] += 1
a = 'axis=%d\n' % axis if axis != 0 else ''
self.fc.write('\n[route]\n' +
'layers=%s\n' % layers +
a)
def shortcut(self, r, ew='add', act='linear'):
self.blocks[self.current] += 1
m = 'mode=mul\n' if ew == 'mul' else ''
self.fc.write('\n[shortcut]\n' +
'from=%d\n' % r +
m +
'activation=%s\n' % act)
def maxpool(self, m):
self.blocks[self.current] += 1
stride = m.stride
size = m.kernel_size
mode = m.ceil_mode
m = 'maxpool_up' if mode else 'maxpool'
self.fc.write('\n[%s]\n' % m +
'stride=%d\n' % stride +
'size=%d\n' % size)
def shuffle(self, reshape=None, transpose1=None, transpose2=None):
self.blocks[self.current] += 1
r = 'reshape=%s\n' % ', '.join(str(x) for x in reshape) if reshape is not None else ''
t1 = 'transpose1=%s\n' % ', '.join(str(x) for x in transpose1) if transpose1 is not None else ''
t2 = 'transpose2=%s\n' % ', '.join(str(x) for x in transpose2) if transpose2 is not None else ''
self.fc.write('\n[shuffle]\n' +
r +
t1 +
t2)
def softmax(self, axes):
self.blocks[self.current] += 1
self.fc.write('\n[softmax]\n' +
'axes=%d\n' % axes)
def yolo(self, output):
self.blocks[self.current] += 1
self.fc.write('\n[%s]\n' % output)
def get_state_dict(self, state_dict):
for k, v in state_dict.items():
if 'num_batches_tracked' not in k:
vr = v.reshape(-1).numpy()
self.fw.write('{} {} '.format(k, len(vr)))
for vv in vr:
self.fw.write(' ')
self.fw.write(struct.pack('>f', float(vv)).hex())
self.fw.write('\n')
self.wc += 1
def get_anchors(self, anchor_points, stride_tensor):
vr = anchor_points.numpy()
self.fw.write('{} {} '.format('anchor_points', len(vr)))
for vv in vr:
self.fw.write(' ')
self.fw.write(struct.pack('>f', float(vv)).hex())
self.fw.write('\n')
self.wc += 1
vr = stride_tensor.numpy()
self.fw.write('{} {} '.format('stride_tensor', len(vr)))
for vv in vr:
self.fw.write(' ')
self.fw.write(struct.pack('>f', float(vv)).hex())
self.fw.write('\n')
self.wc += 1
def get_value(self, key):
if type(key) == int:
return key
return key[0] if key[0] == key[1] else str(key)[1:-1]
def get_route(self, n):
r = 0
for i, b in enumerate(self.blocks):
if i <= n:
r += b
else:
break
return r - 1
def get_activation(self, act):
if act == 'Hardswish':
return 'hardswish'
elif act == 'LeakyReLU':
return 'leaky'
elif act == 'SiLU':
return 'silu'
elif act == 'ReLU':
return 'relu'
return 'linear'
def parse_args():
parser = argparse.ArgumentParser(description='PyTorch YOLOv6 conversion')
parser.add_argument('-w', '--weights', required=True, help='Input weights (.pt) file path (required)')
parser.add_argument(
'-s', '--size', nargs='+', type=int, help='Inference size [H,W] (default [640])')
    parser.add_argument('--p6', action='store_true', help='P6 model')
args = parser.parse_args()
if not os.path.isfile(args.weights):
raise SystemExit('Invalid weights file')
if not args.size:
args.size = [1280] if args.p6 else [640]
return args.weights, args.size
pt_file, inference_size = parse_args()
model_name = os.path.basename(pt_file).split('.pt')[0]
wts_file = model_name + '.wts' if 'yolov6' in model_name else 'yolov6_' + model_name + '.wts'
cfg_file = model_name + '.cfg' if 'yolov6' in model_name else 'yolov6_' + model_name + '.cfg'
model = torch.load(pt_file, map_location='cpu')['model'].float()
model.to('cpu').eval()
for layer in model.modules():
if layer._get_name() == 'RepVGGBlock':
layer.switch_to_deploy()
backbones = ['EfficientRep', 'CSPBepBackbone']
necks = ['RepBiFPANNeck', 'CSPRepBiFPANNeck', 'RepPANNeck', 'CSPRepPANNeck']
backbones_p6 = ['EfficientRep6', 'CSPBepBackbone_P6']
necks_p6 = ['RepBiFPANNeck6', 'CSPRepBiFPANNeck_P6', 'RepPANNeck6', 'CSPRepPANNeck_P6']
with open(wts_file, 'w') as fw, open(cfg_file, 'w') as fc:
layers = Layers(inference_size, fw, fc)
if model.backbone._get_name() in backbones:
layers.fc.write('\n# %s\n' % model.backbone._get_name())
if model.backbone._get_name() == 'EfficientRep':
block1 = layers.RepBlock
elif model.backbone._get_name() == 'CSPBepBackbone':
block1 = layers.BepC3
if model.backbone.ERBlock_5[2]._get_name() == 'CSPSPPF' or model.backbone.ERBlock_5[2]._get_name() == 'SimCSPSPPF':
block2 = layers.CSPSPPF
elif model.backbone.ERBlock_5[2]._get_name() == 'SPPF' or model.backbone.ERBlock_5[2]._get_name() == 'SimSPPF':
block2 = layers.SPPF
else:
raise SystemExit('Model not supported')
layers.BaseConv(model.backbone.stem)
layers.BaseConv(model.backbone.ERBlock_2[0])
block1(model.backbone.ERBlock_2[1], 'backbone' if hasattr(model.backbone, 'fuse_P2') and
model.backbone.fuse_P2 else '')
layers.BaseConv(model.backbone.ERBlock_3[0])
block1(model.backbone.ERBlock_3[1], 'backbone')
layers.BaseConv(model.backbone.ERBlock_4[0])
block1(model.backbone.ERBlock_4[1], 'backbone')
layers.BaseConv(model.backbone.ERBlock_5[0])
block1(model.backbone.ERBlock_5[1])
block2(model.backbone.ERBlock_5[2])
elif model.backbone._get_name() in backbones_p6:
layers.fc.write('\n# %s\n' % model.backbone._get_name())
if model.backbone._get_name() == 'EfficientRep6':
block1 = layers.RepBlock
elif model.backbone._get_name() == 'CSPBepBackbone_P6':
block1 = layers.BepC3
if model.backbone.ERBlock_6[2]._get_name() == 'CSPSPPF' or model.backbone.ERBlock_6[2]._get_name() == 'SimCSPSPPF':
block2 = layers.CSPSPPF
elif model.backbone.ERBlock_6[2]._get_name() == 'SPPF' or model.backbone.ERBlock_6[2]._get_name() == 'SimSPPF':
block2 = layers.SPPF
else:
raise SystemExit('Model not supported')
layers.BaseConv(model.backbone.stem)
layers.BaseConv(model.backbone.ERBlock_2[0])
block1(model.backbone.ERBlock_2[1], 'backbone' if model.backbone._get_name() == 'CSPBepBackbone_P6' or
(hasattr(model.backbone, 'fuse_P2') and model.backbone.fuse_P2) else '')
layers.BaseConv(model.backbone.ERBlock_3[0])
block1(model.backbone.ERBlock_3[1], 'backbone')
layers.BaseConv(model.backbone.ERBlock_4[0])
block1(model.backbone.ERBlock_4[1], 'backbone')
layers.BaseConv(model.backbone.ERBlock_5[0])
block1(model.backbone.ERBlock_5[1], 'backbone')
layers.BaseConv(model.backbone.ERBlock_6[0])
block1(model.backbone.ERBlock_6[1])
block2(model.backbone.ERBlock_6[2])
else:
raise SystemExit('Model not supported')
if model.neck._get_name() in necks:
layers.fc.write('\n# %s\n' % model.neck._get_name())
if model.neck._get_name() == 'RepBiFPANNeck' or model.neck._get_name() == 'RepPANNeck':
block = layers.RepBlock
elif model.neck._get_name() == 'CSPRepBiFPANNeck' or model.neck._get_name() == 'CSPRepPANNeck':
block = layers.BepC3
layers.SimConv(model.neck.reduce_layer0, 'fpn')
if 'Bi' in model.neck._get_name():
layers.BiFusion(model.neck.Bifusion0, 0)
else:
layers.Upsample(model.neck.upsample0)
layers.Concat(layers.backbone_outs[-2])
block(model.neck.Rep_p4)
layers.SimConv(model.neck.reduce_layer1, 'fpn')
if 'Bi' in model.neck._get_name():
layers.BiFusion(model.neck.Bifusion1, 1)
else:
layers.Upsample(model.neck.upsample1)
layers.Concat(layers.backbone_outs[-3])
block(model.neck.Rep_p3, 'pan')
layers.SimConv(model.neck.downsample2)
layers.Concat(layers.fpn_feats[1])
block(model.neck.Rep_n3, 'pan')
layers.SimConv(model.neck.downsample1)
layers.Concat(layers.fpn_feats[0])
block(model.neck.Rep_n4, 'pan')
layers.pan_feats = layers.pan_feats[::-1]
elif model.neck._get_name() in necks_p6:
layers.fc.write('\n# %s\n' % model.neck._get_name())
if model.neck._get_name() == 'RepBiFPANNeck6' or model.neck._get_name() == 'RepPANNeck6':
block = layers.RepBlock
elif model.neck._get_name() == 'CSPRepBiFPANNeck_P6' or model.neck._get_name() == 'CSPRepPANNeck_P6':
block = layers.BepC3
layers.SimConv(model.neck.reduce_layer0, 'fpn')
if 'Bi' in model.neck._get_name():
layers.BiFusion(model.neck.Bifusion0, 0)
else:
layers.Upsample(model.neck.upsample0)
layers.Concat(layers.backbone_outs[-2])
block(model.neck.Rep_p5)
layers.SimConv(model.neck.reduce_layer1, 'fpn')
if 'Bi' in model.neck._get_name():
layers.BiFusion(model.neck.Bifusion1, 1)
else:
layers.Upsample(model.neck.upsample1)
layers.Concat(layers.backbone_outs[-3])
block(model.neck.Rep_p4)
layers.SimConv(model.neck.reduce_layer2, 'fpn')
if 'Bi' in model.neck._get_name():
layers.BiFusion(model.neck.Bifusion2, 2)
else:
layers.Upsample(model.neck.upsample2)
layers.Concat(layers.backbone_outs[-4])
block(model.neck.Rep_p3, 'pan')
layers.SimConv(model.neck.downsample2)
layers.Concat(layers.fpn_feats[2])
block(model.neck.Rep_n4, 'pan')
layers.SimConv(model.neck.downsample1)
layers.Concat(layers.fpn_feats[1])
block(model.neck.Rep_n5, 'pan')
layers.SimConv(model.neck.downsample0)
layers.Concat(layers.fpn_feats[0])
block(model.neck.Rep_n6, 'pan')
layers.pan_feats = layers.pan_feats[::-1]
else:
raise SystemExit('Model not supported')
if model.detect._get_name() == 'Detect':
layers.fc.write('\n# Detect\n')
for i, feat in enumerate(layers.pan_feats):
idx = len(layers.pan_feats) - i - 1
if i > 0:
layers.Route(feat)
layers.Conv(model.detect.stems[idx])
layers.Conv(model.detect.cls_convs[idx])
layers.Conv(model.detect.cls_preds[idx], act='sigmoid')
layers.Shuffle(reshape=[model.detect.nc, 'hw'], output='cls')
layers.Route(-4)
layers.Conv(model.detect.reg_convs[idx])
layers.Conv(model.detect.reg_preds[idx])
if model.detect.use_dfl:
layers.Shuffle(reshape=[4, model.detect.reg_max + 1, 'hw'], transpose2=[1, 0, 2])
layers.SoftMax(0)
layers.Conv(model.detect.proj_conv)
layers.Shuffle(reshape=['h', 'w'], output='reg')
else:
layers.Shuffle(reshape=[4, 'hw'], output='reg')
layers.Detect('cls')
layers.Detect('reg')
x = []
for stride in model.detect.stride.tolist()[::-1]:
x.append(torch.zeros([1, 1, int(layers.height / stride), int(layers.width / stride)], dtype=torch.float32))
anchor_points, stride_tensor = generate_anchors(x, model.detect.stride.flip((0,)), model.detect.grid_cell_size,
model.detect.grid_cell_offset, device='cpu', is_eval=True, mode='af')
layers.get_anchors(anchor_points.reshape([-1]), stride_tensor)
else:
raise SystemExit('Model not supported')
os.system('echo "%d" | cat - %s > temp && mv temp %s' % (layers.wc, wts_file, wts_file))
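`get_route` in the script above maps a module index to a flat cfg layer index by accumulating the number of cfg blocks each module emitted. A standalone sketch of that mapping (the block counts are made-up illustrative values, not from a real model):

```python
def get_route(blocks, n):
    # Sum the cfg blocks emitted by modules 0..n, then convert the
    # running count into a 0-based layer index (the trailing -1).
    r = 0
    for i, b in enumerate(blocks):
        if i <= n:
            r += b
        else:
            break
    return r - 1

# Three modules emitting 2, 3 and 1 cfg blocks respectively:
blocks = [2, 3, 1]
idx = get_route(blocks, 1)  # modules 0-1 emitted 5 blocks -> index 4
```

This is why each layer-writing method increments `self.blocks[self.current]`: the per-module counts are the only state needed to translate PyTorch module references into Darknet-style `layers=` offsets.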


@@ -1,357 +0,0 @@
import argparse
import os
import struct
import torch
from utils.torch_utils import select_device
class Layers(object):
def __init__(self, n, size, fw, fc):
self.blocks = [0 for _ in range(n)]
self.current = 0
self.width = size[0] if len(size) == 1 else size[1]
self.height = size[0]
self.num = 0
self.nc = 0
self.anchors = ''
self.masks = []
self.fw = fw
self.fc = fc
self.wc = 0
self.net()
def ReOrg(self, child):
self.current = child.i
self.fc.write('\n# ReOrg\n')
self.reorg()
def Conv(self, child):
self.current = child.i
self.fc.write('\n# Conv\n')
if child.f != -1:
r = self.get_route(child.f)
self.route('%d' % r)
self.convolutional(child)
def DownC(self, child):
self.current = child.i
self.fc.write('\n# DownC\n')
self.maxpool(child.mp)
self.convolutional(child.cv3)
self.route('-3')
self.convolutional(child.cv1)
self.convolutional(child.cv2)
self.route('-1, -4')
def MP(self, child):
self.current = child.i
self.fc.write('\n# MP\n')
self.maxpool(child.m)
def SP(self, child):
self.current = child.i
self.fc.write('\n# SP\n')
if child.f != -1:
r = self.get_route(child.f)
self.route('%d' % r)
self.maxpool(child.m)
def SPPCSPC(self, child):
self.current = child.i
self.fc.write('\n# SPPCSPC\n')
self.convolutional(child.cv2)
self.route('-2')
self.convolutional(child.cv1)
self.convolutional(child.cv3)
self.convolutional(child.cv4)
self.maxpool(child.m[0])
self.route('-2')
self.maxpool(child.m[1])
self.route('-4')
self.maxpool(child.m[2])
self.route('-6, -5, -3, -1')
self.convolutional(child.cv5)
self.convolutional(child.cv6)
self.route('-1, -13')
self.convolutional(child.cv7)
def RepConv(self, child):
self.current = child.i
self.fc.write('\n# RepConv\n')
if child.f != -1:
r = self.get_route(child.f)
self.route('%d' % r)
self.convolutional(child.rbr_1x1)
self.route('-2')
self.convolutional(child.rbr_dense)
self.shortcut(-3, act=self.get_activation(child.act._get_name()))
def Upsample(self, child):
self.current = child.i
self.fc.write('\n# Upsample\n')
self.upsample(child)
def Concat(self, child):
self.current = child.i
self.fc.write('\n# Concat\n')
r = []
for i in range(1, len(child.f)):
r.append(self.get_route(child.f[i]))
self.route('-1, %s' % str(r)[1:-1])
def Shortcut(self, child):
self.current = child.i
self.fc.write('\n# Shortcut\n')
r = self.get_route(child.f[1])
self.shortcut(r)
def Detect(self, child):
self.current = child.i
self.fc.write('\n# Detect\n')
self.get_anchors(child.state_dict(), child.m[0].out_channels)
for i, m in enumerate(child.m):
r = self.get_route(child.f[i])
self.route('%d' % r)
self.convolutional(m, detect=True)
self.yolo(i)
def net(self):
self.fc.write('[net]\n' +
'width=%d\n' % self.width +
'height=%d\n' % self.height +
'channels=3\n' +
'letter_box=1\n')
def reorg(self):
self.blocks[self.current] += 1
self.fc.write('\n[reorg]\n')
def convolutional(self, cv, act=None, detect=False):
self.blocks[self.current] += 1
self.get_state_dict(cv.state_dict())
if cv._get_name() == 'Conv2d':
filters = cv.out_channels
size = cv.kernel_size
stride = cv.stride
pad = cv.padding
groups = cv.groups
bias = cv.bias
bn = False
act = 'linear' if not detect else 'logistic'
elif cv._get_name() == 'Sequential':
filters = cv[0].out_channels
size = cv[0].kernel_size
stride = cv[0].stride
pad = cv[0].padding
groups = cv[0].groups
bias = cv[0].bias
bn = True if cv[1]._get_name() == 'BatchNorm2d' else False
act = 'linear'
else:
filters = cv.conv.out_channels
size = cv.conv.kernel_size
stride = cv.conv.stride
pad = cv.conv.padding
groups = cv.conv.groups
bias = cv.conv.bias
bn = True if hasattr(cv, 'bn') else False
if act is None:
act = self.get_activation(cv.act._get_name()) if hasattr(cv, 'act') else 'linear'
b = 'batch_normalize=1\n' if bn is True else ''
g = 'groups=%d\n' % groups if groups > 1 else ''
w = 'bias=1\n' if bias is not None and bn is not False else 'bias=0\n' if bias is None and bn is False else ''
self.fc.write('\n[convolutional]\n' +
b +
'filters=%d\n' % filters +
'size=%s\n' % self.get_value(size) +
'stride=%s\n' % self.get_value(stride) +
'pad=%s\n' % self.get_value(pad) +
g +
w +
'activation=%s\n' % act)
def route(self, layers):
self.blocks[self.current] += 1
self.fc.write('\n[route]\n' +
'layers=%s\n' % layers)
def shortcut(self, r, act='linear'):
self.blocks[self.current] += 1
self.fc.write('\n[shortcut]\n' +
'from=%d\n' % r +
'activation=%s\n' % act)
def maxpool(self, m):
self.blocks[self.current] += 1
stride = m.stride
size = m.kernel_size
mode = m.ceil_mode
m = 'maxpool_up' if mode else 'maxpool'
self.fc.write('\n[%s]\n' % m +
'stride=%d\n' % stride +
'size=%d\n' % size)
def upsample(self, child):
self.blocks[self.current] += 1
stride = child.scale_factor
self.fc.write('\n[upsample]\n' +
'stride=%d\n' % stride)
def yolo(self, i):
self.blocks[self.current] += 1
self.fc.write('\n[yolo]\n' +
'mask=%s\n' % self.masks[i] +
'anchors=%s\n' % self.anchors +
'classes=%d\n' % self.nc +
'num=%d\n' % self.num +
'scale_x_y=2.0\n' +
'new_coords=1\n')
def get_state_dict(self, state_dict):
for k, v in state_dict.items():
if 'num_batches_tracked' not in k:
vr = v.reshape(-1).numpy()
self.fw.write('{} {} '.format(k, len(vr)))
for vv in vr:
self.fw.write(' ')
self.fw.write(struct.pack('>f', float(vv)).hex())
self.fw.write('\n')
self.wc += 1
def get_anchors(self, state_dict, out_channels):
anchor_grid = state_dict['anchor_grid']
aa = anchor_grid.reshape(-1).tolist()
am = anchor_grid.tolist()
self.num = (len(aa) / 2)
self.nc = int((out_channels / (self.num / len(am))) - 5)
self.anchors = str(aa)[1:-1]
n = 0
for m in am:
mask = []
for _ in range(len(m)):
mask.append(n)
n += 1
self.masks.append(str(mask)[1:-1])
def get_value(self, key):
if type(key) == int:
return key
return key[0] if key[0] == key[1] else str(key)[1:-1]
def get_route(self, n):
r = 0
if n < 0:
for i, b in enumerate(self.blocks[self.current-1::-1]):
if i < abs(n) - 1:
r -= b
else:
break
else:
for i, b in enumerate(self.blocks):
if i <= n:
r += b
else:
break
return r - 1
def get_activation(self, act):
if act == 'Hardswish':
return 'hardswish'
elif act == 'LeakyReLU':
return 'leaky'
elif act == 'SiLU':
return 'silu'
return 'linear'
def parse_args():
parser = argparse.ArgumentParser(description='PyTorch YOLOv7 conversion')
parser.add_argument('-w', '--weights', required=True, help='Input weights (.pt) file path (required)')
parser.add_argument(
'-s', '--size', nargs='+', type=int, help='Inference size [H,W] (default [640])')
parser.add_argument("--p6", action="store_true", help="P6 model")
args = parser.parse_args()
if not os.path.isfile(args.weights):
raise SystemExit('Invalid weights file')
if not args.size:
args.size = [1280] if args.p6 else [640]
return args.weights, args.size
pt_file, inference_size = parse_args()
model_name = os.path.basename(pt_file).split('.pt')[0]
wts_file = model_name + '.wts' if 'yolov7' in model_name else 'yolov7_' + model_name + '.wts'
cfg_file = model_name + '.cfg' if 'yolov7' in model_name else 'yolov7_' + model_name + '.cfg'
device = select_device('cpu')
model = torch.load(pt_file, map_location=device)
model = model['ema' if model.get('ema') else 'model'].float()
anchor_grid = model.model[-1].anchors * model.model[-1].stride[..., None, None]
delattr(model.model[-1], 'anchor_grid')
model.model[-1].register_buffer('anchor_grid', anchor_grid)
model.to(device).eval()
with open(wts_file, 'w') as fw, open(cfg_file, 'w') as fc:
layers = Layers(len(model.model), inference_size, fw, fc)
for child in model.model.children():
if child._get_name() == 'ReOrg':
layers.ReOrg(child)
elif child._get_name() == 'Conv':
layers.Conv(child)
elif child._get_name() == 'DownC':
layers.DownC(child)
elif child._get_name() == 'MP':
layers.MP(child)
elif child._get_name() == 'SP':
layers.SP(child)
elif child._get_name() == 'SPPCSPC':
layers.SPPCSPC(child)
elif child._get_name() == 'RepConv':
layers.RepConv(child)
elif child._get_name() == 'Upsample':
layers.Upsample(child)
elif child._get_name() == 'Concat':
layers.Concat(child)
elif child._get_name() == 'Shortcut':
layers.Shortcut(child)
elif child._get_name() == 'Detect':
layers.Detect(child)
else:
raise SystemExit('Model not supported')
os.system('echo "%d" | cat - %s > temp && mv temp %s' % (layers.wc, wts_file, wts_file))
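`get_anchors` in the script above flattens the detection head's `anchor_grid` into the cfg `anchors=` string and builds one `mask=` group per scale. A standalone sketch of that logic, using the well-known YOLO anchor values purely as illustrative input (they are not read from any checkpoint here):

```python
# Two scales with three anchor pairs each (illustrative values only).
am = [[[10, 13], [16, 30], [33, 23]],
      [[30, 61], [62, 45], [59, 119]]]
aa = [v for scale in am for pair in scale for v in pair]

num = len(aa) // 2          # total number of anchor pairs across scales
anchors = str(aa)[1:-1]     # comma-separated value list for the cfg
masks, n = [], 0
for m in am:
    # Each scale masks in the next len(m) consecutive anchor indices.
    mask = list(range(n, n + len(m)))
    n += len(m)
    masks.append(str(mask)[1:-1])
```

With this input, `num` is 6 and the masks come out as `0, 1, 2` and `3, 4, 5`, matching the `num=`/`mask=` fields the `yolo` method writes into each `[yolo]` section.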


@@ -1,323 +0,0 @@
import argparse
import os
import struct
import torch
from ultralytics.yolo.utils.torch_utils import select_device
from ultralytics.yolo.utils.tal import make_anchors
class Layers(object):
def __init__(self, n, size, fw, fc):
self.blocks = [0 for _ in range(n)]
self.current = -1
self.width = size[0] if len(size) == 1 else size[1]
self.height = size[0]
self.fw = fw
self.fc = fc
self.wc = 0
self.net()
def Conv(self, child):
self.current = child.i
self.fc.write('\n# Conv\n')
self.convolutional(child)
def C2f(self, child):
self.current = child.i
self.fc.write('\n# C2f\n')
self.convolutional(child.cv1)
self.c2f(child.m)
self.convolutional(child.cv2)
def SPPF(self, child):
self.current = child.i
self.fc.write('\n# SPPF\n')
self.convolutional(child.cv1)
self.maxpool(child.m)
self.maxpool(child.m)
self.maxpool(child.m)
self.route('-4, -3, -2, -1')
self.convolutional(child.cv2)
def Upsample(self, child):
self.current = child.i
self.fc.write('\n# Upsample\n')
self.upsample(child)
def Concat(self, child):
self.current = child.i
self.fc.write('\n# Concat\n')
r = []
for i in range(1, len(child.f)):
r.append(self.get_route(child.f[i]))
self.route('-1, %s' % str(r)[1:-1])
def Detect(self, child):
self.current = child.i
self.fc.write('\n# Detect\n')
output_idxs = [0 for _ in range(child.nl)]
for i in range(child.nl):
r = self.get_route(child.f[i])
self.route('%d' % r)
for j in range(len(child.cv3[i])):
self.convolutional(child.cv3[i][j])
self.route('%d' % (-1 - len(child.cv3[i])))
for j in range(len(child.cv2[i])):
self.convolutional(child.cv2[i][j])
self.route('-1, %d' % (-2 - len(child.cv2[i])))
self.shuffle(reshape=[child.no, -1])
output_idxs[i] = (-1 + i * (-4 - len(child.cv3[i]) - len(child.cv2[i])))
self.route('%s' % str(output_idxs[::-1])[1:-1], axis=1)
self.yolo(child)
def net(self):
self.fc.write('[net]\n' +
'width=%d\n' % self.width +
'height=%d\n' % self.height +
'channels=3\n' +
'letter_box=1\n')
def convolutional(self, cv, act=None, detect=False):
self.blocks[self.current] += 1
self.get_state_dict(cv.state_dict())
if cv._get_name() == 'Conv2d':
filters = cv.out_channels
size = cv.kernel_size
stride = cv.stride
pad = cv.padding
groups = cv.groups
bias = cv.bias
bn = False
act = 'linear' if not detect else 'logistic'
else:
filters = cv.conv.out_channels
size = cv.conv.kernel_size
stride = cv.conv.stride
pad = cv.conv.padding
groups = cv.conv.groups
bias = cv.conv.bias
bn = hasattr(cv, 'bn')
if act is None:
act = self.get_activation(cv.act._get_name()) if hasattr(cv, 'act') else 'linear'
b = 'batch_normalize=1\n' if bn is True else ''
g = 'groups=%d\n' % groups if groups > 1 else ''
w = 'bias=1\n' if bias is not None and bn is not False else 'bias=0\n' if bias is None and bn is False else ''
self.fc.write('\n[convolutional]\n' +
b +
'filters=%d\n' % filters +
'size=%s\n' % self.get_value(size) +
'stride=%s\n' % self.get_value(stride) +
'pad=%s\n' % self.get_value(pad) +
g +
w +
'activation=%s\n' % act)
def c2f(self, m):
self.blocks[self.current] += 1
for x in m:
self.get_state_dict(x.state_dict())
n = len(m)
shortcut = 1 if m[0].add else 0
filters = m[0].cv1.conv.out_channels
size = m[0].cv1.conv.kernel_size
stride = m[0].cv1.conv.stride
pad = m[0].cv1.conv.padding
groups = m[0].cv1.conv.groups
bias = m[0].cv1.conv.bias
bn = hasattr(m[0].cv1, 'bn')
act = 'linear'
if hasattr(m[0].cv1, 'act'):
act = self.get_activation(m[0].cv1.act._get_name())
b = 'batch_normalize=1\n' if bn is True else ''
g = 'groups=%d\n' % groups if groups > 1 else ''
w = 'bias=1\n' if bias is not None and bn is not False else 'bias=0\n' if bias is None and bn is False else ''
self.fc.write('\n[c2f]\n' +
'n=%d\n' % n +
'shortcut=%d\n' % shortcut +
b +
'filters=%d\n' % filters +
'size=%s\n' % self.get_value(size) +
'stride=%s\n' % self.get_value(stride) +
'pad=%s\n' % self.get_value(pad) +
g +
w +
'activation=%s\n' % act)
def route(self, layers, axis=0):
self.blocks[self.current] += 1
a = 'axis=%d\n' % axis if axis != 0 else ''
self.fc.write('\n[route]\n' +
'layers=%s\n' % layers +
a)
def shortcut(self, r, ew='add', act='linear'):
self.blocks[self.current] += 1
m = 'mode=mul\n' if ew == 'mul' else ''
self.fc.write('\n[shortcut]\n' +
'from=%d\n' % r +
m +
'activation=%s\n' % act)
def maxpool(self, m):
self.blocks[self.current] += 1
stride = m.stride
size = m.kernel_size
mode = m.ceil_mode
m = 'maxpool_up' if mode else 'maxpool'
self.fc.write('\n[%s]\n' % m +
'stride=%d\n' % stride +
'size=%d\n' % size)
def upsample(self, child):
self.blocks[self.current] += 1
stride = child.scale_factor
self.fc.write('\n[upsample]\n' +
'stride=%d\n' % stride)
def shuffle(self, reshape=None, transpose1=None, transpose2=None):
self.blocks[self.current] += 1
r = 'reshape=%s\n' % ', '.join(str(x) for x in reshape) if reshape is not None else ''
t1 = 'transpose1=%s\n' % ', '.join(str(x) for x in transpose1) if transpose1 is not None else ''
t2 = 'transpose2=%s\n' % ', '.join(str(x) for x in transpose2) if transpose2 is not None else ''
self.fc.write('\n[shuffle]\n' +
r +
t1 +
t2)
def yolo(self, child):
self.blocks[self.current] += 1
self.fc.write('\n[detect_v8]\n' +
'num=%d\n' % (child.reg_max * 4) +
'classes=%d\n' % child.nc)
def get_state_dict(self, state_dict):
for k, v in state_dict.items():
if 'num_batches_tracked' not in k:
vr = v.reshape(-1).numpy()
self.fw.write('{} {} '.format(k, len(vr)))
for vv in vr:
self.fw.write(' ')
self.fw.write(struct.pack('>f', float(vv)).hex())
self.fw.write('\n')
self.wc += 1
def get_anchors(self, anchor_points, stride_tensor):
vr = anchor_points.numpy()
self.fw.write('{} {} '.format('anchor_points', len(vr)))
for vv in vr:
self.fw.write(' ')
self.fw.write(struct.pack('>f', float(vv)).hex())
self.fw.write('\n')
self.wc += 1
vr = stride_tensor.numpy()
self.fw.write('{} {} '.format('stride_tensor', len(vr)))
for vv in vr:
self.fw.write(' ')
self.fw.write(struct.pack('>f', float(vv)).hex())
self.fw.write('\n')
self.wc += 1
def get_value(self, key):
if isinstance(key, int):
return key
return key[0] if key[0] == key[1] else str(key)[1:-1]
def get_route(self, n):
r = 0
for i, b in enumerate(self.blocks):
if i <= n:
r += b
else:
break
return r - 1
def get_activation(self, act):
if act == 'Hardswish':
return 'hardswish'
elif act == 'LeakyReLU':
return 'leaky'
elif act == 'SiLU':
return 'silu'
return 'linear'
def parse_args():
parser = argparse.ArgumentParser(description='PyTorch YOLOv8 conversion')
parser.add_argument('-w', '--weights', required=True, help='Input weights (.pt) file path (required)')
parser.add_argument(
'-s', '--size', nargs='+', type=int, default=[640], help='Inference size [H,W] (default [640])')
args = parser.parse_args()
if not os.path.isfile(args.weights):
raise SystemExit('Invalid weights file')
return args.weights, args.size
pt_file, inference_size = parse_args()
model_name = os.path.basename(pt_file).split('.pt')[0]
wts_file = model_name + '.wts' if 'yolov8' in model_name else 'yolov8_' + model_name + '.wts'
cfg_file = model_name + '.cfg' if 'yolov8' in model_name else 'yolov8_' + model_name + '.cfg'
device = select_device('cpu')
model = torch.load(pt_file, map_location=device)['model'].float()
model.to(device).eval()
if model.names and model.nc:
with open("labels.txt", 'w') as fw:
for i in range(model.nc):
fw.write(model.names[i] + '\n')
with open(wts_file, 'w') as fw, open(cfg_file, 'w') as fc:
layers = Layers(len(model.model), inference_size, fw, fc)
for child in model.model.children():
if child._get_name() == 'Conv':
layers.Conv(child)
elif child._get_name() == 'C2f':
layers.C2f(child)
elif child._get_name() == 'SPPF':
layers.SPPF(child)
elif child._get_name() == 'Upsample':
layers.Upsample(child)
elif child._get_name() == 'Concat':
layers.Concat(child)
elif child._get_name() == 'Detect':
layers.Detect(child)
x = []
for stride in model.stride.tolist():
x.append(torch.zeros([1, 1, int(layers.height / stride), int(layers.width / stride)], dtype=torch.float32))
anchor_points, stride_tensor = (x.transpose(0, 1) for x in make_anchors(x, child.stride, 0.5))
layers.get_anchors(anchor_points.reshape([-1]), stride_tensor.reshape([-1]))
else:
raise SystemExit('Model not supported')
os.system('echo "%d" | cat - %s > temp && mv temp %s' % (layers.wc, wts_file, wts_file))
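`get_state_dict()` above serializes every tensor value as the hex string of its big-endian float32 encoding. A minimal round-trip sketch makes the format explicit (`decode_value` is a hypothetical helper, not part of the converter):

```python
import struct

def encode_value(v):
    # Big-endian float32 hex, as written by get_state_dict().
    return struct.pack('>f', float(v)).hex()

def decode_value(s):
    # Inverse mapping, handy for sanity-checking a generated .wts file.
    return struct.unpack('>f', bytes.fromhex(s))[0]
```

For example, `encode_value(1.0)` yields `'3f800000'`, the IEEE 754 single-precision representation of 1.0.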


@@ -1,56 +0,0 @@
import argparse
import os
import struct
import torch
from utils.torch_utils import select_device
from models.models import Darknet
def parse_args():
parser = argparse.ArgumentParser(description='PyTorch YOLOR conversion (main branch)')
parser.add_argument('-w', '--weights', required=True, help='Input weights (.pt) file path (required)')
parser.add_argument('-c', '--cfg', default='', help='Input cfg (.cfg) file path')
args = parser.parse_args()
if not os.path.isfile(args.weights):
raise SystemExit('Invalid weights file')
if args.cfg != '' and not os.path.isfile(args.cfg):
raise SystemExit('Invalid cfg file')
return args.weights, args.cfg
pt_file, cfg_file = parse_args()
model_name = os.path.basename(pt_file).split('.pt')[0]
wts_file = model_name + '.wts' if 'yolor' in model_name else 'yolor_' + model_name + '.wts'
new_cfg_file = model_name + '.cfg' if 'yolor' in model_name else 'yolor_' + model_name + '.cfg'
if cfg_file == '':
cfg_file = 'cfg/' + model_name + '.cfg'
if not os.path.isfile(cfg_file):
raise SystemExit('CFG file not found')
elif not os.path.isfile(cfg_file):
raise SystemExit('Invalid CFG file')
device = select_device('cpu')
model = Darknet(cfg_file).to(device)
model.load_state_dict(torch.load(pt_file, map_location=device)['model'])
model.to(device).eval()
with open(wts_file, 'w') as f:
wts_write = ''
conv_count = 0
for k, v in model.state_dict().items():
if 'num_batches_tracked' not in k:
vr = v.reshape(-1).cpu().numpy()
wts_write += '{} {} '.format(k, len(vr))
for vv in vr:
wts_write += ' '
wts_write += struct.pack('>f', float(vv)).hex()
wts_write += '\n'
conv_count += 1
f.write('{}\n'.format(conv_count))
f.write(wts_write)
if not os.path.isfile(new_cfg_file):
os.system('cp %s %s' % (cfg_file, new_cfg_file))
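The YOLOR converter writes the entry count on the first line of the .wts file followed by one weight record per line. A small consistency check for the resulting file could look like this (hypothetical helper, for illustration only):

```python
def check_wts(path):
    # The first line of a .wts file declares how many weight
    # entries (one per line) follow it.
    with open(path) as f:
        lines = f.read().splitlines()
    return int(lines[0]) == len(lines) - 1
```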


@@ -1,370 +0,0 @@
import argparse
import os
import struct
import torch
from yolox.exp import get_exp
class Layers(object):
def __init__(self, size, fw, fc):
self.blocks = [0 for _ in range(300)]
self.current = -1
self.width = size[0] if len(size) == 1 else size[1]
self.height = size[0]
self.backbone_outs = []
self.fpn_feats = []
self.pan_feats = []
self.yolo_head = []
self.fw = fw
self.fc = fc
self.wc = 0
self.net()
def Conv(self, child):
self.current += 1
if child._get_name() == 'DWConv':
self.convolutional(child.dconv)
self.convolutional(child.pconv)
else:
self.convolutional(child)
def Focus(self, child):
self.current += 1
self.reorg()
self.convolutional(child.conv)
def BaseConv(self, child, stage='', act=None):
self.current += 1
self.convolutional(child, act=act)
if stage == 'fpn':
self.fpn_feats.append(self.current)
def CSPLayer(self, child, stage=''):
self.current += 1
self.convolutional(child.conv2)
self.route('-2')
self.convolutional(child.conv1)
idx = -3
for m in child.m:
if m.use_add:
self.convolutional(m.conv1)
if m.conv2._get_name() == 'DWConv':
self.convolutional(m.conv2.dconv)
self.convolutional(m.conv2.pconv)
self.shortcut(-4)
idx -= 4
else:
self.convolutional(m.conv2)
self.shortcut(-3)
idx -= 3
else:
self.convolutional(m.conv1)
if m.conv2._get_name() == 'DWConv':
self.convolutional(m.conv2.dconv)
self.convolutional(m.conv2.pconv)
idx -= 3
else:
self.convolutional(m.conv2)
idx -= 2
self.route('-1, %d' % idx)
self.convolutional(child.conv3)
if stage == 'backbone':
self.backbone_outs.append(self.current)
elif stage == 'pan':
self.pan_feats.append(self.current)
def SPPBottleneck(self, child):
self.current += 1
self.convolutional(child.conv1)
self.maxpool(child.m[0])
self.route('-2')
self.maxpool(child.m[1])
self.route('-4')
self.maxpool(child.m[2])
self.route('-6, -5, -3, -1')
self.convolutional(child.conv2)
def Upsample(self, child):
self.current += 1
self.upsample(child)
def Concat(self, route):
self.current += 1
r = self.get_route(route)
self.route('-1, %d' % r)
def Route(self, route):
self.current += 1
if route > 0:
r = self.get_route(route)
self.route('%d' % r)
else:
self.route('%d' % route)
def RouteShuffleOut(self, route):
self.current += 1
self.route(route)
self.shuffle(reshape=['c', 'hw'])
self.yolo_head.append(self.current)
def Detect(self, strides):
self.current += 1
routes = self.yolo_head[::-1]
for i, route in enumerate(routes):
routes[i] = self.get_route(route)
self.route(str(routes)[1:-1], axis=1)
self.shuffle(transpose1=[1, 0])
self.yolo(strides)
def net(self):
self.fc.write('[net]\n' +
'width=%d\n' % self.width +
'height=%d\n' % self.height +
'channels=3\n' +
'letter_box=1\n')
def reorg(self):
self.blocks[self.current] += 1
self.fc.write('\n[reorg]\n')
def convolutional(self, cv, act=None, detect=False):
self.blocks[self.current] += 1
self.get_state_dict(cv.state_dict())
if cv._get_name() == 'Conv2d':
filters = cv.out_channels
size = cv.kernel_size
stride = cv.stride
pad = cv.padding
groups = cv.groups
bias = cv.bias
bn = False
act = act if act is not None else 'linear'
else:
filters = cv.conv.out_channels
size = cv.conv.kernel_size
stride = cv.conv.stride
pad = cv.conv.padding
groups = cv.conv.groups
bias = cv.conv.bias
bn = hasattr(cv, 'bn')
if act is None:
act = self.get_activation(cv.act._get_name()) if hasattr(cv, 'act') else 'linear'
b = 'batch_normalize=1\n' if bn is True else ''
g = 'groups=%d\n' % groups if groups > 1 else ''
w = 'bias=1\n' if bias is not None and bn is not False else 'bias=0\n' if bias is None and bn is False else ''
self.fc.write('\n[convolutional]\n' +
b +
'filters=%d\n' % filters +
'size=%s\n' % self.get_value(size) +
'stride=%s\n' % self.get_value(stride) +
'pad=%s\n' % self.get_value(pad) +
g +
w +
'activation=%s\n' % act)
def route(self, layers, axis=0):
self.blocks[self.current] += 1
a = 'axis=%d\n' % axis if axis != 0 else ''
self.fc.write('\n[route]\n' +
'layers=%s\n' % layers +
a)
def shortcut(self, r, ew='add', act='linear'):
self.blocks[self.current] += 1
m = 'mode=mul\n' if ew == 'mul' else ''
self.fc.write('\n[shortcut]\n' +
'from=%d\n' % r +
m +
'activation=%s\n' % act)
def maxpool(self, m):
self.blocks[self.current] += 1
stride = m.stride
size = m.kernel_size
mode = m.ceil_mode
m = 'maxpool_up' if mode else 'maxpool'
self.fc.write('\n[%s]\n' % m +
'stride=%d\n' % stride +
'size=%d\n' % size)
def upsample(self, child):
self.blocks[self.current] += 1
stride = child.scale_factor
self.fc.write('\n[upsample]\n' +
'stride=%d\n' % stride)
def shuffle(self, reshape=None, transpose1=None, transpose2=None):
self.blocks[self.current] += 1
r = 'reshape=%s\n' % ', '.join(str(x) for x in reshape) if reshape is not None else ''
t1 = 'transpose1=%s\n' % ', '.join(str(x) for x in transpose1) if transpose1 is not None else ''
t2 = 'transpose2=%s\n' % ', '.join(str(x) for x in transpose2) if transpose2 is not None else ''
self.fc.write('\n[shuffle]\n' +
r +
t1 +
t2)
def yolo(self, strides):
self.blocks[self.current] += 1
self.fc.write('\n[detect_x]\n' +
'strides=%s\n' % str(strides)[1:-1])
def get_state_dict(self, state_dict):
for k, v in state_dict.items():
if 'num_batches_tracked' not in k:
vr = v.reshape(-1).numpy()
self.fw.write('{} {} '.format(k, len(vr)))
for vv in vr:
self.fw.write(' ')
self.fw.write(struct.pack('>f', float(vv)).hex())
self.fw.write('\n')
self.wc += 1
def get_value(self, key):
if isinstance(key, int):
return key
return key[0] if key[0] == key[1] else str(key)[1:-1]
def get_route(self, n):
r = 0
for i, b in enumerate(self.blocks):
if i <= n:
r += b
else:
break
return r - 1
def get_activation(self, act):
if act == 'Hardswish':
return 'hardswish'
elif act == 'LeakyReLU':
return 'leaky'
elif act == 'SiLU':
return 'silu'
return 'linear'
def parse_args():
parser = argparse.ArgumentParser(description='PyTorch YOLOX conversion')
parser.add_argument('-w', '--weights', required=True, help='Input weights (.pth) file path (required)')
parser.add_argument('-e', '--exp', required=True, help='Input exp (.py) file path (required)')
args = parser.parse_args()
if not os.path.isfile(args.weights):
raise SystemExit('Invalid weights file')
if not os.path.isfile(args.exp):
raise SystemExit('Invalid exp file')
return args.weights, args.exp
pth_file, exp_file = parse_args()
exp = get_exp(exp_file)
model = exp.get_model()
model.load_state_dict(torch.load(pth_file, map_location='cpu')['model'])
model.to('cpu').eval()
model_name = exp.exp_name
inference_size = (exp.input_size[1], exp.input_size[0])
backbone = model.backbone._get_name()
head = model.head._get_name()
wts_file = model_name + '.wts' if 'yolox' in model_name else 'yolox_' + model_name + '.wts'
cfg_file = model_name + '.cfg' if 'yolox' in model_name else 'yolox_' + model_name + '.cfg'
with open(wts_file, 'w') as fw, open(cfg_file, 'w') as fc:
layers = Layers(inference_size, fw, fc)
if backbone == 'YOLOPAFPN':
layers.fc.write('\n# YOLOPAFPN\n')
layers.Focus(model.backbone.backbone.stem)
layers.Conv(model.backbone.backbone.dark2[0])
layers.CSPLayer(model.backbone.backbone.dark2[1])
layers.Conv(model.backbone.backbone.dark3[0])
layers.CSPLayer(model.backbone.backbone.dark3[1], 'backbone')
layers.Conv(model.backbone.backbone.dark4[0])
layers.CSPLayer(model.backbone.backbone.dark4[1], 'backbone')
layers.Conv(model.backbone.backbone.dark5[0])
layers.SPPBottleneck(model.backbone.backbone.dark5[1])
layers.CSPLayer(model.backbone.backbone.dark5[2], 'backbone')
layers.BaseConv(model.backbone.lateral_conv0, 'fpn')
layers.Upsample(model.backbone.upsample)
layers.Concat(layers.backbone_outs[1])
layers.CSPLayer(model.backbone.C3_p4)
layers.BaseConv(model.backbone.reduce_conv1, 'fpn')
layers.Upsample(model.backbone.upsample)
layers.Concat(layers.backbone_outs[0])
layers.CSPLayer(model.backbone.C3_p3, 'pan')
layers.Conv(model.backbone.bu_conv2)
layers.Concat(layers.fpn_feats[1])
layers.CSPLayer(model.backbone.C3_n3, 'pan')
layers.Conv(model.backbone.bu_conv1)
layers.Concat(layers.fpn_feats[0])
layers.CSPLayer(model.backbone.C3_n4, 'pan')
layers.pan_feats = layers.pan_feats[::-1]
else:
raise SystemExit('Model not supported')
if head == 'YOLOXHead':
layers.fc.write('\n# YOLOXHead\n')
for i, feat in enumerate(layers.pan_feats):
idx = len(layers.pan_feats) - i - 1
dw = model.head.cls_convs[idx][0]._get_name() == 'DWConv'
if i > 0:
layers.Route(feat)
layers.BaseConv(model.head.stems[idx])
layers.Conv(model.head.cls_convs[idx][0])
layers.Conv(model.head.cls_convs[idx][1])
layers.BaseConv(model.head.cls_preds[idx], act='sigmoid')
if dw:
layers.Route(-6)
else:
layers.Route(-4)
layers.Conv(model.head.reg_convs[idx][0])
layers.Conv(model.head.reg_convs[idx][1])
layers.BaseConv(model.head.obj_preds[idx], act='sigmoid')
layers.Route(-2)
layers.BaseConv(model.head.reg_preds[idx])
if dw:
layers.RouteShuffleOut('-1, -3, -9')
else:
layers.RouteShuffleOut('-1, -3, -7')
layers.Detect(model.head.strides)
else:
raise SystemExit('Model not supported')
os.system('echo "%d" | cat - %s > temp && mv temp %s' % (layers.wc, wts_file, wts_file))
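All of the converters above map a PyTorch module index to a cfg layer index with the same block-counting logic. A standalone replica of `get_route` (with `blocks` passed explicitly, purely for illustration) shows the idea:

```python
def get_route(blocks, n):
    # blocks[i] counts the cfg sections emitted for module i; the
    # route index is the 0-based cfg position of module n's last
    # emitted section.
    r = 0
    for i, b in enumerate(blocks):
        if i <= n:
            r += b
        else:
            break
    return r - 1
```

So with `blocks = [1, 2, 3]`, module 1 ends at cfg layer 2 (sections 0, then 1-2, then 3-5).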