Minor fixes
@@ -14,11 +14,12 @@ interval=0
 gie-unique-id=1
 process-mode=1
 network-type=0
-cluster-mode=4
+cluster-mode=2
 maintain-aspect-ratio=0
 parse-bbox-func-name=NvDsInferParseYolo
 custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
 engine-create-func-name=NvDsInferYoloCudaEngineGet
 
 [class-attrs-all]
+nms-iou-threshold=0.6
 pre-cluster-threshold=0.25

@@ -14,11 +14,12 @@ interval=0
 gie-unique-id=1
 process-mode=1
 network-type=0
-cluster-mode=4
+cluster-mode=2
 maintain-aspect-ratio=1
 parse-bbox-func-name=NvDsInferParseYolo
 custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
 engine-create-func-name=NvDsInferYoloCudaEngineGet
 
 [class-attrs-all]
+nms-iou-threshold=0.6
 pre-cluster-threshold=0.25

@@ -14,11 +14,12 @@ interval=0
 gie-unique-id=1
 process-mode=1
 network-type=0
-cluster-mode=4
+cluster-mode=2
 maintain-aspect-ratio=1
 parse-bbox-func-name=NvDsInferParseYolo
 custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
 engine-create-func-name=NvDsInferYoloCudaEngineGet
 
 [class-attrs-all]
+nms-iou-threshold=0.6
 pre-cluster-threshold=0.25

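Across the three config_infer hunks above the change is the same: cluster-mode moves from 4 (no clustering, NMS handled inside the custom parser) to 2 (nvinfer's built-in NMS clustering), and the IoU threshold is now read from nms-iou-threshold under [class-attrs-all]. A minimal sketch of the keys involved, with the values used in this diff:

```
[property]
# 2 = nvinfer's built-in NMS clustering (the custom parser no longer suppresses boxes)
cluster-mode=2

[class-attrs-all]
# IoU threshold used by the NMS clustering
nms-iou-threshold=0.6
# confidence threshold applied before clustering
pre-cluster-threshold=0.25
```
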
@@ -12,7 +12,7 @@
 ### Requirements
 
 * [DeepStream-Yolo](https://github.com/marcoslucianops/DeepStream-Yolo)
-* [Pre-treined YOLO model](https://github.com/AlexeyAB/darknet)
+* Pre-trained YOLO model in Darknet or PyTorch
 
 ##
 
@@ -25,36 +25,11 @@ git clone https://github.com/marcoslucianops/DeepStream-Yolo.git
 cd DeepStream-Yolo
 ```
 
-#### 2. Remane the obj.names file to labels.txt and copy it to DeepStream-Yolo directory
+#### 2. Copy your labels file to the DeepStream-Yolo directory and rename it to labels.txt
 
-#### 3. Copy the yolo.cfg and yolo.weights files to DeepStream-Yolo directory
+#### 3. Copy the yolo.cfg and yolo.weights/yolo.wts files to the DeepStream-Yolo directory
 
-#### 4. Edit config_infer_primary.txt for your model
+**NOTE**: It's important to keep the YOLO model reference (yolov4_, yolov5_, yolor_, etc.) in your cfg and weights/wts file names to generate the engine correctly.
 
-```
-[property]
-...
-# CFG
-custom-network-config=yolo.cfg
-# Weights
-model-file=yolo.weights
-# Model labels file
-labelfile-path=labels.txt
-...
-```
-
-**NOTE**: If you want to use YOLOv2 or YOLOv2-Tiny models, change the deepstream_app_config.txt file before run it
-
-```
-[primary-gie]
-enable=1
-gpu-id=0
-gie-unique-id=1
-nvbuf-memory-type=0
-config-file=config_infer_primary_yoloV2.txt
-```
-
-**NOTE**: The config_infer_primary.txt file uses cluster-mode=4 and NMS = 0.45 (via code) when beta_nms isn't available (when beta_nms is available, NMS = beta_nms), while the config_infer_primary_yoloV2.txt file uses cluster-mode=2 and nms-iou-threshold=0.45 to set NMS.
-
 ##
 
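The explicit [property] example previously shown under step 4 was folded into the file-naming NOTE above, but its keys still apply. A sketch for a custom Darknet model, assuming the files were copied and renamed as in steps 2-3 (file names are illustrative):

```
[property]
...
# Darknet cfg copied into the DeepStream-Yolo directory
custom-network-config=yolov4_custom.cfg
# Darknet weights (or a .wts file for a PyTorch-converted model)
model-file=yolov4_custom.weights
# labels file renamed to labels.txt
labelfile-path=labels.txt
...
```
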
@@ -192,6 +167,8 @@ nvbuf-memory-type=0
 config-file=config_infer_primary.txt
 ```
 
+**NOTE**: Choose the correct config_infer_primary based on your YOLO model.
+
 ##
 
 ### Understanding and editing config_infer_primary
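For example, pointing the app at a YOLOv2 model means referencing the matching file from the [primary-gie] block of deepstream_app_config.txt (block mirrored from the text removed above; the yoloV2 file name is assumed to still ship with the repo):

```
[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV2.txt
```
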
@@ -211,6 +188,26 @@ model-color-format=0
 
 ##
 
+#### custom-network-config
+
+* Example for custom YOLOv4 model
+
+```
+custom-network-config=yolov4_custom.cfg
+```
+
+##
+
+#### model-file
+
+* Example for custom YOLOv4 model
+
+```
+model-file=yolov4_custom.weights
+```
+
+##
+
 #### model-engine-file
 
 * Example for batch-size=1 and network-mode=2
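Since setup step 3 now accepts yolo.weights or yolo.wts, the same model-file key covers both formats. A sketch for a hypothetical PyTorch-converted YOLOv5 model (names illustrative, not taken from the diff):

```
custom-network-config=yolov5_custom.cfg
model-file=yolov5_custom.wts
```
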
@@ -219,6 +216,12 @@ model-color-format=0
 model-engine-file=model_b1_gpu0_fp16.engine
 ```
 
+* Example for batch-size=1 and network-mode=1
+
+```
+model-engine-file=model_b1_gpu0_int8.engine
+```
+
 * Example for batch-size=1 and network-mode=0
 
 ```
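The engine file name encodes the batch size, GPU id and precision, so it should be kept consistent with batch-size and network-mode in the same config. A sketch assuming DeepStream's default model_b<batch>_gpu<id>_<precision>.engine naming pattern:

```
# batch-size=2, gpu-id=0, network-mode=0 (FP32)
model-engine-file=model_b2_gpu0_fp32.engine
```
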
@@ -260,15 +263,6 @@ num-detected-classes=80
 
 ##
 
-#### network-type
-
-```
-# 0=Detector, 1=Classifier, 2=Segmentation
-network-type=0
-```
-
-##
-
 #### interval
 
 ```
@@ -278,11 +272,19 @@ interval=0
 
 ##
 
+#### nms-iou-threshold
+
+```
+# IOU threshold
+nms-iou-threshold=0.6
+```
+
+##
+
 #### pre-cluster-threshold
 
 ```
-[class-attrs-all]
-# CONF_THRESH
+# Score threshold
 pre-cluster-threshold=0.25
 ```
 
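Together, the two keys documented above replace the post-processing that used to live in the parser: a detection is kept only if its score clears pre-cluster-threshold, and overlapping survivors are then merged by NMS using nms-iou-threshold. Combined, with the values from this diff:

```
[class-attrs-all]
# discard detections below this confidence before clustering
pre-cluster-threshold=0.25
# suppress a box whose IoU with a higher-scoring kept box exceeds this
nms-iou-threshold=0.6
```
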
@@ -18,7 +18,7 @@
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 * DEALINGS IN THE SOFTWARE.
-
+*
 * Edited by Marcos Luciano
 * https://www.github.com/marcoslucianops
 */

@@ -42,12 +42,15 @@ static bool getYoloNetworkInfo (NetworkInfo &networkInfo, const NvDsInferContext
 
 yoloType = yoloCfg.substr(0, yoloCfg.find(".cfg"));
 
+networkInfo.inputBlobName = "data";
 networkInfo.networkType = yoloType;
 networkInfo.configFilePath = initParams->customNetworkConfigFilePath;
 networkInfo.wtsFilePath = initParams->modelFilePath;
 networkInfo.int8CalibPath = initParams->int8CalibrationFilePath;
 networkInfo.deviceType = (initParams->useDLA ? "kDLA" : "kGPU");
-networkInfo.inputBlobName = "data";
+networkInfo.numDetectedClasses = initParams->numDetectedClasses;
+networkInfo.clusterMode = initParams->clusterMode;
+networkInfo.iouThreshold = initParams->perClassDetectionParams->nmsIOUThreshold;
 
 if(initParams->networkMode == 0) {
 networkInfo.networkMode = "FP32";

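The three new assignments feed values from the nvinfer config (num-detected-classes, cluster-mode, nms-iou-threshold) into NetworkInfo so the engine builder can compare them against the cfg file later. A minimal self-contained sketch of that flow, with the NvDsInfer types reduced to stand-ins for illustration (not the real API):

```
#include <iostream>

// Stand-ins for the NvDsInfer init-param types (illustrative only, not the real structs).
struct PerClassDetectionParams { float nmsIOUThreshold; };
struct InitParams {
    unsigned int numDetectedClasses;
    int clusterMode;
    PerClassDetectionParams* perClassDetectionParams;
};

// Subset of the NetworkInfo struct as extended by this commit.
struct NetworkInfo {
    unsigned int numDetectedClasses;
    int clusterMode;
    float iouThreshold;
};

int main() {
    PerClassDetectionParams perClass{0.6f};   // nms-iou-threshold
    InitParams initParams{80, 2, &perClass};  // num-detected-classes, cluster-mode

    // Same shape as the new assignments in getYoloNetworkInfo.
    NetworkInfo networkInfo{};
    networkInfo.numDetectedClasses = initParams.numDetectedClasses;
    networkInfo.clusterMode = initParams.clusterMode;
    networkInfo.iouThreshold = initParams.perClassDetectionParams->nmsIOUThreshold;

    std::cout << networkInfo.numDetectedClasses << " classes, cluster-mode "
              << networkInfo.clusterMode << ", nms-iou-threshold "
              << networkInfo.iouThreshold << std::endl;
}
```
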
@@ -18,7 +18,7 @@
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 * DEALINGS IN THE SOFTWARE.
-
+*
 * Edited by Marcos Luciano
 * https://www.github.com/marcoslucianops
 */

@@ -37,103 +37,36 @@ extern "C" bool NvDsInferParseYolo(
|
|||||||
NvDsInferParseDetectionParams const& detectionParams,
|
NvDsInferParseDetectionParams const& detectionParams,
|
||||||
std::vector<NvDsInferParseObjectInfo>& objectList);
|
std::vector<NvDsInferParseObjectInfo>& objectList);
|
||||||
|
|
||||||
static std::vector<NvDsInferParseObjectInfo>
|
static NvDsInferParseObjectInfo convertBBox(
|
||||||
nonMaximumSuppression(const float nmsThresh, std::vector<NvDsInferParseObjectInfo> binfo)
|
const float& bx, const float& by, const float& bw,
|
||||||
{
|
const float& bh, const uint& netW, const uint& netH)
|
||||||
auto overlap1D = [](float x1min, float x1max, float x2min, float x2max) -> float {
|
|
||||||
if (x1min > x2min)
|
|
||||||
{
|
|
||||||
std::swap(x1min, x2min);
|
|
||||||
std::swap(x1max, x2max);
|
|
||||||
}
|
|
||||||
return x1max < x2min ? 0 : std::min(x1max, x2max) - x2min;
|
|
||||||
};
|
|
||||||
auto computeIoU
|
|
||||||
= [&overlap1D](NvDsInferParseObjectInfo& bbox1, NvDsInferParseObjectInfo& bbox2) -> float {
|
|
||||||
float overlapX
|
|
||||||
= overlap1D(bbox1.left, bbox1.left + bbox1.width, bbox2.left, bbox2.left + bbox2.width);
|
|
||||||
float overlapY
|
|
||||||
= overlap1D(bbox1.top, bbox1.top + bbox1.height, bbox2.top, bbox2.top + bbox2.height);
|
|
||||||
float area1 = (bbox1.width) * (bbox1.height);
|
|
||||||
float area2 = (bbox2.width) * (bbox2.height);
|
|
||||||
float overlap2D = overlapX * overlapY;
|
|
||||||
float u = area1 + area2 - overlap2D;
|
|
||||||
return u == 0 ? 0 : overlap2D / u;
|
|
||||||
};
|
|
||||||
|
|
||||||
std::stable_sort(binfo.begin(), binfo.end(),
|
|
||||||
[](const NvDsInferParseObjectInfo& b1, const NvDsInferParseObjectInfo& b2) {
|
|
||||||
return b1.detectionConfidence > b2.detectionConfidence;
|
|
||||||
});
|
|
||||||
std::vector<NvDsInferParseObjectInfo> out;
|
|
||||||
for (auto i : binfo)
|
|
||||||
{
|
|
||||||
bool keep = true;
|
|
||||||
for (auto j : out)
|
|
||||||
{
|
|
||||||
if (keep)
|
|
||||||
{
|
|
||||||
float overlap = computeIoU(i, j);
|
|
||||||
keep = overlap <= nmsThresh;
|
|
||||||
}
|
|
||||||
else
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
if (keep) out.push_back(i);
|
|
||||||
}
|
|
||||||
return out;
|
|
||||||
}
|
|
||||||
|
|
||||||
static std::vector<NvDsInferParseObjectInfo>
|
|
||||||
nmsAllClasses(const float nmsThresh,
|
|
||||||
std::vector<NvDsInferParseObjectInfo>& binfo,
|
|
||||||
const uint numClasses)
|
|
||||||
{
|
|
||||||
std::vector<NvDsInferParseObjectInfo> result;
|
|
||||||
std::vector<std::vector<NvDsInferParseObjectInfo>> splitBoxes(numClasses);
|
|
||||||
for (auto& box : binfo)
|
|
||||||
{
|
|
||||||
splitBoxes.at(box.classId).push_back(box);
|
|
||||||
}
|
|
||||||
|
|
||||||
for (auto& boxes : splitBoxes)
|
|
||||||
{
|
|
||||||
boxes = nonMaximumSuppression(nmsThresh, boxes);
|
|
||||||
result.insert(result.end(), boxes.begin(), boxes.end());
|
|
||||||
}
|
|
||||||
return result;
|
|
||||||
}
|
|
||||||
|
|
||||||
static NvDsInferParseObjectInfo convertBBox(const float& bx, const float& by, const float& bw,
|
|
||||||
const float& bh, const int& stride, const uint& netW,
|
|
||||||
const uint& netH)
|
|
||||||
{
|
{
|
||||||
NvDsInferParseObjectInfo b;
|
NvDsInferParseObjectInfo b;
|
||||||
float xCenter = bx * stride;
|
|
||||||
float yCenter = by * stride;
|
|
||||||
float x0 = xCenter - bw / 2;
|
|
||||||
float y0 = yCenter - bh / 2;
|
|
||||||
float x1 = x0 + bw;
|
|
||||||
float y1 = y0 + bh;
|
|
||||||
|
|
||||||
x0 = clamp(x0, 0, netW);
|
float x1 = bx - bw / 2;
|
||||||
y0 = clamp(y0, 0, netH);
|
float y1 = by - bh / 2;
|
||||||
|
float x2 = x1 + bw;
|
||||||
|
float y2 = y1 + bh;
|
||||||
|
|
||||||
x1 = clamp(x1, 0, netW);
|
x1 = clamp(x1, 0, netW);
|
||||||
y1 = clamp(y1, 0, netH);
|
y1 = clamp(y1, 0, netH);
|
||||||
|
x2 = clamp(x2, 0, netW);
|
||||||
|
y2 = clamp(y2, 0, netH);
|
||||||
|
|
||||||
b.left = x0;
|
b.left = x1;
|
||||||
b.width = clamp(x1 - x0, 0, netW);
|
b.width = clamp(x2 - x1, 0, netW);
|
||||||
b.top = y0;
|
b.top = y1;
|
||||||
b.height = clamp(y1 - y0, 0, netH);
|
b.height = clamp(y2 - y1, 0, netH);
|
||||||
|
|
||||||
return b;
|
return b;
|
||||||
}
|
}
|
||||||
|
|
||||||
static void addBBoxProposal(const float bx, const float by, const float bw, const float bh,
|
static void addBBoxProposal(
|
||||||
const uint stride, const uint& netW, const uint& netH, const int maxIndex,
|
const float bx, const float by, const float bw, const float bh,
|
||||||
|
const uint& netW, const uint& netH, const int maxIndex,
|
||||||
const float maxProb, std::vector<NvDsInferParseObjectInfo>& binfo)
|
const float maxProb, std::vector<NvDsInferParseObjectInfo>& binfo)
|
||||||
{
|
{
|
||||||
NvDsInferParseObjectInfo bbi = convertBBox(bx, by, bw, bh, stride, netW, netH);
|
NvDsInferParseObjectInfo bbi = convertBBox(bx, by, bw, bh, netW, netH);
|
||||||
if (bbi.width < 1 || bbi.height < 1) return;
|
if (bbi.width < 1 || bbi.height < 1) return;
|
||||||
|
|
||||||
bbi.detectionConfidence = maxProb;
|
bbi.detectionConfidence = maxProb;
|
||||||
@@ -141,11 +74,10 @@ static void addBBoxProposal(const float bx, const float by, const float bw, cons
|
|||||||
binfo.push_back(bbi);
|
binfo.push_back(bbi);
|
||||||
}
|
}
|
||||||
|
|
||||||
static std::vector<NvDsInferParseObjectInfo>
|
static std::vector<NvDsInferParseObjectInfo> decodeYoloTensor(
|
||||||
decodeYoloTensor(
|
|
||||||
const float* detections,
|
const float* detections,
|
||||||
const uint gridSizeW, const uint gridSizeH, const uint stride, const uint numBBoxes,
|
const uint gridSizeW, const uint gridSizeH, const uint numBBoxes,
|
||||||
const uint numOutputClasses, const uint& netW, const uint& netH, const float confThresh)
|
const uint numOutputClasses, const uint& netW, const uint& netH)
|
||||||
{
|
{
|
||||||
std::vector<NvDsInferParseObjectInfo> binfo;
|
std::vector<NvDsInferParseObjectInfo> binfo;
|
||||||
for (uint y = 0; y < gridSizeH; ++y) {
|
for (uint y = 0; y < gridSizeH; ++y) {
|
||||||
@@ -163,92 +95,32 @@ decodeYoloTensor(
|
|||||||
= detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 2)];
|
= detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 2)];
|
||||||
const float bh
|
const float bh
|
||||||
= detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 3)];
|
= detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 3)];
|
||||||
|
|
||||||
const float maxProb
|
const float maxProb
|
||||||
= detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 4)];
|
= detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 4)];
|
||||||
|
|
||||||
const int maxIndex
|
const int maxIndex
|
||||||
= detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 5)];
|
= detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 5)];
|
||||||
|
|
||||||
if (maxProb > confThresh)
|
addBBoxProposal(bx, by, bw, bh, netW, netH, maxIndex, maxProb, binfo);
|
||||||
{
|
|
||||||
addBBoxProposal(bx, by, bw, bh, stride, netW, netH, maxIndex, maxProb, binfo);
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
return binfo;
|
return binfo;
|
||||||
}
|
}
|
||||||
|
|
||||||
static std::vector<NvDsInferParseObjectInfo>
|
static bool NvDsInferParseCustomYolo(
|
||||||
decodeYoloV2Tensor(
|
|
||||||
const float* detections,
|
|
||||||
const uint gridSizeW, const uint gridSizeH, const uint stride, const uint numBBoxes,
|
|
||||||
const uint numOutputClasses, const uint& netW, const uint& netH)
|
|
||||||
{
|
|
||||||
std::vector<NvDsInferParseObjectInfo> binfo;
|
|
||||||
for (uint y = 0; y < gridSizeH; ++y) {
|
|
||||||
for (uint x = 0; x < gridSizeW; ++x) {
|
|
||||||
for (uint b = 0; b < numBBoxes; ++b)
|
|
||||||
{
|
|
||||||
const int numGridCells = gridSizeH * gridSizeW;
|
|
||||||
const int bbindex = y * gridSizeW + x;
|
|
||||||
|
|
||||||
const float bx
|
|
||||||
= detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 0)];
|
|
||||||
const float by
|
|
||||||
= detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 1)];
|
|
||||||
const float bw
|
|
||||||
= detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 2)] * stride;
|
|
||||||
const float bh
|
|
||||||
= detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 3)] * stride;
|
|
||||||
|
|
||||||
const float maxProb
|
|
||||||
= detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 4)];
|
|
||||||
|
|
||||||
const int maxIndex
|
|
||||||
= detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 5)];
|
|
||||||
|
|
||||||
addBBoxProposal(bx, by, bw, bh, stride, netW, netH, maxIndex, maxProb, binfo);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
return binfo;
|
|
||||||
}
|
|
||||||
|
|
||||||
static inline std::vector<const NvDsInferLayerInfo*>
|
|
||||||
SortLayers(const std::vector<NvDsInferLayerInfo> & outputLayersInfo)
|
|
||||||
{
|
|
||||||
std::vector<const NvDsInferLayerInfo*> outLayers;
|
|
||||||
for (auto const &layer : outputLayersInfo) {
|
|
||||||
outLayers.push_back (&layer);
|
|
||||||
}
|
|
||||||
std::sort(outLayers.begin(), outLayers.end(),
|
|
||||||
[](const NvDsInferLayerInfo* a, const NvDsInferLayerInfo* b) {
|
|
||||||
return a->inferDims.d[1] < b->inferDims.d[1];
|
|
||||||
});
|
|
||||||
return outLayers;
|
|
||||||
}
|
|
||||||
|
|
||||||
static bool NvDsInferParseYolo(
|
|
||||||
std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
|
std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
|
||||||
NvDsInferNetworkInfo const& networkInfo,
|
NvDsInferNetworkInfo const& networkInfo,
|
||||||
NvDsInferParseDetectionParams const& detectionParams,
|
NvDsInferParseDetectionParams const& detectionParams,
|
||||||
std::vector<NvDsInferParseObjectInfo>& objectList,
|
std::vector<NvDsInferParseObjectInfo>& objectList,
|
||||||
const uint &numBBoxes,
|
const uint &numBBoxes,
|
||||||
const uint &numClasses,
|
const uint &numClasses)
|
||||||
const float &betaNMS)
|
|
||||||
{
|
{
|
||||||
if (outputLayersInfo.empty()) {
|
if (outputLayersInfo.empty())
|
||||||
std::cerr << "ERROR: Could not find output layer in bbox parsing" << std::endl;;
|
{
|
||||||
|
std::cerr << "ERROR: Could not find output layer in bbox parsing" << std::endl;
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
const float kCONF_THRESH = detectionParams.perClassThreshold[0];
|
|
||||||
|
|
||||||
const std::vector<const NvDsInferLayerInfo*> sortedLayers =
|
|
||||||
SortLayers (outputLayersInfo);
|
|
||||||
|
|
||||||
if (numClasses != detectionParams.numClassesConfigured)
|
if (numClasses != detectionParams.numClassesConfigured)
|
||||||
{
|
{
|
||||||
std::cerr << "WARNING: Num classes mismatch. Configured: "
|
std::cerr << "WARNING: Num classes mismatch. Configured: "
|
||||||
@@ -258,57 +130,23 @@ static bool NvDsInferParseYolo(
|
|||||||
|
|
||||||
std::vector<NvDsInferParseObjectInfo> objects;
|
std::vector<NvDsInferParseObjectInfo> objects;
|
||||||
|
|
||||||
for (uint idx = 0; idx < sortedLayers.size(); ++idx) {
|
for (uint idx = 0; idx < outputLayersInfo.size(); ++idx)
|
||||||
const NvDsInferLayerInfo &layer = *sortedLayers[idx]; // 255 x Grid x Grid
|
{
|
||||||
|
const NvDsInferLayerInfo &layer = outputLayersInfo[idx];
|
||||||
|
|
||||||
assert(layer.inferDims.numDims == 3);
|
assert(layer.inferDims.numDims == 3);
|
||||||
const uint gridSizeH = layer.inferDims.d[1];
|
const uint gridSizeH = layer.inferDims.d[1];
|
||||||
const uint gridSizeW = layer.inferDims.d[2];
|
const uint gridSizeW = layer.inferDims.d[2];
|
||||||
const uint stride = DIVUP(networkInfo.width, gridSizeW);
|
|
||||||
|
|
||||||
std::vector<NvDsInferParseObjectInfo> outObjs =
|
std::vector<NvDsInferParseObjectInfo> outObjs =
|
||||||
decodeYoloTensor((const float*)(layer.buffer), gridSizeW, gridSizeH, stride, numBBoxes,
|
decodeYoloTensor(
|
||||||
numClasses, networkInfo.width, networkInfo.height, kCONF_THRESH);
|
(const float*)(layer.buffer),
|
||||||
|
gridSizeW, gridSizeH, numBBoxes, numClasses,
|
||||||
|
networkInfo.width, networkInfo.height);
|
||||||
|
|
||||||
objects.insert(objects.end(), outObjs.begin(), outObjs.end());
|
objects.insert(objects.end(), outObjs.begin(), outObjs.end());
|
||||||
}
|
}
|
||||||
|
|
||||||
objectList.clear();
|
|
||||||
objectList = nmsAllClasses(betaNMS, objects, numClasses);
|
|
||||||
|
|
||||||
return true;
|
|
||||||
}
|
|
||||||
|
|
||||||
static bool NvDsInferParseYoloV2(
|
|
||||||
std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
|
|
||||||
NvDsInferNetworkInfo const& networkInfo,
|
|
||||||
NvDsInferParseDetectionParams const& detectionParams,
|
|
||||||
std::vector<NvDsInferParseObjectInfo>& objectList,
|
|
||||||
const uint &numBBoxes,
|
|
||||||
const uint &numClasses)
|
|
||||||
{
|
|
||||||
if (outputLayersInfo.empty()) {
|
|
||||||
std::cerr << "ERROR: Could not find output layer in bbox parsing" << std::endl;;
|
|
||||||
return false;
|
|
||||||
}
|
|
||||||
|
|
||||||
const NvDsInferLayerInfo &layer = outputLayersInfo[0];
|
|
||||||
|
|
||||||
if (numClasses != detectionParams.numClassesConfigured)
|
|
||||||
{
|
|
||||||
std::cerr << "WARNING: Num classes mismatch. Configured: "
|
|
||||||
<< detectionParams.numClassesConfigured
|
|
||||||
<< ", detected by network: " << numClasses << std::endl;
|
|
||||||
}
|
|
||||||
|
|
||||||
assert(layer.inferDims.numDims == 3);
|
|
||||||
const uint gridSizeH = layer.inferDims.d[1];
|
|
||||||
const uint gridSizeW = layer.inferDims.d[2];
|
|
||||||
const uint stride = DIVUP(networkInfo.width, gridSizeW);
|
|
||||||
|
|
||||||
std::vector<NvDsInferParseObjectInfo> objects =
|
|
||||||
decodeYoloV2Tensor((const float*)(layer.buffer), gridSizeW, gridSizeH, stride, numBBoxes,
|
|
||||||
numClasses, networkInfo.width, networkInfo.height);
|
|
||||||
|
|
||||||
objectList = objects;
|
objectList = objects;
|
||||||
|
|
||||||
return true;
|
return true;
|
||||||
@@ -320,19 +158,11 @@ extern "C" bool NvDsInferParseYolo(
|
|||||||
NvDsInferParseDetectionParams const& detectionParams,
|
NvDsInferParseDetectionParams const& detectionParams,
|
||||||
std::vector<NvDsInferParseObjectInfo>& objectList)
|
std::vector<NvDsInferParseObjectInfo>& objectList)
|
||||||
{
|
{
|
||||||
int model_type = kMODEL_TYPE;
|
uint numBBoxes = kNUM_BBOXES;
|
||||||
int num_bboxes = kNUM_BBOXES;
|
uint numClasses = kNUM_CLASSES;
|
||||||
int num_classes = kNUM_CLASSES;
|
|
||||||
float beta_nms = kBETA_NMS;
|
|
||||||
|
|
||||||
if (model_type != 0) {
|
return NvDsInferParseCustomYolo (
|
||||||
return NvDsInferParseYolo (outputLayersInfo, networkInfo, detectionParams, objectList,
|
outputLayersInfo, networkInfo, detectionParams, objectList, numBBoxes, numClasses);
|
||||||
num_bboxes, num_classes, beta_nms);
|
|
||||||
}
|
|
||||||
else {
|
|
||||||
return NvDsInferParseYoloV2 (outputLayersInfo, networkInfo, detectionParams, objectList,
|
|
||||||
num_bboxes, num_classes);
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseYolo);
|
CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseYolo);
|
||||||
@@ -1,5 +1,5 @@
 /*
-* Copyright (c) 2019-2020, NVIDIA CORPORATION. All rights reserved.
+* Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
@@ -18,7 +18,7 @@
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 * DEALINGS IN THE SOFTWARE.
-
+*
 * Edited by Marcos Luciano
 * https://www.github.com/marcoslucianops
 */
@@ -149,12 +149,6 @@ int getNumChannels(nvinfer1::ITensor* t)
 return d.d[0];
 }
 
-uint64_t get3DTensorVolume(nvinfer1::Dims inputDims)
-{
-assert(inputDims.nbDims == 3);
-return inputDims.d[0] * inputDims.d[1] * inputDims.d[2];
-}
-
 void printLayerInfo(std::string layerIndex, std::string layerName, std::string layerInput,
 std::string layerOutput, std::string weightPtr)
 {

@@ -1,5 +1,5 @@
 /*
-* Copyright (c) 2019-2020, NVIDIA CORPORATION. All rights reserved.
+* Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
@@ -18,7 +18,7 @@
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 * DEALINGS IN THE SOFTWARE.
-
+*
 * Edited by Marcos Luciano
 * https://www.github.com/marcoslucianops
 */
@@ -35,17 +35,12 @@
 
 #include "NvInfer.h"
 
-#define UNUSED(expr) (void)(expr)
-#define DIVUP(n, d) ((n) + (d)-1) / (d)
-
 std::string trim(std::string s);
 float clamp(const float val, const float minVal, const float maxVal);
 bool fileExists(const std::string fileName, bool verbose = true);
 std::vector<float> loadWeights(const std::string weightsFilePath, const std::string& networkType);
 std::string dimsToString(const nvinfer1::Dims d);
-void displayDimType(const nvinfer1::Dims d);
 int getNumChannels(nvinfer1::ITensor* t);
-uint64_t get3DTensorVolume(nvinfer1::Dims inputDims);
 
 void printLayerInfo(std::string layerIndex, std::string layerName, std::string layerInput,
 std::string layerOutput, std::string weightPtr);

@@ -18,7 +18,7 @@
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 * DEALINGS IN THE SOFTWARE.
-
+*
 * Edited by Marcos Luciano
 * https://www.github.com/marcoslucianops
 */
@@ -32,17 +32,23 @@
|
|||||||
#endif
|
#endif
|
||||||
|
|
||||||
Yolo::Yolo(const NetworkInfo& networkInfo)
|
Yolo::Yolo(const NetworkInfo& networkInfo)
|
||||||
: m_NetworkType(networkInfo.networkType), // YOLO type
|
: m_InputBlobName(networkInfo.inputBlobName),
|
||||||
m_ConfigFilePath(networkInfo.configFilePath), // YOLO cfg
|
m_NetworkType(networkInfo.networkType),
|
||||||
m_WtsFilePath(networkInfo.wtsFilePath), // YOLO weights
|
m_ConfigFilePath(networkInfo.configFilePath),
|
||||||
m_Int8CalibPath(networkInfo.int8CalibPath), // INT8 calibration path
|
m_WtsFilePath(networkInfo.wtsFilePath),
|
||||||
m_NetworkMode(networkInfo.networkMode), // FP32, INT8, FP16
|
m_Int8CalibPath(networkInfo.int8CalibPath),
|
||||||
m_DeviceType(networkInfo.deviceType), // kDLA, kGPU
|
m_DeviceType(networkInfo.deviceType),
|
||||||
m_InputBlobName(networkInfo.inputBlobName), // data
|
m_NumDetectedClasses(networkInfo.numDetectedClasses),
|
||||||
|
m_ClusterMode(networkInfo.clusterMode),
|
||||||
|
m_IouThreshold(networkInfo.iouThreshold),
|
||||||
|
m_NetworkMode(networkInfo.networkMode),
|
||||||
m_InputH(0),
|
m_InputH(0),
|
||||||
m_InputW(0),
|
m_InputW(0),
|
||||||
m_InputC(0),
|
m_InputC(0),
|
||||||
m_InputSize(0)
|
m_InputSize(0),
|
||||||
|
m_NumClasses(0),
|
||||||
|
m_LetterBox(0),
|
||||||
|
m_BetaNMS(networkInfo.iouThreshold)
|
||||||
{}
|
{}
|
||||||
|
|
||||||
Yolo::~Yolo()
|
Yolo::~Yolo()
|
||||||
@@ -65,9 +71,19 @@ nvinfer1::ICudaEngine *Yolo::createEngine (nvinfer1::IBuilder* builder, nvinfer1
|
|||||||
|
|
||||||
std::cout << "Building the TensorRT Engine" << std::endl;
|
std::cout << "Building the TensorRT Engine" << std::endl;
|
||||||
|
|
||||||
if (m_LetterBox == 1) {
|
if (m_NumClasses != m_NumDetectedClasses) {
|
||||||
std::cout << "\nNOTE: letter_box is set in cfg file, make sure to set maintain-aspect-ratio=1 in config_infer file to get better accuracy\n" << std::endl;
|
std::cout << "\nNOTE: Number of classes mismatch, make sure to set num-detected-classes=" << m_NumClasses << " in config_infer file" << std::endl;
|
||||||
}
|
}
|
||||||
|
if (m_LetterBox == 1) {
|
||||||
|
std::cout << "\nNOTE: letter_box is set in cfg file, make sure to set maintain-aspect-ratio=1 in config_infer file to get better accuracy" << std::endl;
|
||||||
|
}
|
||||||
|
if (m_BetaNMS != m_IouThreshold) {
|
||||||
|
std::cout << "\nNOTE: beta_nms is set in cfg file, make sure to set nms-iou-threshold=" << m_BetaNMS << " in config_infer file to get better accuracy" << std::endl;
|
||||||
|
}
|
||||||
|
if (m_ClusterMode != 2) {
|
||||||
|
std::cout << "\nNOTE: Wrong cluster-mode is set, make sure to set cluster-mode=2 in config_infer file" << std::endl;
|
||||||
|
}
|
||||||
|
std::cout << "" << std::endl;
|
||||||
|
|
||||||
if (m_NetworkMode == "INT8" && !fileExists(m_Int8CalibPath)) {
|
if (m_NetworkMode == "INT8" && !fileExists(m_Int8CalibPath)) {
|
||||||
assert(builder->platformHasFastInt8());
|
assert(builder->platformHasFastInt8());
|
||||||
@@ -314,43 +330,31 @@ NvDsInferStatus Yolo::buildYoloNetwork(
|
|||||||
}
|
}
|
||||||
|
|
||||||
else if (m_ConfigBlocks.at(i).at("type") == "yolo") {
|
else if (m_ConfigBlocks.at(i).at("type") == "yolo") {
|
||||||
uint model_type;
|
uint modelType = 1;
|
||||||
|
uint newCoords = 0;
|
||||||
|
float scaleXY = 1.0;
|
||||||
if (m_NetworkType.find("yolor") != std::string::npos) {
|
if (m_NetworkType.find("yolor") != std::string::npos) {
|
||||||
model_type = 2;
|
modelType = 2;
|
||||||
}
|
}
|
||||||
else {
|
|
||||||
model_type = 1;
|
|
||||||
}
|
|
||||||
nvinfer1::Dims prevTensorDims = previous->getDimensions();
|
|
||||||
TensorInfo& curYoloTensor = m_OutputTensors.at(outputTensorCount);
|
|
||||||
curYoloTensor.gridSizeY = prevTensorDims.d[1];
|
|
||||||
curYoloTensor.gridSizeX = prevTensorDims.d[2];
|
|
||||||
curYoloTensor.stride = m_InputH / curYoloTensor.gridSizeY;
|
|
||||||
m_OutputTensors.at(outputTensorCount).volume = curYoloTensor.gridSizeY
|
|
||||||
* curYoloTensor.gridSizeX
|
|
||||||
* (curYoloTensor.numBBoxes * (5 + curYoloTensor.numClasses));
|
|
||||||
std::string layerName = "yolo_" + std::to_string(i);
|
|
||||||
curYoloTensor.blobName = layerName;
|
|
||||||
int new_coords = 0;
|
|
||||||
float scale_x_y = 1;
|
|
||||||
float beta_nms = 0.45;
|
|
||||||
if (m_ConfigBlocks.at(i).find("new_coords") != m_ConfigBlocks.at(i).end()) {
|
if (m_ConfigBlocks.at(i).find("new_coords") != m_ConfigBlocks.at(i).end()) {
|
||||||
new_coords = std::stoi(m_ConfigBlocks.at(i).at("new_coords"));
|
newCoords = std::stoi(m_ConfigBlocks.at(i).at("new_coords"));
|
||||||
}
|
}
|
||||||
if (m_ConfigBlocks.at(i).find("scale_x_y") != m_ConfigBlocks.at(i).end()) {
|
if (m_ConfigBlocks.at(i).find("scale_x_y") != m_ConfigBlocks.at(i).end()) {
|
||||||
scale_x_y = std::stof(m_ConfigBlocks.at(i).at("scale_x_y"));
|
scaleXY = std::stof(m_ConfigBlocks.at(i).at("scale_x_y"));
|
||||||
}
|
}
|
||||||
if (m_ConfigBlocks.at(i).find("beta_nms") != m_ConfigBlocks.at(i).end()) {
|
if (m_ConfigBlocks.at(i).find("beta_nms") != m_ConfigBlocks.at(i).end()) {
|
||||||
beta_nms = std::stof(m_ConfigBlocks.at(i).at("beta_nms"));
|
m_BetaNMS = std::stof(m_ConfigBlocks.at(i).at("beta_nms"));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
std::string layerName = "yolo_" + std::to_string(i);
|
||||||
|
nvinfer1::Dims prevTensorDims = previous->getDimensions();
|
||||||
|
TensorInfo& curYoloTensor = m_OutputTensors.at(outputTensorCount);
|
||||||
|
m_NumClasses = curYoloTensor.numClasses;
|
||||||
|
curYoloTensor.blobName = layerName;
|
||||||
nvinfer1::IPluginV2* yoloPlugin
|
nvinfer1::IPluginV2* yoloPlugin
|
||||||
= new YoloLayer(curYoloTensor.numBBoxes,
|
= new YoloLayer(curYoloTensor.numBBoxes, curYoloTensor.numClasses, m_InputW, m_InputH,
|
||||||
curYoloTensor.numClasses,
|
prevTensorDims.d[2], prevTensorDims.d[1], modelType, newCoords, scaleXY,
|
||||||
curYoloTensor.gridSizeX,
|
curYoloTensor.anchors, curYoloTensor.mask);
|
||||||
curYoloTensor.gridSizeY,
|
|
||||||
model_type, new_coords, scale_x_y, beta_nms,
|
|
||||||
curYoloTensor.anchors,
|
|
||||||
curYoloTensor.masks);
|
|
||||||
assert(yoloPlugin != nullptr);
|
assert(yoloPlugin != nullptr);
|
||||||
nvinfer1::IPluginV2Layer* yolo =
|
nvinfer1::IPluginV2Layer* yolo =
|
||||||
network.addPluginV2(&previous, 1, *yoloPlugin);
|
network.addPluginV2(&previous, 1, *yoloPlugin);
|
||||||
@@ -368,26 +372,16 @@ NvDsInferStatus Yolo::buildYoloNetwork(
|
|||||||
++outputTensorCount;
|
++outputTensorCount;
|
||||||
}
|
}
|
||||||
|
|
||||||
//YOLOv2 support
|
|
||||||
else if (m_ConfigBlocks.at(i).at("type") == "region") {
|
else if (m_ConfigBlocks.at(i).at("type") == "region") {
|
||||||
|
std::vector<int> mask;
|
||||||
|
std::string layerName = "region_" + std::to_string(i);
|
||||||
nvinfer1::Dims prevTensorDims = previous->getDimensions();
|
nvinfer1::Dims prevTensorDims = previous->getDimensions();
|
||||||
TensorInfo& curRegionTensor = m_OutputTensors.at(outputTensorCount);
|
TensorInfo& curRegionTensor = m_OutputTensors.at(outputTensorCount);
|
||||||
curRegionTensor.gridSizeY = prevTensorDims.d[1];
|
m_NumClasses = curRegionTensor.numClasses;
|
||||||
curRegionTensor.gridSizeX = prevTensorDims.d[2];
|
|
||||||
curRegionTensor.stride = m_InputH / curRegionTensor.gridSizeY;
|
|
||||||
m_OutputTensors.at(outputTensorCount).volume = curRegionTensor.gridSizeY
|
|
||||||
* curRegionTensor.gridSizeX
|
|
||||||
* (curRegionTensor.numBBoxes * (5 + curRegionTensor.numClasses));
|
|
||||||
std::string layerName = "region_" + std::to_string(i);
|
|
||||||
curRegionTensor.blobName = layerName;
|
curRegionTensor.blobName = layerName;
|
||||||
std::vector<int> mask;
|
|
||||||
nvinfer1::IPluginV2* regionPlugin
|
nvinfer1::IPluginV2* regionPlugin
|
||||||
= new YoloLayer(curRegionTensor.numBBoxes,
|
= new YoloLayer(curRegionTensor.numBBoxes, curRegionTensor.numClasses, m_InputW, m_InputH,
|
||||||
curRegionTensor.numClasses,
|
prevTensorDims.d[2], prevTensorDims.d[1], 0, 0, 1.0, curRegionTensor.anchors,
|
||||||
curRegionTensor.gridSizeX,
|
|
||||||
curRegionTensor.gridSizeY,
|
|
||||||
0, 0, 1.0, 0,
|
|
||||||
curRegionTensor.anchors,
|
|
||||||
mask);
|
mask);
|
||||||
assert(regionPlugin != nullptr);
|
assert(regionPlugin != nullptr);
|
||||||
nvinfer1::IPluginV2Layer* region =
|
nvinfer1::IPluginV2Layer* region =
|
||||||
@@ -531,20 +525,20 @@ void Yolo::parseConfigBlocks()
|
|||||||
if (npos != -1)
|
if (npos != -1)
|
||||||
{
|
{
|
||||||
int mask = std::stoul(trim(maskString.substr(0, npos)));
|
int mask = std::stoul(trim(maskString.substr(0, npos)));
|
||||||
outputTensor.masks.push_back(mask);
|
outputTensor.mask.push_back(mask);
|
||||||
maskString.erase(0, npos + 1);
|
maskString.erase(0, npos + 1);
|
||||||
}
|
}
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
int mask = std::stoul(trim(maskString));
|
int mask = std::stoul(trim(maskString));
|
||||||
outputTensor.masks.push_back(mask);
|
outputTensor.mask.push_back(mask);
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
outputTensor.numBBoxes = outputTensor.masks.size() > 0
|
outputTensor.numBBoxes = outputTensor.mask.size() > 0
|
||||||
? outputTensor.masks.size()
|
? outputTensor.mask.size()
|
||||||
: std::stoul(trim(block.at("num")));
|
: std::stoul(trim(block.at("num")));
|
||||||
outputTensor.numClasses = std::stoul(block.at("classes"));
|
outputTensor.numClasses = std::stoul(block.at("classes"));
|
||||||
m_OutputTensors.push_back(outputTensor);
|
m_OutputTensors.push_back(outputTensor);
|
||||||
|
|||||||
@@ -18,7 +18,7 @@
|
|||||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||||
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
|
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
|
||||||
* DEALINGS IN THE SOFTWARE.
|
* DEALINGS IN THE SOFTWARE.
|
||||||
|
*
|
||||||
* Edited by Marcos Luciano
|
* Edited by Marcos Luciano
|
||||||
* https://www.github.com/marcoslucianops
|
* https://www.github.com/marcoslucianops
|
||||||
*/
|
*/
|
||||||
@@ -40,28 +40,25 @@
|
|||||||
|
|
||||||
struct NetworkInfo
|
struct NetworkInfo
|
||||||
{
|
{
|
||||||
|
std::string inputBlobName;
|
||||||
std::string networkType;
|
std::string networkType;
|
||||||
std::string configFilePath;
|
std::string configFilePath;
|
||||||
std::string wtsFilePath;
|
std::string wtsFilePath;
|
||||||
std::string int8CalibPath;
|
std::string int8CalibPath;
|
||||||
std::string networkMode;
|
|
||||||
std::string deviceType;
|
std::string deviceType;
|
||||||
std::string inputBlobName;
|
uint numDetectedClasses;
|
||||||
|
int clusterMode;
|
||||||
|
float iouThreshold;
|
||||||
|
std::string networkMode;
|
||||||
};
|
};
|
||||||
|
|
||||||
struct TensorInfo
|
struct TensorInfo
|
||||||
{
|
{
|
||||||
std::string blobName;
|
std::string blobName;
|
||||||
uint stride{0};
|
uint numBBoxes {0};
|
||||||
uint gridSizeY{0};
|
uint numClasses {0};
|
||||||
uint gridSizeX{0};
|
|
||||||
uint numClasses{0};
|
|
||||||
uint numBBoxes{0};
|
|
||||||
uint64_t volume{0};
|
|
||||||
std::vector<int> masks;
|
|
||||||
std::vector<float> anchors;
|
std::vector<float> anchors;
|
||||||
int bindingIndex{-1};
|
std::vector<int> mask;
|
||||||
float* hostBuffer{nullptr};
|
|
||||||
};
|
};
|
||||||
|
|
||||||
class Yolo : public IModelParser {
|
class Yolo : public IModelParser {
|
||||||
@@ -78,21 +75,27 @@ public:
|
|||||||
nvinfer1::ICudaEngine *createEngine (nvinfer1::IBuilder* builder, nvinfer1::IBuilderConfig* config);
|
nvinfer1::ICudaEngine *createEngine (nvinfer1::IBuilder* builder, nvinfer1::IBuilderConfig* config);
|
||||||
|
|
||||||
protected:
|
protected:
|
||||||
|
const std::string m_InputBlobName;
|
||||||
const std::string m_NetworkType;
|
const std::string m_NetworkType;
|
||||||
const std::string m_ConfigFilePath;
|
const std::string m_ConfigFilePath;
|
||||||
const std::string m_WtsFilePath;
|
const std::string m_WtsFilePath;
|
||||||
const std::string m_Int8CalibPath;
|
const std::string m_Int8CalibPath;
|
||||||
const std::string m_NetworkMode;
|
|
||||||
const std::string m_DeviceType;
|
const std::string m_DeviceType;
|
||||||
const std::string m_InputBlobName;
|
const uint m_NumDetectedClasses;
|
||||||
std::vector<TensorInfo> m_OutputTensors;
|
const int m_ClusterMode;
|
||||||
std::vector<std::map<std::string, std::string>> m_ConfigBlocks;
|
const float m_IouThreshold;
|
||||||
|
const std::string m_NetworkMode;
|
||||||
|
|
||||||
uint m_InputH;
|
uint m_InputH;
|
||||||
uint m_InputW;
|
uint m_InputW;
|
||||||
uint m_InputC;
|
uint m_InputC;
|
||||||
uint64_t m_InputSize;
|
uint64_t m_InputSize;
|
||||||
|
uint m_NumClasses;
|
||||||
uint m_LetterBox;
|
uint m_LetterBox;
|
||||||
|
float m_BetaNMS;
|
||||||
|
|
||||||
|
std::vector<TensorInfo> m_OutputTensors;
|
||||||
|
std::vector<std::map<std::string, std::string>> m_ConfigBlocks;
|
||||||
std::vector<nvinfer1::Weights> m_TrtWeights;
|
std::vector<nvinfer1::Weights> m_TrtWeights;
|
||||||
|
|
||||||
private:
|
private:
|
||||||
|
|||||||
@@ -11,8 +11,10 @@
|
|||||||
|
|
||||||
inline __device__ float sigmoidGPU(const float& x) { return 1.0f / (1.0f + __expf(-x)); }
|
inline __device__ float sigmoidGPU(const float& x) { return 1.0f / (1.0f + __expf(-x)); }
|
||||||
|
|
||||||
__global__ void gpuYoloLayer(const float* input, float* output, const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses,
|
__global__ void gpuYoloLayer(
|
||||||
const uint numBBoxes, const float scaleXY, const float* anchors, const int* mask)
|
const float* input, float* output, const uint netWidth, const uint netHeight, const uint gridSizeX,
|
||||||
|
const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes, const float scaleXY,
|
||||||
|
const float* anchors, const int* mask)
|
||||||
{
|
{
|
||||||
uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
|
uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
|
||||||
uint y_id = blockIdx.y * blockDim.y + threadIdx.y;
|
uint y_id = blockIdx.y * blockDim.y + threadIdx.y;
|
||||||
@@ -30,16 +32,20 @@ __global__ void gpuYoloLayer(const float* input, float* output, const uint gridS
|
|||||||
const float beta = -0.5 * (scaleXY - 1);
|
const float beta = -0.5 * (scaleXY - 1);
|
||||||
|
|
||||||
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]
|
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]
|
||||||
= sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]) * alpha + beta + x_id;
|
= (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)])
|
||||||
|
* alpha + beta + x_id) * netWidth / gridSizeX;
|
||||||
|
|
||||||
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]
|
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]
|
||||||
= sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]) * alpha + beta + y_id;
|
= (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)])
|
||||||
|
* alpha + beta + y_id) * netHeight / gridSizeY;
|
||||||
|
|
||||||
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)]
|
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)]
|
||||||
= __expf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)]) * anchors[mask[z_id] * 2];
|
= __expf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)])
|
||||||
|
* anchors[mask[z_id] * 2];
|
||||||
|
|
||||||
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)]
|
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)]
|
||||||
= __expf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)]) * anchors[mask[z_id] * 2 + 1];
|
= __expf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)])
|
||||||
|
* anchors[mask[z_id] * 2 + 1];
|
||||||
|
|
||||||
const float objectness
|
const float objectness
|
||||||
= sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 4)]);
|
= sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 4)]);
|
||||||
@@ -66,24 +72,28 @@ __global__ void gpuYoloLayer(const float* input, float* output, const uint gridS
|
|||||||
= maxIndex;
|
= maxIndex;
|
||||||
}
|
}
|
||||||
|
|
||||||
cudaError_t cudaYoloLayer(const void* input, void* output, const uint& batchSize, const uint& gridSizeX, const uint& gridSizeY,
|
cudaError_t cudaYoloLayer(
|
||||||
const uint& numOutputClasses, const uint& numBBoxes, uint64_t outputSize, cudaStream_t stream,
|
const void* input, void* output, const uint& batchSize, const uint& netWidth, const uint& netHeight,
|
||||||
const float scaleXY, const void* anchors, const void* mask);
|
const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes,
|
||||||
|
uint64_t& outputSize, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);
|
||||||
|
|
||||||
cudaError_t cudaYoloLayer(const void* input, void* output, const uint& batchSize, const uint& gridSizeX, const uint& gridSizeY,
|
cudaError_t cudaYoloLayer(
|
||||||
const uint& numOutputClasses, const uint& numBBoxes, uint64_t outputSize, cudaStream_t stream,
|
const void* input, void* output, const uint& batchSize, const uint& netWidth, const uint& netHeight,
|
||||||
const float scaleXY, const void* anchors, const void* mask)
|
const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes,
|
||||||
|
uint64_t& outputSize, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream)
|
||||||
{
|
{
|
||||||
dim3 threads_per_block(16, 16, 4);
|
dim3 threads_per_block(16, 16, 4);
|
||||||
dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1,
|
dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1,
|
||||||
(gridSizeY / threads_per_block.y) + 1,
|
(gridSizeY / threads_per_block.y) + 1,
|
||||||
(numBBoxes / threads_per_block.z) + 1);
|
(numBBoxes / threads_per_block.z) + 1);
|
||||||
|
|
||||||
for (unsigned int batch = 0; batch < batchSize; ++batch)
|
for (unsigned int batch = 0; batch < batchSize; ++batch)
|
||||||
{
|
{
|
||||||
gpuYoloLayer<<<number_of_blocks, threads_per_block, 0, stream>>>(
|
gpuYoloLayer<<<number_of_blocks, threads_per_block, 0, stream>>>(
|
||||||
reinterpret_cast<const float*>(input) + (batch * outputSize),
|
reinterpret_cast<const float*>(input) + (batch * outputSize),
|
||||||
reinterpret_cast<float*>(output) + (batch * outputSize), gridSizeX, gridSizeY, numOutputClasses,
|
reinterpret_cast<float*>(output) + (batch * outputSize),
|
||||||
numBBoxes, scaleXY, reinterpret_cast<const float*>(anchors), reinterpret_cast<const int*>(mask));
|
netWidth, netHeight, gridSizeX, gridSizeY, numOutputClasses, numBBoxes, scaleXY,
|
||||||
|
reinterpret_cast<const float*>(anchors), reinterpret_cast<const int*>(mask));
|
||||||
}
|
}
|
||||||
return cudaGetLastError();
|
return cudaGetLastError();
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -9,8 +9,10 @@
|
|||||||
#include <stdio.h>
|
#include <stdio.h>
|
||||||
#include <string.h>
|
#include <string.h>
|
||||||
|
|
||||||
__global__ void gpuYoloLayer_nc(const float* input, float* output, const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses,
|
__global__ void gpuYoloLayer_nc(
|
||||||
const uint numBBoxes, const float scaleXY, const float* anchors, const int* mask)
|
const float* input, float* output, const uint netWidth, const uint netHeight, const uint gridSizeX,
|
||||||
|
const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes, const float scaleXY,
|
||||||
|
const float* anchors, const int* mask)
|
||||||
{
|
{
|
||||||
uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
|
uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
|
||||||
uint y_id = blockIdx.y * blockDim.y + threadIdx.y;
|
uint y_id = blockIdx.y * blockDim.y + threadIdx.y;
|
||||||
@@ -28,16 +30,20 @@ __global__ void gpuYoloLayer_nc(const float* input, float* output, const uint gr
|
|||||||
const float beta = -0.5 * (scaleXY - 1);
|
const float beta = -0.5 * (scaleXY - 1);
|
||||||
|
|
||||||
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]
|
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]
|
||||||
= input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)] * alpha + beta + x_id;
|
= (input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]
|
||||||
|
* alpha + beta + x_id) * netWidth / gridSizeX;
|
||||||
|
|
||||||
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]
|
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]
|
||||||
= input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)] * alpha + beta + y_id;
|
= (input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]
|
||||||
|
* alpha + beta + y_id) * netHeight / gridSizeY;
|
||||||
|
|
||||||
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)]
|
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)]
|
||||||
= __powf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)] * 2, 2) * anchors[mask[z_id] * 2];
|
= __powf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)] * 2, 2)
|
||||||
|
* anchors[mask[z_id] * 2];
|
||||||
|
|
||||||
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)]
|
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)]
|
||||||
= __powf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)] * 2, 2) * anchors[mask[z_id] * 2 + 1];
|
= __powf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)] * 2, 2)
|
||||||
|
* anchors[mask[z_id] * 2 + 1];
|
||||||
|
|
||||||
const float objectness
|
const float objectness
|
||||||
= input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 4)];
|
= input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 4)];
|
||||||
@@ -64,24 +70,28 @@ __global__ void gpuYoloLayer_nc(const float* input, float* output, const uint gr
|
|||||||
= maxIndex;
|
= maxIndex;
|
||||||
}
|
}
|
||||||
|
|
||||||
cudaError_t cudaYoloLayer_nc(const void* input, void* output, const uint& batchSize, const uint& gridSizeX, const uint& gridSizeY,
|
cudaError_t cudaYoloLayer_nc(
|
||||||
const uint& numOutputClasses, const uint& numBBoxes, uint64_t outputSize, cudaStream_t stream,
|
const void* input, void* output, const uint& batchSize, const uint& netWidth, const uint& netHeight,
|
||||||
const float scaleXY, const void* anchors, const void* mask);
|
const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes,
|
||||||
|
uint64_t& outputSize, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);
|
||||||
|
|
||||||
cudaError_t cudaYoloLayer_nc(const void* input, void* output, const uint& batchSize, const uint& gridSizeX, const uint& gridSizeY,
|
cudaError_t cudaYoloLayer_nc(
|
||||||
const uint& numOutputClasses, const uint& numBBoxes, uint64_t outputSize, cudaStream_t stream,
|
const void* input, void* output, const uint& batchSize, const uint& netWidth, const uint& netHeight,
|
||||||
const float scaleXY, const void* anchors, const void* mask)
|
const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes,
|
||||||
|
uint64_t& outputSize, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream)
|
||||||
{
|
{
|
||||||
dim3 threads_per_block(16, 16, 4);
|
dim3 threads_per_block(16, 16, 4);
|
||||||
dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1,
|
dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1,
|
||||||
(gridSizeY / threads_per_block.y) + 1,
|
(gridSizeY / threads_per_block.y) + 1,
|
||||||
(numBBoxes / threads_per_block.z) + 1);
|
(numBBoxes / threads_per_block.z) + 1);
|
||||||
|
|
||||||
for (unsigned int batch = 0; batch < batchSize; ++batch)
|
for (unsigned int batch = 0; batch < batchSize; ++batch)
|
||||||
{
|
{
|
||||||
gpuYoloLayer_nc<<<number_of_blocks, threads_per_block, 0, stream>>>(
|
gpuYoloLayer_nc<<<number_of_blocks, threads_per_block, 0, stream>>>(
|
||||||
reinterpret_cast<const float*>(input) + (batch * outputSize),
|
reinterpret_cast<const float*>(input) + (batch * outputSize),
|
||||||
reinterpret_cast<float*>(output) + (batch * outputSize), gridSizeX, gridSizeY, numOutputClasses,
|
reinterpret_cast<float*>(output) + (batch * outputSize),
|
||||||
numBBoxes, scaleXY, reinterpret_cast<const float*>(anchors), reinterpret_cast<const int*>(mask));
|
netWidth, netHeight, gridSizeX, gridSizeY, numOutputClasses, numBBoxes, scaleXY,
|
||||||
|
reinterpret_cast<const float*>(anchors), reinterpret_cast<const int*>(mask));
|
||||||
}
|
}
|
||||||
return cudaGetLastError();
|
return cudaGetLastError();
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -11,8 +11,10 @@

inline __device__ float sigmoidGPU(const float& x) { return 1.0f / (1.0f + __expf(-x)); }

-__global__ void gpuYoloLayer_r(const float* input, float* output, const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses,
-    const uint numBBoxes, const float scaleXY, const float* anchors, const int* mask)
+__global__ void gpuYoloLayer_r(
+    const float* input, float* output, const uint netWidth, const uint netHeight, const uint gridSizeX,
+    const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes, const float scaleXY,
+    const float* anchors, const int* mask)
{
    uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
    uint y_id = blockIdx.y * blockDim.y + threadIdx.y;
@@ -30,16 +32,20 @@ __global__ void gpuYoloLayer_r(const float* input, float* output, const uint gri
    const float beta = -0.5 * (scaleXY - 1);

    output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]
-        = sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]) * alpha + beta + x_id;
+        = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)])
+        * alpha + beta + x_id) * netWidth / gridSizeX;

    output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]
-        = sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]) * alpha + beta + y_id;
+        = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)])
+        * alpha + beta + y_id) * netHeight / gridSizeY;

    output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)]
-        = __powf(sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)]) * 2, 2) * anchors[mask[z_id] * 2];
+        = __powf(sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)]) * 2, 2)
+        * anchors[mask[z_id] * 2];

    output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)]
-        = __powf(sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)]) * 2, 2) * anchors[mask[z_id] * 2 + 1];
+        = __powf(sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)]) * 2, 2)
+        * anchors[mask[z_id] * 2 + 1];

    const float objectness
        = sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 4)]);
@@ -66,24 +72,28 @@ __global__ void gpuYoloLayer_r(const float* input, float* output, const uint gri
        = maxIndex;
}

-cudaError_t cudaYoloLayer_r(const void* input, void* output, const uint& batchSize, const uint& gridSizeX, const uint& gridSizeY,
-    const uint& numOutputClasses, const uint& numBBoxes, uint64_t outputSize, cudaStream_t stream,
-    const float scaleXY, const void* anchors, const void* mask);
+cudaError_t cudaYoloLayer_r(
+    const void* input, void* output, const uint& batchSize, const uint& netWidth, const uint& netHeight,
+    const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes,
+    uint64_t& outputSize, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);

-cudaError_t cudaYoloLayer_r(const void* input, void* output, const uint& batchSize, const uint& gridSizeX, const uint& gridSizeY,
-    const uint& numOutputClasses, const uint& numBBoxes, uint64_t outputSize, cudaStream_t stream,
-    const float scaleXY, const void* anchors, const void* mask)
+cudaError_t cudaYoloLayer_r(
+    const void* input, void* output, const uint& batchSize, const uint& netWidth, const uint& netHeight,
+    const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes,
+    uint64_t& outputSize, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream)
{
    dim3 threads_per_block(16, 16, 4);
    dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1,
        (gridSizeY / threads_per_block.y) + 1,
        (numBBoxes / threads_per_block.z) + 1);

    for (unsigned int batch = 0; batch < batchSize; ++batch)
    {
        gpuYoloLayer_r<<<number_of_blocks, threads_per_block, 0, stream>>>(
            reinterpret_cast<const float*>(input) + (batch * outputSize),
-           reinterpret_cast<float*>(output) + (batch * outputSize), gridSizeX, gridSizeY, numOutputClasses,
-           numBBoxes, scaleXY, reinterpret_cast<const float*>(anchors), reinterpret_cast<const int*>(mask));
+           reinterpret_cast<float*>(output) + (batch * outputSize),
+           netWidth, netHeight, gridSizeX, gridSizeY, numOutputClasses, numBBoxes, scaleXY,
+           reinterpret_cast<const float*>(anchors), reinterpret_cast<const int*>(mask));
    }
    return cudaGetLastError();
}

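The net effect of the gpuYoloLayer_r change above: after the scale_x_y correction (alpha = scaleXY, beta = -0.5 * (scaleXY - 1)), the box centre is now rescaled by netWidth / gridSizeX and netHeight / gridSizeY, so the kernel writes coordinates in network-input pixels instead of grid-cell units; the width/height terms keep their (2·sigmoid)² form because the anchors are already expressed in input pixels. A minimal CPU sketch of the same arithmetic follows; the helper names (decodeCenter, decodeSize) are illustrative only and not part of the plugin.

```cpp
#include <cmath>
#include <cstdio>

// Illustrative CPU re-statement of the decode performed by gpuYoloLayer_r.
// Only the arithmetic mirrors the kernel; the names are hypothetical.
static float sigmoidCPU(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Centre coordinate: sigmoid, scale_x_y correction, add the cell index, then
// rescale from grid units to network-input pixels (the new step in this diff).
static float decodeCenter(float t, float cell, float scaleXY, float netSize, float gridSize)
{
    const float alpha = scaleXY;
    const float beta = -0.5f * (scaleXY - 1.0f);
    return (sigmoidCPU(t) * alpha + beta + cell) * netSize / gridSize;
}

// Width/height: (2 * sigmoid(t))^2 times the anchor, unchanged by this diff
// since the anchors are already given in network-input pixels.
static float decodeSize(float t, float anchor)
{
    const float s = sigmoidCPU(t) * 2.0f;
    return s * s * anchor;
}

int main()
{
    // Example: 608x608 network, 19x19 grid, anchor (142, 110), cell (9, 9).
    const float bx = decodeCenter(0.2f, 9.0f, 2.0f, 608.0f, 19.0f);
    const float by = decodeCenter(-0.1f, 9.0f, 2.0f, 608.0f, 19.0f);
    const float bw = decodeSize(0.3f, 142.0f);
    const float bh = decodeSize(0.1f, 110.0f);
    std::printf("box: %.1f %.1f %.1f %.1f\n", bx, by, bw, bh);
    return 0;
}
```
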
@@ -11,8 +11,9 @@

inline __device__ float sigmoidGPU(const float& x) { return 1.0f / (1.0f + __expf(-x)); }

-__device__ void softmaxGPU(const float* input, const int bbindex, const int numGridCells,
-    uint z_id, const uint numOutputClasses, float temp, float* output)
+__device__ void softmaxGPU(
+    const float* input, const int bbindex, const int numGridCells, uint z_id,
+    const uint numOutputClasses, float temp, float* output)
{
    int i;
    float sum = 0;
@@ -31,8 +32,10 @@ __device__ void softmaxGPU(const float* input, const int bbindex, const int numG
    }
}

-__global__ void gpuRegionLayer(const float* input, float* output, float* softmax, const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses,
-    const uint numBBoxes, const float* anchors)
+__global__ void gpuRegionLayer(
+    const float* input, float* output, float* softmax, const uint netWidth, const uint netHeight,
+    const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses, const uint numBBoxes,
+    const float* anchors)
{
    uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
    uint y_id = blockIdx.y * blockDim.y + threadIdx.y;
@@ -47,16 +50,20 @@ __global__ void gpuRegionLayer(const float* input, float* output, float* softmax
    const int bbindex = y_id * gridSizeX + x_id;

    output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]
-        = sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]) + x_id;
+        = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)])
+        + x_id) * netWidth / gridSizeX;

    output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]
-        = sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)]) + y_id;
+        = (sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 1)])
+        + y_id) * netHeight / gridSizeY;

    output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)]
-        = __expf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)]) * anchors[z_id * 2];
+        = __expf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 2)])
+        * anchors[z_id * 2] * netWidth / gridSizeX;

    output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)]
-        = __expf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)]) * anchors[z_id * 2 + 1];
+        = __expf(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 3)])
+        * anchors[z_id * 2 + 1] * netHeight / gridSizeY;

    softmaxGPU(input, bbindex, numGridCells, z_id, numOutputClasses, 1.0, softmax);

@@ -85,25 +92,29 @@ __global__ void gpuRegionLayer(const float* input, float* output, float* softmax
        = maxIndex;
}

-cudaError_t cudaYoloLayer_v2(const void* input, void* output, void* softmax, const uint& batchSize, const uint& gridSizeX, const uint& gridSizeY,
-    const uint& numOutputClasses, const uint& numBBoxes, uint64_t outputSize, cudaStream_t stream,
-    const void* anchors);
+cudaError_t cudaRegionLayer(
+    const void* input, void* output, void* softmax, const uint& batchSize, const uint& netWidth,
+    const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
+    const uint& numBBoxes, uint64_t& outputSize, const void* anchors, cudaStream_t stream);

-cudaError_t cudaYoloLayer_v2(const void* input, void* output, void* softmax, const uint& batchSize, const uint& gridSizeX, const uint& gridSizeY,
-    const uint& numOutputClasses, const uint& numBBoxes, uint64_t outputSize, cudaStream_t stream,
-    const void* anchors)
+cudaError_t cudaRegionLayer(
+    const void* input, void* output, void* softmax, const uint& batchSize, const uint& netWidth,
+    const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
+    const uint& numBBoxes, uint64_t& outputSize, const void* anchors, cudaStream_t stream)
{
    dim3 threads_per_block(16, 16, 4);
    dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1,
        (gridSizeY / threads_per_block.y) + 1,
        (numBBoxes / threads_per_block.z) + 1);

    for (unsigned int batch = 0; batch < batchSize; ++batch)
    {
        gpuRegionLayer<<<number_of_blocks, threads_per_block, 0, stream>>>(
            reinterpret_cast<const float*>(input) + (batch * outputSize),
            reinterpret_cast<float*>(output) + (batch * outputSize),
-           reinterpret_cast<float*>(softmax) + (batch * outputSize), gridSizeX, gridSizeY, numOutputClasses,
-           numBBoxes, reinterpret_cast<const float*>(anchors));
+           reinterpret_cast<float*>(softmax) + (batch * outputSize),
+           netWidth, netHeight, gridSizeX, gridSizeY, numOutputClasses, numBBoxes,
+           reinterpret_cast<const float*>(anchors));
    }
    return cudaGetLastError();
}

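All of the cudaYoloLayer_* and cudaRegionLayer launchers above share the same geometry: a 16x16x4 thread block over (grid x, grid y, anchor) and a grid rounded up with the (n / block) + 1 idiom, which over-allocates threads that each kernel then has to reject with a bounds check. A self-contained sketch of that launch pattern follows; the kernel body is a placeholder, not the plugin's decode.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: shows only the (x, y, anchor) indexing and the bounds
// guard that pairs with the rounded-up grid below.
__global__ void dummyDecode(float* out, unsigned int gridSizeX, unsigned int gridSizeY,
    unsigned int numBBoxes)
{
    unsigned int x_id = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y_id = blockIdx.y * blockDim.y + threadIdx.y;
    unsigned int z_id = blockIdx.z * blockDim.z + threadIdx.z;
    if (x_id >= gridSizeX || y_id >= gridSizeY || z_id >= numBBoxes)
        return;  // threads created by the "+ 1" over-allocation exit here
    out[(z_id * gridSizeY + y_id) * gridSizeX + x_id] = 1.0f;
}

int main()
{
    const unsigned int gridSizeX = 19, gridSizeY = 19, numBBoxes = 3;
    float* out = nullptr;
    cudaMalloc(&out, gridSizeX * gridSizeY * numBBoxes * sizeof(float));

    // Same launch geometry as the launchers above: 16x16x4 threads, grid
    // rounded up so every cell/anchor combination gets at least one thread.
    dim3 threads_per_block(16, 16, 4);
    dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1,
        (gridSizeY / threads_per_block.y) + 1,
        (numBBoxes / threads_per_block.z) + 1);
    dummyDecode<<<number_of_blocks, threads_per_block>>>(out, gridSizeX, gridSizeY, numBBoxes);

    std::printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(out);
    return 0;
}
```
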
@@ -18,7 +18,7 @@
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
-
+ *
* Edited by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
@@ -29,10 +29,8 @@
#include <iostream>
#include <memory>

-int kMODEL_TYPE;
-int kNUM_BBOXES;
-int kNUM_CLASSES;
-float kBETA_NMS;
+uint kNUM_BBOXES;
+uint kNUM_CLASSES;

namespace {
    template <typename T>
@@ -50,42 +48,40 @@ namespace {
    }
}

-cudaError_t cudaYoloLayer_r (
-    const void* input, void* output, const uint& batchSize,
-    const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
-    const uint& numBBoxes, uint64_t outputSize, cudaStream_t stream, const float scaleXY,
-    const void* anchors, const void* mask);
+cudaError_t cudaYoloLayer_r(
+    const void* input, void* output, const uint& batchSize, const uint& netWidth, const uint& netHeight,
+    const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes,
+    uint64_t& outputSize, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);

-cudaError_t cudaYoloLayer_nc (
-    const void* input, void* output, const uint& batchSize,
-    const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
-    const uint& numBBoxes, uint64_t outputSize, cudaStream_t stream, const float scaleXY,
-    const void* anchors, const void* mask);
+cudaError_t cudaYoloLayer_nc(
+    const void* input, void* output, const uint& batchSize, const uint& netWidth, const uint& netHeight,
+    const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes,
+    uint64_t& outputSize, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);

-cudaError_t cudaYoloLayer (
-    const void* input, void* output, const uint& batchSize,
-    const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
-    const uint& numBBoxes, uint64_t outputSize, cudaStream_t stream, const float scaleXY,
-    const void* anchors, const void* mask);
+cudaError_t cudaYoloLayer(
+    const void* input, void* output, const uint& batchSize, const uint& netWidth, const uint& netHeight,
+    const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses, const uint& numBBoxes,
+    uint64_t& outputSize, const float& scaleXY, const void* anchors, const void* mask, cudaStream_t stream);

-cudaError_t cudaYoloLayer_v2 (
-    const void* input, void* output, void* softmax, const uint& batchSize,
-    const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
-    const uint& numBBoxes, uint64_t outputSize, cudaStream_t stream, const void* anchors);
+cudaError_t cudaRegionLayer(
+    const void* input, void* output, void* softmax, const uint& batchSize, const uint& netWidth,
+    const uint& netHeight, const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
+    const uint& numBBoxes, uint64_t& outputSize, const void* anchors, cudaStream_t stream);

YoloLayer::YoloLayer (const void* data, size_t length)
{
    const char *d = static_cast<const char*>(data);
-    read(d, m_NumBoxes);
+    read(d, m_NumBBoxes);
    read(d, m_NumClasses);
+    read(d, m_NetWidth);
+    read(d, m_NetHeight);
    read(d, m_GridSizeX);
    read(d, m_GridSizeY);
-    read(d, m_OutputSize);

    read(d, m_Type);
    read(d, m_NewCoords);
    read(d, m_ScaleXY);
-    read(d, m_BetaNMS);
+    read(d, m_OutputSize);

    uint anchorsSize;
    read(d, anchorsSize);
@@ -103,63 +99,71 @@ YoloLayer::YoloLayer (const void* data, size_t length)
        m_Mask.push_back(result);
    }

-    kMODEL_TYPE = m_Type;
-    kNUM_BBOXES = m_NumBoxes;
-    kNUM_CLASSES = m_NumClasses;
-    kBETA_NMS = m_BetaNMS;

    if (m_Anchors.size() > 0) {
-        float* m_anchors = m_Anchors.data();
-        CHECK(cudaMallocHost(&mAnchors, m_Anchors.size() * sizeof(float)));
-        CHECK(cudaMemcpy(mAnchors, m_anchors, m_Anchors.size() * sizeof(float), cudaMemcpyHostToDevice));
+        float* anchors = m_Anchors.data();
+        CUDA_CHECK(cudaMallocHost(&p_Anchors, m_Anchors.size() * sizeof(float)));
+        CUDA_CHECK(cudaMemcpy(p_Anchors, anchors, m_Anchors.size() * sizeof(float), cudaMemcpyHostToDevice));
    }

    if (m_Mask.size() > 0) {
-        int* m_mask = m_Mask.data();
-        CHECK(cudaMallocHost(&mMask, m_Mask.size() * sizeof(int)));
-        CHECK(cudaMemcpy(mMask, m_mask, m_Mask.size() * sizeof(int), cudaMemcpyHostToDevice));
+        int* mask = m_Mask.data();
+        CUDA_CHECK(cudaMallocHost(&p_Mask, m_Mask.size() * sizeof(int)));
+        CUDA_CHECK(cudaMemcpy(p_Mask, mask, m_Mask.size() * sizeof(int), cudaMemcpyHostToDevice));
    }

+    kNUM_BBOXES = m_NumBBoxes;
+    kNUM_CLASSES = m_NumClasses;
};

YoloLayer::YoloLayer (
-    const uint& numBoxes, const uint& numClasses, const uint& gridSizeX, const uint& gridSizeY, const uint modelType, const uint newCoords, const float scaleXY, const float betaNMS, const std::vector<float> anchors, std::vector<int> mask) :
-    m_NumBoxes(numBoxes),
+    const uint& numBBoxes, const uint& numClasses, const uint& netWidth, const uint& netHeight,
+    const uint& gridSizeX, const uint& gridSizeY, const uint& modelType, const uint& newCoords,
+    const float& scaleXY, const std::vector<float> anchors,
+    const std::vector<int> mask) :
+    m_NumBBoxes(numBBoxes),
    m_NumClasses(numClasses),
+    m_NetWidth(netWidth),
+    m_NetHeight(netHeight),
    m_GridSizeX(gridSizeX),
    m_GridSizeY(gridSizeY),
    m_Type(modelType),
    m_NewCoords(newCoords),
    m_ScaleXY(scaleXY),
-    m_BetaNMS(betaNMS),
    m_Anchors(anchors),
    m_Mask(mask)
{
-    assert(m_NumBoxes > 0);
+    assert(m_NumBBoxes > 0);
    assert(m_NumClasses > 0);
+    assert(m_NetWidth > 0);
+    assert(m_NetHeight > 0);
    assert(m_GridSizeX > 0);
    assert(m_GridSizeY > 0);
-    m_OutputSize = m_GridSizeX * m_GridSizeY * (m_NumBoxes * (4 + 1 + m_NumClasses));
+    m_OutputSize = m_GridSizeX * m_GridSizeY * (m_NumBBoxes * (4 + 1 + m_NumClasses));

    if (m_Anchors.size() > 0) {
-        float* m_anchors = m_Anchors.data();
-        CHECK(cudaMallocHost(&mAnchors, m_Anchors.size() * sizeof(float)));
-        CHECK(cudaMemcpy(mAnchors, m_anchors, m_Anchors.size() * sizeof(float), cudaMemcpyHostToDevice));
+        float* anchors = m_Anchors.data();
+        CUDA_CHECK(cudaMallocHost(&p_Anchors, m_Anchors.size() * sizeof(float)));
+        CUDA_CHECK(cudaMemcpy(p_Anchors, anchors, m_Anchors.size() * sizeof(float), cudaMemcpyHostToDevice));
    }

    if (m_Mask.size() > 0) {
-        int* m_mask = m_Mask.data();
-        CHECK(cudaMallocHost(&mMask, m_Mask.size() * sizeof(int)));
-        CHECK(cudaMemcpy(mMask, m_mask, m_Mask.size() * sizeof(int), cudaMemcpyHostToDevice));
+        int* mask = m_Mask.data();
+        CUDA_CHECK(cudaMallocHost(&p_Mask, m_Mask.size() * sizeof(int)));
+        CUDA_CHECK(cudaMemcpy(p_Mask, mask, m_Mask.size() * sizeof(int), cudaMemcpyHostToDevice));
    }

+    kNUM_BBOXES = m_NumBBoxes;
+    kNUM_CLASSES = m_NumClasses;
};

YoloLayer::~YoloLayer()
{
    if (m_Anchors.size() > 0) {
-        CHECK(cudaFreeHost(mAnchors));
+        CUDA_CHECK(cudaFreeHost(p_Anchors));
    }
    if (m_Mask.size() > 0) {
-        CHECK(cudaFreeHost(mMask));
+        CUDA_CHECK(cudaFreeHost(p_Mask));
    }
}

@@ -185,73 +189,79 @@ YoloLayer::configureWithFormat (
    nvinfer1::DataType type, nvinfer1::PluginFormat format, int maxBatchSize) noexcept
{
    assert(nbInputs == 1);
-    assert (format == nvinfer1::PluginFormat::kLINEAR);
+    assert(format == nvinfer1::PluginFormat::kLINEAR);
    assert(inputDims != nullptr);
}

-int YoloLayer::enqueue(
+int YoloLayer::enqueue (
    int batchSize, void const* const* inputs, void* const* outputs, void* workspace,
    cudaStream_t stream) noexcept
{
    if (m_Type == 2) { // YOLOR incorrect param: scale_x_y = 2.0
-        CHECK(cudaYoloLayer_r(
-            inputs[0], outputs[0], batchSize, m_GridSizeX, m_GridSizeY, m_NumClasses, m_NumBoxes,
-            m_OutputSize, stream, 2.0, mAnchors, mMask));
+        CUDA_CHECK(cudaYoloLayer_r(
+            inputs[0], outputs[0], batchSize, m_NetWidth, m_NetHeight, m_GridSizeX, m_GridSizeY,
+            m_NumClasses, m_NumBBoxes, m_OutputSize, 2.0, p_Anchors, p_Mask, stream));
    }
    else if (m_Type == 1) {
        if (m_NewCoords) {
-            CHECK(cudaYoloLayer_nc(
-                inputs[0], outputs[0], batchSize, m_GridSizeX, m_GridSizeY, m_NumClasses, m_NumBoxes,
-                m_OutputSize, stream, m_ScaleXY, mAnchors, mMask));
+            CUDA_CHECK(cudaYoloLayer_nc(
+                inputs[0], outputs[0], batchSize, m_NetWidth, m_NetHeight, m_GridSizeX, m_GridSizeY,
+                m_NumClasses, m_NumBBoxes, m_OutputSize, m_ScaleXY, p_Anchors, p_Mask, stream));
        }
        else {
-            CHECK(cudaYoloLayer(
-                inputs[0], outputs[0], batchSize, m_GridSizeX, m_GridSizeY, m_NumClasses, m_NumBoxes,
-                m_OutputSize, stream, m_ScaleXY, mAnchors, mMask));
+            CUDA_CHECK(cudaYoloLayer(
+                inputs[0], outputs[0], batchSize, m_NetWidth, m_NetHeight, m_GridSizeX, m_GridSizeY,
+                m_NumClasses, m_NumBBoxes, m_OutputSize, m_ScaleXY, p_Anchors, p_Mask, stream));
        }
    }
    else {
        void* softmax;
-        CHECK(cudaMallocHost(&softmax, sizeof(outputs[0])));
-        CHECK(cudaMemcpy(softmax, outputs[0], sizeof(outputs[0]), cudaMemcpyHostToDevice));
+        cudaMallocHost(&softmax, sizeof(outputs[0]));
+        cudaMemcpy(softmax, outputs[0], sizeof(outputs[0]), cudaMemcpyHostToDevice);

-        CHECK(cudaYoloLayer_v2(
-            inputs[0], outputs[0], softmax, batchSize, m_GridSizeX, m_GridSizeY, m_NumClasses, m_NumBoxes,
-            m_OutputSize, stream, mAnchors));
+        CUDA_CHECK(cudaRegionLayer(
+            inputs[0], outputs[0], softmax, batchSize, m_NetWidth, m_NetHeight, m_GridSizeX, m_GridSizeY,
+            m_NumClasses, m_NumBBoxes, m_OutputSize, p_Anchors, stream));

-        CHECK(cudaFreeHost(softmax));
+        CUDA_CHECK(cudaFreeHost(softmax));
    }
    return 0;
}

size_t YoloLayer::getSerializationSize() const noexcept
{
-    int anchorsSum = 1;
-    for (uint i = 0; i < m_Anchors.size(); i++) {
-        anchorsSum += 1;
-    }
-    int maskSum = 1;
-    for (uint i = 0; i < m_Mask.size(); i++) {
-        maskSum += 1;
-    }
+    size_t totalSize = 0;

-    return sizeof(m_NumBoxes) + sizeof(m_NumClasses) + sizeof(m_GridSizeX) + sizeof(m_GridSizeY) + sizeof(m_OutputSize) + sizeof(m_Type)
-        + sizeof(m_NewCoords) + sizeof(m_ScaleXY) + sizeof(m_BetaNMS) + anchorsSum * sizeof(float) + maskSum * sizeof(int);
+    totalSize += sizeof(m_NumBBoxes);
+    totalSize += sizeof(m_NumClasses);
+    totalSize += sizeof(m_NetWidth);
+    totalSize += sizeof(m_NetHeight);
+    totalSize += sizeof(m_GridSizeX);
+    totalSize += sizeof(m_GridSizeY);
+    totalSize += sizeof(m_Type);
+    totalSize += sizeof(m_NewCoords);
+    totalSize += sizeof(m_ScaleXY);
+    totalSize += sizeof(m_OutputSize);
+    totalSize += sizeof(uint) + sizeof(m_Anchors[0]) * m_Anchors.size();
+    totalSize += sizeof(uint) + sizeof(m_Mask[0]) * m_Mask.size();

+    return totalSize;
}

void YoloLayer::serialize(void* buffer) const noexcept
{
    char *d = static_cast<char*>(buffer);
-    write(d, m_NumBoxes);
+    write(d, m_NumBBoxes);
    write(d, m_NumClasses);
+    write(d, m_NetWidth);
+    write(d, m_NetHeight);
    write(d, m_GridSizeX);
    write(d, m_GridSizeY);
-    write(d, m_OutputSize);

    write(d, m_Type);
    write(d, m_NewCoords);
    write(d, m_ScaleXY);
-    write(d, m_BetaNMS);
+    write(d, m_OutputSize);

    uint anchorsSize = m_Anchors.size();
    write(d, anchorsSize);
@@ -264,16 +274,13 @@ void YoloLayer::serialize(void* buffer) const noexcept
    for (uint i = 0; i < maskSize; i++) {
        write(d, m_Mask[i]);
    }

-    kMODEL_TYPE = m_Type;
-    kNUM_BBOXES = m_NumBoxes;
-    kNUM_CLASSES = m_NumClasses;
-    kBETA_NMS = m_BetaNMS;
}

nvinfer1::IPluginV2* YoloLayer::clone() const noexcept
{
-    return new YoloLayer (m_NumBoxes, m_NumClasses, m_GridSizeX, m_GridSizeY, m_Type, m_NewCoords, m_ScaleXY, m_BetaNMS, m_Anchors, m_Mask);
+    return new YoloLayer (
+        m_NumBBoxes, m_NumClasses, m_NetWidth, m_NetHeight, m_GridSizeX, m_GridSizeY, m_Type,
+        m_NewCoords, m_ScaleXY, m_Anchors, m_Mask);
}

REGISTER_TENSORRT_PLUGIN(YoloLayerPluginCreator);

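The rewritten getSerializationSize() above works because it mirrors serialize() and the deserializing constructor field by field: every write() has a matching read() in the same order, and the reported size is the sum of the same sizeof() terms plus a length prefix for each vector. A compact sketch of that contract follows, assuming local write/read templates that merely stand in for the plugin's private helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Minimal stand-ins for the plugin's private helpers: copy a trivially
// copyable value into / out of a byte buffer and advance the cursor.
template <typename T>
void write(char*& buffer, const T& val)
{
    std::memcpy(buffer, &val, sizeof(T));
    buffer += sizeof(T);
}

template <typename T>
void read(const char*& buffer, T& val)
{
    std::memcpy(&val, buffer, sizeof(T));
    buffer += sizeof(T);
}

int main()
{
    unsigned int numBBoxes = 3, numClasses = 80;
    std::vector<float> anchors = {142, 110, 192, 243, 459, 401};

    // Size must match the writes below: fixed fields + length prefix + payload.
    size_t totalSize = 0;
    totalSize += sizeof(numBBoxes);
    totalSize += sizeof(numClasses);
    totalSize += sizeof(unsigned int) + sizeof(anchors[0]) * anchors.size();

    std::vector<char> blob(totalSize);
    char* w = blob.data();
    write(w, numBBoxes);
    write(w, numClasses);
    write(w, static_cast<unsigned int>(anchors.size()));
    for (float a : anchors) write(w, a);
    assert(static_cast<size_t>(w - blob.data()) == totalSize);

    // Deserialization reads the fields back in exactly the same order.
    const char* r = blob.data();
    unsigned int nb = 0, nc = 0, na = 0;
    read(r, nb);
    read(r, nc);
    read(r, na);
    std::vector<float> anchorsOut(na);
    for (unsigned int i = 0; i < na; ++i) read(r, anchorsOut[i]);
    assert(nb == numBBoxes && nc == numClasses && anchorsOut == anchors);
    return 0;
}
```
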
@@ -18,7 +18,7 @@
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
-
+ *
* Edited by Marcos Luciano
* https://www.github.com/marcoslucianops
*/
@@ -36,7 +36,7 @@

#include "NvInferPlugin.h"

-#define CHECK(status) \
+#define CUDA_CHECK(status) \
    { \
        if (status != 0) \
        { \
@@ -56,9 +56,11 @@ class YoloLayer : public nvinfer1::IPluginV2
{
public:
    YoloLayer (const void* data, size_t length);
-    YoloLayer (const uint& numBoxes, const uint& numClasses, const uint& gridSizeX, const uint& gridSizeY,
-        const uint modelType, const uint newCoords, const float scaleXY, const float betaNMS,
-        const std::vector<float> anchors, const std::vector<int> mask);
+    YoloLayer (
+        const uint& numBBoxes, const uint& numClasses, const uint& netWidth, const uint& netHeight,
+        const uint& gridSizeX, const uint& gridSizeY, const uint& modelType, const uint& newCoords,
+        const float& scaleXY, const std::vector<float> anchors,
+        const std::vector<int> mask);
    ~YoloLayer ();
    const char* getPluginType () const noexcept override { return YOLOLAYER_PLUGIN_NAME; }
    const char* getPluginVersion () const noexcept override { return YOLOLAYER_PLUGIN_VERSION; }
@@ -95,22 +97,21 @@ public:
    }

private:
-    uint m_NumBoxes {0};
+    std::string m_Namespace {""};
+    uint m_NumBBoxes {0};
    uint m_NumClasses {0};
+    uint m_NetWidth {0};
+    uint m_NetHeight {0};
    uint m_GridSizeX {0};
    uint m_GridSizeY {0};
-    uint64_t m_OutputSize {0};
-    std::string m_Namespace {""};

    uint m_Type {0};
    uint m_NewCoords {0};
    float m_ScaleXY {0};
-    float m_BetaNMS {0};
    std::vector<float> m_Anchors;
    std::vector<int> m_Mask;
+    uint64_t m_OutputSize {0};
-    void* mAnchors;
-    void* mMask;
+    void* p_Anchors;
+    void* p_Mask;
};

class YoloLayerPluginCreator : public nvinfer1::IPluginCreator
@@ -152,9 +153,7 @@ private:
    std::string m_Namespace {""};
};

-extern int kMODEL_TYPE;
-extern int kNUM_BBOXES;
-extern int kNUM_CLASSES;
-extern float kBETA_NMS;
+extern uint kNUM_BBOXES;
+extern uint kNUM_CLASSES;

#endif // __YOLO_PLUGINS__

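Renaming CHECK to CUDA_CHECK in the header above sidesteps collisions with the CHECK macros defined by other libraries that may end up in the same build (e.g. glog). The full macro body is not part of this hunk, so the following is an assumed, typical implementation of such a wrapper rather than a copy of the plugin's.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed implementation of a CUDA_CHECK-style macro: evaluate the call once
// and abort with file/line context if it did not return cudaSuccess.
#define CUDA_CHECK(status)                                                    \
    {                                                                         \
        cudaError_t err_ = (status);                                          \
        if (err_ != cudaSuccess)                                              \
        {                                                                     \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",                  \
                cudaGetErrorString(err_), __FILE__, __LINE__);                \
            std::abort();                                                     \
        }                                                                     \
    }

int main()
{
    void* anchors = nullptr;
    // Same usage pattern as the plugin: wrap every runtime call that returns
    // a cudaError_t, including pinned-host allocations and frees.
    CUDA_CHECK(cudaMallocHost(&anchors, 12 * sizeof(float)));
    CUDA_CHECK(cudaFreeHost(anchors));
    std::puts("ok");
    return 0;
}
```
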
readme.md

@@ -2,12 +2,11 @@

NVIDIA DeepStream SDK 6.0 configuration for YOLO models

-### Future updates (comming soon, stay tuned)
+### Future updates

* New documentation for multiple models
* DeepStream tutorials
* Native PP-YOLO support
-* GPU NMS [#71](https://github.com/marcoslucianops/DeepStream-Yolo/issues/71#issuecomment-1046270722])
* Dynamic batch-size

### Improvements on this repository
@@ -23,8 +22,9 @@ NVIDIA DeepStream SDK 6.0 configuration for YOLO models
* Support for reorg, implicit and channel layers (YOLOR)
* YOLOv5 6.0 native support
* YOLOR native support
-* Models benchmarks
+* Models benchmarks (**outdated**)
* **GPU YOLO Decoder (moved from CPU to GPU to get better performance)** [#138](https://github.com/marcoslucianops/DeepStream-Yolo/issues/138)
+* **Improved NMS** [#71](https://github.com/marcoslucianops/DeepStream-Yolo/issues/142)

##

@@ -316,9 +316,9 @@ CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
...
# 0=RGB, 1=BGR, 2=GRAYSCALE
model-color-format=0
-# CFG
+# YOLO cfg
custom-network-config=yolov4.cfg
-# Weights
+# YOLO weights
model-file=yolov4.weights
# Generated TensorRT model (will be created if it doesn't exist)
model-engine-file=model_b1_gpu0_fp32.engine
@@ -332,7 +332,9 @@ network-mode=0
num-detected-classes=80
...
[class-attrs-all]
-# CONF_THRESH
+# IOU threshold
+nms-iou-threshold=0.6
+# Score threshold
pre-cluster-threshold=0.25
```

@@ -354,8 +356,6 @@ nvbuf-memory-type=0
config-file=config_infer_primary_yoloV2.txt
```

-**NOTE**: The config_infer_primary.txt file uses cluster-mode=4 and NMS = 0.45 (via code) when beta_nms isn't available (when beta_nms is available, NMS = beta_nms), while the config_infer_primary_yoloV2.txt file uses cluster-mode=2 and nms-iou-threshold=0.45 to set NMS.
-
##

### YOLOv5 usage
@@ -417,7 +417,9 @@ network-mode=0
num-detected-classes=80
...
[class-attrs-all]
-# CONF_THRESH
+# IOU threshold
+nms-iou-threshold=0.6
+# Score threshold
pre-cluster-threshold=0.25
```

@@ -534,7 +536,9 @@ network-mode=0
num-detected-classes=80
...
[class-attrs-all]
-# CONF_THRESH
+# IOU threshold
+nms-iou-threshold=0.6
+# Score threshold
pre-cluster-threshold=0.25
```

@@ -652,4 +656,4 @@ In C++/Python deepstream-test application, your code need be in osd_sink_pad_buf

##

-My projects: https://www.youtube.com/MarcosLucianoTV (new videos and tutorials comming soon)
+My projects: https://www.youtube.com/MarcosLucianoTV