An improved multi-scale and knowledge distillation method for efficient pedestrian detection in dense scenes

Bibliographic Details
Published in:	Journal of real-time image processing 2024-08, Vol.21 (4), p.126, Article 126
Main Authors:	Xu, Yanxiang, Wen, Mi, He, Wei, Wang, Hongwei, Xue, Yunsheng
Format:	Article
Language:	English
Subjects:	Accuracy Algorithms Attention Computer Graphics Computer Science Computer vision Convolution Datasets Design Distillation Effectiveness Efficiency Feature extraction Image Processing and Computer Vision Lightweight Modules Multilayers Multimedia Information Systems Neural networks Object recognition Occlusion Parameters Pattern Recognition Pedestrians Real time Semantics Sensors Signal,Image and Speech Processing Target detection Weight reduction

Description
Summary:	Pedestrian detection in densely populated scenes, particularly in the presence of occlusion, remains a challenging problem in computer vision. Existing approaches often address missed detections by enhancing model architectures or incorporating attention mechanisms; however, small-scale pedestrians have few features, models easily overfit to them on the training dataset, and these approaches still struggle to detect pedestrians with small target sizes accurately. To tackle this issue, this research rethinks occluded regions from the perspective of small-scale pedestrian detection and proposes a You Only Look Once model for efficient pedestrian detection (YOLO-EPD). First, we find that standard convolution and dilated convolution, each limited to a single receptive field, do not fit pedestrian targets of different scales well, so we propose the Selective Content Aware Downsampling (SCAD) module, which is integrated into the backbone to enhance feature extraction. Second, to address missed detections caused by insufficient feature extraction for small-scale pedestrians, we propose the Crowded Multi-Head Attention (CMHA) module, which makes full use of multi-layer information. Finally, to optimize the accuracy and efficiency of small-object detection, we design Unified Channel-Task Distillation (UCTD) with channel attention, together with a Lightweight head (Lhead) that uses parameter sharing to keep the model lightweight. Experimental results validate the superiority of YOLO-EPD, which achieves 91.1% Average Precision (AP) on the WiderPerson dataset while reducing parameters and computational overhead by 40%. The experiments also show that YOLO-EPD significantly accelerates the convergence of model training and achieves better real-time performance in real-world dense scenarios.
ISSN:	1861-8200 1861-8219
DOI:	10.1007/s11554-024-01507-8
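The summary does not say how UCTD computes its distillation loss, so the following is only a minimal sketch of a commonly used channel-wise distillation loss for dense prediction, offered as a stand-in for intuition and not as the paper's method. For each channel, teacher and student activations are turned into spatial probability distributions with a temperature-scaled softmax, and the KL divergence KL(teacher || student) is averaged over channels and scaled by T². The function names `spatial_log_softmax` and `channel_wise_kd_loss`, the default temperature of 4.0, and the use of plain Python lists (channels × flattened spatial positions) in place of tensors are all hypothetical. UCTD's channel attention and task-level terms are not modeled, and the sketch assumes the student's features have already been aligned to the teacher's channel count.

```python
import math
from typing import List

# Hypothetical representation: a feature map is a list of channels,
# each channel being a flattened list of spatial activations.
FeatureMap = List[List[float]]


def spatial_log_softmax(values: List[float], temperature: float) -> List[float]:
    """Log-softmax over the spatial positions of one channel, with temperature.

    Computed in log space (log-sum-exp with max subtraction) so that very
    small probabilities never underflow to zero before taking a logarithm.
    """
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(v - peak) for v in scaled))
    return [v - log_total for v in scaled]


def channel_wise_kd_loss(
    student: FeatureMap, teacher: FeatureMap, temperature: float = 4.0
) -> float:
    """Generic channel-wise distillation loss (a stand-in, NOT the UCTD formulation).

    For each channel: KL(teacher_dist || student_dist) over spatial positions.
    The per-channel divergences are averaged over channels and scaled by T^2.
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher must have the same number of channels")
    total = 0.0
    for s_chan, t_chan in zip(student, teacher):
        if len(s_chan) != len(t_chan):
            raise ValueError("channels must have the same number of spatial positions")
        log_t = spatial_log_softmax(t_chan, temperature)
        log_s = spatial_log_softmax(s_chan, temperature)
        # KL(p_t || p_s) = sum_i p_t(i) * (log p_t(i) - log p_s(i))
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    return (temperature ** 2) * total / len(teacher)


if __name__ == "__main__":
    teacher = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
    student = [[4.0, 3.0, 2.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
    print("identical features:", channel_wise_kd_loss(teacher, teacher))
    print("mismatched features:", channel_wise_kd_loss(student, teacher))
```

Passing the teacher's own features as the student gives a loss of exactly zero, while the reversed first channel yields a positive loss. Because the softmax is taken per channel over spatial positions, the loss pushes the student to match *where* each teacher channel fires rather than its absolute activation scale. This is one reason channel-wise losses are popular for dense tasks such as detection.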