An improved multi-scale and knowledge distillation method for efficient pedestrian detection in dense scenes
Published in: | Journal of real-time image processing 2024-08, Vol.21 (4), p.126, Article 126 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Pedestrian detection in densely populated scenes, particularly in the presence of occlusions, remains a challenging issue in computer vision. Existing approaches often address missed detections by enhancing model architectures or incorporating attention mechanisms. However, small-scale pedestrians have fewer features and are easily overfitted to the dataset, so these approaches still struggle to detect small pedestrian targets accurately. To tackle this issue, this research rethinks the occlusion region through small-scale pedestrian detection and proposes the You Only Look Once model for efficient pedestrian detection (YOLO-EPD). First, we find that Standard Convolution and Dilated Convolution do not fit pedestrian targets of different scales well because each has a single receptive field, so we propose the Selective Content Aware Downsampling (SCAD) module, which is integrated into the backbone to enhance feature extraction. In addition, to address missed detections resulting from insufficient feature extraction for small-scale pedestrians, we propose the Crowded Multi-Head Attention (CMHA) module, which makes full use of multi-layer information. Finally, to optimize the performance and effectiveness of small-object detection, we design Unified Channel-Task Distillation (UCTD) with channel attention and a Lightweight head (Lhead) that uses parameter sharing to stay lightweight. Experimental results validate the superiority of YOLO-EPD, which achieves 91.1% Average Precision (AP) on the WiderPerson dataset while reducing parameters and computational overhead by 40%. The experimental findings demonstrate that YOLO-EPD greatly accelerates the convergence of model training and achieves better real-time performance in real-world dense scenarios. |
---|---|
ISSN: | 1861-8200 1861-8219 |
DOI: | 10.1007/s11554-024-01507-8 |