Adaptive Inattentional Framework for Video Object Detection With Reward-Conditional Training

Bibliographic Details
Published in: IEEE Access, 2020, Vol. 8, pp. 124451-124466
Main Authors: Rodriguez-Ramos, Alejandro, Rodriguez-Vazquez, Javier, Sampedro, Carlos, Campoy, Pascual
Format: Article
Language: English
Description
Summary: Recent object detection studies have focused on video sequences, mostly due to increasing demand from industrial applications. Although single-image architectures achieve remarkable accuracy, they do not exploit the particular properties of video sequences and usually require highly parallel computational resources, such as desktop GPUs. In this work, an inattentional framework is proposed in which the object context in video frames is dynamically reused in order to reduce the computational overhead. The context features corresponding to keyframes are fused into a synthetic feature map, which is further refined through temporal aggregation with ConvLSTMs. Furthermore, an inattentional policy is learned to adaptively balance accuracy against the amount of context reused. This policy is learned under the reinforcement learning paradigm using our novel reward-conditional training scheme, which allows the policy to be trained over a whole distribution of reward functions and enables the selection of a single reward function at inference time. Our framework shows outstanding results on platforms with reduced parallelization capabilities, such as CPUs, achieving an average latency reduction of up to 2.09× and FPS rates similar to those of the equivalent GPU platform, at the cost of a 1.11× mAP reduction.
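The idea of reward-conditional training (one policy trained across a whole family of reward functions, with a member of that family chosen at inference time) can be illustrated with a toy sketch. This is NOT the authors' method: it replaces their reinforcement learning setup with a one-step tabular contextual bandit, and it assumes a hypothetical reward of the form accuracy minus beta times compute, where beta sets the accuracy/compute trade-off. The environment constants (KEYFRAME_COST, REUSE_COST, DECAY, MAX_STALENESS, BETAS) and the functions reward, train, act and keyframe_rate are all illustrative assumptions; the record does not specify the paper's reward functions or policy architecture.

```python
import random
from collections import defaultdict

# Hypothetical toy environment (illustrative assumptions, not values from the paper).
KEYFRAME_COST = 1.0   # relative compute of running the full feature extractor
REUSE_COST = 0.2      # relative compute of reusing cached keyframe context
DECAY = 0.15          # accuracy lost per frame since the last keyframe
MAX_STALENESS = 10    # cap on tracked staleness, keeps the state space finite
BETAS = [0.1, 0.3, 0.6, 1.0]  # distribution of reward functions (trade-offs)
ACTIONS = ("keyframe", "reuse")


def reward(action, staleness, beta):
    """One member of a reward family indexed by beta: accuracy - beta * compute."""
    if action == "keyframe":
        return 1.0 - beta * KEYFRAME_COST
    return 1.0 - DECAY * staleness - beta * REUSE_COST


def train(episodes=20000, seed=0):
    """Estimate Q(beta, staleness, action), sampling a new beta every episode."""
    rng = random.Random(seed)
    q = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(episodes):
        beta = rng.choice(BETAS)                  # sample a reward function
        staleness = rng.randint(1, MAX_STALENESS)  # frames since last keyframe
        action = rng.choice(ACTIONS)               # uniform exploration
        key = (beta, staleness, action)
        counts[key] += 1
        # Incremental mean of the observed reward for this (state, action) pair.
        q[key] += (reward(action, staleness, beta) - q[key]) / counts[key]
    return q


def act(q, beta, staleness):
    """Greedy policy conditioned on beta, which the caller picks at inference."""
    beta = min(BETAS, key=lambda b: abs(b - beta))  # snap to a trained beta
    staleness = min(staleness, MAX_STALENESS)
    return max(ACTIONS, key=lambda a: q[(beta, staleness, a)])


def keyframe_rate(q, beta, num_frames=60):
    """Fraction of frames on which the policy runs the full extractor."""
    keyframes = 1   # the first frame has no cached context, so it is a keyframe
    staleness = 1   # the next frame is one frame away from that keyframe
    for _ in range(num_frames - 1):
        if act(q, beta, staleness) == "keyframe":
            keyframes += 1
            staleness = 1
        else:
            staleness = min(staleness + 1, MAX_STALENESS)
    return keyframes / num_frames


if __name__ == "__main__":
    q = train()
    for beta in BETAS:
        print(f"beta={beta}: keyframe rate {keyframe_rate(q, beta):.3f}")
```

The same trained table serves every trade-off: under these toy constants, beta=0.1 processes every frame as a keyframe, beta=0.3 every second frame, and beta=0.6 every fourth frame, so a larger beta reuses more context and spends less compute. A single-step bandit was chosen only to keep the sketch short and deterministic; it ignores the sequential credit assignment that a full reinforcement learning formulation would handle.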
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3006191