Visible-Infrared Dual-Sensor Tracking Based on Transformer via Progressive Feature Enhancement and Fusion

Bibliographic Details
Published in: IEEE Sensors Journal, 2024-05, Vol. 24 (9), p. 14519-14528
Main Authors: Kuai, Yangliu; Li, Dongdong; Gao, Zhinan; Yuan, Mingwei; Zhang, Da
Format: Article
Language: English
Description
Summary: This article investigates how to achieve accurate RGB-T tracking through effective enhancement of target features and adaptive fusion of the complementary information in the RGB and thermal infrared modalities. Inspired by the transformer's strong long-range dependency modeling, we propose a novel transformer-based RGB-T tracking method built on progressive feature enhancement and fusion. The proposed tracker is a two-branch Siamese network consisting of an exemplar branch and a search branch. First, a backbone extracts deep features from the RGB and thermal infrared images. The features in each branch are then enhanced progressively in the channel and spatial dimensions. Specifically, in the channel dimension, a channel attention feature module (CAFM) adaptively enhances the RGB and thermal infrared features. In the spatial dimension, the transformer self-attention mechanism is integrated with the AiA module to enhance the dual-modality features. Next, the enhanced features from the exemplar and search branches are fused by the transformer cross-attention mechanism, which achieves global, deep interaction between the exemplar and search images. Finally, the fused features are fed into a corner predictor head to estimate the target state. Experiments on two widely used public benchmarks (RGBT234 and LasHeR) demonstrate the effectiveness and efficiency of the proposed method compared with many recently released state-of-the-art (SOTA) trackers.
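The two fusion ideas in the summary (channel-wise weighting of the RGB and thermal features, then cross-attention between the search and exemplar branches) can be illustrated with a minimal sketch. This is not the authors' CAFM or their transformer: the function names `channel_reweight` and `cross_attention` are hypothetical, learned projections and multi-head structure are omitted, and treating search tokens as queries with exemplar tokens as keys and values is an assumption the summary does not confirm.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def channel_reweight(rgb, tir):
    """Toy channel attention (an assumption, not the paper's CAFM).

    rgb, tir: lists of N tokens, each a list of C channel values.
    For every channel, the two modalities are mixed with weights given by a
    softmax over their global-average-pooled responses.
    """
    n, c = len(rgb), len(rgb[0])
    fused = [[0.0] * c for _ in range(n)]
    for ch in range(c):
        mean_rgb = sum(tok[ch] for tok in rgb) / n
        mean_tir = sum(tok[ch] for tok in tir) / n
        w_rgb, w_tir = softmax([mean_rgb, mean_tir])
        for i in range(n):
            fused[i][ch] = w_rgb * rgb[i][ch] + w_tir * tir[i][ch]
    return fused

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    Each query token attends over all key tokens and returns a weighted
    average of the corresponding value tokens.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Hypothetical toy features: 2 exemplar tokens and 3 search tokens, 2 channels.
exemplar = channel_reweight([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]])
search = channel_reweight([[1.0, 1.0], [0.0, 2.0], [2.0, 0.0]],
                          [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]])
fused = cross_attention(search, exemplar, exemplar)
```

In this sketch `fused` holds one vector per search token, which is the shape a prediction head would consume; a real tracker would add learned query, key, and value projections and a trained channel attention module.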
ISSN: 1530-437X, 1558-1748
DOI: 10.1109/JSEN.2024.3372991