
EdgeFormer: Edge-Aware Efficient Transformer for Image Super-Resolution


Bibliographic Details
Published in: IEEE Transactions on Instrumentation and Measurement, 2024, Vol. 73, pp. 1-12
Main Authors: Luo, Xiaotong, Ai, Zekun, Liang, Qiuyuan, Xie, Yuan, Shi, Zhongchao, Fan, Jianping, Qu, Yanyun
Format: Article
Language: English
Description
Summary: The imaging system of visual measurement equipment is usually affected by environmental factors such as distortion, blurring, and noise, which degrade the acquired images. This article studies image super-resolution (SR) with the vision transformer (ViT). However, the ViT incurs high computational cost and large GPU memory consumption, which hinder its application to image SR. Existing works mainly design lightweight network architectures for efficient inference while ignoring the intrinsic image content, thus wasting computing resources on uninformative regions. In this article, we present an edge-aware high-efficiency transformer (EdgeFormer) for accurate image SR, which performs self-attention (SA) only on the informative edge and texture regions so as to significantly reduce the computational complexity. It consists of a sparse edge-aware pixel selector (SEPS) and a multiscale efficient transformer module (METM). SEPS is a tiny side subnetwork that generates a binary mask indicating the positions of edge or texture tokens; a sparse error-driven loss is introduced to further constrain the informative tokens in a more fine-grained way. METM then performs SA only among the selected informative tokens. To parallelize execution effectively, a cross-sample sliding window (CSSW) strategy is designed to compensate for the uneven number of informative tokens across samples. EdgeFormer can be combined with existing convolutional neural network (CNN)-based SR backbones to fully integrate global and local context information. Extensive experimental results demonstrate that EdgeFormer achieves a clear performance gain with fewer floating-point operations (FLOPs) than other models. The code is available at: https://github.com/xiaotongtt/EdgeFormer .
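
To make the pipeline in the summary concrete, below is a minimal PyTorch sketch of the general idea: a tiny side network scores each token as edge/texture or not, and self-attention is then computed only over the selected tokens. Every name, layer size, and the straight-through masking trick here are illustrative assumptions, not the authors' EdgeFormer implementation; see the linked repository for the real code.

    # Minimal sketch: binary token selection + self-attention on selected tokens only.
    # All module names, sizes, and the straight-through trick are assumptions for
    # illustration; this is NOT the authors' EdgeFormer implementation.
    import torch
    import torch.nn as nn

    class TinyPixelSelector(nn.Module):
        """Side subnetwork scoring each token as informative (edge/texture) or not."""

        def __init__(self, dim: int):
            super().__init__()
            self.score = nn.Sequential(
                nn.Linear(dim, dim // 4),
                nn.GELU(),
                nn.Linear(dim // 4, 1),
            )

        def forward(self, tokens: torch.Tensor) -> torch.Tensor:
            # tokens: (B, N, C) -> binary mask (B, N), 1 = informative token
            logits = self.score(tokens).squeeze(-1)
            if self.training:
                # Straight-through estimator: hard mask forward, soft gradient backward.
                soft = torch.sigmoid(logits)
                hard = (soft > 0.5).float()
                return hard + soft - soft.detach()
            return (torch.sigmoid(logits) > 0.5).float()

    class SparseSelfAttention(nn.Module):
        """Multi-head self-attention applied only to the selected tokens."""

        def __init__(self, dim: int, heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, tokens: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
            # tokens: (B, N, C); mask: (B, N). Unselected tokens pass through unchanged.
            out = tokens.clone()
            for b in range(tokens.size(0)):  # per-sample loop; see note below
                idx = mask[b].nonzero(as_tuple=True)[0]
                if idx.numel() == 0:
                    continue
                sel = tokens[b, idx].unsqueeze(0)      # (1, K, C), K varies per sample
                upd, _ = self.attn(sel, sel, sel)      # SA over selected tokens only
                out[b, idx] = upd.squeeze(0)
            return out

    if __name__ == "__main__":
        B, N, C = 2, 64, 32
        x = torch.randn(B, N, C)
        mask = TinyPixelSelector(C)(x)
        y = SparseSelfAttention(C)(x, mask)
        print(y.shape, mask.sum(dim=1))  # torch.Size([2, 64, 32]) and per-sample counts

The per-sample loop makes the parallelization problem visible: each image selects a different number of informative tokens, so the selected sets cannot be stacked into a single batched attention call directly. The cross-sample sliding window (CSSW) strategy in the paper is described as compensating for exactly this unevenness to enable efficient parallel execution.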
ISSN: 0018-9456; 1557-9662
DOI: 10.1109/TIM.2024.3436070