Multi-modality learning for human action recognition

Bibliographic Details
Published in: Multimedia Tools and Applications, 2021-05, Vol. 80 (11), p. 16185-16203
Main Authors: Ren, Ziliang; Zhang, Qieshi; Gao, Xiangyang; Hao, Pengyi; Cheng, Jun
Format: Article
Language: English
Description
Summary: Multi-modality based human action recognition is a growing research topic. Multiple modalities can provide richer and more complementary information than a single modality. However, it is difficult for multi-modality learning to effectively capture the spatial-temporal information in entire RGB and depth sequences. In this paper, to obtain a better representation of spatial-temporal information, we propose a bidirectional rank pooling method to construct RGB Visual Dynamic Images (VDIs) and Depth Dynamic Images (DDIs). Furthermore, we design an effective segmentation convolutional network (ConvNet) architecture based on a multi-modality hierarchical fusion strategy for human action recognition. The proposed method achieves state-of-the-art results on the widely used NTU RGB+D, SYSU 3D HOI and UWA3D II datasets.
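
The abstract names bidirectional rank pooling as the mechanism that collapses RGB and depth sequences into VDIs and DDIs, but does not give the pooling equations. As a minimal sketch, the NumPy code below uses the linear-ramp approximation to rank pooling (weights α_t = 2t − T − 1, in the spirit of Bilen et al.'s dynamic images) and applies it to the frame sequence in both temporal directions; the function names and the choice of approximation are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def approximate_rank_pooling(frames):
    """Collapse a clip of frames, shape (T, H, W, C), into one dynamic image.

    Uses the linear-ramp weights alpha_t = 2t - T - 1, a common
    approximation to solving the rank pooling objective exactly.
    """
    T = frames.shape[0]
    t = np.arange(1, T + 1, dtype=np.float64)
    alpha = 2.0 * t - T - 1.0                      # earlier frames weighted negative
    dyn = np.tensordot(alpha, frames.astype(np.float64), axes=(0, 0))
    dyn -= dyn.min()                               # rescale to [0, 255] so the
    if dyn.max() > 0:                              # result looks like an image
        dyn *= 255.0 / dyn.max()
    return dyn.astype(np.uint8)

def bidirectional_dynamic_images(frames):
    """Forward and backward dynamic images for one modality (RGB or depth).

    Pooling the reversed sequence summarizes the motion played backwards;
    the pair is one plausible reading of "bidirectional rank pooling".
    """
    return (approximate_rank_pooling(frames),
            approximate_rank_pooling(frames[::-1]))

# Example: a random 16-frame RGB clip yields one forward and one backward VDI.
clip = np.random.randint(0, 256, size=(16, 112, 112, 3), dtype=np.uint8)
vdi_fwd, vdi_bwd = bidirectional_dynamic_images(clip)
```

Under this reading, the forward and backward VDIs from the RGB stream and DDIs from the depth stream would each feed a ConvNet, with the resulting features combined by the hierarchical fusion strategy the abstract mentions.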
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-019-08576-z