Action recognition based on element-level fine-grained multi-modal fusion
Published in: | Journal of physics. Conference series 2021-09, Vol.2010 (1), p.12114 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | eng |
Summary: | Abstract
Traditional action recognition algorithms often attend only to a video's RGB features or optical flow features and make little use of the audio information in the video. Building on RGB and optical flow features, this paper adds audio processing and classifies videos using element-level fine-grained multi-modal fusion. In an experimental comparison with simple modal splicing, the proposed multi-modal fusion algorithm improves accuracy by 7.38% on the HMDB51 dataset and by 3.18% on the UCF101 dataset. The experiments also indicate that introducing the audio modality effectively improves the performance of the model. |
---|---|
ISSN: | 1742-6588 1742-6596 |
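
The abstract contrasts simple modal splicing with element-level fusion but the record gives no formulas. The sketch below is only an illustration of that contrast, not the paper's method: it assumes "splicing" means concatenating the per-modality feature vectors, and it assumes element-level fusion means projecting each modality to a shared dimension and multiplying element by element. The function names, feature sizes, and random projection matrices are all hypothetical.

```python
import numpy as np


def concat_fusion(rgb, flow, audio):
    """Baseline (assumed): splice the modality features end to end."""
    return np.concatenate([rgb, flow, audio], axis=-1)


def elementwise_fusion(rgb, flow, audio, weights):
    """Hypothetical element-level fusion.

    Each modality is linearly projected to a shared dimension, then the
    projections are combined with an element-wise (Hadamard) product so
    that every fused element depends on all three modalities.
    """
    projected = [feat @ w for feat, w in zip((rgb, flow, audio), weights)]
    fused = projected[0]
    for p in projected[1:]:
        fused = fused * p  # element-by-element interaction
    return fused


# Illustrative sizes only; none of these come from the paper.
rng = np.random.default_rng(0)
batch, d_rgb, d_flow, d_audio, d_shared = 4, 2048, 1024, 128, 256

rgb = rng.standard_normal((batch, d_rgb))
flow = rng.standard_normal((batch, d_flow))
audio = rng.standard_normal((batch, d_audio))

# Random stand-ins for projection matrices that would normally be learned.
weights = [
    rng.standard_normal((d, d_shared)) / np.sqrt(d)
    for d in (d_rgb, d_flow, d_audio)
]

spliced = concat_fusion(rgb, flow, audio)
fused = elementwise_fusion(rgb, flow, audio, weights)

print(spliced.shape)  # feature width grows with each added modality
print(fused.shape)    # feature width stays at the shared dimension
```

In this sketch the spliced vector leaves any cross-modal interaction to the classifier that follows, whereas the element-wise product couples the modalities at every feature position; which variant the authors actually use would need to be checked against the full text.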