Dynamic Texture Classification Using Unsupervised 3D Filter Learning and Local Binary Encoding

Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2019-07, Vol. 21 (7), p. 1694-1708
Main Authors: Zhao, Xiaochao; Lin, Yaping; Liu, Li; Heikkila, Janne; Zheng, Wenming
Format: Article
Language: English
Description
Summary: Local binary descriptors, such as local binary pattern (LBP) and its various variants, have been studied extensively in texture and dynamic texture analysis due to their outstanding characteristics, such as grayscale invariance, low computational complexity and good discriminability. Most existing local binary feature extraction methods extract spatio-temporal features from three orthogonal planes of a spatio-temporal volume by viewing a dynamic texture in 3D space. For a given pixel in a video, only a proportion of its surrounding pixels is incorporated in the local binary feature extraction process. We argue that the ignored pixels contain discriminative information that should be explored. To fully utilize the information conveyed by all the pixels in a local neighborhood, we propose extracting local binary features from the spatio-temporal domain with 3D filters that are learned in an unsupervised manner so that the discriminative features along both the spatial and temporal dimensions are captured simultaneously. The proposed approach consists of three components: 1) 3D filtering; 2) binary hashing; and 3) joint histogramming. Densely sampled 3D blocks of a dynamic texture are first normalized to have zero mean and are then filtered by 3D filters that are learned in advance. To preserve more of the structure information, the filter response vectors are decomposed into two complementary components, namely, the signs and the magnitudes, which are further encoded separately into binary codes. The local mean pixels of the 3D blocks are also converted into binary codes. Finally, three types of binary codes are combined via joint or hybrid histograms for the final feature representation. Extensive experiments are conducted on three commonly used dynamic texture databases: 1) UCLA; 2) DynTex; and 3) YUVL. The proposed method provides comparable results to, and even outperforms, many state-of-the-art methods.
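The three components named in the summary (3D filtering, binary hashing, joint histogramming) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several choices are assumptions that the summary does not specify: PCA stands in for the unsupervised filter learning, the magnitude and local-mean thresholds are simple global means, the block size is 3x3x3, and only a single joint histogram (not the hybrid variant) is built. All function names are hypothetical.

```python
import numpy as np

def extract_blocks(video, size=3):
    """Densely sample size^3 blocks from a (T, H, W) volume -> (N, size**3)."""
    windows = np.lib.stride_tricks.sliding_window_view(video, (size, size, size))
    return windows.reshape(-1, size ** 3).astype(np.float64)

def learn_filters(blocks, n_filters=4):
    """ASSUMPTION: PCA as the unsupervised learner; the summary names no method."""
    centered = blocks - blocks.mean(axis=1, keepdims=True)  # zero mean per block
    cov = centered.T @ centered / len(centered)
    _, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :n_filters].T     # top components, one per row

def binary_codes(bits):
    """Pack each row of booleans into one integer code."""
    weights = 1 << np.arange(bits.shape[1])
    return bits.astype(np.int64) @ weights

def describe(video, filters, size=3):
    """Filter zero-mean blocks, hash signs/magnitudes/means, build a joint histogram."""
    n = filters.shape[0]
    blocks = extract_blocks(video, size)
    means = blocks.mean(axis=1)
    responses = (blocks - means[:, None]) @ filters.T        # 3D filtering
    sign_code = binary_codes(responses > 0)                  # sign component
    mag = np.abs(responses)
    mag_code = binary_codes(mag > mag.mean(axis=0))          # ASSUMED threshold
    mean_code = (means > means.mean()).astype(np.int64)      # ASSUMED threshold
    joint = (sign_code * (1 << n) + mag_code) * 2 + mean_code
    hist = np.bincount(joint, minlength=2 ** (2 * n + 1)).astype(np.float64)
    return hist / hist.sum()

# Toy usage on random data standing in for a grayscale dynamic texture clip.
rng = np.random.default_rng(0)
video = rng.random((8, 16, 16))
filters = learn_filters(extract_blocks(video), n_filters=4)
hist = describe(video, filters)
```

With four filters, the joint code has 4 sign bits, 4 magnitude bits and 1 mean bit, so the descriptor has 2^9 = 512 bins. The joint histogram grows exponentially with the number of filters, which is presumably one reason the summary also mentions hybrid histograms.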
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2018.2890362