
Facial expression video generation based-on spatio-temporal convolutional GAN: FEV-GAN

Bibliographic Details
Published in:Intelligent systems with applications 2022-11, Vol.16, p.200139, Article 200139
Main Authors: Bouzid, Hamza, Ballihi, Lahoucine
Format: Article
Language:English
Description
Summary:
• Our model generates videos of the six basic facial expressions of a given person.
• We address the quality and identity preservation issues faced by spatio-temporal GANs.
• We use two encoders, one for identity and the other for the remaining spatial features.
• We qualitatively and quantitatively evaluate our model on two facial expression benchmark databases: the MUG database and the Oulu-CASIA NIR&VIS facial expression database.

Facial expression generation has long been an intriguing task for researchers. In this context, we present a novel approach for generating videos of the six basic facial expressions. Starting from a single neutral facial image and a label indicating the desired facial expression, we aim to synthesize a video of the given identity performing the specified expression. Our approach, referred to as FEV-GAN (Facial Expression Video GAN), is based on spatio-temporal convolutional GANs, which are known to model both content and motion in the same network. Previous methods based on such networks have shown a good ability to generate coherent videos with smooth temporal evolution. However, they still suffer from low image quality and poor identity preservation. In this work, we address this problem by using a generator composed of two image encoders: the first is pre-trained for facial identity feature extraction, and the second extracts spatial features. We have qualitatively and quantitatively evaluated our model on two international facial expression benchmark databases: MUG and Oulu-CASIA NIR&VIS. The analysis of the experimental results demonstrates the effectiveness of our approach in generating videos of the six basic facial expressions while preserving the input identity. The analysis also shows that using both identity and spatial features enhances the decoder's ability to preserve the identity and generate high-quality videos. The code and the pre-trained model will soon be made publicly available.
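
The data flow described in the summary (neutral image plus expression label in, video out, with two encoders feeding one decoder) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class `ToyGenerator`, the feature sizes, the frame count, and the random linear maps standing in for the convolutional encoders and the spatio-temporal decoder are all hypothetical, and the expression names are the commonly used list of six basic expressions, assumed here because the record does not enumerate them.

```python
import numpy as np

# Assumed label set: the commonly cited six basic expressions.
EXPRESSIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]


def one_hot(label: str) -> np.ndarray:
    """Encode an expression name as a one-hot vector of length 6."""
    vec = np.zeros(len(EXPRESSIONS), dtype=np.float32)
    vec[EXPRESSIONS.index(label)] = 1.0
    return vec


class ToyGenerator:
    """Hypothetical stand-in for a two-encoder generator.

    Random linear maps replace the real networks; only the wiring
    (two encoders -> concatenation with the label -> decoder) is shown.
    """

    def __init__(self, image_shape=(16, 16, 3), id_dim=32, spatial_dim=32,
                 frames=8, seed=0):
        rng = np.random.default_rng(seed)
        self.image_shape = image_shape
        self.frames = frames
        n_pixels = int(np.prod(image_shape))
        scale = 1.0 / np.sqrt(n_pixels)
        # Identity encoder (pre-trained in the paper; a fixed random map here).
        self.w_id = (rng.standard_normal((n_pixels, id_dim)) * scale).astype(np.float32)
        # Second encoder for the remaining spatial features.
        self.w_spatial = (rng.standard_normal((n_pixels, spatial_dim)) * scale).astype(np.float32)
        # Decoder: maps the joint code to all frames at once.
        latent_dim = id_dim + spatial_dim + len(EXPRESSIONS)
        self.w_dec = rng.standard_normal((latent_dim, frames * n_pixels)).astype(np.float32)

    def generate(self, neutral_image: np.ndarray, label: str) -> np.ndarray:
        """Return a video of shape (frames, height, width, channels)."""
        x = neutral_image.reshape(-1).astype(np.float32)
        id_features = np.tanh(x @ self.w_id)
        spatial_features = np.tanh(x @ self.w_spatial)
        # Condition the decoder on both feature sets and the target expression.
        code = np.concatenate([id_features, spatial_features, one_hot(label)])
        video = np.tanh(code @ self.w_dec)
        return video.reshape((self.frames, *self.image_shape))


if __name__ == "__main__":
    generator = ToyGenerator()
    neutral = np.zeros((16, 16, 3), dtype=np.float32)
    clip = generator.generate(neutral, "happiness")
    print(clip.shape)
```

In this sketch the expression label only enters at the decoder input, so the same pair of encoder outputs can be reused to produce a different clip for each of the six labels.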
ISSN:2667-3053
DOI:10.1016/j.iswa.2022.200139