In the past decade, the rise of deep convolutional neural networks has driven significant progress in human-centered visual perception computing, particularly in facial expression recognition. This thesis addresses the problems of lightweight design and multimodal integration in facial expression recognition, and proposes several improvements to raise the accuracy and efficiency of both unimodal and multimodal recognition. The main contributions are as follows:

To improve the recognition accuracy of existing lightweight networks on unimodal static facial expression recognition, this thesis proposes a series of improvements based on the lightweight MobileNetV3-Small model that substantially raise recognition accuracy. Specifically, it applies different Bneck simplification strategies to reduce model parameters and improve resistance to overfitting; introduces an attention mechanism to enhance the model's sensitivity to facial expression features; constructs a deep-shallow feature fusion network to capture multi-scale expression information; and applies transfer learning to optimize the training strategy, accelerating network convergence while improving recognition accuracy. Experiments on a self-constructed mixed facial expression dataset show that the proposed schemes outperform the original model: the optimal simplification strategy removes 18% of the Bneck parameters and suppresses overfitting by 5%, the proposed CTAM-MobileNetV3s improves average recognition accuracy by 5.64%, and the deep-shallow feature fusion network improves recognition accuracy by a further 3.14%.

To address the complexity and insufficient compactness of bimodal facial expression recognition models, this thesis proposes FSANet, a bimodal emotion recognition model that fuses facial expressions and speech within the VAANet framework. CTAM-MobileNetV3s, extended with 3D convolutions, serves as the backbone feature extractor of the visual stream, and a coordinate attention mechanism replaces the original spatial attention mechanism. On the public emotion recognition datasets eNTERFACE'05 and RAVDESS, FSANet achieves accuracies 6.17% and 3.90% higher than VAANet, respectively, while its model size and parameter count are only 1/3 and 1/7 of VAANet's, significantly reducing model complexity.

Finally, the proposed models are applied to design and implement an expression recognition system for practical scenarios. The system comprises two core modules: static-image expression recognition, and bimodal emotion recognition that fuses facial expression and speech. The system provides strong support for emotion analysis in real-world scenarios.
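As background for the coordinate attention mechanism adopted in FSANet's visual stream, the following is a minimal NumPy sketch of a coordinate-attention block. It is an illustration of the general technique only, not the thesis's exact configuration: the weight matrices are random placeholders standing in for the learned 1x1 convolutions, and the reduction ratio and feature-map shape are assumed for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, reduction=8, rng=None):
    """Sketch of coordinate attention on a feature map x of shape (C, H, W).

    Pools along each spatial axis separately, mixes channels through a
    shared bottleneck, then reweights x with two direction-aware
    attention maps. Weights are random stand-ins for learned 1x1 convs.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    c, h, w = x.shape
    mid = max(c // reduction, 1)

    # Direction-aware pooling: average over W -> (C, H), over H -> (C, W).
    pool_h = x.mean(axis=2)
    pool_w = x.mean(axis=1)

    # Shared channel-mixing bottleneck (stand-in for 1x1 conv + ReLU).
    w1 = rng.standard_normal((mid, c)) * 0.1
    y = np.concatenate([pool_h, pool_w], axis=1)   # (C, H + W)
    y = np.maximum(w1 @ y, 0.0)                    # (mid, H + W)

    # Split back into the two axes and expand to attention maps in (0, 1).
    y_h, y_w = y[:, :h], y[:, h:]
    w_h = rng.standard_normal((c, mid)) * 0.1
    w_w = rng.standard_normal((c, mid)) * 0.1
    a_h = sigmoid(w_h @ y_h)[:, :, None]   # (C, H, 1)
    a_w = sigmoid(w_w @ y_w)[:, None, :]   # (C, 1, W)

    # Reweight the input along both spatial directions.
    return x * a_h * a_w

feat = np.random.default_rng(1).standard_normal((16, 7, 7))
out = coordinate_attention(feat)
print(out.shape)  # (16, 7, 7)
```

Unlike plain spatial attention, which collapses both spatial axes into a single map, this block keeps positional information along height and width separately, which is the property that motivates its use in place of the original spatial attention.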