Research On Lightweight Lip Recognition Algorithm Based On GhostNet

Posted on: 2024-02-14  Degree: Master  Type: Thesis
Country: China  Candidate: G Y Zhang
GTID: 2568307106967609  Subject: Information and Communication Engineering
Abstract/Summary:
Lip recognition technology recognizes what a speaker is saying by analyzing the visual information of the speaker's mouth movements. Lip recognition is an important aspect of human-computer interaction, but traditional methods are susceptible to human influence, and their accuracy and efficiency are difficult to guarantee. Methods based on deep learning, in turn, rely on complex network models with a large number of parameters and a high computational cost, which poses difficulties for devices with limited storage capacity and computing power, such as mobile terminals. To address these problems, this paper uses GhostNet, a lightweight network, as the backbone network for front-end spatial feature acquisition, and proposes a more efficient network based on it, Efficient-GhostNet, which improves performance without dimensionality reduction and reduces the number of parameters through a local cross-channel interaction strategy. The improved Efficient-GhostNet extracts the spatial features of the lips, and the extracted features are then fed into a GRU network to obtain the temporal features of the lip sequences. To strengthen temporal feature extraction, we introduce an attention mechanism, and the final output is used for predictive classification. The dataset used in this paper was recorded with Asian volunteers, and the experiments demonstrate that the improved Efficient-GhostNet+GRU model reduces the number of parameters while still achieving considerable accuracy.

The research work in this paper is as follows:

(1) Lip image preprocessing. First, for the input video, we propose a semi-random strategy that extracts a fixed number of frames from the video. For each single-frame image, we then use the Dlib library to obtain the coordinates of the 68 facial landmarks, use four lip key points to set the crop coordinates, and crop out the lip image.

(2) Improvements to the lightweight network. With model size as the main concern, this paper selects GhostNet, a lightweight network, as the backbone for spatial feature extraction and introduces the more efficient ECA module on top of it. The module enhances the cross-channel interaction capability of the model while reducing its number of parameters and its computation, so that the model can better extract spatial features.

(3) A fused lip recognition network based on the improved GhostNet and GRU. Combining the respective advantages of CNNs and RNNs in spatial and temporal feature extraction, we propose an overall network structure that uses the improved Efficient-GhostNet as the front-end network and a GRU network with an attention mechanism as the back end. This structure greatly reduces the number of parameters and the computation of the model at the expense of a small amount of accuracy.

(4) Research and development of a lip recognition system. At present, lip-speech recognition is used in a wide range of scenarios. In this paper, we design and develop a lip-speech recognition system that uses PyQt5 as the framework for the interface. By combining the front and back ends, we build a complete system covering video input, video recognition, video visualization, and display of the recognition results.
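
To make the "local cross-channel interaction without dimensionality reduction" in item (2) more concrete, the following is a minimal pure-Python sketch of the channel-attention step of an ECA module as it is described in the ECA-Net literature. It is not the thesis implementation: the function names `eca_kernel_size` and `eca_channel_weights`, the defaults `gamma=2` and `b=1`, and the example kernel values are assumptions chosen for illustration, and a real model would learn the kernel and run on tensors rather than lists.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: odd value derived from (log2(C) + b) / gamma.

    gamma and b are assumed defaults taken from the ECA-Net paper,
    not values reported in this abstract.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_channel_weights(channel_means, kernel):
    """Compute per-channel attention weights.

    channel_means: globally average-pooled value of each channel.
    kernel: odd-length list of 1D convolution weights shared by all
            channels (hypothetical values here; learned in practice).
    Each channel interacts only with its k nearest neighbours, and the
    channel dimension is never reduced.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2
    # Zero-pad so the output has one weight per input channel.
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    weights = []
    for i in range(len(channel_means)):
        s = sum(w * padded[i + j] for j, w in enumerate(kernel))
        weights.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid gate
    return weights


if __name__ == "__main__":
    means = [0.2, -0.1, 0.4, 0.0]           # toy pooled descriptors
    gate = eca_channel_weights(means, [0.5, 1.0, 0.5])
    print(eca_kernel_size(64), gate)         # kernel size for 64 channels is 3
```

Each feature map would then be multiplied by its weight. In this sketch the attention step has only as many learnable values as the kernel has entries, which illustrates why such a module adds very few parameters.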
Keywords/Search Tags: Lip recognition, Efficient-GhostNet, lightweight networks, attention mechanism