The Design of a Chinese Lip-reading System Based on Deep Learning

Posted on: 2023-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: Q Xiao
Full Text: PDF
GTID: 2568306788956119
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of deep learning, human society has entered an era of artificial intelligence. Human-computer interaction, one of the technologies of artificial intelligence, has made great progress in recent years, and lip-reading, a form of human-computer interaction, has attracted much attention. However, most lip-reading research is carried out on English corpora, and little of it concerns Chinese lip-reading. This is mainly because Chinese is less influential than English, research on Chinese lip-reading started late, influential Chinese lip-reading datasets are lacking, and the recognition accuracy for Chinese words is not ideal. To address these problems, this paper designs a Chinese lip-reading system based on deep learning. It aims to narrow the gap in the field of Chinese lip-reading, enrich the available Chinese lip-reading datasets, increase the influence of Chinese lip-reading, and make the system more practical so that it can serve Chinese people in the future. Our system uses ResNet50 with CBAM as the CNN and a GRU with attention as the RNN, fuses the two models, and applies the fused network to our lip-reading system. Comparing it with 11 CNN-RNN fusion neural networks commonly used in the field of lip-reading, we conclude that the deep learning network model designed in this paper has the best performance and the highest stability. The specific contributions of this paper are as follows:
(1) Preprocessing of the original input video. We use a semi-random fixed frame extraction strategy to extract frames from the input video and obtain continuous video frames containing key information. We then perform face detection and lip localization on each extracted frame to segment continuous lip-movement frame sequences, and use these image sequences as a set of inputs.
(2) Improving the CNN for spatial feature extraction from images. In the CNN part, we select ResNet50 as the convolutional neural network for image feature extraction and improve its ResBlock by adding CBAM to it. This enhances the network's ability to capture the small differences between the pronunciations of similar Chinese words and improves the performance of feature extraction during convolution.
(3) Improving the RNN for temporal feature extraction from images. In the RNN part, we choose a GRU with attention, which helps to extract features across consecutive lip-motion images. Because earlier and later moments in the lip-reading process influence the current moment, we assign larger weights to key frames, which makes the features more representative.
(4) Building a Chinese lip-reading system. The deep learning networks from the two steps above are fused, and the resulting CNN-RNN fusion network is designed in an encoder-decoder form to process continuous lip-movement image sequences. By wrapping our trained deep learning network model into the designed system and experimenting on a self-built dataset, we demonstrate that our Chinese lip-reading model can accurately recognize the Chinese digits 0-9 and ten Chinese words. Compared with other lip-reading systems, our system has better stability and higher recognition accuracy.
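
The abstract does not define the frame extraction strategy or the attention mechanism in detail, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes that "semi-random fixed frame extraction" means splitting the video into a fixed number of equal segments and drawing one random frame from each, and that frame attention means a softmax-weighted sum of per-frame features. The names `sample_frame_indices` and `attention_pool` are hypothetical, and the per-frame scores, which a trained model would compute from the GRU hidden states, are passed in directly here.

```python
import math
import random


def sample_frame_indices(total_frames, num_samples, rng=None):
    """Pick num_samples frame indices, one random frame per equal segment.

    Assumption: one plausible reading of "semi-random fixed frame
    extraction"; the abstract does not spell out the strategy.
    """
    if num_samples <= 0 or total_frames < num_samples:
        raise ValueError("need 0 < num_samples <= total_frames")
    rng = rng or random.Random()
    indices = []
    for k in range(num_samples):
        # Integer segment bounds [start, end) that together cover the video.
        start = k * total_frames // num_samples
        end = (k + 1) * total_frames // num_samples
        indices.append(rng.randrange(start, end))
    return indices


def attention_pool(frame_features, frame_scores):
    """Softmax the per-frame scores and return (weights, weighted feature sum).

    Assumption: the scores are given; in a trained network they would come
    from a learned layer applied to the GRU outputs.
    """
    if not frame_features or len(frame_features) != len(frame_scores):
        raise ValueError("need one score per frame and at least one frame")
    peak = max(frame_scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in frame_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, pooled
```

For example, `sample_frame_indices(100, 10)` returns ten increasing indices, one from each block of ten frames, so the sampled sequence always has a fixed length while still varying between runs. Frames with a higher score in `attention_pool` receive a larger weight in the pooled feature, which mirrors the idea of giving key frames more weight.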
Keywords/Search Tags:deep learning, Chinese lip-reading, CBAM, ResNet50, GRU