
Chinese Lipreading Research Based On Deep Learning

Posted on: 2020-02-19 | Degree: Master | Type: Thesis
Country: China | Candidate: W W Cai
GTID: 2428330590463145 | Subject: Engineering
Abstract:
With the rapid development of deep learning, deep-learning-based lipreading has gradually become popular. Lipreading analyzes the movement of the speaker's lips, including the relative positions of the lips, teeth and tongue, to identify what the speaker is saying. The result is influenced by the speaker's language and pronunciation habits. Because of factors such as the viewing angle, extracting information from lip movement is a challenging task. In addition, few open Chinese lipreading corpora are available for related research. This thesis therefore focuses on building a sentence-level Chinese lipreading corpus and on deep-learning methods for Chinese lipreading.

A semi-automatic method is proposed for constructing NSTDB (News, Speech, Talk Show Database), a sentence-level Chinese lipreading corpus. First, a face detection algorithm extracts video segments that contain only a single speaker. Then a facial landmark localization algorithm yields continuous lip-region frames. Finally, speech recognition converts the separated audio signal into Chinese text, and a word segmentation algorithm produces the corresponding labels.

A deep-learning Chinese lipreading model, Ch-LipNet, is proposed. The model first uses a 2D convolutional neural network to extract features from each frame and concatenates the per-frame features. An LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Unit) network is then connected to learn the mapping from image sequences to text sequences. During training, the CTC (Connectionist Temporal Classification) loss is used to align input and output sequences of unequal length. Finally, a fully connected (FC) output layer produces the corresponding text labels.

A lipreading method based on a D2D (DenseNet-2D) model with data concatenation (splicing) preprocessing is also proposed. The network models are trained on the LRW-1000 dataset and the self-built Chinese lipreading dataset NSTDB. The results show that data splicing is suitable for both word-level and sentence-level lipreading. It greatly increases training speed with little effect on accuracy, and it improves space utilization.
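The CTC loss used to train Ch-LipNet scores a label sequence by summing the probability of every frame-level alignment that collapses to it. Collapsing merges repeated labels and then removes blanks. The following is a minimal pure-Python sketch of the standard CTC forward recursion and greedy (best-path) decoding. It is not the thesis's implementation. It assumes per-frame softmax probabilities such as those produced by the recurrent and FC layers, with integer label indices and blank = 0. The toy vocabulary and all function names are hypothetical.

```python
import math
from itertools import product

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def collapse(path, blank=BLANK):
    """CTC many-to-one mapping: merge repeated labels, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def ctc_forward_prob(probs, target, blank=BLANK):
    """P(target | input) summed over all alignments (CTC forward algorithm).

    probs:  T x C per-frame class probabilities (softmax outputs).
    target: label indices without blanks.
    """
    # Interleave blanks: [b, y1, b, y2, ..., yL, b], length 2L + 1.
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S = len(ext)

    # alpha[s]: total probability of path prefixes that end at ext[s] on the current frame.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]         # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]         # skip the blank between distinct labels
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def ctc_loss(probs, target, blank=BLANK):
    """Negative log-likelihood that training minimizes."""
    return -math.log(ctc_forward_prob(probs, target, blank))


def greedy_decode(probs, blank=BLANK):
    """Best-path decoding: arg max per frame, then the CTC collapse."""
    best = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return collapse(best, blank)


def brute_force_prob(probs, target, blank=BLANK):
    """Sanity check for tiny inputs: enumerate every length-T path."""
    T, C = len(probs), len(probs[0])
    total = 0.0
    for path in product(range(C), repeat=T):
        if collapse(path, blank) == list(target):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total


if __name__ == "__main__":
    # Toy input: 3 frames, 3 classes (0 = blank; 1 and 2 are hypothetical labels).
    probs = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.7, 0.2, 0.1]]
    print(greedy_decode(probs))  # prints [1, 2]
    print(ctc_forward_prob(probs, [1, 2]), brute_force_prob(probs, [1, 2]))
    print(ctc_loss(probs, [1, 2]))
```

The forward recursion visits each (frame, position) pair once. Its cost is therefore linear in T times the label length, whereas enumerating paths is exponential in T. This is why CTC can align a long frame sequence with a shorter text sequence without frame-level labels. This sketch multiplies raw probabilities for readability. Practical implementations work in log space to avoid underflow on long videos; for example, PyTorch's `torch.nn.CTCLoss` expects log-probabilities as input.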
Keywords: Chinese lipreading, Convolutional neural network, Bidirectional long short-term memory, Data splicing preprocessing