Research On Silent Speech Recognition Based On The Fusion Of Visual And EMG Signals

Posted on: 2023-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: L H Fu
Full Text: PDF
GTID: 2558307154478144
Subject: Engineering
Abstract/Summary:
Research on silent speech recognition based on non-acoustic bio-signals is a rapidly growing interdisciplinary subject that involves engineering, computer science, medicine and neuroscience. Silent speech recognition techniques are receiving increasing attention because they protect privacy, resist noise and are accessible to people with speech disabilities. Among existing approaches, recognition based on lip images and on surface electromyography (sEMG) has received much attention because signal acquisition is simple, portable and low-cost. However, both approaches rely on a single modality, and their recognition accuracy and robustness need to be improved. This thesis focuses on silent speech recognition based on lip images, sEMG and the fusion of the two signals. The main work is as follows:

(1) A Chinese word-level sEMG-video dataset, EV-Speech, is constructed for the rehabilitation of people with disabilities. Since no publicly available sEMG-video dataset exists, this thesis designs the corpus from the perspective of helping people with speech disabilities to carry out normal communication activities. The dataset contains 100 words with a total of about 20,000 samples, each containing a time-aligned sEMG signal and video, making it the first Chinese sEMG-video dataset.

(2) C-MFSC, a feature extraction method that combines time-domain, frequency-domain and space-domain information, is proposed. Channel-Log Mel-Frequency Spectral Coefficients (MFSC) features are used to extract the time-frequency domain features of sEMG signals. On this basis, the multi-channel structure of the sEMG signals is exploited to retain more space-domain information, yielding a discriminative time-frequency-space domain feature, C-MFSC. The effectiveness of the feature extraction method is verified on the EV-Speech dataset.

(3) A novel end-to-end sEMG-Visual speech recognition model is proposed. Speech recognition based on unimodal bio-signals is affected by the characteristics of the signal itself and by environmental conditions, so its robustness and recognition accuracy are limited. To address this problem, this thesis proposes a novel end-to-end sEMG-Visual fusion recognition model. The front-end network of the fusion system consists of an sEMG branch and a visual branch, both built on Convolutional Neural Networks (CNN), which process the input data and extract features from the sEMG signals and lip images respectively. The back-end network is built with a Bidirectional Gated Recurrent Unit (BiGRU), which extracts the overall information of the feature sequences and models their temporal structure. The fusion method uses feature fusion and decision fusion to improve recognition accuracy and robustness by exploiting the correlation and complementarity of the two bio-signals. Finally, the effectiveness of the fusion recognition model is validated on the proposed EV-Speech dataset, with the fusion model achieving the best recognition accuracy of 97.48%.
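
As an illustration of the idea in item (2), the sketch below computes log-Mel spectral features separately for each sEMG channel and stacks them into a channel x time x frequency array, so that the channel (space) axis is kept alongside time and frequency. This is a generic NumPy sketch and not the thesis's C-MFSC definition: the function names and the sample rate, window length, hop size and number of Mel filters are all assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(f):
    # Common HTK-style Hz-to-Mel conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_edges = mel_to_hz(mel_edges)
    bank = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def channel_log_mel(emg, sample_rate=1000, n_fft=64, hop=32, n_mels=12):
    """Per-channel log-Mel features.

    emg: array of shape (n_channels, n_samples).
    Returns an array of shape (n_channels, n_frames, n_mels).
    All default parameter values are illustrative assumptions.
    """
    n_channels, n_samples = emg.shape
    n_frames = 1 + (n_samples - n_fft) // hop
    window = np.hanning(n_fft)
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    # Sample indices of every frame, shape (n_frames, n_fft).
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emg[:, idx] * window                       # (C, T, n_fft)
    power = np.abs(np.fft.rfft(frames, axis=-1)) ** 2   # (C, T, n_fft // 2 + 1)
    mel_energy = power @ bank.T                         # (C, T, n_mels)
    return np.log(mel_energy + 1e-10)                   # small offset avoids log(0)
```

For example, calling `channel_log_mel` on a hypothetical 6-channel recording of 1000 samples with the defaults above returns an array of shape `(6, 30, 12)`. Keeping the channel axis separate, rather than averaging across channels, is what lets a downstream CNN see spatial structure across electrodes.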
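
The two fusion strategies named in item (3) can be sketched in a few lines. The abstract does not say how the thesis implements them, so the following makes two simple assumptions: feature fusion is per-frame concatenation of the two branches' feature vectors, and decision fusion is a weighted average of each branch's class probabilities. The function names and the `emg_weight` parameter are hypothetical.

```python
import numpy as np


def softmax(logits):
    # Subtract the maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def feature_fusion(emg_feats, visual_feats):
    """Concatenate per-frame features: (T, D1) and (T, D2) -> (T, D1 + D2).

    Assumes both sequences are already time-aligned to the same length T.
    """
    return np.concatenate([emg_feats, visual_feats], axis=-1)


def decision_fusion(emg_logits, visual_logits, emg_weight=0.5):
    """Weighted average of the two branches' class probabilities.

    emg_weight is an illustrative assumption; 0.5 weights both branches equally.
    """
    p_emg = softmax(emg_logits)
    p_visual = softmax(visual_logits)
    return emg_weight * p_emg + (1.0 - emg_weight) * p_visual
```

In this sketch, feature fusion would sit between the CNN front ends and a shared recurrent back end, while decision fusion combines the outputs of two separately classified branches. If one modality is known to be less reliable under some condition, such as poor lighting for video, `emg_weight` could be shifted toward the other.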
Keywords/Search Tags: Silent Speech Recognition, Electromyography, Lip Reading, Deep Learning, Multi-modality Fusion