Research And Implementation Of Chinese Speech Recognition Methods In Noisy Environment

Posted on:2022-10-08

Degree:Master

Type:Thesis

Country:China

Candidate:T T Zhao

GTID:2568307070952659

Subject:Computer technology

Abstract/Summary:

With the continuous development of artificial intelligence, the demand for communicating with machines by voice keeps growing, speech recognition technology is applied ever more widely, and new speech recognition systems keep emerging. However, as speech recognition systems enter daily life, noise in real-world scenes inevitably interferes with the speech and degrades recognition performance. To make these systems more practical, speech recognition in noisy environments has become an important research direction. To address this difficulty, this dissertation combines signal-to-noise separation with speech recognition and studies an unvoiced reconstruction method that compensates for the shortcomings of signal-to-noise separation. Together these optimize recognition, and a continuous speech recognition system is finally built. The main research work is as follows:

(1) Continuous speech segmentation. The time-domain and frequency-domain characteristics of Chinese speech signals are studied. Based on the positional regularity of unvoiced and voiced sounds in Chinese and the differences in their characteristics, this dissertation implements a multi-level speech segmentation method. First, the traditional dual-threshold endpoint detection algorithm is improved: to avoid misjudging unvoiced segments as silence, a new threshold is added, which effectively improves the accuracy of voiced-segment detection. Next, the pitch spectrum is drawn and processed, and the boundary between unvoiced and voiced sounds is determined from the range of the pitch period track. Finally, because formants are related to semantics, a point where the formants change significantly marks a change of syllable. The spectrogram is therefore divided into multiple frequency bands, the energy of each band is calculated, and the segmentation points between adjacent syllables are estimated from the changes in band energy, completing the segmentation of Chinese syllables. Comparative experiments against traditional segmentation methods verify that the proposed multi-level segmentation algorithm performs better.

(2) Acoustic model and signal-to-noise separation. A bidirectional long short-term memory (BLSTM) network is used to implement the acoustic model, and the recognition results of a traditional model and the deep model on the same data set are compared and analyzed. Voiced speech is separated by extracting its harmonic structure with an improved comb filter. Signal-to-noise separation experiments on noisy speech under different signal-to-noise ratio (SNR) conditions verify the effectiveness of the improved harmonic extraction method. The acoustic model is then trained on speech processed by this improved method to strengthen its noise robustness, and comparative experiments on the improved acoustic model also achieve good results.

(3) Language model and unvoiced reconstruction. An N-gram language model and a language model based on a recurrent neural network are implemented and compared experimentally. A similar-syllable table is built for the recognition results of the acoustic model, which increases fault tolerance and improves the performance of the language model; experiments show that recognition accuracy improves. An unvoiced speech reconstruction method based on the language model is also studied: speech whose information is partly lost after denoising is recognized after that information is compensated, and experiments verify that the unvoiced reconstruction algorithm improves speech recognition in noisy environments to a certain extent.

(4) Continuous speech recognition system. The various features used in the speech recognition process are visualized, and the functions above (speech segmentation, acoustic model, language model, signal-to-noise separation and unvoiced reconstruction) are combined into a continuous speech recognition system.
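The improved endpoint detection in item (1) can be sketched roughly as follows. This is a minimal illustration, not the thesis's algorithm. The abstract does not say what the added threshold is, so the sketch assumes it is a zero-crossing-rate (ZCR) test layered on top of high and low short-time-energy thresholds. It also assumes the ZCR test only extends a segment backwards, reflecting the positional regularity that Mandarin unvoiced consonants occur as syllable initials. All function names, threshold values and the synthetic signal are hypothetical.

```python
import math


def split_frames(signal, frame_len, hop):
    """Cut a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_speech(signal, frame_len=256, hop=128, high=0.3, low=0.02,
                  zcr_thresh=0.3, max_unvoiced_frames=8, use_zcr=True):
    """Return detected speech segments as (start_sample, end_sample) pairs.

    `high` and `low` are fractions of the loudest frame's energy (an assumption).
    The ZCR threshold only extends a segment backwards, assuming Mandarin
    unvoiced sounds are syllable initials that precede a voiced final.
    """
    frames = split_frames(signal, frame_len, hop)
    if not frames:
        return []
    energy = [short_time_energy(f) for f in frames]
    zcr = [zero_crossing_rate(f) for f in frames]
    peak = max(energy) or 1.0
    high_t, low_t = high * peak, low * peak

    segments, i, n = [], 0, len(frames)
    while i < n:
        if energy[i] < high_t:
            i += 1
            continue
        start = end = i
        # Stage 1: the high energy threshold locates the core of a voiced region.
        while end + 1 < n and energy[end + 1] >= high_t:
            end += 1
        # Stage 2: the low energy threshold widens the region in both directions.
        while start > 0 and energy[start - 1] >= low_t:
            start -= 1
        while end + 1 < n and energy[end + 1] >= low_t:
            end += 1
        # Stage 3 (hypothetical extra threshold): ZCR recovers quiet, noise-like initials.
        if use_zcr:
            steps = 0
            while (start > 0 and steps < max_unvoiced_frames
                   and zcr[start - 1] >= zcr_thresh):
                start -= 1
                steps += 1
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (min(segments[-1][0], start), end)  # merge touching segments
        else:
            segments.append((start, end))
        i = end + 1
    return [(s * hop, e * hop + frame_len) for s, e in segments]


# Synthetic 8 kHz signal: silence, a quiet high-ZCR "unvoiced" burst starting at
# sample 1000, a loud 200 Hz "voiced" tone, then silence again.
sr = 8000
unvoiced = [0.05 * (-1) ** n for n in range(512)]
voiced = [math.sin(2 * math.pi * 200 * n / sr) for n in range(2048)]
sig = [0.0] * 1000 + unvoiced + voiced + [0.0] * 1000
print(detect_speech(sig))                 # [(896, 3712)]
print(detect_speech(sig, use_zcr=False))  # [(1280, 3712)]
```

Without the ZCR stage the detected segment starts at sample 1280 and misses most of the unvoiced burst that begins at sample 1000. That is the failure mode the added threshold is meant to prevent.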
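The pitch-based unvoiced/voiced boundary in item (1) could look roughly like the sketch below. It swaps in a plain normalized-autocorrelation pitch estimator, which is an assumption because the abstract does not name the pitch method. A frame counts as voiced when its best lag lies in an assumed 60 to 500 Hz pitch range and its autocorrelation peak exceeds an assumed voicing threshold. The onset of the first continuous stretch of pitch track is then reported as the boundary. Function names and parameter values are hypothetical.

```python
import math
import random


def pitch_period(frame, min_lag, max_lag, voicing_thresh=0.5):
    """Autocorrelation pitch estimate in samples, or None for an unvoiced frame.

    The normalized autocorrelation r(lag) = sum x[n]x[n+lag] / sum x[n]^2 is
    searched over [min_lag, max_lag]; the frame is voiced only if its best peak
    exceeds voicing_thresh (an assumed value).
    """
    r0 = sum(x * x for x in frame)
    if r0 == 0:
        return None
    best_lag, best_r = None, voicing_thresh
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


def voiced_onset(signal, sr, frame_len=256, hop=128, fmin=60.0, fmax=500.0, min_run=3):
    """Return (onset_sample, pitch_track), where the track first becomes continuous.

    Only lags corresponding to fmin..fmax count as pitch, so a track that stays in
    that range for min_run consecutive frames marks the unvoiced/voiced boundary.
    """
    min_lag, max_lag = int(sr / fmax), int(sr / fmin)
    track = [pitch_period(signal[s:s + frame_len], min_lag, max_lag)
             for s in range(0, len(signal) - frame_len + 1, hop)]
    run = 0
    for i, period in enumerate(track):
        run = run + 1 if period is not None else 0
        if run == min_run:
            return (i - min_run + 1) * hop, track
    return None, track


sr = 8000
rng = random.Random(0)
unvoiced = [rng.gauss(0.0, 0.1) for _ in range(1024)]                 # noise-like initial
voiced = [math.sin(2 * math.pi * 200 * n / sr) for n in range(2048)]  # 200 Hz final
onset, track = voiced_onset(unvoiced + voiced, sr)
print(onset)                                     # within one frame of sample 1024
print(sr / pitch_period(voiced[:256], 16, 133))  # 200.0
```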
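For the band-energy step in item (1), here is a dependency-free sketch. Each frame's spectrum is split into equal-width bands, and band log energies are compared between consecutive frames. Large jumps, taken as formant changes, are reported as candidate boundaries between adjacent syllables. The band count, the L1 change measure, both thresholds and the greedy peak picking are illustrative assumptions, not details from the thesis.

```python
import cmath
import math


def band_log_energies(frame, n_bands):
    """Log energy of n_bands equal-width bands of a Hann-windowed DFT.

    A direct O(N^2) DFT keeps the sketch dependency-free; an FFT would be used in practice.
    """
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    n_bins = n // 2 + 1
    energies = [0.0] * n_bands
    for k in range(n_bins):
        xk = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(windowed))
        energies[k * n_bands // n_bins] += abs(xk) ** 2
    return [math.log(e + 1e-10) for e in energies]  # epsilon avoids log(0)


def syllable_boundaries(signal, frame_len=128, hop=64, n_bands=4,
                        rel_thresh=0.3, min_change=1.0, min_gap=4):
    """Estimate syllable boundaries (in samples) from jumps in band log energies."""
    frames = [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]
    bands = [band_log_energies(f, n_bands) for f in frames]
    # Spectral change between consecutive frames: L1 distance of band log energies.
    change = [0.0] + [sum(abs(a - b) for a, b in zip(bands[t], bands[t - 1]))
                      for t in range(1, len(bands))]
    threshold = max(rel_thresh * max(change), min_change)
    picked = []
    # Greedy peak picking: strongest changes first, suppressing neighbours within min_gap.
    for t in sorted(range(len(change)), key=lambda t: change[t], reverse=True):
        if change[t] < threshold:
            break
        if all(abs(t - p) >= min_gap for p in picked):
            picked.append(t)
    # Report each boundary midway between the centres of frames t-1 and t.
    return sorted(t * hop + frame_len // 2 - hop // 2 for t in picked)


sr = 8000
# Two steady "syllables" with different spectra: energy near 500 Hz, then near 2500 Hz.
first = [math.sin(2 * math.pi * 500 * n / sr) for n in range(1024)]
second = [math.sin(2 * math.pi * 2500 * n / sr) for n in range(1024)]
print(syllable_boundaries(first + second))  # one boundary close to sample 1024
```

The absolute `min_change` floor keeps a steady sound from yielding a spurious boundary at the largest of many near-zero changes.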
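Harmonic extraction with a comb filter (item (2)) rests on a simple property. Delaying voiced speech by one pitch period leaves its harmonics in phase, so averaging delayed copies keeps them while energy between the harmonics cancels. The sketch below is the plain FIR comb filter, not the thesis's improved version, whose changes the abstract does not describe. It assumes a constant, integer pitch period known in advance. The test signal and names are hypothetical.

```python
import math


def comb_filter(x, period, n_taps=4):
    """FIR comb filter: average x with copies delayed by whole pitch periods.

    y[n] = (1/K) * sum_{k=0}^{K-1} x[n - k*period]. Components at multiples of
    sr/period add in phase and pass unchanged; other components partly or fully
    cancel. Early samples with too little history average fewer taps.
    """
    y = []
    for n in range(len(x)):
        taps = [x[n - k * period] for k in range(n_taps) if n - k * period >= 0]
        y.append(sum(taps) / len(taps))
    return y


sr, f0 = 8000, 200
period = sr // f0  # 40 samples; assumed known, e.g. from a pitch tracker
harmonic = [math.sin(2 * math.pi * f0 * n / sr) + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / sr)
            for n in range(2000)]
# Interference at 300 Hz lies between the 1st and 2nd harmonics.
interference = [0.8 * math.sin(2 * math.pi * 300 * n / sr) for n in range(2000)]
noisy = [h + v for h, v in zip(harmonic, interference)]
enhanced = comb_filter(noisy, period)
settled = (4 - 1) * period  # first sample with a full set of taps
# Residual error after the filter settles is floating-point noise only.
print(max(abs(e - h) for e, h in zip(enhanced[settled:], harmonic[settled:])))
```

Here the 300 Hz tone advances by 1.5 cycles per delay, so four taps alternate in sign and cancel exactly. Interference at other frequencies is only attenuated, and real speech with a drifting pitch needs the period to be updated frame by frame.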
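The separation experiments in item (2) are run on noisy speech at several SNR levels. A common way to build such material, shown here as an assumption rather than the thesis's procedure, is to scale a noise recording before adding it. The gain is chosen so that 10*log10(P_speech / P_noise) equals the target SNR. The same `snr_db` helper can then score a separated signal against the clean reference. The sine "speech" and Gaussian noise are placeholders.

```python
import math
import random


def power(x):
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add noise scaled so that 10*log10(P_speech / P_noise) equals snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as the speech")
    noise = noise[:len(speech)]
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(speech, noise)]


def snr_db(reference, estimate):
    """SNR of `estimate` against the clean `reference`, in dB."""
    error = [e - r for r, e in zip(reference, estimate)]
    return 10 * math.log10(power(reference) / power(error))


rng = random.Random(1)
speech = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
for target in (-5, 0, 5, 10):
    noisy = mix_at_snr(speech, noise, target)
    print(target, round(snr_db(speech, noisy), 3))
```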
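Item (3) combines a language model, a similar-syllable table and language-model-driven unvoiced reconstruction. The sketch below shows one way these pieces could fit together, with an add-one-smoothed syllable bigram standing in for the N-gram model. The acoustic model's output syllables are first expanded into a candidate lattice. A normal syllable gets extra candidates from the similar-syllable table. A syllable marked with `?`, whose unvoiced initial was lost, gets every voiceless initial that completes a known syllable. A Viterbi search then keeps the sequence the bigram model scores highest. The table entries, the `?` marker, the substitution penalty and the toy training corpus are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical similar-syllable table: syllables an acoustic model may confuse.
SIMILAR = {"si": ["shi"], "shi": ["si"], "zi": ["zhi"], "zhi": ["zi"],
           "ci": ["chi"], "chi": ["ci"], "lao": ["nao"], "nao": ["lao"]}
# Voiceless Mandarin initials: the part of a syllable that harmonic separation can lose.
UNVOICED_INITIALS = ["b", "p", "f", "d", "t", "g", "k", "h", "j", "q", "x",
                     "zh", "ch", "sh", "z", "c", "s"]
SUB_PENALTY = math.log(0.5)  # assumed log-cost of overriding the acoustic top-1 syllable


class BigramLM:
    """Add-one (Laplace) smoothed bigram model over syllable tokens."""

    def __init__(self, sentences):
        self.history, self.pairs = Counter(), Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence + ["</s>"]
            self.history.update(tokens[:-1])
            self.pairs.update(zip(tokens[:-1], tokens[1:]))
        self.vocab = {w for s in sentences for w in s} | {"</s>"}

    def logprob(self, prev, word):
        # P(word | prev) = (c(prev, word) + 1) / (c(prev) + |V|)
        return math.log((self.pairs[(prev, word)] + 1) / (self.history[prev] + len(self.vocab)))


def candidates(token, vocab):
    """(syllable, penalty) candidates for one acoustic-model output token."""
    if token.startswith("?"):
        # "?i": only the final survived, so try each voiceless initial in front of it.
        final = token[1:]
        cands = [(ini + final, 0.0) for ini in UNVOICED_INITIALS if ini + final in vocab]
        if final in vocab:
            cands.append((final, 0.0))
        return cands or [(final, 0.0)]
    return [(token, 0.0)] + [(s, SUB_PENALTY) for s in SIMILAR.get(token, []) if s in vocab]


def decode(tokens, lm):
    """Viterbi search over the candidate lattice using bigram log-probabilities."""
    best = {"<s>": (0.0, [])}  # last syllable -> (best score, best path ending there)
    for token in tokens:
        column = {}
        for word, penalty in candidates(token, lm.vocab):
            score, path = max(((s + lm.logprob(prev, word), p) for prev, (s, p) in best.items()),
                              key=lambda sp: sp[0])
            score += penalty
            if word not in column or score > column[word][0]:
                column[word] = (score, path + [word])
        best = column
    _, path = max(((s + lm.logprob(prev, "</s>"), p) for prev, (s, p) in best.items()),
                  key=lambda sp: sp[0])
    return path


train = [["wo", "shi", "xue", "sheng"], ["ta", "shi", "lao", "shi"],
         ["wo", "shi", "lao", "shi"], ["si", "ge", "xue", "sheng"]]
lm = BigramLM(train)
print(decode(["wo", "si", "xue", "sheng"], lm))  # ['wo', 'shi', 'xue', 'sheng']
print(decode(["ta", "?i", "lao", "shi"], lm))    # ['ta', 'shi', 'lao', 'shi']
print(decode(["si", "ge", "xue", "sheng"], lm))  # ['si', 'ge', 'xue', 'sheng']
```

The substitution penalty means the decoder overrides the acoustic model only when the language model clearly prefers a similar syllable. In the third call, for example, `si` is kept because the bigram context supports it.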

Keywords/Search Tags:

Speech recognition, Deep learning, BLSTM, Speech segmentation, Auditory scene analysis, Unvoiced reconstruction

Related items

1	The Blind Separation Of Monaural Speech Based On Computational Auditory Scene Analysis
2	The Research Of Monaural Speech Segregation Based On Computational Auditory Scene Analysis
3	Computational Auditory Scene Analysis Based Voice Pretreatment System
4	Research On Continuous Speech Recognition Based On Deep Learning
5	Computational auditory scene analysis and robust automatic speech recognition
6	The Research Of Speech Separation Based On Computational Auditory Scene Analysis
7	Speech Separation Research Based On Human Auditory Characteristics
8	Monaural Speech Segregation Based On Computational Auditory Scene Analysis
9	Binaural Speech Separation Research Based On Deep Learning
10	Segregation Of Reverberant Speech Based On Computational Auditory Scene Analysis And Deep Neural Network