Research On Feature Extraction Method Of Speech Recognition In Mobile Crowd Sensing

Posted on: 2018-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: H H Zhang
GTID: 2348330533963273
Subject: Engineering
Abstract/Summary:
Automatic speech recognition (ASR) systems in mobile crowd sensing collect speech signals under increasingly complex conditions and are sensitive to the acoustic environment in which they are deployed. In the presence of additive noise, linear channel distortion, and reverberation, their performance deteriorates rapidly. ASR systems in mobile crowd sensing therefore need greater robustness and better compression than traditional speech recognition systems. To address these problems, this thesis improves on the mel-frequency cepstral coefficients (MFCC) used in existing speech recognition systems and calls the improved feature the "power-normalized cepstral coefficient" (PNCC).

Firstly, the traditional MFCC extraction algorithm is studied and the principle behind each of its processing stages is analyzed. The algorithm is then used to extract the MFCC features used in ASR systems. During implementation, the quantitative order of the MFCC algorithm is reduced, and experiments on the MFCC extraction algorithm are added to verify our conjecture. Secondly, to improve recognition accuracy, the mel filter bank of the MFCC algorithm is replaced in the pre-processing phase of feature extraction with a gammatone-shaped filter bank based on a model of the human auditory system. Thirdly, to make it easier to estimate the degradation caused by the acoustic environment and the background noise level, and to remove slowly varying speech components, the environment compensation phase uses longer analysis windows of 50-120 ms. These long-time frames are combined with the short-time frames to analyze the parameters, and "asymmetric nonlinear filtering" is then used to estimate the acoustic background noise level for each frame and each frequency band.

Finally, because the human ear attends more to the onset of the incoming power envelope than to its falling edge, we implement temporal masking in the speech signal processing block. Apart from the rising "attack transient" of the input signal power, temporal masking suppresses the system's response to the remaining portions of the signal power. At the same time, we normalize the input power by dividing it by a running average of the total power, which enables real-time feature extraction.
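The environment compensation and normalization steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the function names, the edge handling of the averaging window, the forgetting factors (0.999 and 0.5), the 0.9 initialization factor, and the starting value of the running mean are all assumptions chosen for illustration, since the abstract does not state them. The input is assumed to be a list of frames, each a list of per-band power values from the gammatone filter bank.

```python
def medium_time_power(power, m_half=2):
    """Average short-time power over up to 2*m_half+1 neighboring frames.

    power: list of frames, each a list of per-band power values.
    Frames near the edges use only the neighbors that exist (assumption).
    """
    n_frames = len(power)
    n_bands = len(power[0])
    out = []
    for m in range(n_frames):
        lo = max(0, m - m_half)
        hi = min(n_frames, m + m_half + 1)
        out.append([sum(power[k][b] for k in range(lo, hi)) / (hi - lo)
                    for b in range(n_bands)])
    return out


def asymmetric_lowpass(q_in, lam_rise=0.999, lam_fall=0.5):
    """Estimate a per-frame, per-band noise floor.

    The estimate rises slowly (lam_rise close to 1) when the input is
    above it and falls quickly (smaller lam_fall) when the input is
    below it, so it follows the lower envelope of the power.
    The constants are illustrative assumptions.
    """
    floor = [[0.9 * v for v in q_in[0]]]  # assumed initialization
    for frame in q_in[1:]:
        prev = floor[-1]
        cur = []
        for x, p in zip(frame, prev):
            lam = lam_rise if x >= p else lam_fall
            cur.append(lam * p + (1.0 - lam) * x)
        floor.append(cur)
    return floor


def normalize_by_running_power(power, lam_mu=0.999, mu0=1.0):
    """Divide each frame by a running average of the mean band power.

    Only past and current frames are used, so the normalization can run
    online. lam_mu and mu0 are illustrative assumptions.
    """
    mu = mu0
    out = []
    for frame in power:
        mean_power = sum(frame) / len(frame)
        mu = lam_mu * mu + (1.0 - lam_mu) * mean_power
        out.append([v / mu for v in frame])
    return out
```

In a pipeline of this kind, the floor returned by `asymmetric_lowpass(medium_time_power(power))` would typically be subtracted from the medium-time power before temporal masking and normalization. The exact ordering and parameters used in the thesis are not given in the abstract.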
Keywords/Search Tags: mobile crowd sensing, automatic speech recognition, feature extraction, MFCC, power-normalized cepstral coefficients