Research On Feature Extraction Method Of Speech Recognition In Mobile Crowd Sensing

Posted on:2018-12-29

Degree:Master

Type:Thesis

Country:China

Candidate:H H Zhang

Full Text:PDF

GTID:2348330533963273

Subject:Engineering

Abstract/Summary:

Automatic speech recognition (ASR) systems in mobile crowd sensing must collect speech signals under increasingly complex conditions, and they are sensitive to the acoustic environment in which they are deployed. In the presence of additive noise, linear channel distortion, and reverberation, their performance deteriorates rapidly. ASR systems in mobile crowd sensing therefore need better robustness and compression than traditional speech recognition systems. To address these problems, this thesis improves the MFCC features used in existing speech recognition systems and calls the improved features "power-normalized cepstral coefficients" (PNCC).

First, the traditional MFCC extraction algorithm is studied and the principle behind each processing stage is analyzed. MFCC features are then extracted for use in the ASR system; in the implementation, the order of the MFCC algorithm is reduced, and experiments are added to the extraction procedure to verify this conjecture. Second, to improve recognition accuracy, a gammatone-shaped filter bank based on a model of human hearing replaces the mel filter bank of the MFCC algorithm in the pre-processing stage of feature extraction. Third, in the environment-compensation stage, longer analysis windows of 50-120 ms are used so that the degradation caused by the acoustic environment and the background noise level can be estimated more easily, and slowly changing speech components can be removed more easily. The parameters are analyzed by combining these long frames with short frames, and "asymmetric nonlinear filtering" is then used to estimate the acoustic background noise level for each frame and each band.

Finally, because the human ear responds more strongly to the onset of the incident power envelope than to its falling edge, temporal masking is implemented in the speech signal processing block. Apart from the rising "attack transient" of the input signal power, temporal masking suppresses the system's response to the remaining signal power. At the same time, the input power is divided by a running average of the total power, which normalizes the input power and allows features to be extracted in real time.
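The abstract gives no formulas, so the following is only a minimal Python sketch of two of the steps it names: estimating a per-frame, per-band noise floor with an asymmetric nonlinear filter, and normalizing the input power by a running average of the total power. The function names, the smoothing constants (0.999 and 0.5), and the first-frame initialization are illustrative assumptions and are not taken from the thesis; the input is assumed to be a non-negative array of band powers with shape (frames, bands).

```python
import numpy as np

def asymmetric_lowpass(power, lam_rise=0.999, lam_fall=0.5):
    """Track the lower envelope of power[frame, band] (hypothetical helper).

    The estimate rises slowly (lam_rise close to 1) and falls quickly
    (smaller lam_fall), so it follows the background level rather than
    speech peaks. Both constants are assumed values.
    """
    power = np.asarray(power, dtype=float)
    floor = np.empty_like(power)
    floor[0] = power[0]  # assumed initialization: start at the first frame
    for m in range(1, power.shape[0]):
        prev = floor[m - 1]
        # Choose the smoothing factor per band, depending on direction.
        lam = np.where(power[m] >= prev, lam_rise, lam_fall)
        floor[m] = lam * prev + (1.0 - lam) * power[m]
    return floor

def normalize_by_running_power(power, lam=0.999, eps=1e-12):
    """Divide each frame by a running average of the mean band power.

    Only past and current frames are used, so the step can run online.
    lam and eps are assumed values.
    """
    power = np.asarray(power, dtype=float)
    frame_mean = power.mean(axis=1)  # average power across bands, per frame
    out = np.empty_like(power)
    mu = frame_mean[0]               # assumed initialization
    for m in range(power.shape[0]):
        mu = lam * mu + (1.0 - lam) * frame_mean[m]
        out[m] = power[m] / (mu + eps)
    return out
```

In this sketch, a short burst of power in one band barely moves the estimated floor, while a drop in power is followed almost immediately. Scaling the whole input by a constant gain leaves the normalized output essentially unchanged, which is the purpose of dividing by the running average.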

Keywords/Search Tags:

Mobile crowd sensing, Automatic Speech Recognition, feature extraction, MFCC, power-normalized cepstral coefficients

Related items

1	Anti-noise Power Normalized Cepstral Coefficients For Two-level Robust Environmental Sounds Recognition In Real Noisy Conditions
2	Noise-robust Auditory Feature Extraction And Optimization For Speech Recognition
3	The Research Of Feature Extraction Algorithm On The Speaker-Independent Speech Recognition
4	Speech Recognition Speed Up Research Based On MFCC
5	Comprehensive Analysis And Application Of Template Matching Algorithm Based On Feature Extraction Of Speech Signal
6	Cochlear Filter Cepstral Feature In Speech Recognition
7	Study Of Methods Of Speech Features Extraction Of Ando Tibetan
8	Hidden Markov Model Based Automatic Speech Recognition Using Mel Frequency Cepstral Coefficients In Nepalese
9	Study Of Speech Recognition System For Mandarin Digit Based On HMM
10	The Research Of Speech Emotion Based On Multi-feature Extraction And Fusion