
Cochlear Filter Cepstral Feature In Speech Recognition

Posted on: 2012-09-12, Degree: Master, Type: Thesis
Country: China, Candidate: Y Gao, Full Text: PDF
GTID: 2178330332491067, Subject: Communication and Information System
Abstract/Summary:
The human ear recognizes speech well in both clean and noisy environments. Many researchers in speech recognition therefore focus on the characteristics of human hearing, which can be modeled in the feature extraction stage. Auditory-based features can improve both recognition rate and robustness.

In this thesis, a new feature is presented, called CFCC (Cochlear Filter Cepstral Coefficients). The CFCC feature is based on the impulse response of the basilar membrane (BM) and models the whole path from the outer ear to the basilar membrane with a wavelet transform. We call this whole process the Auditory Transform (AT). The AT output is converted to neural energy signals through a function that simulates the hair cells. Windows whose length varies with frequency are then applied. Next, a loudness function, a cube-root nonlinearity, maps physical energy to perceived loudness. Finally, a discrete cosine transform (DCT) reduces the redundancy of the processed signal and yields the new feature. We describe how the parameters of the cochlear filter, hair cell function, variable windows, nonlinear loudness function and DCT are chosen, based on human auditory characteristics and signal processing methods.

Compared with a conventional FFT spectrogram, the AT spectrogram is smoother and less distorted by noise. Moreover, the FFT assumes a stationary signal. Non-stationary speech must therefore be divided into short segments that are treated as stationary and analyzed by FFT with a fixed-length window, which gives a single resolution at all frequencies. The AT, in contrast, can process non-stationary speech signals directly. Using a window whose length depends on the frequency band (the higher the frequency, the shorter the window) prevents high-frequency information from being smoothed out by a long window.

In the experiments, the original speech is also convolved with an 18-band Gammatone filter bank in place of the AT.
The results indicate that the AT loses no information, because it has an inverse transform, whereas the Gammatone filter bank cannot guarantee this. Furthermore, the center frequencies of the Gammatone filters are fixed, unlike those of the AT. In addition, the recognition rates of MFCC and RASTA-PLP with a Support Vector Machine (SVM) recognizer are high in clean conditions but drop rapidly in noisy environments, which leaves them far from practical use. The experiments show that CFCC gives a higher recognition rate and better robustness in a speech recognition system. CFCC is implemented on both the Bark scale and the ERB scale, with an SVM as the recognizer. Recognition rates are good on both scales and the system is stable, which shows that CFCC feature extraction adapts well to different frequency distributions. Finally, the experiments show that as the number of channels increases, noise immunity improves and recognition rates rise slightly, but stability decreases.
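The stages that follow the filter bank (hair cell function, frequency-dependent window, cube-root loudness, DCT) can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes the band signals have already been produced by some cochlear or Gammatone filter bank, and it uses plain squaring as a stand-in for the hair cell function. The window rule (a few periods of the band's center frequency, with a lower bound), the 10 ms hop and all function and parameter names are hypothetical choices made only for illustration.

```python
import math


def window_length(fc_hz, fs_hz, periods=3.5, min_len_s=0.020):
    """Window length in samples for a band centered at fc_hz.

    Assumed rule: a few periods of the center frequency, but never
    shorter than min_len_s, so higher bands get shorter windows.
    """
    seconds = max(periods / fc_hz, min_len_s)
    return max(1, round(seconds * fs_hz))


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II of one frame, keeping the first n_coeffs terms."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (m + 0.5) / n) for m, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def cfcc_like(bands, center_freqs_hz, fs_hz, hop_s=0.010, n_coeffs=12):
    """Turn filter-bank outputs into cepstral-style features.

    bands: one list of samples per band (output of the filter bank).
    Returns one list of n_coeffs coefficients per frame.
    """
    hop = max(1, round(hop_s * fs_hz))
    wins = [window_length(fc, fs_hz) for fc in center_freqs_hz]
    n_samples = min(len(band) for band in bands)
    # Only keep frames for which every band has a complete window.
    n_frames = max(0, (n_samples - max(wins)) // hop + 1)
    features = []
    for j in range(n_frames):
        loudness = []
        for band, win in zip(bands, wins):
            segment = band[j * hop : j * hop + win]
            # Hair cell stand-in (squaring, an assumption) and window average.
            energy = sum(x * x for x in segment) / win
            # Cube-root nonlinearity: physical energy to perceived loudness.
            loudness.append(energy ** (1.0 / 3.0))
        # DCT across bands reduces redundancy between neighboring bands.
        features.append(dct_ii(loudness, n_coeffs))
    return features


# Toy usage: three sinusoids stand in for real filter-bank outputs.
fs = 8000
freqs = [100.0, 1000.0, 3000.0]
bands = [[math.sin(2 * math.pi * f * t / fs) for t in range(800)] for f in freqs]
feats = cfcc_like(bands, freqs, fs, n_coeffs=3)
print(len(feats), len(feats[0]))
```

With these assumed constants, a 100 Hz band is averaged over 35 ms while bands at 1 kHz and above use the 20 ms lower bound, which mirrors the "higher frequency, shorter window" idea in the abstract without claiming to reproduce its exact parameters.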
Keywords/Search Tags: speech recognition, feature extraction, auditory character, Cochlear Filter Cepstral Coefficients, filters