
Research On Speech Recognition Based On Convolution Neural Network

Posted on: 2020-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Li
GTID: 2428330572497394
Subject: Control Science and Engineering
Abstract/Summary:
Recently, deep learning (DL) has been shown to significantly improve speech recognition performance. Convolutional neural networks (CNNs) are widely used in speech recognition tasks because of their distinctive network structure and strong feature-learning ability. However, the convolution kernel of the traditional two-dimensional CNN model is usually n×n, which cannot reflect the essentially one-dimensional character of speech signals. To solve this problem, this thesis proposes setting one dimension of the convolution kernel to the number of frames, and uses a one-dimensional CNN model and a two-dimensional CNN model for speech recognition. By moving the convolution kernel along the time axis and along the frequency bands, respectively, these models can adapt to the temporal variation of speech signals and preserve the correlation between frequency bands as fully as possible, thereby further improving recognition performance. The thesis also studies speech signal preprocessing, feature parameter extraction, and a normalization algorithm. The main work is as follows:

(1) In the preprocessing stage, a modified endpoint detection algorithm is proposed. The traditional endpoint detection algorithm is suitable only for clean speech, and its short-time energy and zero-crossing rate thresholds are fixed values that cannot adapt to different speech signals. In the feature extraction and normalization stage, computing Mel-frequency cepstral coefficients (MFCCs) involves a discrete cosine transform (DCT), which destroys the correlation along the frequency scale of the feature parameters, precisely the information a convolution operation can exploit. This thesis therefore uses as features the log energies of the Mel spectral coefficients, i.e., the MFCC computation with the final DCT step removed; these are denoted MFSC features. The dynamic time warping (DTW) algorithm is used to normalize the feature parameters to a fixed number of frames. A comparison of the feature parameters shows that speech recognition with MFSC features achieves better performance than with MFCC features.

(2) In the experimental part, the one-dimensional and two-dimensional CNN models are first presented, and the acoustic models of the three CNN models used in this thesis are then built. Two groups of comparison experiments are carried out on the same-speaker test set. The first group compares the recognition accuracy of a deep neural network (DNN) with that of the proposed one-dimensional and two-dimensional CNN models; the results show that both proposed CNN models outperform the DNN. The second group compares the recognition accuracy of the traditional two-dimensional CNN model with that of the proposed one-dimensional and two-dimensional CNN models under different numbers of normalized frames, convolution kernel shapes, pooling parameters, and input feature parameters, and also compares the convergence of the three CNN models. Next, the generalization of the three CNN models is compared in two respects: different numbers of convolutional + pooling layers, and different test sets (same-speaker and different-speaker). Finally, the noise robustness of the three CNN models is evaluated on a noisy mixed test set. The experimental results show that, in all of the above settings, the proposed one-dimensional and two-dimensional CNN models achieve better recognition performance than the traditional two-dimensional CNN model, with stronger generalization ability and noise robustness.

(3) Based on the speech recognition theory and CNN algorithms studied above, a CNN-based speech recognition system is designed on the MATLAB GUI platform, which verifies the effectiveness of the proposed algorithm.
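The abstract does not give kernel sizes, so the sketch below only illustrates the geometry of the kernel shapes it contrasts, under stated assumptions: the input is a feature map of T normalized frames (rows) by F MFSC bands (columns), with T = F = 40 chosen arbitrarily. A conventional n×n kernel slides along both axes; a kernel whose height equals the number of frames can only slide along the frequency axis; a kernel covering every band can only slide along the time axis. Mapping the last two shapes onto the thesis's two-dimensional and one-dimensional models is our reading of the abstract, not something it states, and `conv2d_valid` is a hypothetical helper.

```python
import numpy as np


def conv2d_valid(x, k):
    """Single-channel 'valid' 2-D cross-correlation, as computed by a CNN layer."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):          # positions along the time axis
        for c in range(out.shape[1]):      # positions along the frequency axis
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * k)
    return out


T, F = 40, 40  # hypothetical: 40 normalized frames x 40 MFSC bands
x = np.random.default_rng(0).standard_normal((T, F))

kernels = {
    "5 x 5 (traditional 2-D, slides along both axes)": (5, 5),
    "frames x 5 (slides along frequency only)": (T, 5),
    "3 x bands (slides along time only)": (3, F),
}
for name, shape in kernels.items():
    print(name, conv2d_valid(x, np.ones(shape)).shape)
```

The printed output shapes are (36, 36), (1, 36) and (38, 1): only the n×n kernel mixes time and frequency shifts, while the other two shapes restrict the kernel's movement to a single axis.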
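The abstract does not describe the modified endpoint detector in detail. As a point of comparison, the following sketch shows a classic double-threshold detector based on short-time energy and zero-crossing rate (ZCR), in which the thresholds are derived from the statistics of each recording rather than fixed in advance. The assumption that the first ten frames contain only noise, the frame length and hop, and the multipliers `k_high`, `k_low` and `k_zcr` are all illustrative; `frame_signal` and `detect_endpoints` are hypothetical names, not the thesis's code.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (the incomplete tail is dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def detect_endpoints(x, frame_len=256, hop=128, noise_frames=10,
                     k_high=5.0, k_low=2.0, k_zcr=2.0):
    """Return (first_frame, last_frame) of the speech region, or None.

    All thresholds come from the first `noise_frames` frames, which are
    ASSUMED to be background noise, so they differ from recording to recording.
    """
    frames = frame_signal(x, frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)                           # short-time energy
    zcr = np.mean(np.diff(np.sign(frames), axis=1) != 0, axis=1)   # zero-crossing rate

    e_noise, z_noise = energy[:noise_frames], zcr[:noise_frames]
    e_high = e_noise.mean() + k_high * e_noise.std()  # clearly-speech threshold
    e_low = e_noise.mean() + k_low * e_noise.std()    # above-the-noise-floor threshold
    z_th = z_noise.mean() + k_zcr * z_noise.std()     # ZCR threshold for unvoiced sounds

    core = np.flatnonzero(energy > e_high)
    if core.size == 0:
        return None
    start, end = core[0], core[-1]
    # Extend outward over weak but high-ZCR frames (e.g. fricatives at word edges).
    while start > 0 and energy[start - 1] > e_low and zcr[start - 1] > z_th:
        start -= 1
    while end < len(frames) - 1 and energy[end + 1] > e_low and zcr[end + 1] > z_th:
        end += 1
    return int(start), int(end)
```

Because `e_high`, `e_low` and `z_th` are recomputed from each recording's leading frames, a noisier recording automatically gets higher thresholds, which addresses the fixed-threshold limitation the abstract points out.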
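The MFCC/MFSC distinction described in (1) amounts to one step: MFCC applies a DCT to the log Mel filterbank energies, while MFSC stops before that step, so neighbouring coefficients still correspond to neighbouring frequency bands. The NumPy sketch below makes this explicit. The Hamming window, 512-point FFT, 40 triangular filters, 13 cepstral coefficients and the HTK-style Mel formula are common defaults assumed here, not parameters taken from the thesis; all function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular Mel filters, shape (n_filters, n_fft // 2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):          # rising edge
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling edge
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


def dct_ii_ortho(x):
    """Orthonormal DCT-II along the last axis."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    D[0, :] /= np.sqrt(2.0)
    return x @ D.T


def mfsc(frames, fs, n_fft=512, n_filters=40):
    """Log Mel filterbank energies, shape (n_frames, n_filters); band order is kept."""
    window = np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, n=n_fft)) ** 2
    energies = power @ mel_filterbank(n_filters, n_fft, fs).T
    return np.log(energies + 1e-10)  # small floor avoids log(0) for empty filters


def mfcc(frames, fs, n_fft=512, n_filters=40, n_ceps=13):
    """MFCC = the MFSC pipeline plus a final DCT, truncated to n_ceps coefficients."""
    return dct_ii_ortho(mfsc(frames, fs, n_fft, n_filters))[:, :n_ceps]
```

After the DCT, each MFCC coefficient is a weighted sum over all bands, so a small convolution kernel sliding over MFCCs no longer sees a local patch of the spectrum; over MFSC features it does.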
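The abstract states only that DTW is used to normalize each utterance's features to a fixed number of frames. One way to do this, assumed here, is to align each utterance to a reference template that already has the target length, then average the input frames mapped onto each reference frame. `dtw_path` and `normalize_frames` are hypothetical names, and the Euclidean frame distance with the basic (i-1, j), (i, j-1), (i-1, j-1) step pattern is a standard textbook choice, not necessarily the thesis's.

```python
import numpy as np


def dtw_path(x, y):
    """Classic DTW between feature sequences x (n, d) and y (m, d).

    Returns the optimal warping path as a list of (i, j) index pairs.
    """
    n, m = len(x), len(y)
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j - 1],
                                                 acc[i - 1, j],
                                                 acc[i, j - 1])
    # Backtrack from (n, m) to (1, 1), always taking the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def normalize_frames(features, reference):
    """Warp `features` (n, d) onto the time axis of `reference` (N, d).

    Each reference frame receives the mean of the input frames aligned to it,
    so the result always has exactly N frames.
    """
    out = np.zeros_like(reference, dtype=float)
    counts = np.zeros(len(reference))
    for i, j in dtw_path(features, reference):
        out[j] += features[i]
        counts[j] += 1
    return out / counts[:, None]
```

Because the DTW path is continuous and visits every reference index, every output frame receives at least one input frame, so the final division is always well defined regardless of whether the utterance is longer or shorter than the template.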
Keywords/Search Tags: Speech recognition, Convolutional neural network, Feature parameters, Convolution kernel, Generalization, Noise robustness