English Speech Recognition And Pronunciation Quality Evaluation Based On Deep Learning

Posted on: 2016-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: J H Chen
GTID: 2285330479982619
Subject: Management Science and Engineering
Abstract/Summary:
With increasing globalization and China’s internationalization, Chinese demand for English learning is rising at an unprecedented rate. However, because of limitations in the domestic learning environment and teaching conditions, most English learners find it difficult to learn spoken English. With the development of computer science and technology and progress in language teaching and learning methods, Computer-Assisted Language Learning technology makes it possible to solve this problem. The core of Computer-Assisted Language Learning is speech recognition and evaluation technology, and speech recognition is the key. Because of the complex variation in pronunciation, the large data volume of speech signals, the high dimensionality of speech feature parameters and the heavy computation involved in speech recognition and evaluation, processing large quantities of speech signals places high demands on software and hardware resources and on algorithms. However, traditional speech recognition algorithms, such as Dynamic Time Warping, Hidden Markov Models and Artificial Neural Networks, each have their own advantages and disadvantages, and all face a bottleneck in further improving recognition accuracy and speed. In recent years, with the development of Deep Learning in the machine learning field and the accumulation of large corpora, speech recognition and evaluation technology has advanced by leaps and bounds. By learning a deep nonlinear network structure, Deep Learning achieves complex function approximation, characterizes distributed representations of the input data and demonstrates a strong ability to learn the essential characteristics of a data set from a few samples, and thus performs better in simulating how the human brain analyzes and learns. Therefore, Deep Learning technology is applied to English speech recognition in this work.
In this paper, a speech recognition model based on Mel Frequency Cepstral Coefficients (MFCC) and a Deep Belief Network is established. Validated on the Spoken Arabic Digit data set from the UCI Machine Learning Repository, the model achieves better recognition performance than the improved Hidden Markov Model, Back Propagation Neural Networks and the Tree Distributions Approximation Model. In English pronunciation quality evaluation, there are problems in two respects. On the one hand, in spoken English learning, some Computer-Assisted Language Learning systems at home and abroad focus mainly on vocabulary and grammar learning. Those systems have functional defects in scoring, such as using few evaluation indicators (mostly one or two) and returning only a general score to learners. On the other hand, in spoken English evaluation, oral English examinations still rely on human rating, which is strongly subjective, applies inconsistent scoring standards, is slow, and has poor reproducibility and stability. To solve these problems, this paper takes the English speech of Chinese college students as its research object. It improves the traditional computer-assisted evaluation methods for English pronunciation quality by considering multi-parameter evaluation indicators: accuracy, speed, rhythm and intonation. The pronunciation evaluation includes accuracy evaluation based on MFCC, speed evaluation based on duration, rhythm evaluation based on short-term energy and the Pairwise Variability Index, and intonation evaluation based on fundamental frequency. Experiments verify that these evaluation indicators are reliable. Furthermore, taking into account the weights of these indicators, regression analysis is used to construct a reasonable and objective English pronunciation evaluation model.
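The two quantities behind the rhythm indicator can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say which Pairwise Variability Index variant, frame length or segmentation was used, so the normalized PVI (nPVI) form, the function names and the parameters below are all assumptions chosen for illustration.

```python
def short_term_energy(samples, frame_len, hop):
    """Short-term energy: sum of squared amplitudes in each frame.

    frame_len and hop are in samples; the values used in the thesis
    are not given in the abstract, so callers must choose them.
    """
    return [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def normalized_pvi(intervals):
    """Normalized Pairwise Variability Index (assumed variant).

    For successive interval measurements d_1..d_m, averages
    |d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2) over adjacent pairs
    and scales the result by 100.
    """
    if len(intervals) < 2:
        raise ValueError("need at least two intervals")
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(intervals, intervals[1:])
    ]
    return 100.0 * sum(terms) / len(terms)
```

Perfectly even intervals give an nPVI of 0, and the value grows as adjacent intervals become more unequal, so a learner's value can be compared with that of a reference recording.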
The experiments also show that the English pronunciation quality evaluation model in this paper is reliable. It can provide learners with timely, accurate and objective feedback and guidance, and help them find the differences between their own pronunciation and the standard pronunciation, so that they can correct their pronunciation errors and improve their learning efficiency.
Keywords/Search Tags: English, Speech Recognition, Deep Learning, Pronunciation Quality Evaluation, Multi-parameter Evaluation Indicators