Research And Development Of End-to-End Speech Recognition In News Field

Posted on: 2022-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: J M Zhang
GTID: 2518306539998389
Subject: Engineering
Abstract/Summary:
Speech recognition is the process of using intelligent algorithms to convert human speech into text or control signals. It plays a vital role in many biometric recognition systems and voice-controlled automation systems. Unlike the traditional approach, which decomposes the speech recognition task into multiple subtasks (word (pronunciation) model, acoustic model, and language model), an end-to-end speech recognition model generates the corresponding text directly from the input audio features, which simplifies model training to a certain extent. At present, end-to-end speech recognition based on the Conformer neural network architecture, which builds on the self-attention mechanism, has become the mainstream approach. This thesis therefore improves the Conformer model with a multi-encoder strategy and a sentence-level consistency strategy, and obtains a performance gain.

Although speech recognition achieves good results when pronunciation standards and background noise are reasonably controlled, it still faces research difficulties. On the one hand, the recognition rate is low in special and complex conditions; on the other hand, the data used for training and testing do not exactly match the speech encountered in daily use. In addition, there has been little research on speech recognition in the news domain, and resources are relatively scarce. This thesis constructs the speech recognition data set CH_NEWS-ASR, and designs and implements a speech recognition system for the news domain. The main work covers the following three aspects:

(1) In response to the lack of data resources in the news domain, the pyAudioAnalysis tool is used to segment the news audio, and the Baidu and Tencent speech recognition interfaces are then called to transcribe the segments. The results are filtered according to the BLEU value, and the filtered data are then manually reviewed and cleaned. This yields CH_NEWS-ASR, a data set for the news domain, whose effectiveness is verified with RNN, Transformer, Conformer, and other models. The best model achieves a word error rate of 4.1% and a word correct rate of 96.3% on the test set. The experimental results show that the data set has clear domain characteristics and can be used effectively in research on end-to-end speech recognition.

(2) Because the data set constructed in this thesis has a long average text length, and attention-based models are limited in the sequence length they can handle, overly long sequences cannot be learned well. This thesis therefore uses a BiGRU + Conformer multi-encoder strategy and a sentence-level consistency method to improve the Conformer model. Experiments are carried out on the data set constructed in this thesis and on the AISHELL-1 data set. The results show that the sentence-level consistency method improves recognition performance without increasing the number of model parameters, whereas the BiGRU fusion method also improves performance but increases the number of model parameters.

(3) Based on the resource construction and model training, a speech recognition system for the news domain is designed and implemented. The system recognizes speech in the news domain and is deployed with Nginx + uWSGI + Django. The work implements the speech recognition web service interface, tests the running speed and concurrent performance of the interface, and writes test cases to perform functional tests on the sub-modules of the system. The results show that the system's function modules operate normally, that the system's functions meet user needs, and that processing speed and concurrency are good.
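The abstract reports both a word error rate and a word correct rate, and the two do not sum to 100%. The sketch below shows how the two metrics are conventionally defined, assuming the thesis follows the standard definitions (this is not confirmed by the abstract, and for Chinese the scoring unit may be characters rather than words). The function names are hypothetical. The error rate counts substitutions, deletions, and insertions, while the correct rate ignores insertions, so any inserted tokens open a gap between them.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) from a minimum
    edit-distance alignment of two token lists."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, S, D, I) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)          # delete every reference token
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)          # insert every hypothesis token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (c, s, d, ins)            # match, no cost
            else:
                best = (c + 1, s + 1, d, ins)    # substitution
            c, s, d, ins = dp[i - 1][j]
            if c + 1 < best[0]:
                best = (c + 1, s, d + 1, ins)    # deletion
            c, s, d, ins = dp[i][j - 1]
            if c + 1 < best[0]:
                best = (c + 1, s, d, ins + 1)    # insertion
            dp[i][j] = best
    _, s, d, ins = dp[n][m]
    return s, d, ins


def error_and_correct_rate(ref, hyp):
    """Return (error rate, correct rate) for a non-empty reference."""
    s, d, ins = align_counts(ref, hyp)
    n = len(ref)
    error_rate = (s + d + ins) / n       # counts insertions
    correct_rate = (n - s - d) / n       # ignores insertions
    return error_rate, correct_rate


# Toy example: one substitution (b -> x) and one insertion (e).
wer, corr = error_and_correct_rate("a b c d".split(), "a x c d e".split())
```

Under these definitions, the difference between 100% and the sum of the two rates equals the insertion rate, which is one possible reading of the reported 4.1% and 96.3% figures.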
Keywords/Search Tags: Speech recognition, Resource building, Deep learning, Conformer