
End-to-end Nanchang Dialectal Speech Recognition Based On Deep Learning

Posted on: 2024-02-18    Degree: Master    Type: Thesis
Country: China    Candidate: G Jiang    Full Text: PDF
GTID: 2568307100980019    Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, with the development of deep learning, the recognition accuracy of speech recognition systems for languages such as English and Chinese has improved greatly, and Mandarin speech recognition has reached a high level that largely meets the needs of daily communication. In China, however, dialects are widely distributed; because there are many dialect varieties and only a small amount of data for each, it is difficult to build a speech recognition model for a dialect. More effective methods are therefore needed for dialectal speech recognition. This thesis focuses on how to use limited dialect speech resources to improve the performance of a Nanchang dialect speech recognition system. The main work is as follows:

Firstly, the characteristics of the Nanchang dialect are analyzed and a Nanchang dialect dataset is constructed. Six local volunteers from Nanchang were recruited to record specified texts in the Nanchang dialect. The recordings were edited, segmented, and corrected, yielding 13,988 utterances totaling 18.2 hours of Nanchang dialect speech.

Secondly, various end-to-end speech recognition models are studied in depth. Drawing on this work, an end-to-end model based on the RNN-T structure, called Conformer-Transducer, is built: the Conformer serves as the acoustic encoder of the RNN-T structure and a BLSTM as its label encoder. The Conformer performs well in many speech recognition tasks because it combines the advantages of the Transformer, which is good at capturing global context, and of convolutional neural networks, which are good at extracting local features. Comparative experiments varying the main parameters of the model, such as the number of Conformer encoder layers, the number of attention heads in the Conformer module, and the size of its convolution kernel, were conducted to select the best configuration of the Conformer-Transducer. Different end-to-end speech recognition models were then trained on the AISHELL-1 and aidatatang_200zh datasets, and the comparison showed that the Conformer-Transducer achieved the best results on the Mandarin datasets.

Finally, transfer learning is applied to train the end-to-end Nanchang dialect speech recognition model. The collected dialect data is first speed-perturbed to expand the training set, and different fine-tuning schemes are then compared; the best recognition performance is obtained by fine-tuning each module of the model, giving a character error rate of 12.6% on the Nanchang dialect test set.
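As a rough illustration of the model structure described above, the sketch below assembles a Conformer-Transducer from PyTorch and torchaudio building blocks (torchaudio.models.Conformer for the acoustic encoder, an nn.LSTM BLSTM as the label encoder, and a simple additive joint network). The hyperparameters (number of encoder layers, attention heads, convolution kernel size, vocabulary size) are placeholder assumptions, not the values selected in the thesis.

```python
import torch
import torch.nn as nn
import torchaudio


class ConformerTransducer(nn.Module):
    """Conformer acoustic encoder + BLSTM label encoder + joint network (RNN-T style)."""

    def __init__(self, num_mels=80, vocab_size=4000, enc_dim=256,
                 num_layers=6, num_heads=4, conv_kernel=31, pred_dim=320):
        super().__init__()
        # Acoustic encoder: filterbank features -> stack of Conformer blocks.
        self.front = nn.Linear(num_mels, enc_dim)
        self.encoder = torchaudio.models.Conformer(
            input_dim=enc_dim, num_heads=num_heads, ffn_dim=4 * enc_dim,
            num_layers=num_layers, depthwise_conv_kernel_size=conv_kernel)
        # Label encoder: BLSTM over the (blank-prepended) character history.
        self.embed = nn.Embedding(vocab_size + 1, pred_dim)
        self.label_enc = nn.LSTM(pred_dim, pred_dim // 2, batch_first=True,
                                 bidirectional=True)
        self.label_proj = nn.Linear(pred_dim, enc_dim)
        # Joint network: combines both streams, predicts characters + blank.
        self.joint = nn.Sequential(nn.Tanh(), nn.Linear(enc_dim, vocab_size + 1))

    def forward(self, feats, feat_lens, tokens):
        enc_out, enc_lens = self.encoder(self.front(feats), feat_lens)  # (B, T, D)
        lab_out, _ = self.label_enc(self.embed(tokens))                 # (B, U, pred_dim)
        lab_out = self.label_proj(lab_out)                              # (B, U, D)
        # Broadcast-add to form the (B, T, U, vocab+1) RNN-T output lattice.
        logits = self.joint(enc_out.unsqueeze(2) + lab_out.unsqueeze(1))
        return logits, enc_lens


# Example forward pass with random inputs; the resulting lattice can be trained
# with an RNN-T loss such as torchaudio.functional.rnnt_loss.
feats = torch.randn(2, 200, 80)               # (batch, frames, mel bins)
feat_lens = torch.tensor([200, 160])
tokens = torch.randint(0, 4000, (2, 12))      # blank-prepended label history
logits, out_lens = ConformerTransducer()(feats, feat_lens, tokens)
```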
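The transfer-learning step (speed perturbation to expand the small dialect corpus, then module-wise fine-tuning from a Mandarin-pretrained model) could look roughly like the following sketch. The 0.9x/1.0x/1.1x speed factors, the file and checkpoint names, and the particular modules chosen for unfreezing are illustrative assumptions, not the exact schemes compared in the thesis.

```python
import torch
import torchaudio


def speed_perturb(waveform, sample_rate, factor):
    """Speed-perturb one utterance (sox 'speed' then resample back) for data expansion."""
    effects = [["speed", f"{factor}"], ["rate", f"{sample_rate}"]]
    perturbed, _ = torchaudio.sox_effects.apply_effects_tensor(
        waveform, sample_rate, effects)
    return perturbed


def finetune_optimizer(model, modules_to_tune=("encoder", "joint"), lr=1e-4):
    """Freeze all parameters, then unfreeze only the modules chosen for fine-tuning."""
    for p in model.parameters():
        p.requires_grad = False
    params = []
    for name in modules_to_tune:
        for p in getattr(model, name).parameters():
            p.requires_grad = True
            params.append(p)
    return torch.optim.Adam(params, lr=lr)


# Augment each Nanchang utterance at 0.9x/1.0x/1.1x speed, then fine-tune selected
# modules of a model initialized from a (hypothetical) Mandarin-pretrained checkpoint.
wav, sr = torchaudio.load("nanchang_utt_0001.wav")            # hypothetical file name
augmented = [speed_perturb(wav, sr, f) for f in (0.9, 1.0, 1.1)]

model = ConformerTransducer()                                  # class from the sketch above
model.load_state_dict(torch.load("mandarin_pretrained.pt"))    # hypothetical checkpoint
optimizer = finetune_optimizer(model, modules_to_tune=("encoder", "joint"))
```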
Keywords/Search Tags:dialectal speech recognition, low resource, Conformer, RNN-T, fine-tuning