
Research On End-to-End Speech Recognition Of Civil Aviation Radiotelephony Communication Based On Deep Learning

Posted on: 2024-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Zhang
GTID: 2542307088996109
Subject: Transportation planning and management

Abstract:
The radiotelephony communication system is the primary means of communication in air traffic control. It allows controllers to provide remote pilots with necessary information such as weather conditions, traffic situation, and runway conditions, assisting pilots in making correct decisions. Incorporating speech recognition technology into the radiotelephony communication system, to enable consistency monitoring between control instructions and pilot readbacks as well as post-event voice analysis, can further improve aircraft operational efficiency and flight safety. However, owing to the particularities of the civil aviation industry, general-purpose end-to-end speech recognition systems often fail to meet its high-precision and real-time requirements. On the one hand, heavy background noise and unstable speaking rates reduce speech intelligibility; on the other hand, the industry's unique pronunciation norms can induce recognition errors on homophones. In addition, training a speech recognition system suited to this domain with the limited amount of transcribed civil aviation speech data poses a significant challenge. This thesis provides a comprehensive and in-depth analysis of these issues to achieve efficient speech recognition in air traffic control scenarios. Its core contributions are as follows:

(1) A pre-training strategy using large-scale open-source speech datasets was adopted to strengthen the ResNet-GAU-CTC model's grasp of the commonalities and patterns of speech signals across different scenarios. Multiple sets of comparative experiments validated the effectiveness of the model: the proposed structure achieved character (word) error rates as low as 9.8%, 10.8%, 8.7%, and 9.1% on the validation and test sets of Aishell-1 and Librispeech-clean, respectively.

(2) Transfer learning was employed to facilitate knowledge sharing and reuse: fine-tuning and retraining on target-domain datasets reduce excessive reliance on domain-specific data and improve the model's generalization ability. Various data augmentation methods, such as speed perturbation, noise injection, and time-frequency masking, were applied to enlarge the dataset's volume, coverage, and diversity while avoiding the inefficiency caused by low similarity between source and target domains. On the validation and test sets of the civil aviation English and Chinese speech recognition tasks, this approach achieved character (word) error rates of 7.6%, 8.1% and 7.3%, 7.8%, respectively.

(3) A multi-task joint learning framework was proposed in which a CTC-based decoder and an attention-based decoder, with their different alignment approaches, perform training and decoding jointly. An improved multi-scale convolutional neural network structure in the shared network extracts features comprehensively at multiple scales in the time-frequency domain. To reduce memory consumption and improve computational efficiency, a mixed chunk attention mechanism addresses the quadratic complexity caused by long input sequences. Experiments showed that these optimizations further improved recognition accuracy, yielding character (word) error rates of 6.17%, 7.38% and 6.29%, 7.51% on the validation and test sets of the civil aviation English and Chinese speech recognition tasks, respectively.
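As a concrete illustration of the time-frequency masking mentioned in contribution (2), the following is a minimal SpecAugment-style sketch in NumPy. The mask counts and widths are illustrative defaults, not parameters taken from the thesis.

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, freq_mask_width=8,
                 num_time_masks=2, time_mask_width=20, rng=None):
    """Zero out random frequency bands and time spans of a spectrogram.

    spec: 2-D array of shape (num_freq_bins, num_frames).
    Returns a masked copy; the input array is left unchanged.
    """
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_freq, n_time = out.shape
    # Frequency masking: blank out a few random bands of mel bins.
    for _ in range(num_freq_masks):
        w = int(rng.integers(0, freq_mask_width + 1))
        f0 = int(rng.integers(0, max(1, n_freq - w)))
        out[f0:f0 + w, :] = 0.0
    # Time masking: blank out a few random spans of frames.
    for _ in range(num_time_masks):
        w = int(rng.integers(0, time_mask_width + 1))
        t0 = int(rng.integers(0, max(1, n_time - w)))
        out[:, t0:t0 + w] = 0.0
    return out
```

Applied on the fly during fine-tuning, such masking forces the model not to rely on any single frequency band or time span, which is one way the augmentation enlarges the effective coverage of a small target-domain dataset.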
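The mixed chunk attention of contribution (3) rests on the idea that restricting attention to local chunks cuts the quadratic cost of long input sequences. Below is a simplified single-head NumPy sketch of chunk-local attention only, without the global branch of the mixed mechanism and with illustrative shapes; it is not the thesis's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def chunked_self_attention(x, chunk_size):
    """Self-attention restricted to non-overlapping chunks.

    x: (seq_len, d_model); seq_len is assumed divisible by chunk_size.
    Each position attends only within its own chunk, so the cost is
    O(seq_len * chunk_size) rather than O(seq_len ** 2).
    """
    n, d = x.shape
    out = np.empty_like(x)
    for start in range(0, n, chunk_size):
        c = x[start:start + chunk_size]           # (chunk, d)
        scores = c @ c.T / np.sqrt(d)             # (chunk, chunk)
        out[start:start + chunk_size] = softmax(scores) @ c
    return out
```

Because chunks do not interact here, memory for the attention matrix grows with the chunk size instead of the full sequence length; a mixed scheme would add a cheaper global component so that information can still flow across chunk boundaries.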
Keywords/Search Tags: radiotelephony communication, end-to-end speech recognition, ResNet-GAU-CTC, transfer learning, data augmentation, multi-task joint learning framework