
Research On End-to-End Speech Recognition Of Civil Aviation Radiotelephony Communication Based On Deep Learning

Posted on: 2024-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Zhang
GTID: 2542307088996109
Subject: Transportation planning and management

Abstract:
The radiotelephony communication system is the primary means of communication in air traffic control. It allows controllers to provide remote pilots with necessary information such as weather conditions, traffic situation, and runway conditions, assisting pilots in making correct decisions. Incorporating speech recognition technology into the radiotelephony communication system, to enable consistency monitoring between control instructions and pilot readbacks as well as post-event voice analysis, can further improve aircraft operational efficiency and flight safety. However, owing to the particularities of the civil aviation industry, general-purpose end-to-end speech recognition systems often fail to meet its high-precision and real-time requirements. On the one hand, heavy background noise and unstable speaking rates reduce speech intelligibility; on the other hand, the industry's unique pronunciation norms can induce recognition errors on homophones. In addition, training a speech recognition system suited to this domain with the limited amount of transcribed civil aviation speech data poses a significant challenge. This thesis provides a comprehensive and in-depth analysis of these issues to achieve efficient speech recognition in air traffic control scenarios. Its core contributions are as follows:

(1) A pre-training strategy using large-scale open-source speech datasets was adopted to strengthen the ResNet-GAU-CTC model's grasp of the commonalities and patterns of speech signals across different scenarios. Multiple sets of comparative experiments validated the effectiveness of the model: the proposed structure achieved character (word) error rates as low as 9.8%, 10.8%, 8.7%, and 9.1% on the validation and test sets of Aishell-1 and Librispeech-clean, respectively.

(2) Transfer learning was employed to facilitate knowledge sharing and reuse: fine-tuning and retraining on target-domain datasets reduce excessive reliance on domain-specific data and improve the model's generalization ability. Various data augmentation methods, such as speed perturbation, noise injection, and time-frequency masking, were applied to enlarge the dataset's volume, coverage, and diversity while avoiding the inefficiency caused by low similarity between source and target domains. On the validation and test sets of the civil aviation English and Chinese speech recognition tasks, this approach achieved character (word) error rates of 7.6%, 8.1% and 7.3%, 7.8%, respectively.

(3) A multi-task joint learning framework was proposed in which a CTC-based decoder and an attention-based decoder, with their different alignment approaches, perform training and decoding jointly. An improved multi-scale convolutional neural network structure in the shared network extracts features comprehensively at multiple scales in the time-frequency domain. To reduce memory consumption and improve computational efficiency, a mixed chunk attention mechanism addresses the quadratic complexity caused by long input sequences. Experiments showed that these optimizations further improved recognition accuracy, yielding character (word) error rates of 6.17%, 7.38% and 6.29%, 7.51% on the validation and test sets of the civil aviation English and Chinese speech recognition tasks, respectively.
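As a concrete illustration of the time-frequency masking mentioned in contribution (2), the following is a minimal SpecAugment-style sketch in NumPy. The mask counts and widths are illustrative defaults, not parameters taken from the thesis.

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, freq_mask_width=8,
                 num_time_masks=2, time_mask_width=20, rng=None):
    """Zero out random frequency bands and time spans of a spectrogram.

    spec: 2-D array of shape (num_freq_bins, num_frames).
    Returns a masked copy; the input array is left unchanged.
    """
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_freq, n_time = out.shape
    # Frequency masking: blank out a few random bands of mel bins.
    for _ in range(num_freq_masks):
        w = int(rng.integers(0, freq_mask_width + 1))
        f0 = int(rng.integers(0, max(1, n_freq - w)))
        out[f0:f0 + w, :] = 0.0
    # Time masking: blank out a few random spans of frames.
    for _ in range(num_time_masks):
        w = int(rng.integers(0, time_mask_width + 1))
        t0 = int(rng.integers(0, max(1, n_time - w)))
        out[:, t0:t0 + w] = 0.0
    return out
```

Applied on the fly during fine-tuning, such masking forces the model not to rely on any single frequency band or time span, which is one way the augmentation enlarges the effective coverage of a small target-domain dataset.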
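The mixed chunk attention of contribution (3) rests on the idea that restricting attention to local chunks cuts the quadratic cost of long input sequences. Below is a simplified single-head NumPy sketch of chunk-local attention only, without the global branch of the mixed mechanism and with illustrative shapes; it is not the thesis's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def chunked_self_attention(x, chunk_size):
    """Self-attention restricted to non-overlapping chunks.

    x: (seq_len, d_model); seq_len is assumed divisible by chunk_size.
    Each position attends only within its own chunk, so the cost is
    O(seq_len * chunk_size) rather than O(seq_len ** 2).
    """
    n, d = x.shape
    out = np.empty_like(x)
    for start in range(0, n, chunk_size):
        c = x[start:start + chunk_size]           # (chunk, d)
        scores = c @ c.T / np.sqrt(d)             # (chunk, chunk)
        out[start:start + chunk_size] = softmax(scores) @ c
    return out
```

Because chunks do not interact here, memory for the attention matrix grows with the chunk size instead of the full sequence length; a mixed scheme would add a cheaper global component so that information can still flow across chunk boundaries.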
Keywords/Search Tags: radiotelephony communication, end-to-end speech recognition, ResNet-GAU-CTC, transfer learning, data augmentation, multi-task joint learning framework