| Speech is one of the basic carriers of information exchange between human beings and their surroundings, and it plays an important role in human activities. With the modernization of society, intelligent speech technology, including speech recognition, has become a research hotspot. In real-world scenes, however, speech is corrupted by many types of noise at unknown signal-to-noise ratios, and how to improve the performance of intelligent speech algorithms in complex noise scenes remains an open problem. Speech enhancement suppresses noise, so it is usually used as the front end of intelligent speech algorithms. Language identification is the process of automatically determining the language of a speech utterance, and it is a key front-end technology for intelligent speech algorithms in multilingual environments. However, deep-learning-based speech enhancement and language identification still face the following challenges: (1) the speech enhancement model has difficulty adapting to unknown noise types and signal-to-noise ratios; (2) when training and testing conditions do not match, the language identification algorithm generalizes poorly and lacks robustness; (3) the model performs poorly on low-resource language data; (4) there is a mismatch between the speech enhancement objective and the downstream task. Taking speech enhancement as the entry point, this thesis studies language identification technology and its application in complex environments, and proposes a solution to each of the above problems. The main contributions are as follows: (1) To address adaptation to noise type and signal-to-noise ratio, this thesis proposes a speech enhancement algorithm based on a dynamic kernel selection mechanism. By assigning different weights to convolution modules whose kernels have different receptive field sizes, the receptive field size of the convolution layer is
dynamically adjusted, so that the model can adapt to different noise types and signal-to-noise ratios. To guide the optimization of the model toward its target, this thesis also introduces a relay supervision loss, which further improves model performance. (2) To improve the generalization of language identification, this thesis proposes a language identification method based on lexical and grammatical features. Experiments show that it generalizes better than methods based on low-level acoustic features. To address the poor accuracy of language identification for low-resource languages, this thesis uses speech representations learned from large-scale unlabeled data by a self-supervised model, which further improves identification accuracy. To address the drop in language identification performance caused by the distortion that speech enhancement introduces, this thesis explores combining noisy speech with enhanced speech, which further improves the performance of the language identification model in noisy environments. (3) Finally, based on these algorithms, this thesis designs a language identification system for civil aviation air-ground communication. Within the system, a message batch processing framework is proposed to address the low computational efficiency of deployed deep learning models, and it effectively improves system throughput. |
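
The kernel selection idea in contribution (1) can be illustrated with a minimal NumPy sketch. This is not the thesis's model: the function name `selective_kernel_1d`, the moving-average kernels, the per-branch energy descriptor, and the selection matrix `w_select` are all assumptions chosen for illustration. The sketch only shows the general pattern of running parallel convolutions with different receptive fields and fusing them with input-dependent softmax weights.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(x - x.max())
    return e / e.sum()


def selective_kernel_1d(x, kernels, w_select):
    """Fuse parallel 1-D convolutions with input-dependent weights.

    x        : (T,) input signal
    kernels  : list of B 1-D filters with different lengths (receptive fields)
    w_select : (B, B) hypothetical selection matrix (would be learned in practice)
    """
    # One branch per kernel size; mode="same" keeps the input length.
    branches = np.stack([np.convolve(x, k, mode="same") for k in kernels])  # (B, T)
    # Global descriptor of each branch: mean absolute response over time.
    descriptor = np.abs(branches).mean(axis=1)  # (B,)
    # Softmax turns the selection scores into weights that sum to 1.
    weights = softmax(w_select @ descriptor)  # (B,)
    # The weighted sum acts like a convolution with an adaptive receptive field.
    fused = (weights[:, None] * branches).sum(axis=0)  # (T,)
    return fused, weights


rng = np.random.default_rng(0)
x = rng.standard_normal(64)
kernels = [np.ones(k) / k for k in (3, 5, 7)]  # assumed moving-average filters
w_select = rng.standard_normal((3, 3))         # stand-in for learned parameters
fused, weights = selective_kernel_1d(x, kernels, w_select)
```

Because the weights depend on the input through the descriptor, different noise conditions can shift the emphasis toward the short or the long kernels. In a trained network the selection parameters would be learned jointly with the convolution kernels.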