| In recent years, with the rapid development of speech recognition technology, intelligent products and services based on it have gradually entered people’s daily lives. Language identification serves as the front end of speech recognition, so its accuracy directly determines the efficiency and performance of the back-end recognizer. The application of deep learning to language identification has greatly improved its accuracy. However, some problems remain, such as poor recognition performance on short utterances (i.e., utterances about 1 s long) and degraded accuracy in noisy environments. To address these problems, this dissertation studies short-utterance language identification and robust language identification algorithms, and improves the language identification model using a time-frequency domain attention mechanism and deformable convolution. First, the dissertation introduces the attention mechanism into the language identification network and designs and implements a short-utterance language identification system based on a time-frequency domain attention model, targeting short speech, which has short time series and weakly discriminative time-domain features. The system improves the ResNet network with a joint time-domain and frequency-domain attention mechanism and effectively extracts discriminative language information in both domains. Experimental results show that, with the equal error rate (EER) as the evaluation metric, the system improves performance by 26.85% and 29.05% compared with the X-vector baseline system and the end-to-end LSTM baseline system, respectively. Second, to address the poor robustness of short-utterance language identification systems, the dissertation introduces deformable convolution into the time-frequency attention based system and designs and implements a highly robust end-to-end short-utterance language identification network, DCA-Net. Deformable convolution adds an offset to each sampling point in the convolution kernel, which allows the model to adaptively bypass local noise interference and find information useful for language classification, thus greatly improving the robustness of the model. Experiments show that, in a noisy test environment, the proposed DCA-Net improves the equal error rate by 46.47% and 47.79% compared with the X-vector system and the end-to-end LSTM system, respectively, under the same conditions. |
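
The sampling-point offset idea behind deformable convolution can be illustrated with a minimal sketch. It is not the DCA-Net implementation: it assumes a one-dimensional signal rather than a two-dimensional spectrogram, takes the offsets as an input instead of predicting them with a learned layer, and uses hypothetical names (`sample_linear`, `deformable_conv1d`) chosen only for illustration.

```python
import math


def sample_linear(signal, pos):
    """Linearly interpolate `signal` at fractional index `pos` (zero outside the signal)."""
    lo = math.floor(pos)
    frac = pos - lo

    def at(i):
        return signal[i] if 0 <= i < len(signal) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def deformable_conv1d(signal, weights, offsets):
    """Toy 1-D deformable convolution (hypothetical helper, not the DCA-Net code).

    offsets[t][j] is added to the position of kernel tap j at output index t.
    In a trained network the offsets would be predicted from the input;
    here they are supplied by the caller (an assumption of this sketch).
    """
    half = len(weights) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(weights):
            # Regular grid position (t + j - half) shifted by the per-tap offset.
            pos = t + (j - half) + offsets[t][j]
            acc += w * sample_linear(signal, pos)
        out.append(acc)
    return out


signal = [1.0, 2.0, 3.0, 4.0]
weights = [0.0, 1.0, 0.0]  # identity kernel, so the effect of offsets is easy to see

zero_offsets = [[0.0, 0.0, 0.0] for _ in signal]
half_step = [[0.0, 0.5, 0.0] for _ in signal]

print(deformable_conv1d(signal, weights, zero_offsets))  # [1.0, 2.0, 3.0, 4.0]
print(deformable_conv1d(signal, weights, half_step))     # [1.5, 2.5, 3.5, 2.0]
```

With all offsets set to zero the operation reduces to an ordinary zero-padded convolution. Non-zero offsets move each tap away from the regular grid, which is the mechanism that lets a model sample around locally corrupted regions.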