Research On Short Utterance Language Recognition Based On Time-Frequency Domain Attention And Deformable Convolution

Posted on:2022-11-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Zhang

Full Text:PDF

GTID:2558306914462764

Subject:Electronic and communication engineering

Abstract/Summary:

In recent years, with the rapid development of speech recognition technology, intelligent products and services based on it have gradually entered people's daily lives. Language identification serves as the front end of speech recognition, so its accuracy directly determines the efficiency and performance of the back-end recognizer. The application of deep learning has greatly improved language identification accuracy. However, some problems remain, such as poor recognition performance on short utterances (i.e., utterances about 1 s long) and degraded accuracy in noisy environments. To address these problems, this dissertation studies short-utterance language identification and robust language identification algorithms, and improves the language identification model with a time-frequency domain attention mechanism and deformable convolution.

This dissertation introduces the attention mechanism into the language identification network, and designs and implements a short-utterance language identification system based on a time-frequency domain attention model. The system targets short speech, whose time series are short and whose time-domain features are not very distinctive. It improves the ResNet network with a joint attention mechanism over the time and frequency domains, and effectively extracts discriminative language information from both. Experimental results show that, with equal error rate as the evaluation metric, the system improves on the X-vector baseline system and the end-to-end LSTM baseline system by 26.85% and 29.05%, respectively.

In addition, to address the poor robustness of short-utterance language identification systems, this dissertation introduces deformable convolution into the system based on time-frequency domain attention, and designs and implements a highly robust end-to-end short-utterance language identification network, DCA-Net. Deformable convolution adds an offset to each sampling point in the convolution kernel, which allows the model to bypass local noise interference and automatically find information useful for language classification, thus greatly improving the robustness of the language identification model. Experiments show that, in a noisy test environment, DCA-Net improves the equal error rate by 46.47% and 47.79% compared with the X-vector system and the end-to-end LSTM system, respectively, under the same conditions.
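
The offset mechanism of deformable convolution mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the DCA-Net implementation. It assumes a single-channel input, stride 1, and zero padding, and it takes the offsets as an argument, whereas a real network would predict them with an additional learned layer. The function names `bilinear_sample` and `deformable_conv2d` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def bilinear_sample(x, r, c):
    """Sample the 2-D array x at a fractional (row, col); outside x reads as 0."""
    h, w = x.shape
    r0, c0 = int(np.floor(r)), int(np.floor(c))
    value = 0.0
    # Blend the four integer neighbours, weighted by their distance to (r, c).
    for rr in (r0, r0 + 1):
        for cc in (c0, c0 + 1):
            if 0 <= rr < h and 0 <= cc < w:
                weight = (1.0 - abs(r - rr)) * (1.0 - abs(c - cc))
                value += weight * x[rr, cc]
    return value


def deformable_conv2d(x, kernel, offsets):
    """Single-channel deformable convolution (illustrative sketch only).

    x:       (H, W) input, e.g. a time-frequency feature map
    kernel:  (K, K) weights, K odd
    offsets: (H, W, K, K, 2) shift (d_row, d_col) for every output position
             and every kernel tap; assumed given here, learned in practice
    """
    h, w = x.shape
    k = kernel.shape[0]
    half = k // 2
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    d_row, d_col = offsets[i, j, a, b]
                    # Regular grid position of this tap plus its own offset.
                    r = i + (a - half) + d_row
                    c = j + (b - half) + d_col
                    acc += kernel[a, b] * bilinear_sample(x, r, c)
            out[i, j] = acc
    return out


# Usage: with all offsets set to zero the result matches an ordinary
# zero-padded 3x3 convolution (cross-correlation) of the same input.
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(5, 6))
weights = rng.normal(size=(3, 3))
zero_offsets = np.zeros((5, 6, 3, 3, 2))
plain_result = deformable_conv2d(feature_map, weights, zero_offsets)
```

Because each tap reads from its grid position plus an offset, non-zero offsets let the kernel sample away from a locally corrupted region, which is the intuition the abstract gives for the robustness gain. The sampled positions are fractional in general, hence the bilinear interpolation.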

Keywords/Search Tags:

language identification, short utterances, attention mechanism, deformable convolution

Related items

1	Surface Defect Detection Based On Deformable Convolution And Attention Mechanism
2	Research On Text Classification Method Combining Attention Mechanism And Bi-GRU
3	Research On Language Identification Method Based On Convolutional Network And Attention Mechanism
4	Research On Pedestrian Trajectory Prediction Algorithm Under Attention Mechanism
5	Short Text Classification Algorithm Based On Temporal Convolution And Attention Mechanism
6	Research On Personalized Recommendation Based On Double Attentional Deformable Convolutional Network
7	Chinese Sign Language Recognition Based On Convolutional Network And Long Short Term Memory Network
8	Research And Implementation Of Small Target Detection Method In Images Based On Deep Learning
9	Research On Ground Time-sensitive Target Pose Estimation Based On Monocular Vision
10	Image Quality Assessment Based On Deformable Convolutional Neural Networks With Gradient Fusion And Bilinear Attention Mechanism