
Multi-Modal Acoustic-to-Articulatory Inversion Based On Speech Decomposition And Auxiliary Feature

Posted on: 2022-11-26    Degree: Master    Type: Thesis
Country: China    Candidate: L X Zhao    Full Text: PDF
GTID: 2568307034974069    Subject: Computer technology
Abstract/Summary:
Acoustic-to-articulatory inversion (AAI) aims to recover the movements of the articulators from speech signals. Achieving speaker-independent AAI remains a challenge given the limited data available. Moreover, most current work uses only audio speech as input, which causes an inevitable performance bottleneck. To address these two problems, we first pretrain a speech decomposition network that decomposes the audio feature into a speaker identity embedding and a semantic content embedding, which together serve as the new personalized speech features and adapt the model to the speaker-independent case. Second, to learn additional information that further improves AAI performance, we propose a novel auxiliary feature network and a phoneme estimation network that estimate the non-tongue electromagnetic articulograph (EMA) positions (i.e., the auxiliary features) and the phoneme features from the personalized speech features. Finally, we transform the personalized speech features, auxiliary features, and phoneme features with a multi-modal feature transformation network that enhances the correlation among these parts. Experimental results on three public datasets (MNGU0, MOCHA-TIMIT, and HASKINS) show that, in the speaker-dependent case, the proposed method reduces the average RMSE by 0.25 and increases the average correlation coefficient by 2.0% compared with the state of the art that uses only the audio speech feature. More importantly, in the speaker-independent case, the average RMSE decreases by 0.29 and the average correlation coefficient increases by 5.0%.
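The data flow described above can be sketched in a few lines of numpy. This is a minimal toy illustration, not the thesis's actual networks: all dimensions are hypothetical, and single random linear maps stand in for the pretrained speech decomposition, auxiliary feature, phoneme estimation, and multi-modal transformation networks. The RMSE and per-channel Pearson correlation at the end mirror the evaluation metrics reported in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the thesis): a 100-frame utterance with
# 40-dim audio features, mapped to 12 EMA articulator channels.
T, D_AUDIO, D_SPK, D_CONT, D_AUX, D_PHON, D_EMA = 100, 40, 16, 32, 6, 40, 12

def net(x, w):
    """Stand-in for a learned network: a single linear map."""
    return x @ w

# Random weights stand in for the pretrained/trained sub-networks.
W_spk  = rng.standard_normal((D_AUDIO, D_SPK)) * 0.1
W_cont = rng.standard_normal((D_AUDIO, D_CONT)) * 0.1
W_aux  = rng.standard_normal((D_SPK + D_CONT, D_AUX)) * 0.1
W_phon = rng.standard_normal((D_SPK + D_CONT, D_PHON)) * 0.1
W_out  = rng.standard_normal((D_SPK + D_CONT + D_AUX + D_PHON, D_EMA)) * 0.1

audio = rng.standard_normal((T, D_AUDIO))  # frame-level audio features

# 1. Speech decomposition: speaker identity + semantic content embeddings
#    form the personalized speech features.
spk  = net(audio, W_spk)
cont = net(audio, W_cont)
personalized = np.concatenate([spk, cont], axis=-1)

# 2. Auxiliary features (non-tongue EMA positions) and phoneme features,
#    both estimated from the personalized speech features.
aux  = net(personalized, W_aux)
phon = net(personalized, W_phon)

# 3. Multi-modal feature transformation: fuse all three parts and map to
#    the articulator trajectories.
fused = np.concatenate([personalized, aux, phon], axis=-1)
ema_pred = net(fused, W_out)

# Evaluation metrics used in the abstract: RMSE and correlation coefficient.
def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def mean_corr(pred, target):
    """Average per-channel Pearson correlation coefficient."""
    cs = [np.corrcoef(pred[:, i], target[:, i])[0, 1]
          for i in range(pred.shape[1])]
    return float(np.mean(cs))

ema_true = rng.standard_normal((T, D_EMA))  # placeholder ground truth
print(ema_pred.shape, rmse(ema_pred, ema_true))
```

The point of the sketch is the wiring, not the layers: the auxiliary and phoneme branches both read from the personalized features, and the final transformation sees all three modalities at once, which is where the claimed cross-feature correlation is exploited.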
Keywords/Search Tags:Acoustic-to-articulatory inversion, Speech decomposition, Personalized speech feature, Auxiliary feature, Speaker-independent