
Multi-Modal Acoustic-to-Articulatory Inversion Based On Speech Decomposition And Auxiliary Feature

Posted on: 2022-11-26    Degree: Master    Type: Thesis
Country: China    Candidate: L X Zhao    Full Text: PDF
GTID: 2568307034974069    Subject: Computer technology
Abstract/Summary:
Acoustic-to-articulatory inversion (AAI) aims to recover the movements of the articulators from speech signals. Achieving speaker-independent AAI remains a challenge given the limited data available. Moreover, most current work uses only audio speech as input, which causes an inevitable performance bottleneck. To address these two problems, we first pretrain a speech decomposition network that decomposes the audio feature into a speaker identity embedding and a semantic content embedding, which together serve as the new personalized speech features and adapt the model to the speaker-independent case. Second, to learn additional information that further improves AAI performance, we propose a novel auxiliary feature network and a phoneme estimation network that estimate the non-tongue electromagnetic articulograph (EMA) positions (i.e., the auxiliary features) and the phoneme features from the personalized speech features. Finally, we transform the personalized speech features, auxiliary features, and phoneme features with a multi-modal feature transformation network that enhances the correlation among these parts. Experimental results on three public datasets (MNGU0, MOCHA-TIMIT, and HASKINS) show that, in the speaker-dependent case, the proposed method reduces the average RMSE by 0.25 and increases the average correlation coefficient by 2.0% compared with the state of the art that uses only the audio speech feature. More importantly, in the speaker-independent case, the average RMSE decreases by 0.29 and the average correlation coefficient increases by 5.0%.
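The data flow described above can be sketched in a few lines of numpy. This is a minimal toy illustration, not the thesis's actual networks: all dimensions are hypothetical, and single random linear maps stand in for the pretrained speech decomposition, auxiliary feature, phoneme estimation, and multi-modal transformation networks. The RMSE and per-channel Pearson correlation at the end mirror the evaluation metrics reported in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the thesis): a 100-frame utterance with
# 40-dim audio features, mapped to 12 EMA articulator channels.
T, D_AUDIO, D_SPK, D_CONT, D_AUX, D_PHON, D_EMA = 100, 40, 16, 32, 6, 40, 12

def net(x, w):
    """Stand-in for a learned network: a single linear map."""
    return x @ w

# Random weights stand in for the pretrained/trained sub-networks.
W_spk  = rng.standard_normal((D_AUDIO, D_SPK)) * 0.1
W_cont = rng.standard_normal((D_AUDIO, D_CONT)) * 0.1
W_aux  = rng.standard_normal((D_SPK + D_CONT, D_AUX)) * 0.1
W_phon = rng.standard_normal((D_SPK + D_CONT, D_PHON)) * 0.1
W_out  = rng.standard_normal((D_SPK + D_CONT + D_AUX + D_PHON, D_EMA)) * 0.1

audio = rng.standard_normal((T, D_AUDIO))  # frame-level audio features

# 1. Speech decomposition: speaker identity + semantic content embeddings
#    form the personalized speech features.
spk  = net(audio, W_spk)
cont = net(audio, W_cont)
personalized = np.concatenate([spk, cont], axis=-1)

# 2. Auxiliary features (non-tongue EMA positions) and phoneme features,
#    both estimated from the personalized speech features.
aux  = net(personalized, W_aux)
phon = net(personalized, W_phon)

# 3. Multi-modal feature transformation: fuse all three parts and map to
#    the articulator trajectories.
fused = np.concatenate([personalized, aux, phon], axis=-1)
ema_pred = net(fused, W_out)

# Evaluation metrics used in the abstract: RMSE and correlation coefficient.
def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def mean_corr(pred, target):
    """Average per-channel Pearson correlation coefficient."""
    cs = [np.corrcoef(pred[:, i], target[:, i])[0, 1]
          for i in range(pred.shape[1])]
    return float(np.mean(cs))

ema_true = rng.standard_normal((T, D_EMA))  # placeholder ground truth
print(ema_pred.shape, rmse(ema_pred, ema_true))
```

The point of the sketch is the wiring, not the layers: the auxiliary and phoneme branches both read from the personalized features, and the final transformation sees all three modalities at once, which is where the claimed cross-feature correlation is exploited.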
Keywords/Search Tags:Acoustic-to-articulatory inversion, Speech decomposition, Personalized speech feature, Auxiliary feature, Speaker-independent