
Chinese Phonetic Prosodic Boundary Recognition Based On Acoustic Feature Engineering

Posted on: 2022-02-15 | Degree: Master | Type: Thesis
Country: China | Candidate: X X Wei | Full Text: PDF
GTID: 2505306494479924 | Subject: Electronics and Communications Engineering
Abstract/Summary:
Speech prosodic boundary recognition is the automatic detection of prosodic boundaries in input speech, including the boundaries of prosodic words, prosodic phrases, and intonation phrases. Prosodic boundary recognition based on acoustic features matters for both natural language understanding and speech synthesis. For spoken language understanding, acoustic features directly reflect the speaker's pauses and rhythm, so correct prosodic boundary recognition supports correct semantic interpretation. For synthesis, the naturalness of current synthesized speech still needs improvement, and prosodic boundary recognition and labeling based on acoustic features are indispensable for building a high-quality, highly natural synthesis corpus. In addition, most current research on Chinese prosodic boundary recognition ignores the minor prosodic phrase boundary, and the acoustic features at that boundary are not salient, which leads to poor prosodic boundary recognition performance. To address these problems, this thesis applies feature engineering methods to Chinese prosodic boundary recognition based on acoustic features.

First, from the perspective of feature selection, the thesis surveys the acoustic features currently associated with prosodic boundaries, analyzes them statistically on an open corpus, and compares the results with existing research to identify the features that are related to prosodic boundaries.

Second, from the perspective of feature construction, and because vowel structure affects vowel duration in Chinese prosodic boundary identification, a normalized vowel duration model based on vowel structure is proposed. It combines the actual vowel duration with the vowel structure to construct a new normalized vowel duration feature, and a long short-term memory (LSTM) network is used to model prosodic boundary recognition. Comparing recognition results across feature sets shows that, relative to the actual vowel duration feature, the constructed normalized vowel duration feature improves the F-score by 5.9% in secondary (minor prosodic phrase) boundary recognition. For prosodic word boundaries, major prosodic phrase boundaries, and intonation group boundaries, the F-score increases by 1.4%, 1.8%, and 0.8%, respectively.

Third, after the feature set is expanded and refined, the PCA-LDA dimensionality reduction algorithm is introduced in the feature extraction stage to counter the curse of dimensionality that high-dimensional features impose on the recognition model. Because the LSTM network may also lose some key information during prosodic boundary recognition, an attention mechanism is added to the network. The final results show that, compared with the feature set before dimensionality reduction, the feature set after dimensionality reduction improves the prosodic boundary recognition F-score by 14.9% overall and by 4.2% on the minor prosodic phrase boundary. Comparing the original network with the network after the attention mechanism is introduced, the prosodic boundary recognition F-score of the improved network is higher by an average of 2.5%.

Finally, the thesis summarizes all of its work on prosodic boundary recognition and briefly analyzes the areas that still need improvement.
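The abstract does not give the formula behind the normalized vowel duration feature. As a rough illustration only, the sketch below assumes one plausible reading: each vowel's duration is z-scored against the mean and standard deviation of all vowels that share its structure class. The function name, the structure labels ("V", "VN"), and the z-score choice are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_durations(samples):
    """Z-score each duration within its vowel-structure class.

    samples: list of (structure_label, duration_ms) pairs.
    Returns a list of normalized durations in the same order.
    NOTE: per-class z-scoring is an assumption, not the thesis's stated model.
    """
    # Group the raw durations by structure class.
    by_structure = defaultdict(list)
    for structure, duration in samples:
        by_structure[structure].append(duration)

    # Per-class mean and population standard deviation.
    stats = {s: (mean(d), pstdev(d)) for s, d in by_structure.items()}

    normalized = []
    for structure, duration in samples:
        mu, sigma = stats[structure]
        # A class with no spread (e.g. a single sample) maps to 0.0.
        normalized.append((duration - mu) / sigma if sigma > 0 else 0.0)
    return normalized


# Hypothetical durations in milliseconds for two structure classes.
example = [("V", 100), ("V", 140), ("VN", 200), ("VN", 240)]
print(normalize_durations(example))
```

Under this assumption, a 140 ms simple vowel and a 240 ms nasal-final vowel receive the same normalized value, so that lengthening before a boundary is measured relative to what is typical for each structure rather than in raw milliseconds.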
Keywords/Search Tags:Chinese prosody boundary, vowel duration, vowel structure, feature engineering, attention mechanism