
A Multimodal Cross-lingual Study Of Attitudinal Speech

Posted on: 2021-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: H Guo
Full Text: PDF
GTID: 2435330647457491
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
This thesis collects a Mandarin Chinese (Putonghua) corpus of "willing-unwilling" and "certain-uncertain" attitudinal speech through scenario-based elicitation and uses it for acoustic feature analysis and multimodal perception experiments.

The research proceeds in three steps. (1) A Chinese attitudinal speech corpus of "willing-unwilling" and "certain-uncertain" was constructed. Each attitude comprises 12 target sentences, each elicited by a specific scenario. (2) Two groups of listeners with different cultural backgrounds were recruited for the perception experiment: 16 Chinese listeners and 14 Korean listeners with HSK level 5. The stimuli were presented in three modes: audio-only, video-only and audio-video. (3) Acoustic analysis was conducted on the two pairs of attitudes. Sixteen acoustic parameters related to fundamental frequency, energy, duration and voice quality were extracted, and the relations among gender, attitude and acoustic characteristics were investigated through factor analysis and discriminant analysis.

The perception results are as follows. (1) By listener group, Chinese listeners were more accurate than Korean listeners. By mode, audio-video yielded the highest accuracy; for "willing-unwilling", video-only was more accurate than audio-only, whereas for "certain-uncertain", audio-only was more accurate than video-only. By attitude, unwilling was identified more accurately than willing, and the difference between certain and uncertain was smaller. (2) Chinese listeners achieved high accuracy from voice information alone when judging unwilling, certain and uncertain, and from facial expressions alone when judging willing. Korean listeners achieved high accuracy from facial expressions alone when judging willing, unwilling and uncertain, and from voice information alone when judging certain. The main cultural differences between the two groups concern unwilling and uncertain, for which Chinese listeners relied more on voice information and Korean listeners relied more on facial expressions.

The acoustic analysis results are as follows. (1) Analysis of variance shows that speakers use a higher fundamental frequency and a faster speech rate when expressing willing, and a lower fundamental frequency, a faster speech rate, smaller fundamental frequency jitter and larger amplitude jitter when expressing unwilling. Speakers use a lower fundamental frequency when expressing certain, and a higher fundamental frequency, a faster speech rate, and smaller fundamental frequency jitter and amplitude when expressing uncertain. (2) Factor analysis of the 16 acoustic parameters extracted from the two pairs of attitudes found four common factors (voice quality, fundamental frequency, energy and duration) that together account for 73.1% (willing-unwilling) and 75.0% (certain-uncertain) of the information. (3) In the discriminant analysis, the classification accuracy obtained from the acoustic parameters for willing-unwilling (81.2% vs 79.3% vs 87.1%) and certain-uncertain (80.9% vs 79.7% vs 91.1%) was close to the perceptual accuracy of the Korean group but significantly lower than that of the Chinese group. This suggests that the Korean group relied mostly on the acoustic parameters in their judgments, while the Chinese group also drew on other knowledge beyond the acoustic cues.
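The abstract names fundamental frequency jitter and amplitude jitter among the acoustic parameters but does not say how they were computed. As an illustration only, the sketch below uses the common "local" definition: the mean absolute difference between consecutive cycles divided by the mean value across cycles. The function names and the sample numbers are hypothetical and are not taken from the thesis.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value.

    This is the usual "local" perturbation measure. Whether the thesis used
    this exact variant is an assumption.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_local(periods_s):
    """Cycle-to-cycle variability of the glottal period (period = 1 / F0)."""
    return local_perturbation(periods_s)


def shimmer_local(peak_amplitudes):
    """Cycle-to-cycle variability of the peak amplitude (amplitude jitter)."""
    return local_perturbation(peak_amplitudes)


# Invented example values for five consecutive glottal cycles (not thesis data).
periods = [0.0050, 0.0052, 0.0049, 0.0051, 0.0050]  # seconds
amplitudes = [0.80, 0.78, 0.83, 0.79, 0.81]         # arbitrary linear units

print(f"jitter (local):  {jitter_local(periods):.2%}")
print(f"shimmer (local): {shimmer_local(amplitudes):.2%}")
```

Under this definition, a smaller value means more regular voicing from one cycle to the next, which is the sense in which the abstract reports "smaller" or "larger" jitter for a given attitude.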
Keywords/Search Tags: attitudinal speech, multi-modal, cross-language, perceptual experiment, acoustic analysis