With the continued growth of the Internet, mainstream platforms such as Weibo and WeChat now support the display of Tibetan, and Tibetan opinion content is growing rapidly online. Short Tibetan social media texts have become an important part of the opinions that Internet users express. Analyzing and processing Tibetan opinions plays an important role in strengthening network security management and promoting scientific government decision-making. However, the construction of Tibetan corpus resources lags behind, which poses many challenges for Tibetan text research. To improve the accuracy of Tibetan sentiment classification, this thesis proposes a Tibetan-Chinese cross-language sentiment analysis model. Drawing on rich Chinese corpus resources, it builds knowledge associations between Tibetan and Chinese and uses cross-language sentiment classification to share feature resources between the two languages. In this way, the technical problems caused by the lack of Tibetan text resources can be alleviated to a certain extent.

The main work of this thesis is as follows:

First, a Tibetan-Chinese bilingual sentiment database is built from short social media texts. Short comment texts in Tibetan and Chinese from social media platforms serve as the raw data. The corpus is preprocessed through cleaning, stop-word removal, tagging, and word segmentation, and is then standardized and stored in the database.

Second, a collaborative training (co-training) algorithm is introduced into the Tibetan-Chinese cross-language sentiment classification task, and a cross-language sentiment classification model based on semi-supervised collaborative training is constructed. The balanced Tibetan-Chinese bilingual dataset is treated as two different views for bilingual collaborative training, and the problem of scarce sentiment resources and
insufficient labeled samples in Tibetan is addressed with the help of abundant labeled data in Chinese. The experimental results show that the collaborative training algorithm enhances the Tibetan sentiment classifier's ability to learn from unlabeled samples.

Third, an adversarial network is introduced to improve Tibetan-Chinese cross-language sentiment classification. Chinese-Tibetan bilingual word vectors map the two languages into a shared space, and a language-adversarial network learns joint features of Chinese and Tibetan so that sentiment knowledge is shared between them. On this basis, a Tibetan-Chinese cross-language sentiment classification model based on an adversarial network is built. Even with only a small Tibetan sentiment-annotated set, it can achieve better results.

Fourth, a Tibetan-Chinese cross-language sentiment analysis algorithm based on an end-to-end method is proposed. Building on the adversarial network, the model is improved with the end-to-end method, and an unsupervised end-to-end strategy is adopted to model sentences in Tibetan-Chinese language pairs. The gap between the two languages is eliminated by calculating the probability of language pairs, which addresses the problem of insufficient annotated corpus.
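
To illustrate the collaborative training (co-training) idea in the second contribution, the loop can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the implementation used in this thesis: every sample is assumed to be available in both a Chinese view and a Tibetan view (for example through translation), the feature vectors are hypothetical 2-D toy values, and `CentroidClassifier` is a simple nearest-centroid stand-in for whatever sentiment classifier is actually used. All names in the block are illustrative.

```python
"""Toy co-training loop over two language views (illustrative only)."""
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Labeled = Tuple[Vector, Vector, int]  # (Chinese view, Tibetan view, label)
Unlabeled = Tuple[Vector, Vector]     # (Chinese view, Tibetan view)


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidClassifier:
    """Nearest-centroid classifier; a stand-in for a real sentiment model."""

    def fit(self, xs: List[Vector], ys: List[int]) -> "CentroidClassifier":
        self.centroids: Dict[int, List[float]] = {}
        for label in set(ys):
            members = [x for x, y in zip(xs, ys) if y == label]
            # Column-wise mean of all vectors carrying this label.
            self.centroids[label] = [sum(col) / len(members) for col in zip(*members)]
        return self

    def predict(self, x: Vector) -> Tuple[int, float]:
        """Return (label, confidence); confidence is the distance margin."""
        ranked = sorted((sq_dist(x, c), label) for label, c in self.centroids.items())
        margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
        return ranked[0][1], margin


def co_train(labeled: List[Labeled], unlabeled: List[Unlabeled],
             rounds: int = 3, per_view: int = 1):
    """Grow the labeled set by letting each view pseudo-label confident samples."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):  # 0 = Chinese view, 1 = Tibetan view
            if not pool:
                break
            # Retrain this view's classifier on the current labeled set.
            clf = CentroidClassifier().fit([s[view] for s in labeled],
                                           [s[2] for s in labeled])
            # Rank pool items by this view's confidence, most confident first.
            ranked = sorted(range(len(pool)),
                            key=lambda i: clf.predict(pool[i][view])[1],
                            reverse=True)
            picked = set(ranked[:per_view])
            for i in picked:
                zh, bo = pool[i]
                # The pseudo-label becomes training data for BOTH views.
                labeled.append((zh, bo, clf.predict(pool[i][view])[0]))
            pool = [s for i, s in enumerate(pool) if i not in picked]
    zh_clf = CentroidClassifier().fit([s[0] for s in labeled], [s[2] for s in labeled])
    bo_clf = CentroidClassifier().fit([s[1] for s in labeled], [s[2] for s in labeled])
    return zh_clf, bo_clf, labeled


# Hypothetical toy data: label 1 = positive, label 0 = negative.
LABELED: List[Labeled] = [
    ((1.0, 0.0), (0.9, 0.1), 1),
    ((0.0, 1.0), (0.1, 0.9), 0),
]
UNLABELED: List[Unlabeled] = [
    ((0.8, 0.2), (0.7, 0.3)),
    ((0.2, 0.8), (0.3, 0.7)),
    ((0.9, 0.3), (0.6, 0.2)),
    ((0.1, 0.6), (0.2, 0.9)),
]
```

Calling `co_train(LABELED, UNLABELED)` returns one classifier per view plus the enlarged labeled set. The point of the design is that a sample one view labels with confidence is added to the training data of both views, which is how abundant Chinese labels can help a Tibetan classifier that has few labels of its own.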