| The quality of the public opinion environment affects national stability and economic development. In recent years, new Internet media technologies have developed rapidly in China, and the way public opinion spreads has changed substantially, shifting from traditional long-form news text to short social-media text. Studies have found that Sina Weibo is currently the main platform on which public opinion in China originates and spreads. Weibo posts show almost all the characteristics of social short text: they are brief, spread quickly, vary widely in content, and are highly time-sensitive. Accurately monitoring the massive volume of information on Sina Weibo is therefore an important part of public governance in China. Sina Weibo now receives hundreds of millions of new posts per day, so manual monitoring can serve only as an auxiliary measure. The main need is to improve natural language processing (NLP), and the semantic representation of words is a foundational task in NLP. The mainstream approach to word representation is word embedding, in which each word is represented as a low-dimensional real-valued vector. Most Chinese word embedding methods directly follow approaches designed for English, ignoring differences in structure, semantics, and grammar, and most are limited to the contextual information of the target word. Building on traditional word embedding models, this paper incorporates information from the Chinese characters that make up the context words and proposes CTWE, a new word representation model that combines Chinese word structure with a topic model. CTWE takes into account that context words contribute differently to the semantics of the target word and that Chinese writing differs from English. It summarizes the different ways in which characters contribute to word semantics and builds word vectors weighted by the semantic similarity between characters and words. It also uses a topic model to give each word statistical information from the whole corpus. Jointly training on these two sources of information lets the word embedding model make more accurate use of the semantic information carried by the characters inside a word and of document-level global information, improving embedding quality. The model is trained on real Sina Weibo data and evaluated on three tasks: semantic similarity, analogical reasoning, and text classification. The results show that the word embeddings produced by CTWE outperform those of CBOW, Skip-gram, CWE, and TWE. |
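The abstract does not give CTWE's formulas, so the following is only a minimal sketch of the kinds of operations it describes, under stated assumptions. `char_weighted_vector` mixes a word's vector with its characters' vectors, weighting each character by a softmax over its cosine similarity to the word. The softmax and the equal 0.5 mix are assumptions, not the paper's weighting. `with_topic` concatenates a word vector with the vector of the topic assigned to it. This is the TWE-style concatenation, assumed here, and the topic vector is simply given rather than learned. `analogy` uses the standard 3CosAdd scoring often used for analogical-reasoning evaluation. All function names and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, the usual score for word-similarity evaluation."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def char_weighted_vector(word_vec: Vector, char_vecs: List[Vector]) -> Vector:
    """Hypothetical composition: mix the word vector with a weighted average
    of its character vectors. Each character is weighted by a softmax over
    its cosine similarity to the word (an assumed stand-in for CTWE)."""
    if not char_vecs:
        return list(word_vec)
    sims = [cosine(word_vec, c) for c in char_vecs]
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    char_part = [
        sum(w * c[i] for w, c in zip(weights, char_vecs))
        for i in range(len(word_vec))
    ]
    # Equal mix of word-level and character-level information (assumed).
    return [0.5 * (wv + cv) for wv, cv in zip(word_vec, char_part)]


def with_topic(vec: Vector, topic_vec: Vector) -> Vector:
    """TWE-style concatenation of a word vector with the vector of the topic
    assigned to that word (e.g. by LDA); here the topic vector is given."""
    return list(vec) + list(topic_vec)


def analogy(a: str, b: str, c: str, emb: Dict[str, Vector]) -> str:
    """3CosAdd analogy (a : b :: c : ?): return the word d that maximizes
    cos(d, b - a + c), excluding the three query words."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(emb[w], target))


if __name__ == "__main__":
    toy = {  # hypothetical 3-d embeddings, for illustration only
        "king": [0.9, 0.8, 0.1],
        "queen": [0.9, 0.1, 0.8],
        "man": [0.1, 0.9, 0.1],
        "woman": [0.1, 0.1, 0.9],
        "apple": [0.5, 0.5, 0.5],
    }
    print(analogy("man", "king", "woman", toy))  # prints "queen"
    print(char_weighted_vector([1.0, 0.0], [[1.0, 1.0], [1.0, -1.0]]))
```

The softmax keeps the character weights positive and summing to one, so a character whose vector is closer to the word's meaning contributes more. That is the intuition behind similarity-weighted composition. In a trained model these vectors would be learned jointly from the corpus instead of being fixed by hand.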