With the rise of social networks and Q&A sites, short texts such as web page titles, news headlines, and blog titles have become the dominant form of content on the Internet. Messages on Weibo, Zhihu, Twitter, and Facebook are also short texts. Accurately mining the topics behind these short texts is therefore essential for a wide range of tasks, including emerging topic detection, user interest profiling, and content recommendation. Topic models are an effective way to extract topic information from text. However, directly applying conventional topic models to short texts suffers from a severe data sparsity problem. To overcome this shortcoming, this paper proposes a novel approach to modeling topics in short texts, referred to as the Word-network Triangle Topic Model (WTTM). The experimental results demonstrate that the approach performs well on short texts. The main contributions of this paper are as follows: (1) Because a common word network cannot indicate the intersection of different document sub-networks, a new word-network construction strategy is proposed that labels each edge with a set of document indexes. (2) To overcome the weak semantic association between some word pairs, this paper proposes a strategy for finding a specific triangular structure in the word network. The words in a word triangle have stronger semantic relevance and are more concentrated on a single topic. (3) Taking the word triangle as the basic unit of a topic, WTTM is proposed and compared with LDA and BTM. The experimental results show that WTTM outperforms LDA and BTM in short text topic mining. (4) A word clique structure is proposed as an extension of the word triangle structure. As the number of nodes in the word clique increases, the experimental results improve further.
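The first two contributions can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: documents are already tokenized, an edge links two words that co-occur in the same document, and an edge's label is the set of indexes of the documents in which that pair co-occurs. The function names `build_word_network` and `find_word_triangles` are hypothetical. The abstract does not give the criterion that makes a triangle "specific", so the sketch simply enumerates every triangle together with its three edge labels and applies no filter.

```python
from collections import defaultdict
from itertools import combinations


def build_word_network(docs):
    """Build a word network whose edges are labeled with document indexes.

    Assumption: two words are linked if they co-occur in one document.
    Returns a dict mapping a sorted word pair (u, v), u < v, to the set
    of indexes of the documents that contain both words.
    """
    edges = defaultdict(set)
    for doc_id, words in enumerate(docs):
        # sorted(set(...)) removes duplicates and guarantees u < v
        for u, v in combinations(sorted(set(words)), 2):
            edges[(u, v)].add(doc_id)
    return dict(edges)


def find_word_triangles(edges):
    """Enumerate all triangles (u, v, w), u < v < w, in the word network.

    Each triangle maps to the labels of its edges in the order
    (u, v), (v, w), (u, w). No selection criterion is applied here.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    triangles = {}
    for u, v in edges:
        # a common neighbor w of u and v closes a triangle
        for w in neighbors[u] & neighbors[v]:
            if w > v:  # report each triangle once, with w as the largest word
                triangles[(u, v, w)] = (
                    edges[(u, v)],
                    edges[(v, w)],
                    edges[(u, w)],
                )
    return triangles


if __name__ == "__main__":
    # Toy corpus of tokenized short texts (illustrative data only)
    toy_docs = [
        ["apple", "banana"],
        ["banana", "cherry"],
        ["apple", "cherry"],
        ["apple", "banana", "cherry"],
    ]
    network = build_word_network(toy_docs)
    for triangle, labels in find_word_triangles(network).items():
        print(triangle, labels)
```

Under these assumptions, intersecting the three labels of a triangle yields the documents that contain all three words, while their union shows how many distinct documents contribute to the triangle. This is one way the edge labels can expose how document sub-networks overlap.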