| With the rapid development of social networks, instant messaging systems have become an indispensable communication tool in people's daily lives. They range from early chat tools such as ICQ and MSN, to QQ and WeChat with their huge user bases, to workplace tools such as DingTalk and the enterprise edition of WeChat. As chat software gains functionality and the content of group chats grows more diverse, research on group conversation content has increased in recent years. Group chat data can be used to analyze users' chat behavior, infer their gender and age range, and discover hot topics within a group. Such analysis provides a basis for public opinion analysis and public opinion early warning, and is therefore of great significance for network security. Analysis shows that group chat text is short, highly colloquial, and grammatically irregular, whereas traditional probabilistic topic models are suited mainly to long text. This thesis therefore starts from the Twitter-LDA algorithm, which was designed for microblog text, and combines it with the characteristics of group chat in terms of time, user, and group profile to propose a Modified Twitter-LDA based Group Chat Topic Mining (MTLB-GCTM) model. Because the MTLB-GCTM model is an extension of the traditional probabilistic topic model, it shares that model's disadvantages of a shallow feature structure and reliance on probabilistic generation. Integrating deep neural networks into the topic modeling process helps to construct a deep topic feature representation. Therefore, building on existing research on deep learning language models, this thesis further proposes a GRU and Modified Twitter-LDA based Group Chat Topic Mining (GMTL-GCTM) model. This model not only mines deeper topic features but also preserves the ability of traditional probabilistic topic models to capture global semantics. Both models are tested on real group chat data and evaluated using perplexity, human evaluation, and pointwise mutual information (PMI). Two sets of comparative experiments verify the validity of the MTLB-GCTM and GMTL-GCTM models, and show that the GMTL-GCTM model achieves better topic semantic coherence than the MTLB-GCTM model. |