| Microblogs have become the main platform where Internet users share short, real-time information. Their multi-source, heterogeneous information representation and extremely low barrier to entry and communication have made them a core site for information dissemination and the fermentation of public sentiment, so research on microblog information flow is necessary. This study addresses two tasks. (1) By introducing the concept of the head word, this paper proposes an improved LDA topic model to obtain better topic distributions in microblog hot topic discovery. Working from hot microblog data, text representation learning was carried out first: models based on BERT and Word2Vec were set as the experimental groups, and models based on TF-IDF and BOW were set as the control groups. The experimental groups and the control groups then generated improved LDA models and traditional LDA models, respectively. Comparing the two shows that the LDA model generated by the improved method is better than that generated by the traditional method in terms of the distribution concentration of high-frequency words, and is more suitable for generating hot topics in downstream applications. (2) Building on hot topic generation, this paper proposes an improved sentiment analysis model based on ABSA (aspect-based sentiment analysis) and obtains the sentiment polarity distribution of each type of topic. When assigning different weights to local context features, the study replaces the location-based decay method CDW of the original methodology with the semantic-distance-based weight decay method SCDW, in order to balance the high risk of CDM against the weak effect of CDW and obtain an efficient and stable model. With LSA-S-ME and LSA-S-DE as the control group and SLCF as the experimental group, the experiment found that the AUC of SLCF was close to that of CDM at its peak and better than that of CDW overall. In addition, the study proposes a probability calibration procedure for binary polarity classification, which quantifies the polarity probability and calculates the polarity value of a text based on its keywords. Finally, for a large volume of microblog data, after data cleaning, word segmentation, stop-word removal, and text representation learning, these two consecutive tasks produced a corpus-based hot topic classification and the corresponding sentiment polarity distribution. |