Research On Social Network Media Hotspot Mining Algorithm Based On Distributed Computing

Posted on: 2022-03-02 | Degree: Master | Type: Thesis
Country: China | Candidate: C L Wu
GTID: 2518306740483064 | Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of social network media, more and more people have become accustomed to using Weibo, Twitter, Facebook and various forums as platforms for expressing opinions, attitudes and comments on all kinds of events. Hundreds of millions of text messages are published on these platforms every minute, and emergencies and hot news spread far faster on them than through traditional media. How to mine and analyze these texts quickly and identify current hot spots in real time has therefore become a valuable research direction.

This thesis focuses on researching and improving topic mining algorithms for social network media, and improves their efficiency by means of a distributed computing platform. It analyzes the difficulties of hot spot mining in social network media and, drawing on the strengths and weaknesses of traditional short-text hot spot modeling methods while keeping the methods real-time and effective, proposes a key hot word mining algorithm based on SIFRank, a co-occurrence hot word search algorithm based on a word co-occurrence model, and parallelized versions of these algorithms on the Spark platform for acceleration. These algorithms improve the interpretability of the output topics and the effectiveness of text mining in social network media, and can provide reliable data support for downstream data analysis and public opinion monitoring. The main contents of this thesis are as follows:

(1) Texts in social network media suffer from problems such as lack of information, sparse data and high feature dimensionality. To overcome these problems, this thesis improves the heat evaluation index and adds a heat weight and a new word discovery module to the SIFRank algorithm, which performs well on short-text keyword extraction. The resulting method, HSIFRANK, extracts key hot words from short texts efficiently: its effectiveness is no worse than that of SIFRank, and it runs 5 to 8 times faster.

(2) Considering how factors in social network media such as users' social relationships, changes in topic heat, the online "water army" (paid posters) and commercial accounts affect hot topic mining, this thesis designs a heat weight model and proposes HCH, an improved version of the traditional co-occurrence word model. HCH combines a sliding window algorithm, HSIFRANK, PWMI, user features, text features, heat change rate and other signals to produce a list of hot topics ranked by heat. It ensures the reliability and interpretability of hot topics and effectively captures hot transition topics and hot subtopics.

(3) Using the PySpark distributed computing framework, the original single-machine algorithms are adapted to run in a distributed environment, which speeds up their execution and improves their efficiency.

(4) Four different social network media datasets are used to evaluate the improved algorithms: their performance in extracting key hot words from short texts, their strengths and weaknesses in hot topic mining, and their speedup ratio relative to the single-machine algorithms. Experimental results show that, compared with the baseline algorithms, the improved algorithms perform better in running speed, hot word recall, interpretability and reliability of output topics, and mining of hot transition topics.
Keywords/Search Tags: Text Mining, Short Text Topic Mining, Hot Spot Discovery, Co-occurrence Word Model
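The abstract does not describe how the new word discovery module in HSIFRANK works. One common approach for Chinese short text, shown here only as a minimal sketch, scores candidate character n-grams by internal cohesion (the smallest pointwise mutual information over all split points) and by boundary freedom (the entropy of the characters seen immediately to the left and right). The function name discover_new_words and all thresholds are hypothetical, and n-gram probabilities are approximated by dividing by the total character count.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (natural log) of a neighbor-frequency distribution."""
    total = sum(counter.values())
    return -sum(c / total * math.log(c / total) for c in counter.values())


def discover_new_words(texts, max_len=4, min_count=2, min_pmi=1.0, min_entropy=0.5):
    """Hypothetical new-word discovery (not the thesis's module): keep n-grams that
    are cohesive inside and have varied neighbors outside.
    Returns {word: (cohesion, boundary_entropy)}."""
    counts = Counter()
    left = defaultdict(Counter)   # n-gram -> counts of the character before it
    right = defaultdict(Counter)  # n-gram -> counts of the character after it
    for text in texts:
        n = len(text)
        for size in range(1, max_len + 1):
            for i in range(n - size + 1):
                gram = text[i:i + size]
                counts[gram] += 1
                if size >= 2:
                    left[gram][text[i - 1] if i > 0 else "<s>"] += 1
                    right[gram][text[i + size] if i + size < n else "</s>"] += 1

    # Approximation: every n-gram probability is normalized by the character count.
    total = sum(c for g, c in counts.items() if len(g) == 1)

    def prob(g):
        return counts[g] / total

    results = {}
    for gram, count in counts.items():
        if len(gram) < 2 or count < min_count:
            continue
        # Cohesion: PMI at the weakest split point of the candidate.
        cohesion = min(
            math.log(prob(gram) / (prob(gram[:k]) * prob(gram[k:])))
            for k in range(1, len(gram))
        )
        # Boundary freedom: a real word should follow and precede many characters.
        boundary = min(entropy(left[gram]), entropy(right[gram]))
        if cohesion >= min_pmi and boundary >= min_entropy:
            results[gram] = (round(cohesion, 3), round(boundary, 3))
    return results


posts = ["今天元宇宙火了", "元宇宙概念股大涨", "聊聊元宇宙吧"]
print(discover_new_words(posts))  # {'元宇宙': (1.946, 1.099)}
```

In the toy posts, the fragments 元宇 and 宇宙 are as frequent as 元宇宙 ("metaverse") but are always followed or preceded by the same character, so their boundary entropy is zero and only the full word survives.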
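HCH is described as combining a sliding window with PWMI, but the abstract does not define PWMI. The sketch below therefore substitutes standard positive pointwise mutual information (PPMI) computed over a symmetric window co-occurrence matrix, purely to illustrate how windowed co-occurrence scoring works; window_pairs, cooccurrence_ppmi, window and min_count are illustrative names, not the thesis's API.

```python
import math
from collections import Counter


def window_pairs(tokens, window):
    """Yield unordered pairs of distinct words whose positions differ by less than `window`."""
    for i, first in enumerate(tokens):
        for second in tokens[i + 1:i + window]:
            if first != second:
                yield tuple(sorted((first, second)))


def cooccurrence_ppmi(docs, window=3, min_count=1):
    """Positive PMI of word pairs co-occurring inside a sliding window.

    Stand-in for the thesis's PWMI, which the abstract does not define."""
    pair_counts = Counter()
    for tokens in docs:
        pair_counts.update(window_pairs(tokens, window))

    # Marginals of the symmetric co-occurrence matrix (each pair adds to both words).
    marginal = Counter()
    for (a, b), c in pair_counts.items():
        marginal[a] += c
        marginal[b] += c
    total = sum(marginal.values())  # equals 2 * number of pair occurrences

    scores = {}
    for (a, b), c in pair_counts.items():
        if c < min_count:
            continue
        pmi = math.log(c * total / (marginal[a] * marginal[b]))
        if pmi > 0:  # PPMI keeps only positive associations
            scores[(a, b)] = pmi
    return scores


docs = [["apple", "phone", "launch"], ["apple", "phone", "price"], ["rain", "city"]]
ranked = sorted(cooccurrence_ppmi(docs, window=2).items(), key=lambda kv: -kv[1])
```

PMI is known to overrate rare pairs: here ("city", "rain") occurs once yet scores log 10 ≈ 2.303, above ("apple", "phone") at log 2.5 ≈ 0.916. A frequency floor such as min_count, or extra weights like the heat and user features the thesis mentions, is one way to counter this.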
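The heat change rate used by HCH is not given as a formula in the abstract. Assuming heat is the (optionally weighted) number of posts that mention a word within a time window, a smoothed relative change between consecutive windows is one simple way to flag rising words that might signal a hot transition topic. The smoothing constant, thresholds and per-post weights (which could stand in for user or text features) are all hypothetical.

```python
from collections import Counter


def window_heat(posts, weights=None):
    """Heat of each word in one time window: number of posts mentioning it,
    optionally weighted per post (hypothetical stand-in for user/text features)."""
    heat = Counter()
    for idx, tokens in enumerate(posts):
        weight = 1.0 if weights is None else weights[idx]
        for word in set(tokens):  # count a word at most once per post
            heat[word] += weight
    return heat


def heat_change_rate(prev, curr, smoothing=1.0):
    """Smoothed relative change in heat between two consecutive windows."""
    return {w: (curr[w] - prev[w]) / (prev[w] + smoothing) for w in set(prev) | set(curr)}


def rising_words(prev, curr, min_rate=1.0, min_heat=2.0):
    """Words whose smoothed change rate reaches min_rate and whose current heat
    reaches min_heat, sorted by rate (ties broken alphabetically)."""
    rates = heat_change_rate(prev, curr)
    hot = [w for w, r in rates.items() if r >= min_rate and curr[w] >= min_heat]
    return sorted(hot, key=lambda w: (-rates[w], w))


prev = window_heat([["earthquake", "news"], ["weather"]])
curr = window_heat([["earthquake", "rescue"], ["earthquake", "donate"],
                    ["earthquake", "news"], ["weather"]])
print(rising_words(prev, curr))  # ['earthquake']
```

The smoothing term keeps words absent from the previous window from producing a division by zero, and the min_heat floor stops one-off mentions such as "rescue" from being reported even though their change rate is high.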