
Research On Document Topic Division Based On Frequent Word Set

Posted on: 2022-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: X N Qu
GTID: 2480306317493244
Subject: Statistics
Abstract/Summary:
With the advent of the knowledge economy, information is reaching the public at an explosive rate, and redundant information makes it harder for people to find what they need. In academic research, scholars likewise want to locate quickly the literature that is useful to them. Within a literature database, however, research directions and viewpoints vary widely even inside related fields, so detecting the topics of a given field has become an important research direction. As the volume of literature grows, complex networks built from text information become larger and more complex, which challenges traditional complex network methods. Improving the design of complex network methods to raise the accuracy of topic division has therefore attracted increasing attention.

To help scholars grasp the research directions and focal points of a field in a short time, this paper applies a complex network based on frequent word sets to literature topic analysis and offers suggestions on future research angles, with the aim of advancing the related research fields. Building on a systematic review of text mining technology, the paper mines frequent word sets from literature keywords and then constructs a complex network to divide the literature into topics. Because the continual leakage of personal privacy information has made information security a focus of attention, the paper takes literature on the research topic of "privacy ethics" as an example and analyzes its keywords.

First, the paper takes the literature on privacy ethics in CNKI and VIP as its data set and directly constructs a traditional complex network from the literature keywords with frequency greater than 5. It computes descriptive statistics for this network: the number of nodes, number of edges, average degree, diameter, density, average path length, and path length. The traditional complex network is then divided into topics with the Louvain community division algorithm, which yields six research directions in the field of "privacy ethics": the big data background, the doctor-patient relationship, privacy rights, genes, ethical anomie caused by new media communication, and countermeasures for medical privacy ethics problems.

Second, the paper proposes a complex network construction method based on frequent word sets. The method uses the FP-growth algorithm to find the frequent word sets in the literature whose support is greater than 2 and builds the complex network from them. Following the same procedure as for the traditional complex network, the paper divides the "privacy ethics" literature into 9 parts. Under this method, the research focuses of "privacy ethics" include medical institutions, privacy dilemma, government data, information, sexual abuse, academic, spiritual, quality training, and self-discipline.

Finally, the two methods are compared in terms of network properties and the strengths and weaknesses of their community structure division. The comparison verifies that the complex network based on frequent word sets gives a better community division and can effectively identify high-quality text communities, which helps improve current literature topic exploration.
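
The second method described above (frequent word sets, then a network, then descriptive statistics) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the keyword lists are invented toy data, the function names are hypothetical, and a brute-force subset count stands in for FP-growth (it finds the same frequent sets on small inputs but does not scale). Only the "support greater than 2" threshold is taken from the abstract. Community division is omitted here; a library routine such as networkx's `louvain_communities` could be applied to the resulting edge list.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each inner list holds the keywords of one paper.
docs = [
    ["privacy", "ethics", "big data"],
    ["privacy", "ethics", "big data"],
    ["privacy", "ethics", "big data", "gene"],
    ["privacy", "gene"],
    ["doctor-patient relationship", "privacy"],
]


def frequent_word_sets(docs, min_support, max_size=3):
    """Return keyword sets (size 2..max_size) whose support exceeds min_support.

    Brute-force enumeration, used here in place of FP-growth for brevity.
    Support is the number of documents that contain the whole set.
    """
    counts = Counter()
    for keywords in docs:
        unique = sorted(set(keywords))  # sorted so each set has one canonical tuple
        for size in range(2, max_size + 1):
            counts.update(combinations(unique, size))
    return {ws: c for ws, c in counts.items() if c > min_support}


def build_network(word_sets):
    """Link every pair of keywords that appear together in a frequent word set."""
    edges = set()
    for ws in word_sets:
        edges.update(combinations(ws, 2))
    return edges


def describe(edges):
    """Compute a few of the descriptive statistics named in the abstract."""
    nodes = {word for edge in edges for word in edge}
    n, m = len(nodes), len(edges)
    return {
        "nodes": n,
        "edges": m,
        "average_degree": 2 * m / n if n else 0.0,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
    }


frequent = frequent_word_sets(docs, min_support=2)
edges = build_network(frequent)
stats = describe(edges)
print(stats)
```

On this toy data only "privacy", "ethics", and "big data" co-occur in more than two papers, so the network reduces to a single triangle. Rarer keywords such as "gene" drop out, which shows how the support threshold prunes weak links before community division.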
Keywords/Search Tags: frequent word set, complex network, topic, community partition, Louvain algorithm