
Improvement Of Community Discovery Algorithm Of Academic Literature Citation Network Based On Semantic Similarity

Posted on: 2021-02-28 | Degree: Master | Type: Thesis
Country: China | Candidate: L Liu
GTID: 2370330605960662 | Subject: Management Science and Engineering
Abstract/Summary:
A citation network is a network formed by the citation relationships among scientific documents, and it is one of the most widely used tools in knowledge discovery. Citation networks are a class of complex networks and share their general characteristics, such as the small-world property and clustering. Researchers use them to identify and evaluate research hotspots, but early citation network studies did not analyze the documents' text. Text is an important source of information, and advances in artificial intelligence have made it possible to attach text attributes to citation networks. Because papers are long, however, analyzing the full text of every document is often inefficient and unnecessary. Academic papers have a standardized structure, and their title, abstract, and keywords reveal the general theme of the article. In text analysis of a citation network, the text formed by these three elements can therefore serve as a document's text attribute and capture its subject.

This title, abstract, and keyword text is short, so its term matrix is sparse, and neither the traditional BOW+TFIDF+VSM/LSA pipeline nor neural network models perform well for text feature analysis in citation networks. This thesis therefore combines text mining and community detection to propose a citation network model based on semantic similarity. Drawing on both the semantic relations between documents and their citation relationships, and combining word-level and document-level information, it builds an academic literature citation network based on lexical semantic similarity. Words are vectorized with the GloVe model to make full use of their semantic information, and document similarity is measured with the Word Mover's Distance (WMD), which turns the similarity between two documents into the optimal solution of a constrained linear programming problem. This effectively avoids both the information loss caused by overly sparse matrices and the semantic distortion introduced when a neural network model is trained to map words to vectors. On this basis, the edges of the network are weighted by jointly considering the semantic content of the text and the structural characteristics of the network, and community detection on the weighted citation network is carried out with the Louvain algorithm.

To validate the model, literature from the Web of Science database is used as the raw data. An unweighted citation network, a citation network weighted by BOW+TFIDF+VSM, and the citation network weighted by semantic similarity are compared both quantitatively and qualitatively, and the results show a clear improvement from the proposed approach.
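The sparsity problem with short texts can be made concrete. The following minimal Python sketch (an illustration, not the thesis's code) builds plain TF-IDF vectors and compares them with cosine similarity, as a BOW+TFIDF+VSM pipeline would. The smoothed IDF formula and the three toy "documents" are assumptions chosen for the example. Two short texts that describe the same topic with different words share no terms, so their similarity is exactly zero.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumed smoothed IDF, so terms found in every document keep a nonzero weight.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; only shared terms add to the dot product."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical short "title + keyword" texts.
docs = [
    "citation network community detection",
    "reference graph cluster discovery",
    "citation network community detection algorithm",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # no shared terms, so 0.0
print(cosine(vecs[0], vecs[2]))  # high, because the exact words repeat
```

The first two texts are close in meaning but score 0.0, while the third scores high only because it reuses the first text's exact words. This is the gap that embedding-based measures are meant to close.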
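The WMD step can be sketched as the transportation linear program it reduces to. Each document is a normalized bag of words, each word of one document "ships" its weight to the words of the other, and the cost of shipping is the distance between the two word vectors. The sketch below solves that program with SciPy's `linprog`. The tiny 2-D vectors are hypothetical stand-ins for pretrained GloVe embeddings, and Euclidean distance as the word-to-word cost follows the usual WMD definition, not a detail stated in the abstract.

```python
from collections import Counter

import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-D vectors standing in for pretrained GloVe embeddings.
EMBEDDINGS = {
    "citation": np.array([1.0, 0.0]),
    "reference": np.array([0.9, 0.1]),
    "network": np.array([0.0, 1.0]),
    "graph": np.array([0.1, 0.9]),
    "protein": np.array([-1.0, -1.0]),
}


def nbow(tokens):
    """Normalized bag of words: each word's weight is its relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    words = sorted(counts)
    return words, np.array([counts[w] / total for w in words])


def wmd(doc_a, doc_b, embeddings=EMBEDDINGS):
    """Word Mover's Distance solved as a linear program (unknown words raise KeyError)."""
    words_a, d_a = nbow(doc_a.split())
    words_b, d_b = nbow(doc_b.split())
    n, m = len(words_a), len(words_b)
    # Cost c(i, j): Euclidean distance between the two word vectors.
    cost = np.array(
        [[np.linalg.norm(embeddings[u] - embeddings[v]) for v in words_b] for u in words_a]
    )
    # Flow matrix T (n x m) is flattened row-major into n * m LP variables.
    a_eq = np.zeros((n + m, n * m))
    for i in range(n):
        a_eq[i, i * m:(i + 1) * m] = 1.0  # outgoing flow: sum_j T[i, j] = d_a[i]
    for j in range(m):
        a_eq[n + j, j::m] = 1.0  # incoming flow: sum_i T[i, j] = d_b[j]
    b_eq = np.concatenate([d_a, d_b])
    result = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.fun  # minimum total transport cost


print(wmd("citation network", "reference graph"))  # about 0.141
print(wmd("citation network", "protein"))          # about 2.236
```

"citation network" and "reference graph" share no words, yet they end up only about 0.141 apart, because each word can move to a nearby synonym. The unrelated "protein" text stays far away.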
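The last two steps, weighting citation edges by semantic similarity and running Louvain, can be sketched with NetworkX (version 2.8 or later provides `louvain_communities`). The weighting rule is an assumption for illustration, because the abstract does not give the exact formula. In this sketch a citation decides whether an edge exists, a similarity score in [0, 1] decides its weight, and citation direction is ignored. The paper IDs and scores are hypothetical, and modularity appears only as one common quantitative measure of a partition.

```python
import networkx as nx

# Hypothetical (citing, cited, semantic similarity) triples; the scores stand in
# for values a document-similarity measure such as WMD might be turned into.
CITATIONS = [
    ("A1", "A2", 0.9), ("A2", "A3", 0.9), ("A1", "A3", 0.9),
    ("B1", "B2", 0.9), ("B2", "B3", 0.9), ("B1", "B3", 0.9),
    ("A1", "B1", 0.05),  # a citation between semantically distant papers
]


def build_weighted_citation_graph(citations):
    """Undirected graph: citations decide which edges exist, similarity sets the weights."""
    graph = nx.Graph()
    for citing, cited, similarity in citations:
        graph.add_edge(citing, cited, weight=similarity)
    return graph


def detect_communities(graph, seed=42):
    """Weighted Louvain partition plus its weighted modularity."""
    communities = nx.community.louvain_communities(graph, weight="weight", seed=seed)
    q = nx.community.modularity(graph, communities, weight="weight")
    return communities, q


graph = build_weighted_citation_graph(CITATIONS)
communities, q = detect_communities(graph)
print(sorted(sorted(c) for c in communities))  # [['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']]
print(round(q, 3))
```

The weak A1-B1 citation links two otherwise separate topic clusters. With a low similarity weight it adds little to modularity, so Louvain keeps the clusters apart.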
Keywords/Search Tags: citation network, semantic weighting, community division, text mining, word embedding