
Text Summarization And Keyword Extraction Based On Complex Network

Posted on: 2022-01-07 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Yang | Full Text: PDF
GTID: 2480306335456764 | Subject: Computer Software and Application of Computer
Abstract/Summary:
The rich data resources available on the Internet bring problems of their own, such as information overload and the difficulty of locating key points. Against this background, how to extract the key information from a text and locate the key points within a huge body of information has become a focus of attention, and information extraction techniques from natural language processing have attracted growing interest. This thesis studies text keyword extraction and topic sentence extraction, drawing on text mining, complex networks and related knowledge. The main work is as follows.

(1) Keyword extraction: Based on the idea of complex networks, this thesis proposes an improved Chinese keyword extraction algorithm, C-TextRank. The algorithm builds on the word co-occurrence network of the text, uses the structural characteristics of the complex network to calculate the node transition probability matrix, obtains an initial weight for each node through iterative calculation, and fuses this weight with node position features to obtain the final weight. Nodes are ranked by final weight, and the top k nodes are selected as the keywords of the text. Experiments show that C-TextRank performs better in keyword extraction than the traditional TextRank and TF-IDF algorithms.

(2) Abstract (summary) extraction: Based on the idea of community division, this thesis proposes an improved abstract extraction method. First, the word2vec algorithm is used to extract text features and construct sentence vectors, and the cosine similarity between vectors is calculated to construct a weighted undirected text network. Second, a community partition algorithm divides the network into multiple communities, each of which represents a subtopic. Third, treating each subtopic separately, the TextRank algorithm is combined with sentence-word relations to construct a sentence importance scoring function, and the top-ranked sentences of each community are extracted as the text summary. Experimental results show that the proposed method performs better in abstract extraction than the TextRank and TF-IDF algorithms.
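
The keyword extraction pipeline in (1) can be illustrated with a minimal sketch. The abstract does not give the actual C-TextRank formulas, so everything specific here is an assumption: the transition probability is taken to be proportional to co-occurrence edge weight, the position feature is taken to be how early a word first appears, and the fusion weight `alpha`, the window size and the function names are hypothetical. Input is assumed to be already segmented into tokens.

```python
from collections import defaultdict


def build_cooccurrence(tokens, window=2):
    """Undirected weighted graph; edge weight counts co-occurrences within the window."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if v != w:
                graph[w][v] += 1.0
                graph[v][w] += 1.0
    return graph


def rank_keywords(tokens, k=3, window=2, damping=0.85, alpha=0.8, iters=50):
    """Return the top-k words by a fused graph score and position score."""
    graph = build_cooccurrence(tokens, window)
    nodes = list(graph)
    if not nodes:
        return []

    # Assumed transition probability: edge weight divided by the
    # total edge weight of the source node.
    strength = {u: sum(graph[u].values()) for u in nodes}

    # Iterative weighted PageRank-style update for the initial node weights.
    score = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iters):
        updated = {}
        for v in nodes:
            incoming = sum(score[u] * graph[u][v] / strength[u] for u in graph[v])
            updated[v] = (1 - damping) / len(nodes) + damping * incoming
        score = updated

    # Assumed position feature: words that first appear earlier score higher.
    first_seen = {}
    for i, w in enumerate(tokens):
        first_seen.setdefault(w, i)
    position = {w: 1.0 - first_seen[w] / len(tokens) for w in nodes}

    # Fuse the normalised graph score with the position feature.
    top = max(score.values())
    final = {w: alpha * score[w] / top + (1 - alpha) * position[w] for w in nodes}
    return sorted(final, key=lambda w: (-final[w], w))[:k]


if __name__ == "__main__":
    words = ["complex", "network", "keyword", "extraction", "network",
             "community", "partition", "network", "summary", "extraction"]
    print(rank_keywords(words, k=3))
```

A linear fusion is used only because it is the simplest way to combine two scores; the thesis may fuse the weights differently.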
Keywords/Search Tags: Keyword extraction, Abstract extraction, Complex network, Community partition, TextRank algorithm