Research And Application Of K-core-based Graph Decomposition TextRank Keyword Extraction Technology

Posted on: 2023-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
GTID: 2568306800466654
Subject: Software engineering
Abstract/Summary:
According to the 48th "Statistical Report on Internet Development in China", national online retail sales reached 6,113.3 billion RMB in the first half of 2021 alone, a year-on-year increase of 23.2%. These figures reflect not only the prosperity of the e-commerce industry but also the explosive growth of data in e-commerce-related fields, of which user review data on e-commerce websites is one example. To help e-commerce sellers who face large volumes of reviews save time in discovering keywords in product reviews, improve the competitiveness of their products among peers, and form a positive feedback loop, this thesis takes e-commerce review data as its research object, constructs a word graph, combines the attributes of nodes in the graph network, and builds on traditional keyword extraction algorithms to carry out experiments. The main research contents are as follows.

A TextRank keyword extraction algorithm based on K-Core is proposed. It targets two shortcomings of the traditional TextRank algorithm: the initial weight of every node is 1, and keyword extraction depends excessively on word frequency. The algorithm constructs a word graph from co-occurrence relationships, then uses the K-Core algorithm to prune the graph and decompose it into core and non-core subgraphs, from which important node attribute features are obtained. Next, each node is scored on 6 selected and defined features, the weight of each feature is determined by the G1 weighting method, and the initial weight of each node is calculated by combining them. In this way the attribute characteristics of nodes in the graph are taken into account, and each node receives a different initial weight, which balances the influence of edge weights (frequency) in the traditional TextRank algorithm.

Several sets of experiments were run on a self-collected data set, including controlled experiments with varying parameters and comparisons against the traditional TextRank and TF-IDF algorithms. The results show that when the window size is set to 4 and the number of extracted keywords is set to 22, the proposed algorithm outperforms both the traditional TextRank algorithm and the TF-IDF algorithm.

In addition, following the software development process, a keyword extraction system based on the proposed algorithm is designed and implemented for e-commerce sellers. Development follows a front-end/back-end separation: the front end uses the Vue framework, the back end uses the Django framework, and the two communicate through a RESTful interface. The Scrapy crawler framework is integrated into the data collection function, and the system has been tested. The system lets users choose how to upload e-commerce review data, and supports distributed storage, keyword extraction, and display of text data, which can effectively improve the reading efficiency of e-commerce sellers and reflects the application value of the proposed algorithm.
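
The pipeline described in the abstract (co-occurrence word graph, K-Core decomposition, G1 feature weighting, TextRank with non-uniform node weights) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation. The abstract does not name the 6 features, so the sketch uses only two hypothetical ones (core number and degree) with an assumed G1 importance ratio of 1.2. It also assumes the per-node weight enters TextRank as the teleport (bias) term, because a plain damped iteration converges to the same scores whatever the starting values are; the abstract does not say how the thesis uses the weights. All function names are illustrative.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=4):
    """Undirected word graph; edge weight = co-occurrence count.
    Assumption: a window of size w links a word to the next w-1 tokens."""
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word][other] += 1
                graph[other][word] += 1
    return {w: dict(nbrs) for w, nbrs in graph.items()}


def core_numbers(graph):
    """K-Core decomposition by repeatedly peeling a minimum-degree node."""
    degree = {n: len(nbrs) for n, nbrs in graph.items()}
    remaining = set(graph)
    core, k = {}, 0
    while remaining:
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in graph[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def g1_weights(ratios):
    """G1 (order relation analysis) weights for features sorted from most
    to least important; ratios[j] = w_j / w_(j+1) as judged by an expert."""
    total, prod = 0.0, 1.0
    for r in reversed(ratios):
        prod *= r
        total += prod
    weights = [1.0 / (1.0 + total)]  # weight of the least important feature
    for r in reversed(ratios):
        weights.append(weights[-1] * r)
    return weights[::-1]


def textrank(graph, bias, damping=0.85, iterations=50):
    """Weighted TextRank; bias[n] replaces the uniform teleport term
    (an assumption of this sketch)."""
    out_weight = {n: sum(nbrs.values()) for n, nbrs in graph.items()}
    score = dict(bias)
    for _ in range(iterations):
        score = {
            n: (1 - damping) * bias[n] + damping * sum(
                graph[m][n] / out_weight[m] * score[m] for m in graph[n]
            )
            for n in graph
        }
    return score


def extract_keywords(tokens, window=4, top_k=22, ratios=(1.2,)):
    """Rank words; window=4 and top_k=22 follow the settings in the abstract."""
    graph = build_cooccurrence_graph(tokens, window)
    if not graph:
        return []
    core = core_numbers(graph)
    max_core = max(core.values())
    max_deg = max(len(nbrs) for nbrs in graph.values())
    w_core, w_deg = g1_weights(list(ratios))  # hypothetical two-feature setup
    bias = {
        n: w_core * core[n] / max_core + w_deg * len(graph[n]) / max_deg
        for n in graph
    }
    scores = textrank(graph, bias)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

For example, calling extract_keywords on a tokenized, stop-word-filtered review returns up to top_k words ordered by score. In a fuller version, the remaining features and their G1 ratios would be supplied in place of the two placeholders used here.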
Keywords/Search Tags: keyword, TextRank, K-Core, G1 method, graph decomposition, e-commerce review data