
Keywords Extraction Based On News Text

Posted on: 2020-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Tao
Full Text: PDF
GTID: 2417330578953314
Subject: Applied Statistics
Abstract/Summary:
With the advent of the information age, text analysis has become a hot research topic. Text analysis extracts meaningful information from massive text data as text features; analyzing these features enables the application and study of text data. Natural language processing is an important means of achieving intelligent text analysis, and keyword extraction, a research hotspot within natural language processing, is the focus of this thesis. Chinese text analysis accomplishes tasks such as text classification, clustering, and information retrieval through text representation and text feature extraction. Quantifying the important features extracted from text is the foundational work of text analysis. Keywords are important features to be processed in text data and the basic unit for analyzing it. Automatic keyword extraction is therefore a key research object in natural language processing and is of considerable significance for text analysis.

This thesis uses automobile news text as the research data and extracts its keywords by combining the TextRank graph model with Word2Vec. The Chinese corpus is first segmented with a Chinese word segmentation tool. Keywords are then extracted by fusing the internal structural information of a single document with the word-vector relationships across the entire document collection. All words in the document collection are represented as dense vectors by the Word2Vec model, and the similarity between words is measured by the similarity of their vectors. On this basis, the TextRank algorithm is further improved: candidate keywords serve as vocabulary nodes, and the weights of the nodes are allocated non-uniformly according to the similarity between nodes and whether they are adjacent. The node weights are then updated iteratively, and the nodes are sorted by weight to obtain the required keywords.

The main tasks are as follows:
(1) Segment the text of the given automobile web news into words, yielding a text set composed of all the distinct words.
(2) Use the Word2Vec model to map the document set into a more abstract word-vector space and improve the original TextRank algorithm from the perspective of word semantics: obtain the lexical similarity matrix from Word2Vec training, improve the initial weights of the TextRank vocabulary nodes and the probability transition matrix, and then extract keywords.

Experiments show that the keyword extraction method combining Word2Vec with the TextRank algorithm performs better than the traditional TextRank algorithm in terms of accuracy, recall, and F1 value.
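The general idea of a similarity-weighted TextRank can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not give the exact weighting formula, so the edge weight used here (non-negative cosine similarity plus a fixed bonus for adjacent words) is an assumption, as are the uniform initial scores, the function names, the parameters `window`, `damping`, and `adj_bonus`, and the hand-written toy vectors that stand in for a trained Word2Vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weighted_textrank(words, vectors, window=2, damping=0.85, adj_bonus=1.0, iters=50):
    """Rank candidate words; returns (word, score) pairs, highest score first.

    Assumed edge weight: max(cosine, 0) + adj_bonus if the two words
    co-occur within `window` positions. This scheme is illustrative only.
    """
    vocab = sorted(set(words))

    # Record which distinct word pairs appear within `window` positions.
    adjacent = set()
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                adjacent.add((w, words[j]))
                adjacent.add((words[j], w))

    # Non-uniform edge weights from vector similarity and adjacency.
    weight = {a: {} for a in vocab}
    for a in vocab:
        for b in vocab:
            if a == b:
                continue
            w_ab = max(cosine(vectors[a], vectors[b]), 0.0)
            if (a, b) in adjacent:
                w_ab += adj_bonus
            if w_ab > 0:
                weight[a][b] = w_ab

    out_sum = {a: sum(weight[a].values()) for a in vocab}
    score = {w: 1.0 / len(vocab) for w in vocab}  # uniform start (assumption)

    # Iterate: each node passes its score to neighbours in proportion
    # to the normalized edge weights (the transition probabilities).
    for _ in range(iters):
        new_score = {}
        for b in vocab:
            incoming = sum(
                score[a] * weight[a][b] / out_sum[a] for a in vocab if b in weight[a]
            )
            new_score[b] = (1 - damping) / len(vocab) + damping * incoming
        score = new_score

    return sorted(score.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical segmented tokens and toy 3-dimensional "word vectors".
tokens = ["engine", "sedan", "engine", "price", "sedan", "dealer"]
toy_vectors = {
    "engine": [0.9, 0.1, 0.0],
    "sedan": [0.8, 0.3, 0.1],
    "price": [0.1, 0.9, 0.2],
    "dealer": [0.2, 0.8, 0.3],
}
ranked = weighted_textrank(tokens, toy_vectors)
```

In practice the vectors would come from a Word2Vec model trained on the segmented corpus, and the top-ranked entries of `ranked` would be taken as the extracted keywords.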
Keywords/Search Tags: extraction, TextRank algorithm, Word2Vec model