Keywords Extraction Based On News Text

Posted on:2020-07-02

Degree:Master

Type:Thesis

Country:China

Candidate:J Tao

Full Text:PDF

GTID:2417330578953314

Subject:Applied Statistics

Abstract/Summary:

With the advent of the information age, text analysis has become an active research topic. Text analysis extracts meaningful information from massive text data as text features, and analyzing those features is what makes the application and study of text data possible. Natural language processing is an important route to intelligent text analysis, and keyword extraction, a research hotspot within natural language processing, is the focus of this thesis. Chinese text analysis accomplishes tasks such as text classification, clustering, and information retrieval through text representation and text feature extraction. Extracting and quantifying the important features of a text is the basic work of text analysis. Keywords are among the most important of these features and are the basic unit for analyzing text data. Automatic keyword extraction is therefore a key research problem in natural language processing and is significant for text analysis.

This thesis uses automobile news text as the research data and extracts its keywords by combining the TextRank graph model with Word2Vec. A Chinese word segmentation tool is used to segment the Chinese corpus. Keywords are extracted by fusing the internal structure information of a single document with the word vector relationships of the entire document collection: the Word2Vec model represents every word in the document collection as a dense vector, and the similarity between two words is measured by the similarity of their vectors. Based on the Word2Vec model, the TextRank algorithm is further improved. Candidate keywords serve as the word nodes of the graph, and weight is allocated non-uniformly among word nodes according to the similarity between them and whether they are adjacent. The node weights are then updated iteratively, and the nodes are sorted by their final weights to obtain the required keywords.

The main tasks are as follows:
(1) Segment the text of the given automobile news web pages into words and obtain a text set composed of all the distinct words.
(2) Use the Word2Vec model to map the document set into a more abstract word vector space and improve the original TextRank algorithm from the perspective of word semantics: obtain the word similarity matrix from Word2Vec training, use it to improve the initial weights of the TextRank word nodes and the probability transition matrix, and then extract keywords.

Experiments show that keyword extraction based on the combination of Word2Vec and the TextRank algorithm performs better than the traditional TextRank algorithm, with improved precision, recall, and F1 score.
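The pipeline described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis's implementation: the abstract does not give the exact weighting formula, so the edge weight used here (non-negative cosine similarity plus a fixed bonus for adjacent words) is an assumption, as are all function names, the `window`, `adjacency_bonus`, and `damping` parameters, and the toy word vectors. In practice the vectors would come from a Word2Vec model trained on the segmented corpus.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_edge_weights(words, vectors, window=2, adjacency_bonus=1.0):
    """Build symmetric edge weights between candidate words.

    ASSUMPTION: weight = max(cosine, 0) + adjacency_bonus if the two words
    co-occur within `window` tokens (window=2 means immediate neighbours).
    The abstract only says weights depend on similarity and adjacency.
    """
    vocab = sorted(set(words))
    adjacent = set()
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                adjacent.add(frozenset((w, words[j])))
    weights = defaultdict(dict)
    for i, a in enumerate(vocab):
        for b in vocab[i + 1:]:
            weight = max(cosine(vectors[a], vectors[b]), 0.0)
            if frozenset((a, b)) in adjacent:
                weight += adjacency_bonus
            if weight > 0:
                weights[a][b] = weight
                weights[b][a] = weight
    return vocab, weights


def textrank(vocab, weights, damping=0.85, iterations=100, tol=1e-8):
    """Weighted TextRank: iterate node scores over the row-normalised
    transition probabilities derived from the edge weights."""
    if not vocab:
        return {}
    n = len(vocab)
    scores = {w: 1.0 / n for w in vocab}
    out_sum = {w: sum(weights[w].values()) for w in vocab}
    for _ in range(iterations):
        new_scores = {}
        for w in vocab:
            incoming = sum(
                weights[v][w] / out_sum[v] * scores[v] for v in weights[w]
            )
            new_scores[w] = (1 - damping) / n + damping * incoming
        delta = max(abs(new_scores[w] - scores[w]) for w in vocab)
        scores = new_scores
        if delta < tol:
            break
    return scores


def extract_keywords(words, vectors, top_k=3, **kwargs):
    """Return the top_k words ranked by weighted TextRank score."""
    vocab, weights = build_edge_weights(words, vectors, **kwargs)
    scores = textrank(vocab, weights)
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 of predicted vs. reference keywords."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

To use the sketch, pass a segmented token list and a mapping from each token to its vector, for example `extract_keywords(tokens, vectors, top_k=5)`, then compare the result with manually labeled keywords via `precision_recall_f1`. Setting every vector pair to zero similarity reduces the graph to plain co-occurrence TextRank, which gives a baseline for the comparison the abstract reports.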

Keywords/Search Tags:

keyword extraction, TextRank algorithm, Word2Vec model

Related items

1	Research On MOOC Recommendation Algorithm Based On Content And Word2vec
2	Research On Difficulty Prediction Model Of Examination Questions Based On Text Extraction Of Association Information
3	Research On Automatic Grading Algorithm For Essay Questions Based On Yolo V4+Word2Vec
4	The Research Of Recommended Algorithm In Scientific Research Achievement Management System
5	Design And Implementation Of Network Recruitment Data Visualization Analysis System
6	Research On Human Contour Acquisition And Feature Extraction Algorithm In Fall Detection
7	Research On Microblog Topic Sequential Feature Extraction Algorithm Based On LDA-WO Mixed Model
8	The Improved EM Algorithm Based Gaussian Mixture Model Parameters Estimation
9	Research And Application Of Chinese Multi-relation Extraction Based On Fusion Model
10	Research On The Identification Algorithm Of College Students' Mental Health Problems Based On Campus Big Data