| Keywords summarize the main idea of a document and let readers grasp its topic quickly. At present, mainstream keyword extraction methods take one of two approaches. Some consider only the statistical information of words in a document, which tends to give higher weights to high-frequency words. Others simply compute the distance between each word embedding and the document embedding: the closer a word is to the document, the more similar it is judged to be and the more likely it is to be selected as a keyword. However, the performance of these methods is easily limited by text length. They also do not exploit external knowledge about the domains, attributes, and associations of words, even though such knowledge could help build a more effective keyword extraction model for short texts. Because existing keyword extraction models generally do not use external knowledge, this paper makes the following contributions: (1) A knowledge-integrated keyword extraction method for abstracts is proposed. First, vector representations of the words in the knowledge bases HowNet and Cilin are obtained with a knowledge graph embedding model. Building on the traditional keyword extraction method TextRank, these knowledge-graph vectors are used to calculate the semantic similarity between words, construct a semantic word graph for each document, and generate a semantic matrix. The semantic matrix is combined with a co-occurrence matrix, built from the co-occurrence relations between words, to form a new probability transition matrix. Iterating with this transition matrix yields the final weight of each candidate word, and keywords are extracted according to these weights. (2) A knowledge-based keyword extraction method that combines the global similarity and local importance of words in short texts is proposed. First, word vectors are obtained from a pre-trained language model and fused with the words' knowledge graph representations to enrich their semantics. The semantic similarity between each word and the whole document gives the word's global similarity. Next, a graph is constructed for the document, with candidate words as vertices and edges weighted by the semantic similarity between words. Combining this graph with the position weights of words yields each word's local importance. Each candidate word is then scored on both its global similarity and its local importance, and the top-scoring candidates are taken as keywords. Experiments on a large collection of Chinese paper abstracts and on the SemEval-2010 dataset show that the first method improves performance by 6.3% over TextRank, and the second improves performance by 9.9% over TopicRank, the best-performing baseline. These results demonstrate the effectiveness of integrating external knowledge into short-text keyword extraction. |
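Method (1) replaces TextRank's plain co-occurrence graph with a transition matrix that also draws on knowledge-graph similarity. The following is a minimal sketch of that idea, assuming the knowledge-graph word vectors are already available as plain lists of floats. The convex mixing weight `alpha`, the damping factor `d = 0.85` (the usual PageRank default), the co-occurrence `window`, the clipping of negative similarities, and the function names are hypothetical choices for illustration. They are not the paper's reported settings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def row_normalise(m):
    """Scale each row to sum to 1; all-zero rows stay zero."""
    out = []
    for row in m:
        s = sum(row)
        out.append([x / s for x in row] if s > 0 else [0.0] * len(row))
    return out


def cooccurrence_matrix(tokens, vocab, window=2):
    """Symmetric counts of vocab words that appear within `window` tokens of each other."""
    idx = {w: i for i, w in enumerate(vocab)}
    m = [[0.0] * len(vocab) for _ in vocab]
    for i, w in enumerate(tokens):
        if w not in idx:
            continue
        for j in range(i + 1, min(i + window, len(tokens))):
            u = tokens[j]
            if u in idx and u != w:
                m[idx[w]][idx[u]] += 1.0
                m[idx[u]][idx[w]] += 1.0
    return m


def knowledge_textrank(tokens, kg_vectors, alpha=0.5, d=0.85, window=2,
                       iters=100, tol=1e-6):
    """TextRank iteration over a mix of semantic and co-occurrence transitions."""
    vocab = sorted({t for t in tokens if t in kg_vectors})
    n = len(vocab)
    if n == 0:
        return []
    # Semantic matrix: non-negative cosine similarity of knowledge-graph vectors.
    sem = [[max(cosine(kg_vectors[a], kg_vectors[b]), 0.0) if a != b else 0.0
            for b in vocab] for a in vocab]
    co = cooccurrence_matrix(tokens, vocab, window)
    s_norm, c_norm = row_normalise(sem), row_normalise(co)
    # Hypothetical fusion: convex combination of the two row-normalised matrices.
    p = [[alpha * s_norm[i][j] + (1 - alpha) * c_norm[i][j] for j in range(n)]
         for i in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Standard TextRank/PageRank update with damping factor d.
        new = [(1 - d) / n + d * sum(scores[i] * p[i][j] for i in range(n))
               for j in range(n)]
        delta = max(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return sorted(zip(vocab, scores), key=lambda pair: -pair[1])


tokens = ["knowledge", "graph", "embedding", "keyword", "extraction",
          "knowledge", "graph"]
kg_vectors = {  # toy 2-d vectors standing in for real knowledge-graph embeddings
    "knowledge": [1.0, 0.2], "graph": [0.9, 0.3], "embedding": [0.8, 0.1],
    "keyword": [0.1, 1.0], "extraction": [0.2, 0.9],
}
ranked = knowledge_textrank(tokens, kg_vectors)
print(ranked[:2])
```

Row-normalising each matrix before mixing keeps the combined matrix row-stochastic whenever every word has at least one neighbour in both graphs. In that case the scores stay a probability distribution across iterations.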
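The global-similarity step of method (2) fuses each word's pre-trained language-model vector with its knowledge-graph vector and then compares the word with the whole document. Here is a minimal sketch under stated assumptions. Fusion is done by concatenating the two L2-normalised vectors, and the document vector is taken as the centroid of the fused candidate vectors. Both choices, and the helper names `fuse` and `global_similarity`, are hypothetical. The paper may fuse the representations or embed the document differently.

```python
import math


def l2_normalise(v):
    """Return v scaled to unit length (unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fuse(lm_vec, kg_vec):
    """Hypothetical fusion: concatenate the normalised LM and KG vectors."""
    return l2_normalise(lm_vec) + l2_normalise(kg_vec)


def global_similarity(candidates, lm_vectors, kg_vectors):
    """Cosine similarity between each fused word vector and the document vector."""
    if not candidates:
        return {}
    fused = {w: fuse(lm_vectors[w], kg_vectors[w]) for w in candidates}
    dim = len(next(iter(fused.values())))
    # Assumption: the document vector is the centroid of the fused candidate vectors.
    doc = [sum(vec[k] for vec in fused.values()) / len(fused) for k in range(dim)]
    return {w: cosine(vec, doc) for w, vec in fused.items()}


lm_vectors = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0]}
kg_vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
gs = global_similarity(["a", "b", "c"], lm_vectors, kg_vectors)
print(gs)
```

In this toy input, "a" and "b" point in similar directions in both spaces, so they sit closer to the centroid than the outlier "c" and receive higher global similarity.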
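The local-importance step of method (2) ranks candidate words on a similarity-weighted graph and takes word positions into account. The paper does not say here how the position weights enter the ranking. The sketch below therefore swaps in a PositionRank-style scheme: a PageRank whose teleport vector is proportional to the sum of 1/position over each word's occurrences. It then combines the result with global similarity through a linear weight `beta`. The fully connected graph, `d = 0.85`, `beta = 0.5`, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def position_weights(tokens, candidates):
    """PositionRank-style bias: sum of 1/position (1-based) per word, normalised."""
    w = {c: 0.0 for c in candidates}
    for pos, tok in enumerate(tokens, start=1):
        if tok in w:
            w[tok] += 1.0 / pos
    total = sum(w.values())
    return {c: x / total for c, x in w.items()}


def local_importance(tokens, vectors, d=0.85, iters=100, tol=1e-6):
    """Position-biased PageRank on a graph whose edge weights are word similarities."""
    cands = sorted({t for t in tokens if t in vectors})
    if not cands:
        return {}
    # Fully connected graph; negative similarities are clipped to 0.
    w = [[max(cosine(vectors[a], vectors[b]), 0.0) if a != b else 0.0
          for b in cands] for a in cands]
    out_sum = [sum(row) for row in w]
    bias = position_weights(tokens, cands)
    scores = {c: 1.0 / len(cands) for c in cands}
    for _ in range(iters):
        new = {}
        for j, cj in enumerate(cands):
            incoming = sum(scores[ci] * w[i][j] / out_sum[i]
                           for i, ci in enumerate(cands) if out_sum[i] > 0)
            new[cj] = (1 - d) * bias[cj] + d * incoming
        delta = max(abs(new[c] - scores[c]) for c in cands)
        scores = new
        if delta < tol:
            break
    return scores


def final_scores(global_sim, local, beta=0.5):
    """Hypothetical linear combination; local scores are rescaled so the maximum is 1."""
    if not local:
        return {}
    top = max(local.values())
    return {w: beta * global_sim.get(w, 0.0) + (1 - beta) * local[w] / top
            for w in local}


tokens = ["alpha", "beta", "gamma"]
vectors = {w: [1.0, 1.0] for w in tokens}  # identical vectors isolate the position effect
local = local_importance(tokens, vectors)
final = final_scores({w: 0.5 for w in tokens}, local)
print(sorted(final, key=final.get, reverse=True))
```

With identical vectors the similarity graph is uniform, so only the position bias separates the words and earlier words rank higher. With real embeddings, words that are strongly similar to many other candidates would gain importance as well.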