With the rapid development of Natural Language Processing (NLP), the number of related academic papers is growing rapidly. Researchers usually learn about the field by searching papers in literature databases and then manually sorting out information such as the knowledge context, research trends, and the interrelationships between studies. In this process the topics are mixed together, filtering and organizing them takes a great deal of time, and the complex relationships revealed by the results cannot be preserved by conventional storage technologies. Knowledge graphs offer powerful semantic processing and interconnection capabilities: through information extraction, valuable entities and the relationships between them can be extracted from unstructured or semi-structured text, and the result can be stored and displayed as a graph network that makes the relationships between entities explicit. Combining knowledge graphs with other tasks to meet the needs of a specific scenario has therefore become a popular approach. In view of the problems above, this paper constructs a knowledge graph for the field of natural language processing based on Chinese academic papers on CNKI. In addition, in-depth research was conducted on two problems that arise during knowledge graph construction, the low recognition accuracy of key terms in papers and the poor performance of paper classification, and new entity recognition and relation extraction algorithms, together with an ontology definition and a storage scheme for NLP, were proposed and implemented. The specific work is as follows:

1. Definition of the schema-layer conceptual model for the NLP domain and construction of the dataset. Based on ontology theory and combined with the characteristics of natural language processing, this paper defines the conceptual model of the knowledge graph pattern layer in the field of
natural language processing for the first time. In addition, a crawler system based on the Scrapy library was designed, with which 18,345 journal papers were crawled from CNKI, and an NLP paper dataset annotated according to the conceptual model was constructed.

2. To address the low accuracy of entity recognition for key terms in academic papers during knowledge graph construction, this paper proposes a new word discovery model, I-BERT-BiLSTM-CRF. Traditional new word recognition methods based on statistical information perform poorly on low-frequency new words, so this paper strengthens the recognition of low-frequency new words by adding a deep learning component; in addition, the BERT model is introduced for word embedding to enhance the expressive power of the text representation and improve the accuracy of the new word discovery algorithm.

3. Traditional paper classification relies on a single feature, which leads to poor classification performance; this paper therefore proposes a feature fusion text classification algorithm. Most traditional paper classification is based on the single feature of the paper abstract, ignoring the important information contained in other fields such as titles and keywords. The collected paper information is divided into two categories of features, natural language features and label features, and, taking into account the complementarity and coordination between different features, three kinds of features are combined with seven classification algorithms to form twenty-one classification models. These models serve as comparison baselines to verify the effectiveness of the feature fusion text classification algorithm proposed in this paper and to complete the classification of the papers' research tasks.

4. Construction of the knowledge graph and application system for natural language processing. Based on the I-BERT-BiLSTM-CRF new word discovery algorithm and the feature fusion
text classification algorithm proposed in this paper, triple extraction for the field of natural language processing is completed, yielding 18,729 entities and 245,974 triples. Cypher statements are used to store the extracted triples in the Neo4j graph database, and an application system with four modules, paper knowledge retrieval, spatio-temporal knowledge retrieval, key term extraction, and paper classification, is designed and implemented using front-end and back-end programming technologies, verifying the validity and practical value of the proposed algorithms. The experiments use the dataset constructed in work 1; based on the schema-layer conceptual model defined in work 1, the knowledge graph triples are extracted with the algorithms proposed in works 2 and 3. Several groups of comparative experiments are designed, and the effectiveness of the proposed algorithmic optimizations is verified by comparing evaluation metrics such as precision, recall, and F1 score. Finally, the knowledge graph and application system for natural language processing are constructed in work 4.
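As background for work 2: the statistics-based new word discovery baselines that the thesis contrasts with I-BERT-BiLSTM-CRF typically score candidate strings by internal cohesion and boundary freedom. The following is a minimal illustrative sketch of such a baseline, not the thesis's model; the function name, thresholds, and restriction to character bigrams are all assumptions made for the example.

```python
import math
from collections import Counter

def discover_words(text, min_pmi=1.0, min_entropy=0.5, min_count=2):
    """Score character bigrams by pointwise mutual information (PMI,
    internal cohesion) and left/right neighbor entropy (boundary
    freedom); candidates high on both behave like independent words."""
    chars = Counter(text)
    total = len(text)
    bigrams = Counter(text[i:i + 2] for i in range(total - 1))

    def entropy(counter):
        s = sum(counter.values())
        return -sum(c / s * math.log(c / s) for c in counter.values()) if s else 0.0

    words = []
    for bg, n in bigrams.items():
        if n < min_count:          # low-frequency candidates are dropped --
            continue               # exactly the weakness the thesis targets
        # PMI: log p(xy) / (p(x) p(y)), high when the pair co-occurs tightly
        pmi = math.log((n / total) / ((chars[bg[0]] / total) * (chars[bg[1]] / total)))
        # Neighbor entropy: diversity of characters adjacent to the candidate
        left = Counter(text[i - 1] for i in range(1, total - 1) if text[i:i + 2] == bg)
        right = Counter(text[i + 2] for i in range(total - 2) if text[i:i + 2] == bg)
        if pmi >= min_pmi and min(entropy(left), entropy(right)) >= min_entropy:
            words.append(bg)
    return words
```

Because the candidate must clear a minimum-count threshold, rare new terms never surface, which is why work 2 supplements statistics with a learned sequence-labeling model.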
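The fusion of natural language features and label features in work 3 can be illustrated, under the assumption that fusion is done by simple vector concatenation (the abstract does not specify the exact mechanism), with a minimal sketch; the field names `title`, `abstract`, `keywords`, and `tags` are hypothetical:

```python
from collections import Counter

def featurize(paper, vocab, label_set):
    """Fuse natural-language features (bag-of-words over title,
    abstract, and keywords) with label features (one-hot tags)
    into a single vector by concatenation."""
    words = Counter((paper["title"] + " " + paper["abstract"]).split()
                    + paper["keywords"])
    nl = [words[w] for w in vocab]                                # natural-language features
    labels = [1 if t in paper["tags"] else 0 for t in label_set]  # label features
    return nl + labels
```

The fused vector can then be fed to any of the downstream classifiers, which is what makes the three feature kinds and seven algorithms freely combinable into twenty-one models.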
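For work 4, the triples are written to Neo4j via Cypher. A minimal sketch of rendering one (head, relation, tail) triple as a Cypher MERGE statement follows; the `Entity` label and `name` property are illustrative assumptions, not the thesis's actual schema.

```python
def triple_to_cypher(head, relation, tail):
    """Render one (head, relation, tail) triple as a Cypher MERGE
    statement. MERGE (rather than CREATE) keeps nodes and edges
    unique when the same entity appears in many triples."""
    def esc(s):
        # escape characters that would break a Cypher string literal
        return s.replace("\\", "\\\\").replace("'", "\\'")
    return (
        f"MERGE (h:Entity {{name: '{esc(head)}'}}) "
        f"MERGE (t:Entity {{name: '{esc(tail)}'}}) "
        f"MERGE (h)-[:`{relation}`]->(t)"
    )
```

With the official neo4j Python driver, each statement could be executed as `session.run(triple_to_cypher(...))`; in production, parameterized queries (`$name`) are safer than string interpolation.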