| With the rapid development of information technology, large amounts of unstructured text data have appeared on the Internet. Extracting structured knowledge from such massive, unnormalized text has become a hot research topic in academia both in China and abroad. Research on knowledge extraction aims to explore theoretical methods for extracting structured knowledge from unstructured text and to design new, efficient knowledge extraction algorithms. Knowledge is usually represented and stored as triples composed of entities and the relationships between them, which facilitates the construction, retrieval, and use of knowledge graphs. Knowledge extraction is therefore an important task in natural language processing. Existing knowledge extraction methods fall mainly into supervised and unsupervised methods. By task, supervised methods can be divided into named entity recognition, event extraction, and entity relation extraction. Unsupervised methods have no clear division by task; instead, existing methods are mainly divided by technique into rule-based, distant-supervision-based, and pre-trained-language-model-based methods. Building on a study of these existing methods, this thesis proposes new knowledge extraction methods. Its main contributions are as follows. (1) To address entity redundancy, entity overlap, and error accumulation in question-and-answer-based entity relation extraction, an entity relation extraction method that first discriminates patterns is proposed. The method first identifies all patterns contained in a sentence and then uses these patterns to construct questions that guide the subsequent extraction of head and tail entities, which addresses the entity redundancy problem. In the entity extraction step, a substring-based entity recognition method that incorporates rich attention features is introduced, both to improve entity recognition accuracy and to address entity overlap. Once extraction is complete, an error filtering module filters the extracted entity-relationship triples, alleviating error accumulation. (2) To address the small amount and low quality of knowledge produced by knowledge extraction methods based on pre-trained language models, a context-based generative knowledge extraction method is proposed. The method first fine-tunes the model on knowledge so that its output distribution becomes similar to that of the knowledge base. It then generates new knowledge in a context-based manner, using an improved multivariate beam search during decoding, which increases both the amount and the quality of the extracted knowledge. Finally, a knowledge filtering model filters the extracted knowledge to further improve its quality. (3) Building on this research, a knowledge extraction system integrating the new methods is proposed and implemented. Users can extract entity-relationship triples from any given text and can also discover knowledge related to a specified entity. The system also provides administrators with model training and data management functions, which make it easier to migrate the system to specific data and application scenarios and make the system more flexible. |
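The abstract says substring-based entity recognition helps with entity overlap. The Python sketch below shows why, under stated assumptions. Every substring (span) up to a maximum length is enumerated and scored independently, so both a nested span and its enclosing span can be kept. A per-token BIO tagger cannot do this. `enumerate_spans`, `recognize_entities`, `max_len`, `threshold` and `toy_scorer` are hypothetical names. The dictionary-based scorer is only a stand-in for the thesis's attention-based span classifier, which is not specified here.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def enumerate_spans(tokens: List[str], max_len: int) -> List[Span]:
    """Return every substring (span) of at most max_len tokens."""
    spans: List[Span] = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            spans.append((start, end))
    return spans


def recognize_entities(
    tokens: List[str],
    scorer: Callable[[List[str]], float],
    max_len: int = 4,
    threshold: float = 0.5,
) -> List[Span]:
    """Keep every span whose score passes the threshold.

    Each span is classified independently, so overlapping or nested
    spans can all be kept at the same time.
    """
    return [
        (s, e)
        for (s, e) in enumerate_spans(tokens, max_len)
        if scorer(tokens[s:e]) >= threshold
    ]


# Hypothetical stand-in for a trained span classifier.
KNOWN_ENTITIES = {("New", "York"), ("New", "York", "University")}


def toy_scorer(span_tokens: List[str]) -> float:
    return 1.0 if tuple(span_tokens) in KNOWN_ENTITIES else 0.0


tokens = ["She", "studied", "at", "New", "York", "University"]
entities = recognize_entities(tokens, toy_scorer)
print([" ".join(tokens[s:e]) for s, e in entities])
# ['New York', 'New York University']
```

The nested entity "New York" and the enclosing "New York University" are both recognized. The cost is quadratic in sentence length, which is why a maximum span length is usually imposed.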
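The control flow of contribution (1) can be sketched in Python under several assumptions. Patterns are detected first, and then one pattern-specific question guides head-entity extraction. A second question, conditioned on each head, extracts tails. A filter finally removes likely errors. The question templates, the `detect_patterns` and `answer` callables, and the confidence-threshold filter are all hypothetical. The thesis's actual error filtering module is a learned component that is not described here, and a product of confidences is used only as an illustrative score.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Answer = Tuple[str, float]  # (extracted span, confidence)


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str
    score: float


def extract_triples(
    sentence: str,
    detect_patterns: Callable[[str], Iterable[str]],
    answer: Callable[[str, str], List[Answer]],
    threshold: float = 0.5,
) -> List[Triple]:
    candidates: List[Triple] = []
    # Step 1: identify the patterns present first, so no questions are
    # asked about relations that do not occur (avoids redundant entities).
    for relation in detect_patterns(sentence):
        # Step 2: a pattern-specific question guides head-entity extraction.
        head_q = f"Which entity is the head of '{relation}'?"
        for head, h_conf in answer(sentence, head_q):
            # Step 3: the tail question is conditioned on the extracted head.
            tail_q = f"Which entity does '{head}' have the relation '{relation}' with?"
            for tail, t_conf in answer(sentence, tail_q):
                candidates.append(Triple(head, relation, tail, h_conf * t_conf))
    # Step 4: error filtering, assumed here to be a simple confidence threshold.
    return [t for t in candidates if t.score >= threshold]


# Hypothetical stand-ins for the trained pattern detector and QA reader.
def toy_detect_patterns(sentence: str) -> List[str]:
    return ["founded"] if "founded" in sentence else []


TOY_ANSWERS = {
    "Which entity is the head of 'founded'?": [("Steve Jobs", 0.9)],
    "Which entity does 'Steve Jobs' have the relation 'founded' with?": [
        ("Apple", 0.95),
        ("Cupertino", 0.3),
    ],
}


def toy_answer(sentence: str, question: str) -> List[Answer]:
    return TOY_ANSWERS.get(question, [])


result = extract_triples("Steve Jobs founded Apple in Cupertino.", toy_detect_patterns, toy_answer)
print([(t.head, t.relation, t.tail) for t in result])
# [('Steve Jobs', 'founded', 'Apple')]
```

The low-confidence candidate ("Steve Jobs", "founded", "Cupertino") is dropped in the final step. This illustrates how filtering after extraction can stop an early mistake from surviving into the output triples.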