Research On Chinese Named Entity Recognition And Relation Extraction For Wheat Pest And Disease

Posted on: 2024-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: D M Zhang
GTID: 2543307088992149
Subject: Computer Science and Technology
Abstract/Summary:
Wheat is one of the most important food crops in the world, and China attaches great importance to the development of its wheat cultivation industry. Wheat cultivation faces many challenges, and pests and diseases are among the most important factors affecting wheat yield and quality, so effective control measures facilitate the development of the industry. Currently, most wheat pest and disease data is stored as unstructured text in web pages, books and documents, with differing methods of representation, organization, management and storage, which leaves the data in this field fragmented and disordered. Moreover, unstructured text cannot intuitively depict the relations between important pieces of information. This hinders the efficient use, sharing and dissemination of knowledge, and makes it difficult to meet the deeper, more fine-grained information needs of wheat production managers regarding pests and diseases.

Knowledge extraction is an effective method for managing unstructured data. Its purpose is to extract relevant entity pairs and their semantic relations from unstructured data, and it plays an important role in knowledge graphs, knowledge retrieval and knowledge base Q&A. To improve the organization of wheat pest and disease knowledge, this paper studies named entity recognition and relation extraction on unstructured wheat pest and disease text, proposes remote supervised and supervised knowledge extraction models, and implements a visual display of wheat pest and disease knowledge. The main work accomplished in this paper is as follows:

(1) Chinese entity relation extraction for wheat pest and disease based on remote supervision. To address the lack of datasets in the field, a Chinese entity relation extraction dataset for wheat pest and disease, Wheat CRE, was constructed following the idea of remote supervision: relevant triplets from two external knowledge bases, CN-DBpedia and Ownthink, were matched and aligned with unstructured wheat pest and disease text. The dataset contains six relation categories and was combined with manual correction. The text features of the dataset were then analyzed. To reduce the impact of noise words in a sentence on the model, a single-label, sentence-level entity relation extraction model, BE-CRE, which combines BERT and entity representation, is proposed. The model obtains dynamic character representations from BERT. To enrich the text features, it further integrates target entity representations, making full use of the specific meaning implied by Chinese named entities for wheat pest and disease to give the model additional information and improve its relation extraction performance. Compared with current mainstream models, BE-CRE performs best on Wheat CRE, with an F1-M value of 89.29%. BE-CRE also outperforms the comparison models on the character relation extraction dataset Character CRE, with an F1-M value of 78.31%, which indicates that the model has a degree of generalization ability.

(2) Supervised Chinese named entity recognition and relation extraction for wheat pest and disease. Remote supervision can effectively mitigate dataset scarcity, but it depends on the support of external knowledge bases and mines knowledge insufficiently. To explore more deeply the knowledge contained in unstructured wheat pest and disease text, and building on the previous research, the text knowledge was first analyzed in depth under the guidance of domain experts and with reference to existing research in the agricultural field. 21 entity and 18 relation categories were defined at fine granularity and manually labeled, and the named entity recognition dataset Wpd CNER and the relation extraction dataset Wpd CRE were constructed. To address the uneven distribution of entity categories, fuzzy entity boundaries and nested entities caused by the increase in entity categories, a named entity recognition model, WPD-RA, which combines ALBERT-BiLSTM-CRF with rules, is proposed. The model achieves an F1 value of 95.29% on Wpd CNER, which is superior to current mainstream models. To address the overlap of entity relations caused by the further diversification of relation categories, a relation extraction model, WPD-BBAE, which combines BERT-BiLSTM-Attention and entity representation, is proposed. It uses attention to dynamically assign weights and enhance the importance of keywords in sentences. This model performs best on Wpd CRE, with an F1-M value of 90.44%.

(3) Visual display of wheat pest and disease knowledge. Using the knowledge extraction methods established in the previous work, named entity recognition and relation extraction were first performed on the unstructured training corpus, yielding structured entity relation triplets for wheat pests and diseases. After triplet de-duplication, a total of 2684 entities and 5668 relations were obtained, achieving fine-grained mining and structured integration of wheat pest and disease knowledge. The processed triplet data was then stored in the graph database Neo4j, achieving a visual display of wheat pest and disease knowledge and providing a way to obtain and share it.

This paper proposes remote supervised and supervised named entity recognition and relation extraction models for wheat pests and diseases based on unstructured text data. By structurally integrating scattered and unordered domain data, they provide technical support for the fine-grained structured organization of knowledge in the field, and can play a role in building domain knowledge graphs, intelligent knowledge retrieval, intelligent question answering and other in-depth uses of knowledge.
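
The remote-supervision step in (1), where knowledge base triplets are matched and aligned with unstructured text, can be illustrated with a minimal sketch. This is not the thesis's implementation: the names `Triple`, `Example` and `align_triples`, the exact-substring matching rule, and the sample triplets and sentences are all assumptions made up for illustration (in English rather than Chinese), and real CN-DBpedia or Ownthink data is not used. A sentence is labeled with a triplet's relation whenever it mentions both the head and the tail entity, and duplicate examples are dropped.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """A knowledge base fact (hypothetical structure)."""
    head: str
    relation: str
    tail: str


class Example(NamedTuple):
    """A sentence labeled with the relation of a matching triple."""
    sentence: str
    head: str
    tail: str
    relation: str


def align_triples(triples: List[Triple], sentences: List[str]) -> List[Example]:
    """Label each sentence that mentions both entities of a triple.

    Matching is a plain case-sensitive substring test, which is an
    assumption of this sketch; duplicates are removed, order is kept.
    """
    seen = set()
    examples = []
    for sentence in sentences:
        for triple in triples:
            if triple.head in sentence and triple.tail in sentence:
                example = Example(sentence, triple.head, triple.tail, triple.relation)
                if example not in seen:
                    seen.add(example)
                    examples.append(example)
    return examples


# Illustrative data only; not taken from the thesis or any knowledge base.
KB_TRIPLES = [
    Triple("wheat stripe rust", "pathogen", "Puccinia striiformis"),
    Triple("wheat aphid", "control agent", "imidacloprid"),
]
SENTENCES = [
    "The pathogen of wheat stripe rust is Puccinia striiformis.",
    "Wheat is one of the most important food crops.",
    "Spraying imidacloprid can control wheat aphid.",
    "The pathogen of wheat stripe rust is Puccinia striiformis.",  # duplicate
]

examples = align_triples(KB_TRIPLES, SENTENCES)
for ex in examples:
    print(f"({ex.head}, {ex.relation}, {ex.tail}) <- {ex.sentence}")
```

Labels produced this way are noisy, because a sentence can mention both entities without expressing the relation, which is consistent with the abstract's note that the dataset was combined with manual correction.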
Keywords/Search Tags: wheat pest and disease, entity recognition, entity relation extraction, knowledge graph, knowledge visualisation