| The apple industry plays an integral part in China’s agricultural development, and insect pests and diseases have significantly constrained its growth. However, most information on apple pest and disease control is held in large volumes of unstructured text, making it difficult for apple industry practitioners to obtain high-quality control solutions. Extracting structured information from professional text materials is therefore of practical importance in helping practitioners solve pest control problems efficiently. Named entity recognition is a fundamental task in information extraction and is the basis for building knowledge graphs and intelligent downstream applications such as question-answering systems. This paper therefore studies named entity recognition for apple pests and diseases, a domain characterized by insufficient data, complex text entities, and diverse entity categories. The main work is as follows: (1) Construction of a named entity recognition dataset for apple pests and diseases. To address the lack of research and data support for named entity recognition in this domain, eight authoritative books on apple pests and diseases were used as data sources. With reference to existing research on pests and diseases in agriculture and under the guidance of university experts in plant protection, 20 fine-grained entity categories were defined and the dataset was annotated, providing a data basis for subsequent model applications. (2) An improved Transformer-based named entity recognition model for apple pests and diseases is proposed. According to the characteristics of apple pest domain data, the position information encoding
method and the attention computation of the Transformer model were modified to address the loss of direction and distance information in the Transformer’s attention computation. Experiments show that the precision, recall, and F1 values of the modified Transformer model on the apple pest and disease dataset are 91.47%, 91.99%, and 91.73%, respectively. Compared with the original Transformer model, the F1 value is improved by 3.75%, and the model also improves to some extent on mainstream state-of-the-art models, demonstrating the effectiveness of the improved Transformer model for apple pests and diseases. The model also outperformed both the baseline and comparison models on the public Weibo and Resume datasets, indicating that it has some generalisability. (3) A named entity recognition algorithm for apple pests and diseases incorporating the BERT pre-trained model. Because the embedding vectors of traditional word vector models are fixed while apple pest texts are complex and specialized, the BERT pre-trained model is incorporated to obtain high-quality embedding vectors containing rich semantic information. Experiments show that the precision, recall, and F1 values on the apple pest and disease dataset after incorporating the BERT model are 91.92%, 93.95%, and 92.92%, respectively, demonstrating the effectiveness of the BERT model. The extracted apple pest and disease entities are then stored in a Neo4j database to construct entity node mappings. Finally, named entity recognition for apple pests and diseases is completed so that the results can be used directly by downstream tasks. |
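
The metric triples reported above can be sanity-checked with a short sketch. F1 is conventionally the harmonic mean of precision and recall, so the snippet below recomputes it from the two reported pairs. It assumes that the first value in each reported triple is precision; the function name `f1_score` and the `reported` dictionary are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract, in percent:
# (assumed precision, recall, reported F1).
reported = {
    "improved Transformer": (91.47, 91.99, 91.73),
    "improved Transformer + BERT": (91.92, 93.95, 92.92),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.2f}, reported F1 = {f1:.2f}")
```

Under that assumption, both recomputed values agree with the reported F1 figures to two decimal places.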