| The COP15 Conference adopted the Kunming Declaration, which highlights the importance of developing biodiversity education tools. This paper takes China's endangered wild mammals, the group most closely connected with human life, as its knowledge scope and explores a methodology for developing biodiversity education tools. Named entity recognition (NER) is the key step in building a knowledge graph of China's endangered wild mammals, and the quality of that graph depends directly on recognition accuracy. Knowledge about these species is scattered across multiple sources of unstructured, coarse-grained text, so these textual resources urgently need to be integrated and shared in a reasonable way. In addition, NER accuracy is easily affected by the size of the training data. To reduce the reliance of NER models on large data volumes, this study makes the following contributions: (1) A corpus of China's endangered wild mammals was constructed. The diverse and fragmented domain text was preprocessed, and the cleaned data were annotated with appropriate annotation tools. Based on the characteristics of the text, a word-level BIOES annotation strategy was adopted, and three rounds of manual correction yielded an annotation accuracy of 95%, indicating that the constructed corpus is usable. (2) BERT-based fusion models for NER were investigated. The suitability of BERT as a baseline model was verified on the constructed corpus, and ablation experiments with BERT as the baseline under identical experimental parameters showed that adding a BiLSTM layer improves NER performance on this corpus. (3) An NER method incorporating Template Data Augmentation (TDA) is proposed. To increase confidence in the results, experiments were also conducted on the "CHIP2020", "msra" and "cner" datasets. The three fusion models discussed above were used as baseline models under identical experimental parameters, and the effectiveness of TDA was analysed through comparative experiments. On the "CHIP2020" and "msra" datasets, TDA was most effective with BERT-CRF as the baseline, raising the F1 score by 0.82% and 2.04%, respectively. On the "cner" dataset and the corpus constructed in this study, it was most effective with BERT-BiLSTM-CRF as the baseline, raising the F1 score by 2.09% and 7.9%, respectively. These results demonstrate the effectiveness of TDA in the downstream NER task and show that it reduces the model's dependence on large training sets. The TDA-BERT-BiLSTM-CRF model, which performed best on the constructed corpus, was then applied to NER on the Chinese endangered wild mammal domain corpus. (4) The knowledge graph of the Chinese endangered wild mammal domain was visualised. Relations were extracted with a pattern-matching approach, and the results were stored in the graph database Neo4j to visualise the knowledge graph, laying the foundation for developing biodiversity education tools. |
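
To make the BIOES annotation strategy in (1) concrete, the following Python sketch converts annotated entity spans into BIOES tags: B, I and E mark the beginning, inside and end of a multi-token entity, S marks a single-token entity, and O marks tokens outside any entity. The function name `spans_to_bioes`, the `(start, end_exclusive, type)` span format, the entity types `SPECIES` and `HABITAT`, and the example tokenisation are illustrative assumptions, not the annotation schema of the constructed corpus.

```python
from typing import List, Tuple

# Hypothetical span format: (start_index, end_index_exclusive, entity_type)
Span = Tuple[int, int, str]


def spans_to_bioes(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping entity spans into a BIOES tag sequence."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        if not (0 <= start < end <= num_tokens):
            raise ValueError(f"invalid span {(start, end, label)}")
        if any(t != "O" for t in tags[start:end]):
            raise ValueError("overlapping spans are not supported")
        if end - start == 1:
            # Single-token entity
            tags[start] = f"S-{label}"
        else:
            # Multi-token entity: B at the start, I in between, E at the end
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


if __name__ == "__main__":
    # Hypothetical word-segmented sentence
    tokens = ["大熊猫", "主要", "分布", "在", "四川", "卧龙", "自然保护区"]
    spans = [(0, 1, "SPECIES"), (4, 7, "HABITAT")]
    for tok, tag in zip(tokens, spans_to_bioes(len(tokens), spans)):
        print(tok, tag)
```

Here "大熊猫" receives `S-SPECIES`, the three-token location receives `B-HABITAT`, `I-HABITAT`, `E-HABITAT`, and the remaining tokens receive `O`.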
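
The abstract does not spell out how TDA generates new samples. The sketch below assumes one common template-based scheme for NER: sentence templates contain typed slots, each slot is filled with a randomly chosen entity of the same type from a lexicon, and BIOES labels are derived automatically from the slot types, so new labelled sentences are produced without further manual annotation. The template format, the lexicon, the entity types and the function names `bioes`, `fill_template` and `augment` are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical representation: a template is a token list in which a token
# of the form "<TYPE>" is a slot to be filled with an entity of that type.
Template = List[str]
# Hypothetical lexicon: entity type -> list of entities, each a token list.
Lexicon = Dict[str, List[List[str]]]


def bioes(label: str, length: int) -> List[str]:
    """BIOES tags for one entity spanning `length` tokens."""
    if length < 1:
        raise ValueError("entities must contain at least one token")
    if length == 1:
        return [f"S-{label}"]
    return [f"B-{label}"] + [f"I-{label}"] * (length - 2) + [f"E-{label}"]


def fill_template(template: Template, lexicon: Lexicon,
                  rng: random.Random) -> Tuple[List[str], List[str]]:
    """Fill every slot with a random entity of the slot's type.

    Labels come for free because each slot's entity type is known.
    """
    tokens: List[str] = []
    tags: List[str] = []
    for tok in template:
        if tok.startswith("<") and tok.endswith(">"):
            label = tok[1:-1]
            entity = rng.choice(lexicon[label])
            tokens.extend(entity)
            tags.extend(bioes(label, len(entity)))
        else:
            # Ordinary template text is outside any entity
            tokens.append(tok)
            tags.append("O")
    return tokens, tags


def augment(templates: List[Template], lexicon: Lexicon,
            n_samples: int, seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Generate n_samples labelled sentences from randomly chosen templates."""
    rng = random.Random(seed)
    return [fill_template(rng.choice(templates), lexicon, rng)
            for _ in range(n_samples)]


if __name__ == "__main__":
    templates = [["<SPECIES>", "分布", "于", "<LOCATION>"],
                 ["<LOCATION>", "是", "<SPECIES>", "的", "栖息地"]]
    lexicon = {"SPECIES": [["大熊猫"], ["川金丝猴"]],
               "LOCATION": [["四川", "卧龙"], ["秦岭"]]}
    for tokens, tags in augment(templates, lexicon, n_samples=3):
        print(list(zip(tokens, tags)))
```

Because the labels are generated rather than annotated, the augmented sentences can be mixed into the training data of any of the baseline models; the fixed `seed` keeps the generated set reproducible.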
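
The F1 gains reported in (3) are presumably entity-level scores. As a hedged illustration of how such a score is commonly computed, the following sketch decodes BIOES sequences into typed spans and counts a predicted entity as correct only when its boundaries and type both match the gold annotation exactly, the usual CoNLL-style convention. Whether the thesis used exactly this convention is an assumption, and the function names `bioes_to_spans` and `entity_f1` are hypothetical.

```python
from typing import List, Set, Tuple


def bioes_to_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Decode BIOES tags into (start, end_exclusive, type) spans.

    Ill-formed fragments (e.g. an E- tag without a matching B-) are dropped.
    """
    spans: Set[Tuple[int, int, str]] = set()
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, typ = tag.partition("-")
        if prefix == "S":
            spans.add((i, i + 1, typ))
            start, label = None, None
        elif prefix == "B":
            start, label = i, typ
        elif prefix == "I":
            if label != typ:  # I- that does not continue an open entity
                start, label = None, None
        elif prefix == "E":
            if start is not None and label == typ:
                spans.add((start, i + 1, typ))
            start, label = None, None
        else:  # "O" closes any open entity
            start, label = None, None
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 (exact match)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gs, ps = bioes_to_spans(g), bioes_to_spans(p)
        tp += len(gs & ps)
        n_pred += len(ps)
        n_gold += len(gs)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["S-SPECIES", "O", "O", "O", "B-LOC", "I-LOC", "E-LOC"]]
    pred = [["S-SPECIES", "O", "O", "O", "B-LOC", "E-LOC", "O"]]
    print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

In this example the species entity is recovered exactly, while the truncated location span counts as both a false positive and a false negative, so precision, recall and F1 are all 0.5.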
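
Step (4) extracts relations by pattern matching and stores them in Neo4j. The sketch below illustrates that pipeline under stated assumptions: each hypothetical rule pairs a head entity type and a tail entity type with a regular expression that the text between two recognised entities must fully match, and each extracted triple is turned into a parameterised Cypher `MERGE` statement. The rule set, the relation names `DISTRIBUTED_IN` and `PROTECTION_LEVEL`, the `Entity` node label and the entity types are illustrative, not the thesis's actual patterns or graph schema.

```python
import re
from typing import Dict, List, Tuple

# Entity mention from the NER step: (start, end_exclusive, type) over a sentence.
Mention = Tuple[int, int, str]

# Hypothetical rules: (head type, regex that the text between head and tail
# must fully match, tail type, relation name).
RULES = [
    ("SPECIES", r"(主要)?(分布|生活)(于|在)", "LOCATION", "DISTRIBUTED_IN"),
    ("SPECIES", r"的保护级别为", "LEVEL", "PROTECTION_LEVEL"),
]


def extract_triples(sentence: str, mentions: List[Mention]) -> List[Tuple[str, str, str]]:
    """Return (head, relation, tail) triples for entity pairs matching a rule."""
    ordered = sorted(mentions)
    triples = []
    for i, (hs, he, htype) in enumerate(ordered):
        for ts, te, ttype in ordered[i + 1:]:
            between = sentence[he:ts]  # text between head and tail
            for rule_head, pattern, rule_tail, rel in RULES:
                if htype == rule_head and ttype == rule_tail and re.fullmatch(pattern, between):
                    triples.append((sentence[hs:he], rel, sentence[ts:te]))
    return triples


def to_cypher(triple: Tuple[str, str, str]) -> Tuple[str, Dict[str, str]]:
    """Build an idempotent MERGE statement plus its parameters.

    Relationship types cannot be Cypher parameters, so `rel` is interpolated
    and must come only from the fixed RULES table, never from raw input.
    """
    head, rel, tail = triple
    query = ("MERGE (h:Entity {name: $head}) "
             "MERGE (t:Entity {name: $tail}) "
             f"MERGE (h)-[:{rel}]->(t)")
    return query, {"head": head, "tail": tail}


if __name__ == "__main__":
    sentence = "大熊猫主要分布于四川，大熊猫的保护级别为国家一级"
    mentions = [(0, 3, "SPECIES"), (8, 10, "LOCATION"),
                (11, 14, "SPECIES"), (20, 24, "LEVEL")]
    for triple in extract_triples(sentence, mentions):
        print(triple, to_cypher(triple))
```

Each `(query, params)` pair could then be executed through a session of the official Neo4j Python driver with `session.run(query, params)`. Using `MERGE` rather than `CREATE` avoids duplicate nodes when the same species appears in many sentences.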