In recent years, the incidence and mortality of tumors in China have continued to increase. According to the Chinese cancer data released by the National Cancer Center in 2019, the incidence of malignant tumors has maintained an annual growth rate of approximately 3.9%, and the mortality rate has grown by 2.5% per year. How to use existing data to summarize and mine potential, effective relations in order to strengthen cancer prevention and treatment has become an urgent problem for researchers. With the development of global informatization, more and more data containing rich medical and oncological knowledge are scattered across the Internet. Mining valuable information from these data and constructing a tumor knowledge graph can promote the research and application of semantic technology in the field of medical information in China, help doctors obtain knowledge and guidance more conveniently, and bring more efficient and accurate medical services. In response to the needs of tumor knowledge graph construction, the main work of this thesis consists of the following two parts:

(1) To address the nested entity problem that is common in medical data, a BERT-based nested named entity recognition model, BLBC (BERT-Layered-BiLSTM-CRF), is proposed. The model identifies nested entities with dynamically stacked flat NER layers. The entities detected in the output of the current layer are fused into new entity representations, which are then passed to the next flat NER layer, so that the encoding information inside an entity is fully used to extract the outer entities that contain it. In addition, to improve on the accuracy and recall achieved with traditional pre-trained models, this thesis introduces the pre-trained model BERT, which has stronger text feature representation capabilities, as the feature representation layer. Experiments on the Chinese medical dataset CCKS2017 and the English medical dataset GENIA confirm that the BLBC model is more effective.

(2) Using the tumor-related resources in Baidu Encyclopedia and CNKI papers, a tumor knowledge graph of a certain scale is designed and constructed. We crawled the tumor-related pages of Baidu Encyclopedia and the tumor-related papers on CNKI, and generated triples directly by sorting the semi-structured data. For the unstructured data, the BLBC model is first used to recognize named entities, including nested ones; the triples obtained from the semi-structured data are then used together with the CN-DBpedia knowledge base for distant supervision, and the PCNN model is used for relation extraction. Finally, the triples from the unstructured and semi-structured data are combined to form a tumor knowledge graph with 5247 triples, 3189 entities and 204 relationships. The Neo4j graph database is used to store the knowledge graph.
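
The layered idea in part (1) can be illustrated with a minimal Python sketch. It is not the BLBC implementation: the BERT and BiLSTM-CRF layers are replaced by a hypothetical rule-based stub (`toy_flat_ner`), and the fused entity representation is stood in for by a placeholder token such as `<protein>`. The rules, labels and example tokens are assumptions chosen only to show how inner entities detected by one flat layer feed the detection of outer entities in the next, and how stacking stops when a layer finds nothing.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)

# Hypothetical rules standing in for a trained flat NER layer.
TOY_RULES = {
    ("TNF",): "protein",
    ("<protein>", "gene"): "DNA",
}


def toy_flat_ner(units: List[str]) -> List[Span]:
    """Greedy, non-overlapping, longest-match tagger over the current units."""
    spans, i = [], 0
    while i < len(units):
        for j in range(len(units), i, -1):  # try the longest span first
            label = TOY_RULES.get(tuple(units[i:j]))
            if label:
                spans.append((i, j, label))
                i = j
                break
        else:
            i += 1
    return spans


def layered_decode(
    tokens: List[str],
    flat_ner: Callable[[List[str]], List[Span]],
    max_layers: int = 8,
) -> List[Span]:
    """Stack flat NER layers until a layer detects no entity.

    Returned spans are expressed in original token positions.
    """
    # Each unit keeps its text and the original token range it covers.
    units = [(tok, i, i + 1) for i, tok in enumerate(tokens)]
    entities: List[Span] = []
    for _ in range(max_layers):
        spans = flat_ner([text for text, _, _ in units])
        if not spans:
            break  # dynamic stacking: no new entity, so stop adding layers
        merged, cursor = [], 0
        for start, end, label in spans:
            merged.extend(units[cursor:start])
            tok_start, tok_end = units[start][1], units[end - 1][2]
            entities.append((tok_start, tok_end, label))
            # Stand-in for the fused entity representation passed upward.
            merged.append((f"<{label}>", tok_start, tok_end))
            cursor = end
        merged.extend(units[cursor:])
        units = merged
    return entities


result = layered_decode(["the", "TNF", "gene"], toy_flat_ner)
print(result)
```

In this toy run the first layer finds the inner entity "TNF", and the second layer uses its merged representation to find the outer entity "TNF gene"; a third layer finds nothing and decoding stops.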
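
The distant supervision step in part (2) can be sketched in a similar way. The usual assumption is that a sentence mentioning both the head and the tail entity of a known triple expresses that triple's relation. The function name, the example triple and the sentences below are hypothetical and are not data from this thesis; the resulting noisy labels would then serve as training data for a relation extractor such as PCNN, which is not shown.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]        # (head, relation, tail)
Example = Tuple[str, str, str, str]  # (sentence, head, tail, relation)


def distant_label(
    sentences: Iterable[Tuple[str, List[str]]],
    kb_triples: Iterable[Triple],
) -> List[Example]:
    """Label sentences with relations from known triples.

    `sentences` pairs each sentence with the entity mentions an NER model
    found in it; `kb_triples` are the seed triples.
    """
    relations = defaultdict(set)
    for head, rel, tail in kb_triples:
        relations[(head, tail)].add(rel)

    examples: List[Example] = []
    for text, mentions in sentences:
        for head in mentions:
            for tail in mentions:
                if head == tail:
                    continue
                # A co-occurring (head, tail) pair inherits every known relation.
                for rel in sorted(relations.get((head, tail), ())):
                    examples.append((text, head, tail, rel))
    return examples


# Hypothetical toy data for illustration only.
kb = [("lung cancer", "symptom", "cough")]
corpus = [
    ("Persistent cough is common in lung cancer.", ["cough", "lung cancer"]),
    ("Lung cancer screening uses low-dose CT.", ["lung cancer", "low-dose CT"]),
]
labeled = distant_label(corpus, kb)
print(labeled)
```

Only the first sentence is labeled, because it is the only one in which both entities of the seed triple co-occur. Labels produced this way are noisy, since co-occurrence does not guarantee that the sentence actually states the relation.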