| With the rapid and steady development of our country's society and economy, living standards and education levels have generally improved. More and more people recognize the importance of health, and demand for reliable knowledge and guidance on safe medication is growing. Drug therapy is the most commonly used and most convenient form of treatment. People often choose drugs based on their own experience and on the contents of the drug insert, without fully understanding the specific properties of each drug, which leads to irrational drug use. The drug insert is an important carrier of drug information and serves as a scientific basis and guide for doctors and patients on how to use a drug. However, the wide variety of drugs on the market and the explosive growth of modern medical knowledge exceed what physicians can master. At present, the drug inserts on public medical websites are presented as semi-structured or unstructured free text, in varying formats and types, and most websites describe only each drug itself; information on the relationships between drugs is lacking. Therefore, designing and implementing a knowledge graph construction and retrieval system for drug inserts is of great significance for reducing irrational drug use, relieving pressure on doctors, and lowering medical costs.
This thesis presents the design and implementation of such a system. The work consists of four parts. The first part is data collection and processing. It uses the Scrapy crawler framework to crawl public medical websites, parses page content with XPath and regular expressions to obtain the drug inserts, and then analyzes and de-duplicates the collected data so that data from different websites is integrated, unified, and stored in a CSV file. The second part is named entity recognition. It first defines the entity types to extract according to the contents of the drug insert and manually annotates the data. A BERT-CRF model is then trained on part of the labeled data, and the trained model is used to recognize entities in the unlabeled drug inserts. The third part is knowledge graph construction. It first merges and de-duplicates the extracted entities to obtain sets of entities such as drugs, drug components, and main functions. It then constructs triples with the drug name as the head entity, another entity as the tail entity, and the tail entity's type as the relation, and imports them into the non-relational database Neo4j for storage. The fourth part is information retrieval and visual display. It uses the Cypher query language and a semantic similarity matching algorithm to implement the query function, and displays query results as knowledge graphs and relationship lists. Users can search for a drug or another entity to get all related information, and can also query a specified relationship. Through the visual display, users can further discover and explore relationships between drugs. The system is developed in Python with a browser/server (B/S) architecture and the MVC development framework, uses the Neo4j non-relational graph database to store data, and relies on HTML5, CSS, Cypher, ECharts, Flask, and other technologies to implement its functions. It presents the contents of drug inserts to users in a more intuitive and concise way, dynamically displays the relationships between pieces of knowledge, and supports retrieval of drug information. |
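
The triple construction described in the third part can be illustrated with a minimal Python sketch. It is not the thesis's actual implementation: the `Triple` type, the `build_triples` function, the relation names `component` and `main_function`, and the sample records are all hypothetical. The sketch assumes the entity recognition step yields `(drug_name, entity_type, entity_text)` records, and it only builds and de-duplicates triples; importing them into Neo4j is left out.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    head: str      # drug name (head entity)
    relation: str  # entity type of the tail, used as the relation
    tail: str      # extracted entity text (tail entity)


def build_triples(entities: Iterable[tuple[str, str, str]]) -> list[Triple]:
    """Turn (drug_name, entity_type, entity_text) records into de-duplicated triples.

    Whitespace is trimmed before comparison, records with an empty field
    are skipped, and the order of first occurrence is preserved.
    """
    seen: set[Triple] = set()
    triples: list[Triple] = []
    for drug_name, entity_type, entity_text in entities:
        triple = Triple(drug_name.strip(), entity_type.strip(), entity_text.strip())
        if not all(triple) or triple in seen:
            continue  # skip incomplete records and exact duplicates
        seen.add(triple)
        triples.append(triple)
    return triples


# Hypothetical output of the entity recognition step (illustrative only).
recognized = [
    ("Aspirin", "component", "acetylsalicylic acid"),
    ("Aspirin", "main_function", "pain relief"),
    ("Aspirin", "component", " acetylsalicylic acid "),  # duplicate after trimming
    ("Ibuprofen", "main_function", "pain relief"),
]

triples = build_triples(recognized)
for triple in triples:
    print(triple.head, triple.relation, triple.tail, sep=" | ")
```

Because two drugs can share a tail entity (here, both sample drugs point to the same `pain relief` entity), storing such triples in a graph is what lets a user move from one drug to related drugs.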