With the rapid development of the Internet, the digitization of information management has not only made medical information management more intelligent and modern, but has also generated a large amount of detailed and valuable biomedical research data. However, most medical texts exist in unstructured form, which makes it difficult for biomedical researchers to discover specific knowledge automatically and in a timely manner. Information extraction technology can convert text data such as electronic medical records into structured information and thereby support biomedical research and development. The two most important subfields of biomedical information extraction are named entity recognition and relation extraction: the former extracts medical entities from text, and the latter discovers semantic relationships between those entities. Research results in information extraction provide basic support for drug supervision and clinical decision-making, and especially for the construction of knowledge graphs and the development of question answering systems. This thesis therefore studies two key technologies, named entity recognition and relation extraction, for text data in the biomedical field. Its main work is as follows:

1. A Chinese biomedical named entity recognition model, MFEBC, based on multi-feature embedding is proposed. First, the model introduces external resources to construct lexical features, which supplement each character with phrase information. Second, in view of the pictographic nature of Chinese characters and the sequential nature of text, character structure information and sequence structure information are introduced, and a convolutional neural network encodes each of them to obtain character structure and sequence structure feature embeddings. Finally, the resulting feature embeddings are concatenated and fed into a long short-term memory network for encoding, and a conditional random field outputs the entity predictions. The experimental results show that the MFEBC model integrates lexical and text structure features and can effectively identify medical named entities.

2. A Chinese medical relation extraction model, CMERE-MIA, based on a multi-interaction attention mechanism is proposed. First, the word embeddings and position embeddings are concatenated and fed into a long short-term memory network to obtain the context representation. Second, a convolutional neural network extracts structural features, while the context representation is masked and pooled. Finally, interactive attention is used to learn the influence of the structural features and of the entity features on the context, respectively, and the context representation is reconstructed. Comparative experiments show that the CMERE-MIA model outperforms the benchmark models, which indicates that introducing structural information and entity information through the interactive attention mechanism enhances the expressiveness of the context representation and effectively improves performance on medical relation extraction.

3. A Chinese medical question answering and knowledge extraction system based on the knowledge graph is designed and implemented. First, data are collected and multi-source data are fused. Second, the data are stored in a Neo4j database and organized as knowledge triples to build the knowledge graph. Finally, the Chinese medical question answering and information extraction system is designed on top of the constructed knowledge graph. The web system separates the front end from the back end; both the front-end interface and the back-end processing logic are implemented to provide users with QA search and information extraction services.
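
The knowledge-triple step in the third contribution can be illustrated with a minimal sketch. It is not the thesis implementation: the thesis stores its graph in Neo4j, whereas this sketch substitutes an in-memory dictionary so that it runs without a database. The `TripleStore` class, the relation names, and the sample triples are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class TripleStore:
    """Hypothetical in-memory stand-in for a graph database of (head, relation, tail) triples."""

    def __init__(self):
        # Index tails by (head, relation) so a simple QA lookup is a single dictionary access.
        self._tails = defaultdict(set)

    def add(self, head, relation, tail):
        # Storing tails in a set means a triple seen in several sources is kept only once.
        self._tails[(head, relation)].add(tail)

    def merge(self, triples):
        # Fuse one data source into the store.
        for head, relation, tail in triples:
            self.add(head, relation, tail)

    def query(self, head, relation):
        # Return all tails for the given head entity and relation, sorted for stable output.
        return sorted(self._tails.get((head, relation), set()))


# Two illustrative sources that overlap on one triple (sample data, not from the thesis).
source_a = [
    ("influenza", "has_symptom", "fever"),
    ("influenza", "has_symptom", "cough"),
]
source_b = [
    ("influenza", "has_symptom", "fever"),
    ("influenza", "treated_with", "oseltamivir"),
]

store = TripleStore()
store.merge(source_a)
store.merge(source_b)

symptoms = store.query("influenza", "has_symptom")
treatments = store.query("influenza", "treated_with")
print(symptoms, treatments)
```

A question such as "What are the symptoms of influenza?" would, under these assumptions, be answered by mapping the recognized entity and relation to a `query` call; in a Neo4j deployment the same lookup would be expressed as a graph query instead.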