Research On The Construction Of Atrial Fibrillation Knowledge Graph Based On Multi-source Data

Posted on:2023-11-12

Degree:Master

Type:Thesis

Country:China

Candidate:M Y Sun

Full Text:PDF

GTID:2544306623994009

Subject:Engineering

Abstract/Summary:

Atrial fibrillation (AF) is one of the most common arrhythmias and has a complex etiology. Some patients present with asymptomatic or paroxysmal AF, which is difficult to predict and makes diagnosis and treatment extremely difficult. With the rapid development of medical information technology, the volume of data in the medical field has skyrocketed. The vast amount of AF-related medical data contains a great deal of knowledge about the diseases and symptoms that patients suffer from, and processing, mining, analyzing, and managing this large body of medical text to support information retrieval and health knowledge services has become an urgent need. To address these issues, this project draws on multi-source knowledge from encyclopedic websites, literature, and textbooks, and uses natural language processing techniques together with manual annotation to extract knowledge from medical texts. By converting semi-structured and unstructured knowledge into structured knowledge, it builds an AF knowledge graph based on multi-source data. The main work of this thesis is as follows:

(1) Named entity recognition. To address the polysemy of Chinese words (one word can carry several meanings), a pre-training-based RoBERTa-BiLSTM-CRF model is proposed. First, RoBERTa-WWM uses Whole Word Masking to obtain semantic features dynamically; because it is adapted to the characteristics of Chinese text, it takes full advantage of the pre-trained model. Next, a BiLSTM learns contextual feature information. Finally, a CRF learns the dependencies between adjacent labels in the sequence to complete named entity recognition. Validation on the AF text dataset constructed in this thesis shows that the proposed model outperforms the comparison models in accuracy, recall, and F1 score, which verifies its effectiveness.

(2) Relation extraction. To address the problem that a small training corpus easily leads to overfitting, a BERT_MSD relation extraction model based on pre-training on a Chinese corpus is proposed. The model consists of two modules: a BERT layer and a fully connected layer. First, the BERT module, consisting of 2 layers of Transformer encoders, extracts the important features of the text. Then, a Multi-Sample Dropout strategy is applied before the fully connected layer to prevent overfitting, and relation classification is performed after the fully connected layer to complete relation extraction. Validation on the AF text dataset constructed in this thesis shows that the model effectively prevents overfitting, converges faster, and achieves better results.

(3) Construction of a multi-source AF knowledge graph. To address the lack of data in AF research, the thesis uses multi-source medical knowledge, including data from authoritative medical and health websites, encyclopedia websites, authoritative Chinese literature, medical textbooks, and electronic medical records. The data are obtained through web crawlers, query downloads, and hospital electronic medical records, and together give a comprehensive picture of AF-related information. First, the entity, attribute, and relation categories of the AF knowledge graph are determined to form its schema layer. Then, a combination of manual annotation and automated extraction is used for knowledge extraction and knowledge fusion: over one year, 16,350 entities and 15,060 triples were manually annotated in unstructured text, and the two knowledge extraction algorithms proposed in this thesis were used to extract knowledge automatically from the unannotated data. Fusing all the data yields 10,186 entities, 12,115 relation triples, and 22,220 attribute triples. Finally, the Neo4j graph database is used for storage and visualization, and an intelligent medical question-answering system is built on top of the graph.
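Item (1) relies on Whole Word Masking in RoBERTa-WWM. Below is a minimal sketch in Python of the core idea only, assuming the sentence has already been segmented into words by some external segmenter (the word list and the function name `whole_word_mask` are hypothetical). Because Chinese BERT-style models tokenize character by character, a word chosen for masking has all of its characters masked together. It simplifies the usual pre-training recipe: each word is selected independently with probability `mask_prob`, and selected characters are always replaced by `[MASK]`.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"

def whole_word_mask(words: List[str], mask_prob: float = 0.15,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Return (character tokens, masked tokens) for a pre-segmented sentence.

    Simplified sketch: a word is either masked in full or left intact,
    never split. No 80/10/10 replacement scheme is applied.
    """
    rng = random.Random(seed)
    tokens: List[str] = []
    masked: List[str] = []
    for word in words:
        chars = list(word)            # character-level tokens for this word
        tokens.extend(chars)
        if rng.random() < mask_prob:  # decide once per word, not per character
            masked.extend([MASK] * len(chars))
        else:
            masked.extend(chars)
    return tokens, masked

# Hypothetical pre-segmented sentence: "AF patients often have palpitations"
words = ["心房颤动", "患者", "常", "伴有", "心悸"]
tokens, masked = whole_word_mask(words, mask_prob=0.3, seed=7)
print(len(tokens) == len(masked))  # → True
```

With `mask_prob=1.0` every character is masked and with `0.0` none is; in between, some words are masked in full and the rest are left untouched, but never half a word.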
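The final CRF step in item (1) chooses the best label sequence jointly instead of token by token. The sketch below shows Viterbi decoding for a linear-chain CRF, assuming BIO tags with a hypothetical `DIS` (disease) entity type and hand-picked numbers standing in for the BiLSTM emission scores and the learned transition scores. Training, start/end transition scores, and batching are omitted; a transition missing from the table is treated as forbidden.

```python
from typing import Dict, List, Optional, Sequence, Tuple

def viterbi_decode(emissions: Sequence[Dict[str, float]],
                   transitions: Dict[Tuple[str, str], float],
                   tags: Sequence[str]) -> Tuple[List[str], float]:
    """Return the highest-scoring tag path and its score."""
    neg_inf = float("-inf")
    # score[tag]: best score of any path ending in `tag` at the current token
    score = {tag: emissions[0][tag] for tag in tags}
    backpointers: List[Dict[str, Optional[str]]] = []
    for t in range(1, len(emissions)):
        new_score: Dict[str, float] = {}
        pointers: Dict[str, Optional[str]] = {}
        for cur in tags:
            best_prev, best_val = None, neg_inf
            for prev in tags:
                val = score[prev] + transitions.get((prev, cur), neg_inf)
                if val > best_val:
                    best_prev, best_val = prev, val
            new_score[cur] = best_val + emissions[t][cur]
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag
    last = max(tags, key=lambda tag: score[tag])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]

tags = ["B-DIS", "I-DIS", "O"]
# Hypothetical emission scores for the tokens 患, 房, 颤
emissions = [
    {"O": 2.0, "B-DIS": 0.0, "I-DIS": 0.0},
    {"O": 0.5, "B-DIS": 1.0, "I-DIS": 1.2},  # emissions alone would pick I-DIS
    {"O": 0.0, "B-DIS": 0.0, "I-DIS": 2.0},
]
# ("O", "I-DIS") is absent, so that transition is forbidden
transitions = {
    ("O", "O"): 0.0, ("O", "B-DIS"): 0.0,
    ("B-DIS", "I-DIS"): 0.5, ("I-DIS", "I-DIS"): 0.5,
    ("B-DIS", "O"): 0.0, ("I-DIS", "O"): 0.0,
    ("B-DIS", "B-DIS"): 0.0, ("I-DIS", "B-DIS"): 0.0,
}
path, best = viterbi_decode(emissions, transitions, tags)
print(path, best)  # → ['O', 'B-DIS', 'I-DIS'] 5.5
```

Taken token by token, the emission scores would tag 房 as `I-DIS` directly after `O`, which is invalid in the BIO scheme; the transition table rules that out, so the decoder returns `B-DIS` there instead.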
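Item (2) applies Multi-Sample Dropout before the fully connected layer. As a rough, forward-pass-only illustration in plain Python, assuming the BERT sentence feature is already available as a list of floats: the same feature vector goes through several independent dropout masks, each masked copy is fed to the same shared fully connected layer, and the per-sample cross-entropy losses are averaged. All names and numbers are hypothetical; a real model would do this with tensors in a deep learning framework, and at inference time dropout is disabled and a single pass is used.

```python
import math
import random
from typing import List, Sequence

def dropout(x: Sequence[float], p: float, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each unit with probability p (p < 1), rescale the rest by 1/(1-p)."""
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in x]

def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one logit per relation class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy, using the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]

def multi_sample_dropout_loss(features: Sequence[float],
                              weights: Sequence[Sequence[float]],
                              bias: Sequence[float], label: int,
                              num_samples: int = 5, p: float = 0.5,
                              seed: int = 0) -> float:
    """Average the loss over several dropout masks that share one FC layer."""
    rng = random.Random(seed)
    losses = [cross_entropy(linear(dropout(features, p, rng), weights, bias), label)
              for _ in range(num_samples)]
    return sum(losses) / num_samples

features = [0.2, -1.0, 0.5, 0.3]       # stand-in for a BERT sentence vector
weights = [[0.1, 0.4, -0.2, 0.0],       # 3 hypothetical relation classes
           [-0.3, 0.2, 0.5, 0.1],
           [0.05, -0.1, 0.3, 0.2]]
bias = [0.0, 0.1, -0.1]
loss = multi_sample_dropout_loss(features, weights, bias, label=1, num_samples=8, p=0.5)
```

Averaging over several masks gives a lower-variance loss per training step than a single dropout sample at almost no extra cost, because only the small fully connected layer is repeated, not the BERT encoder.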
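Item (3) fuses triples from several sources and stores them in Neo4j. Below is a minimal sketch of one possible approach, not the thesis's own fusion procedure: entity names are aligned through a hand-built alias table (for example mapping the abbreviation 房颤 to 心房颤动), exact duplicates are dropped, and each fused triple is turned into a parameterized Cypher `MERGE` statement that creates nodes and relationships only if they do not already exist. The relation names, the `Entity` label, and the `RELATION` relationship type are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

def normalize(name: str, aliases: Dict[str, str]) -> str:
    """Strip whitespace and map a surface form to its canonical entity name."""
    name = name.strip()
    return aliases.get(name, name)

def fuse_triples(sources: Iterable[Iterable[Triple]],
                 aliases: Dict[str, str]) -> List[Triple]:
    """Merge triples from several sources, aligning aliases and dropping duplicates."""
    seen: Set[Triple] = set()
    fused: List[Triple] = []
    for source in sources:
        for head, rel, tail in source:
            triple = (normalize(head, aliases), rel, normalize(tail, aliases))
            if triple not in seen:  # keep the first occurrence, preserve order
                seen.add(triple)
                fused.append(triple)
    return fused

def to_cypher(triple: Triple) -> Tuple[str, Dict[str, str]]:
    """Build an idempotent, parameterized Cypher MERGE for one triple."""
    head, rel, tail = triple
    query = ("MERGE (h:Entity {name: $head}) "
             "MERGE (t:Entity {name: $tail}) "
             "MERGE (h)-[:RELATION {type: $rel}]->(t)")
    return query, {"head": head, "rel": rel, "tail": tail}

# Hypothetical triples from two sources (an encyclopedia site and the literature)
encyclopedia = [("心房颤动", "has_symptom", "心悸")]
literature = [("房颤 ", "has_symptom", "心悸"), ("房颤", "has_complication", "脑卒中")]
fused = fuse_triples([encyclopedia, literature], {"房颤": "心房颤动"})
print(len(fused))  # → 2
```

Cypher does not accept a relationship type as a query parameter, which is why this sketch stores the relation name as a property. Each query string and parameter dict could be passed to `session.run(query, params)` in the official Neo4j Python driver.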

Keywords/Search Tags:

Knowledge Graph, Atrial Fibrillation, Named Entity Recognition, Relationship Extraction, Manual Labeling

Related items

1	Construction And Implementation Of Chinese Electronic Medical Record Knowledge Graph Based On BiLSTM
2	Construction And Application Of Chinese Medical Knowledge Graph Based On CNKI
3	Design And Implementation Of Text-based Medical Knowledge Graph
4	Construction Of Tumor Knowledge Graph Based On Nested Named Entity Recognition
5	Research And Implementation Of Medical Knowledge Graph Q&A System Based On Deep Learning
6	Named Entity Recognition And Relation Extraction For Traditional Chinese Medicine Knowledge Graph
7	Research And Implementation Of Medical Commercial Knowledge Graph Construction Based On Multi-source
8	Research And Application Of Key Technologies For Knowledge Graph In Traditional Chinese Medicine
9	Research On The Construction Method Of Knowledge Graph For Medical Insurance Intelligent Audit
10	Research On Named Entity Recognition And Entity Relationship Extraction Of Medical Data Text Based On Attention