Research on Information Extraction Technology for COVID-19 Epidemiological Investigation Based on Deep Learning

Posted on: 2024-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: C L Yang
GTID: 2544307160955539
Subject: Computer Science and Technology
Abstract/Summary:
With the spread and mutation of COVID-19 in China, the national prevention and control policy has changed. Extracting core information from COVID-19 epidemiological investigations can help implement precise prevention and control. The core tasks of information extraction for COVID-19 epidemiological investigations are named entity recognition and relationship extraction, both of which rely on natural language processing techniques. At present, deep learning methods perform well on both tasks, but their effectiveness depends on the quality of the labeled dataset. The lack of a labeled dataset of COVID-19 epidemiological investigation information makes subsequent tasks difficult to carry out. In addition, the texts are excessively long and highly similar to one another, which makes information extraction challenging. To address these problems, the main research contents of this thesis are as follows.

(1) To address the lack of annotated datasets in the field of COVID-19 epidemiological investigation information, publicly available COVID-19 epidemiological investigation texts are collected and organized according to the core requirements of precise prevention and control set out in government documents. Human contact, rather than the traditional time and place, is taken as the core of the information; entities and relationships are designed and defined according to these requirements; and the annotation rules of well-known datasets are used as a reference to build a dataset that meets the requirements of precise prevention and control.

(2) To address the excessively long text and high text similarity in the COVID-19 epidemiological investigation named entity recognition task, a parallel neural network named entity recognition model (RoBERT-BiLSTM-IDCNN-PReLU-CRF) that incorporates the parametric rectified linear unit (PReLU) activation function is constructed. Training the RoBERTa model lets the downstream model better learn the feature information of entities in COVID-19 epidemiological investigation information. BiLSTM is combined in parallel with IDCNN to handle the excessively long text, and the activation function in the IDCNN neural units is replaced to handle the high text similarity. On the COVID-19 epidemiological investigation information dataset, the F1 value is 0.9641 and the whole-sentence recognition rate is 0.7023, and named entity recognition performance is improved.

(3) To address the inadequate use of word-level semantics and the significant differences between text and entity lengths in COVID-19 epidemiological investigation relationship extraction, a relationship extraction model (BERT-WWM-BiGRU-CNN-ATT) that integrates an attention mechanism and a whole-word masking strategy is proposed. With whole-word masking, the BERT embedding layer produces more vectors that carry complete word meanings, which addresses the low utilization of word-level semantics. The relationships between entities at definite locations are then extracted by BiGRU combined serially with CNN, and a Self-Attention mechanism introduces the dependencies between entities to address the significant difference between text and entity lengths. On the COVID-19 epidemiological investigation information dataset, the F1 value is 0.9661, and relationship extraction performance is improved.

The experimental results show that the dataset designed and labeled in this thesis has good stability, and that the named entity recognition model and the relationship extraction model better solve the above problems and improve recognition accuracy. These results meet the need for precise prevention and control and provide a reference for subsequent research in other fields related to COVID-19 epidemiological investigation information.
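
The abstract reports both an F1 value and a whole-sentence recognition rate for named entity recognition but does not define either. The following is a minimal sketch of how such metrics are commonly computed, not the thesis's own evaluation code. It assumes micro-averaged entity-level F1 with exact span-and-label matching, and it assumes the whole-sentence rate is the fraction of sentences whose predicted entity set equals the gold set exactly. The function names, the `Entity` tuple layout, and the sample labels are all hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical entity representation: (start offset, end offset, label).
Entity = Tuple[int, int, str]


def entity_f1(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> float:
    """Micro-averaged entity-level F1; an entity counts only on exact match."""
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


def whole_sentence_rate(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> float:
    """Assumed definition: share of sentences with every entity predicted correctly."""
    if not gold:
        return 0.0
    exact = sum(1 for g, p in zip(gold, pred) if g == p)
    return exact / len(gold)


# Toy data: three sentences, one boundary error in the last sentence.
gold = [
    {(0, 2, "PER"), (5, 8, "LOC")},
    {(1, 3, "PER")},
    {(0, 4, "LOC"), (6, 9, "TIME")},
]
pred = [
    {(0, 2, "PER"), (5, 8, "LOC")},
    {(1, 3, "PER")},
    {(0, 4, "LOC"), (6, 8, "TIME")},  # wrong end offset
]

f1 = entity_f1(gold, pred)
rate = whole_sentence_rate(gold, pred)
```

Under these assumed definitions a single wrong entity fails its entire sentence, so the whole-sentence rate is always at most the entity-level score on the same predictions. That is one plausible reading of the gap between the reported 0.9641 and 0.7023.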
Keywords/Search Tags: COVID-19 epidemiological investigation information, deep learning, information extraction, named entity recognition, relationship extraction