Method Research On Disease-associated Entity Relation Extraction

Posted on: 2019-04-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Q Kang
GTID: 1364330542497378
Subject: Military Preventive Medicine
Abstract/Summary:
Despite continued economic growth, rapid development of science and technology, rising living standards, and a gradual improvement in people's health, the overall health situation is still not optimistic. People therefore set higher standards for life and health and place higher demands on the prevention and treatment of diseases. The published literature records a great deal of knowledge gained through practice and is an important reference for disease prevention and treatment. However, taking PubMed, an authoritative database in the field of biomedicine, as an example, the number of documents it has collected has reached more than 28 million. The rapid increase in the speed of publication has also made it more difficult to identify disease-related influencing factors quickly. The current development of computer technology and natural language processing has brought new breakthroughs and has achieved excellent results in text retrieval, machine translation, named entity recognition, relation extraction, abstract extraction, and intelligent question answering. It is therefore of great significance to apply these advanced technologies to disease-related factors and to extract the factors that can guide disease prevention, treatment, and contraindications. Doing so also helps to find ways to fight diseases and improve health rapidly and comprehensively.

This research mainly uses literature research, comparative analysis, machine learning, and statistical analysis, and conducts a comprehensive and systematic review of the origin, development, and status quo of entity relation extraction and of its prerequisite, named entity recognition. The four common approaches to entity relation extraction (dictionary-based, rule-based, ontology-based, and machine learning-based) were studied systematically and compared in depth according to their respective characteristics, and the machine learning approach was selected for extracting relations of disease-related influencing factors.

The influencing-factor relation extraction part adopted a convolutional neural network (CNN) from deep learning and a support vector machine (SVM) from traditional machine learning. The CNN can automatically extract many different features of the text through deep transformation and, compared with other deep learning methods, trains relatively fast, which facilitates timely parameter tuning. Word features and position features were selected and concatenated; after convolution, non-linear activation, pooling, and repeated training, features rich in semantic information were obtained. These deep semantic features were input into the SVM for the final classification of the relation, which was defined in two broad categories: beneficial to the disease and harmful to the disease.

When the features extracted by the deep learning method were fed to the SVM with the RBF kernel, the accuracy reached 94.38% (at this point, the number of iterations was 11408, and 3,573 of the 3,775 test instances were correct). This exceeded the 77.22% accuracy obtained with the original linear-kernel SVM on simple one-hot encoded features, the 90.65% accuracy obtained when applying the rich features extracted by deep learning, and the highest accuracy of 90.44% obtained when using the RBF kernel in the SVM. The experimental results showed that the CNN combined with SVM proposed in this paper obtained higher relation extraction performance at lower time complexity. Finally, based on the experimental results, an error analysis was carried out and conclusions were drawn.

The specific content is divided into five parts.

The first part is the theoretical research on entity relation extraction. First, the current situation and the challenges and opportunities it presents were discussed, and the purpose and significance of the study were highlighted. Using literature research, I systematically analyzed the connotation, development history, and current research status of relation extraction. On the basis of summarizing and analyzing the strengths and deficiencies of existing research, I designed the technical route of the study.

The second part is the screening of named entity relation extraction methods. The methods based on dictionary construction, manual rules, ontology support, and machine learning were analyzed in depth, and the advantages and disadvantages of these four approaches were summarized. Considering the use of machine learning in existing research, the need to handle massive data in the era of big data, and the goal of freeing manual labor while giving full play to new technologies, the advantages of machine learning in extracting features and coping with massive data were highlighted. Machine learning-based methods were therefore adopted for the subsequent relation extraction research.

The third part is the model design for relation extraction of disease-associated influencing factors. The overall method combines a CNN and an SVM: the CNN first extracts rich semantic features, which are then input into the SVM for relation extraction. The CNN covers word vector screening, multi-feature selection, convolution with edge padding, non-linear activation, piecewise (segmented) max pooling, and fully connected processing. The SVM classification used the libsvm toolkit. A step specific to natural language processing is the word processing at the input layer, which mainly includes punctuation processing, digit processing, and case processing.

The fourth part is the empirical analysis of the disease-associated entity relation extraction model based on the hybrid method. The data source and evaluation indicators, word vector filtering, feature dimensions, convolutional layer, pooling layer, and other settings in each step of training and optimization are described in detail. The mining results show whether the effect of entity 2 on entity 1 (the disease) is beneficial or harmful. Finally, an overall analysis and discussion and an error analysis are presented.

The fifth part is the summary and outlook of the study. The content of each part of the empirical study, the association extraction method, the relation extraction model, and the disease-associated relation extraction model based on the hybrid method were summarized. The research enriched the content and methods of research on disease-associated entity relation extraction. The main innovation is that, building on the work of domestic and foreign peers, the combination of a CNN and an SVM was used for the first time, making full use of the respective advantages of the two methods. When this fusion method was applied to relation extraction on a large-scale disease corpus, the accuracy reached 94.38%, surpassing the other methods. The study also pointed out the shortcomings of this evidence for actual clinical drug use, and identified as the next research direction having expert consultants interpret the relation extraction results.
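As an illustration only, the input-side steps named in the abstract (punctuation, digit, and case processing; position features; pooling by segment) might be sketched as follows. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the function names, the `<num>` placeholder, whitespace tokenization, and the reading of segmented max pooling as pooling over the three spans delimited by the two entities are all hypothetical choices made for this example.

```python
import re
import string


def normalize_tokens(sentence):
    """Assumed input-layer cleanup: lowercase, strip punctuation, mask digits."""
    tokens = []
    for raw in sentence.split():
        tok = raw.lower().strip(string.punctuation)  # case + punctuation processing
        if not tok:
            continue  # the token was pure punctuation
        tok = re.sub(r"\d+(\.\d+)?", "<num>", tok)   # digit processing (hypothetical placeholder)
        tokens.append(tok)
    return tokens


def position_features(tokens, e1_index, e2_index):
    """Relative offset of every token to entity 1 and to entity 2."""
    return [(i - e1_index, i - e2_index) for i in range(len(tokens))]


def piecewise_max_pool(feature_map, e1_index, e2_index):
    """Max-pool one filter's per-token outputs over three entity-delimited segments."""
    lo, hi = sorted((e1_index, e2_index))
    segments = [
        feature_map[: lo + 1],        # start of sentence through the first entity
        feature_map[lo + 1 : hi + 1], # after the first entity through the second
        feature_map[hi + 1 :],        # remainder of the sentence
    ]
    # An empty segment (an entity at the sentence end) pools to 0.0 by convention here.
    return [max(seg) if seg else 0.0 for seg in segments]


if __name__ == "__main__":
    tokens = normalize_tokens("Aspirin reduces the risk of Stroke by 25%, reportedly.")
    # Entity 1 is the disease ("stroke"), entity 2 the influencing factor ("aspirin").
    e1, e2 = tokens.index("stroke"), tokens.index("aspirin")
    print(tokens)
    print(position_features(tokens, e1, e2))
    print(piecewise_max_pool([0.1, 0.7, 0.3, 0.9, 0.2, 0.4], 1, 3))
```

In a pipeline of the kind the abstract describes, the pooled values from all filters would be concatenated into one feature vector per sentence and passed to the SVM classifier; that step is omitted here because it depends on the trained network and on libsvm settings that the abstract does not specify.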
Keywords/Search Tags:relation extraction, deep learning, machine learning, convolutional neural network, SVM