Research On The Method Of Entity Relation Extraction In Biomedical Literature And Its Applications

Posted on: 2018-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Yang
GTID: 2404330623450972
Subject: Computer Science and Technology
Abstract/Summary:
Biomedical literature is an important source of biomedical big data and contains a massive amount of valuable information. However, tens of millions of articles exist only as unstructured text, so efficient and advanced computational methods are needed to extract knowledge from the literature. Text mining methods based on natural language processing (NLP) are capable of identifying key biomedical concepts in the literature, including genes, drugs, diseases, and variants. Once named entities have been recognized, the relations between them can be extracted as well. Related research has shown that, because biomedical text has distinct language characteristics and depends on complex domain knowledge, general-purpose NLP methods and tools are not directly applicable to biomedical text mining, which requires dedicated research. Dozens of biomedical text mining methods and tools have been developed for named entity recognition (NER), and most biomedical concepts can now be recognized automatically. In contrast, relation extraction between entities has only recently become a research focus. A typical relation extraction workflow is a complex procedure that involves in-depth syntactic parsing and semantic analysis, and the accuracy of current methods is far from satisfactory. In addition, the volume of biomedical literature is huge. For instance, the biggest biomedical literature database, PubMed, indexes over 20 million abstracts and 4 million full texts, which poses a great challenge to computational capacity. Consequently, the research focus of this thesis is to design and implement an accurate method for relation extraction. The contributions of this thesis can be summarized as follows:

(1) A rule-based relation extraction method using dependency parsing: We analyzed existing rule-based relation extraction methods and found that they could only extract simple relations between entities, for instance co-occurrence in the same sentence, without an in-depth analysis of complex grammatical structure. To address this problem, we proposed a flexible and extensible rule-based method that uses the dependency tree to extract entity relationships from unstructured text and can handle multiple types of entities and relationships. Our method achieved an average F-score of 73% on the CPI corpus and an average F-score of 61% on the DDI corpus.

(2) A deep learning method for relation extraction: A typical drawback of rule-based methods is that they generalize poorly. One solution is to employ machine learning models such as Support Vector Machine (SVM) or Naïve Bayes (NB) classifiers. Recently, deep learning (DL) models have become increasingly prevalent, and some DL models are well suited to processing sequence data such as text. We therefore proposed a relation extraction method based on the long short-term memory (LSTM) model and dependency information. The network consists of a feature layer, the LSTM layers, a max-pooling layer, and a softmax layer. The feature layer takes features derived from the dependency tree as input; each node in the LSTM layers carries contextual information; the max-pooling layer keeps the strongest responses; and the softmax layer normalizes the scores and generates the output. Our method achieves a 72% F-score on the DDI corpus, which outperforms other methods based on kernels and CNNs.

(3) Design and implementation of parallel relation extraction: We deployed the literature database and a text-mining pipeline on Tianhe-2. Taking into account the characteristics of the Tianhe-2 architecture, we developed schemes that exploit parallelism at multiple levels to fully harness its computational power. This work focuses on task management and load-balancing strategies, and the implementation uses MPI.

(4) An example application of relation extraction, CNVision: Copy number variations (CNVs) are associated with many diseases. Current interpretation of CNVs relies heavily on manual investigation of literature and databases. We carried out an application study based on our extraction method and its parallel implementation: we text-mined all CNV-related articles and extracted all CNV-disease relations to construct the CNV-GT database, and we also developed a web interface for user queries.
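
To make the contrast in contribution (1) concrete, the following is a minimal sketch of a dependency-based rule, not the rule set used in the thesis. The toy parse, the trigger lexicon `TRIGGERS`, the dependency labels, and the function names are all hypothetical and chosen only for illustration. The sketch finds the shortest path between two entities in a dependency tree and accepts a relation only when the entities are the subject and direct object of a trigger verb.

```python
from collections import deque

# Hypothetical toy parse of "Aspirin strongly inhibits COX-1 in platelets."
# Each token maps to (head token, dependency label); the root has head None.
# Tokens are assumed unique within the sentence to keep the sketch short.
TOY_PARSE = {
    "inhibits": (None, "ROOT"),
    "Aspirin": ("inhibits", "nsubj"),
    "strongly": ("inhibits", "advmod"),
    "COX-1": ("inhibits", "dobj"),
    "in": ("inhibits", "prep"),
    "platelets": ("in", "pobj"),
}

# Hypothetical trigger lexicon mapping verbs to relation types.
TRIGGERS = {"inhibits": "INHIBITION", "activates": "ACTIVATION"}


def shortest_path(parse, source, target):
    """Breadth-first search over the dependency tree, treated as undirected."""
    neighbours = {tok: set() for tok in parse}
    for tok, (head, _label) in parse.items():
        if head is not None:
            neighbours[tok].add(head)
            neighbours[head].add(tok)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in sorted(neighbours[path[-1]] - seen):
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


def extract_relation(parse, entity_a, entity_b):
    """Return (entity_a, relation, entity_b) if the rule fires, else None.

    Rule (illustrative only): the shortest path is entity_a -> trigger -> entity_b,
    with entity_a as the trigger's subject and entity_b as its direct object.
    """
    path = shortest_path(parse, entity_a, entity_b)
    if path is None or len(path) != 3:
        return None
    trigger = path[1]
    if trigger not in TRIGGERS:
        return None
    if parse[entity_a] != (trigger, "nsubj") or parse[entity_b] != (trigger, "dobj"):
        return None
    return (entity_a, TRIGGERS[trigger], entity_b)


if __name__ == "__main__":
    print(extract_relation(TOY_PARSE, "Aspirin", "COX-1"))
    print(extract_relation(TOY_PARSE, "Aspirin", "platelets"))
```

A pure co-occurrence rule would pair "Aspirin" with "platelets" because both appear in the same sentence, whereas this rule rejects the pair because the path between them does not match the subject-trigger-object pattern. In practice the parse would come from a dependency parser rather than a hand-written dictionary.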
Keywords/Search Tags: Relational Extraction, Dependency Tree, Biomedical Literature, Parallel Computing, Deep Learning, LSTM, CNVs