An Extraction Method for Glioma Protein-Protein Interactions Based on Text Mining

Posted on: 2019-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: S W Li
Full Text: PDF
GTID: 2404330596462519
Subject: Engineering
Abstract/Summary:
Gliomas account for 40-50% of intracranial tumors. Using molecular genetics to explore the pathogenesis of glioma, and developing clinical targeted therapies for each glioma subtype, have become active research topics. Information technologies such as artificial intelligence and natural language processing have developed rapidly, and the volume of published biomedical literature has grown explosively. The need to uncover relationships among biomolecules has driven a deep integration of biomedicine and computer technology. From named entity recognition and interaction extraction to biological event extraction, molecular biology research based on text mining and information extraction has advanced rapidly. This thesis uses unstructured biomedical literature as its data source to study the key techniques of Named Entity Recognition (NER) and Protein-Protein Interaction Extraction (PPIE), with the aim of helping to uncover disease pathogenesis by extracting useful structured biomedical information. The main work of this thesis is as follows:

(1) We use a Conditional Random Field (CRF) model for protein NER. First, we perform word segmentation, part-of-speech tagging, and chunk analysis on the text. From the processed text we extract a rich feature set: word, part-of-speech, chunk, affix, morphological, keyword, stop-word, and spelling features. A sequential forward selection algorithm is then used for feature selection to construct the CRF feature model. This method achieved an overall F-score of 71.46% on the manually annotated JNLPBA 2004 Genia4ER standard corpus.

(2) Building on the protein named entities identified by the CRF model, we use word2vec, dependency parsing, and an SVM model for protein interaction extraction. A dependency parser is used to construct a set of semantic structure features, and word2vec is used to construct relation vectors. The semantic structure features and relation vectors are then fed into the SVM classifier for protein interaction extraction. Experiments show that these features improve the results of the SVM classifier and significantly improve system performance.

(3) We retrieve text data on glioma proteins from the PubMed database through the E-utilities interface. Taking NER and PPIE for glioma proteins as an example, we describe the practical application of CRF-based protein NER in biomedicine, and discuss the practical application of PPIE based on dependency parsing and an SVM model.
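
To make the feature families named in item (1) concrete, the following is a minimal Python sketch of per-token feature extraction for a CRF tagger. It is an illustration under stated assumptions, not the thesis's implementation: the function names (`word_shape`, `token_features`, `sentence_features`), the tiny `STOP_WORDS` and `KEYWORDS` lists, and the exact feature keys are all hypothetical, and the part-of-speech and chunk tags are assumed to come from an upstream tagger.

```python
import re

# Hypothetical lists for illustration; the thesis does not publish its own.
STOP_WORDS = {"the", "of", "and", "in", "to", "a", "with"}
KEYWORDS = {"protein", "receptor", "kinase", "factor", "gene"}


def word_shape(token):
    """Collapse a token into a coarse spelling shape, e.g. 'EGFR' -> 'A'."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[0-9]", "0", shape)
    # Squeeze runs of the same symbol so long tokens share a shape.
    return re.sub(r"(.)\1+", r"\1", shape)


def token_features(tokens, pos_tags, chunk_tags, i):
    """Build a feature dict for the token at position i."""
    token = tokens[i]
    lower = token.lower()
    features = {
        "word": lower,                       # word feature
        "pos": pos_tags[i],                  # part-of-speech feature
        "chunk": chunk_tags[i],              # chunk feature
        "prefix3": lower[:3],                # affix features
        "suffix3": lower[-3:],
        "shape": word_shape(token),          # spelling feature
        "has_digit": any(c.isdigit() for c in token),  # morphological features
        "has_hyphen": "-" in token,
        "is_keyword": lower in KEYWORDS,     # keyword feature
        "is_stop_word": lower in STOP_WORDS, # stop-word feature
    }
    # Neighbouring words give the CRF local context.
    if i > 0:
        features["prev_word"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        features["next_word"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(tokens, pos_tags, chunk_tags):
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, pos_tags, chunk_tags, i)
            for i in range(len(tokens))]


if __name__ == "__main__":
    tokens = ["EGFR", "binds", "the", "p53", "protein"]
    pos_tags = ["NN", "VBZ", "DT", "NN", "NN"]
    chunk_tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP"]
    for feats in sentence_features(tokens, pos_tags, chunk_tags):
        print(feats)
```

Lists of per-token dictionaries like these are the input format accepted by some CRF toolkits, such as sklearn-crfsuite; the abstract does not say which CRF implementation was used. A feature selection step, as described in item (1), would then decide which of these keys to keep.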
Keywords/Search Tags: Glioma, Text mining, Named Entity Recognition, Protein-Protein interaction extraction, Conditional Random Field, Support Vector Machine, Word2vec