
Recognition And Functional Prediction Of MiRNA And Phylogenetic Analysis Of HIV Based On Network And Nonlinear Method

Posted on: 2020-07-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y L Ma
GTID: 1364330602959617
Subject: Statistics
Abstract/Summary:
MicroRNA (abbreviated miRNA) is a recently discovered class of non-coding RNA, about 22 nucleotides long, that plays a key regulatory role in many important life processes. Expectations are high for its use in disease diagnosis and treatment, and miRNA research is one of the most active directions in the life sciences. HIV-1 is the most common pathogenic type of HIV and has a very high fatality rate. It can evolve into many closely related variants in a short time, and these variants show different infectivity and evolutionary dynamics. Accurate identification of HIV-1 subtypes is therefore a prerequisite for developing effective vaccines. This dissertation studies the recognition of precursor microRNA (abbreviated pre-miRNA), the prediction of disease-related miRNA, and the classification of HIV-1 subtypes. It consists of the following three parts:

(1) A high-accuracy computational method for pre-miRNA recognition is studied.

Distinguishing pre-miRNAs from pseudo pre-miRNAs of similar length can reveal more about the regulatory mechanisms of RNA biological processes. Machine learning techniques have been widely applied to this challenging problem. However, most of them focus on the secondary structure information of pre-miRNAs and ignore sequence-order information and sequence evolution information. In this study, we first extract features from PSI-BLAST profiles and from the Hilbert-Huang transform, which contain rich sequence evolution information and sequence-order information, respectively. We then obtain properties of small molecular networks of pre-miRNAs, which contain refined secondary structure information. These structural features are carefully generated so that they depict both global and local characteristics of pre-miRNAs. In total, our feature space covers 591 features. The maximum relevance and minimum redundancy (mRMR) feature selection method is applied before a support vector machine (SVM) is used as the classifier. The resulting classification model is named MicroRNA-NHPred. The performance of MicroRNA-NHPred is high and stable and exceeds that of state-of-the-art methods, with an accuracy of up to 94.83% on the same benchmark datasets. We also tested the model on an independent data set, and the results show that MicroRNA-NHPred outperforms the two best existing predictors in recognizing pre-miRNA.

(2) A computational method for miRNA-disease association prediction based on a cascade combination recommendation method is proposed.

Many experimental studies have shown that changes in and dysregulation of miRNA may lead to many complex diseases, especially cancer. Predicting potential miRNA-disease associations helps both to explore the pathogenesis of diseases and to understand the underlying biological processes. However, verifying associations between miRNAs and diseases through biomedical experiments is expensive and time-consuming. Researchers have built multiple databases to store useful data about miRNA and, based on these data, have designed many effective computational methods to reveal disease-related miRNA.

In this study, we introduce a new computational model, a prediction model on a heterogeneous network, to identify potential miRNA-disease associations. First, we use data from the HMDD v2.0 database to generate a rough recommendation result with a hybrid recommendation algorithm that combines heat conduction and mass diffusion, which gives an approximate probability score for each miRNA-disease pair. Then, by integrating other data sources, we construct a heterogeneous network that consists of a disease similarity network, a miRNA similarity network, and a miRNA-disease association network. The miRNA-disease association bipartite network is constructed from the probability scores obtained in the first step. The disease similarity network is constructed from disease function information. The miRNA similarity network is constructed from the following biological information: miRNA family information, miRNA cluster information, experimentally validated miRNA-target associations, and disease-miRNA interaction information. A structural perturbation method is then applied to the heterogeneous network to predict potential associations between miRNAs and diseases. The proposed method, CCRMDA, fully considers network structure and information transmission. We compared the prediction results for 15 diseases across different methods; the average AUC of CCRMDA is higher than that of several existing methods, which shows that CCRMDA can serve as an effective computational method for improving the prediction accuracy of disease-related miRNA. In addition, we conducted case studies of three important human cancers: 90% (breast cancer), 96% (liver cancer), and 88% (lung cancer) of the top 50 predicted miRNAs were confirmed by the latest data and literature, indicating that CCRMDA has reliable predictive ability.

(3) A position-weighted k-mers method for HIV-1 subtype classification is proposed.

HIV-1 can rapidly evolve into many closely related variants in a short period of time, showing different infectivity and evolutionary dynamics. To develop an effective HIV vaccine rapidly, it is first necessary to describe the evolutionary relationships of HIV quickly and accurately. In this study, we propose a new and effective alignment-free method for phylogenetic analysis of HIV-1 viruses using complete genome sequences. Based on position-weighted k-mers, we first convert a complete genome sequence into k-mer position distribution vectors. We then define a frequency vector based on these k-mer position distribution vectors and propose a metric to determine the optimal value of k. Finally, for the optimal k, we use the Manhattan distance between frequency vectors to detect the phylogenetic relationships among complete genome sequences of different subtypes of viruses. We name our method the Position-Weighted k-mers (PWkmer) method. Validation and comparison with the Robinson-Foulds distance method and the modified bootstrap method on a benchmark data set show that PWkmer is reliable for phylogenetic analysis of HIV-1 viruses. PWkmer can resolve within-group variations among the known subtypes of Group M of HIV-1. It is simple and computationally fast for whole-genome phylogenetic analysis.
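
The pipeline described in part (3) (k-mer positions, a frequency vector, then Manhattan distance) can be illustrated with a short Python sketch. This is not the dissertation's implementation: the abstract does not give the exact position weighting, so the weight used here (the sum of each k-mer's normalized 1-based positions) and the names `pw_kmer_vector`, `manhattan`, and `distance_matrix` are assumptions chosen only for illustration. The metric for choosing the optimal k is not defined in the abstract and is left out.

```python
from itertools import product


def pw_kmer_vector(seq, k, alphabet="ACGT"):
    """Hypothetical position-weighted k-mer frequency vector.

    Assumption: each occurrence of a k-mer contributes its 1-based start
    position divided by the number of windows; the abstract does not
    specify the actual weighting used by PWkmer.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    weights = [0.0] * len(kmers)
    n_windows = len(seq) - k + 1
    for pos in range(n_windows):
        kmer = seq[pos:pos + k]
        if kmer in index:  # skip windows with ambiguous bases such as N
            weights[index[kmer]] += (pos + 1) / n_windows
    total = sum(weights)
    # Normalize so that the vector sums to 1 (left as zeros if empty).
    return [w / total for w in weights] if total else weights


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def distance_matrix(seqs, k):
    """Pairwise Manhattan distances between the sequences' vectors."""
    vecs = [pw_kmer_vector(s, k) for s in seqs]
    return [[manhattan(a, b) for b in vecs] for a in vecs]


if __name__ == "__main__":
    # Toy sequences, not real HIV-1 genomes.
    toy = ["ACGTACGTAC", "ACGTTCGTAC", "TTTTGGGGCC"]
    for row in distance_matrix(toy, k=2):
        print(["%.3f" % d for d in row])
```

Under this assumed weighting, two sequences with the same k-mer counts but a different k-mer order (for example `AC` and `CA` with k = 1) are a nonzero distance apart, which plain k-mer counting would not detect. The resulting distance matrix could then be passed to any distance-based tree-building method; the abstract does not state which one the author used.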
Keywords/Search Tags: Pre-miRNA, Secondary structure, Hilbert-Huang transform, Support Vector Machine, PSI-BLAST profile, Disease-miRNA association, Hybrid recommendation algorithm, Structural perturbation algorithm, HIV-1, Position-weighted k-mers