
Research On The Algorithms Of Detecting Protein Complexes And Biomarkers Based On Neural Network

Posted on: 2021-04-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y D Dong
GTID: 1360330614472309
Subject: Computer Science and Technology
Abstract/Summary:
In the 21st century, the development of biological sequencing technologies has generated ever more biological data, including genomic, transcriptomic, and proteomic data. Using machine learning methods to mine valuable information from these massive omics data will advance the understanding, diagnosis, and treatment of diseases. Cells are the basic units of an organism, and their functions are carried out by different biological molecules, including proteins, RNA, and DNA. An abnormality in any of these molecules may cause cell dysfunction and disease.

Proteins are a class of organic macromolecules, and the interactions between them constitute a protein-protein interaction (PPI) network. Studying PPI networks not only promotes a systematic understanding of biological processes and reveals disease mechanisms, but also plays a positive role in new drug research and in the diagnosis and treatment of diseases. miRNAs are a class of non-coding RNA molecules about 22 nucleotides long, and many miRNAs are closely related to human diseases. Known miRNA-disease relationships form a miRNA-disease relationship network; using this network to identify new disease-related miRNAs can effectively help predict therapeutic targets and assist in diagnosis. Genes are DNA fragments with genetic effects. Many diseases are closely related to abnormal gene expression, and many, especially cancers, are caused by gene mutations. Using patients' gene expression data to find abnormally expressed genes and effective prognostic markers has important guiding significance for the early diagnosis and treatment of diseases.

This dissertation focuses on studying protein, miRNA, and gene-related omics data with neural network models. For the first time, we propose a protein complex detection algorithm that combines a supervised model with local structural information, a disease-miRNA relationship prediction algorithm based on edge perturbation, and an algorithm for identifying prognostic markers of melanoma. The main contributions are as follows:

(1) Predicting protein complexes with a supervised learning method combined with local structural information. Traditional unsupervised algorithms for detecting protein functional modules often assume that protein complexes exist only in high-density regions of the PPI network. However, some real complexes lie in low-density regions, and in recent years more and more researchers have turned to supervised protein complex detection algorithms. Because PPI networks contain a considerable amount of noise and many known complexes are incomplete, we designed a scoring function that combines a supervised model with the local structural information of the current module, together with a search strategy that moves both forward and backward to guide the search for complexes. Compared with existing supervised and unsupervised complex detection algorithms, the proposed method achieved better performance and robustness.

(2) Predicting miRNA-disease associations with an edge perturbation-based method. Extracting useful features from the miRNA-disease interaction network is one of the most critical steps in miRNA-disease association prediction. Unlike previous work, we designed a feature extraction method based on edge perturbation. Since adding or deleting an edge affects the overall structure of a graph, we treat the influence of each edge on the overall structure as features that measure the importance of that edge. The extracted features are used to train a multi-layer perceptron model that predicts candidate disease-miRNA associations. In case studies of three diseases, 42, 46, and 41 of the top 50 predicted miRNAs, respectively, were confirmed by published experimental findings. In addition, analysis of TCGA kidney cancer miRNA expression data shows that two newly predicted miRNAs (hsa-mir-96 and hsa-mir-221) can be used directly as biomarkers to distinguish cancer samples from normal samples.

(3) Identifying prognostic biomarkers with autoencoders. Melanoma is a cancer with a poor prognosis, and it is closely related to the complex immune system. Traditional prognostic markers for melanoma are often designed with statistical methods or simple linear regression models that have poor predictive power. We therefore designed more predictive biomarkers. First, we divided genes into two groups based on their correlation with lymphocytes and with tumor cells, respectively. Second, we trained two multi-layer autoencoder models to compress the gene expression data of the two groups. Finally, we designed two prognostic markers, S_H and S_L, based on the compressed features. Experimental analysis shows that S_H is associated with immune cytotoxicity and S_L with MYC pathway activity. Both markers showed significant prognostic ability when validated on an independent melanoma patient dataset, and S_H can also be used to predict the prognosis of stage III patients. Combining S_H and S_L with clinical information can significantly improve prognosis prediction. The two markers provide a practical way to measure the prognosis of melanoma patients and can be used to improve the therapeutic efficacy of melanoma treatment.
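
The edge-perturbation idea in contribution (2) can be illustrated with a minimal Python sketch. The abstract does not say which structural measures the dissertation uses, so everything concrete here is an assumption made for illustration: the function names `structure_stats` and `edge_perturbation_features`, the toy graph, and the three statistics (number of connected components, sum of squared degrees, number of 4-cycles). The sketch toggles one miRNA-disease pair and records how much each statistic changes.

```python
from itertools import combinations


def structure_stats(mirnas, diseases, edges):
    """Return three global statistics of a bipartite miRNA-disease graph.

    The choice of statistics is an illustrative assumption, not the
    dissertation's actual feature set.
    """
    # Prefix node names so a miRNA and a disease can never collide.
    nbrs = {("m", m): set() for m in mirnas}
    nbrs.update({("d", d): set() for d in diseases})
    for m, d in edges:
        nbrs[("m", m)].add(("d", d))
        nbrs[("d", d)].add(("m", m))

    # 1) Number of connected components (iterative depth-first search).
    seen, components = set(), 0
    for start in nbrs:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in nbrs[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)

    # 2) Sum of squared degrees, a simple measure of degree heterogeneity.
    sq_deg = sum(len(v) ** 2 for v in nbrs.values())

    # 3) Number of 4-cycles: two miRNAs that share two diseases.
    cycles = 0
    for a, b in combinations(mirnas, 2):
        shared = len(nbrs[("m", a)] & nbrs[("m", b)])
        cycles += shared * (shared - 1) // 2

    return (components, sq_deg, cycles)


def edge_perturbation_features(mirnas, diseases, edges, pair):
    """Toggle one (miRNA, disease) pair and return the change in each statistic."""
    edges = set(edges)
    perturbed = edges ^ {pair}  # add the edge if absent, delete it if present
    before = structure_stats(mirnas, diseases, edges)
    after = structure_stats(mirnas, diseases, perturbed)
    return tuple(a - b for a, b in zip(after, before))


# Hypothetical toy network: three miRNAs, two diseases, three known associations.
mirnas = ["m1", "m2", "m3"]
diseases = ["d1", "d2"]
edges = {("m1", "d1"), ("m1", "d2"), ("m2", "d1")}

print(edge_perturbation_features(mirnas, diseases, edges, ("m2", "d2")))  # (0, 6, 1)
print(edge_perturbation_features(mirnas, diseases, edges, ("m3", "d2")))  # (-1, 4, 0)
print(edge_perturbation_features(mirnas, diseases, edges, ("m1", "d1")))  # (1, -6, 0)
```

In a pipeline like the one the abstract describes, such per-pair vectors would serve as inputs to a classifier such as a multi-layer perceptron, with known associations as positive examples. Recomputing global statistics for every candidate pair is expensive on a real network, so this sketch is only practical for small graphs.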
Keywords/Search Tags:Cancer, protein, protein-protein interaction network, complex detection, miRNA, gene expression data, biomarker
Related items