Metabolite-disease Association Prediction Based On Biological Networks

Posted on: 2022-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: J J Tie
GTID: 2510306344951459
Subject: Biomedicine Engineering
Abstract/Summary:
Metabolites play an important role in the maintenance, growth, and reproduction of organisms, and metabolite levels directly reflect the physiological state of the human body. With the development of high-throughput technologies, a large volume of metabolite data has been generated, and it is important to identify the changes in human metabolites that cause abnormalities or diseases. Many studies have shown that the onset of disease is accompanied by changes in metabolites, so studying the associations between metabolites and diseases helps to better understand those diseases. Identifying disease-associated metabolites with traditional biological techniques is costly and laborious. This thesis proposes computational methods for predicting metabolite-disease associations, based on known associations combined with data resources related to metabolites and diseases. The predictions can guide subsequent biological experiments: the metabolites ranked as most relevant to a particular disease can be verified experimentally, which furthers understanding of the disease's pathogenesis. The research content of the thesis is as follows:

(1) A computational method based on bi-random walk (MDBIRW) is proposed for metabolite-disease association prediction. MDBIRW is motivated by the observation that an ordinary random walk cannot make full use of the information in both the metabolite network and the disease network. Bi-random walk is therefore used to predict disease-related metabolites by walking on the metabolite network and the disease network simultaneously. MDBIRW first integrates metabolite functional similarity with the Gaussian kernel similarity of metabolites to obtain the final metabolite similarity, and integrates disease semantic similarity with the Gaussian kernel similarity of diseases to obtain the final disease similarity. Then, a bipartite heterogeneous network is constructed from the metabolite similarity network, the known metabolite-disease association network, and the disease similarity network. Finally, bi-random walk is applied to the heterogeneous network to predict disease-related metabolites. Performance is evaluated with leave-one-out cross-validation (LOOCV) and 5-fold cross-validation (5-fold CV). The experimental results show that MDBIRW performs well in predicting disease-related metabolites.

(2) A non-negative matrix factorization method based on relation completion (RCNMF) is proposed to predict metabolite-disease associations. The chemical structures of metabolites are recorded in the HMDB database. Each structure is converted into a binary fingerprint sequence, and the molecular fingerprint similarity of any two metabolites is calculated from these fingerprints. Computing metabolite similarity from chemical structure avoids the inaccuracy that arises when similarity is derived from the few known metabolite-disease associations and the resulting very sparse association matrix. In addition, the sparse association matrix is completed with the WKNKN method, using the biological similarity of metabolites and diseases. Finally, a non-negative matrix factorization algorithm is applied to predict potential metabolite-disease associations. Both cross-validation and a case study indicate that RCNMF is an effective tool for predicting metabolite-disease associations and can serve as a basis for experimental verification.

(3) A method based on DeepWalk and random forest (DWRF) is proposed to predict metabolite-disease associations. First, metabolite-gene associations are introduced, and DeepWalk is used to extract metabolite features from the metabolite-gene network. Then, the molecular fingerprint similarity of metabolites and the semantic similarity of diseases are calculated from biological information. The feature vector of each metabolite-disease pair is constructed by concatenating the features extracted from the metabolite-gene network, the disease semantic similarity, and the molecular fingerprint similarity of metabolites. Finally, the feature vectors are fed into a random forest classifier to predict potential metabolite-disease associations. Comparisons with other methods and a case study show that DWRF can effectively explore the associations between metabolites and diseases.

(4) A method based on graph convolutional networks (MDAGCN) is proposed to infer potential metabolite-disease associations. First, three kinds of metabolite similarity and three kinds of disease similarity are calculated. The final metabolite similarity and the final disease similarity are obtained by integrating the three kinds of similarity for each entity type and filtering out noisy similarity values. Then, a heterogeneous network is constructed from the metabolite similarity network, the disease similarity network, and the known metabolite-disease association network. Finally, this information-rich heterogeneous network is fed into a graph convolutional network. By aggregating node information, new node features are obtained and used to infer potential associations between metabolites and diseases. The experimental results show that MDAGCN achieves reliable results.
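The bi-random walk step in (1) can be illustrated with a small sketch. The abstract does not give the update rule, so the code below assumes the commonly used formulation: both similarity matrices are symmetrically normalized, and the score matrix is updated by a left walk on the metabolite network and a right walk on the disease network, then averaged. The function names, the decay factor `alpha`, the step limits, and the toy matrices are all illustrative assumptions, not values from the thesis.

```python
import numpy as np

def laplacian_normalize(S):
    """Symmetric normalization D^{-1/2} S D^{-1/2} of a similarity matrix."""
    d = S.sum(axis=1)
    inv_sqrt = np.zeros_like(d, dtype=float)
    nonzero = d > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(d[nonzero])
    return S * inv_sqrt[:, None] * inv_sqrt[None, :]

def bi_random_walk(A, S_m, S_d, alpha=0.3, left_steps=2, right_steps=2):
    """Assumed bi-random walk: A is metabolites x diseases (known associations),
    S_m and S_d are the metabolite and disease similarity matrices."""
    A = np.asarray(A, dtype=float)
    M = laplacian_normalize(np.asarray(S_m, dtype=float))
    D = laplacian_normalize(np.asarray(S_d, dtype=float))
    R = A.copy()
    for t in range(1, max(left_steps, right_steps) + 1):
        walks = []
        if t <= left_steps:   # walk on the metabolite network
            walks.append(alpha * M @ R + (1 - alpha) * A)
        if t <= right_steps:  # walk on the disease network
            walks.append(alpha * R @ D + (1 - alpha) * A)
        R = sum(walks) / len(walks)
    return R

# Toy data (hypothetical): 3 metabolites, 2 diseases.
A = np.array([[1, 0], [0, 0], [0, 1]])
S_m = np.array([[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]])
S_d = np.array([[1.0, 0.2], [0.2, 1.0]])
scores = bi_random_walk(A, S_m, S_d)
```

In this toy example the second metabolite has no known association, but it is highly similar to the first metabolite, so its score for the first disease ends up higher than its score for the second.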
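The fingerprint-based similarity used in (2) and (3) can be sketched as follows. The abstract does not name the similarity coefficient, so this example assumes the Tanimoto (Jaccard) coefficient, a common choice for binary fingerprints. The fingerprints and metabolite names are made-up toy values, not HMDB data.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Two all-zero fingerprints share no set bits; treat them as dissimilar.
    return both / either if either else 0.0

def similarity_matrix(fingerprints):
    """Pairwise Tanimoto similarity for a list of fingerprints."""
    n = len(fingerprints)
    return [[tanimoto(fingerprints[i], fingerprints[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical 8-bit fingerprints for three metabolites.
toy_fingerprints = {
    "met_A": [1, 1, 0, 1, 0, 0, 1, 0],
    "met_B": [1, 1, 0, 0, 0, 0, 1, 0],
    "met_C": [0, 0, 1, 0, 1, 1, 0, 0],
}
sim = similarity_matrix(list(toy_fingerprints.values()))
```

Because this similarity depends only on chemical structure, it remains available for metabolites that have few or no known disease associations.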
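The factorization step in (2) can be sketched with plain non-negative matrix factorization. This is a minimal illustration that assumes the standard multiplicative update rules for the Frobenius objective; it omits the WKNKN completion step and any regularization that RCNMF may use, and the rank, iteration count, and toy association matrix are illustrative assumptions.

```python
import numpy as np

def nmf(Y, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix Y (n x m) as W (n x rank) @ H (rank x m)
    with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero.
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        W *= (Y @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy association matrix (hypothetical): 4 metabolites x 3 diseases.
Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 1, 1]], dtype=float)
W, H = nmf(Y, rank=2)
predicted = W @ H  # entries that are zero in Y receive scores that can be ranked
```

In a prediction setting, the unobserved pairs with the largest entries in `predicted` would be the candidate associations to examine first.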
Keywords/Search Tags:Metabolite-disease association, Bi-random walk, Non-negative matrix factorization, DeepWalk, Random walk, Graph convolution networks