
Research On Algorithms For Analyzing Medical Network Modules Based On Multiple Biological Sources

Posted on: 2021-03-22, Degree: Master, Type: Thesis
Country: China, Candidate: Y T Man, Full Text: PDF
GTID: 2370330614971374, Subject: Computer Science and Technology
Abstract/Summary:
A large amount of medical network data has been generated since the development of high-throughput experimental technology, and the collected multi-dimensional network data provide a platform for studying cell function. Detecting molecular functional modules in these networks helps to explore the pathogenesis of diseases and the underlying mechanisms of cell operation, which is of great significance to medical research. However, medical networks share some common problems, such as sparse links among nodes and noisy interactions. Fusing multi-source biological data maximizes the valuable information obtained from multiple networks and can thereby assist clinical practice. Therefore, this thesis introduces module identification algorithms and uses multi-view clustering to select among different views of medical networks in order to detect functional modules; molecular function is then predicted by means of module analysis. The main research contents of this thesis are as follows.

First, protein-protein interaction networks are known to be sparse and to contain noisy interactions (e.g. false positives), so topological modules of densely linked proteins are not obvious. This thesis therefore explores the similarity of every protein pair in the network in depth, using subspace learning to learn a linear self-expression of the proteins in the protein interaction network. By fusing manually curated protein complex data, a semi-supervised method guides the protein functional module detection model to learn a more accurate protein-module membership matrix. On this basis, the SNFM algorithm is proposed. Compared with traditional unsupervised clustering algorithms (such as k-means and NMF), its ACC and MMR reached 0.56 and 0.42 on the DIP network, and its F1 score was 12% higher than that of the NMF algorithm. In the case analysis, we selected five typical modules, including protein functional modules with a bipartite structure. The reliability of the SNFM algorithm for protein functional module detection was verified through pathway enrichment analysis and GO enrichment analysis.

Second, to detect protein functional modules by integrating data from multiple views that compensate for the shortcomings of the protein interaction network, the first step is to select among the collected biological views; the degree to which the gene-disease and gene ontology views match the manually curated protein complex data is then computed. Considering the sparse and low-rank characteristics of the protein interaction network, the feature matrix is obtained by fusing the gene ontology view using the MLRSSC algorithm. At the same time, the prior information is re-expressed through the common-neighbor relationships among proteins in the network, and non-negative matrix factorization is then used to cluster the proteins. The results show that the proposed algorithm, SLRSSC, is superior to the SNFM algorithm. In particular, on the BIOGRID database, the F1 score of SLRSSC is 18% higher than that of the multi-view clustering algorithm MVCC. Across the three aspects covered by gene ontology, SLRSSC also obtains a higher proportion of enriched protein complexes at different critical values.

Third, because existing disease classification systems lack sensitivity and specificity and have blurred boundaries, they are difficult to apply in assisting doctors with the early treatment of special diseases. Considering that disease attribute data from multiple views contain more valuable information, this thesis proposes a method that fuses multiple disease attribute views to mine more effective information and address this problem. The proposed model adopts a step-by-step view combination strategy, from few views to many, and uses the MLRSSC and MVCC multi-view clustering methods to greedily select the optimal combination of views. Experiments over the view combinations show that the best disease classification results are obtained after fusing the symptom, protein module and biological process views. To verify the effectiveness and superiority of the method, we compare the resulting classification with existing disease classification systems in terms of similarity based on shared genes, protein modules, phenotypes, GO slim, etc. The new disease classification results are significantly better than the ICD system, and when the similarity is 0.5, the GO term modularity value is 0.2 higher than the NCD result of the benchmark method.
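As a rough illustration of the first contribution, the sketch below chains the two generic ingredients the abstract names: a linear self-expression of the proteins, followed by non-negative matrix factorization to obtain a protein-module membership matrix. It is not the SNFM algorithm. The semi-supervised term from curated complexes is omitted, a plain ridge penalty stands in for whatever regularizer the thesis uses, and the function names (`self_expression`, `affinity`, `nmf_modules`), the parameters `lam` and `k`, and the toy network are all assumptions made for this example.

```python
import numpy as np


def self_expression(X, lam=0.1):
    """Solve min_C ||X - C X||_F^2 + lam * ||C||_F^2 in closed form.

    X holds one row per protein (here: its row of the interaction matrix).
    Assumption: a plain ridge penalty; no sparse, low-rank or zero-diagonal
    constraint is imposed.
    """
    G = X @ X.T                                      # Gram matrix between proteins
    n = G.shape[0]
    return np.linalg.solve(G + lam * np.eye(n), G)   # C = (G + lam I)^-1 G


def affinity(C):
    """Build a symmetric, non-negative similarity matrix from the coefficients."""
    A = np.abs(C)
    return 0.5 * (A + A.T)


def nmf_modules(W, k, n_iter=500, seed=0, eps=1e-9):
    """Factorize W ~ H @ F with multiplicative updates.

    H (n x k) plays the role of a protein-module membership matrix; each
    protein is assigned to the module with the largest membership weight.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    H = rng.random((n, k)) + eps
    F = rng.random((k, n)) + eps
    for _ in range(n_iter):
        F *= (H.T @ W) / (H.T @ H @ F + eps)
        H *= (W @ F.T) / (H @ F @ F.T + eps)
    return H, H.argmax(axis=1)


if __name__ == "__main__":
    # Hypothetical toy network: two fully connected groups of four proteins.
    block = np.ones((4, 4))
    X = np.block([[block, np.zeros((4, 4))], [np.zeros((4, 4)), block]])
    H, labels = nmf_modules(affinity(self_expression(X)), k=2)
    print(labels)
```

On real interaction data the self-expression step is where sparsity and noise would be handled, so the choice of regularizer and the way prior complex information enters the factorization matter far more than they do in this toy setting.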
Keywords/Search Tags:Functional module, Semi-supervised, Subspace learning, Multi-view clustering, Disease classification