
Gene Co-expression Network Analysis Based On Fuzzy Clustering

Posted on: 2019-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Zhu
Full Text: PDF
GTID: 2370330596466404
Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of gene sequencing technology has made gene expression data easier to acquire. With massive data, studying the expression pattern of a single gene shows clear limitations. It is therefore necessary to use these data to construct a genome-wide co-expression network, and to study gene expression patterns and gene interactions from the perspective of systems biology. By analyzing the gene co-expression network, the features of co-expression modules can be identified, the functions of unknown genes can be predicted, and the underlying mechanisms of organism function can be revealed. Many genes are known to be pleiotropic, i.e. a single gene can be involved in the regulation of several biological functions and thus affect more than one trait. For this reason, modules in a gene co-expression network are expected, in theory, to overlap. However, most traditional clustering methods ignore this feature of gene co-expression networks. A few methods take it into account but pay little attention to how accurately the overlapping structure is recovered. As a result, current clustering methods perform poorly on gene co-expression networks. This thesis proposes a new fuzzy clustering algorithm that addresses these problems. The main contributions of the thesis are as follows:
(1) A fuzzy clustering model based on network density is proposed. The overlapping structures in a network are identified by a fuzzy clustering based density model (abbreviated as the FCBD model). Since no benchmark exists for gene co-expression networks, another biological network with overlapping structures, the protein-protein interaction network, is used to evaluate the accuracy of the model. The results show that the FCBD model achieves an F-score of 0.46, the highest among the five methods compared in the thesis. This is about a 10% improvement over CFinder, which ranks second with an F-score of 0.42.
(2) Building on the above model, an improved fuzzy clustering algorithm for weighted overlapping networks is proposed (Weighted Fuzzy Clustering Based Density, abbreviated as WFCBD). A gene co-expression network is known to be a weighted, scale-free network, and the FCBD model is adapted to these characteristics to obtain WFCBD. The algorithm is compared with other algorithms on five different datasets using the modularity index (EQ). The results show that WFCBD is stable and reliable across the datasets.
(3) Finally, a soybean gene co-expression network is constructed and the practical performance of WFCBD is evaluated on it. The network is built from 531 samples and contains 33,238 genes. After clustering, the 77 resulting clusters (modules) are given functional annotations and biological interpretations. The pleiotropic genes in the overlapping regions are identified, and the functions of the unknown genes among them are predicted. Compared with other algorithms, the clustering results of WFCBD are more biologically meaningful.
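The EQ index used in (2) is not defined in this abstract. As a minimal sketch, assuming it refers to the extended modularity commonly used for overlapping communities, where each node pair's contribution is divided by the number of communities each node belongs to, it could be computed on a weighted, undirected network roughly as follows. The function name `extended_modularity` and the input layout are hypothetical and are not taken from the thesis.

```python
from itertools import product


def extended_modularity(edges, communities):
    """Extended modularity (EQ) for overlapping communities.

    edges: dict mapping (u, v) to a positive weight; each undirected
           edge is listed once and self-loops are assumed absent.
    communities: list of sets of nodes; a node may appear in several sets.
    """
    # Symmetric weight lookup and weighted degree (strength) of each node.
    weight = {}
    strength = {}
    for (u, v), w in edges.items():
        weight[(u, v)] = w
        weight[(v, u)] = w
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
    two_m = sum(strength.values())  # twice the total edge weight

    # Number of communities each node belongs to.
    membership = {}
    for community in communities:
        for node in community:
            membership[node] = membership.get(node, 0) + 1

    eq = 0.0
    for community in communities:
        # Sum over ordered node pairs inside the community.
        for i, j in product(community, repeat=2):
            a_ij = weight.get((i, j), 0.0)
            expected = strength.get(i, 0.0) * strength.get(j, 0.0) / two_m
            eq += (a_ij - expected) / (membership[i] * membership[j])
    return eq / two_m


# Two triangles sharing node 2, which therefore belongs to both modules.
toy_edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
             (2, 3): 1.0, (2, 4): 1.0, (3, 4): 1.0}
toy_modules = [{0, 1, 2}, {2, 3, 4}]
toy_eq = extended_modularity(toy_edges, toy_modules)
```

With non-overlapping communities every membership count is 1 and the expression reduces to ordinary weighted modularity, which is why a single index can be used to compare overlapping and non-overlapping clustering results.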
Keywords/Search Tags: Gene expression data, Gene co-expression network, Fuzzy clustering