Research On Disease Gene Mining Algorithm Based On Data Fusion

Posted on: 2021-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: D G Liu
GTID: 2404330614961430
Subject: Computer technology
Abstract/Summary:
Selecting disease-related genes from a large pool of genes on the basis of gene expression data is of great significance for understanding the occurrence and development of diseases and for promoting their diagnosis and treatment. At present, most research identifies disease genes from differences in gene expression, which makes it difficult to find disease-related genes whose differential expression is small. Functional data obtained through a differential network can better reflect the level of gene function change in a single sample and can reveal disease-related genes that are not differentially expressed. To search for disease-related genes more fully, this thesis designs disease gene selection methods that combine gene expression data and functional data, for balanced and imbalanced gene expression data respectively. These methods can discover not only genes with large expression differences but also genes whose expression differences are small but whose function is altered. The main research work of this thesis is as follows:

(1) A disease gene mining algorithm that combines gene expression data and functional data (GFDGM) is proposed. Traditional differential analysis methods based on gene expression data have difficulty selecting genes that show little expression difference but are functionally related to disease. To address this problem, the algorithm proceeds in three steps. First, a sample-specific network method is used to construct a differential network that reflects functional changes. Second, functional data are derived by quantifying the degree of functional change of each gene based on the differential network, and the gene expression data and functional data are fused. Finally, to ensure low redundancy and high relevance of the genes in the fused data, a non-dominated sorting gene selection method based on mutual information is designed to mine disease-related genes. The experimental results show that the gene subsets mined by the gene selection algorithm from the fused data significantly improve disease diagnosis compared with those mined from the original data, and that these subsets contain disease-related genes with small expression differences.

(2) A disease gene mining algorithm for imbalanced fusion data (IFDGS) is proposed. When imbalanced data are fused, the data fusion method of (1) captures too little functional information from the minority-class samples, which makes it harder for the gene selection algorithm to fully search for disease-related genes when samples are unevenly distributed between classes. Therefore, this thesis proposes a disease gene mining algorithm that fuses imbalanced gene expression data and functional data. First, the algorithm uses the classic oversampling method SMOTE to oversample the minority-class samples, converting the imbalanced data into balanced data. Second, the data fusion method of (1) is applied to the balanced data. Finally, the algorithm takes the classification error rate and the number of features as objective functions and uses a multi-objective evolutionary gene selection algorithm to search for disease genes. To maintain population diversity and ensure that the gene subsets are related to the disease, IFDGS includes a dedicated initialization strategy and a population update strategy. In comparisons on four imbalanced data sets against four classic feature selection algorithms for imbalanced data, the experimental results show that IFDGS has a certain competitive advantage in diagnosis over the compared algorithms.
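The two objectives named in (2), classification error rate and number of features, imply that candidate gene subsets are compared by Pareto dominance. The sketch below illustrates only that general idea and is not the thesis's IFDGS implementation: the function names `dominates` and `non_dominated_front` and the scores in `subsets` are hypothetical, chosen for illustration.

```python
from typing import List, Tuple

# (classification error rate, number of selected genes); both are minimized.
Objectives = Tuple[float, int]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated_front(candidates: List[Objectives]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    front = []
    for i, a in enumerate(candidates):
        dominated = any(
            dominates(b, a) for j, b in enumerate(candidates) if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical gene subsets scored as (error rate, subset size); illustrative values only.
subsets = [(0.10, 30), (0.12, 12), (0.10, 25), (0.20, 5), (0.15, 12)]
print(non_dominated_front(subsets))
```

A multi-objective evolutionary search would typically keep such a non-dominated set in each generation, so that small subsets with slightly higher error and larger subsets with lower error both survive as trade-off solutions.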
Keywords/Search Tags:Expression data, Functional data, Imbalanced data, Fusion data, Multi-objective evolutionary optimization