| The advancement of DNA microarray technology has made it possible to study diseases at the genetic level. However, the characteristics of microarray data pose challenges for traditional data analysis methods, and constructing an accurate classification model remains an important issue in disease research. First, this thesis summarizes the development and current status of research on disease classification models, grouping strategies, and weighted gene co-expression network analysis (WGCNA), introduces WGCNA and the sparse logistic regression model in detail, and chooses the sparse group Lasso as the base model for this study. Second, a new module division approach (new WGCNA) is proposed to address the shortcomings of WGCNA. Building on WGCNA, it adds a step that assigns each gene to the module closest to it. Experimental results on four microarray datasets and on simulated data show that new WGCNA outperforms WGCNA on several indicators, including the Dunn index and module quality; that is, the added step improves the module division results of WGCNA. Third, to address class imbalance in microarray data and incorrect group weights in the sparse group Lasso model, new approaches for computing sample weights and group weights are proposed, and the MSGL model is built. A block coordinate descent algorithm for solving the MSGL model is also provided. Finally, the module division results of new WGCNA are used for gene grouping, forming the new WGCNA-MSGL model for disease classification. Experimental results on the colitis data show that this model achieves higher prediction accuracy and better robustness with fewer eigengenes than the six other classification models compared. Experimental results on simulated data demonstrate the advantages of the new WGCNA-MSGL model in variable selection. |