| With the development of gene chip technology, gene expression data have attracted growing attention from researchers. Careful analysis of such data can extract key information, and accurate predictions can in turn provide effective guidance for the treatment of diseases. However, gene expression data are typically high-dimensional with few samples. As the dimensionality grows, the "curse of dimensionality" arises, which greatly reduces the computational efficiency and accuracy of existing data mining and machine learning algorithms and poses a challenge for classification learning. For the classification of gene expression data, this paper explores how to extract the small number of genes that are truly related to classification from high-dimensional data containing many redundant genes and much noise, so as to facilitate classification research. Because a single feature selection method achieves only limited dimensionality reduction on gene expression data, this paper proposes a hybrid feature selection algorithm, IGIL-Selection, which combines the information gain method, effective at selecting classification-related genes, with an improved Lasso algorithm that has a strong ability to remove redundant features. The aim is a feature selection method that selects target genes more effectively and applies more widely across different types of gene expression data. A series of experiments on binary and multi-class gene expression datasets shows that IGIL-Selection selects genes better than either the information gain method or the Lasso method alone. In both binary and multi-class classification of gene expression data, IGIL-Selection removes redundant genes stably and efficiently and selects fewer key genes while maintaining good classification accuracy. In summary, for the task of gene expression data
classification, this paper analyzes previous research results and proposes a feature selection method that combines the information gain method with an improved Lasso algorithm. Experiments show that the method selects informative genes stably and well, making it a relatively good dimensionality reduction method. Further research is still needed on the discretization of gene expression values and on the choice of classifiers. |
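
The two-stage idea described in the abstract (an information gain filter followed by a Lasso step that prunes redundant genes) can be illustrated with a minimal sketch. This is not the paper's IGIL-Selection implementation: the abstract does not specify the "improved" Lasso, so plain Lasso solved by coordinate descent is used as a stand-in. The function names, the mean-threshold discretization, the `top_k` and `lam` parameters, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Information gain of one gene after a binary split at its mean.

    The mean threshold is an assumed discretization, not the paper's.
    """
    threshold = sum(column) / len(column)
    low = [y for x, y in zip(column, labels) if x <= threshold]
    high = [y for x, y in zip(column, labels) if x > threshold]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (low, high) if part)
    return entropy(labels) - conditional


def standardize(column):
    """Scale a column to zero mean and unit variance (all zeros if constant)."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / sd if sd > 0 else 0.0 for v in column]


def soft_threshold(value, lam):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def hybrid_select(X, y, top_k, lam):
    """Stage 1: keep the top_k genes by information gain.
    Stage 2: keep only those with a nonzero Lasso weight.
    Numeric class labels are regressed directly, which is a simplification.
    """
    p = len(X[0])
    columns = [[row[j] for row in X] for j in range(p)]
    gains = [information_gain(col, y) for col in columns]
    kept = sorted(range(p), key=lambda j: gains[j], reverse=True)[:top_k]
    z_cols = [standardize(columns[j]) for j in kept]
    Z = [[col[i] for col in z_cols] for i in range(len(X))]
    y_mean = sum(y) / len(y)
    w = lasso_coordinate_descent(Z, [v - y_mean for v in y], lam)
    return [j for j, wj in zip(kept, w) if wj != 0.0]


# Toy data (hypothetical): 8 samples, 4 genes, binary class labels.
genes = [
    [1, 1, 1, 1, 5, 5, 5, 5],      # gene 0: separates the two classes
    [2, 2, 2, 2, 10, 10, 10, 10],  # gene 1: rescaled copy of gene 0 (redundant)
    [1, 5, 1, 5, 1, 5, 1, 5],      # gene 2: unrelated to the class
    [3, 3, 3, 3, 3, 3, 3, 3],      # gene 3: constant
]
X = [list(row) for row in zip(*genes)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected = hybrid_select(X, y, top_k=2, lam=0.125)
print(selected)  # [0]
```

On this toy data the information gain filter discards genes 2 and 3, which carry no class information, and the Lasso step then drops gene 1 because it duplicates gene 0. Which of two identical genes survives depends on the order in which the coordinates are visited, so the sketch illustrates redundancy removal only and makes no claim about which gene is biologically preferable.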