A Study Of Gene Selection Method Based On Lasso And Binary Particle Swarm Optimization

Posted on: 2020-04-26 | Degree: Master | Type: Thesis | Country: China | Candidate: Y Xiong | GTID: 2404330596491440 | Subject: Control Science and Engineering

Abstract/Summary:

Cancer is a serious threat to human health, so analyzing and processing microarray data is critical for cancer diagnosis and treatment. Gene selection is an important step in microarray data analysis. In recent years, many gene selection methods have been able to select gene subsets with low redundancy and high accuracy, but selecting the optimal gene subset remains very challenging: most existing methods have poor interpretability and may delete key genes that are sensitive to the class. To address these problems, this dissertation proposes a family of gene selection methods based on the least absolute shrinkage and selection operator (Lasso) and improved binary particle swarm optimization (BPSO). Building on BPSO, the methods exploit Lasso's ability to perform important-gene selection and parameter estimation simultaneously, and combine it with the extreme learning machine (ELM) to select the optimal gene subsets. The main contributions of this thesis are as follows:

(1) To overcome the poor interpretability of traditional gene selection methods and their tendency to delete important genes, a new gene selection method (Lasso-CBPSO-ELM) combining Lasso and the binary particle swarm algorithm is proposed. The method first preprocesses the original dataset with an improved signal-to-noise ratio. It then uses Lasso, which performs well on variable selection and parameter estimation, to establish a key gene pool and to obtain a contribution value for each gene that indicates the gene's sensitivity to the sample classes. Finally, the contribution values are encoded into the BPSO, and a new velocity mapping formula is defined to improve the BPSO so that the best gene subsets are selected. Experimental results on several public microarray datasets verify the effectiveness of the proposed method against related methods and show that Lasso-CBPSO-ELM can select meaningful gene subsets with high classification performance.

(2) The Lasso-CBPSO-ELM method has some disadvantages: Lasso may lead to over-fitting, while BPSO is likely to become trapped in local optima. More seriously, the true gene structure might be ignored. To overcome these problems, a method combining an improved Lasso based on geodesic distance clustering with an improved BPSO with a leaping factor is proposed. First, to make full use of the relationships among high-dimensional genes, the initial gene pool is divided into several clusters based on geodesic distances, which can reflect the true gene structure. Lasso is then applied twice within the clusters to select highly predictive genes and to calculate their contribution values. With the third-level gene pool established by this double filter strategy, the improved BPSO performs gene selection using a leaping factor that lets a particle jump out of a locally optimal position. Finally, experimental results on both two-class and multi-class microarray data demonstrate that this method can select meaningful gene subsets with high classification performance.

Keywords/Search Tags: Binary particle swarm optimization, feature selection, microarray data, Lasso, extreme learning machine, k-medoids, geodesic distance
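The abstract's point that Lasso performs gene selection and parameter estimation at the same time can be illustrated with a minimal sketch. It assumes the common objective (1/(2n))·||y − Xw||² + α·||w||₁ solved by cyclic coordinate descent, and it assumes that a gene's contribution value is the normalised absolute coefficient; the abstract does not give the thesis's actual definition. The function names, the toy design matrix, and `alpha=0.1` are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; anything inside [-threshold, threshold] becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero column carries no information
            # Correlation of gene j with the residual that ignores gene j's own term.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Toy data (hypothetical): 8 samples x 6 "genes" with mutually orthogonal +/-1 columns.
X = [[1.0 if bin(i & (g + 1)).count("1") % 2 == 0 else -1.0 for g in range(6)]
     for i in range(8)]
# Only genes 0 and 3 drive the response in this toy example.
y = [2.0 * row[0] - 1.5 * row[3] for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.1)

# Selection: genes whose coefficient was not shrunk to exactly zero form the pool.
gene_pool = [j for j, w in enumerate(weights) if w != 0.0]

# Estimation: the same fit yields coefficients; the normalised magnitude is used
# here as a stand-in for the contribution value (an assumption, see above).
total = sum(abs(w) for w in weights)
contribution = [abs(w) / total for w in weights]
```

On this toy design the pool contains only genes 0 and 3, and their coefficients are the true values shrunk toward zero by `alpha`. Real microarray data has far more genes than samples and correlated columns, so a tested library solver would normally be preferred to this hand-written loop.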
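For readers unfamiliar with BPSO, the following is a minimal sketch of the standard binary particle swarm with the usual sigmoid velocity-to-probability mapping. It does not reproduce the thesis's new velocity mapping formula or its leaping factor, since the abstract does not define them. Biasing the initial swarm with contribution values is only one assumed way to "encode" them, and the toy fitness stands in for the ELM classification accuracy; `binary_pso`, `toy_fitness`, `INFORMATIVE`, and all parameter values are hypothetical.

```python
import math
import random


def sigmoid(v):
    """Standard BPSO transfer function: maps a velocity to a probability of bit = 1."""
    return 1.0 / (1.0 + math.exp(-v))


def binary_pso(fitness, init_prob, n_particles=12, n_iters=40,
               inertia=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximise fitness over bit vectors; init_prob[d] is the chance gene d starts selected."""
    rng = random.Random(seed)
    dim = len(init_prob)
    positions = [[1 if rng.random() < init_prob[d] else 0 for d in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(p) for p in positions]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(dim):
                v = (inertia * velocities[k][d]
                     + c1 * rng.random() * (pbest[k][d] - positions[k][d])
                     + c2 * rng.random() * (gbest[d] - positions[k][d]))
                v = max(-v_max, min(v_max, v))  # clamp so probabilities never saturate
                velocities[k][d] = v
                # Velocity sets the probability of the bit being 1; it is not added to it.
                positions[k][d] = 1 if rng.random() < sigmoid(v) else 0
            fit = fitness(positions[k])
            if fit > pbest_fit[k]:
                pbest[k], pbest_fit[k] = positions[k][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = positions[k][:], fit
    return gbest, gbest_fit


# Hypothetical stand-in for classifier accuracy: reward two "informative" genes
# and penalise subset size, so small, relevant subsets score highest.
INFORMATIVE = {0, 3}


def toy_fitness(bits):
    hits = sum(1 for d in INFORMATIVE if bits[d] == 1)
    return hits - 0.1 * sum(bits)


# Hypothetical contribution values for a 6-gene pool; higher means more likely
# to be selected in the initial swarm (an assumed encoding, not the thesis's).
contributions = [0.58, 0.0, 0.0, 0.42, 0.0, 0.0]
top = max(contributions)
init_prob = [0.1 + 0.8 * c / top for c in contributions]

best_bits, best_fit = binary_pso(toy_fitness, init_prob)
```

In a real pipeline the fitness call would train and validate a classifier on the genes flagged by `bits`, which is far more expensive than this toy function, so the swarm size and iteration count would need tuning.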