The Research For Biological Data Analysis Methods Based On Multiobjective Evolutionary Learning

Posted on: 2022-09-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Wang
GTID: 1480306491461984
Subject: Cell biology
Abstract/Summary:
Bioinformatics is an interdisciplinary subject that uses computers as auxiliary tools to model, analyze, or simulate problems in the biological field by means of mathematical and statistical methods. With breakthroughs in biotechnology, biological data have accumulated massively. Single-cell RNA sequencing (scRNA-seq) data and cancer gene expression data are two typical kinds of biological data. They provide a basis for mining the deep biological laws they contain, and they also pose challenges for biological data analysis. When analyzing these data, accurate grouping and identification are particularly critical. Accurately identifying the cell groups in scRNA-seq data is a clustering problem, which is the basis of in-depth biological analysis. Correctly grouping cancer gene expression data, that is, accurately diagnosing a patient's cancer, is a classification problem, which helps determine a personalized treatment plan for the patient. Therefore, the study of efficient learning methods for clustering and classifying these two kinds of biological data has become an important direction in bioinformatics.

This dissertation focuses on scRNA-seq data and cancer gene expression data and carries out a series of studies on scRNA-seq clustering and cancer diagnosis methods. To improve the identification and diagnosis capabilities of the algorithms, it takes the characteristics of biological data into account and breaks through the limitations of traditional learning algorithms. It also uses multiple learning validity indices as objective functions to optimize the learning results and captures multiple attributes of different datasets. Following these ideas, four biological data analysis methods based on multi-objective evolutionary learning are proposed. The main contributions are as follows:

(1) An evolutionary multi-objective deep clustering model is proposed for clustering scRNA-seq data. First, differential gene expression analysis removes redundant and irrelevant genes from the high-dimensional raw data, identifying genes that are differentially expressed under different biological conditions. The retained gene data are then projected into different low-dimensional non-linear embedding subspaces by a deep autoencoder. A basic clustering algorithm is applied to each embedding subspace to produce multiple basic clustering results. When the population is initialized, individuals are specially encoded to generate multiple cluster ensembles composed of different basic clustering results. To guide the evolution, two cluster validity indices and the number of basic clusterings serve as the objective functions. A multi-objective clustering model is built on these three objective functions. The final clustering result is obtained by optimizing this model with hypervolume-based multi-objective optimization. To demonstrate its effectiveness, the proposed model is compared with eight clustering algorithms and three multi-objective optimization algorithms on six real scRNA-seq datasets. The experimental results show that the proposed model has advantages in clustering scRNA-seq data, and extended experiments further demonstrate its effectiveness from multiple perspectives.

(2) A multi-objective robust continuous clustering method is proposed for clustering scRNA-seq data. The robust continuous clustering algorithm suffers from unstable connections. To address this, the method uses two cluster validity indices as objective functions to establish a multi-objective clustering model and employs a decomposition-based multi-objective method to optimize the connection weights dynamically. Moreover, appropriate contraction parameters differ across scRNA-seq datasets, so this parameter is appended to the connection weight vector and optimized dynamically as well. During optimization, an archive retains the set of non-dominated solutions and is updated with the generated offspring. Finally, the best clustering result in the archive is output. To evaluate the performance of the proposed algorithm, two evaluation metrics are computed on six real scRNA-seq datasets. The experimental results demonstrate that the proposed method outperforms the compared algorithms in clustering scRNA-seq data. Visual analysis and biological interpretability analysis further reflect its biological significance.

(3) A decomposition-based multi-objective ensemble cuckoo search algorithm is proposed for cancer diagnosis. First, a multi-objective cancer diagnosis model is established with four objective functions: two entropy-based measures (relevance and redundancy), the number of features, and the classification accuracy. The population is then initialized and its fitness is calculated. A decomposition-based multi-objective framework optimizes the population to obtain the final classification result. To calculate the objective functions of each individual, a binary encoding method is proposed in which each individual selects a subset of genes. Moreover, two improved search strategies are proposed. A candidate pool is then generated that contains multiple discard probability values together with these search strategies. On this basis, an ensemble mechanism selects the search strategy and discard probability from the candidate pool according to their success probabilities in previous iterations. This increases the probability of producing high-quality candidate solutions. Thirty-five cancer gene expression datasets and a colon adenocarcinoma dataset are used to demonstrate the effectiveness of the proposed method. Experimental results from both multi-objective and classification perspectives show that the proposed algorithm can effectively diagnose cancer subtypes. Further analyses demonstrate the effectiveness of each component of the algorithm.

(4) A multi-objective hybrid algorithm based on particle swarm optimization (PSO) is proposed for cancer diagnosis. First, the population is initialized and then optimized with respect to four objective functions. Non-dominated solutions are kept in an archive according to the dominance relationship, and finally the optimal classification result is output. During evolution, a binary coding strategy is used to compute the objective functions of each individual. A mutation operator and an effective local search operator are proposed to balance the convergence ability and the global search ability of the algorithm. The mutation operator enhances the exploration ability of the particle swarm. The local search operator is based on the "best/1" operator of differential evolution. It uses a particle's personal best and two random particles to generate high-quality solutions in the neighborhood. To verify the performance of the algorithm, several evaluation metrics are computed on thirty-five cancer gene expression datasets and six real-world disease datasets. The proposed algorithm is compared with seven multi-objective algorithms, six classification algorithms, and five feature selection methods. The experimental results from different perspectives, together with further analyses, comprehensively demonstrate the effectiveness of the proposed algorithm for cancer diagnosis.
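Contribution (1) selects among candidate cluster ensembles using hypervolume-based multi-objective optimization. The function below is a minimal sketch of the indicator itself. It computes the hypervolume of a two-objective minimization front with respect to a reference point. The dissertation's model uses three objectives, so this two-dimensional sweep is a simplification. The function name and example points are illustrative rather than taken from the source. Maximized objectives, such as validity indices, are assumed to have been negated beforehand.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Iterable[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by `ref` (both objectives minimized).

    Sweep the points in ascending order of the first objective; each point that
    improves on the best second-objective value seen so far adds one rectangle.
    Dominated points, and points not strictly better than `ref`, add nothing.
    """
    inside = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    volume = 0.0
    best_f2 = ref[1]
    for f1, f2 in inside:
        if f2 < best_f2:
            volume += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return volume


if __name__ == "__main__":
    front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
    print(hypervolume_2d(front, ref=(4.0, 4.0)))  # 6.0
```

A larger hypervolume means the front lies closer to the ideal and is better spread, which is why it can rank whole sets of solutions at once. Exact computation becomes considerably more expensive as the number of objectives grows.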
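Contributions (2) and (3) both rely on a decomposition-based multi-objective framework. The sketch below shows the Tchebycheff decomposition used in MOEA/D, one common form of decomposition, for two objectives. Evenly spread weight vectors define scalar subproblems, and each subproblem's neighborhood consists of its closest weight vectors. An offspring replaces a neighbor's solution whenever it lowers that neighbor's scalarized value. The Tchebycheff scalarization, the weight layout, and all function names are assumptions for illustration, since the abstract does not state which decomposition the dissertation uses.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def uniform_weights_2d(n: int) -> List[Vector]:
    """n evenly spaced weight vectors (w, 1 - w) for a two-objective problem."""
    if n < 2:
        raise ValueError("need at least two weight vectors")
    return [(i / (n - 1), 1 - i / (n - 1)) for i in range(n)]


def neighborhoods(weights: Sequence[Vector], t: int) -> List[List[int]]:
    """Indices of the t weight vectors closest (Euclidean) to each weight vector."""
    return [
        sorted(range(len(weights)), key=lambda j: math.dist(w, weights[j]))[:t]
        for w in weights
    ]


def tchebycheff(f: Sequence[float], w: Sequence[float], ideal: Sequence[float]) -> float:
    """Tchebycheff scalarization max_i w_i * |f_i - z*_i|; smaller is better."""
    return max(wi * abs(fi - zi) for wi, fi, zi in zip(w, f, ideal))


def update_neighbors(pop_obj: List[Vector], hood: Sequence[int], weights: Sequence[Vector],
                     ideal: Sequence[float], child: Vector) -> List[Vector]:
    """Let the child replace every neighbor whose subproblem it strictly improves."""
    for j in hood:
        if tchebycheff(child, weights[j], ideal) < tchebycheff(pop_obj[j], weights[j], ideal):
            pop_obj[j] = child
    return pop_obj


if __name__ == "__main__":
    weights = uniform_weights_2d(3)        # [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
    hoods = neighborhoods(weights, t=2)    # [[0, 1], [1, 0], [2, 1]]
    pop = [(3.0, 3.0), (2.0, 2.0), (1.0, 4.0)]
    child = (1.0, 1.0)
    ideal = (0.0, 0.0)
    # In a full MOEA/D loop the ideal point is also refreshed with each child.
    ideal = tuple(min(a, b) for a, b in zip(ideal, child))
    print(update_neighbors(pop, hoods[1], weights, ideal, child))
    # [(1.0, 1.0), (1.0, 1.0), (1.0, 4.0)]
```

Restricting replacement to a neighborhood keeps each subproblem's solution close to its own weight direction. This is what lets a decomposition method maintain a spread of trade-offs without an explicit diversity measure.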
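Contributions (2) and (4) both keep an archive of non-dominated solutions, updated with each generation's offspring. The sketch below shows a minimal version of that update on objective vectors. It assumes every objective is minimized, so maximized objectives such as validity indices or accuracy would be negated first. The function names are illustrative. A practical archive would also store the encoded solution alongside its objectives and would usually bound the archive size.

```python
from typing import List, Sequence

Objectives = Sequence[float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` Pareto-dominates `b` (minimization): no worse in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: List[Objectives], candidate: Objectives) -> List[Objectives]:
    """Return the archive after offering it `candidate`.

    The candidate is rejected if an archived vector dominates or duplicates it;
    otherwise it is added and every archived vector it dominates is dropped.
    """
    if any(dominates(m, candidate) or list(m) == list(candidate) for m in archive):
        return archive
    return [m for m in archive if not dominates(candidate, m)] + [candidate]


if __name__ == "__main__":
    archive: List[Objectives] = []
    for offspring in [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (2.0, 3.0), (0.5, 2.5)]:
        archive = update_archive(archive, offspring)
    print(archive)  # [(2.0, 2.0), (3.0, 1.0), (0.5, 2.5)]
```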
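In contribution (3), an ensemble mechanism picks a search strategy and a discard probability from a candidate pool according to how often each choice succeeded in earlier iterations. The abstract does not give the selection rule, so the sketch below assumes a common one. It uses roulette-wheel selection weighted by a Laplace-smoothed success rate, (successes + 1) / (trials + 2), so that untried or unlucky entries keep a nonzero chance. The class name and the labels "strategy_a" and "strategy_b" are hypothetical placeholders, not the dissertation's two improved strategies.

```python
import random
from typing import Hashable, List, Optional, Sequence, Tuple

Entry = Tuple[Hashable, float]  # (search-strategy label, discard probability)


class SuccessBasedSelector:
    """Roulette-wheel choice over candidate-pool entries, weighted by past success."""

    def __init__(self, pool: Sequence[Entry], rng: Optional[random.Random] = None):
        self.pool: List[Entry] = list(pool)
        self.successes = [0] * len(self.pool)
        self.trials = [0] * len(self.pool)
        self.rng = rng or random.Random()

    def probabilities(self) -> List[float]:
        # Laplace smoothing: an untried entry has rate 0.5, and no rate is ever 0.
        rates = [(s + 1) / (t + 2) for s, t in zip(self.successes, self.trials)]
        total = sum(rates)
        return [r / total for r in rates]

    def select(self) -> int:
        """Index of the pool entry to use for the next offspring."""
        return self.rng.choices(range(len(self.pool)), weights=self.probabilities())[0]

    def record(self, index: int, improved: bool) -> None:
        """Log whether the offspring produced with entry `index` was an improvement."""
        self.trials[index] += 1
        self.successes[index] += int(improved)


if __name__ == "__main__":
    pool = [("strategy_a", 0.1), ("strategy_a", 0.25), ("strategy_b", 0.1), ("strategy_b", 0.25)]
    sel = SuccessBasedSelector(pool, rng=random.Random(0))
    for _ in range(3):
        sel.record(0, True)
        sel.record(1, False)
    print([round(p, 3) for p in sel.probabilities()])  # [0.4, 0.1, 0.25, 0.25]
    strategy, pa = sel.pool[sel.select()]
```

Keeping every probability above zero preserves some exploration of currently unsuccessful settings. This matters because the best strategy and discard probability can change as the search moves from exploration to refinement.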
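The local search operator in contribution (4) builds on differential evolution's "best/1" operator, combining a particle's personal best with two randomly chosen particles. The sketch below assumes continuous positions in [0, 1] that are decoded into a gene subset by thresholding at 0.5. This is one common way to pair real-valued search with a binary encoding, but the abstract does not specify the decoding. The scale factor F = 0.5 and the names `best1_trial` and `decode` are chosen for illustration.

```python
import random
from typing import List, Optional, Sequence


def best1_trial(pbest: Sequence[float], swarm: Sequence[Sequence[float]], f: float = 0.5,
                rng: Optional[random.Random] = None) -> List[float]:
    """DE/best/1-style move around a personal best: pbest + F * (x_r1 - x_r2),
    with distinct r1, r2 drawn from the swarm and each coordinate clipped to [0, 1]."""
    rng = rng or random.Random()
    r1, r2 = rng.sample(range(len(swarm)), 2)
    return [min(1.0, max(0.0, b + f * (a - c)))
            for b, a, c in zip(pbest, swarm[r1], swarm[r2])]


def decode(position: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Binary decoding: indices of the genes whose position exceeds the threshold."""
    return [i for i, p in enumerate(position) if p > threshold]


if __name__ == "__main__":
    rng = random.Random(42)
    swarm = [[rng.random() for _ in range(6)] for _ in range(5)]
    pbest = swarm[0]
    trial = best1_trial(pbest, swarm, rng=rng)
    print(decode(pbest), decode(trial))  # selected gene indices before and after the move
```

In a multi-objective setting, the trial would typically replace the personal best only if it Pareto-dominates it. Because the difference of two random particles shrinks as the swarm converges, the step size adapts automatically, which keeps the search in the neighborhood of good solutions.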
Keywords/Search Tags:Clustering, Classification, Machine Learning, Dimensionality Reduction, Multiobjective Clustering, Multiobjective Classification, Multiobjective Optimization, Single-cell RNA-Seq Data, Cancer Gene Expression Data