
A Study Of Data Generation Method For Gene Expression Profile Based On Conditional Generative Adversarial Networks

Posted on: 2022-01-09  Degree: Master  Type: Thesis
Country: China  Candidate: S Fang  Full Text: PDF
GTID: 2480306506463294  Subject: Computer Science and Technology
Abstract/Summary:
Effective analysis of gene expression profile data can help identify key pathogenic genes and recognize cancer, thereby providing technical support for clinical cancer diagnosis. Because such data are high-dimensional and have few samples, most methods struggle to obtain satisfactory results when processing them, so the data sets need to be expanded. The conditional generative adversarial network (CGAN) performs well in image generation, but when it is applied directly to gene expression profile data the algorithm fails to converge and the generated samples include “dirty” samples. To address these problems, this thesis proposes an improved conditional generative adversarial network that expands gene expression profile data and improves the quality of the generated samples, offering a new approach to the study of gene expression profile data and of high-dimensional, small-sample data. The main work of this thesis is as follows:

(1) To address the failure of the traditional CGAN algorithm to converge and the presence of “dirty” samples among the generated samples, a conditional generative adversarial network based on a feature loss penalty and a probability model (CGAN-FLP-PMT) is proposed. On the one hand, to improve the convergence of the traditional CGAN when generating gene expression profile samples, a feature loss penalty strategy is proposed and the discriminator loss function is improved, so that the generator and the discriminator can compete until they reach a Nash equilibrium and the algorithm converges. On the other hand, to deal with the “dirty” samples among the generated samples, a probability model with a threshold is proposed, which discriminates and screens the generated samples and discards the “dirty” ones. Experimental results on several public gene expression profile data sets show that CGAN-FLP-PMT can generate high-quality samples and achieve data expansion, which verifies the feasibility of the algorithm.

(2) To improve the data generation ability of CGAN-FLP-PMT and to address the difficulty of determining the optimal threshold in the probability model, a conditional generative adversarial network based on a probability model feedback mechanism and particle swarm optimization (CGAN-PMFB-PW) is proposed. On the one hand, because the data generation ability of CGAN-FLP-PMT is limited, this thesis proposes a probability model feedback mechanism, which stimulates the generator according to feedback from the probability model and strengthens its generation ability. On the other hand, because finding the optimal threshold in the probability model otherwise requires many manual attempts, an improved particle swarm optimization algorithm based on inertia weight is proposed, which performs a global search for the optimal threshold in the probability model space. Experimental results on multiple gene expression profile data sets show that CGAN-PMFB-PW improves classification performance and stably generates high-quality samples, achieving data expansion for gene expression profiles.
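
The two threshold-related ideas in the abstract, screening generated samples against a threshold and searching for that threshold with particle swarm optimization, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the score list, the objective function, and the linearly decreasing inertia weight schedule are all assumptions made for illustration, since the abstract does not specify how the "improved" inertia weight is defined.

```python
import random


def filter_by_threshold(samples, scores, threshold):
    """Keep generated samples whose (hypothetical) probability-model score
    is at least `threshold`; lower-scoring samples are treated as "dirty"."""
    return [s for s, p in zip(samples, scores) if p >= threshold]


def pso_search_threshold(objective, n_particles=20, n_iters=50,
                         w_start=0.9, w_end=0.4, c1=1.5, c2=1.5, seed=0):
    """Maximize `objective` over thresholds in [0, 1] with a 1-D PSO.

    Assumption: the inertia weight decays linearly from w_start to w_end,
    a common textbook schedule, not necessarily the one used in the thesis.
    """
    rng = random.Random(seed)
    pos = [rng.random() for _ in range(n_particles)]   # candidate thresholds
    vel = [0.0] * n_particles
    best_pos = pos[:]                                  # per-particle best
    best_val = [objective(x) for x in pos]
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g], best_val[g]            # swarm-wide best

    for t in range(n_iters):
        w = w_start - (w_start - w_end) * t / max(n_iters - 1, 1)
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            # Clamp so every candidate stays a valid threshold.
            pos[i] = min(1.0, max(0.0, pos[i] + vel[i]))
            val = objective(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val > g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val
```

In practice the objective would be something like the validation accuracy of a classifier trained on the real samples plus the generated samples that survive `filter_by_threshold` at a given threshold; that choice is also an assumption here, suggested only by the abstract's mention of classification performance.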
Keywords/Search Tags: Gene expression profile data, conditional generative adversarial network, particle swarm optimization algorithm, data generation