Research On Feature Construction Algorithm For Lifespan Estimation Problem Based On Autoencoder And Transformer

Posted on: 2024-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wang
Full Text: PDF
GTID: 2544307064997179
Subject: Engineering
Abstract/Summary:
When cancer patients and clinicians make treatment decisions together, the expected length of survival is a central consideration. Most existing studies investigate survival or recurrence risk after a fixed period (e.g., 1 year or 5 years), but do not give a more specific estimate of how long a patient is likely to survive. With the rapid development of modern high-throughput technologies, omics data are increasingly being released publicly and applied to a variety of diseases, including cancer. Many studies have used DNA methylation datasets to find clinical associations between DNA methylation biomarkers and tumours. However, these datasets suffer from the "large p, small n" problem, in which the number of features is much larger than the number of samples, and this hampers model training and subsequent analysis.

To predict patients' specific survival time while addressing this dimensionality problem, this thesis proposes SLOGAN, a model based on an autoencoder and a Transformer that combines feature selection with feature construction and predicts survival time from the final selected feature subset. Feature selection is used to remove redundant features and data noise, reducing computational overhead while improving prediction accuracy. However, because feature selection cannot generate new features, it cannot improve the quality or information content of the features themselves. This work therefore introduces an autoencoder- and Transformer-based feature construction method that maps the original features to a new space, improving the model's prediction performance on the feature subset. The feature construction component follows the idea of adversarial learning: the generator is an ordinary autoencoder, and the discriminator incorporates a Transformer mechanism. A loss function, the sparse loss, is proposed to assist training and improve the quality of the constructed features. During training, the generator's input and output are set against each other adversarially, so that the generator learns the information in the original features better and the constructed features perform well in prediction.

This thesis uses 10 datasets from the TCGA database and designs six experiments on them: selecting the number of construction cycles, selecting the number of hidden-layer nodes, verifying the necessity of feature construction, comparing feature selection methods, comparing feature construction network models, and ablating the feature construction network. Comparing prediction performance across different numbers of construction cycles shows that more cycles do not necessarily yield better constructed features. To choose an appropriate number of intermediate hidden-layer nodes, the construction results of the SLOGAN algorithm are compared under different node counts. The necessity of feature construction is verified by comparing the regression performance of feature subsets obtained by feature selection alone with that of subsets obtained by feature selection followed by feature construction. Comparing the SLOGAN feature selection method with existing feature selection algorithms, and the SLOGAN feature construction model with existing neural network models, shows that the features generated by SLOGAN perform better. To verify the contribution of each part of the feature construction network, ablation experiments compare the regression performance of features constructed after removing different parts of the network.

The experimental results show that the new features constructed by the SLOGAN algorithm achieve better performance on the regression prediction problem. They also demonstrate the necessity of feature construction and, through the ablation experiments, that each part of the feature construction model is indispensable.
Keywords/Search Tags:DNA methylation, Feature selection, Feature construction, Autoencoder, Generative adversarial networks
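
The abstract describes the generator as an ordinary autoencoder trained with the help of a sparse loss, but gives neither the network sizes nor the form of that loss. The sketch below is only an illustration under stated assumptions: a single-layer encoder and decoder, a mean-squared reconstruction error, and an L1 penalty on the hidden activations standing in for the sparse loss. All function names, the layer sizes, and `sparse_weight` are hypothetical, and the Transformer discriminator and the adversarial training loop are not shown.

```python
import math
import random

def make_matrix(rows, cols, rng, scale=0.1):
    """Random weight matrix as a list of rows (hypothetical initialisation)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def encode(W_enc, x):
    """Map the selected features to constructed (hidden) features via tanh."""
    return [math.tanh(v) for v in matvec(W_enc, x)]

def decode(W_dec, h):
    """Reconstruct the input features from the hidden features (linear decoder)."""
    return matvec(W_dec, h)

def reconstruction_loss(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def sparsity_penalty(h):
    """ASSUMED form of a sparse loss: mean absolute hidden activation (L1)."""
    return sum(abs(v) for v in h) / len(h)

def generator_loss(W_enc, W_dec, x, sparse_weight=0.01):
    """Reconstruction error plus a weighted sparsity term; also returns the code."""
    h = encode(W_enc, x)
    x_hat = decode(W_dec, h)
    loss = reconstruction_loss(x, x_hat) + sparse_weight * sparsity_penalty(h)
    return loss, h

rng = random.Random(0)
n_selected, n_hidden = 20, 5          # hypothetical sizes, not from the thesis
W_enc = make_matrix(n_hidden, n_selected, rng)
W_dec = make_matrix(n_selected, n_hidden, rng)
sample = [rng.random() for _ in range(n_selected)]  # synthetic values in [0, 1)
loss, constructed = generator_loss(W_enc, W_dec, sample)
print(len(constructed), loss > 0)
```

In a full pipeline, the `constructed` vector would replace or augment the selected features before the regression model is fitted. The weights here are untrained, so the printed loss only confirms that the pieces fit together.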