
Research On Feature Construction Algorithm For OMIC Data Based On The Siamese Network

Posted on: 2022-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 2480306758491874
Subject: Computer Software and Application of Computer
Abstract/Summary:
The development of modern biotechnologies has rapidly increased the amount of OMIC data, and more and more researchers have joined the field as these data became public. OMIC analysis can be applied to the study of various diseases, including many kinds of cancer. Transcriptomes and methylomes are two major OMIC types: transcriptomes measure the expression levels of all transcripts, while methylomes describe cytosine methylation levels across a genome. In some studies, transcriptome and methylome data have tens of thousands of features but only a small number of samples, which gives rise to the "large p, small n" paradigm.

This study focused on OMIC data under the "large p, small n" paradigm and proposed a feature construction method based on the Siamese network, named SiaCo. SiaCo transforms the original features into a new space in which distances between samples of the same class are minimized and distances between samples of different classes are maximized. To counter the negative impact of the "large p, small n" paradigm, a feature selection step is added to the pipeline to screen the high-dimensional features and remove the influence of irrelevant and noisy data on the deep learning model. Although feature selection and construction reduce the dimension of the engineered features to the hundreds, more features do not necessarily yield better or more efficient classification, and using all of them lowers the final classification accuracy. The Incremental Feature Selection strategy was therefore applied to choose an optimal subset of the engineered features with higher classification accuracy. Because the number of OMIC samples is limited, this study uses three-fold cross-validation to evaluate classification. This study also proposed a simpler method of calculating the loss function, which takes less training time without reducing the performance of the SiaCo model.

The SiaCo algorithm was evaluated comprehensively on transcriptome and methylome datasets. Six parameter-tuning experiments were designed to determine a universal set of parameters, so that SiaCo can be used widely on transcriptome and methylome data, is easier for non-specialists to adopt, and is more convenient to apply. Six feature selection algorithms, including TRank, the maximal information coefficient, the chi-square test, random forest and others, were compared to find the one best suited to the SiaCo model, and the maximal information coefficient proved to be the best match. The study also examined how well the constructed features match eleven classifiers, including support vector machines, gradient boosting decision trees, Bayesian networks, extreme learning machines and others. To obtain the best performance from the SiaCo model, the learning rate, the number of selected features, the structure of the SiaCo sub-network and the loss function were investigated experimentally.

The construction effect of the SiaCo algorithm was assessed by comparing the original and the constructed features in two experiments, a comparison of classification performance and an analysis of differential expression, from the perspective of both combined features and single features. Experiments were also carried out on the batch effects common in OMIC data, and the incremental feature selection strategy was compared with random selection to demonstrate the robustness of the SiaCo model. To test the SiaCo model from more angles, experiments were also conducted on multi-class tasks and non-OMIC datasets to compare the classification power of the original and the constructed features.

The results showed that, compared with the raw features, the features constructed by the SiaCo model achieved higher classification power on binary classification problems and also brought improvements on multi-class and non-OMIC tasks. In contrast, a single SiaCo feature did not show more significant inter-class differences than an original feature, which may be because the Siamese network optimizes the overall power of the SiaCo features rather than the discriminative ability of any single feature.
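
The abstract describes the Siamese objective only in words: pull samples of the same class together and push samples of different classes apart. The abstract does not give the thesis's own (simplified) loss, so the sketch below uses the standard pairwise contrastive loss as a stand-in. The function names, the `margin` parameter and the toy embeddings are assumptions made for illustration, not taken from the thesis.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(emb_a: Sequence[float], emb_b: Sequence[float],
                     same_class: bool, margin: float = 1.0) -> float:
    """Standard pairwise contrastive loss (an assumed stand-in for SiaCo's loss).

    Same-class pairs are penalized by their squared distance, so training
    pulls them together. Different-class pairs are penalized only while
    they are closer than `margin`, so training pushes them apart.
    """
    d = euclidean(emb_a, emb_b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def mean_pair_loss(embeddings: Sequence[Sequence[float]],
                   labels: Sequence[int], margin: float = 1.0) -> float:
    """Average the contrastive loss over all unordered sample pairs."""
    total, count = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            total += contrastive_loss(embeddings[i], embeddings[j],
                                      labels[i] == labels[j], margin)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Hypothetical 2-D embeddings: two samples of class 0, one of class 1.
    embs = [[0.0, 0.0], [0.1, 0.0], [0.5, 0.0]]
    labs = [0, 0, 1]
    print(mean_pair_loss(embs, labs, margin=1.0))
```

In a real Siamese setup the embeddings would come from two weight-sharing sub-networks applied to the selected OMIC features, and this loss would be minimized by gradient descent; here the embeddings are fixed numbers so that only the objective itself is shown.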
Keywords/Search Tags:Siamese network, feature construction, feature selection, OMIC data