| With the rapid development of biotechnology, a large amount of data has been recorded, which provides a great opportunity to explore the fundamental biological mechanisms behind disease. Identifying candidate genetic biomarkers of Alzheimer’s disease (AD) is of great significance for understanding how the disease develops, and it has many clinical benefits, such as early prevention, identification, and intervention. Constructing and applying diagnostic prediction models requires precise feature selection and feature representation, as well as feature fusion and feature discovery. This paper proposes specific machine learning models to study these representative problems in a series of technical steps. The main tasks completed in this paper are as follows. (1) A learning model that combines two feature selection methods is proposed to discover genetic biomarker candidates that are highly associated with AD. Massive biological datasets record a large amount of both significant and redundant information about the disease. The number of genetic biomarker candidates for each patient can reach the millions, which leads to high computing cost and to poor classification performance caused by irrelevant genetic features. To solve this problem, this paper adopts hybrid feature selection and adds a further framework that reduces the errors made while eliminating redundant genetic features. The learning model is composed of a filter feature selection method, a wrapper feature selection method, and correlation bias reduction. These components share the basic feature selection method, train continuously, and calculate the interrelationship of each genetic biomarker candidate; as a result, a list of representative genetic biomarkers is obtained. As a post-processing step, an additional correlation bias reduction framework is applied on top of the hybrid feature selection method to avoid incorrectly removing genetic biomarker candidates from the list of representative genetic markers because their gene positions are highly correlated. The model benefits from the advantages of both feature selection methods and can quickly select significant genetic biomarkers, while the additional correlation bias reduction framework prevents the elimination of unique genetic features. The model was applied to two public AD datasets, and the experimental results show that the proposed method achieves state-of-the-art performance compared with existing feature selection methods. (2) The integration of dual-modality data with a feature kernel-based weighting approach in a kernel-based learning model is proposed to discover the connection of genotype and phenotype with AD, improve diagnostic performance for the disease, and enhance the generalization ability of the model. However, the high heterogeneity and complexity of omics data degrade the performance of kernel learning methods. To solve these problems, a feature kernel-based weighting approach is used to achieve sparse feature kernel selection within a multiple kernel learning framework, which reduces the negative impact of omics data on the architecture of the kernel learning method. At the same time, the method mitigates the overfitting caused by the scarcity of samples in biological data mining. The experimental results show the robustness of the proposed method and the positive effect of the feature kernel-based weighting approach. Compared with other kernel-based learning frameworks, this method has better diagnostic ability and reduces overfitting on omics data. This shows that sparse kernel selection is an effective way to compensate for the lack of training samples. |
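
The three-stage selection described in task (1), a filter, then a wrapper, then correlation bias reduction as post-processing, can be illustrated with a minimal sketch. The abstract does not name the concrete filter or wrapper methods, so everything below is an assumption made for illustration: the filter ranks features by absolute Pearson correlation with the label, the wrapper is backward elimination scored by a leave-one-out nearest-centroid classifier, and the correlation bias reduction rule is a simplified stand-in that restores one representative of a removed, highly correlated group when no correlated feature survived. All function names and the synthetic data are hypothetical.

```python
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def column(X, j):
    """Values of feature j across all samples."""
    return [row[j] for row in X]


def filter_stage(X, y, keep):
    """Filter (assumed): keep the `keep` features with the largest |corr(feature, label)|."""
    scores = {j: abs(pearson(column(X, j), y)) for j in range(len(X[0]))}
    return sorted(scores, key=scores.get, reverse=True)[:keep]


def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `feats`."""
    hits = 0
    for i in range(len(X)):
        dist = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [mean(r[j] for r in rows) for j in feats]
            dist[label] = sum((X[i][j] - m) ** 2 for j, m in zip(feats, centroid))
        if min(dist, key=dist.get) == y[i]:
            hits += 1
    return hits / len(X)


def wrapper_stage(X, y, candidates, n_keep):
    """Wrapper (assumed): backward elimination, dropping the feature whose removal hurts least."""
    current, removed = list(candidates), []
    while len(current) > n_keep:
        drop = max(current, key=lambda j: loo_accuracy(X, y, [f for f in current if f != j]))
        current.remove(drop)
        removed.append(drop)
    return current, removed


def correlation_bias_reduction(X, y, kept, removed, threshold=0.9):
    """Simplified post-processing: if a removed feature belongs to a correlated group
    of removed features and nothing correlated with it survived, restore the group
    member that is most correlated with the label."""
    kept = list(kept)
    for f in removed:
        group = [g for g in removed
                 if abs(pearson(column(X, f), column(X, g))) >= threshold]
        if len(group) < 2:
            continue  # f is not part of a correlated group
        if any(abs(pearson(column(X, f), column(X, k))) >= threshold for k in kept):
            continue  # a correlated representative already survived
        kept.append(max(group, key=lambda g: abs(pearson(column(X, g), y))))
    return kept


def demo(seed=0):
    """Run the pipeline on synthetic data: one signal feature, a near-duplicate, six noise features."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(20)]
    X = []
    for label in y:
        signal = label + rng.gauss(0, 0.3)
        twin = signal + rng.gauss(0, 0.01)  # highly correlated with `signal`
        X.append([signal, twin] + [rng.gauss(0, 1) for _ in range(6)])
    top = filter_stage(X, y, keep=5)
    kept, removed = wrapper_stage(X, y, top, n_keep=2)
    return correlation_bias_reduction(X, y, kept, removed)


if __name__ == "__main__":
    print(demo())
```

The ordering reflects the motivation given above: the cheap filter shrinks the candidate set before the more expensive wrapper runs, and the post-processing step only ever adds features back, so it cannot discard anything the earlier stages kept.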