| The whole-genome selection method proposed by Meuwissen et al. in 2001 has greatly accelerated genetic progress in animal and plant breeding. Since then, breeders have continued to propose genomic selection methods based on different principles, steadily enriching and expanding the family of genomic selection methods. As sequencing technology advances and sequencing costs continue to fall, genotype data have become more fine-grained and the potential relationships among variables increasingly complex. Traditional genomic selection methods cannot keep pace with the explosive growth of genotype data: under the computational demands of big data, they can no longer build efficient models that accurately capture the structure in the data. Finding a more reasonable and efficient way than traditional genomic selection to estimate the breeding values of individuals has therefore become a central goal for data analysts. With the development of computers and artificial intelligence, machine learning, which uses computer programs to identify relevant connections between independent and dependent variables and to generalize to unseen data sets, has been applied to genomic selection. However, because parameter tuning is complex, the predictive correlation of a single model is unstable. Hence, ensemble learning, which integrates multiple machine learning models, has been applied to genomic selection. Building on the Blending framework, this study addresses the instability of Blending’s predictions and, taking the characteristics of breeding data into account, improves the Bootstrap method used in Bagging to produce an upgraded version called "Q-Bootstrap". Blending is combined with Q-Bootstrap to create a new algorithm, Babling, which improves on Blending in both predictive ability and stability. In addition, this paper proposes an algorithm called Max Diff for selecting a group of mutually different models to meet 
Babling’s need for base learners in practical heterogeneous ensemble learning. The main findings of this study are as follows: (1) This article proposes a new ensemble learning framework, Babling, which combines the Bootstrap method from Bagging with Blending. While retaining the speed of Blending, it further improves the prediction accuracy and stability of the Blending algorithm. (2) A drawback of the Bootstrap approach in Bagging is that it generates the resampling subset by purely random sampling, whereas Babling’s predictive performance depends on choosing an appropriate resampling subset. To overcome the bias that this randomness introduces into Babling, we propose an upgraded version of Bootstrap called "Q-Bootstrap". It learns the difficulty of each sample from the error of a prior learner on the training set, classifies the data points accordingly, and draws the resampling subset based on that classification. The results show that, as the sample size increases, Q-Bootstrap further improves the prediction accuracy and stability of Babling. (3) The combination of base learners affects the effectiveness of ensemble learning. To guide the selection of base learners, we collected 66 machine learning models for regression problems from the R language website and, based on the literature and the R package manuals, constructed a feature matrix with 66 rows and 30 columns. Using this feature matrix, we propose an algorithm called Max Diff, which combines cluster analysis and distance analysis to maximize the differences between base learners. The algorithm selects a group of mutually different models from the feature matrix for use in Babling, further improving Babling’s potential for practical application. (4) Finally, the article uses multiple sets of simulated and real data to 
evaluate the prediction accuracy of Babling, Bagging, Blending, Stacking, a single model, and GBLUP. The results show that Babling outperforms the other ensemble learning methods and GBLUP in breeding value estimation accuracy across different data sets. In summary, this study proposes a new heterogeneous ensemble learning method, Babling. This method not only expands the family of heterogeneous ensemble learning methods but also provides a new strategy for applying genomic selection in animal and plant breeding. The study also proposes a corresponding method, Max Diff, for combining base learners in ensemble learning, which further improves the practicality of Babling in breeding. |
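
The resampling idea behind Q-Bootstrap in finding (2) can be illustrated with a small Python sketch. The abstract does not say how samples are classified by difficulty or how the subset is drawn from those classes, so the following is only one plausible reading and not the authors' algorithm. It assumes that difficulty is the rank of the absolute error of a prior learner, that samples are split into equal-sized classes, and that a bootstrap draw is made with replacement inside each class. The names `difficulty_strata`, `stratified_bootstrap`, and `n_strata` are hypothetical.

```python
import random


def difficulty_strata(errors, n_strata=3):
    """Assign each sample to a difficulty class (0 = easiest).

    Assumption: difficulty is the rank of the absolute error that a
    prior learner made on the sample; classes have (near) equal size.
    """
    n = len(errors)
    # Sample indices ordered from smallest to largest absolute error.
    order = sorted(range(n), key=lambda i: abs(errors[i]))
    strata = [0] * n
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // n
    return strata


def stratified_bootstrap(strata, rng):
    """Draw indices with replacement within each difficulty class.

    Every class keeps its original size, so the resampled subset has
    the same mix of easy and hard samples as the full training set.
    """
    groups = {}
    for i, s in enumerate(strata):
        groups.setdefault(s, []).append(i)
    sample = []
    for s in sorted(groups):
        members = groups[s]
        sample.extend(rng.choice(members) for _ in members)
    return sample
```

In use, `errors` would hold the training-set residuals of whatever prior learner is chosen, and `stratified_bootstrap(difficulty_strata(errors), random.Random(seed))` would return the row indices of one resampled subset. Unlike a plain bootstrap, which can by chance over- or under-represent hard samples, this variant fixes the share of each difficulty class in every draw.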