| In recent years, deep neural networks (DNNs) have achieved great success in the classification of video and audio, which has contributed to the revival of deep learning technology. Despite their solid performance, DNNs still have some disadvantages: they need a large number of training samples, their performance depends on parameter tuning, their structure is preset, they are difficult to analyze theoretically, and they place high demands on hardware. From the perspective of data acquisition, even in the era of big data, collecting fully labeled samples is expensive and time-consuming because specialists have to label the samples manually. Given the clear advantages of deep learning technology, it is meaningful to overcome these defects of DNNs and to explore the application of non-neural-network deep learning technology to small and medium data sets. In 2017, the deep forest (DF) was introduced as an alternative to DNNs. This model is a multi-layer cascade structure in which each layer is an ensemble of a group of independent forests (units). Compared with DNNs, the DF has few parameters, adaptive model complexity, and stronger interpretability, and it is better suited to small and medium data sets. The performance of the DF is highly comparable with that of DNNs, with far less training time. The DF is not only a classification algorithm; it can also be regarded as a deep learning framework and applied to different classification scenarios by endowing the unit with different functions. Based on the DF framework, and from the perspectives of both algorithm optimization and application, this doctoral thesis presents four improved models: the DF based on ensemble pruning optimization, the DF applied to the sample imbalance problem, the DF applied to the partial label problem, and the DF applied to combination drug prediction. The main research work is summarized as follows.
(1) Each layer of the original DF is an ensemble of decision-tree ensembles. Decision trees with poor performance have a negative impact on the prediction of the model, and decision trees with similar classification behavior bring redundancy to the model. To solve these problems, an ensemble pruning method based on feature vectorization and quantum walk is proposed to optimize the unit of the DF, ultimately yielding an improved deep forest with a simpler model and better performance.
(2) In many cases of imbalanced learning, we tend to pay more attention to the minority class samples. Training DNNs with imbalanced data causes two obvious defects: the classification results incline toward the majority class, and over-fitting occurs when the samples, especially the minority class samples, are insufficient. In this work, a new unit for the DF is designed by integrating the synthetic minority over-sampling technique (SMOTE) into the iterative process of the AdaBoost algorithm. This strengthens the learning of minority class samples at the classification boundary and improves the model's ability to recognize the minority class.
(3) Partial label learning is a branch of semi-supervised learning. A partial label sample corresponds to multiple candidate labels, among which only one label is valid. The task of partial label learning is to learn a classifier from partial label samples so as to accurately predict the true label of an unknown sample. Because the classifier cannot directly access the true label of a training sample during learning, classification becomes more difficult. This research uses an improved error-correcting output coding algorithm as the unit of the DF, transforming a semi-supervised learning problem into a combination of multiple supervised learning problems. Meanwhile, an evaluation method that accounts for uncertainty is designed to accommodate the growth of the DF cascade in the case of partial label learning. Finally, the improved unit is embedded into the reconstructed cascade framework to deal with the classification of partial label samples.
(4) Combination therapy is widely used in cancer treatment. A combination of multiple drugs can target multiple molecules or diseases in cancer cells at the same time, which can effectively reduce the resistance of the tumor to a single drug. However, it is impractical to search all possible drug combinations for a specific disease, because the number of drug combinations to be searched grows exponentially as the number of drugs increases. Therefore, effective computational models are urgently needed to reduce the solution space of the drug combination search. In view of the high dimensionality, large feature redundancy, and data imbalance of combination drug samples, two specialized units for the DF are suggested in this work: an extreme tree forest based on data complexity dimension reduction, and a random forest with resampling. The proposed model could effectively address the classification difficulty in combination drug prediction.
In summary, the above four models are all improvements of the DF algorithm. In their respective application fields, the four proposed models improve substantially on the most advanced methods, and their interpretability is more intuitive. This research work offers some guidance for exploring non-neural-network deep learning technology. |
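
As a rough illustration of the over-sampling step mentioned in (2), the following minimal sketch shows how SMOTE-style interpolation creates synthetic minority samples. It is not the implementation used in the thesis: the function name `smote`, the parameters `k` and `seed`, and the toy data are assumptions made for illustration, and the integration into the AdaBoost iterations is not shown because the abstract does not specify it.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points built by SMOTE-style interpolation.

    Hypothetical helper for illustration only; names and defaults are assumed.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest minority neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment
        # between the base point and the chosen neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with four two-dimensional samples (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=5)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class. A boosting-based unit could, under this assumption, regenerate such points in each round so that the minority class carries more weight near the decision boundary.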