
Research And Implementation Of Aided Diagnosis System For Infectious Liver Disease Based On Ensemble Learning

Posted on: 2020-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: F W Dong
GTID: 2404330596997070
Subject: Computer technology
Abstract/Summary:
Infectious liver disease is a widely distributed, highly contagious class of diseases with multiple routes of transmission, and it has a large impact on society. It is usually diagnosed from marker indices, but in the non-acute phase the markers are weak and unstable, which makes diagnosis difficult. Using machine learning to uncover the hidden relationships between a large number of features and the disease is therefore one of the main research directions in the diagnosis of infectious liver disease. This thesis takes the diagnosis of infectious liver disease as its research object and addresses feature selection and classification model design in the training of the diagnostic model. An improved ensemble feature selection method is used to select features, ensemble learning is then used to construct the classification model, and the ensemble is pruned. Finally, a distributed auxiliary diagnosis system for infectious liver disease is designed and implemented. The specific work of this thesis is as follows:

(1) A new ensemble feature selection method, CB-EFS, is proposed. CB-EFS first clusters the feature subsets produced by multiple feature selectors to obtain more differentiated subsets, which improves ensemble performance. The subsets are then combined by voting, and the highest-ranked features are selected. The method has two innovations. First, because only a few feature subsets are available for clustering, it is difficult to choose the window radius for traditional mean-shift clustering, so the optimal clustering result cannot be obtained. CB-EFS instead clusters with multiple radii to obtain multiple cluster cores, then clusters all the cluster cores again to obtain the cluster distribution, so that the resulting group of cluster cores is representative and more differentiated. Using this group as the clustering result gives better clustering performance. Second, when integrating feature subsets, the original Borda voting method is good at selecting features recognized by most selectors, but its simple linear, rank-based weighting means that features which perform well yet are favored by only a minority of selectors cannot obtain a high final ranking. This thesis uses an improved nonlinear weighting so that these features receive a higher weight and a greater probability of being selected, and so that the relative importance of all features is easier to see. The experimental results show that classification accuracy with CB-EFS is 0.998% higher than with other feature selection methods, and that CB-EFS has better sensitivity and stability.

(2) An ensemble classification model achieves better classification performance by combining multiple base classifiers, but using too many classifiers reduces the generalization ability and classification speed of the ensemble and wastes computing resources. Removing some classifiers through ensemble pruning can therefore improve ensemble performance and save computing resources. This thesis improves the Pareto ensemble pruning method and proposes a three-objective optimization ensemble pruning method. To the original two objectives of maximizing classification accuracy and minimizing ensemble size, it adds a third, maximizing the difference between base classifiers, to address the over-fitting problem of the original method. The solving algorithm is also optimized for the case in which Pareto-optimal solutions satisfying all three objectives are rare. The experimental results show that the accuracy of the ensemble classification model using this pruning method is 0.67% higher than with the original method, and that the degree of over-fitting is significantly reduced.

(3) Based on the above research, a Hadoop-based distributed auxiliary diagnosis system for infectious liver disease is designed and implemented, providing intelligent diagnosis of infectious liver disease and visualization of disease transmission trends.
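
The contrast between linear and nonlinear Borda weighting described in item (1) can be illustrated with a small sketch. The abstract does not give the thesis's actual weighting function, so the exponential decay used here (`exponential_weight`, with a hypothetical `decay` parameter) is only an assumed stand-in, and the feature names and rankings are made up for illustration.

```python
from collections import defaultdict
from typing import Callable, List, Sequence


def linear_weight(position: int, n_features: int) -> float:
    """Classic Borda score: n-1 points for first place, 0 for last."""
    return float(n_features - 1 - position)


def exponential_weight(position: int, n_features: int, decay: float = 0.5) -> float:
    """Assumed nonlinear score (not the thesis's formula): top ranks count far more."""
    return (n_features - 1) * decay ** position


def aggregate_rankings(
    rankings: Sequence[Sequence[str]],
    weight: Callable[[int, int], float],
) -> List[str]:
    """Sum per-selector scores and return features sorted best first."""
    n_features = len(rankings[0])
    scores = defaultdict(float)
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            scores[feature] += weight(position, n_features)
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Three hypothetical selectors; only the first one ranks feature "e" highly.
rankings = [
    ["e", "a", "b", "c", "d"],
    ["a", "b", "c", "d", "e"],
    ["b", "a", "c", "d", "e"],
]
linear_order = aggregate_rankings(rankings, linear_weight)
nonlinear_order = aggregate_rankings(rankings, exponential_weight)
print(linear_order)     # "e" ends up behind the consistently mid-ranked "c"
print(nonlinear_order)  # "e" overtakes "c"
```

Under the linear scores, the feature that a single selector ranks first stays below a feature that every selector ranks in the middle; under the assumed nonlinear scores it moves ahead, which is the behaviour the abstract attributes to the improved weighting.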
Keywords/Search Tags: diagnosis of infectious liver disease, ensemble feature selection, mean-shift based clustering, ensemble pruning, medical diagnostic system
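
The three-objective pruning idea in item (2) of the abstract can be sketched on a toy problem. This is a minimal illustration, not the thesis's algorithm: it enumerates every sub-ensemble exhaustively instead of using an optimized solver, measures the difference between base classifiers as mean pairwise disagreement, and breaks voting ties in favor of class 1. All names (`pareto_prune`, `h1` to `h3`) and the predictions are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# All three objectives are minimized: (error, size, -diversity).
Objectives = Tuple[float, float, float]


def vote_error(preds: List[Sequence[int]], labels: Sequence[int]) -> float:
    """Error rate of the majority vote over binary predictions (ties go to class 1)."""
    wrong = 0
    for i, y in enumerate(labels):
        votes = sum(p[i] for p in preds)
        majority = 1 if 2 * votes >= len(preds) else 0
        wrong += majority != y
    return wrong / len(labels)


def mean_disagreement(preds: List[Sequence[int]]) -> float:
    """Average fraction of samples on which two members disagree (0 for one member)."""
    pairs = list(combinations(preds, 2))
    if not pairs:
        return 0.0
    total = sum(sum(a != b for a, b in zip(p, q)) / len(p) for p, q in pairs)
    return total / len(pairs)


def dominates(u: Objectives, v: Objectives) -> bool:
    """u dominates v if it is no worse on every objective and differs on at least one."""
    return all(a <= b for a, b in zip(u, v)) and u != v


def pareto_prune(
    pool: Dict[str, Sequence[int]], labels: Sequence[int]
) -> Dict[Tuple[str, ...], Objectives]:
    """Return the non-dominated sub-ensembles with their objective values."""
    names = sorted(pool)
    scored: Dict[Tuple[str, ...], Objectives] = {}
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            preds = [pool[n] for n in subset]
            scored[subset] = (vote_error(preds, labels), float(k), -mean_disagreement(preds))
    return {
        s: o
        for s, o in scored.items()
        if not any(dominates(other, o) for other in scored.values())
    }


# Validation labels and the predictions of three hypothetical base classifiers.
labels = [1, 0, 1, 0]
pool = {
    "h1": [1, 0, 1, 1],
    "h2": [1, 0, 0, 0],
    "h3": [0, 0, 1, 0],
}
front = pareto_prune(pool, labels)
for subset, objectives in sorted(front.items()):
    print(subset, objectives)
```

In this toy pool the full three-member ensemble is dominated by the pair `("h2", "h3")`, which reaches the same error and disagreement with fewer members. Exhaustive enumeration grows exponentially with pool size, which is why a practical method needs a dedicated solver of the kind the abstract mentions.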