Improvement Of Random Forest Algorithm And Its Application In Medical Diagnosis System

Posted on: 2021-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: R Gao
GTID: 2392330614966070
Subject: Electronic and communication engineering
Abstract/Summary:
With the development of medical information technology, major hospitals have gradually established complete electronic information systems, providing sufficient data support for integrating medical diagnosis with big data mining. Among data mining algorithms, random forest is one of the most widely used because of its high classification accuracy. However, medical data is usually extremely imbalanced and high-dimensional, which severely weakens the classification performance of the traditional random forest algorithm in the medical field. In addition, a random forest must construct many decision trees, which leads to long computation times. To address these problems, this thesis studies the application of the random forest algorithm to medical diagnosis in depth, proposes several targeted improvements, and finally designs and implements an auxiliary medical diagnosis system for sepsis. The main work is as follows:

First, to address the extreme imbalance and high feature dimensionality of medical data, an improved random forest algorithm based on feature reduction, RW-RF (ReliefF & Wrapper Random Forest), is proposed. An improved ReliefF algorithm is used to separate features according to their classification ability. During random forest construction, features are extracted hierarchically and each decision tree is trained recursively until the classification performance of the subtree is optimal. Experiments show that RW-RF achieves higher classification accuracy than the traditional random forest algorithm and also performs well on imbalanced data.

Second, to reduce the running time of the algorithm, an improved random forest algorithm based on Spark is proposed, in which two parts of RW-RF are parallelized: the computation of feature weights and the construction of the random forest. Experiments show that the parallelized algorithm has better running efficiency and scalability.

Finally, based on the improved random forest algorithm and the Spark platform, an auxiliary diagnosis system for sepsis is constructed. The system comprises data processing, classification rule acquisition, model evaluation, and disease prediction. The effectiveness and feasibility of the system are verified on the sepsis data set published by the whale community.
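
The feature-reduction idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it uses the basic Relief update (binary classes, one nearest hit and one nearest miss per instance) as a simplified stand-in for the improved ReliefF step, and it reads "extracted hierarchically" as stratified sampling from a strong and a weak feature group, which is an assumption. The function names, the mean-weight threshold, and the toy data are all hypothetical.

```python
import random


def relief_weights(X, y):
    """Basic Relief feature weights (simplified stand-in for ReliefF).

    Every instance is visited once; a feature gains weight when it differs
    on the nearest other-class instance (miss) and loses weight when it
    differs on the nearest same-class instance (hit).
    """
    n, d = len(X), len(X[0])
    # Per-feature value range, used to normalise differences to [0, 1].
    span = []
    for j in range(d):
        col = [row[j] for row in X]
        span.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return weights


def stratified_feature_sample(weights, n_strong, n_weak, rng):
    """Pick feature indices for one tree from a strong and a weak group.

    Assumption: features at or above the mean weight count as "strong".
    """
    threshold = sum(weights) / len(weights)
    strong = [j for j, w in enumerate(weights) if w >= threshold]
    weak = [j for j, w in enumerate(weights) if w < threshold]
    picked = rng.sample(strong, min(n_strong, len(strong)))
    picked += rng.sample(weak, min(n_weak, len(weak)))
    return picked


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X_toy = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y_toy = [0, 0, 0, 1, 1, 1]

toy_weights = relief_weights(X_toy, y_toy)
toy_features = stratified_feature_sample(toy_weights, 1, 1, random.Random(0))
```

On the toy data the discriminative feature receives the larger weight, so it lands in the strong group. In a wrapper-style loop, each tree would be trained on such a sampled subset and the subset adjusted until validation performance stops improving; that loop is omitted here because the abstract does not specify it.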
Keywords/Search Tags: Random forest, RW-RF algorithm, parallelization, medical diagnosis, sepsis