| With the improvement of domestic living conditions and the aging of the population, the number of diabetic patients has increased exponentially. However, current diabetes prevention and diagnosis measures cannot adequately meet the needs of the diabetic population. At the same time, the development of informatization has enabled the medical industry to gradually accumulate a large amount of diabetes-related data. How to use these data to discover hidden patterns and then predict diabetes with scientific methods is a hot research topic in the field of diabetes. This thesis uses data mining methods to study diabetes data. Existing diabetes prediction models do not perform feature discovery based on the characteristics of diabetes data, and their pursuit of accuracy on specific data sets leads to poor generalization ability. To address these problems, this thesis proposes the optimization of a diabetes prediction model based on a genetic algorithm, which mainly includes the following contents: (1) Because existing diabetes prediction models ignore the high redundancy and complex composition of diabetes data, this thesis proposes a greedy feature selection algorithm based on random forest. First, a random forest model is constructed from the diabetes data to evaluate the contribution of each feature to the model; second, the features are sorted by contribution and added in turn to the set of features to be evaluated; finally, the features that help the model's evaluation are retained. The experimental results show that the random forest-based greedy feature selection algorithm can effectively remove redundant and irrelevant features from the diabetes data and improve the prediction performance of the proposed model. (2) To address the overfitting caused by using complex models to pursue high accuracy on small data sets, this thesis proposes using genetic
algorithms to optimize logistic regression and establish a diabetes prediction model. The selection, crossover, and mutation operations of the genetic algorithm are applied when searching for the optimal solution of logistic regression. This optimization effectively reduces the possibility of overfitting in logistic regression and improves its generalization ability. This thesis combines the random forest-based greedy feature selection method described above with the genetic-algorithm-optimized logistic regression to form a diabetes prediction model with integrated feature selection. Experiments show that this model can select highly relevant features, which reduces the training time of the model and improves diabetes prediction performance. (3) Building on the research above and an analysis of the requirements of a diabetes prediction system, a prototype system for the diabetes prediction model was designed and implemented. Finally, the functions of the system were verified through experiments. The experiments show that the system can predict diabetes and effectively assist medical staff in managing and screening patients, which also confirms the validity of the proposed model. |
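
The greedy step of the feature selection in item (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the importance values are assumed to come from a random forest fitted beforehand (for example, the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`), and the feature names, the `greedy_select` function, and the toy `score` function are all hypothetical. In practice `score` would be something like the cross-validated accuracy of the prediction model on the candidate features.

```python
from typing import Callable, Dict, List, Sequence


def greedy_select(importances: Dict[str, float],
                  score: Callable[[Sequence[str]], float]) -> List[str]:
    """Try features in descending importance; keep one only if it raises the score."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    selected: List[str] = []
    best = float("-inf")
    for feature in ranked:
        candidate = selected + [feature]
        candidate_score = score(candidate)
        if candidate_score > best:  # the feature helps, so retain it
            selected, best = candidate, candidate_score
    return selected


# Hypothetical importances, standing in for a fitted random forest's output.
importances = {"glucose": 0.4, "bmi": 0.3, "glucose_copy": 0.2, "record_id": 0.1}

# Toy evaluation (assumption): reward informative features, penalize set size.
informative = {"glucose", "bmi"}


def score(features: Sequence[str]) -> float:
    return len(informative.intersection(features)) - 0.1 * len(features)


selected = greedy_select(importances, score)
print(selected)
```

Because a feature is kept only when it improves the score of the current set, a redundant copy of an already selected feature is dropped even if its individual importance is high.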
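
The idea in item (2), searching for logistic regression weights with selection, crossover, and mutation, can be sketched as below. The abstract does not specify the encoding, the operators, or the fitness function, so everything here is an assumption chosen for illustration: real-valued weight vectors, log loss as fitness, truncation selection, uniform crossover, and Gaussian mutation. The names `fit_ga`, `pop_size`, `generations`, and `mutation_rate` are hypothetical, and the data set is a toy one.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    # weights[0] is the bias; the rest pair up with the features in x.
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def log_loss(weights: Sequence[float], data: Sequence[Sample]) -> float:
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = min(max(predict(weights, x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def fit_ga(data: Sequence[Sample], n_features: int, pop_size: int = 40,
           generations: int = 60, mutation_rate: float = 0.2,
           seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_features + 1)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the half of the population with the lowest loss.
        pop.sort(key=lambda w: log_loss(w, data))
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each weight comes from one of the two parents.
            child = [wa if rng.random() < 0.5 else wb for wa, wb in zip(a, b)]
            # Mutation: occasionally perturb a weight with Gaussian noise.
            child = [w + rng.gauss(0.0, 0.3) if rng.random() < mutation_rate else w
                     for w in child]
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda w: log_loss(w, data))


# Toy data (assumption): one feature, label 1 when the feature is positive.
data = [([x], int(x > 0)) for x in (-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0)]
best = fit_ga(data, n_features=1)
print(best, log_loss(best, data))
```

Keeping the fitter half of each generation unchanged means the best loss never gets worse between generations. Any overfitting control, such as a penalty on weight size, would have to be added to the fitness function and is not shown here.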