Research On Medical Insurance Data Mining Based On Hadoop

Posted on: 2021-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
Full Text: PDF
GTID: 2404330605956125
Subject: Engineering
Abstract/Summary:
With the wide application of computers and the Internet, the amount of data generated by human beings has grown explosively. China has become one of the countries with the largest total amount of data and the richest variety of data types in the world. Human beings are not only producers of data but also its users. How to process data and turn it into useful information has become an important research topic in the field of machine learning, and data mining technology has emerged in response. Medical insurance is the most important part of social insurance and a basic guarantee of people’s lives. Making full use of the massive volume of medical insurance data generated every day, and mining it to discover links within the data, is of practical significance for providing clinical support and scientific decision-making for diseases, improving the effectiveness of medical treatment, and assisting the formulation and revision of policies.

At present, many researchers in China and abroad apply data mining to medical insurance data, including the analysis of medical expenses, the identification of medical insurance fraud, rational drug use for related diseases, and the management of the medical insurance system. This thesis uses data mining techniques to analyze and predict on different medical insurance data sets. Cardio-cerebrovascular disease data are explored and analyzed to obtain the intrinsic correlation between cardio-cerebrovascular disease and certain attribute features. For diabetes mellitus, the blood glucose level is predicted: candidate data sets are continuously updated through feature engineering, a prediction model with better learning ability is obtained through training, and the model is evaluated by cross-validation with mean squared error as the evaluation criterion, thereby improving prediction accuracy.

The analysis and prediction are carried out on a Hadoop cluster built from ordinary computers, with the MapReduce framework used for parallel computation. Through data acquisition, data preprocessing, data visualization analysis, data mining, feature engineering, continuous model training, and repeated cross-validation, the experiments obtain the intrinsic relationship between cardio-cerebrovascular diseases and certain attributes, and show that the resulting diabetes model significantly improves prediction accuracy.
Keywords/Search Tags: Hadoop, Data mining, Medical insurance, K-Means algorithm, LightGBM model
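The evaluation procedure named in the abstract, cross-validation scored by mean squared error, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than medical insurance records, the function names are hypothetical, and a one-feature least-squares line stands in for the thesis's gradient-boosted model so that the example needs only the Python standard library. It is not the author's implementation.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b (a stand-in for the real model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x if var_x else 0.0
    return a, mean_y - a * mean_x


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validated_mse(xs, ys, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average the MSEs."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        predictions = [a * xs[i] + b for i in held_out]
        targets = [ys[i] for i in held_out]
        scores.append(mean_squared_error(targets, predictions))
    return sum(scores) / len(scores), scores


# Synthetic illustration only: one made-up feature and a noisy linear target.
rng = random.Random(42)
xs = [rng.uniform(20, 80) for _ in range(100)]
ys = [0.05 * x + 4.0 + rng.gauss(0, 0.5) for x in xs]

mean_mse, fold_mses = cross_validated_mse(xs, ys, k=5)
print("per-fold MSE:", [round(m, 3) for m in fold_mses])
print("mean MSE:", round(mean_mse, 3))
```

Because every sample is held out exactly once, the averaged MSE estimates error on unseen data, which is what makes it usable for comparing candidate feature sets during feature engineering.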