Research On Medical Insurance Data Mining Based On Hadoop

Posted on:2021-01-22

Degree:Master

Type:Thesis

Country:China

Candidate:Y Chen

Full Text:PDF

GTID:2404330605956125

Subject:Engineering

Abstract/Summary:

With the wide application of computers and the Internet, the amount of data generated by human beings has grown explosively. China has become one of the countries with the largest total volume of data and the richest variety of data types in the world. At the same time, human beings are also users of data. How to process data and turn it into useful information has become an important research topic in the field of machine learning, and data mining technology has emerged in response. As the most important part of social insurance, medical insurance is also a basic guarantee of people’s lives. Making full use of the massive medical insurance data generated every day, and mining it to discover the links within the data, can provide clinical support and a basis for scientific decision-making about diseases; it is of practical significance for improving the effectiveness of medical treatment and for formulating and revising supporting policies. At present, many researchers at home and abroad apply data mining technology to medical insurance data, including the analysis of medical expenses, the identification of medical insurance fraud, rational drug use for related diseases, and the management of the medical insurance system. This thesis proposes using data mining techniques to analyze and make predictions on different medical insurance data sets. Cardio-cerebrovascular disease data is explored and analyzed to obtain the intrinsic correlation between cardio-cerebrovascular disease and certain attribute characteristics, and the blood glucose level of diabetes mellitus patients is predicted. Candidate data sets are continuously updated through feature engineering, and a prediction model with better learning ability is obtained through training. The model is evaluated by cross-validation, with the mean squared error as the evaluation criterion, and the predicted blood glucose level is obtained, thus improving the prediction accuracy. The analysis and prediction are carried out on a Hadoop cluster built from ordinary computers, with the MapReduce framework used for parallel computation. Through data acquisition, data preprocessing, data visualization analysis, data mining, feature engineering, continuous model training, and repeated cross-validation, the experiments reveal the intrinsic relationship between cardio-cerebrovascular disease and certain attributes and show that the resulting diabetes model significantly improves prediction accuracy.
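The evaluation step described in the abstract (cross-validation scored by mean squared error) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the model here is a one-variable least-squares line standing in for the LightGBM model named in the keywords, the data is synthetic, and every function name (`k_fold_indices`, `fit_line`, `mse`, `cross_val_mse`) and the example feature are hypothetical choices made for illustration only.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b.

    Assumption: a simple stand-in model, not the LightGBM model of the thesis.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx


def mse(y_true, y_pred):
    """Mean squared error between true and predicted values."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def cross_val_mse(xs, ys, k=5, seed=0):
    """Train on k-1 folds, score MSE on the held-out fold, repeat for each fold."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        a, b = fit_line([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        preds = [a * xs[j] + b for j in test_idx]
        scores.append(mse([ys[j] for j in test_idx], preds))
    return mean(scores), scores


if __name__ == "__main__":
    # Synthetic data only: a hypothetical feature and a noisy linear target.
    rng = random.Random(42)
    xs = [rng.uniform(20.0, 80.0) for _ in range(100)]
    ys = [0.05 * x + 4.0 + rng.gauss(0.0, 0.5) for x in xs]
    avg, per_fold = cross_val_mse(xs, ys, k=5)
    print("mean MSE over folds:", avg)
    print("per-fold MSE:", per_fold)
```

Averaging the per-fold scores gives a single number for comparing candidate feature sets, which is how a feature-engineering loop of the kind the abstract describes could decide whether a new feature set helps.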

Keywords/Search Tags:

Hadoop, Data mining, Medical insurance, K-Means algorithm, LightGBM model

Related items

1	Research On Suspected Fraud Identification Of Commercial Medical Insurance Based On Data Mining
2	Chinese Parallel LDA Algorithm Based On Hadoop And Data Mining In Electronic Medical Records
3	Analysis And Research Application Of Hyperthyroidism Disease Model Based On Medical Big Data
4	Cardiopulmonary Occupational Disease Computing Model And Algorithm Based On Hadoop
5	The Application Of Data Mining Technology In Medical Information System
6	Research On Hadoop-based Medical Data Storage
7	Analysis And Research Of Tumor Mode Based On Medical Big Data
8	A Study On Collection And Application Of Glaucoma Clinical Cases For Big Medical Data
9	Research And Design Of TCM Data Mining System Based On Hadoop
10	Research And Application Of Closed Itemsets Mining Algorithm In The Formulation Of Medical Insurance List