Research And Application On Health Insurance Outlier Detection

Posted on:2018-06-09

Degree:Master

Type:Thesis

Country:China

Candidate:D X Liang

Full Text:PDF

GTID:2334330512481432

Subject:Computer application technology

Abstract/Summary:

With the advance of health care informatization in China, digitized health records have grown explosively. A growing number of researchers are interested in extracting valuable information from massive health care data, especially in uncovering health insurance fraud, which is both well hidden and harmful. However, most existing outlier detection approaches are inefficient on high-dimensional, unbalanced and mixed datasets because they were not designed with these characteristics in mind. It is therefore important to establish a fast and accurate outlier detection method specifically for health data. Based on the characteristics of a health dataset from a particular region, this dissertation proposes a two-stage hybrid method called MAVF-CIForest, which applies data mining to health insurance fraud detection. In addition, a health insurance outlier detection system was designed and implemented on the Apache Spark platform. The main contents of this dissertation are as follows:

1. For high-dimensional and unbalanced datasets, an attribute selection method based on ensembles and sampling is proposed. The approach increases the selection probability of attributes that favor positive samples, and it ensures the diversity of the ensemble model by using attribute selectors based on hierarchical sampling. Experiments demonstrated that the proposed feature subspace selection method is effective on high-dimensional and unbalanced health insurance datasets: the accuracy of the outlier detection algorithm improved by 90%.

2. For mixed datasets, a two-stage hybrid method called MAVF-CIForest is proposed. The MAVF algorithm handles the categorical data, and its results, combined with the continuous data, become the input of the CIForest algorithm. Performance on unbalanced data was improved through improved randomly generated hyperplanes, a weighted voting approach, and optimized model selection and combination. Experiments demonstrated that the algorithm is effective: the accuracy of the outlier detection algorithm improved by 22% on a highly unbalanced dataset and by 3% on a mixed dataset.

3. Based on a parallelization of the algorithm models on Spark, a health insurance outlier detection system was designed and implemented on the Apache Spark platform. The system includes the proposed MAVF and CIForest algorithms as well as an information gain ratio algorithm. These algorithms are exposed to users, who can import their own data and adjust the parameters to perform outlier detection for different datasets and scenarios. The system also displays the corresponding detection results through a visualization module. The results are promising.
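The two-stage idea for mixed data can be illustrated with a minimal Python sketch. It is not the dissertation's implementation, whose details the abstract does not give. As assumptions, the plain Attribute Value Frequency (AVF) score stands in for MAVF, and a basic isolation forest with axis-parallel splits stands in for CIForest; the hyperplane splits, weighted voting and model selection mentioned above are omitted. All function names and the toy claim records are hypothetical.

```python
import math
import random
from collections import Counter


def avf_scores(cat_rows):
    """Stage 1 (stand-in for MAVF): mean frequency of each record's
    categorical values. A low score means rare values."""
    n_attrs = len(cat_rows[0])
    counts = [Counter(row[j] for row in cat_rows) for j in range(n_attrs)]
    return [sum(counts[j][row[j]] for j in range(n_attrs)) / n_attrs
            for row in cat_rows]


def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup,
    used to normalise isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split on a random attribute at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    j = rng.randrange(len(rows[0]))
    lo = min(r[j] for r in rows)
    hi = max(r[j] for r in rows)
    if lo == hi:  # simplification: stop if the chosen attribute is constant
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[j] < split]
    right = [r for r in rows if r[j] >= split]
    return ("node", j, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x lands, plus an estimate for the unsplit leaf."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, j, split, left, right = tree
    return path_length(left if x[j] < split else right, x, depth + 1)


def isolation_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Stage 2 (stand-in for CIForest): scores in (0, 1), higher means
    more anomalous. Assumes at least two rows."""
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(rows, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2 ** (-sum(path_length(t, x) for t in trees) / (n_trees * norm))
            for x in rows]


def two_stage_scores(cat_rows, num_rows, **kwargs):
    """Append the categorical score to the continuous attributes, then
    run the isolation forest on the combined numeric records."""
    avf = avf_scores(cat_rows)
    combined = [list(num) + [a] for num, a in zip(num_rows, avf)]
    return isolation_scores(combined, **kwargs)


# Toy data: 30 ordinary claims plus one unusual claim (the last record).
cat_rows = [(("clinic", "pharmacy")[i % 2],
             ("drug_a", "drug_b", "drug_c")[i % 3]) for i in range(30)]
num_rows = [(100.0 + i % 7, 1.0 + i % 4) for i in range(30)]
cat_rows.append(("surgery", "drug_z"))
num_rows.append((5000.0, 40.0))

scores = two_stage_scores(cat_rows, num_rows)
top = max(range(len(scores)), key=scores.__getitem__)
print("most anomalous record:", top, "score:", round(scores[top], 3))
```

Passing the categorical score as one more numeric column lets a single tree ensemble judge both kinds of attribute together, which is the role the abstract assigns to the second stage. Because each split threshold is drawn between the minimum and maximum of the chosen attribute, the columns need no rescaling before they are combined.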

Keywords/Search Tags:

outlier detection, mixed data, health insurance data, ensemble learning, Spark

Related items

1	Identify Abnormal Data Objects In Medical Insurance Based On Outlier Detection Method
2	Research On Outlier Detection Algorithm And Its Application In Abnormal Detection Of Clinical Prescription Data In Electronic Medical Records
3	Research On Key Issues Of Fraud Detection In Medical Insurance Big Data
4	Research Of Quality Optimal Control Method Of Big Data For Remote Health Care Monitoring
5	Research On The Collection And Analysis Of Body Data Based On Big Data
6	Research And Application On The Medical Insurance Reimbursement Fee’s Decision Model Based On Big Data
7	Research And Implementation Of Cardiovascular Disease Prediction System For The Elderly Based On Big Data Framework
8	Early Detection Of Coronary Heart Disease Based On Ensemble Learning Algorithm
9	Research On Decision Model For Incomplete Mixed Data Of Electronic Health Records
10	Research On The Key Technologies Of Health Database Storage Architecture And Efficient Data Access