
Research and Application of Health Insurance Outlier Detection

Posted on: 2018-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: D X Liang
GTID: 2334330512481432
Subject: Computer application technology
Abstract/Summary:
As health care informatics in China advances, the volume of digitized health records has grown explosively. More and more researchers are interested in extracting valuable information from massive health care data, especially in detecting health insurance fraud, which is both hidden and harmful. However, most existing outlier detection approaches are inefficient on high-dimensional, unbalanced, and mixed datasets, because they were not designed with these characteristics in mind. It is therefore important to establish a fast, high-accuracy outlier detection method tailored to health data. This dissertation proposes a two-stage hybrid method called MAVF-CIForest, which applies data mining to health insurance fraud detection and is designed around the characteristics of a health dataset from a specific region. In addition, a health insurance outlier detection system was designed and implemented on the Apache Spark platform. The main contents of this dissertation are as follows:

1. For high-dimensional and unbalanced datasets, an attribute selection method based on ensembles and sampling is proposed. The approach increases the selection probability of attributes that favor positive samples. It also ensures the diversity of the ensemble model by using attribute selectors based on hierarchical sampling. Experiments demonstrated that the proposed feature subspace selection method is effective on high-dimensional and unbalanced health insurance datasets: the accuracy of the outlier detection algorithm improved by 90%.

2. For mixed datasets, a two-stage hybrid method called MAVF-CIForest is proposed. The MAVF algorithm handles the categorical data. The results of MAVF are then combined with the continuous data and used as the input of the CIForest algorithm. Performance on unbalanced data was improved through improvements to the randomly generated hyperplanes, a weighted voting approach, and optimized model selection and combination. Experiments demonstrated that the algorithm is effective: the accuracy of the outlier detection algorithm improved by 22% on highly unbalanced datasets and by 3% on mixed datasets.

3. Building on a Spark-based parallelization of the algorithm model, a health insurance outlier detection system was designed and implemented on the Apache Spark platform. The system includes the proposed MAVF and CIForest algorithms, as well as an information gain ratio algorithm. These algorithms are exposed to users, who can import their own data and adjust the parameters to perform outlier detection for different datasets and scenarios. The system can also display the corresponding test results through a visualization module. The results are promising.
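
The abstract does not define MAVF. Purely as an illustration, and assuming it builds on the well-known Attribute Value Frequency (AVF) score for categorical data, the following minimal sketch computes plain AVF: each record is scored by the average frequency of its attribute values, so records made of rare values score low. The function name `avf_scores` and the toy `claims` data are hypothetical and are not taken from the dissertation.

```python
from collections import Counter


def avf_scores(rows):
    """Return one AVF score per row (lower score = rarer values = more outlier-like).

    Plain AVF, shown as an assumption about what a categorical stage could look
    like; this is not the dissertation's MAVF algorithm.
    """
    if not rows:
        return []
    n_attrs = len(rows[0])
    # Count how often each value occurs in each categorical column.
    counts = [Counter(row[j] for row in rows) for j in range(n_attrs)]
    # Average the per-column frequencies of the values found in each row.
    return [
        sum(counts[j][row[j]] for j in range(n_attrs)) / n_attrs
        for row in rows
    ]


# Hypothetical toy records: (department, drug_class)
claims = [
    ("cardiology", "statin"),
    ("cardiology", "statin"),
    ("cardiology", "statin"),
    ("dermatology", "opioid"),
]
scores = avf_scores(claims)
most_suspicious = scores.index(min(scores))  # index of the lowest-scoring record
```

In a two-stage design like the one described, a score of this kind could be appended as an extra numeric column alongside the continuous attributes before an isolation-forest style second stage; how the dissertation actually combines the two stages is not specified in the abstract.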
Keywords: outlier detection, mixed data, health insurance data, ensemble learning, Spark