Research On Hybrid Algorithm Of Medical Insurance Fraud Detection Based On Random Forest

Posted on: 2024-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Yu
GTID: 2544307151956739
Subject: Computer Science and Technology
Abstract/Summary:
As China's economy has developed, the domestic medical insurance system has become increasingly universal. At the same time, medical insurance fraud has emerged as a growing problem that seriously affects the healthy operation of the system. In recent years, machine learning and big data technology have developed rapidly and attracted wide attention, bringing reform and innovation to many traditional industries. Identifying fraudulent records in massive medical insurance data and detecting fraudulent behavior has become an important responsibility of the medical insurance department, and it also bears on the security of medical funds. It is therefore of great significance to study efficient medical insurance fraud detection methods based on machine learning.

Firstly, to address the heterogeneity, complexity, and class imbalance of medical insurance fraud data, a Gaussian random forest hybrid medical insurance fraud detection algorithm is proposed. The algorithm determines the number of clusters with a K-means clustering model, groups the data into multiple clusters with a Gaussian mixture model, and then performs class balancing within each cluster. Multiple decision tree models are then trained for fraud detection.

Secondly, because training the Gaussian model on medical insurance fraud data is computationally expensive, tends to fall into local optima, and is sensitive to outliers, a fuzzy C-means random forest hybrid medical insurance fraud detection model is proposed. The algorithm uses a fuzzy C-means clustering model that measures similarity through the membership degrees of data objects and iteratively updates the cluster centers and memberships to complete the clustering. Multiple decision tree models are trained in each cluster for fraud detection.

Finally, the proposed methods are evaluated and compared on medical insurance provider reimbursement data released by the US Medical Insurance and Medical Services Center and by the Alibaba Tianchi big data competition, verifying the effectiveness of the algorithms. To address the difficulty of training on medical insurance data, the work proposes cluster analysis with an unsupervised clustering model followed by class balancing through sampling, so that the supervised classification model can be trained more effectively.
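The fuzzy C-means step described in the abstract (iteratively updating cluster centers and memberships) can be illustrated with a minimal sketch. This is the textbook fuzzy C-means update, not the thesis's implementation. The function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the stopping tolerance, and the toy points are all assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means on a list of equal-length numeric tuples.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and every row sums to 1.
    The fuzzifier m=2.0 is a common default, assumed here.
    """
    rng = random.Random(seed)
    dim = len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(len(points))]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))

        # Membership update from squared distances to every center:
        # u_ik is proportional to (d_ik ** 2) ** (-1 / (m - 1)).
        shift = 0.0
        for i, p in enumerate(points):
            d2 = [max(sum((a - b) ** 2 for a, b in zip(p, c)), 1e-12)
                  for c in centers]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changes by more than tol.
        if shift < tol:
            break
    return centers, u


# Hypothetical toy data: two well-separated groups of 2-D points.
toy = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
       (10.0, 10.1), (9.8, 10.0), (10.1, 9.9)]
centers, memberships = fuzzy_c_means(toy, n_clusters=2)

# Hard assignment: each point goes to its highest-membership cluster.
hard_labels = [row.index(max(row)) for row in memberships]
```

In a pipeline like the one the abstract outlines, hard labels of this kind could be used to split the records into clusters before class balancing and per-cluster classifier training. Those later steps are not shown here.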
Keywords/Search Tags: medical insurance fraud detection, machine learning, random forest, Gaussian mixture model, fuzzy C-means