| China's social security system has now been in place for 40 years. During this period, the Party and the government have been committed to building a medical insurance system that is proactive, inclusive, and universal. To date, 1.344 billion people participate in medical insurance in China and the coverage rate exceeds 95%, so the goal of universal participation has essentially been achieved. However, as coverage has expanded, cases of fraud in which medical insurance funds are obtained illegally by various means have also emerged. Annual losses from medical insurance fraud amount to about 7%-8% of domestic medical expenses, posing a serious threat to the safety of medical insurance funds and to people's health care. Accurate and efficient identification of medical insurance fraud therefore plays an important role in safeguarding citizens' life and health and in promoting the stable development of China's social security system. This paper focuses on the problem of medical insurance fraud identification in China. Based on the characteristics of this problem, big data mining techniques are applied to real Chinese medical insurance data to address the large scale, complex structure, and class imbalance of the data used for fraud identification. First, the characteristics of the problem are analyzed, and medical insurance business knowledge is combined with text mining to construct fraud features from four perspectives: cost, hospital, disease, and behavior. Then, a medical insurance fraud identification model is built on the EasyEnsemble ensemble sampling method and the LightGBM algorithm. Its recognition performance is compared with SVM, random forest, XGBoost, and other methods, and the key fraud features are further analyzed to uncover patterns of fraudulent behavior and to propose well-founded anti-fraud recommendations. Finally, the model is tested on 8.36 million real, desensitized medical insurance treatment records from 452 hospitals in China. The results show an accuracy (ACC) of 0.86, an AUC of 0.81, and a fraud sample identification rate of 82%, so fraudulent individuals are identified effectively with only 223 feature dimensions. Analysis of the key features shows that cost features are the most important indicators of an insured person's fraudulent behavior. Approved amounts behave differently at the level of the overall total and at the level of each order: although the total approved amount of fraudulent individuals is higher, the approved amount of each order is relatively normal. The pattern of fraud is therefore to keep each reimbursement within a normal range, split claims across multiple orders, and thereby obtain a higher approved amount overall. In the itemized costs, drug and treatment fees account for the largest share, and fraudsters' treatment fees are higher than those of normal insured persons, which indicates that treatment charges are a means of defrauding the medical insurance fund. The high importance ranking of the hospital features reflects the seriousness of collusive fraud between doctors and patients in China. Consequently, in identifying and governing medical insurance fraud in China, it is necessary not only to strengthen supervision of the hospital declaration and approval process, but also to avoid single-perspective analysis that overlooks collusion among hospitals, pharmacies, and patients. Supervision should start from multiple angles and levels in order to improve the medical insurance supervision and review mechanism, establish a scientific and reasonable anti-fraud system, and promote the fair, healthy, and sustainable development of China's social security system. |
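
The sampling idea behind the model can be illustrated with a minimal sketch. It is not the paper's implementation: it only shows how EasyEnsemble-style sampling pairs all fraud samples with equally sized random draws of normal samples, trains one base learner per balanced subset, and averages their scores. The function names (`balanced_subsets`, `easy_ensemble_fit`, `easy_ensemble_predict`, `train_stump`) are hypothetical, and a trivial one-feature threshold rule stands in for LightGBM so that the example runs with the standard library alone.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]  # maps a feature row to a fraud score in [0, 1]


def balanced_subsets(labels: Sequence[int], n_subsets: int, seed: int = 0) -> List[List[int]]:
    """Return index lists: every fraud (1) index plus an equal-sized random draw of normal (0) indices."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    if not minority or len(majority) < len(minority):
        raise ValueError("expected a non-empty minority class smaller than the majority class")
    return [minority + rng.sample(majority, len(minority)) for _ in range(n_subsets)]


def easy_ensemble_fit(X: Sequence[Row], y: Sequence[int],
                      train_fn: Callable[[List[Row], List[int]], Model],
                      n_subsets: int = 10, seed: int = 0) -> List[Model]:
    """Train one base learner per balanced subset."""
    models = []
    for idx in balanced_subsets(y, n_subsets, seed):
        models.append(train_fn([X[i] for i in idx], [y[i] for i in idx]))
    return models


def easy_ensemble_predict(models: Sequence[Model], row: Row) -> float:
    """Average the fraud scores of all base learners."""
    return sum(m(row) for m in models) / len(models)


def train_stump(X: List[Row], y: List[int]) -> Model:
    """Placeholder base learner (assumption): threshold on feature 0 at the midpoint of the class means.

    In the setting described above, a gradient-boosted model such as LightGBM would be trained here.
    """
    pos = [row[0] for row, label in zip(X, y) if label == 1]
    neg = [row[0] for row, label in zip(X, y) if label == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda row: 1.0 if row[0] > threshold else 0.0


if __name__ == "__main__":
    # Synthetic, made-up data: 5 "fraud" rows with a large total cost, 95 "normal" rows.
    y = [1] * 5 + [0] * 95
    X = [[10.0 + i] for i in range(5)] + [[float(i % 5)] for i in range(95)]
    models = easy_ensemble_fit(X, y, train_stump, n_subsets=4, seed=1)
    print(easy_ensemble_predict(models, [12.0]), easy_ensemble_predict(models, [1.0]))
```

Because every subset keeps all fraud samples, no minority information is discarded, while the repeated random draws let the ensemble see a large part of the majority class. This is one plausible reason such sampling suits heavily imbalanced claims data.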