Research On The Models Used For Financial Fraud Detection

Posted on: 2016-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: W Xue
Full Text: PDF
GTID: 2309330461956795
Subject: Business management
Abstract/Summary:
Since the beginning of the 21st century, China’s capital market has developed rapidly and the number of listed companies has increased greatly. However, as the listed-company system has developed, financial fraud has also emerged. The large scale and long duration of these fraud cases have shocked the market. Financial fraud directly or indirectly harms investors’ interests, dampens investor confidence, and seriously damages the stock market. Building an effective financial fraud detection model is therefore of great significance.
In this paper, we systematically summarize the relevant literature on financial fraud. First, we briefly review the classical theories on the causes of financial fraud. Then, we summarize indicators relevant to fraud detection, covering both financial and non-financial characteristics. Finally, we review how statistical methods and data mining methods have been applied in this field.
Based on the literature review, the paper selects 161 listed companies punished by the CSRC for financial fraud during 2007-2013, together with 161 non-fraudulent listed companies as a paired sample. We compile 32 financial indicators and 11 non-financial indicators from the existing literature, then apply a feature selection method based on information gain to these 43 indicators. This yields 14 indicators, such as the ratio of other receivables to liquidity, earnings per share, and the frequency of Board of Supervisors meetings.
This paper uses classification algorithms to detect financial fraud. First, we use three single-classifier algorithms: C4.5, Bayesnet and libsvm.
Experimental results show that Bayesnet has the highest overall accuracy of the three, 70.81%, but its Type I error rate reaches 39.75%; C4.5 has a slightly lower overall accuracy of 68.94% but relatively balanced Type I and Type II error rates; libsvm has the lowest overall accuracy but also the lowest Type I error rate, 32.3%. Subsequently, this paper uses two ensemble learning algorithms, AdaBoost and Random Forest. Random Forest outperforms both AdaBoost and the three single classifiers, with an overall prediction accuracy of 73.6%, a Type I error rate of 27.33%, an F-measure of 0.736 and an AUC of 0.799. In addition, we take into account that misclassifying a fraudulent company as non-fraudulent has a more serious impact than the reverse, and therefore apply a cost-sensitive algorithm named MetaCost. MetaCost reduces the Type I error rate to 14.6% while maintaining the overall accuracy at 70.19%.
The main contributions of this paper are as follows. First, we summarize 43 financial and non-financial indicators and use a feature selection method based on information gain to select the indicators that help detect financial fraud. We find 14 key indicators: the ratio of other receivables to liquidity, earnings per share, the frequency of Board of Supervisors meetings, inventory turnover, asset-liability ratio, operating leverage, quick ratio, current ratio, liquidity ratio, the proportion of the primary operating profit, net profit growth, net assets per share, the ratio of operating income to net profit margin, and return on assets. Second, since the misclassification cost of fraudulent firms is higher than that of non-fraudulent ones, a cost-sensitive learning algorithm named MetaCost is introduced to the field of financial fraud detection.
The MetaCost algorithm based on Random Forest detects 85.4% of all fraudulent companies, which corresponds to the 14.6% Type I error rate reported above.
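The cost-sensitive idea behind this result can be illustrated with a minimal Python sketch. It shows only the minimum-expected-cost decision rule that MetaCost uses to relabel training examples before retraining the base classifier; it is not the thesis's implementation. The function names and the 5:1 cost ratio are hypothetical, chosen for illustration, and the abstract does not state the cost matrix actually used.

```python
def expected_costs(p_fraud, cost_missed_fraud, cost_false_alarm):
    """Return (cost of predicting fraud, cost of predicting non-fraud).

    p_fraud is an estimated probability that the company is fraudulent.
    cost_missed_fraud is the cost of a Type I error in the abstract's sense
    (a fraudulent company classified as non-fraudulent).
    """
    cost_if_predict_fraud = (1.0 - p_fraud) * cost_false_alarm
    cost_if_predict_clean = p_fraud * cost_missed_fraud
    return cost_if_predict_fraud, cost_if_predict_clean


def min_cost_label(p_fraud, cost_missed_fraud=5.0, cost_false_alarm=1.0):
    """Pick the label with the lower expected cost (costs are assumptions)."""
    fraud_cost, clean_cost = expected_costs(
        p_fraud, cost_missed_fraud, cost_false_alarm
    )
    return "fraud" if fraud_cost <= clean_cost else "non-fraud"


def fraud_threshold(cost_missed_fraud, cost_false_alarm):
    """Probability above which predicting fraud is the cheaper decision."""
    return cost_false_alarm / (cost_missed_fraud + cost_false_alarm)


if __name__ == "__main__":
    # With equal costs the rule reduces to the usual 0.5 threshold.
    print(min_cost_label(0.3, cost_missed_fraud=1.0, cost_false_alarm=1.0))
    # With a hypothetical 5:1 cost ratio the same company is flagged.
    print(min_cost_label(0.3, cost_missed_fraud=5.0, cost_false_alarm=1.0))
    print(fraud_threshold(5.0, 1.0))
```

Raising the cost of a missed fraud lowers the probability threshold for flagging a company. This is why a cost-sensitive method can cut the Type I error rate at the price of some overall accuracy, the trade-off reported in the abstract.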
Keywords/Search Tags: financial fraud, data mining, ensemble learning, cost-sensitive learning