
Application Of Machine Learning Methods In Forecasting Financial Fraud Of Listed Companies

Posted on: 2020-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: J X Wang
GTID: 2439330602456050
Subject: Statistics
Abstract/Summary:
The financial statements issued by listed companies are investors' most important source of information about a company's overall condition, such as its size, operating conditions and profit potential, and therefore the main basis for their investment decisions. In the past, people regarded such data as highly convincing, because errors could easily be found through relatively simple calculations, so financial statements were widely trusted. However, with the opening up of the market economy and the rapid development of modern technology, some listed companies prepare fraudulent financial statements for profit. Their fraudulent methods are increasingly covert and sometimes cannot be detected in time, which creates great potential harm. There is therefore an urgent need for effective methods of detecting financial statement fraud.

We aim to build mathematical models that determine, from a listed company's financial statements for a given year, whether fraud is present. A review of the recent literature shows that foreign studies of financial statements focus mainly on corporate bankruptcy and financial crisis, while domestic studies focus mainly on whether a listed company will be designated for special treatment (ST) and on detecting signals of financial distress. Far fewer studies address the detection of financial statement fraud itself, which makes research and analysis of this problem worthwhile.

In this thesis we apply machine learning methods to the detection of financial statement fraud. Machine learning can not only verify or refute hypotheses top-down, but also draw hypothesis-free conclusions from the data bottom-up. On this basis, we build three models: a logistic regression model, a Support Vector Machine (SVM) model and a Random Forest (RF) model. The logistic regression model is widely used to uncover hidden information in data, and previous studies have shown that it performs well, so we begin with it and then optimize it. Because deciding whether financial statements are fraudulent is a typical classification problem, more accurate machine learning classification and prediction algorithms may achieve better results. Our samples are small and high-dimensional, and the task is a classic binary classification problem, so the SVM model, which handles data with these characteristics well, is a good choice. Next, because binary trees are naturally associated with binary classification, we also choose the RF model, which combines many decision tree classifiers into an ensemble with a better fit. As fraud schemes keep changing, a model that can add, delete and automatically select variables over time will identify fraudulent financial statements more effectively. Therefore, the parameters of each model are optimized by cross-validation.

Based on the list of publicly penalized companies released from 2013 to 2018 on the official website of the SFC and its subordinate agencies, we collect the financial statement data of the fraudulent companies for their fraud years, together with data from non-fraudulent listed companies for the corresponding years. Part of the collected data is used to build the models and the rest to test them. Because fraudulent companies are few compared with the total number of listed companies, we handle the class imbalance in three ways, building models on weighted imbalanced data, on over-sampled data and on under-sampled data. Five indicators are used to evaluate the models. The results show that the under-sampled SVM model is the best at correctly identifying fraudulent companies, while the under-sampled RF model performs better on the other indicators. We therefore suggest considering both models together when forecasting whether a company is fraudulent.

Finally, to extend the practical application of the models, we use them to select companies with a high probability of fraud and remove them from the stock pool. Backtesting shows that this improves the portfolio return, indicating that the study has considerable practical value.
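The abstract states that each model's parameters are chosen by cross-validation and that one variant is trained on weighted imbalanced data, but it does not give the penalty, solver, scoring metric or parameter grid. The following is a minimal, dependency-free Python sketch under those assumptions: a logistic regression with an L2 penalty `lam` and "balanced" class weights (each class carries half of the total weight), fitted by plain batch gradient descent, with `lam` chosen by stratified k-fold cross-validation on balanced accuracy. All function names, the grid of `lam` values and the synthetic data are hypothetical illustrations, not the thesis's implementation or data.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def fit_weighted_logreg(X, y, lam, lr=0.1, epochs=300):
    """Batch gradient descent on a class-weighted, L2-penalized log loss."""
    n, d = len(X), len(X[0])
    n_pos = sum(y)
    # Assumed "balanced" weights: each class contributes the same total weight.
    w_pos = n / (2.0 * n_pos)
    w_neg = n / (2.0 * (n - n_pos))
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0                     # the intercept is not penalized
        for xi, yi in zip(X, y):
            c = (w_pos if yi == 1 else w_neg) * (sigmoid(dot(w, xi) + b) - yi) / n
            for j in range(d):
                grad_w[j] += c * xi[j]
            grad_b += c
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def stratified_folds(y, k, seed=0):
    """Split indices into k folds that preserve the fraud / non-fraud ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def balanced_accuracy(y_true, y_pred):
    """Mean recall over both classes; assumes both classes occur in y_true."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def cv_select_lambda(X, y, lambdas, k=5):
    """Return the penalty strength with the best mean validation score."""
    folds = stratified_folds(y, k)
    best_lam, best_score = None, -1.0
    for lam in lambdas:
        scores = []
        for val in folds:
            held_out = set(val)
            train = [i for i in range(len(y)) if i not in held_out]
            w, b = fit_weighted_logreg([X[i] for i in train], [y[i] for i in train], lam)
            preds = [1 if sigmoid(dot(w, X[i]) + b) >= 0.5 else 0 for i in val]
            scores.append(balanced_accuracy([y[i] for i in val], preds))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_lam, best_score = lam, mean_score
    return best_lam, best_score


if __name__ == "__main__":
    # Synthetic toy data (NOT the thesis data): 40 non-fraud, 10 fraud samples.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
    X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]
    y = [0] * 40 + [1] * 10
    print(cv_select_lambda(X, y, [0.001, 0.01, 0.1, 1.0]))
```

The same loop could tune the SVM or RF models by swapping in a different fit/predict pair. In practice one would more likely use scikit-learn, where `LogisticRegression`, `SVC` and `RandomForestClassifier` all accept `class_weight="balanced"` and `GridSearchCV` with `StratifiedKFold` performs the search; the hand-written version only makes the mechanics visible.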
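For the over-sampled and under-sampled variants, the abstract does not say which resampling scheme was used (for example, random duplication versus a synthetic method such as SMOTE). A minimal sketch assuming simple random resampling to a 1:1 class ratio follows; the function names and the 1:1 target are assumptions made for illustration.

```python
import random


def _minority_majority(X, y):
    # Pair each feature row with its label and separate the two classes,
    # returning (minority_pairs, majority_pairs).
    pos = [(x, t) for x, t in zip(X, y) if t == 1]
    neg = [(x, t) for x, t in zip(X, y) if t == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def _unzip(pairs):
    return [x for x, _ in pairs], [t for _, t in pairs]


def random_undersample(X, y, seed=0):
    """Keep every minority row plus an equal-sized random subset of the majority."""
    rng = random.Random(seed)
    minority, majority = _minority_majority(X, y)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return _unzip(kept)


def random_oversample(X, y, seed=0):
    """Keep every row and add minority duplicates (drawn with replacement)
    until both classes have the same size."""
    rng = random.Random(seed)
    minority, majority = _minority_majority(X, y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    combined = majority + minority + extra
    rng.shuffle(combined)
    return _unzip(combined)
```

Resampling of this kind should be applied only to the data used to build the models, not to the test data; otherwise duplicated fraud cases could appear on both sides of the split and inflate the test results.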
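The abstract mentions five evaluation indicators without naming them. Purely for illustration, the sketch below computes five indicators commonly reported for fraud detection from the confusion matrix, treating fraud (label 1) as the positive class; whether these match the thesis's five is an assumption. In these terms, "correctly identifying fraudulent companies" most naturally corresponds to recall.

```python
def confusion_counts(y_true, y_pred):
    """Confusion-matrix counts with fraud (label 1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def _ratio(num, den):
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def classification_report(y_true, y_pred):
    """Five illustrative indicators (an assumed set, not necessarily the thesis's)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = _ratio(tp, tp + fp)  # share of flagged companies that are fraudulent
    recall = _ratio(tp, tp + fn)     # share of fraudulent companies that are flagged
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(tn, tn + fp),  # non-fraudulent companies correctly cleared
        "f1": _ratio(2 * precision * recall, precision + recall),
    }
```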
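The abstract does not describe the backtest itself (rebalancing frequency, weighting scheme, probability cut-off). The sketch below assumes a hypothetical setup and is not a reproduction of the thesis's backtest: in each period, stocks whose predicted fraud probability is at or above `threshold` are removed, the remaining pool is held with equal weights for one period, and the cumulative return is compared with that of the unfiltered, equally weighted pool. The tickers and numbers in the usage example are made up, and the probabilities would need to come from statements published before each holding period to avoid look-ahead bias.

```python
def equal_weight_return(tickers, next_returns):
    """Average one-period return of an equally weighted basket."""
    return sum(next_returns[t] for t in tickers) / len(tickers)


def filter_pool(fraud_prob, threshold):
    """Drop stocks whose predicted fraud probability is at or above the threshold."""
    return [t for t, p in fraud_prob.items() if p < threshold]


def backtest(periods, threshold=0.5):
    """Cumulative returns of the full pool and of the fraud-filtered pool.

    `periods` is a list of (fraud_prob, next_returns) pairs, one per
    rebalance, where both are dicts keyed by ticker.
    """
    full_value, filtered_value = 1.0, 1.0
    for fraud_prob, next_returns in periods:
        universe = list(fraud_prob)
        # Assumption: if every stock is flagged, fall back to the full pool.
        kept = filter_pool(fraud_prob, threshold) or universe
        full_value *= 1.0 + equal_weight_return(universe, next_returns)
        filtered_value *= 1.0 + equal_weight_return(kept, next_returns)
    return full_value - 1.0, filtered_value - 1.0


if __name__ == "__main__":
    # Hypothetical two-period example with made-up tickers and returns.
    periods = [
        ({"A": 0.1, "B": 0.9, "C": 0.2}, {"A": 0.05, "B": -0.20, "C": 0.03}),
        ({"A": 0.3, "B": 0.4, "C": 0.8}, {"A": 0.02, "B": 0.01, "C": -0.10}),
    ]
    full, filtered = backtest(periods, threshold=0.5)
    print(f"full pool: {full:.4f}, filtered pool: {filtered:.4f}")
```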
Keywords/Search Tags: Logistic Regression, Support Vector Machine, Random Forest, Financial Fraud