Analyzing Fraudulent Financial Statements Using Clustering And Classification

Posted on: 2008-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: J H Di
Full Text: PDF
GTID: 2189360215452011
Subject: Management Science and Engineering
Abstract/Summary:
Fraudulent financial statements seriously harm capital markets, securities markets, and investors. Preventing firms from issuing them is a necessary and meaningful task, and improving the ability to identify them is an effective way to do so. However, the means by which firms commit fraud have become increasingly diverse, so fraudulent financial statements have become increasingly difficult to identify.

As data mining technology matures and is applied more widely, it offers an alternative way to analyze financial statements, because it can exploit the abundant information hidden in them. Clustering can be used to analyze the statements, compare results under different settings, and interpret those results in practical terms. Classification can be used to estimate the probability of fraud.

This thesis applies data mining techniques to the analysis of financial statements. The work covers the following aspects:

1. Data Preparation
The data were downloaded from the Wind website. Our team collected balance sheets, income statements, cash flow statements, interim reports, quarterly reports, financial summaries, and daily stock data. These data are stored as Excel files; each firm has many kinds of reports, and each report has a different format. We built an index database using a data integration program and an index computation program written in VB. We built a database of financial fraud characteristics by collecting data from the SFC website. Index data were prepared by a VB extraction program, which makes sample data extraction convenient.

2. Financial Index Selection
We chose 28 indexes covering profitability, asset structure, efficiency, cash flow, liquidity, and growth. Index selection was guided by the following principles. First, the indexes should be grounded in the theory of fraudulent financial statements.
Second, the selected indexes must reflect all aspects of a firm. Third, the selected indexes must be available in the data. After this initial selection, we applied a correlation-based feature selection algorithm with a genetic algorithm as the search strategy, and obtained three feature subsets that perform well.

3. Cluster Analysis of Financial Data
The empirical analysis focuses on the detection ability of clustering and on the effect of applying the clustering results to the training data. The clusters show clearly distinct characteristics. In the cluster that stands out, the quick ratio is generally much higher than the average of the other clusters, the growth rate of main business income is lower, and net assets per share are higher. Financial reporting fraud is usually associated with companies in financial difficulty, which are more likely to commit fraud in order to cover up that difficulty. This explains why main business growth is relatively low in this cluster. Earnings minus cash flow plays a very important role in accounting fraud: some accrual-based accounting fraud is associated with a high level of it, and a positive value is a signal of potential fraud. Moreover, fraud companies have much lower free cash flow than non-fraud companies. Fraud companies usually issue more interest-bearing securities and have higher financial leverage, larger accounts receivable balances, higher sales growth, and higher market returns relative to their assets and market value. However, the absolute values of their assets and sales are usually smaller. This explains why the cluster's liquidity ratio is far higher than that of the other clusters.

4. Comparing Classifiers Trained on Different Training Data
We trained neural network classifiers on different training data. In the first scheme, the samples were drawn by random sampling; in the second, the non-fraud samples were chosen based on the clustering results. We used WEKA's multilayer neural network as the classifier.
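
The abstract does not say exactly how the non-fraud samples were chosen from the clustering results. As a rough illustration only, the Python sketch below assumes one plausible reading: non-fraud firms are drawn from every cluster in proportion to cluster size, so the training set covers each group the clustering found. The function name `stratified_sample`, the firm identifiers, and the cluster assignments are all hypothetical and are not taken from the thesis.

```python
import random
from collections import defaultdict


def stratified_sample(firm_ids, cluster_of, n_samples, seed=0):
    """Draw about n_samples firms, allocating draws to clusters by size.

    Assumption: "choosing by clustering result" means proportional
    sampling per cluster. The thesis may have used a different rule.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for firm in firm_ids:
        by_cluster[cluster_of[firm]].append(firm)

    total = len(firm_ids)
    chosen = []
    for label in sorted(by_cluster):
        members = by_cluster[label]
        # Proportional quota, with at least one firm per non-empty cluster.
        quota = max(1, round(n_samples * len(members) / total))
        chosen.extend(rng.sample(members, min(quota, len(members))))
    return chosen


# Hypothetical toy data: 12 non-fraud firms in clusters of size 6, 4 and 2.
non_fraud = [f"firm{i}" for i in range(12)]
cluster_of = {
    firm: (0 if i < 6 else 1 if i < 10 else 2)
    for i, firm in enumerate(non_fraud)
}

# Scheme 1: plain random sampling, which may miss small clusters.
random_pick = random.Random(0).sample(non_fraud, 6)
# Scheme 2: cluster-informed sampling, which represents every cluster.
cluster_pick = stratified_sample(non_fraud, cluster_of, 6)
```

Either list of firms would then be combined with the fraud samples to form a training set. Because of rounding and the one-firm minimum per cluster, the second scheme can return slightly more or fewer firms than requested.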
When the classifier was trained on data selected using the clustering results, classification improved noticeably: the correct identification rate on the test samples rose from 73.5% to 79.6%. This empirical analysis shows the ability of clustering to detect structure in unknown data, particularly in the application to fraudulent financial statements. Clustering can guide the choice of training data, thereby improving the classification model and its recognition accuracy.

Finding a model through data mining depends heavily on the data. Sometimes the data have such complex structure that no meaningful pattern can be found even with the best algorithm, and sometimes many features offset one another. Financial statement data have a complex structure. Clustering provides a way to analyze such data by decomposing competing signals. Clustering is an undirected knowledge discovery tool: it detects only the structure already present in the data, without reference to any specific target variable and without distinguishing independent from dependent variables. A clustering algorithm searches for groups of records, called clusters, with the aim of finding similarity among them. The final step is to determine whether the similar records represent something meaningful in reality.

We analyzed the patterns of fraud companies using clustering and chose training data according to the clustering results, thereby increasing the rate of correct identification of financial fraud.
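
The abstract does not name the clustering algorithm. To illustrate how a clustering algorithm groups records by similarity without any target variable, the sketch below assumes plain k-means. The two features (quick ratio and revenue growth) echo the ratios discussed above, but the values are made-up toy numbers, not data from the thesis, and the function name `kmeans` is ours.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on lists of floats.

    Assumption: the first k points seed the centroids, which keeps the
    toy example deterministic. Real use would normally standardize the
    features and try several random initializations.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each record joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical records: [quick ratio, revenue growth rate] for six firms.
firms = [
    [0.9, 0.20],
    [3.1, 0.02],
    [1.0, 0.18],
    [1.1, 0.22],
    [2.9, 0.01],
    [3.0, 0.03],
]

labels, centroids = kmeans(firms, k=2)
for label, centroid in enumerate(centroids):
    size = labels.count(label)
    print(f"cluster {label}: {size} firms, centroid {centroid}")
```

No fraud label is used anywhere in the computation. Whether a cluster with a high quick ratio and low growth corresponds to fraud-prone firms is an interpretation made afterwards, which is the final step described above.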
Keywords/Search Tags:Classification