
Fraudulent Financial Statements Detection Based On Data Mining

Posted on: 2008-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: X B Zhang
GTID: 2189360215952234
Subject: Quantitative Economics
Abstract/Summary:
Financial statements are documents that comprehensively reflect a company's financial position and operating results for a given period. They are compiled from day-to-day accounting records that are processed, organized, and classified into a broadly uniform format and content. Investors, creditors, suppliers, government agencies, and other users rely on them for the information they need to make decisions, so it is very important to understand and use financial statements correctly.

Today, listed companies in China that violate laws, regulations, and accounting standards by forging financial data and issuing fraudulent financial information are frequently exposed. Manipulating profit within the range permitted by laws and accounting standards is even more common. Investors who base their judgments on fraudulent financial statements inevitably take on investment risk. The corrupt practices of listed companies and registered accountants have had a tremendous impact on the market. Studying methods for identifying fraudulent financial statements is therefore of great significance for making China's securities market sounder and fairer.

Many domestic and foreign experts and scholars in different fields have studied fraud by listed companies from different angles. Some have used data mining techniques and achieved satisfactory results. This literature points to a number of fraud indicators. Although these researchers start from different points, the attributes they chose and the results they obtained are consistent.

This paper uses more than 1,300 listed companies as its study sample, including more than 70 listed companies that were publicly punished in the capital markets between 1994 and 2006.
The data are accurate and detailed. They include all financial statements from the preceding three years, together with the corresponding adjusted figures. After data processing, initial screening, and random selection of targets, we obtain a sample data set.

This paper presents an empirical study of four machine learning feature selection methods. Before feature selection, we test general classification algorithms and record the results. The study illustrates how the four methods, namely the ReliefF, correlation-based (CFS), consistency-based (CON), and wrapper (WRP) algorithms, help to improve three aspects of the performance of scoring models: model simplicity, model speed, and model accuracy. The CFS, CON, and WRP methods measure the goodness of feature subsets rather than of each single feature. An exhaustive search over a data set with 33 features is unrealistic because of the enormous computation time required, so heuristic search methods are needed. Different search methods may lead to different results. Greedy hill-climbing strategies such as forward selection and backward elimination are often applied to search the feature subset space in a reasonable time. Although simple, these searches often yield good results compared with more sophisticated search strategies.

After feature selection, we rebuild the classification model with the training samples and test it with the test samples. During feature selection and during classifier training and testing, we sometimes used cross-validation. Combining these results with the relevant theory of financial accounting, we come to the conclusion that feature selection methods help to improve three aspects of the performance of scoring models: model simplicity, model speed, and model accuracy. The indicators identified by data mining are consistent with the experience of practitioners engaged in day-to-day financial accounting work.
Of the four feature selection methods, CFS and WRP are superior to the other two. This paper presents an empirical study of four machine learning feature selection methods to show their performance on a real-world fraudulent financial statement detection problem, and it achieves good results. The approach can also be tried in areas such as bank credit, debt ratings, investment appraisal, credit rating, corporate performance evaluation and management, and securities regulation.
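
The greedy forward selection search mentioned in the abstract can be sketched as follows. This is only an illustrative sketch, not the thesis's implementation: the function `forward_selection`, the evaluator `toy_score`, and the indicator names (`debt_ratio`, `receivables_turnover`, and so on) are hypothetical, and the subset evaluator is a stand-in for whatever merit a CFS, CON, or wrapper method would compute. The last line shows why exhaustive search is impractical: 33 features give 2^33 candidate subsets.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_selection(
    features: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedy hill climbing: add the best single feature until nothing improves."""
    selected: List[str] = []
    remaining = list(features)
    best_score = evaluate(frozenset())
    while remaining:
        # Score every one-feature extension of the current subset.
        scored = [(evaluate(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # local optimum: no single addition improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical evaluator (an assumption for illustration): two indicators
# carry signal, and every added feature costs a small size penalty.
USEFUL = {"debt_ratio": 0.30, "receivables_turnover": 0.20}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(USEFUL.get(f, 0.0) for f in subset) - 0.01 * len(subset)


if __name__ == "__main__":
    names = ["debt_ratio", "receivables_turnover", "total_assets", "eps"]
    subset, score = forward_selection(names, toy_score)
    print(subset, round(score, 2))
    # Exhaustive search over 33 features would have to score this many subsets.
    print(2 ** 33)
```

In a wrapper-style setup, `evaluate` would instead train a classifier on the candidate subset and return its cross-validated accuracy, which is why each step of the search is expensive and why a greedy strategy is attractive.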
Keywords/Search Tags:Fraudulent