Testing And Application Of Data Mining Partial Models In Stock Market

Posted on: 2015-08-31; Degree: Master; Type: Thesis
Country: China; Candidate: X Y Zhang; Full Text: PDF
Abstract/Summary:
China’s stock market has gone through 24 years of ups and downs and has kept exploring its own path throughout. Faced with the question of weak-form market efficiency, experts and scholars from many fields have studied the market extensively. Current research falls into two schools: fundamental analysis and technical analysis. Fundamental analysis holds that the stock price reflects the intrinsic value of the company and focuses on variable selection. Technical analysis treats the history of opening, closing, highest, and lowest prices as the raw material for predicting future prices and focuses on data processing and model building. Each has its own advantages. Nevertheless, there is no doubt that the Chinese stock market has not reached weak-form efficiency and that its price series are correlated, so technical analysis has a foothold. This thesis belongs to technical analysis.
Technical analysis draws on time-series analysis, fuzzy mathematics, chaos theory, data mining, and other techniques. This thesis concerns data mining, which does not restrict the form of the information to be explored but lets the data speak for itself. At present, scholars try to improve data mining through algorithm design and refinement, variable selection, and combinations of different data mining models. This thesis also takes data mining as its starting point for studying the stock market, but approaches it from a new perspective by proposing the concept of Partial Models.
The concept of a data mining Partial Model originated from reflection on the distinctive structure of the classification tree. The output of a tree model is a tree with many leaves, and each leaf yields one piece of knowledge. In practice, these pieces of knowledge differ in value: some have high prediction accuracy and others low. If each leaf is treated as a sub-model, each leaf can be tested individually instead of testing the tree as a whole.
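The leaf-by-leaf testing idea above can be illustrated with a minimal sketch. It assumes that each record has already been assigned a leaf identifier by some fitted tree; the function names, the toy data, and the 0.8 accuracy threshold are hypothetical choices for illustration, not the thesis's implementation.

```python
from collections import defaultdict


def leaf_accuracy(leaf_ids, y_true, y_pred):
    """Return {leaf_id: (accuracy, n_records)}, computed leaf by leaf."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for leaf, truth, pred in zip(leaf_ids, y_true, y_pred):
        counts[leaf] += 1
        hits[leaf] += int(truth == pred)
    return {leaf: (hits[leaf] / counts[leaf], counts[leaf]) for leaf in counts}


def select_leaves(stats, min_accuracy=0.8, min_records=1):
    """Keep only leaves meeting the (assumed) accuracy and support thresholds."""
    return sorted(
        leaf
        for leaf, (acc, n) in stats.items()
        if acc >= min_accuracy and n >= min_records
    )


# Toy data: the leaf each record falls into, the true label, the tree's prediction.
leaf_ids = [1, 1, 1, 2, 2, 3, 3, 3, 3]
y_true = ["up", "up", "down", "down", "up", "up", "up", "up", "up"]
y_pred = ["up", "up", "up", "down", "down", "up", "up", "up", "up"]

stats = leaf_accuracy(leaf_ids, y_true, y_pred)
kept = select_leaves(stats, min_accuracy=0.8)
print(stats)
print(kept)
```

In this toy case only leaf 3 survives the filter, so only its rule would be kept as a sub-model; the other leaves are discarded rather than averaged into an overall accuracy figure.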
In fact, valuable buying and selling points in the stock market are limited, so how to find them is important. This thesis builds a classification tree partial model on the Shanghai Composite Index. To allow the model outputs to be compared with existing K-line (candlestick) chart theory, the thesis converts the opening, closing, highest, and lowest prices into the length of the upper shadow, the length of the lower shadow, the length of the box, and the box color. With these four indicators as input variables and the stock change after 10 days as the output variable, the thesis builds decision trees and filters out the seven best-fitting leaves. It finds that if two consecutive days’ prices show both the "pregnancy" pattern and the double-needle bottom pattern, the future price rises; if they show only the double-needle bottom, the future price rises only when at least one dip needle is longer than 9.65. The "pregnant line combination" and the "dip needle" are rules of thumb that practitioners have drawn from K-line charts, and this initial exploration of the Partial Model is broadly consistent with that experience.
The decision tree partial model is a partial model from the perspective of output: in essence, it accepts only part of the model’s results rather than all of them. Building on the decision tree partial model, the thesis extends the idea. Valuable buying and selling points are limited, and prediction is worthwhile only when the stock signal is clear (either up or down). Based on this idea, a partial Support Vector Machine (SVM) model is designed to predict only when the data environment is favorable. This is a partial model from the perspective of input. Specifically, an SVM builds model 1 on the training data set A, and the well-fitted records in A are denoted as set B. Set B is then used to build model 2, and the common characteristics of the records in B are identified and denoted K. Predictions are made only for records that have the K characteristics. Before establishing the SVM partial model, analysis of variance is used to show that SVM models fed with different data do differ significantly in goodness of fit.
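The conversion of the four daily prices into the four K-line indicators described above might look like the following sketch. The function name, the dictionary keys, the "rising"/"falling" labels, and the treatment of a day with equal open and close as "rising" are assumptions made for illustration; the abstract does not state the units or scaling the thesis uses for lengths.

```python
def candle_features(open_, high, low, close):
    """Convert one day's OHLC prices into four candlestick (K-line) features.

    Lengths are raw price differences; any scaling used in the thesis
    is unknown, so none is applied here.
    """
    body_top = max(open_, close)
    body_bottom = min(open_, close)
    return {
        "upper_shadow": high - body_top,       # wick above the box
        "lower_shadow": body_bottom - low,     # wick below the box
        "box_length": body_top - body_bottom,  # size of the candle body
        # Assumption: a day with close == open is labelled "rising".
        "box_color": "rising" if close >= open_ else "falling",
    }


rising_day = candle_features(100, 110, 95, 105)
falling_day = candle_features(105, 106, 90, 100)
print(rising_day)
print(falling_day)
```

Features for "yesterday" and "today" can then be placed side by side in one record, which is the form the two-day patterns in the abstract would need.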
The data are divided into three groups. Under the null hypothesis that the fit does not differ significantly across the three groups, the P value is approximately zero, so the null hypothesis is rejected. The same data in different time periods can indeed lead to completely different goodness of fit. After verifying the practicality of the SVM partial model and the rationality of the decision tree partial model, the thesis uses the two partial models to look for investment rules in the stock market.
In the fifth chapter, the partial decision tree model takes "yesterday’s box length, yesterday’s box color, yesterday’s lower shadow length, today’s box length, today’s box color, today’s lower shadow length, DIF, DEA, DIF-DEA" as input variables and "stock change after 10 days" as the output variable, and finds 9 leaves with a fit of 80% or more. When these 9 leaves are applied to the validation data, the 32nd, 11th, 132nd, and 266th rules reach 100% accuracy. These rules amount to the following: if DIF-DEA < -1.85, the price is forecast to fall; if DIF-DEA > 11.05, the price is forecast to rise; if -1.85 < DIF-DEA < 11.05, the future price trend is uncertain. Historical experience holds that "when DIF > 0 and DEA > 0, if DIF > DEA, the share price will rise; when DIF > 0 and DEA > 0, if DIF < DEA, the stock price will fall; when DIF < 0 and DEA < 0, if DIF > DEA, the share price will rise; when DIF < 0 and DEA < 0, if DIF < DEA, the stock price will fall." The conclusion of this thesis therefore gives a more exact value interval on top of historical experience. The thesis argues that the interval produced by the model is stricter: the dividing lines are -1.85 and 11.05 rather than 0. This may reflect investor psychology: when the market rebounds slightly, most investors wait and do not act readily; only when the rebound reaches a certain level do investors believe that spring has come and start bidding, and the price then rises.
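The DIF-DEA rule reported above can be written as a small three-way classifier. This is a sketch only: it assumes DIF and DEA have already been computed, the function name and return labels are hypothetical, and because the abstract uses strict inequalities, values exactly on a threshold are treated here as "uncertain", which is an assumption.

```python
def dif_dea_signal(dif, dea, lower=-1.85, upper=11.05):
    """Classify the 10-day outlook from the DIF-DEA spread.

    Thresholds are the values quoted in the abstract; handling of the
    exact boundary values is an assumption.
    """
    spread = dif - dea
    if spread < lower:
        return "fall"
    if spread > upper:
        return "rise"
    return "uncertain"  # the partial model makes no forecast here


print(dif_dea_signal(-3.0, 0.0))   # spread -3.0
print(dif_dea_signal(15.0, 2.0))   # spread 13.0
print(dif_dea_signal(1.0, 0.5))    # spread 0.5
```

The middle band is where this rule differs from the traditional reading with 0 as the dividing line: a small positive or negative spread produces no forecast at all.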
By the same argument in reverse, a slight decline leaves most investors holding, and only a sufficiently large fall prompts selling and a further drop in price. When establishing the partial Support Vector Machine model, four common rules were found in the well-fitted data. Filtering for data that meet these four rules and predicting on them gave accuracy rates of 57.1%, 46.1%, 12.1%, and 75%. The thesis reports this as significantly higher than the 55.5% average obtained without such filtering. This further supports the view that there is a data environment suited to the SVM model, and that predicting only when this environment arises is much better than predicting blindly regardless of timing.
Traditional classical statistics first posits a set of variables based on economic theory, fixes the relationships among them, and then runs various regression analyses within a framework built in advance. This is "theory first, data second" thinking. Data mining breaks with this routine: it does not set "what should be" in advance but lets the data speak fully for itself. This is "data first, theory second" thinking. For this reason, the thesis ventures to discuss the concept of Partial Models without a detailed mathematical derivation.
The thesis not only presents the concept of Partial Models but also extends it: when data mining techniques are used to process data, whether at data input, data processing, or result output, if any one stage in the entire course is not adopted as a whole, the result is called a Partial Model. The classification tree partial model is an "output" partial model, while the Support Vector Machine partial model is an "input" partial model. In the future, partial models with more meanings and from more angles may appear, and the author believes that more scholars will join the discussion of Partial Models.
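The "input" partial model described above, which predicts only when a record falls in a favorable data environment, can be sketched as a gate placed in front of any classifier. Everything named here is hypothetical: `in_environment` stands in for the K characteristics (the abstract does not list the four rules), and `model` is a toy callable used in place of a trained SVM.

```python
def partial_predict(records, in_environment, model):
    """Predict only for records in the favorable environment; abstain with None."""
    return [model(r) if in_environment(r) else None for r in records]


def accuracy_on_predicted(predictions, truths):
    """Accuracy over the records that actually received a prediction."""
    pairs = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    if not pairs:
        return None  # nothing was predicted
    return sum(p == t for p, t in pairs) / len(pairs)


# Toy records with made-up fields; "x" and "trend" are illustrative only.
records = [
    {"x": 0.2, "trend": 1},
    {"x": 0.9, "trend": -1},
    {"x": 0.1, "trend": -1},
    {"x": 0.8, "trend": 1},
]
truths = ["up", "up", "down", "down"]

in_environment = lambda r: r["x"] < 0.5               # stand-in for the K characteristics
model = lambda r: "up" if r["trend"] > 0 else "down"  # stand-in for the SVM

preds = partial_predict(records, in_environment, model)
coverage = sum(p is not None for p in preds) / len(preds)
print(preds, accuracy_on_predicted(preds, truths), coverage)
```

Reporting coverage alongside accuracy makes the trade-off visible: the gate can raise accuracy on the records it accepts, at the cost of making no forecast for the rest.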
Keywords/Search Tags: Decision Trees, Support Vector Machines, Partial Model, Stock Market Prediction