The Stock Data Mining Algorithm Based On Association Rules

Posted on: 2017-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: M D Wang
GTID: 2347330491450957
Subject: Statistics
Abstract/Summary:
As important tools for data analysis and processing, knowledge discovery in databases (KDD) and data mining techniques have been widely applied to statistics in finance, health care, retail and related industries. In financial statistics, association rule mining is applied especially widely to stock prediction. Classical association rule mining, however, faces the so-called "higher-order logic" problem, and this thesis studies it from the following aspects. First, the objects being mined differ fundamentally in order of magnitude: the size of classic market-basket data can be represented by a constant n, whereas stock data can only be described with the infinity symbol ∞. When describing share prices, and especially the prices of derivative assets based on an underlying stock, it is therefore not appropriate to apply the generic Apriori algorithm directly. Second, the time and space complexity bottlenecks are more severe. Stock data exhibit strong randomness, uncertainty and ambiguity, and conventional association rule mining algorithms cannot adequately express the relationships among fuzzy information in the mined objects. Fuzzy association rule algorithms have proved very effective for stock data when the data volume is small, but with large or massive data volumes they run into time and space complexity bottlenecks. Finally, signals can fail or even disappear. When the original stock transaction database D is converted into the extended transaction database De using a sliding window anchored at transaction points, many interesting rules appear whose support is so low that it would normally be ignored, yet whose confidence is relatively high. Such rules are difficult to mine with traditional algorithms.
To address these problems, the thesis first shows that, in mining massive equity and derivative data and in designing the corresponding algorithms, the law of large numbers and the central limit theorem remain essential theoretical foundations for processing huge amounts of data. Second, with large data volumes the time and space complexity bottlenecks become more serious, and association rules of insufficient or unexpected interestingness are missed or hard to find. To handle this, the thesis uses vector, matrix and dimensionality-reduction methods. The difficulty is that stock data matrices are often too large for memory to hold. Is there a theoretically sound way to avoid the difficulties caused by excessive dimensionality, reducing the dimension according to dependencies among the data without losing too much of the original information, so that knowledge can still be extracted? For similarity-based association rule mining algorithms, this thesis gives a complete mathematical proof on this question: reducing the dimension introduces an error rate, but the true result can be approximated with that error kept under control.
The key to the algorithm design is the notion of similarity, which closely approximates the concept of confidence. To further improve efficiency, the thesis also gives a good estimate S*(ci, cj) of the similarity and proves that S*(ci, cj) is anti-monotone over itemsets, so the Apriori approach can be adopted to find all extended itemsets whose similarity reaches the threshold s. This guarantees that similar-pattern matching among itemsets is mathematically complete. The transaction database is converted into a 0-1 matrix, which is simplified by a min-hash transformation and several further matrix transformations to extract the items with similar characteristics. A subsequent matrix transformation of special significance, also built on the similarity measure, not only speeds up verification of frequent k-itemsets but also substantially reduces the number of I/O operations and the storage space required. Finally, the number of cases in which a similarity relationship appears is used as a support count, so that the support measure replaces the similarity measure; the matrix to be mined at this stage is much smaller than the original matrix M. Association rule mining on the resulting correlation graph then yields the inter-transaction association rules being sought. This approach is shown to make association rule mining more efficient and the extracted information more accurate.
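The abstract does not spell out how D is turned into De. The following is a minimal sketch, assuming the common inter-transaction formulation in which each window of w consecutive trading days becomes one extended transaction and every item is tagged with its day offset inside the window. The item names such as `A_up`, the helpers `extend`, `support` and `confidence`, and the window size are all hypothetical. The sketch only shows how support and confidence of a rule spanning several days are computed over De; it is not the thesis's own procedure.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation: an extended item is (original item, day offset
# from the window start).
ExtItem = Tuple[str, int]


def extend(db: List[Set[str]], w: int) -> List[FrozenSet[ExtItem]]:
    """Turn a daily transaction database D into an extended database De.

    Each window of w consecutive days becomes one extended transaction, and
    every item in it is tagged with its offset 0 .. w-1 inside the window.
    """
    ext = []
    for start in range(len(db) - w + 1):
        items = {(item, d) for d in range(w) for item in db[start + d]}
        ext.append(frozenset(items))
    return ext


def support(ext: List[FrozenSet[ExtItem]], itemset: Iterable[ExtItem]) -> float:
    """Fraction of extended transactions that contain every item of itemset."""
    if not ext:
        return 0.0
    wanted = set(itemset)
    return sum(wanted <= t for t in ext) / len(ext)


def confidence(ext: List[FrozenSet[ExtItem]], lhs: Iterable[ExtItem],
               rhs: Iterable[ExtItem]) -> float:
    """support(lhs ∪ rhs) / support(lhs) for the rule lhs -> rhs."""
    lhs, rhs = list(lhs), list(rhs)
    base = support(ext, lhs)
    return support(ext, lhs + rhs) / base if base else 0.0


if __name__ == "__main__":
    # Hypothetical daily "transactions": which price moves occurred that day.
    daily = [{"A_up", "B_up"}, {"B_up"}, {"A_up"},
             {"B_up", "C_down"}, {"A_down"}, {"C_up"}]
    ext = extend(daily, w=2)
    # Rule: "A rises today" -> "B rises on the next trading day".
    lhs, rhs = [("A_up", 0)], [("B_up", 1)]
    print(len(ext), support(ext, lhs + rhs), confidence(ext, lhs, rhs))
    # prints: 5 0.4 1.0
```

With w offsets per window, the space of extended items is up to w times larger than the original item space, so the number of candidate itemsets in De grows quickly even for small windows.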
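The abstract does not define S*(ci, cj) or the min-hash step precisely. The sketch below assumes that similarity is the generalized Jaccard measure on the 0-1 matrix: rows where all chosen columns are 1, divided by rows where any of them is 1. Under that assumption, min-hashing estimates it, because for a random permutation of the rows the probability that all chosen columns share the same minimum equals exactly that ratio. The measure is also anti-monotone, since adding a column can only shrink the numerator and grow the denominator, which is what licenses Apriori-style pruning. The function names, the default of 100 hash functions, and the use of linear hashes `(a*r + b) mod p` as a stand-in for true random permutations are all hypothetical choices, not taken from the thesis.

```python
import random
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[int]]  # rows = transactions, columns = items, entries 0/1

_PRIME = 2_147_483_647  # 2**31 - 1; larger than any hash value, so it acts as "infinity"


def exact_similarity(m: Matrix, cols: Sequence[int]) -> float:
    """Generalized Jaccard: rows where all cols are 1 / rows where any col is 1."""
    inter = sum(all(row[c] for c in cols) for row in m)
    union = sum(any(row[c] for c in cols) for row in m)
    return inter / union if union else 0.0


def minhash_signatures(m: Matrix, k: int, seed: int = 0) -> List[List[int]]:
    """k x n_cols signature matrix; hash i maps row r to (a_i*r + b_i) mod p,
    a cheap stand-in for a random permutation of the rows."""
    rng = random.Random(seed)
    n_cols = len(m[0])
    params = [(rng.randrange(1, _PRIME), rng.randrange(_PRIME)) for _ in range(k)]
    sig = [[_PRIME] * n_cols for _ in range(k)]
    for r, row in enumerate(m):
        hashes = [(a * r + b) % _PRIME for a, b in params]
        for c, bit in enumerate(row):
            if bit:
                for i, h in enumerate(hashes):
                    sig[i][c] = min(sig[i][c], h)
    return sig


def estimate_similarity(sig: List[List[int]], cols: Sequence[int]) -> float:
    """Fraction of hash functions on which all columns share one min-hash value."""
    agree = sum(len({row[c] for c in cols}) == 1 for row in sig)
    return agree / len(sig)


def similar_itemsets(m: Matrix, s: float, k: int = 100,
                     seed: int = 0) -> Dict[Tuple[int, ...], float]:
    """Apriori-style level-wise search for column sets (size >= 2) whose
    estimated similarity is at least s. Pruning relies on anti-monotonicity."""
    sig = minhash_signatures(m, k, seed)
    # Level 1: every non-empty column (its similarity with itself is 1).
    level = [(c,) for c in range(len(m[0])) if any(row[c] for row in m)]
    result: Dict[Tuple[int, ...], float] = {}
    while level:
        kept = set()
        for cand in level:
            est = estimate_similarity(sig, cand)
            if len(cand) == 1 or est >= s:
                kept.add(cand)
                if len(cand) > 1:
                    result[cand] = est
        # Join: merge two kept sets that differ only in their last column.
        nxt = []
        for a, b in combinations(sorted(kept), 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                # Prune: every subset one column smaller must have been kept.
                if all(sub in kept for sub in combinations(cand, len(cand) - 1)):
                    nxt.append(cand)
        level = nxt
    return result


if __name__ == "__main__":
    # Hypothetical 0-1 matrix: columns 0 and 1 always co-occur, and column 2
    # never appears together with them.
    M = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 1]]
    print(exact_similarity(M, (0, 1)), exact_similarity(M, (0, 2)))  # 1.0 0.0
    print(similar_itemsets(M, s=0.5))  # {(0, 1): 1.0}
```

Because pruning here runs on min-hash estimates rather than exact values, a column set can occasionally be kept or dropped wrongly; candidates found this way can be re-checked with `exact_similarity` against the original matrix, at the cost of one extra pass over the data.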
Keywords/Search Tags: Association rules, Inter-transaction association rules, Apriori algorithm, Similarity