The Research Of Quality Analysis And Evaluation Of Tracks Based On Association Rule Algorithm

Posted on: 2017-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: Q C Bai
GTID: 2322330491458128
Subject: Computer technology
Abstract/Summary:
As informatization advances across industries, data mining has come into wide use. With the explosive growth of information, however, traditional data mining techniques can hardly meet business needs. Traditional serial algorithms mine data slowly and cannot respond quickly to massive datasets. As data volumes grow, data structures become more complex and data dimensionality keeps rising. Faced with this growth, traditional frequent itemset mining algorithms must scan the database repeatedly, which raises the time cost and seriously reduces efficiency. Parallel data mining emerged against this background.
Association rule mining is an important branch of parallel data mining; it extracts sets of items that are associated with one another from a dataset. Association rule algorithms therefore have broad application prospects in many industries. In recent years, cloud computing platforms such as Hadoop have attracted growing attention from researchers, and the parallel implementation of traditional frequent itemset mining algorithms is becoming an important research direction. Frequent itemset mining has two bottlenecks: too many iterations and excessive I/O. Hadoop inherits many advantages of cloud computing and provides an effective strategy for distributed storage and parallel computation over big data. Owing to its high availability and low cost, Hadoop can be used to relieve these bottlenecks.
Although parallel association rule algorithms have been widely studied in recent years, they still have a drawback: because of the excessive number of iterations, a large number of candidate itemsets exist in each scanning pass.
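The repeated database scans described above can be made concrete with a minimal serial sketch. It assumes the classic level-wise Apriori scheme, which the abstract does not name explicitly; the function name `apriori` and the toy transactions in the usage note are invented for illustration and are not taken from the thesis.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise frequent itemset mining (illustrative sketch only).

    Returns a dict {itemset: support count} and the number of full
    passes made over the transaction database.
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct item as a 1-itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    scans = 0
    k = 1
    while candidates:
        # One full pass over the database per level: the repeated scan.
        scans += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, scans
```

Calling `apriori([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]], 3)` makes one pass per itemset size, so the number of scans grows with the length of the longest candidate. This is the cost that a parallel implementation tries to spread across nodes.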
By studying the working principle of the MapReduce computing model together with its operation and fault tolerance mechanisms, this thesis proposes an optimization method for parallel frequent itemset mining based on MapReduce. It also presents the theoretical design of MapReduce-based frequent itemset mining algorithms and applies the improved algorithm to rail quality analysis and evaluation. Strong association rules are generated by analyzing rail defect data. In the MapReduce parallel computation, the data partition matrix Tk is stored in row-wise segments. The computational load is spread across all nodes in the cluster, which reduces the time spent on vector multiplication and on moving the matrix in each iteration. Finally, the thesis analyzes and discusses the algorithm in detail.
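A rough single-process sketch of the idea follows: split the transaction rows into partitions, count itemsets per partition (map), sum the partial counts (reduce), and keep rules that meet both thresholds. It is not the thesis's algorithm. The abstract does not specify the matrix-vector formulation, so plain subset counting is swapped in here, with no candidate pruning, and every function name (`split_rows`, `map_partition`, `reduce_counts`, `strong_rules`) is hypothetical.

```python
from collections import Counter
from itertools import combinations


def split_rows(rows, n_parts):
    """Row-wise segmentation: partition i gets rows i, i + n_parts, ..."""
    return [rows[i::n_parts] for i in range(n_parts)]


def map_partition(rows, max_size):
    """Map phase: local counts of every itemset up to max_size items."""
    local = Counter()
    for row in rows:
        items = sorted(set(row))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                local[combo] += 1
    return local


def reduce_counts(partials):
    """Reduce phase: sum the per-partition counts for each itemset key."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def strong_rules(counts, min_support, min_confidence):
    """Rules lhs -> rhs whose support count and confidence meet both thresholds."""
    rules = []
    for itemset, n in counts.items():
        if len(itemset) < 2 or n < min_support:
            continue
        for k in range(1, len(itemset)):
            for lhs in combinations(itemset, k):
                # confidence(lhs -> rhs) = support(itemset) / support(lhs)
                confidence = n / counts[lhs]
                if confidence >= min_confidence:
                    rhs = tuple(i for i in itemset if i not in lhs)
                    rules.append((lhs, rhs, n, confidence))
    return rules
```

Because the reduce step only adds counts, the totals do not depend on how the rows are partitioned, and that independence is what allows the load to be spread across nodes. On a real Hadoop cluster the map and reduce functions would run as separate tasks rather than as ordinary function calls.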
Keywords/Search Tags: MapReduce, Association rule algorithm, Data mining, Frequent itemset mining