Data is the carrier of information, and information is the content of data; computer systems are generally built to store and process data. Processing data and extracting information from it are the basic functions of an information system. In today's highly information-oriented society, the network can be regarded as the largest information system, and its data are huge in volume, diverse, heterogeneous, and dynamic. How to quickly extract useful information from massive amounts of data has become a serious problem for developers during application development. The emergence of cloud computing has brought new opportunities to data mining technology. Cloud computing distributes storage and computing power across the many nodes of a cluster. By deploying large numbers of cheap commodity PCs, it makes massive data storage and analysis possible; clusters vary in size, but ordinary PCs are far cheaper than high-performance computers, so overall cost is reduced. Using clusters of commodity servers lowers both storage and computing costs for enterprises, gradually making cloud-based mining of big data feasible. Hadoop, as open-source cloud computing software, is efficient, scalable, and low-cost, and has been widely applied in the data mining field. This paper studies the integration of Hadoop with a data mining system, selects the classic and widely used Apriori algorithm as an algorithm module of the new system, and improves it so that it can handle massive data more efficiently. The research methods used in this paper include the literature method, the structured approach, and the case study method, which guide the analysis of a Hadoop-based cloud data mining system architecture.
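As background for the improvement discussed below, the classic Apriori algorithm can be sketched as follows. This is a minimal illustrative implementation, not the paper's own code; the function name `apriori` and the simple combination-based candidate generation are assumptions made for clarity.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Classic Apriori: grow frequent itemsets level by level,
    pruning any candidate with an infrequent subset."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            s = frozenset([item])
            counts[s] = counts.get(s, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: k-combinations of items seen in
        # frequent (k-1)-itemsets, pruned by the Apriori property
        # (every (k-1)-subset of a candidate must be frequent).
        items = sorted({i for s in frequent for i in s})
        candidates = [frozenset(c) for c in combinations(items, k)
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))]
        # Support counting: one full scan of the database per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

The repeated full database scan in the support-counting step is precisely the bottleneck on massive data that motivates the MapReduce-based improvement described in this paper.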
This paper describes the traditional Apriori algorithm and, through an example of the implementation process, shows that the improved algorithm is feasible. Combining a typical data mining system architecture with Hadoop, it proposes a Hadoop-based data mining system architecture and briefly expounds each functional module. To overcome Apriori's bottleneck in processing massive data, the algorithm is improved using the MapReduce programming model, based on the idea of partitioning the database and mining the partitions in parallel.
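The partition-and-parallelize idea can be sketched as a two-pass scheme in the style of MapReduce. The following is a simplified single-machine sketch, not the paper's Hadoop implementation: the map phase finds itemsets that are frequent within one database partition, and the reduce phase counts the union of all local candidates against the full database. The function names, the `max_k` limit, and the choice of local threshold are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def map_local_frequent(partition, local_min_support, max_k=3):
    """Map phase: itemsets frequent within one partition.
    Any globally frequent itemset must be locally frequent in at
    least one partition, so the union of map outputs over all
    partitions is a complete candidate set."""
    txns = [frozenset(t) for t in partition]
    out = set()
    for k in range(1, max_k + 1):
        counts = Counter()
        for t in txns:
            for c in combinations(sorted(t), k):
                counts[frozenset(c)] += 1
        out |= {s for s, n in counts.items() if n >= local_min_support}
    return out

def reduce_global_count(candidates, partitions, min_support):
    """Reduce phase: count each candidate over the whole database
    and keep those meeting the global support threshold."""
    counts = Counter()
    for part in partitions:
        for t in part:
            ts = frozenset(t)
            for c in candidates:
                if c <= ts:
                    counts[c] += 1
    return {c: n for c, n in counts.items() if n >= min_support}
```

Because each map task scans only its own partition, the expensive candidate-finding work is spread across the cluster, and the full database is read only twice regardless of itemset length, which is the source of the efficiency gain on massive data.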