| With the rapid development of the Internet, the demand for massive data processing has become increasingly common. Users typically have to spend considerable time learning the relevant technologies, and even more time developing processing applications. Traditional data processing platforms also lack algorithm recommendation and data visualization, and they are difficult to set up and to secure, so users struggle to meet the demands of massive data processing. To address this problem, we propose an open design for a distributed massive data processing platform. The platform uses HDFS, Hive and HBase for data storage; MapReduce, Spark and Storm for data processing; and Kerberos for authorization management. It also provides users with a unified operation interface, so that they can process massive data simply and efficiently. To improve efficiency, we first use an open source tool named Kettle to provide a visual operation interface. Second, drawing on existing research and experience in distributed computing, data processing and algorithm integration, we implement an algorithm recommendation module to help users process massive data. Third, we use JFreeChart to provide data visualization. Because the platform introduced in this paper is built on Hadoop, the system remains general-purpose while reducing cost and maintaining quality. Tests show that our design improves both the security of the system and the efficiency of data processing. |