| With the widespread use of computer technology and the Internet in all aspects of human social life, the amount of data is growing explosively. The storage and processing of big data has become a new challenge for many companies, and big data processing applications have accordingly attracted widespread attention. Hadoop is an open source platform that handles the storage and processing of big data through distributed computing. Hadoop implements HDFS and the MapReduce programming model, which can handle large-scale data. However, these techniques are designed mainly for offline data and cannot be used for real-time computing on big data. Moreover, with the popularity of intelligent terminals, users can access the Internet anytime and anywhere. The steady growth of streaming data and the real-time nature of Internet content and services demand greater computing power. This demand has given rise to real-time distributed computing platforms such as Storm. Just as Hadoop represents the storage and processing of offline data, Twitter Storm represents the computing of streaming data. Yahoo! S4, Spark from the University of California, Berkeley, and Facebook's Puma are other popular real-time computing frameworks. But these frameworks focus only on real-time computing; they do not provide data source access services. Users must not only deploy these frameworks but also learn the corresponding programming models, and this high learning cost reduces efficiency. This paper designs and implements a distributed, scalable platform architecture for online data, together with a system that provides data storage and statistics services. Users can focus on their own business logic rather than on low-level programming, which makes developing online data tasks more convenient. The system is built on the Hadoop platform and uses Storm as its real-time computing framework, which provides the external environment for executing online tasks. 
The system uses HBase, a key-value database, as its main storage, so that it can continue to provide stable service under high concurrency. In addition, the system provides users with a unified set of communication rules. Users can customize their business processing logic based on these rules, which greatly improves productivity. |