| With the rapid development of information technology, represented by the Internet, the mobile Internet, and the Internet of Things, the volume of data is growing explosively, and big data processing technology is used more and more widely. Distributed graph computing has important practical applications in social networking, e-commerce, recommendation systems, and other fields. Because of its good reliability and scalability, Hadoop has become the core component of the big data ecosystem. However, Hadoop performs poorly on workloads that require many iterations, such as graph computing and machine learning, and cannot meet the demands of massive graph data processing. Graph computing frameworks based on the BSP model make up for this inadequacy. Apache Hama is a pure BSP parallel computing framework. Because it has been under development for only a short time, its graph preprocessing still needs improvement: it cannot handle graph data that is large in volume and complex in structure, and its data split mechanism is flawed. The purpose of this thesis is to propose a new graph preprocessing technology based on the BSP model. It makes up for the shortcomings of Hama's preprocessing, improves the data split technology, improves cluster resource utilization, and allows Hama graph preprocessing to serve different application scenarios. The main work and contributions of this thesis are as follows. First, the thesis analyzes the mainstream distributed graph computing frameworks and their development trends. It analyzes the Hadoop and Hama frameworks in depth and presents the architecture and working principles of HDFS and MapReduce, which these frameworks use. 
Second, the thesis compares graph preprocessing in Hama and in the Hadoop framework in depth, and introduces the big-vertex processing technology and the data split technology used in BSP-based graph preprocessing. It analyzes in detail the differences between the two frameworks' graph preprocessing and the shortcomings of each, and clearly presents the disadvantages of the data split technology in both. Third, based on these results, the thesis proposes a new graph preprocessing technology for graph computing based on the BSP model and describes its design in detail. The new technology is implemented in the Hama framework; it improves Hama's data split technology and uses big-vertex processing to balance the graph. Finally, the thesis reports stability, functional, and performance comparison experiments on the new graph preprocessing technology in Hama. The experimental results show that the new BSP-based graph preprocessing technology effectively solves the problems of Hama's existing graph preprocessing. It achieves the expected effect, improves the utilization of cluster resources, and offers flexibility for different application scenarios. |