| China now has a very large-scale Internet. The wide range of applications running on it has brought great convenience to people and has profoundly changed the way they study, live, and work. Many everyday activities now depend on a variety of applications, which send their data as network messages encapsulated in packets. How to mine specific behavior information about network users and extract their features from this mass of message data has therefore become a valuable research topic. This thesis aims to develop a system that can analyze and mine massive message data under big data conditions. The system provides four functions: collecting and storing large amounts of message data, data preprocessing and formatting, association mining analysis, and data visualization. The system is therefore expected to play a useful role in network construction. The source data is provided by network carriers, and the packets are parsed in the access module. Storage relies on the Hadoop Distributed File System (HDFS). The data preprocessing module is responsible for further parsing and cleaning, and for converting the messages into the designed format. The packet mining module performs association rule mining on the massive formatted packet data in HDFS, and the resulting associations are displayed through the ECharts plugin. This thesis also tests an improved Apriori algorithm along three dimensions: cluster size, data volume, and minimum support. The results show that the improvement significantly increases the efficiency of mining massive message data, and that processing efficiency on large data volumes improves significantly as the cluster size increases. |