| The twenty-first century is an era of rapid Internet development. The Internet infrastructure in our country is continually being improved and upgraded, and new categories of Internet services keep emerging. The resulting massive data carries commercial value, but the large number of Internet services also places a heavy burden on the network. Collecting accurate and effective information from network packets, and identifying problems under complex network conditions, can help operators improve the Internet environment. This thesis first describes the background and current status of the research, and then discusses the large-scale acquisition network, the Hadoop distributed big data processing platform, and its related components, which provide the platform and technical support for the later research. The work is divided into two parts. The first concerns acquisition accuracy: it studies the accurate identification of HTTP requests and responses in the data acquisition system, and the time synchronization problem in a large-scale acquisition system. The second concerns the analysis of server and network performance: the thesis proposes a new convergence download rate calculation method and analyzes the network and servers from several aspects, such as special HTTP fields and delay. This analysis can help operators understand server performance and network status. Based on a real collection network and the massive network data produced by the project, and drawing on an understanding of the HTTP and NTP protocols, we find ways to improve the accuracy of data acquisition in a large-scale acquisition network. The server rate calculation method is obtained through big data analysis, and the multi-dimensional analysis of server and network transmission performance has important implications for enhancing network performance. |