| Network traffic analysis is one of the most important network management technologies. It covers the analysis of network host addresses, network interconnection, network applications, and network user behavior, and it matters greatly for network resource allocation and quality of service. In this paper, we study several problems based on the network address labeling system. The main results are summarized as follows: 1. We propose and implement a communication information management method based on a double hash table data structure. Receiving and processing high-speed network traffic in real time is a major challenge in core network flow analysis. This paper describes the design of the QSO log format, which retains communication information and filters out packet payload data, greatly reducing the cost of processing traffic data. The QSO log preprocessing node manages communication information records with the double hash table structure and uses multi-core, multi-threaded parallel computing. It ultimately achieves standalone real-time processing of 6 million QSO log records, and the compression ratio of the data before and after the preprocessing node reaches 95.7%. 2. Network address attribute calibration is based on address brightness and on the communication activity of the currently active addresses at the other end. An incremental, time-ordered calibration model accumulates its results onto the attribute data set so that the data better fit the actual associations between network addresses. We propose and implement address attribute calibration based on the MapReduce computation model. Using the distributed parallel computing capability of Hadoop, the system can aggregate and analyze 28 GB of data every half hour, and in two days it completed the calibration and storage of role-type attributes for a total of 2.8 billion active network addresses. 3.
We study common models for address business attributes. The UNIBS dataset is adapted to match the data format used by the system in this paper and serves as a reference dataset for testing the calibration accuracy and diversity of six kinds of machine-learning base classifiers. On this basis, we propose an address business classification algorithm based on a confidence-weighted combination. The results show that, on the UNIBS 32-class application classification task, the proposed algorithm improves overall classification accuracy by 40.57% over ZeroR and by 1.8% over J48, the best base classifier in the sample space. Compared with the base classifiers, the combination algorithm raises business confidence by up to 31.85%, and by 2.59% on average. |
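
The abstract does not give the combination rule itself, so the following is only a minimal sketch of one common way to combine base classifiers by weighted confidence. It is not the algorithm evaluated in this paper. The function name `combine_predictions`, the weights, the confidence values, the class labels, and the `NaiveBayes` entry are all hypothetical; only the classifier names J48 and ZeroR come from the text above.

```python
from collections import defaultdict


def combine_predictions(predictions, weights):
    """Combine per-classifier predictions for a single network address.

    predictions: dict mapping classifier name -> (label, confidence in [0, 1])
    weights:     dict mapping classifier name -> non-negative weight
                 (hypothetical; e.g. derived from validation accuracy)
    Returns (winning label, combined confidence normalized by total weight).
    """
    scores = defaultdict(float)
    total_weight = 0.0
    for name, (label, confidence) in predictions.items():
        w = weights.get(name, 0.0)       # classifiers without a weight are ignored
        scores[label] += w * confidence  # each vote counts as weight * confidence
        total_weight += w
    if total_weight == 0.0:
        raise ValueError("at least one classifier needs a positive weight")
    best_label = max(scores, key=scores.get)
    return best_label, scores[best_label] / total_weight


# Illustrative values only; they are not results from the paper.
example_predictions = {
    "J48": ("http", 0.9),
    "NaiveBayes": ("dns", 0.6),
    "ZeroR": ("http", 0.3),
}
example_weights = {"J48": 0.5, "NaiveBayes": 0.3, "ZeroR": 0.2}

label, confidence = combine_predictions(example_predictions, example_weights)
print(label, round(confidence, 2))
```

In this sketch a label's score is the sum of weight times confidence over the classifiers that voted for it, so a single confident, highly weighted classifier can outvote several weak ones.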