| Operator big data contains all of the data exchanged by network users and has important applications in network service provision, network management, and network security. It comprises various data sets, such as deep packet inspection (DPI) data, call detail record (CDR) data, and user information data. DPI data is obtained by inspecting and analyzing traffic and message content at key points of the network using deep packet inspection technology. However, because of network congestion, unreliable protocols, and problems during data collection, DPI data inevitably suffers from data loss. At the same time, data integrity is critical in areas such as data mining and data applications. Inferring complete traffic data from partial traffic data is therefore increasingly important. Motivated by these practical needs, this thesis focuses on how to model DPI traffic data effectively as a tensor and how to recover the missing data with efficient and accurate algorithms. Traditional matrix completion is well studied and can recover missing data, but recent studies have shown that tensor-based completion algorithms recover missing values in traffic data more efficiently and accurately than matrix-based ones. Therefore, based on the characteristics of DPI traffic data and the requirements of practical application scenarios, this thesis models DPI traffic data with different tensor models and applies different tensor completion algorithms to fill in the missing entries. The main work and contributions are as follows. First, based on the high-order characteristics of DPI traffic data, the thesis models DPI data as a third-order tensor, referred to as the DPI tensor. In addition, according to the characteristics of the DPI tensor, the traditional low-rank tensor completion algorithm is improved so that it can more effectively
recover the missing values in the DPI tensor. The main idea of the improvement is to combine the low-rank tensor completion algorithm with the singular value thresholding (SVT) algorithm, extending matrix SVT to the tensor completion setting; the resulting method is called the TSVT algorithm. Experimental comparison shows that the TSVT algorithm recovers the missing values in the DPI tensor more effectively and accurately than the traditional low-rank tensor completion algorithm. Second, according to the characteristics of the operator data sets, the thesis applies joint data analysis to the DPI traffic data completion problem. In addition to DPI data, the operator data sets include user and web information data. By analyzing these data sets, the thesis uses a coupled tensor model to couple the heterogeneous information in multiple data sets and builds the DPI coupled tensor. The CMTF algorithm is then used to complete the missing values in the DPI coupled tensor, which improves the accuracy of data recovery. Applying the coupled tensor method to the joint analysis of different data sets improves data utilization and greatly improves completion performance. Finally, building on DPI coupled tensor completion, the thesis introduces the concept of dynamic user preference: a weight penalty is imposed on items whose user preference changes are relatively large, in order to optimize the performance of DPI tensor completion. In the experimental part, the thesis compares and analyzes the performance of different completion algorithms on DPI data with missing values and verifies the validity and accuracy of the algorithms using several evaluation metrics. |
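
The abstract does not spell out the TSVT update, so the following is only a minimal sketch of one common way to carry matrix singular value thresholding over to a third-order tensor: shrink the singular values of each mode unfolding, average the results, and re-impose the observed entries. The function names, the threshold `tau`, and the iteration count `n_iters` are illustrative assumptions and should not be read as the thesis's actual algorithm or settings.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize a tensor along `mode`; rows index that mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor with the given full shape."""
    moved_shape = (shape[mode],) + tuple(
        size for axis, size in enumerate(shape) if axis != mode
    )
    return np.moveaxis(matrix.reshape(moved_shape), 0, mode)


def svt(matrix, tau):
    """Singular value thresholding: soft-threshold singular values by tau."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


def complete_tensor(observed, mask, tau=1.0, n_iters=200):
    """Fill entries where `mask` is False (hypothetical helper, not TSVT itself).

    observed: array holding the known values (missing entries are ignored)
    mask:     boolean array of the same shape, True where a value is known
    """
    estimate = np.where(mask, observed, 0.0)
    for _ in range(n_iters):
        # Shrink every mode unfolding, fold back, and average the results.
        shrunk = [
            fold(svt(unfold(estimate, mode), tau), mode, estimate.shape)
            for mode in range(estimate.ndim)
        ]
        estimate = np.mean(shrunk, axis=0)
        # Keep the known entries fixed at their observed values.
        estimate[mask] = observed[mask]
    return estimate
```

For a DPI tensor, the three modes might correspond to something like users, websites, and time slots, with `mask` marking the entries that were actually collected; that mode assignment is also an assumption, since the abstract does not name the dimensions.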