| The rise of large-scale clusters has made log information more complex. In a large-scale cluster, logs fall into many types, that is, they are multi-source heterogeneous logs. Heterogeneous logs within the same cluster greatly hinder the administrator's grasp of the cluster's overall running state and also have a significant impact on log anomaly detection in the cluster. Many log aggregation and anomaly detection methods exist. However, most log aggregation methods operate within a single system and do not fully consider issues such as the heterogeneity of logs, the number of log attributes, the continuity of log attributes, and the degree of association between log attributes. Many methods also address log anomaly detection within a single cluster, but most still suffer from low accuracy, low efficiency, and large memory overhead. To this end, this paper first studies and analyzes multi-source heterogeneous logs in a large-scale cluster environment. Building on research into association rule algorithms, it first proposes a dynamic frequent item set mining algorithm based on adjacent multi-tables to mine the correlations between logs. Aggregating highly correlated logs saves a large amount of memory and makes the resource utilization of the cluster easier to understand. Secondly, an improved DS evidence theory algorithm is proposed to detect anomalies in the aggregated logs, dividing them into abnormal logs and normal logs. Finally, to improve the management of the cluster's overall operation, this paper proposes a KNN classification algorithm based on improved K-modes clustering to classify the abnormal logs. The specific work is as follows: 1: To understand comprehensively and accurately the running status of the cluster's resource management platform and of the jobs submitted by external users, this paper analyzes the association between logs by changing the storage structure of the data set. A frequent item set mining algorithm based on multiple tables is proposed; it is used to analyze the relationships between logs and to aggregate associated logs. 2: To further improve cluster management, the aggregated logs must undergo anomaly detection and analysis. By studying the similarity calculation method for log attributes, this paper improves the weight calculation method for log attributes. An improved DS evidence theory is proposed to classify the aggregated logs and discard logs of unrecognizable type. 3: After abnormal logs are detected, the type of each log anomaly must be identified more precisely so that the administrator can resolve it in a targeted way. This paper proposes a KNN classification algorithm based on improved K-modes clustering to classify the abnormal logs. The method considers not only the frequency of a log in the whole data set but also the span of the log attributes across the whole data set. The results show that, compared with classification methods of the same kind, this method achieves some improvement in classification accuracy and execution time. |
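
To make the DS evidence theory step more concrete, the following is a minimal sketch of the classic Dempster rule of combination, which the improved algorithm mentioned above presumably builds on. It is not the paper's improved variant: the attribute-weighting scheme is not reproduced here, and the function name `combine`, the two evidence sources `m_cpu` and `m_mem`, and all mass values are hypothetical, chosen only for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Fuse two mass functions with the classic Dempster rule.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Disjoint sets contribute to the conflict mass K.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("evidence sources are in total conflict")
    # Normalize by 1 - K so the fused masses sum to 1 again.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


NORMAL = frozenset({"normal"})
ABNORMAL = frozenset({"abnormal"})
EITHER = NORMAL | ABNORMAL  # mass left on "cannot tell"

# Hypothetical evidence from two log attributes of one aggregated log.
m_cpu = {ABNORMAL: 0.6, NORMAL: 0.1, EITHER: 0.3}
m_mem = {ABNORMAL: 0.5, NORMAL: 0.2, EITHER: 0.3}

fused = combine(m_cpu, m_mem)
label = max((NORMAL, ABNORMAL), key=lambda h: fused.get(h, 0.0))
print(sorted(label))
```

In this sketch, the mass remaining on `EITHER` after fusion represents undecided evidence; a detector could use a threshold on it to discard logs of unrecognizable type, although how the paper makes that decision is not stated in the abstract.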