Fault diagnosis and anomaly detection are two central problems in operations and maintenance (O&M). With the digital transformation of industries such as telecommunications, finance, and transportation, the growing complexity of information systems makes it challenging to discover anomalies promptly and identify faults quickly. Traditional model-driven fault diagnosis and anomaly detection methods have achieved fruitful results in industrial systems, but they adapt poorly to complex heterogeneous networks. Data-driven methods have become the mainstream approach in recent years, yet they do not analyze time series and network topology jointly, and alarm data and indicator data in O&M differ significantly. This study therefore investigates fault diagnosis and anomaly detection from the perspectives of alarm data and indicator data, along both the time and topology dimensions, in order to mine fault diagnosis rules, learn the patterns of anomalies, and propose methods that are both feasible and performant. It proposes a fault diagnosis method based on clustering and graph data mining and an anomaly detection method based on a probability-constrained autoencoder.

(1) Fault diagnosis on alarm data first merges streaming alarms and then detects and locates faults from the merged alarm set. The industry currently relies mainly on rule-based methods, which lack self-learning and iterative updating. Because alarm merging is hard to cast as a supervised learning task, this paper uses density-based clustering to merge alarms, reducing the manual O&M burden, and uses frequent subgraph mining to discover and identify fault patterns and to automate dispatching and maintenance. The method has an architecture with self-learning and iterative updating capabilities and has been deployed in an automatic dispatching system, where it reduces operational pressure and supports timely detection and handling of faults. Specifically, because alarms in a merged set are spatially and temporally correlated, a mapping method for vectorizing alarm text is proposed so that related alarms lie close together in the vector space, and unsupervised clustering then completes the merging. Because the nodes on which alarms occur are topologically related in the network, a pattern mining method over the topology of alarms and their corresponding nodes is proposed, which discovers fault patterns by frequent subgraph mining.

(2) Because normal samples far outnumber abnormal ones, and abnormal samples that are absent from the training set often appear at test time, most indicator-based anomaly detection methods rely on unsupervised reconstruction: the latent patterns of normal samples are learned by reducing the reconstruction loss. However, this approach cannot guarantee that the model learns features that discriminate between normal and abnormal samples. In addition, abnormality is defined with respect to a probability distribution: an abnormal point is one that deviates from the original distribution. A pure neural network cannot capture global statistical information, so probability information about the features should be added to quantify how far a sample deviates. The proposed probability-constrained autoencoder learns the correlations among the learned distributions rather than the raw value of each feature. A new data preprocessing method is designed to obtain statistical information efficiently while preserving most of the physical meaning of the data. It handles both continuous and discrete data, providing unified access to various data types. A loss function is defined to construct a constrained latent vector: an added deviation loss makes the latent vector converge to a prior distribution, which is then used to calculate each sample's deviation from that prior. This design better handles sample imbalance, retains the model's generalization ability, and learns features that distinguish normal from abnormal samples.
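
As a rough illustration of the alarm-merging step in (1), the sketch below groups vectorized alarms with a minimal density-based clustering routine. Several things here are assumptions rather than details from the paper: DBSCAN is used as one common density-based algorithm (the abstract does not name a specific one), each alarm is reduced to a hypothetical two-dimensional vector (minutes since the first alarm, a numeric position in the topology), and the values of `eps` and `min_pts` are arbitrary. The paper's actual vectorization maps alarm text into a vector space and is not reproduced here.

```python
from math import dist  # Euclidean distance, Python 3.8+


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one cluster label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


# Hypothetical alarm vectors: (minutes since first alarm, topology position).
alarms = [
    (0.0, 1.0), (0.5, 1.0), (1.0, 2.0),     # burst near one part of the network
    (30.0, 7.0), (30.5, 8.0), (31.0, 8.0),  # later burst elsewhere
    (60.0, 4.0),                            # isolated alarm
]

labels = dbscan(alarms, eps=1.5, min_pts=2)

# Group alarms by cluster label to form the merged alarm sets.
merged = {}
for alarm, label in zip(alarms, labels):
    merged.setdefault(label, []).append(alarm)

print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Each non-negative label corresponds to one merged alarm set that a later stage could analyze for fault patterns, while the label `-1` collects alarms with no dense neighborhood. No labeled training data is needed, which fits the abstract's point that alarm merging is hard to frame as a supervised task.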