Research on Fault Detection and Root Cause Localization Techniques for AIOps

Posted on: 2022-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Cai
GTID: 2568307169483424
Subject: Cyberspace security
Abstract/Summary:
Modern IT systems are growing ever larger and are gradually evolving from traditional static systems into dynamic, hybrid ones. Traditional manual operation and maintenance is increasingly inadequate, which has given rise to the application of artificial intelligence to IT operation and maintenance, known as AIOps. AIOps collects data through various IT operation and maintenance tools and equipment and, after analysis, discovers and handles problems automatically. Meanwhile, the microservice architecture divides a single application into a group of service nodes that run in independent processes and communicate over lightweight protocols, which improves the abstraction and modularity of the system. However, the large number of service nodes, the dynamic call chains, and the propagation of faults between services also bring challenges for operation and maintenance. How to quickly and accurately detect faults in a microservice system and localize their root causes has become a research hotspot. Based on the concepts and methods of AIOps, this study proposes an intelligent fault detection method, Trace VAE, and an intelligent root cause localization method, Model Coder, for microservice systems, and combines them into an integrated fault detection and localization method, Trace Model. For fault detection, to address the high complexity of the system, we define a deployment graph and a service dependency graph to represent the system formally. To address the problem that the detection of response-time anomalies is easily disturbed by the call chain, we extract traces from the service dependency graph based on the maximum branch principle, divide requests into request classes according to their call links, and detect anomalies within each request class, thereby eliminating the influence of the call chain.
To address the problem that some faults have a low anomaly amplitude and are easily misjudged, a variational autoencoder is used to map request response-time data to a reconstruction probability, which widens the gap between normal and abnormal data and improves the accuracy of anomaly judgment. For root cause localization, to handle the highly dynamic nature of microservice systems, we locate the root cause on an abnormal service dependency graph generated in real time. To handle the various forms that a fault root node can take, we define explicit and implicit nodes to improve the expressiveness of fault features. To handle the multi-node nature of faults, we summarize node characteristics with node feature groups and analyze node features at the multi-node level. To handle the diversity of faults, we use a coding method to represent node features formally, enabling formal comparison and analysis of fault features. Experiments on a real-world microservice system monitoring data set show that the average root cause localization time of the proposed method is 110 s and that the root cause localization accuracy of Model Coder is 0.93. Trace VAE improves fault detection accuracy, which further raises the root cause localization accuracy of Trace Model to 0.97, 12% higher than the state-of-the-art method.
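The idea of detecting response-time anomalies within a request class, rather than across all requests, can be illustrated with a small sketch. This is not the thesis's implementation: the trace layout (`path`, `response_ms`), the function names, and the threshold are all hypothetical, and a plain z-score stands in for the variational autoencoder's reconstruction probability, which the abstract does not specify in detail.

```python
from collections import defaultdict
from statistics import mean, stdev


def request_class(trace):
    """Use the ordered tuple of called services as the request-class key."""
    return tuple(trace["path"])


def fit_baselines(normal_traces):
    """Estimate mean and standard deviation of response time per request class."""
    by_class = defaultdict(list)
    for t in normal_traces:
        by_class[request_class(t)].append(t["response_ms"])
    # stdev needs at least two samples, so skip classes with fewer.
    return {k: (mean(v), stdev(v)) for k, v in by_class.items() if len(v) >= 2}


def is_anomalous(trace, baselines, threshold=3.0):
    """Flag a trace whose response time deviates strongly from its own class.

    Assumption: a z-score is used here as a simple stand-in for the
    reconstruction-probability score described in the abstract.
    """
    key = request_class(trace)
    if key not in baselines:
        return False  # unseen class: no baseline to compare against
    mu, sigma = baselines[key]
    if sigma == 0:
        return trace["response_ms"] != mu
    return abs(trace["response_ms"] - mu) / sigma > threshold


# Hypothetical normal traffic: a short call chain and a long one.
SHORT = ["gateway", "auth"]
LONG = ["gateway", "order", "payment", "db"]
normal = (
    [{"path": SHORT, "response_ms": ms} for ms in (18, 20, 22, 19, 21)]
    + [{"path": LONG, "response_ms": ms} for ms in (190, 200, 210, 205, 195)]
)
baselines = fit_baselines(normal)

slow_short = {"path": SHORT, "response_ms": 60}
ok_long = {"path": LONG, "response_ms": 205}
print(is_anomalous(slow_short, baselines), is_anomalous(ok_long, baselines))
```

In this toy data the 60 ms request on the short chain is flagged even though it is faster than every normal request on the long chain, while the 205 ms request on the long chain is not. A single global threshold would tend to get both cases wrong, which is the call-chain interference that per-class detection is meant to remove.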
Keywords/Search Tags:AIOps, microservice, fault detection, root cause localization