| In recent years, the credibility of software systems has attracted great attention, for two reasons. On the one hand, the level of informatization in critical national sectors such as aviation, aerospace, finance, electric power, and national defense keeps rising, and the operation of national infrastructure increasingly depends on large-scale distributed software systems. In these sectors, software failures cause huge losses to society and users. On the other hand, a large-scale distributed software system is a complex, living system whose credibility cannot be settled once and for all before its "birth"; it must be monitored and adjusted online while the software is running. Software operation is therefore becoming increasingly important. However, as software scale grows, data volume expands, and service types diversify, manual software operation has become inadequate. With the rapid development and wide application of artificial intelligence, the concept of Artificial Intelligence for IT Operations (AIOps) has emerged: applying artificial intelligence to software operation. By using intelligent methods to assist human decision-making, AIOps has achieved strong results in many big-data operation scenarios. Anomaly detection is an important AIOps technique. An anomaly detection algorithm analyzes the various kinds of data generated while the software runs, finds anomalies, and thereby enables targeted adjustment and troubleshooting. At present, using anomaly detection algorithms for software operation faces two serious problems: (1) On large-scale data, the commonly used deep learning (DL) based and rule-based anomaly detection algorithms struggle to achieve both detection efficiency and detection accuracy: rule-based algorithms have high efficiency but low
accuracy, while DL-based algorithms have high accuracy but low efficiency. This limits the application of such algorithms in real systems; (2) Unlike in traditional deep learning settings, an anomaly detection model needs to be updated frequently, because the software continues to be updated and adjusted after it is put into use. The model must therefore adapt to new data, but currently proposed anomaly detection algorithms provide no corresponding model update strategy, so they fail once the software is updated. To address these practical problems, we propose corresponding solutions. The specific contributions of this paper are as follows: (1) To address the problem that DL-based and rule-based anomaly detection algorithms cannot achieve detection efficiency and detection accuracy at the same time, we propose an efficient anomaly detection (EAD) algorithm based on "rule and learning". By combining a DL-based algorithm with a rule-based algorithm, EAD maintains accuracy while greatly improving detection efficiency and reducing the use of computing resources, which makes it more practical than traditional algorithms. (2) To address the problem that current anomaly detection models cannot be updated stably, we propose an efficient model updating algorithm for anomaly detection. The algorithm updates the model by maintaining a fixed-size "data pool" and applying transfer learning. Specifically, we store information about the data in the "data pool" and fine-tune the model on the pool by transfer learning. When the input data changes, the algorithm can update the model at the lowest cost. The algorithm prevents the updated model from "over-forgetting", so that it retains high accuracy on both the original data and the new data. We carry out
comparative experiments on both real and synthetic data sets. The experimental results show that EAD significantly improves detection efficiency while maintaining accuracy: compared with traditional methods, EAD saves about 80% to 95% of execution time. The efficient model updating algorithm extends the detection ability of the model at the lowest cost. A model updated with the algorithm retains high accuracy on both the original data and the new data, and compared with directly expanding the data set, the algorithm saves 50% of the training time. Overall, our work meets practical needs and further improves the practicality of anomaly detection algorithms in real-world intelligent software operation. |
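
The two ideas summarized above can be illustrated with a minimal Python sketch. It is not the paper's implementation, since the abstract does not say how rules and learning are combined or how the data pool is maintained. The sketch assumes that a cheap rule decides the clear cases and passes only undecided records to a costly model, and that the fixed-size pool is kept by reservoir sampling. The names `cascade_detect`, `DataPool`, and `fine_tune_set` are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence


def cascade_detect(
    records: Sequence[str],
    rule: Callable[[str], Optional[bool]],
    model: Callable[[str], bool],
) -> List[bool]:
    """Label each record as anomalous (True) or normal (False).

    Assumption: `rule` is cheap and returns True/False when it is
    confident, or None when it cannot decide. Only undecided records
    are passed to the expensive `model`.
    """
    labels = []
    for record in records:
        verdict = rule(record)
        if verdict is None:
            verdict = model(record)
        labels.append(verdict)
    return labels


class DataPool:
    """Fixed-size sample of past data (assumption: reservoir sampling)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[str] = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add(self, item: str) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            return
        # Keep the new item with probability capacity / seen,
        # overwriting a uniformly chosen slot.
        slot = self._rng.randrange(self.seen)
        if slot < self.capacity:
            self.items[slot] = item

    def fine_tune_set(self, new_data: Sequence[str]) -> List[str]:
        """Pooled old samples plus the new data, for fine-tuning."""
        return list(self.items) + list(new_data)
```

In this sketch the saving in execution time comes from how many records the rule can settle on its own, and the pool bounds the amount of old data revisited at each update, so fine-tuning cost does not grow with the full history.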