| In complex and changing cloud environments, keeping deployed services and applications online 24/7 requires closely monitoring multiple metric time series (such as CPU utilization and request response latency) of entities (hosts, containers, applications, etc.) to ensure the quality and reliability of services. In recent years, many studies have applied deep learning algorithms to time series anomaly detection, but most of them operate at the metric level. Because labeled data for model training are difficult to obtain, supervised learning algorithms are hard to apply, while unsupervised algorithms either require a large amount of normal data for training or have low accuracy, so neither can meet the needs of large-scale time series anomaly detection in the cloud environment. To address these problems, this thesis studies anomaly detection for time series operational data in the cloud environment, so that anomalies can be detected in a timely manner and their causes located accurately. Firstly, this thesis proposes LR-Semi VAE, an entity-level, LSTM-based semi-supervised variational autoencoder (VAE) anomaly detection algorithm for multivariate time series. LR-Semi VAE trains on a small amount of labeled data together with a large amount of unlabeled data, uses a VAE to learn the complex distribution of multivariate time series, uses LSTM (Long Short-Term Memory) to model the temporal dependence in the data, and feeds the label predicted by the classifier to the VAE to reconstruct the input series. With an improved ELBO loss, LR-Semi VAE focuses on normal patterns and ignores abnormal patterns during training, and uses the reconstruction probability score to detect anomalies. On a third-party dataset, the anomaly detection performance of LR-Semi VAE is about 30% better than that of the semi-supervised learning algorithm VAE M2 and about 50% better than that of the unsupervised learning algorithm LSTM-VAE. Secondly, for the anomaly detection of 
service-oriented architectures and microservice applications, this thesis further proposes the RT-Semi VAE algorithm based on LR-Semi VAE. RT-Semi VAE uses LSTM to capture the short-term dependence of multivariate time series, uses the multi-head attention mechanism of the Transformer to learn the long-term dependence, and introduces parallel computing to speed up model training. RT-Semi VAE also traces the root cause entity along the service invocation chain and attributes the anomaly to specific metrics, so that operations staff can find the location of the anomaly in time and take measures to repair it. Experiments show that the anomaly detection performance of RT-Semi VAE is about 37% and 55% better than that of VAE M2 and LSTM-VAE, respectively. Finally, the LR-Semi VAE algorithm for monolithic applications and the RT-Semi VAE algorithm for service-dependent applications are integrated, and a time series anomaly detection prototype for intelligent operation and maintenance services is designed, covering time series data monitoring and collection, data storage, integration of the anomaly detection algorithms, anomaly localization, and data display. A comparison with existing detection schemes on three common anomaly types (packet loss, memory leaks, and CPU hog) shows that the prototype can detect anomalies quickly and accurately and locate their causes in a fine-grained manner, giving it an advantage in ensuring the quality and stability of services. |
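
The root cause tracing step described for RT-Semi VAE can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `trace_root_causes` and `rank_metrics`, the call-graph representation, and the rule "an anomalous service with no anomalous callee is a root cause candidate" are all assumptions made for illustration. The per-metric scores are assumed to be anomaly scores where higher means more anomalous (for example, a negated reconstruction log-probability).

```python
from typing import Dict, List, Set


def trace_root_causes(
    calls: Dict[str, List[str]], anomalous: Set[str], entry: str
) -> List[str]:
    """Follow the invocation chain from `entry` through anomalous services only.

    Assumed rule (hypothetical): a service is a root cause candidate if it is
    anomalous and none of the services it calls is anomalous.
    """
    roots: List[str] = []
    seen: Set[str] = set()
    stack = [entry]
    while stack:
        svc = stack.pop()
        if svc in seen or svc not in anomalous:
            continue  # skip visited services and healthy services
        seen.add(svc)
        bad_callees = [c for c in calls.get(svc, []) if c in anomalous]
        if bad_callees:
            stack.extend(bad_callees)  # the anomaly may originate downstream
        else:
            roots.append(svc)  # no anomalous callee: candidate root cause
    return sorted(roots)


def rank_metrics(metric_scores: Dict[str, float], top_k: int = 2) -> List[str]:
    """Return the `top_k` metric names with the highest anomaly score."""
    ranked = sorted(metric_scores, key=lambda m: metric_scores[m], reverse=True)
    return ranked[:top_k]


# Hypothetical call graph: gateway -> orders -> db, gateway -> users
calls = {"gateway": ["orders", "users"], "orders": ["db"], "users": [], "db": []}
anomalous = {"gateway", "orders", "db"}

root_entities = trace_root_causes(calls, anomalous, "gateway")
top_metrics = rank_metrics({"cpu": 0.4, "mem": 3.1, "latency": 1.7})
```

In this toy graph the anomaly seen at `gateway` is traced through `orders` down to `db`, and the metrics of that entity are then ranked by score to point at the likely cause.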