| With the commercialization of technologies such as electronic commerce, business intelligence, big data analysis, and face recognition, demand for computing, storage, and network resources has grown sharply, and these resources must be adjusted in real time as business needs change. Cloud computing offers elastic scaling and on-demand allocation, a business model that benefits both service providers and users, which has made it an active research topic in computer science. Cloud computing infrastructure typically runs a large number of servers, so the stable operation of those servers underpins the stability of the services provided by the upper layers of the cloud. This paper collects server state information through interfaces provided by the operating system running on each server. User-defined thresholds on state values, together with historical server state data, are used to decide whether a server is running in an abnormal state, thereby realizing automatic anomaly perception, anomaly notification, and anomaly tracing for servers in a cloud computing center. Given the large number of servers in a cloud computing center, the requirement for real-time anomaly detection, and the high dimensionality of the collected attributes, we designed and developed a framework of seven modules to collect, transfer, store, analyze, trace, and present the data. We use supervised learning to detect known (explicit) abnormal states and unsupervised learning to detect unknown anomalies. Finally, the software developed in this paper is deployed on servers, server attribute data is collected, and the anomaly detection algorithms are tested on it. The research work of this thesis includes the following aspects: (1) Requirements analysis and detailed design were carried out for an anomaly detection system for servers in a cloud computing platform center, and open-source software and frameworks were used to implement 
the system. The implemented functions include asynchronous non-blocking data communication, fast storage and querying of data based on a column-oriented database, message middleware to improve system stability, and real-time data visualization based on the WebSocket protocol. (2) After a study of anomaly detection algorithms, the supervised support vector machine (SVM) algorithm is used to detect known server anomalies, and the unsupervised One-Class SVM and Isolation Forest algorithms are used to detect unknown server anomalies. In addition, a sliding-window method is proposed that evaluates multiple consecutive samples together, reducing detection errors (false alarms) caused by slight instability of a server. (3) The system was deployed and tested: server status indicators and logs were collected, and experiments were carried out with the anomaly detection algorithms and the sliding-window method for reducing the false alarm probability. The anomaly notification function was also verified, and the anomaly tracing function, which locates log information by time point, was tested. |
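
The sliding-window idea in aspect (2) can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function name `smooth_alarms`, the window length, the `min_hits` voting rule, the CPU samples, and the threshold value are all hypothetical choices made for illustration. A simple user-defined threshold stands in for the per-sample detector; the output of an SVM or Isolation Forest model could be passed in the same way.

```python
from collections import deque
from typing import Deque, Iterable, List


def smooth_alarms(flags: Iterable[bool], window: int = 5, min_hits: int = 3) -> List[bool]:
    """Raise an alarm only when at least `min_hits` of the most recent
    `window` per-sample flags are abnormal (assumed voting rule)."""
    recent: Deque[bool] = deque(maxlen=window)  # oldest flag drops out automatically
    alarms: List[bool] = []
    for flag in flags:
        recent.append(flag)
        alarms.append(sum(recent) >= min_hits)
    return alarms


# Hypothetical CPU utilisation samples (percent) and a user-defined threshold.
cpu_samples = [35, 40, 97, 38, 42, 96, 98, 99, 97, 41]
CPU_THRESHOLD = 90

# Per-sample decision: abnormal when the value exceeds the threshold.
raw_flags = [value > CPU_THRESHOLD for value in cpu_samples]

# Window-level decision: the isolated spike at index 2 does not raise an alarm,
# while the sustained run of high values later in the series does.
alarms = smooth_alarms(raw_flags, window=5, min_hits=3)
```

With these assumed parameters, a single out-of-range sample is suppressed and only a sustained run of abnormal samples triggers a notification. A larger `window` or `min_hits` would lower the false alarm rate further at the cost of slower detection.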