
Research On Machine Learning Algorithm Mining Real Time Log Stream In Cloud Operating System

Posted on: 2019-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2428330545481637
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, computer systems have become increasingly complex, and software and hardware interact ever more frequently. Cloud operating systems are increasingly popular, but their large scale and high complexity make them prone to many kinds of problems at run time. How to guarantee reliability has therefore become an important issue in system design and management. In the management of modern large-scale distributed systems, system logs have always been the primary source for checking system status. A running system produces many log records, and the console log is usually the main source of information that system administrators use for troubleshooting. As modern systems grow in scale and complexity, their components generate large amounts of log information, including run reports and error messages. To mine valuable knowledge and rules from system log data more effectively, researchers have proposed data mining methods that combine machine learning algorithms, statistics, and other techniques. Machine learning algorithms are the core of data mining and an important research tool across many disciplines.

Against this background, this thesis designs and implements an intelligent fault detection and localization system for cloud operating systems that applies machine learning algorithms, in particular clustering. Message templates and fault classification templates are extracted by analyzing the source code of the system framework together with the console log management mode. The pipeline begins with real-time preprocessing of the log stream, process modeling, pattern matching and statistical grouping, and extraction of the feature vector matrix; PCA is then used for anomaly detection; finally, the S-Kmeans clustering algorithm is used for fault classification. The system helps the system administrator understand the real-time status of the cloud operating system, divides logs into different fault types, and determines the root cause of a fault.

The proposed approach was tested on an Apache Hadoop cluster on a virtual cloud platform. The experimental results show that, with the proposed S-Kmeans clustering algorithm and principal component analysis, fault detection and localization accuracy can reach more than 98%.

The main innovations of this thesis are as follows:
1. A fault classification template is established, and machine learning algorithms are then used to detect and locate faults in real time, which greatly improves detection efficiency.
2. An S-Kmeans clustering algorithm is proposed for mining log data in real time. Principal component analysis is first used to extract the abnormal feature vectors and narrow the fault range; the S-Kmeans clustering algorithm is then used to locate the fault in real time. This greatly improves the efficiency of fault classification.
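The first stages of the pipeline described in the abstract (template matching, statistical grouping into a feature matrix, and PCA-based anomaly detection) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the three message templates, the log lines, and the helper names (`match_event`, `count_matrix`, `spe_scores`, `make_window`) are hypothetical, and anomaly scoring here uses the squared prediction error against the top principal component, which is one common way to apply PCA to event-count vectors. The S-Kmeans classification step is not sketched because the abstract does not define the algorithm.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical message templates (assumption); the thesis derives its
# templates from the framework source code.
TEMPLATES = {
    "block_received": re.compile(r"Received block \S+ of size \d+"),
    "block_deleted": re.compile(r"Deleting block \S+"),
    "write_exception": re.compile(r"Exception writing block \S+"),
}
EVENTS = list(TEMPLATES)


def match_event(line):
    """Return the name of the first template matching the line, or None."""
    for name, pattern in TEMPLATES.items():
        if pattern.search(line):
            return name
    return None


def count_matrix(windows):
    """Build one row of per-template event counts for each window of lines."""
    rows = []
    for lines in windows:
        counts = Counter(match_event(line) for line in lines)
        rows.append([counts[name] for name in EVENTS])
    return np.array(rows, dtype=float)


def spe_scores(X, n_components=1):
    """Squared prediction error of each row outside the principal subspace."""
    centered = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    P = vt[:n_components].T
    residual = centered - centered @ P @ P.T
    return (residual ** 2).sum(axis=1)


def make_window(n_blocks, n_errors=0):
    """Generate synthetic log lines for one time window (illustrative only)."""
    lines = []
    for i in range(n_blocks):
        lines.append(f"Received block blk_{i} of size 1024")
        lines.append(f"Deleting block blk_{i}")
    lines += [f"Exception writing block blk_{i}" for i in range(n_errors)]
    return lines


# Five normal windows of varying load, then one window with write exceptions.
windows = [make_window(k) for k in range(1, 6)] + [make_window(3, n_errors=4)]
X = count_matrix(windows)
scores = spe_scores(X)
print(int(np.argmax(scores)))  # index of the most anomalous window
```

In this toy data the normal windows differ only in how many blocks they handle, so the first principal component captures that variation, and the window containing write exceptions receives by far the largest residual score. A real deployment would need a principled threshold on the score rather than simply taking the maximum.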
Keywords/Search Tags:Cloud OS, Log, Mining, PCA, S-Kmeans