System logs record the execution paths and critical states of a computer system in real time, giving administrators effective support for debugging and maintaining the system. In the field of system security, system log anomaly detection is an important technique for ensuring reliable system operation and enabling rapid fault diagnosis. Existing anomaly detection methods require labeled data and focus on either sequential patterns or semantic information in logs, which tends to produce high missed-detection and false-alarm rates. It is therefore especially important to introduce high-performance anomaly detection methods into intelligent operations and maintenance. In addition, large-scale distributed software systems exhibit many kinds of faults: user files are difficult to obtain during debugging, system monitoring capabilities are limited, and fault-tolerance mechanisms make troubleshooting harder. Accurately diagnosing the root causes of faults has thus become key to guaranteeing high system availability and reliability. To address these two problems, this thesis presents an anomaly detection method based on log semantic analysis and a fault root cause diagnosis method, and studies system security from two main aspects: construction of theoretical models and experimental validation. The main contributions are as follows.

(1) A log anomaly detection method based on system behavior analysis and global semantic awareness, named LogBASA, is proposed. System log sequences not only contain a large amount of semantic information but also record the execution paths and timestamps of system tasks; this key information helps improve the reliability and effectiveness of anomaly detection. First, a system log knowledge graph (SLKG) is constructed from unstructured, multi-level system logs. Second, a self-attention encoder-decoder Transformer model for log spatio-temporal association analysis is proposed; it fuses the spatio-temporal features and semantic mappings of log sequences to analyze system behavior and log semantics along multiple dimensions. On this basis, a training method is proposed that combines an adaptive spatial boundary delineation objective with a sequence reconstruction objective: special tokens characterize the semantic state of each log sequence, and LogBASA is trained without supervision to automatically delineate the boundary of abnormal behavior and to reconstruct log sequences. Finally, comprehensive experiments on three real-world datasets show that LogBASA achieves accuracies of 99.3%, 95.1%, and 97.2%, respectively, at least 3% higher than related models such as DeepLog, LogAnomaly, and LogCluster, demonstrating its effectiveness and superiority.

(2) A knowledge graph-based system fault root cause diagnosis method is proposed. It uses the SLKG to represent the relationships and dependencies among abnormal logs in structured form, embeds log templates semantically with natural language processing techniques, reduces the dimensionality of and visualizes the semantic feature vectors with the t-SNE algorithm, and classifies faults into categories with the K-means algorithm. The log text is then analyzed and annotated according to the classification results. On this basis, the thesis designs a text similarity matching module and an entity recognition matching module to achieve efficient root cause diagnosis of system faults. Experiments on the real-world HDFS dataset show that the proposed method outperforms related models based on logistic regression and a multilayer perceptron, improving accuracy by 7% and 2%, respectively, and reaching a fault root cause diagnosis accuracy of 99.1%.

In summary, this thesis builds a system log knowledge base through a knowledge graph to achieve high-performance system anomaly detection, and designs a combined coarse- and fine-grained fault root cause diagnosis model for unknown abnormal system logs to accurately identify abnormal system logs and diagnose the root causes of faults.
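The unsupervised LogBASA objective pairs a boundary term with a reconstruction term. The sketch below is a minimal illustration, assuming a Deep SVDD-style hypersphere around the embeddings of a special summary token, a mean reconstruction error, a weighting factor `alpha`, and a quantile-based radius as the "adaptive" boundary. The function names, `alpha`, and the quantile rule are hypothetical, not the thesis's published formulation. Embeddings are plain Python lists so that the sketch runs without a deep learning framework.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Hypersphere center: mean of the special-token embeddings of normal training sequences."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_loss(embeddings: Sequence[Vector], recon_errors: Sequence[float],
                  center: Vector, alpha: float = 0.1) -> float:
    """Assumed objective: mean reconstruction error + alpha * mean squared distance to the center."""
    boundary = sum(distance(e, center) ** 2 for e in embeddings) / len(embeddings)
    reconstruction = sum(recon_errors) / len(recon_errors)
    return reconstruction + alpha * boundary


def adaptive_radius(embeddings: Sequence[Vector], center: Vector, quantile: float = 0.95) -> float:
    """Hypothetical adaptive boundary: radius enclosing the given quantile of training distances."""
    dists = sorted(distance(e, center) for e in embeddings)
    index = max(0, math.ceil(quantile * len(dists)) - 1)
    return dists[index]


def is_anomalous(embedding: Vector, recon_error: float, center: Vector,
                 radius: float, recon_threshold: float) -> bool:
    """Flag a sequence that falls outside the hypersphere or is poorly reconstructed."""
    return distance(embedding, center) > radius or recon_error > recon_threshold
```

Flagging a sequence when *either* criterion fires reflects the idea that the boundary term captures semantic drift of the whole sequence while the reconstruction term captures locally implausible log events; how the thesis actually combines the two at inference time is not stated in the abstract.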
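The fault-category step of the root cause method clusters log-template embeddings with K-means. The following is a minimal pure-Python sketch of K-means (Lloyd's algorithm) with a deterministic farthest-first initialization, assuming the template embeddings have already been computed. The t-SNE projection that the thesis uses for dimensionality reduction and visualization is omitted here; scikit-learn's `sklearn.manifold.TSNE` is one off-the-shelf option for it. The function names, the initialization scheme, and the iteration cap are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points: Sequence[Vector], k: int) -> List[List[float]]:
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points: Sequence[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Cluster template embeddings into k fault categories; returns (label per point, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each embedding joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Farthest-first initialization is chosen here only to make the result deterministic; k-means++ with a fixed random seed is the more common choice in practice.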
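The text similarity matching module and the entity recognition matching module can be illustrated against a toy, annotated knowledge base. This is a minimal sketch, assuming bag-of-words cosine similarity for the coarse-grained text match and regular expressions for HDFS block IDs (`blk_...`) and IPv4 addresses for the fine-grained entity match. The `FaultEntry` layout, the `diagnose` function, the regexes, and the two-decimal rounding that lets entity overlap break near-ties are all hypothetical stand-ins for the thesis's NLP-based modules, and the root-cause labels in any knowledge base passed to it are illustrative only.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Set, Tuple

TOKEN_RE = re.compile(r"[a-z]+")
# Hypothetical entity patterns for HDFS-style logs: block IDs and IPv4 addresses.
ENTITY_RES = {
    "block": re.compile(r"blk_-?\d+"),
    "ip": re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),
}


@dataclass(frozen=True)
class FaultEntry:
    """One annotated knowledge-base entry (hypothetical layout)."""
    template: str
    entity_types: FrozenSet[str]
    root_cause: str


def bag_of_words(text: str) -> Counter:
    """Lower-case alphabetic tokens; numeric IDs are left to the entity matcher."""
    return Counter(TOKEN_RE.findall(text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract_entities(text: str) -> Set[Tuple[str, str]]:
    """Return (entity type, value) pairs found in a log line."""
    return {(kind, m) for kind, rx in ENTITY_RES.items() for m in rx.findall(text)}


def diagnose(log_line: str, knowledge_base: Sequence[FaultEntry]) -> str:
    """Coarse-grained text match first; entity-type overlap breaks near-ties."""
    query_vec = bag_of_words(log_line)
    query_types = {kind for kind, _ in extract_entities(log_line)}

    def score(entry: FaultEntry) -> Tuple[float, int]:
        text_sim = round(cosine(query_vec, bag_of_words(entry.template)), 2)
        return (text_sim, len(query_types & entry.entity_types))

    return max(knowledge_base, key=score).root_cause
```

Ranking by the tuple `(text similarity, entity overlap)` mirrors the coarse-then-fine ordering described in the abstract: entity evidence only decides between candidates whose text similarity is effectively equal.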