Application Of CatBoost Model To Locate Root Causes Of Abnormal Faults

Posted on: 2021-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: F Y Liu
GTID: 2428330626461116
Subject: Applied statistics
Abstract/Summary:
As networks grow more complex in the era of big data, traditional operation and maintenance faces massive amounts of unstructured data and struggles with intelligent processing and monitoring visualization. Current network alarm data presents several pain points: a large number of work orders and difficulty in locating the cause of a fault. Based on historical network alarm data and fault-location work order data, this thesis uses machine learning methods to build a root cause analysis model for failures, together with an intelligent network data flow anomaly detection algorithm suited to specific application scenarios. The aim is to locate the cause of a failure quickly, thereby reducing the number of work orders, optimizing their dispatch, and improving both the efficiency of network operation and maintenance and the robustness of the entire operation and maintenance system. The thesis treats the location of abnormal root causes in a business application operation and maintenance system as a classification problem and conducts the research on text data. The main work of the experimental part is as follows:

1. In the feature mining stage, a text sequence mining framework based on time and space is proposed, where space can be understood as the base station ID in the experimental data. For text mining based on time series, TF-IDF vectorization is applied in five aspects and word2vec word vectors are computed over alarms; for text mining based on spatial sequence, word2vec word vectors are computed over base stations. The mining draws on both the data itself and data statistics to support the classification prediction.

2. word2vec is used as a word embedding to capture temporal and spatial sequence features, and TF-IDF and word2vec are applied according to the different sequence mining information. The combination, named tf-idf_word2vec, joins text information with local information, effectively addresses the limitations of the two algorithms, and was verified on the model.

3. In the modeling stage, using the time- and space-based sequence mining features, CatBoost, SVM, and RF are compared experimentally, and CatBoost performs better than the other two. Next, based on the tf-idf_word2vec feature change, the CatBoost model and the weighted CatBoost model are compared; the weighted CatBoost model achieves a higher AUC, which supports the feasibility of the proposed time- and space-based sequence mining features for building classification algorithms in real text-based operation and maintenance scenarios. Finally, the weighted CatBoost model obtained a better score on the custom evaluation index, verifying the experimental idea.
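The abstract does not spell out how tf-idf_word2vec combines the two representations. One common reading is to average the word vectors of a sequence using TF-IDF scores as weights, and the sketch below illustrates only that reading. The alarm names, the 2-dimensional vectors standing in for trained word2vec embeddings, and the helper names `tfidf_weights` and `weighted_embedding` are all hypothetical; each "document" is assumed to be the alarm sequence of one base station.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {token: tf-idf} dict per document, using a smoothed IDF."""
    n_docs = len(docs)
    # Document frequency: in how many sequences each token appears.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            tok: (cnt / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, cnt in counts.items()
        })
    return weights


def weighted_embedding(doc_weights, vectors, dim):
    """Average the word vectors of one document, weighted by TF-IDF."""
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, weight in doc_weights.items():
        vec = vectors.get(tok)
        if vec is None:  # token without an embedding is skipped
            continue
        for i in range(dim):
            total[i] += weight * vec[i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # all-zero vector when nothing could be embedded
    return [x / weight_sum for x in total]


# Hypothetical alarm sequences, one per base station (assumption).
docs = [
    ["link_down", "power_fail", "link_down"],
    ["link_down", "high_temp"],
    ["link_down", "power_fail", "high_temp"],
]

# Hypothetical 2-d vectors standing in for trained word2vec embeddings.
vectors = {
    "link_down": [1.0, 0.0],
    "power_fail": [0.0, 1.0],
    "high_temp": [0.5, 0.5],
}

weights = tfidf_weights(docs)
features = [weighted_embedding(w, vectors, dim=2) for w in weights]
```

Each row of `features` could then be passed to a classifier as the fixed-length representation of one sequence. Weighting by TF-IDF lets alarms that are rare across base stations contribute more to the averaged vector than alarms that appear everywhere.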
Keywords/Search Tags: Intelligent operation, Maintenance, Abnormal fault location, Time-space sequence mining, Tf-idf_word2vec, Text classification, CatBoost