| With the rapid growth of the IoT, cloud computing, and related industries, the volume of data stored in data centers is growing explosively, and the reliability of data center storage systems faces serious challenges; disk failures are the main source of data center failures. A disk failure in a data center not only causes data loss but also forces computing tasks that use data stored on the failed disk to be rerun, which can significantly reduce the efficiency of long-running cloud computing tasks. If disk failures can be predicted accurately, data loss and data storage costs can be greatly reduced, and the efficiency of long-running cloud computing tasks can be significantly improved. Hard disk failure prediction has therefore become an active research topic in both academia and industry and is of great importance for improving data center reliability. Although there have been some studies on disk failure prediction, most of them can only handle datasets in which positive and negative samples are relatively balanced, and they struggle with the extreme imbalance between positive (failed) and negative (healthy) samples found in disk data from production environments. Moreover, existing studies do not account for SMART attributes whose patterns change continuously over time, so the accuracy of trained disk failure prediction models tends to degrade gradually. To address these problems, and driven by the practical requirements of disk failure prediction, this paper designs and implements a disk failure prediction method for imbalanced data using the open-source dataset of the PAKDD intelligent operation and maintenance competition. The method uses a serial ensemble learning structure that combines LightGBM and CNN-LSTM models, and under extremely imbalanced data its prediction accuracy is higher than that of the other methods compared. In addition, because the patterns in disk SMART features keep changing over time, this paper also designs and implements a disk failure prediction model design and generation system. Besides importing disk data and constructing features with one click, the system supports visual design of the CNN-LSTM network structure, helping users continuously optimize their models. Tests show that the disk failure prediction model design and generation system meets users' needs for designing and generating models. |
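The abstract does not say exactly how the LightGBM and CNN-LSTM models are chained. One common reading of a "serial" ensemble for imbalanced data is a cascade: a fast first-stage classifier screens every disk, and a slower second-stage model re-examines only the disks it flags. The sketch illustrates that reading only. `SerialCascade`, both thresholds, and the toy scorers are hypothetical stand-ins for trained LightGBM and CNN-LSTM models, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# One disk's SMART history: daily feature vectors, oldest first (hypothetical layout).
SmartHistory = List[List[float]]


@dataclass
class SerialCascade:
    """Hypothetical two-stage serial ensemble for disk failure prediction.

    Stage 1 (a stand-in for the LightGBM model) scores every disk from its
    latest-day features; only disks it flags are passed to stage 2 (a stand-in
    for the CNN-LSTM model), which scores the whole SMART time window.
    """

    stage1: Callable[[List[float]], float]   # latest-day features -> failure score
    stage2: Callable[[SmartHistory], float]  # full SMART window -> failure score
    threshold1: float = 0.3  # assumed: kept low so stage 1 rarely drops a true failure
    threshold2: float = 0.5  # assumed final decision threshold

    def predict(self, fleet: Dict[str, SmartHistory]) -> Dict[str, bool]:
        predictions: Dict[str, bool] = {}
        for disk_id, history in fleet.items():
            if self.stage1(history[-1]) < self.threshold1:
                # Screened out cheaply: predicted healthy without running stage 2.
                predictions[disk_id] = False
            else:
                # Suspicious disk: the sequence model makes the final call.
                predictions[disk_id] = self.stage2(history) >= self.threshold2
        return predictions


# Toy stand-in scorers built on a single hypothetical SMART attribute
# (e.g. a reallocated-sector count); real stages would be trained models.
def toy_stage1(latest: List[float]) -> float:
    return min(1.0, latest[0] / 100.0)


def toy_stage2(window: SmartHistory) -> float:
    # Flag disks whose attribute grew by more than 20 over the window.
    return 1.0 if window[-1][0] - window[0][0] > 20 else 0.0


fleet = {
    "disk-a": [[0.0], [0.0], [1.0]],     # low latest value: dropped by stage 1
    "disk-b": [[10.0], [40.0], [80.0]],  # high and rising: flagged by stage 2
    "disk-c": [[50.0], [50.0], [50.0]],  # high but flat: cleared by stage 2
}
predictions = SerialCascade(toy_stage1, toy_stage2).predict(fleet)
print(predictions)  # {'disk-a': False, 'disk-b': True, 'disk-c': False}
```

Under this reading, a low first-stage threshold favours recall on the rare failed disks, while the more expensive sequence model only has to process the much smaller candidate set. Another plausible reading, in which LightGBM outputs are fed to the CNN-LSTM as extra input features, would need a different structure.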