With the rapid development of new-generation information technologies such as 5G, cloud computing, and big data, the volume of global data has grown explosively. Because hard disks are the mainstream data storage device, their reliability has become one of the research hotspots in the storage field. Academic work has applied statistical learning and machine learning methods to hard disk condition data to predict hard disk failures, with some success. However, dataset attribute selection, non-uniform dimensions, and the imbalance between positive and negative samples constrain these methods. As a result, their prediction performance remains unsatisfactory and they are difficult to apply in practical scenarios. In addition, existing work mainly predicts whether a hard disk will fail, which does not accurately reflect its health status. To address these problems, the hard disk condition dataset is first analyzed, and the remaining useful life (RUL) of each hard disk is calibrated as the label for subsequent RUL prediction. The threshold method and the correlation coefficient method are then combined to screen attributes, preserving the model's prediction performance while reducing its complexity. Spline interpolation and normalization are applied to the data to further improve prediction performance. The data is balanced by combining random undersampling with an improved oversampling method to raise the recall of the prediction model. On this basis, a hard disk failure prediction model and a hard disk RUL prediction model based on ensemble learning are proposed. Experimental results on the public Backblaze dataset show that the failure prediction model achieves an average accuracy of 97.93% and an average recall of 95.99%. These are on average 0.78% and 2.84% higher than those of the best benchmark model. The RUL prediction model achieves an average mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²) of 0.386, 0.913, and 0.9914, respectively, improving on the best benchmark model by 0.018, 0.301, and 0.0011. A hard disk health status assessment method that integrates failure prediction and RUL prediction is designed to reflect how a disk's condition changes before failure. Finally, a hard disk health status assessment system for data centers is implemented. It provides automatic data collection and storage, hard disk failure prediction, and RUL prediction. Test results show that the system can assess the health status of tens of thousands of hard disks within 3 s, which meets practical requirements.
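The RUL calibration step can be sketched as follows. The abstract does not give the calibration rule, so this is a minimal sketch under stated assumptions. Each daily record is assumed to carry Backblaze-style fields `serial_number`, `date`, and `failure`, where `failure` is 1 on the day the disk is reported failed. RUL is taken as the number of days until that failure record. Records of disks that never fail are left unlabeled. The function name `label_rul` is hypothetical.

```python
from datetime import date


def label_rul(records):
    """Attach a remaining-useful-life label (in days) to daily disk records.

    Assumed record layout (Backblaze-style, an assumption of this sketch):
    dicts with 'serial_number', 'date' (datetime.date) and 'failure'
    (1 on the reported failure day). Only disks that eventually fail
    receive a label; all other records are dropped.
    """
    # First pass: remember the failure day of every disk that failed.
    failure_day = {}
    for r in records:
        if r["failure"] == 1:
            failure_day[r["serial_number"]] = r["date"]

    # Second pass: RUL = days between the record and the disk's failure day.
    labeled = []
    for r in records:
        fd = failure_day.get(r["serial_number"])
        if fd is None or r["date"] > fd:
            continue  # healthy disk, or a record after the failure
        labeled.append({**r, "rul_days": (fd - r["date"]).days})
    return labeled


if __name__ == "__main__":
    demo = [
        {"serial_number": "A", "date": date(2023, 1, 1), "failure": 0},
        {"serial_number": "A", "date": date(2023, 1, 2), "failure": 0},
        {"serial_number": "A", "date": date(2023, 1, 3), "failure": 1},
        {"serial_number": "B", "date": date(2023, 1, 1), "failure": 0},
    ]
    for row in label_rul(demo):
        print(row["serial_number"], row["date"], row["rul_days"])
```

Counting in whole days matches the one-row-per-disk-per-day layout of the public Backblaze data. A finer or capped RUL scale would change only the last line of the second pass.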
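The attribute screening that combines a threshold method with a correlation coefficient method might look like the sketch below. The abstract names neither the threshold criterion nor the coefficient. This sketch assumes the threshold method discards attributes whose missing-value ratio exceeds `max_missing`. It also assumes the correlation method keeps attributes whose absolute Pearson correlation with the RUL label reaches `min_abs_corr`. Both default values and all function names are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen_attributes(columns, target, max_missing=0.5, min_abs_corr=0.1):
    """Keep attributes that pass both filters (threshold values are illustrative).

    columns: dict mapping attribute name -> list of floats, None = missing.
    target:  list of label values (e.g. RUL in days), same length.
    """
    kept = []
    for name, values in columns.items():
        # Threshold filter: drop attributes with too many missing values.
        missing_ratio = sum(v is None for v in values) / len(values)
        if missing_ratio > max_missing:
            continue
        # Correlation filter, computed on rows where the attribute is present.
        pairs = [(v, t) for v, t in zip(values, target) if v is not None]
        if len(pairs) < 2:
            continue
        xs, ys = zip(*pairs)
        if abs(pearson(xs, ys)) >= min_abs_corr:
            kept.append(name)
    return kept
```

The cheap missing-ratio filter runs first, so the correlation is computed only for attributes that survive it. This ordering is one way to keep screening cost low on wide datasets.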
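The preprocessing step pairs spline interpolation with normalization. The sketch below assumes a natural cubic spline fitted per attribute over the day index to fill missing daily values, followed by min-max scaling to [0, 1]. The abstract specifies neither the spline type nor the normalization formula, and the helper names are hypothetical.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a callable evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing with at least two points. Outside
    [xs[0], xs[-1]] the end polynomials are extrapolated.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for quadratic coefficients c (c[j] = S''(x_j) / 2);
    # natural end conditions give c[0] = c[n] = 0.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 / h[i] * (ys[i + 1] - ys[i]) - 3 / h[i - 1] * (ys[i] - ys[i - 1])
    diag, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    # Back substitution for c, then the linear (b) and cubic (d) coefficients.
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        # Pick the interval [xs[j], xs[j+1]] containing x (clamped at the ends).
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


def fill_gaps(series):
    """Fill None entries of a daily series by spline interpolation over the day index."""
    known = [(float(i), v) for i, v in enumerate(series) if v is not None]
    if len(known) < 2:
        raise ValueError("need at least two observed values")
    xs, ys = zip(*known)
    spline = natural_cubic_spline(list(xs), list(ys))
    return [v if v is not None else spline(float(i)) for i, v in enumerate(series)]


def min_max_normalize(values):
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    filled = fill_gaps([1.0, 3.0, None, 7.0, 9.0])
    print(filled, min_max_normalize(filled))
```

Gaps before the first or after the last observation are extrapolated rather than interpolated. A real pipeline might prefer to drop such leading or trailing gaps.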
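The data balancing step combines random undersampling with oversampling. The abstract does not describe its "improved oversampling" method, so this sketch substitutes plain SMOTE-style interpolation in its place. Each synthetic failure sample lies on the segment between a minority sample and one of its `k` nearest minority neighbours. The majority class is randomly undersampled to the same `target_size`. The name `balance` and its parameters are illustrative.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def balance(majority, minority, target_size, k=3, seed=0):
    """Random undersampling of the majority class plus SMOTE-style
    oversampling of the minority class, both towards target_size samples.

    majority, minority: lists of feature vectors (lists of floats).
    A minority class larger than target_size is returned unchanged.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    # Random undersampling without replacement.
    if len(majority) > target_size:
        under = rng.sample(majority, target_size)
    else:
        under = list(majority)
    # SMOTE-style oversampling: interpolate between a random minority sample
    # and one of its k nearest minority neighbours.
    over = [list(m) for m in minority]
    while len(over) < target_size:
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: squared_distance(base, m))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        over.append([a + gap * (b - a) for a, b in zip(base, partner)])
    return under, over
```

Interpolating rather than duplicating minority rows exposes the classifier to new, plausible failure-side points. This is the usual argument for SMOTE-style methods when the goal is higher recall.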
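The abstract states only that both prediction models are "based on ensemble learning" and does not name the algorithm. As a deliberately minimal stand-in, the sketch below implements bagging of decision stumps for the binary failure label (0 = healthy, 1 = failed). Each stump is trained on a bootstrap resample, and predictions are combined by majority vote. The function names and `n_estimators` default are illustrative and are not the paper's model.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-level decision tree (stump) that minimises training errors."""
    if len(set(y)) == 1:
        only = y[0]
        return lambda row: only  # single-class bootstrap sample
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                # polarity 1 predicts class 1 when row[f] > thr; -1 when row[f] < thr.
                errors = sum(
                    (1 if polarity * (row[f] - thr) > 0 else 0) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, thr, polarity)
    _, f, thr, polarity = best
    return lambda row: 1 if polarity * (row[f] - thr) > 0 else 0


def fit_bagging(X, y, n_estimators=15, seed=0):
    """Bagging: train each stump on a bootstrap resample, predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        votes = Counter(m(row) for m in models)
        return votes.most_common(1)[0][0]

    return predict
```

An odd `n_estimators` avoids ties in the binary vote. The RUL model would follow the same pattern with regression trees and averaged outputs instead of votes.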
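The abstract reports accuracy and recall for the failure model, and MAE, RMSE, and R² for the RUL model, without defining them. The standard definitions are written out below for reference. Lower MAE and RMSE and higher R² indicate a better RUL model, which is the sense in which the reported differences are improvements over the benchmark. The function names are illustrative.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of actual positives (failed disks) that the model flags."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

Recall is the metric the balancing step targets. A missed failing disk (false negative) is usually costlier in a data center than a false alarm.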
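The abstract does not say how failure prediction and RUL prediction are integrated into a health status. The following is a hypothetical rule-based mapping for illustration only. A disk is considered at risk only when the failure model flags it, and the RUL estimate then grades how urgent the risk is. The level names and the thresholds `prob_threshold`, `critical_days`, and `warning_days` are invented for this sketch.

```python
def assess_health(failure_prob, rul_days, prob_threshold=0.5,
                  critical_days=7, warning_days=30):
    """Map a failure-prediction score and an RUL estimate to a health level.

    All level names and thresholds are illustrative assumptions.
    """
    if failure_prob < prob_threshold:
        return "healthy"      # failure model does not flag the disk
    if rul_days <= critical_days:
        return "critical"     # flagged and expected to fail very soon
    if rul_days <= warning_days:
        return "warning"      # flagged, failure expected within weeks
    return "attention"        # flagged, but estimated RUL is still long


def assess_fleet(predictions):
    """predictions: dict mapping serial_number -> (failure_prob, rul_days)."""
    return {sn: assess_health(p, r) for sn, (p, r) in predictions.items()}
```

Because each disk is assessed independently in a single pass, the cost grows linearly with fleet size. The expensive part of a real system is producing the model predictions, not this final mapping.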