Analysis And Research Of Disk Failures In Data Centers

Posted on: 2021-12-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y S Yi
Full Text: PDF
GTID: 1488306107455194
Subject: Computer architecture
Abstract/Summary:
With the development of big data technology, the demand for data storage in data centers has been growing exponentially. Disk drives have become the main storage device in data centers, owing to their high data density, large capacity, and high cost-performance ratio. However, the short lifetime of disks significantly affects the reliability of disk-based storage systems. Recent research shows that disks are among the highest-risk components in data centers. Disk failures not only affect the availability of applications but also impose high maintenance costs on data centers. To address these issues, this dissertation studies the disk failure problem in data centers from three aspects: failure analysis, disk failure prediction, and disk failure recovery.

In terms of failure analysis, this dissertation analyzes how disk workload affects disk failures. We build temporal and spatial features of the disk workload from disk utilization and disk bandwidth. Through an in-depth analysis of these features, we construct the access pattern of the disk workload to explain the influence of disk workload on disk failures. The proportion of sequential requests plays a decisive role in reducing the annual failure rate. Based on these findings, we propose scheduling advice on disk workload assignment to help data centers reduce disk failures.

In terms of disk failure prediction, this dissertation addresses the mislabelling problem in disk failure prediction data. By analyzing how prediction models are fitted in a production environment, this dissertation reveals the mislabelling problem in the training data. Combining the cut-edge weighting algorithm on the relative neighborhood graph with the characteristics of disk failure prediction data, we propose an algorithm that detects and corrects mislabelled samples for disk failure prediction in a production environment. Experiments on public datasets and popular disk failure prediction algorithms validate the improvement our algorithm brings to disk failure prediction.

In terms of disk failure recovery, this dissertation provides a recovery strategy that resolves the conflict between the limited resources available for disk failure recovery and the increasing frequency of disk failures. By analyzing the shortcomings of existing recovery strategies when facing multiple warnings, this dissertation reveals the urgent need for a real-time, fine-grained disk failure recovery strategy. We propose a disk failure recovery strategy built on an understanding of the order in which disks fail, the ranking algorithm LambdaMART, and a ranking metric designed around the features of disk warnings. Experiments verify that our failure recovery strategy performs better than the failure recovery strategies currently used in data centers.

In summary, this dissertation studies disk failures through analysis, prediction, and recovery. By studying disk failures in a production environment, this work can enhance the understanding of disk failures, reduce their impact on applications and services, and enable recovery from disk failures with limited resources in data centers.
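
The mislabel detection idea mentioned in the abstract can be illustrated with a small sketch. It is not the dissertation's algorithm: the function names, the edge weight 1 / (1 + distance), and the fixed threshold are all assumptions made for illustration. The sketch builds a relative neighborhood graph over the samples, treats an edge as a cut edge when its endpoints carry different labels, and flags samples whose weighted share of cut edges is high.

```python
import math
from itertools import combinations


def relative_neighborhood_graph(points):
    """Return (edges, dist): RNG edges as index pairs and the distance matrix.

    Points i and j are connected when no third point k is closer to
    both of them than they are to each other.
    """
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    edges = set()
    for i, j in combinations(range(n), 2):
        blocked = any(
            max(dist[i][k], dist[j][k]) < dist[i][j]
            for k in range(n)
            if k != i and k != j
        )
        if not blocked:
            edges.add((i, j))
    return edges, dist


def cut_edge_ratios(points, labels):
    """For each sample, the weighted fraction of its RNG edges that are cut edges."""
    edges, dist = relative_neighborhood_graph(points)
    n = len(points)
    total = [0.0] * n
    cut = [0.0] * n
    for i, j in edges:
        weight = 1.0 / (1.0 + dist[i][j])  # assumed weighting: closer neighbors count more
        total[i] += weight
        total[j] += weight
        if labels[i] != labels[j]:  # cut edge: endpoints disagree on the label
            cut[i] += weight
            cut[j] += weight
    return [c / t if t > 0 else 0.0 for c, t in zip(cut, total)]


def suspect_labels(points, labels, threshold=0.6):
    """Indices of samples whose cut-edge ratio exceeds the (assumed) threshold."""
    ratios = cut_edge_ratios(points, labels)
    return [i for i, r in enumerate(ratios) if r > threshold]


# Toy data: label 0 = healthy, 1 = failed. Sample 2 sits inside the
# healthy cluster but carries the failed label.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (10, 0), (11, 0), (12, 0)]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
suspects = suspect_labels(points, labels)
```

In this toy data only sample 2 is surrounded entirely by neighbors of the other class, so it is the one flagged. The brute-force graph construction is cubic in the number of samples, so a production-scale dataset would need a more efficient neighbor search.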
Keywords/Search Tags: Disk Failure, Root Cause Analysis, Disk Failure Prediction, Disk Failure Recovery, Disk Workload, Mislabelling Problem, Learning to Rank