
The Design And Implementation Of Hard Disk Failure Prediction System Based On Azure Platform

Posted on: 2021-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: X L Zhang
Full Text: PDF
GTID: 2568306500972729
Subject: Engineering
Abstract/Summary:
Hard disks are the main storage devices of computers. With the development of computer hardware, software, and Internet technology, the amount of data that people generate is growing exponentially, and data security cannot be ignored. For both individuals and business users, data loss is unacceptable. At present, to ensure data integrity, the industry keeps multiple data copies for master-slave switching and uses erasure codes for data recovery. These measures have achieved certain results in ensuring data security, but they also have shortcomings such as high cost and slow response, so there is room for improvement. At the same time, the use of statistical and machine learning methods to predict hard disk failure has made great progress, and prediction models that can be used in industry have emerged. Against this background, a real-time prediction system for hard disk failure has come into being.

This thesis describes the requirement analysis, design, and implementation of the real-time hard disk failure prediction system. The system spans multiple machines and several kinds of platforms and is divided into four subsystems: the data collection subsystem, the prediction subsystem, the prediction result processing subsystem, and the prediction result analysis subsystem. The data collection subsystem uses NoSQL databases to store massive amounts of hard disk related data. The prediction subsystem uses Spark-based Azure Databricks for data processing and prediction. It also uses NoSQL databases to store the large volume of prediction results and feature data, as well as the small number of failing disks that are queried frequently. The prediction result processing subsystem connects to the original hard disk failure processing module. The prediction result analysis subsystem also uses Azure Databricks for data analysis and uses Power BI to generate result analysis reports.

The system is now complete and is in the online trial stage. After a period of efficiency evaluation, it was confirmed that the system completes a single round of failure prediction across a million hard disks within 40 minutes, and that the hard disk failure response time was shortened by more than 4 hours. The work also explored a method of combining machine learning models with actual business, which lays the foundation for deploying other models.
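The flow of data through the four subsystems can be illustrated with a minimal in-memory sketch. It is not the thesis implementation: the plain dictionaries stand in for the NoSQL stores, `toy_score` is a hypothetical placeholder for the trained model, and the feature name `reallocated_sectors`, the 0.5 threshold, and all class and method names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class DiskSample:
    disk_id: str
    features: dict  # hypothetical feature names, e.g. {"reallocated_sectors": 12}


def toy_score(features):
    """Placeholder for the real model: maps one assumed feature to a 0..1 score."""
    return min(1.0, features.get("reallocated_sectors", 0) / 100)


@dataclass
class Pipeline:
    raw_store: dict = field(default_factory=dict)      # stand-in for the raw-data NoSQL store
    result_store: dict = field(default_factory=dict)   # all prediction results and features
    failing_store: dict = field(default_factory=dict)  # small, frequently queried set
    tickets: list = field(default_factory=list)        # hand-off to existing failure handling

    def collect(self, samples):
        """Data collection subsystem: persist incoming disk data."""
        for sample in samples:
            self.raw_store[sample.disk_id] = sample

    def predict(self, threshold=0.5):
        """Prediction subsystem: score every disk and record likely failures."""
        for disk_id, sample in self.raw_store.items():
            score = toy_score(sample.features)
            self.result_store[disk_id] = {"score": score, "features": sample.features}
            if score >= threshold:
                self.failing_store[disk_id] = score

    def process_results(self):
        """Result processing subsystem: pass failing disks to the handling module."""
        for disk_id in sorted(self.failing_store):
            self.tickets.append(disk_id)

    def analyze(self):
        """Result analysis subsystem: summarize one prediction run."""
        total = len(self.result_store)
        failing = len(self.failing_store)
        ratio = failing / total if total else 0.0
        return {"total": total, "predicted_failing": failing, "failing_ratio": ratio}


def run_demo():
    pipeline = Pipeline()
    pipeline.collect([
        DiskSample("d1", {"reallocated_sectors": 0}),
        DiskSample("d2", {"reallocated_sectors": 80}),
        DiskSample("d3", {"reallocated_sectors": 10}),
    ])
    pipeline.predict()
    pipeline.process_results()
    return pipeline, pipeline.analyze()
```

Keeping the failing disks in their own small store, separate from the full result set, mirrors the split the abstract describes between bulk results and the few records that are queried often.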
Keywords/Search Tags:Hard Disk Failure Prediction, Azure, Spark, NoSQL