With the rapid development of Internet technology, massive and complex user data is generated on the network all the time, and human society has entered the era of big data. In this context, research on medical big data has become a focus of attention at home and abroad. As medical data grows exponentially, how to process this massive data and how to apply it to real medical applications pose great difficulties and challenges for the use of computer technology in the medical field. In response to the growing demand for medical data analysis, distributed file storage systems have emerged. A distributed file storage system can store massive data, is easy to scale out, and offers high fault tolerance, but it still faces bottlenecks that are difficult to break through in fast data recovery and efficient data reading. This thesis studies these problems, and the main work is as follows:

In terms of data recovery, fault recovery methods based on XOR erasure codes are often used, which rely only on XOR operations for encoding and decoding. However, tolerating failures is not enough: a storage system must also provide fast failure recovery to improve data recovery performance. This thesis proposes a fast replacement-based recovery method for single-node failures that combines the simulated annealing algorithm with the hill-climbing algorithm. We further extend the usage scenarios so that the new recovery mechanism can be adapted to storage systems whose nodes differ in performance (e.g., transmission bandwidth and computing power). We evaluated the replacement recovery mechanism on a distributed system architecture. The experimental results show that our recovery mechanism requires a shorter recovery time than the traditional mechanism and, to the greatest extent possible, avoids the replacement algorithm's search process for the optimal solution.

In terms of data reading, this thesis proposes a caching strategy designed specifically for erasure-coded storage systems. The goal is to find a good balance between the number of blocks cached and the overall latency improvement per object. The latency improvement depends on the client’s location relative to the server storing the desired content and on how frequently the cached data has recently been read. The experimental results show that the adaptive caching strategy achieves better system balance than traditional caching strategies and is better suited to the actual operation of massive distributed storage systems with frequent reads.

Finally, an extended file processing system for medical data is designed and implemented. The two innovations above are applied in this system, and a detailed design scheme and implementation process are given.
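
As a point of reference for the XOR-based erasure coding mentioned in this abstract, the following is a minimal sketch of single-failure recovery using one XOR parity block. It is an illustration only, not the recovery method proposed in this thesis: the function names (`xor_blocks`, `encode_parity`, `recover_block`) and the sample data are hypothetical, and a real XOR erasure code typically uses several parity blocks over different subsets of the data.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) walks the blocks column by column (one byte from each).
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode_parity(data_blocks):
    """Encoding: the parity block is the XOR of all data blocks."""
    return xor_blocks(data_blocks)


def recover_block(surviving_blocks, parity):
    """Decoding: XOR the surviving data blocks with the parity block
    to rebuild the single missing data block."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Hypothetical example: three data blocks, each stored on a different node.
data = [b"scan", b"labs", b"note"]
parity = encode_parity(data)

# Simulate the failure of the node holding data[1].
lost_index = 1
survivors = [b for i, b in enumerate(data) if i != lost_index]
recovered = recover_block(survivors, parity)
```

Because XOR is its own inverse, the same routine serves for both encoding and decoding. In this simple scheme the recovery cost is dominated by reading and transferring the surviving blocks, which is the cost that faster recovery methods aim to reduce.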