Research On Key Technologies For Self-healing Scheduling In Distributed Systems

Posted on: 2010-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: X Lu
Full Text: PDF
GTID: 2178360272479363
Subject: Computer system architecture
Abstract/Summary:
Self-healing scheduling is critical to the dependability of computer systems and underpins high availability. Traditional failure recovery techniques depend heavily on redundancy and on administrators' domain knowledge. Because failure recovery is costly and difficult, self-healing capability has become an important topic in dependable computing research. This thesis therefore investigates several problems in this field. Our main contributions are summarized as follows:

First, to address the challenge of generating recovery policies in the presence of inaccurate failure detection, we present a failure recovery model for microrebootable distributed systems based on discounted Partially Observable Markov Decision Processes (POMDPs). Reasonable recovery policies are generated by solving the POMDP model. Because computing an exact solution is computationally expensive, a value function approximation called the fast informed bound is used to obtain near-optimal policies. In addition, we propose lower and upper bounds on the optimal value function; the maximum difference between the two bounds is used to estimate the error of the near-optimal value function. Simulation-based experiments on a realistic network security situation prediction system demonstrate that the proposed model can be solved effectively and that the resulting policies convincingly outperform the other policies.

Secondly, we present a task failure recovery model for distributed systems based on microreboot. Unlike other models, our model considers not only recovery time but also the reliability cost of recovery, which makes it more precise. We present a corresponding real-time task failure recovery algorithm based on extended Bayes analysis, which takes reliability cost into account when recovery time priorities are equal. To provide a theoretical foundation for failure recovery, we prove the recoverability of our algorithm.

Finally, we present a failure prediction method based on manifold learning. To extract failure features for prediction, we apply a nonlinear dimensionality reduction algorithm called supervised Hessian locally linear embedding. We then adopt a k-nearest-neighbors classifier for classification. The experimental results show that the manifold learning approach can effectively find the inherent features of failures, which makes failure prediction based on manifold learning feasible.
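As an illustration of the fast informed bound mentioned in the first contribution, the following is a minimal Python sketch of the standard update, alpha_a(s) = R(s,a) + gamma * sum_o max_a' sum_s' T(s'|s,a) O(o|s',a) alpha_a'(s'). It is not the thesis's implementation. The two-state model (healthy/faulty), the two actions (wait/microreboot), the noisy observations (ok/alarm) and all probabilities and costs are hypothetical values chosen only for the example.

```python
def fast_informed_bound(T, O, R, gamma, iters=200):
    """Iterate the fast informed bound (FIB) update.

    T[a][s][s2]: transition probability, O[a][s2][o]: observation
    probability, R[a][s]: immediate reward. Returns alpha[a][s], one
    vector per action, an upper bound on the optimal value function.
    """
    n_a, n_s, n_o = len(T), len(T[0]), len(O[0][0])
    alpha = [[0.0] * n_s for _ in range(n_a)]
    for _ in range(iters):
        new = [[0.0] * n_s for _ in range(n_a)]
        for a in range(n_a):
            for s in range(n_s):
                total = 0.0
                for o in range(n_o):
                    # Best next action is chosen per observation.
                    total += max(
                        sum(T[a][s][s2] * O[a][s2][o] * alpha[a2][s2]
                            for s2 in range(n_s))
                        for a2 in range(n_a))
                new[a][s] = R[a][s] + gamma * total
        alpha = new
    return alpha


def greedy_action(alpha, belief):
    """Pick the action whose alpha vector scores highest for a belief."""
    values = [sum(b * v for b, v in zip(belief, row)) for row in alpha]
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical toy model (all numbers are assumptions for illustration).
# States: 0 = healthy, 1 = faulty. Actions: 0 = wait, 1 = microreboot.
# Observations: 0 = ok, 1 = alarm (failure detection is inaccurate).
T = [
    [[0.9, 0.1], [0.0, 1.0]],   # wait: may fail; a fault persists
    [[1.0, 0.0], [0.9, 0.1]],   # microreboot: usually repairs a fault
]
O = [
    [[0.8, 0.2], [0.3, 0.7]],   # same noisy detector after either action
    [[0.8, 0.2], [0.3, 0.7]],
]
R = [
    [0.0, -10.0],               # wait: running while faulty is costly
    [-2.0, -4.0],               # microreboot: fixed recovery cost
]

alpha = fast_informed_bound(T, O, R, gamma=0.95)
action_when_healthy = greedy_action(alpha, [1.0, 0.0])
action_when_faulty = greedy_action(alpha, [0.0, 1.0])
```

In this toy model the greedy policy waits when the belief is certain the component is healthy and microreboots when it is certain the component is faulty; intermediate beliefs are resolved by comparing the two alpha vectors.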
Keywords/Search Tags: Self-healing Scheduling, POMDP, Failure Prediction, Manifold Learning, Task Scheduling