| Because of their inherent complexity and large scale, container cloud storage systems are subject to a variety of failure scenarios that cause application services to fail. Faults in container cloud storage are related to how applications are running. Container images mask the complexity of the environment configuration and installation steps that different environments require during application deployment, which allows multiple applications to share the kernel processes and kernel resources of the system. As a result, the performance of an application is affected by the other applications that coexist on the same physical host. This mutual interference among applications is prone to cause grey faults. A grey fault is a kind of failure that easily leads to service failure in a container cloud storage system: it can be observed by the application's fault detection mechanism but is often ignored by the system's fault detection mechanism. Studying the relationship between grey faults and application interference scenarios helps an application-centric container cloud storage system detect and predict faults, reducing both the probability of fault occurrence and system maintenance costs. First, by analyzing the characteristics of the application performance changes caused by application performance interference, a relationship model, RMAIG (the Relational Model based on the Relationship between Application Interference Situation Context and Grey Fault), is established to describe the relationship between application performance interference scenarios and grey faults from three perspectives: time, space, and the logical relationship between applications. A grey fault detection strategy based on the RMAIG model is then proposed. The strategy uses a mixed model of LSTM (Long Short-Term Memory) and BLSTM (Bidirectional Long Short-Term Memory) to train the RMAIG model. It learns the forward and backward dependencies in the data to improve the self-learning of the strategy and detects grey faults accurately through contextual comparison. Second, building on grey fault detection, a grey fault prediction strategy based on situation modeling, GFP-SG (Grey Fault Prediction Strategy based on Situation Graph), is proposed. Based on an analysis of the logical and spatiotemporal relationships between application performance interference scenarios and grey faults, the strategy classifies grey faults according to their context. It builds situation snapshots from changes in the application context and establishes a situation graph that relates changes in the application's running context to grey faults. It then uses an adaptive isomorphic context mining algorithm to quickly compare the current application context with the snapshots in the situation graph, track changes in the fault area, and predict upcoming grey faults. Finally, Grey-Warn (Container Cluster Grey Fault Early Warning System), a prototype grey fault monitoring and early warning system for container storage clusters, was designed. Grey-Warn integrates the RMAIG-based grey fault detection strategy and the GFP-SG strategy, and it was tested and verified in a Docker-based container storage cluster environment. The results show that, as the scale of running applications increases, the proposed method achieves higher grey fault prediction accuracy than existing fault prediction strategies and effectively reduces the prediction cost. |
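
The defining property of a grey fault stated above, visible to the application's fault detection but missed by the system's, can be illustrated with a minimal Python sketch. This is not the RMAIG or GFP-SG method; the `Observation` record, its field names, and the helper functions are hypothetical and serve only to make the definition concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One sampling interval for one application (hypothetical record layout)."""
    app: str
    app_reports_fault: bool     # the application's own fault detection fired
    system_reports_fault: bool  # the system's fault detection fired


def is_grey_fault(obs: Observation) -> bool:
    # Grey fault: the application observes the fault, the system does not.
    return obs.app_reports_fault and not obs.system_reports_fault


def find_grey_faults(observations):
    # Return the names of applications whose observation matches the definition.
    return [o.app for o in observations if is_grey_fault(o)]


# Illustrative, made-up samples: only "db" is seen by the application alone.
samples = [
    Observation("db", app_reports_fault=True, system_reports_fault=False),
    Observation("cache", app_reports_fault=True, system_reports_fault=True),
    Observation("web", app_reports_fault=False, system_reports_fault=False),
]

print(find_grey_faults(samples))
```

A real detector would have to infer the application-side signal from performance data under interference, which is the role the abstract assigns to the RMAIG model; the sketch assumes both signals are already available as booleans.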