Research On Node Fault Tolerance Selection And Backup Data Transmission In CEPH Distributed Storage System

Posted on: 2023-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Xia
GTID: 2568306836964529
Subject: Engineering
Abstract/Summary:
To meet the growing demand for massive data storage, the Ceph distributed storage system stores data at scale on large amounts of commodity hardware, but this also leads to frequent node failures. Research on disaster recovery for distributed storage systems, aimed at ensuring data security and reliability, has achieved fruitful results at home and abroad, yet some problems still need to be optimized. This thesis focuses on two of them. First, Ceph uses only storage capacity as the criterion for selecting primary and secondary OSDs, which results in a long data repair delay. Second, in a Ceph distributed storage system spanning multiple data centers, large data transfers occupy bandwidth for a long time. To address these problems, this thesis presents a fault-tolerant node selection method and a backup data transmission method for the Ceph distributed storage system. The specific research contents are as follows:

(1) A node selection method, FTNSC (Fault Tolerant Node Selection based on Ceph), based on multi-attribute decision-making and genetic algorithm is proposed. By jointly considering node load information and network state, it reduces the data repair delay after a node failure by maximizing node processing capacity and link bandwidth. A Ceph fault-tolerant node selection model is proposed, and the node load and network state information in the cluster is collected through SDN as the data basis for optimal node selection. The primary and secondary OSD nodes are selected in two steps. First, the CPU, I/O, memory, and chip performance of each node is considered, and a multi-attribute decision algorithm finds the best-performing node to host the primary OSD. Then the artificial bee colony algorithm selects the best secondary OSD according to network status and node performance. A simulation of the Ceph distributed file system in Mininet verifies the effectiveness of the proposed method. The experimental results show that FTNSC can reduce data recovery time when nodes fail.

(2) For Ceph distributed storage systems with multiple data centers, a data transmission path selection method, DTPSMT (Data Transmission Path Selection Method based on Topsis), based on the approximate ideal solution ranking method (TOPSIS) is proposed. The optimal transmission path for backup data is selected by jointly considering the maximum remaining bandwidth of the path, the number of hops, and network parameters. First, a backup data transmission model for multiple data centers is proposed, with the optimization goal of minimizing the transmission delay of backup data. Then an SDN network is used to measure the network state across the data centers in real time, including the maximum remaining bandwidth and hop count of each path, and iperf is used to generate simultaneous data flows among multiple data centers. Next, the Dijkstra algorithm is used to obtain the k shortest paths that satisfy the transmission bandwidth constraints and the node capacity constraints of the data centers. Finally, on this basis, the approximate ideal solution ranking method is used to obtain the optimal backup transmission paths for multiple application data centers. Experiments show that the DTPSMT algorithm achieves a short transmission delay without disrupting normal network services.
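
The ranking step described in (2) can be illustrated with a minimal Python sketch of the standard TOPSIS procedure. This is not the thesis's implementation: the function name `topsis_rank`, the path names, the bandwidth and hop values, and the weights are all hypothetical, and the sketch assumes the candidate paths have already been found and measured. Remaining bandwidth is treated as a benefit criterion (larger is better) and hop count as a cost criterion (smaller is better).

```python
import math


def topsis_rank(alternatives, weights, benefit):
    """Rank alternatives with standard TOPSIS.

    alternatives: dict mapping a name to a list of criterion values
    weights:      one weight per criterion (assumed to sum to 1)
    benefit:      one bool per criterion, True if larger values are better
    Returns a list of (name, closeness) pairs, best first.
    """
    names = list(alternatives)
    rows = [alternatives[name] for name in names]
    n_crit = len(weights)

    # Step 1: vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in rows)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] if norms[j] else 0.0 for j in range(n_crit)]
        for row in rows
    ]

    # Step 2: build the ideal and anti-ideal points criterion by criterion.
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]

    # Step 3: closeness = distance to anti-ideal / (sum of both distances).
    scores = []
    for name, row in zip(names, weighted):
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, anti)
        total = d_pos + d_neg
        scores.append((name, d_neg / total if total else 0.0))

    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical candidate paths: [remaining bandwidth in Mbit/s, hop count].
candidate_paths = {
    "path_a": [800.0, 4],
    "path_b": [500.0, 2],
    "path_c": [300.0, 5],
}

# Hypothetical weights: bandwidth matters somewhat more than hop count.
ranking = topsis_rank(candidate_paths, weights=[0.6, 0.4], benefit=[True, False])

for name, closeness in ranking:
    print(f"{name}: {closeness:.3f}")
```

The first entry of `ranking` is the path closest to the ideal point and farthest from the anti-ideal point. A path that is worst on every criterion receives a closeness of zero, so it is always ranked last.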
Keywords/Search Tags: Ceph distributed storage system, software defined network, node placement, data evacuation, artificial bee colony algorithm