
Research On Metadata Management Technology For Distributed Storage System For Big Data

Posted on: 2020-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Wu
GTID: 2428330620953195
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of information technology, data has grown exponentially over the past decades, and distributed storage systems have developed rapidly as a result. At the same time, performance and security issues in storage systems have attracted much attention from academia and industry. The Hadoop Distributed File System (HDFS), an open-source implementation of the Google File System (GFS), adopts a master-slave architecture and has become the most popular distributed storage framework. However, in the current HDFS framework, a single NameNode can no longer meet growing data requirements. Although HDFS has introduced corresponding improvements, they do not consider load balancing among the NameNodes that act as metadata servers. In addition, the current HDFS architecture cannot cope with increasingly serious security issues such as data leakage. Mimic defense, an alternative defense approach, can effectively change the current cyberspace security landscape; it has been applied in various fields and verified in practice. Applying the idea of mimic defense to transform the current HDFS architecture therefore has important academic significance and practical value.
Based on the National 863 Program project "Strategic Research on Active Defense System of Cyberspace (continued)", this thesis studies metadata management technology for big-data-oriented distributed storage systems. To address the performance and security issues of the current HDFS architecture, we combine Nginx reverse proxy technology to transform the HDFS architecture, focusing on its metadata management component. The transformed architecture improves both the performance and the security of HDFS. The main innovations are as follows:
1. To address the lack of load balancing among metadata servers in the current HDFS architecture, a load balancing algorithm based on reinforcement learning is proposed. The architecture adds an Nginx reverse proxy between the clients and the metadata server nodes and uses hashing to quantify the initial heterogeneity of metadata among the executors. It then continuously measures the resource utilization and latency of the metadata servers and adjusts the metadata load with three policy modules: a policy selection network, a load balancing network, and a parameter update network. Experiments show that the algorithm dynamically adjusts the load according to the performance of each metadata server and adapts well to sudden changes in data volume.
2. To address the security problems of the current HDFS architecture, we apply the Dynamic Heterogeneous Redundancy (DHR) concept from mimic defense and use Nginx reverse proxy technology to adjust and improve the HDFS architecture, giving a concrete implementation and verifying that the improved architecture achieves high security. Scheduling is an important part of mimic defense. For this key step, we propose a random-seed scheduling algorithm based on executor heterogeneity, performance, and historical confidence, so that the heterogeneity and performance of the executors are taken into account during scheduling. Experimental results show that the proposed algorithm has a lower scheduling period than random scheduling but a better scheduling effect, and its scheduling results strike a good balance between dynamism and security.
3. Targeting the task characteristics of voting, a key step in a mimic defense system, we build on existing algorithms and introduce the heterogeneity between executors as a decision-making factor, so that the voting algorithm better suits the threat scenarios faced by the mimic architecture. Experimental results show that, compared with the existing consistency voting algorithm, the model significantly improves the security of the system and effectively suppresses the risk of common-mode escape.
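The first innovation routes metadata requests according to measured server utilization and latency. The abstract does not specify the three policy networks, so the Python sketch below swaps in a much simpler reinforcement-learning technique: an epsilon-greedy multi-armed bandit that learns which metadata server currently yields the best reward. The class name `EpsilonGreedyBalancer`, the reward formula, its equal weights, and the 50 ms latency budget are hypothetical assumptions, not the author's design.

```python
import random
from dataclasses import dataclass


@dataclass
class ServerStats:
    """Running reward estimate for one metadata server (hypothetical model)."""
    count: int = 0
    value: float = 1.0  # optimistic start (max reward) so each server is tried once


class EpsilonGreedyBalancer:
    """Epsilon-greedy bandit over metadata servers.

    A simplified stand-in for the thesis's three policy networks: it only
    learns which server currently gives the best utilization/latency reward.
    """

    def __init__(self, servers, epsilon=0.1, latency_budget_ms=50.0, seed=None):
        self.stats = {name: ServerStats() for name in servers}
        self.epsilon = epsilon
        self.latency_budget_ms = latency_budget_ms
        self.rng = random.Random(seed)

    def choose(self):
        # Explore a random server with probability epsilon, else exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.stats))
        return max(self.stats, key=lambda name: self.stats[name].value)

    def update(self, server, utilization, latency_ms):
        # Reward in [0, 1]: penalize utilization (0..1) and latency relative to an
        # assumed budget, with equal (hypothetical) weights.
        latency_term = min(latency_ms / self.latency_budget_ms, 1.0)
        reward = 1.0 - 0.5 * utilization - 0.5 * latency_term
        st = self.stats[server]
        st.count += 1
        st.value += (reward - st.value) / st.count  # incremental mean of rewards


if __name__ == "__main__":
    # Hypothetical NameNodes with fixed mean utilizations.
    true_util = {"nn1": 0.8, "nn2": 0.3, "nn3": 0.5}
    balancer = EpsilonGreedyBalancer(true_util, epsilon=0.1, seed=42)
    noise = random.Random(0)
    for _ in range(1000):
        s = balancer.choose()
        u = min(max(noise.gauss(true_util[s], 0.05), 0.0), 1.0)
        balancer.update(s, utilization=u, latency_ms=100 * u)
    print({name: st.count for name, st in balancer.stats.items()})
```

Because the incremental mean weighs all history equally, a constant step size (an exponential moving average) would likely react faster to the sudden changes in data volume the abstract mentions. In the architecture described above, the routing decision would presumably be applied at the Nginx reverse proxy layer rather than inside a client process.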
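The scheduling step in the second innovation can be illustrated with a seeded, weighted random draw. This sketch assumes each executor carries three normalized scores (heterogeneity, performance, historical confidence), combines them linearly with illustrative weights, and draws k distinct executors per round with probability proportional to the combined score. The `Executor` fields, the `priority` weights, and the use of a single per-executor heterogeneity score (rather than a pairwise measure) are assumptions; the abstract does not give the thesis's actual scoring.

```python
import random
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Executor:
    name: str
    heterogeneity: float  # 0..1, assumed per-executor diversity score
    performance: float    # 0..1, assumed normalized throughput
    confidence: float     # 0..1, assumed historical share of correct outputs


def priority(e, weights=(0.4, 0.3, 0.3)):
    """Linear score; the weights are illustrative, not taken from the thesis."""
    wh, wp, wc = weights
    return wh * e.heterogeneity + wp * e.performance + wc * e.confidence


def schedule(pool, k, seed, weights=(0.4, 0.3, 0.3)):
    """Draw k distinct executors, each draw proportional to its priority.

    In deployment the seed would come from a fresh unpredictable source each
    round; passing a fixed seed makes a run reproducible for testing.
    """
    if k > len(pool):
        raise ValueError("k exceeds the size of the executor pool")
    rng = random.Random(seed)
    remaining = list(pool)
    chosen = []
    for _ in range(k):
        scores = [priority(e, weights) for e in remaining]
        if sum(scores) > 0:
            idx = rng.choices(range(len(remaining)), weights=scores)[0]
        else:
            idx = rng.randrange(len(remaining))  # all scores zero: fall back to uniform
        chosen.append(remaining.pop(idx))  # without replacement
    return chosen


if __name__ == "__main__":
    pool = [
        Executor("a", 0.9, 0.8, 0.9),
        Executor("b", 0.5, 0.5, 0.5),
        Executor("c", 0.1, 0.2, 0.3),
        Executor("d", 0.7, 0.6, 0.8),
    ]
    print([e.name for e in schedule(pool, 3, seed=secrets.randbits(64))])
```

Compared with uniform random scheduling, the weighting keeps the selection unpredictable while favoring diverse, fast, and historically trustworthy executors. Note that `random.Random` is not cryptographically secure; a production scheduler would more plausibly draw from `secrets.SystemRandom`.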
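For the third innovation, one way to make heterogeneity a decision factor in voting is to discount agreement among similar executors, since executors that share an implementation tend to fail identically (common-mode escape). The sketch below is a hypothetical scheme, not the author's algorithm: agreeing executors are counted as "effective votes", where each additional member adds only its minimum pairwise heterogeneity to the members already counted. The heterogeneity matrix, the function names, and the tie rule are assumptions.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def effective_votes(members, het):
    """Count agreeing executors as independent votes, discounted by similarity.

    Members are taken in sorted order; each additional member contributes its
    minimum heterogeneity (0 = identical, 1 = fully different) to the members
    already counted, so n identical executors count as a single vote.
    """
    ordered = sorted(members)
    total = 1.0
    for i in range(1, len(ordered)):
        total += min(het[frozenset((ordered[i], ordered[j]))] for j in range(i))
    return total


def heterogeneity_vote(outputs, het):
    """Return (winning output, or None on a tie, and the support per output)."""
    groups = defaultdict(list)
    for executor, output in outputs.items():
        groups[output].append(executor)
    support = {out: effective_votes(m, het) for out, m in groups.items()}
    ranked = sorted(support.values(), reverse=True)
    if len(ranked) > 1 and math.isclose(ranked[0], ranked[1]):
        return None, support  # no clear winner: escalate instead of guessing
    return max(support, key=support.get), support


def plain_majority(outputs):
    """Baseline: the most common output, ignoring heterogeneity."""
    return Counter(outputs.values()).most_common(1)[0][0]


if __name__ == "__main__":
    names = ["e1", "e2", "e3", "e4", "e5"]
    # Hypothetical matrix: e1-e3 share one software stack; e4 and e5 are diverse.
    het = {frozenset(p): 0.9 for p in combinations(names, 2)}
    for p in combinations(["e1", "e2", "e3"], 2):
        het[frozenset(p)] = 0.0
    outputs = {"e1": "bad", "e2": "bad", "e3": "bad", "e4": "good", "e5": "good"}
    print(plain_majority(outputs))
    print(heterogeneity_vote(outputs, het))
```

In the example, three executors built on the same stack return the same wrong output. Plain majority voting accepts it, while the heterogeneity-weighted vote gives that output only one effective vote and selects the output of the two diverse executors instead.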
Keywords/Search Tags: distributed file system, HDFS, metadata, load balance, mimic defense, scheduling, voting