Research and Optimization of an Internet of Vehicles Data Storage Strategy Based on Hadoop

Posted on: 2016-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Cao
Full Text: PDF
GTID: 2322330542976242
Subject: Computer Science and Technology
Abstract/Summary:
The Internet of Vehicles (IoV) can greatly improve urban traffic conditions, but it generates a large volume of data, and storing that data has become a formidable challenge. Cloud computing is well suited to mass data storage, and Hadoop is currently the most widely used open-source cloud computing framework and platform. HDFS stores all data in Hadoop, and optimizing HDFS has drawn increasing attention from researchers. The HDFS storage strategy has several drawbacks: the number of data replicas is fixed, and the actual operating state of a DataNode is not considered when DataNodes are selected to store a data block. The result is uneven data distribution, unbalanced node load, and related problems. This thesis studies these problems and proposes the ART storage strategy to improve the storage performance of HDFS. The ART storage strategy comprises a zone division algorithm, a dynamic replica number algorithm, and a cost-based node selection algorithm. The zone division algorithm divides all the data nodes in the cluster into two areas, a High-Zone and a Low-Zone, based on node performance and node load. Nodes in the High-Zone have high remaining performance and should be selected preferentially when data is stored. To support the zone division algorithm, the thesis presents methods for computing node performance and node load and introduces the data access frequency, which lays the foundation for the later improvements. While ensuring data availability, the dynamic replica number algorithm dynamically calculates the number of copies that each file should keep, combining the failure rate of the data nodes in the cluster with the access frequency of the data. The algorithm not only reduces data redundancy but also preserves data read performance. The cost-based node selection algorithm addresses the randomness of node selection in HDFS. It defines the cost of transferring data between nodes and a method for calculating the cost of selecting a given node, which combines the node's actual performance and load. When a node is to be selected, the most suitable storage node is identified by calculating the node cost. In the experiments, the placement of hot and non-hot data demonstrates the effectiveness of the zone division algorithm; data redundancy and the response time of hot data confirm the validity of the dynamic replica number algorithm; and data storage time verifies the cost-based node selection algorithm. Finally, the average response time and the relative node load show that the ART storage strategy improves the performance of the cluster, confirming the feasibility of the approach proposed in the thesis.
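
The abstract names the three ART algorithms but gives none of their formulas, so the following is only a minimal sketch of how the pieces could fit together, not the thesis's actual method. Everything in it is an assumption made for illustration: the `Node` fields, the remaining-performance formula, the 0.5 zone threshold, the independent-failure availability model, the extra replica for hot files, and the cost formula.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    performance: float  # hypothetical capacity score in (0, 1]
    load: float         # hypothetical current load in [0, 1]


def remaining_performance(node):
    # Assumed formula: capacity not yet consumed by the current load.
    return node.performance * (1.0 - node.load)


def divide_zones(nodes, threshold=0.5):
    """Split nodes into (high_zone, low_zone) by remaining performance."""
    high = [n for n in nodes if remaining_performance(n) >= threshold]
    low = [n for n in nodes if remaining_performance(n) < threshold]
    return high, low


def replica_count(failure_rate, access_freq, target=0.999,
                  hot_threshold=100, min_replicas=1, max_replicas=5):
    """Smallest replica count whose assumed availability meets the target.

    Assumes independent node failures, so a file with r replicas is lost
    with probability failure_rate ** r. Files accessed at least
    hot_threshold times get one extra replica to spread read traffic.
    """
    r = min_replicas
    while failure_rate ** r > 1.0 - target and r < max_replicas:
        r += 1
    if access_freq >= hot_threshold:
        r = min(r + 1, max_replicas)
    return r


def node_cost(node, transfer_cost):
    # Assumed cost: transfer cost to the node plus its load relative
    # to its performance. Lower is better.
    return transfer_cost[node.name] + node.load / node.performance


def select_node(nodes, transfer_cost, threshold=0.5):
    """Pick the cheapest High-Zone node, falling back to the Low-Zone."""
    high, low = divide_zones(nodes, threshold)
    candidates = high or low
    return min(candidates, key=lambda n: node_cost(n, transfer_cost))
```

In this sketch, `select_node` prefers the High-Zone, as the abstract describes, and replaces random selection with a minimum-cost choice. A real implementation would take the performance, load, failure-rate, and access-frequency measurements from the cluster rather than from fixed fields.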
Keywords/Search Tags: HDFS, zone division, replica number, node selection