Research And Optimization Of Internet Of Vehicles Data Storage Strategy Based On Hadoop

Posted on:2016-05-19

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Cao


GTID:2322330542976242

Subject:Computer Science and Technology

Abstract/Summary:


The Internet of Vehicles (IoV) can greatly improve urban traffic conditions, but it generates a large amount of data, and storing this data has become a formidable challenge. Cloud computing addresses exactly this kind of mass data storage, and Hadoop is currently the most widely used open-source cloud computing framework and platform. HDFS stores all of the data in Hadoop, and optimizing HDFS has drawn increasing attention from researchers. The HDFS storage strategy has several drawbacks: the number of data replicas is fixed, and the actual operating state of each DataNode is not considered when DataNodes are selected to store data blocks. This leads to uneven data distribution, unbalanced node load, and other issues.

This thesis studies these problems and proposes the ART storage strategy to improve the storage performance of HDFS. The ART storage strategy consists of a zone division algorithm, a dynamic replica number algorithm, and a cost-based node selection algorithm.

The zone division algorithm divides all the DataNodes in the cluster into two areas, a High-Zone and a Low-Zone, based on node performance and node load. Nodes in the High-Zone have high remaining performance and should be selected preferentially when storing data. To support the zone division algorithm, the thesis presents methods for computing node performance and node load and introduces data access frequency, which lays the groundwork for the improvements that follow.

While ensuring data availability, the dynamic replica number algorithm dynamically calculates the number of replicas each file should keep, taking into account both the failure rate of the nodes in the cluster and the access frequency of the data. The algorithm reduces data redundancy while still guaranteeing read performance.

The cost-based node selection algorithm reduces the randomness of node selection in HDFS. It defines the transmission cost of data between nodes, together with a method for calculating the cost of selecting a node that combines the node's actual performance and load. When a node must be selected, the algorithm identifies the most suitable storage node by calculating the cost of each candidate.

Experiments storing hot and non-hot data demonstrate the effectiveness of the zone division algorithm. Data redundancy and the response time for hot data validate the dynamic replica number algorithm, and data storage time verifies the cost-based node selection algorithm. Finally, the average response time and the relative node load show that the ART storage strategy improves cluster performance, confirming the feasibility of the approach proposed in the thesis.
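The abstract names the three ART components but gives no formulas. The Python sketch below shows one way they could fit together. Every scoring rule in it is a hypothetical assumption for illustration, not the thesis's method: remaining performance as performance × (1 − load), a mean-based High-Zone/Low-Zone threshold, a replica count bounded by p^r under independent failures plus extra replicas for frequently accessed data, a rack-based transfer cost, and the cost weights. The names DataNode, divide_zones, replica_count, node_cost, and select_nodes are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    # Hypothetical, normalized inputs for illustration only.
    name: str
    performance: float  # capacity score in [0, 1]
    load: float         # current load in [0, 1]
    rack: str


def remaining_performance(node: DataNode) -> float:
    # Assumed rule: capacity left over after the current load.
    return node.performance * (1.0 - node.load)


def divide_zones(nodes: list[DataNode]) -> tuple[list[DataNode], list[DataNode]]:
    # Assumed threshold: mean remaining performance across the cluster.
    threshold = sum(remaining_performance(n) for n in nodes) / len(nodes)
    high = [n for n in nodes if remaining_performance(n) >= threshold]
    low = [n for n in nodes if remaining_performance(n) < threshold]
    return high, low


def replica_count(avg_failure_rate: float, access_freq: float,
                  target_loss: float = 1e-5, hot_threshold: float = 100.0,
                  min_r: int = 2, max_r: int = 6) -> int:
    # Smallest r with avg_failure_rate**r <= target_loss (assumes independent
    # node failures), then one extra replica per hot_threshold accesses.
    r = min_r
    while avg_failure_rate ** r > target_loss and r < max_r:
        r += 1
    r += int(access_freq // hot_threshold)
    return min(r, max_r)


def transfer_cost(writer_rack: str, node: DataNode) -> float:
    # Hypothetical network distance: cheaper inside the writer's rack.
    return 1.0 if node.rack == writer_rack else 2.0


def node_cost(node: DataNode, writer_rack: str,
              w_load: float = 0.5, w_perf: float = 0.3, w_net: float = 0.2) -> float:
    # Weighted sum of load, missing performance, and transfer cost (assumed weights).
    return (w_load * node.load
            + w_perf * (1.0 - node.performance)
            + w_net * transfer_cost(writer_rack, node))


def select_nodes(nodes: list[DataNode], k: int, writer_rack: str) -> list[str]:
    # Take the cheapest High-Zone nodes first; fall back to the Low-Zone if short.
    high, low = divide_zones(nodes)
    chosen = sorted(high, key=lambda n: node_cost(n, writer_rack))[:k]
    if len(chosen) < k:
        chosen += sorted(low, key=lambda n: node_cost(n, writer_rack))[:k - len(chosen)]
    return [n.name for n in chosen]
```

For example, with nodes A (0.9, 0.2, "r1"), B (0.8, 0.1, "r2"), C (0.6, 0.7, "r1"), and D (0.5, 0.9, "r2"), A and B fall in the High-Zone, and `select_nodes(nodes, 3, "r1")` returns `["A", "B", "C"]`. Under these assumed defaults, `replica_count(0.001, 0)` gives 2 replicas for reliable nodes holding cold data, while `replica_count(0.01, 0)` gives 3. Filling from the High-Zone before the Low-Zone mirrors the abstract's rule that High-Zone nodes are preferred, without failing when the High-Zone is too small.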

Keywords/Search Tags:

HDFS, zone division, replica number, node selection


Related items

1	A Study On The Impact Of Traffic Analyzing Zone Division Scale And Highway Toll Adjustment Extent
2	Theoretical Analysis And Research On Division Method For Traffic Zone
3	Study On The Division Method Of Gabbro Weathering Zone In Ji’nan West Railway Station Area
4	Study On The Method And Application Of Traffic Zone Division Driven By Demand
5	Research On Replica Management Strategy For New Building Intelligent Platform
6	Speed Zone Division And Speed Transition Zone Setting Research
7	Research On The Optimal Selection Method Of Central Node Of UAV Cluster
8	Study On The Data Collection Method Of Internet Of Things Supported By UAV Cluster Collaboration
9	Research On Key Node Selection And Station Layout Optimization Of Public Bicycles Based On Complex Network Theory
10	Research On The Hub Node Selection Of Freight Collection Network Of "One Belt And One Road" China-Europe Freight Block Train(Kazakhstan Area)