Design And Implementation Of HBase-based Traffic Stream Data Real-time Storage And Query Optimization

Posted on: 2018-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: L J Qu
Full Text: PDF
GTID: 2322330533459482
Subject: Computer technology
Abstract/Summary:
As Internet technology has matured, intelligent transportation technology has advanced rapidly, and approaches built on massive data have drawn growing attention from industry. Traffic data acquisition is developing quickly, collection methods are diversifying, and data volumes keep increasing. A medium-sized city generates tens of millions of data records every day, which accumulate to as much as 100 terabytes per year. Data at this scale calls for a storage and query system optimized to meet practical application needs.

Hadoop is a relatively mature big data processing solution whose core technologies, HDFS and MapReduce, provide efficient data storage and data analysis capabilities. HBase is a distributed database that stores its data on Hadoop's distributed file system and supports the Hadoop parallel computing framework. As a storage medium for massive data, HBase offers high reliability and strong data processing capability. To address the performance shortcomings of traditional relational databases in handling traffic flow data, this thesis designs and implements an HBase-based real-time storage and query system for traffic flow data, with optimizations aimed at both data writing and data reading. The main research work is as follows:

(1) Based on road vehicle information data, this thesis designs an HBase-based cluster optimization scheme for storage and query.

(2) For data storage, this thesis designs an HBase composite primary key storage model based on the characteristics of the data. First, a Region pre-partitioning strategy based on the segmentation of data community is introduced to solve the data "hot spot" problem caused by Region partitioning. Second, to solve the problem of data loss caused by changes in cluster nodes, a storage scheduling algorithm based on hashing and consistent hashing is proposed. Finally, experiments on two aspects, data write performance and the buffer queue write threshold, verify that the data storage module in this optimization scheme performs better than the existing data storage module.

(3) For data query, this thesis designs a multi-level caching strategy that combines a Redis distributed cache server with local disk, and gives the corresponding implementation scheme. It first presents the system architecture of the Redis distributed cache server and designs a storage model for cache record values. Then, to reflect differences in access frequency, it introduces the concept of a heat value and designs a cache elimination (eviction) algorithm based on heat accumulation. Finally, experiments on two aspects, data read efficiency and the cache elimination strategy, verify that the data query module in this optimization scheme performs better than the existing data query module.
Keywords/Search Tags:HBase, Intelligent Transportation, Redis cluster, Storage Model, Multi-level Cache, Cache Elimination Algorithm
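
The abstract names consistent hashing as the basis of the storage scheduling algorithm but does not give its details. As a rough illustration of the general technique only, and not of the thesis's actual algorithm, the following Python sketch builds a hash ring with virtual nodes. The class name `ConsistentHashRing`, the node names, the use of MD5, and the `replicas` parameter are all assumptions made for this example.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring (MD5 is an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual nodes per physical node
        self._positions = []       # sorted ring positions
        self._owner = {}           # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            pos = ring_hash(f"{node}#{i}")
            if pos in self._owner:
                continue           # skip the (very unlikely) position collision
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        # Drop only the positions owned by this node; all others stay put.
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring is empty")
        # First position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._positions, ring_hash(key))
        return self._owner[self._positions[idx % len(self._positions)]]
```

The property that matters when cluster nodes change is that removing a node only reassigns the keys that node owned, and adding a node only moves keys onto the new node; every other key keeps its placement.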
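
The abstract describes a cache elimination algorithm based on heat accumulation without stating how heat is computed. One plausible reading, sketched below under that assumption, is that each access adds to an entry's heat, heat decays periodically, and the coldest entry is evicted when the cache is full. This is essentially a least-frequently-used policy with decay. The class `HeatCache`, the `decay` factor, and the increment of 1.0 per access are hypothetical, and a plain dictionary stands in for the Redis cache.

```python
class HeatCache:
    """Hypothetical in-memory cache that evicts the entry with the lowest heat."""

    def __init__(self, capacity: int, decay: float = 0.5):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.decay = decay     # assumed multiplicative cooling factor
        self._values = {}      # key -> cached value
        self._heat = {}        # key -> accumulated heat

    def __contains__(self, key) -> bool:
        return key in self._values

    def get(self, key, default=None):
        if key not in self._values:
            return default
        self._heat[key] += 1.0  # every hit accumulates heat
        return self._values[key]

    def put(self, key, value) -> None:
        if key not in self._values and len(self._values) >= self.capacity:
            self._evict_coldest()
        self._values[key] = value
        self._heat[key] = self._heat.get(key, 0.0) + 1.0

    def cool_down(self) -> None:
        """Decay all heat values so that old popularity fades over time."""
        for key in self._heat:
            self._heat[key] *= self.decay

    def _evict_coldest(self) -> None:
        coldest = min(self._heat, key=self._heat.get)
        del self._values[coldest]
        del self._heat[coldest]
```

Calling `cool_down()` on a timer keeps entries that were popular long ago from occupying the cache indefinitely; the interval and decay factor would need tuning against real access patterns.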