| In the era of big data, massive amounts of data are being generated on the Internet and in other industries, and the volume is growing rapidly. Massive heterogeneous trajectory data carries great scientific, economic, and social value, so querying it is particularly important. Traditional query system frameworks for massive trajectory data cannot satisfy users' real-time queries as the data volume increases. A design method for a real-time processing system for massive trajectory data based on Storm is proposed. First, the original massive heterogeneous trajectory data undergoes a series of operations, including cleaning, trajectory compression, and data normalization, which convert it into a unified data structure that is easier for the system to process. The distributed real-time computing platform Storm then processes the massive trajectories in real time, with Kafka serving as an intermediate cache to increase system throughput. During querying, however, the system encounters two types of problems: unbalanced resource allocation across nodes, and data volumes that exceed the upper limit of what the system can process. Regarding the uneven allocation of system resources caused by the default Storm scheduling mechanism, the traditional slot-based low-usage-first strategy does allocate the slot resources of each node in the Storm cluster uniformly, but the CPU load of the nodes in the cluster remains unbalanced. In this thesis, we improve the traditional slot-based low-usage-first strategy: when allocating slot resources on each node in the cluster, we consider not only the slot usage of the nodes but also their CPU load, which resolves the CPU load imbalance. When a new node joins the cluster, a dynamic load migration algorithm migrates load from the nodes of the original cluster to the newly added node. To address the problem that the volume of trajectory data grows beyond the upper limit of the system in a
special period, an overload handling method is designed that inserts the trajectory data into the system intermittently, so that the amount of data processed by the real-time query system is markedly reduced. System performance is tested with specific business data. The tests mainly comprise a system test with the traditional slot-based low-usage-first strategy, a system test with the improved slot-based low-usage-first strategy, a system test with the dynamic load migration algorithm, and a node load test with the overload handling method. The test results show that the Storm-based real-time query system balances load more effectively in the different scenarios and that its real-time performance is significantly improved. When the amount of data exceeds the upper limit of system processing, no node overload or system downtime occurs, and the real-time query requirements for massive trajectory data can be met. |
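
The idea of weighing CPU load alongside slot usage when placing workers can be illustrated with a minimal Python sketch. This is not the scheduler described in the thesis and does not use the Storm API. The `Node` class, the linear score, the `cpu_weight` parameter, and the fixed `cpu_per_worker` increment are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a cluster node (not a Storm class)."""
    name: str
    total_slots: int
    used_slots: int = 0
    cpu_load: float = 0.0  # fraction of CPU in use, 0.0 to 1.0

    @property
    def slot_usage(self) -> float:
        return self.used_slots / self.total_slots


def pick_node(nodes, cpu_weight=0.5):
    """Return the node with a free slot and the lowest combined score.

    The score is an assumed linear blend of slot usage and CPU load.
    With cpu_weight=0.0 it reduces to a plain low-slot-usage-first choice.
    """
    candidates = [n for n in nodes if n.used_slots < n.total_slots]
    if not candidates:
        raise RuntimeError("no free slots in the cluster")
    return min(
        candidates,
        key=lambda n: (1 - cpu_weight) * n.slot_usage + cpu_weight * n.cpu_load,
    )


def assign_workers(nodes, num_workers, cpu_weight=0.5, cpu_per_worker=0.1):
    """Place workers one at a time, updating each node after every placement.

    cpu_per_worker is an assumed, fixed estimate of the CPU one worker adds.
    """
    placement = []
    for _ in range(num_workers):
        node = pick_node(nodes, cpu_weight)
        node.used_slots += 1
        node.cpu_load = min(1.0, node.cpu_load + cpu_per_worker)
        placement.append(node.name)
    return placement
```

In this sketch, a node with no slots in use but a CPU load of 0.8 loses to a node with one of four slots in use and a CPU load of 0.1 once `cpu_weight` is 0.5, whereas a slot-only rule (`cpu_weight=0.0`) would still pick the busy node.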