Parallel Query Optimization For InfluxDB Time Series Database

Posted on: 2024-01-23 | Degree: Master | Type: Thesis
Country: China | Candidate: S Zhang
GTID: 2558307028499864 | Subject: Electronic information
Abstract/Summary:
With the development of Internet of Things (IoT) technology, a large number of sensor devices are connected to the Internet. The massive amount of time series data collected by these IoT devices plays a key role in fault warning and operating-status analysis of production equipment. As a database management system optimized for time series data, a time series database can efficiently handle query and analysis tasks on such data. However, large IoT systems usually generate high-dimensional time series data. In power systems, for example, the monitoring equipment of transmission lines and power plants generates a large number of time series every day and a large number of data points every second, and the number of time series tends to grow year by year as the business expands.

Existing time series databases suffer from high query latency when running aggregation queries over massive numbers of time series, and cannot provide real-time query and analysis, so they serve low-latency business requirements poorly. For example, a power system monitoring service must run aggregate queries over a large number of time series in time to discover equipment whose operating status is abnormal. To address the high latency of time series databases in massive time series analysis query scenarios, this thesis redesigns the query engine of a time series database based on InfluxDB, and designs and implements a parallel query execution engine, InfluxDB-PP (Parallel Processing). InfluxDB-PP strengthens the parallelism of query processing for high-volume time series data and processes different time series in parallel by fully utilizing the system's computational resources. In addition, as a database system used to analyze time series data, a time series database often involves a large number of expression calculations. InfluxDB-PP therefore also designs and implements expression query compilation to eliminate the overhead of the many function calls in traditional database expression evaluation. Finally, a monitoring system is built with InfluxDB-PP at its core to meet the needs of the power system.

The main contributions of this thesis are as follows:

(1) InfluxDB-PP, a parallel execution engine, is designed and implemented based on InfluxDB. It uses a pipelined query processing approach, and each query operator has multiple instances in its parallel execution framework. To balance the workload across operator instances, a multi-data-source data scheduler assigns different time series to different operator instances before query execution, and the different time series are then processed in parallel on multi-core processors. InfluxDB-PP also refines the granularity of data transfer between operators to improve the execution efficiency of window aggregation queries, which are common in time series scenarios, and improves the processing efficiency of time series data with a processing strategy based on batched data blocks.

(2) Expression query compilation is designed and implemented in InfluxDB-PP. To address the function call overhead caused by the large number of expression calculations in time series scenarios, this thesis tries expression query compilation in a time series database for the first time. The aim is to eliminate a large number of function calls by invoking the corresponding code fragments during execution of the aggregation or filter operator.

(3) A monitoring system is built for the electric power scenario. Using power data, this thesis builds a monitoring system with InfluxDB-PP at its core. The system consists of three main parts: an ETL component, InfluxDB-PP, and a user interface.

InfluxDB-PP is designed based on the techniques above, and their effectiveness in multi-time-series analysis query scenarios is verified by comparing InfluxDB-PP against open-source InfluxDB on simulated power data.
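
As a rough illustration of the first contribution, the sketch below assigns time series to operator instances before execution and then aggregates each series over fixed time windows in parallel. It is not the thesis implementation: the abstract does not state the scheduling policy, so the greedy "largest series to the least-loaded instance" rule is an assumption, and the names `schedule_series`, `window_mean`, and `parallel_window_mean` are hypothetical. Python threads are used only to show the structure; real multi-core speedup would need processes or a language such as Go, in which InfluxDB is written.

```python
import heapq
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def schedule_series(series_sizes, n_workers):
    """Assumed policy: give the next-largest series to the least-loaded instance."""
    heap = [(0, w) for w in range(n_workers)]  # (points assigned so far, instance id)
    heapq.heapify(heap)
    plan = [[] for _ in range(n_workers)]
    # Sort by size descending, then by name so the plan is deterministic.
    for name, size in sorted(series_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, w = heapq.heappop(heap)
        plan[w].append(name)
        heapq.heappush(heap, (load + size, w))
    return plan


def window_mean(points, width):
    """Average (timestamp, value) points inside fixed windows of `width` time units."""
    sums = defaultdict(lambda: [0.0, 0])  # window start -> [running total, count]
    for ts, value in points:
        acc = sums[ts - ts % width]
        acc[0] += value
        acc[1] += 1
    return {start: total / count for start, (total, count) in sorted(sums.items())}


def run_instance(names, data, width):
    """One operator instance: aggregate every series it was assigned."""
    return {name: window_mean(data[name], width) for name in names}


def parallel_window_mean(data, width, n_workers):
    """Schedule series across instances, then run the instances concurrently."""
    plan = schedule_series({k: len(v) for k, v in data.items()}, n_workers)
    result = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(lambda names: run_instance(names, data, width), plan):
            result.update(partial)
    return result
```

Calling `parallel_window_mean(data, 10, 2)` on a dictionary that maps series names to lists of `(timestamp, value)` pairs returns, for each series, a mapping from window start to the mean value in that window. Because each series is handled by exactly one instance, no merging of partial aggregates is needed in this simplified form.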
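
The idea behind expression query compilation in the second contribution can be shown with a small contrast between interpreting an expression tree for every row and compiling it once into a single callable. This is only a sketch under stated assumptions: the tuple-based tree format, the operator set, and the function names `interpret`, `to_source`, and `compile_expr` are all hypothetical, and Python's built-in `compile` stands in for whatever code generation mechanism InfluxDB-PP actually uses.

```python
import operator

# Hypothetical operator set for the illustration.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, ">": operator.gt}


def interpret(expr, row):
    """Tree-walking evaluation: one recursive call per node, repeated for every row."""
    kind = expr[0]
    if kind == "field":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    return OPS[kind](interpret(expr[1], row), interpret(expr[2], row))


def to_source(expr):
    """Translate the expression tree into a Python source fragment."""
    kind = expr[0]
    if kind == "field":
        return f"row[{expr[1]!r}]"
    if kind == "const":
        return repr(expr[1])
    if kind not in OPS:
        raise ValueError(f"unsupported operator: {kind}")
    return f"({to_source(expr[1])} {kind} {to_source(expr[2])})"


def compile_expr(expr):
    """Compile the tree once; each row then costs a single call to the result."""
    source = f"lambda row: {to_source(expr)}"
    return eval(compile(source, "<expr>", "eval"))


# Filter predicate: voltage * current > 1000.0 (field names are made up).
predicate = (">", ("*", ("field", "voltage"), ("field", "current")), ("const", 1000.0))
compiled = compile_expr(predicate)
rows = [{"voltage": 220.0, "current": 5.0}, {"voltage": 110.0, "current": 2.0}]
selected = [row for row in rows if compiled(row)]
```

Both paths give the same answer for every row. The difference is that `interpret` makes a function call per tree node per row, while the compiled predicate is built once and evaluated with one call per row, which is the kind of overhead the abstract says compilation is meant to remove.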
Keywords/Search Tags: Time Series Database, Time Series Data Query Optimization, Query Compilation, Monitoring System