| With the advent of the Internet age, businesses are processing more data than ever before, and the speed, volume and diversity of that data are increasing. For enterprises with data-intensive operations, an efficient HTAP (Hybrid Transactional/Analytical Processing) system is very beneficial. An isolated HTAP architecture physically separates OLTP (On-Line Transaction Processing) and OLAP (On-Line Analytical Processing) workloads and reduces their mutual interference, but it places higher demands on data freshness. The traditional ETL (Extract-Transform-Load) data transfer method has many intermediate layers, is highly complex, and is difficult to operate and maintain, while a single-threaded log capture and replay architecture cannot meet the data freshness requirements of HTAP. To solve these problems, this paper designs and implements Data Train, a real-time, HTAP-oriented ETL transmission service system. First, the system moves log capture from the compute layer down to the storage layer, which avoids the overhead of maintaining transaction logs with row-level data changes enabled on the OLTP side. Concurrent log capture is achieved by exploiting the structure of the distributed storage engine and using the partition as the smallest unit of log capture. Second, to reduce the processing latency of single-threaded log replay, data logs are regrouped and reallocated to multiple partition-group replay channels, with the table partition log as the minimum management unit, and a parallel log replay algorithm over these replay channels minimizes the latency of the replay process. Finally, to ensure data consistency on the OLAP side, transactions are replayed sequentially through a single channel; to improve data transfer efficiency, a double-buffered queue batches multiple transactions, and the logs within each batch are further compressed to reduce the volume of transaction logs. Comparative experiments verify the feasibility and advantages of Data Train in an HTAP environment. The experimental results show that concurrent log capture at the storage layer improves on capture at the compute layer by up to about 90%; the concurrent log replay algorithm reduces transaction processing latency by about 80% compared to a single-threaded replay channel; and batch transaction optimization increases transaction throughput by about 15%. |
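
The regrouping of partition logs into replay channels and the compression of logs within a batch can be illustrated with a minimal sketch. It is not the Data Train implementation: the entry fields (`partition`, `key`, `value`), the modulo assignment rule, and the last-write-wins compression are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict


def assign_channels(log_entries, num_channels):
    """Group log entries into replay channels by partition id.

    Assumption: a partition is mapped to a channel by a simple modulo.
    All entries of one partition land in the same channel, so their
    relative order is preserved while channels can be replayed in parallel.
    """
    channels = defaultdict(list)
    for entry in log_entries:
        channel_id = entry["partition"] % num_channels
        channels[channel_id].append(entry)
    return dict(channels)


def compress_batch(log_entries):
    """Keep only the last change per (partition, key) within one batch.

    Assumption: later writes to the same key fully overwrite earlier ones,
    so intermediate versions inside a batch need not be shipped.
    """
    latest = {}
    for entry in log_entries:
        k = (entry["partition"], entry["key"])
        latest.pop(k, None)  # re-insert so order reflects the last write
        latest[k] = entry
    return list(latest.values())


# Hypothetical log entries from three table partitions.
logs = [
    {"partition": 0, "key": "a", "value": 1},
    {"partition": 1, "key": "b", "value": 2},
    {"partition": 0, "key": "a", "value": 3},
    {"partition": 2, "key": "c", "value": 4},
]

channels = assign_channels(logs, num_channels=2)
compressed = compress_batch(logs)
```

With two channels, partitions 0 and 2 share one channel and partition 1 gets the other; the two writes to key `a` collapse into the later one. A real system would also have to balance channel load and respect transaction boundaries, which this sketch ignores.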