With the rapid popularization of the Internet, various Internet applications place increasingly high demands on databases. These applications require that the database not only provide good ACID properties, but also support high-speed data writing and updating while maintaining query efficiency; in addition, it must offer good distributed scalability. Against this background, a new generation of NoSQL databases has emerged. Popular NoSQL databases currently include Google's Spanner/F1, Alibaba's OceanBase, CockroachDB, and TiDB, whose underlying storage engines all adopt the LSM-Tree architecture. A database based on the LSM-Tree can convert the user's random writes into sequential writes to improve write performance. Compared with a database based on the B+Tree, the read performance of an LSM-Tree database is relatively poor, but in practical applications it does not decline too much once optimizations such as indexes, caches, and Bloom filters are added. In addition, an LSM-Tree database must also run multiple background threads to regularly compact the data files on disk. These threads allow the database to maintain the optimized shape of the LSM-Tree and to reduce read amplification and space amplification.

The compaction operations of the background threads consume considerable CPU computing resources. The efficiency of the database system in processing ordinary online transaction logic therefore declines while foreground operations such as data queries and transaction processing compete with the background threads for CPU computing resources. Under a mixed read-write workload, this resource competition has a noticeable impact on database performance. According to experiments on Alibaba's X-Engine database, with the rapid development of new storage technologies such as SSD and NVM, data I/O bandwidth has made great progress; the bottleneck limiting data throughput in an LSM-Tree database has therefore shifted to the CPU's computing
power. The data compaction of the LSM-Tree is the main cause of this bottleneck.

To solve the above problem, this thesis takes RocksDB, an open-source LSM-Tree database, as its research object and focuses on how to offload the main computational tasks of the compaction operation to an FPGA. Based on the RocksDB source code and the OpenCL programming model, we design and implement an FPGA-based RocksDB data compaction acceleration system that uses the FPGA to create the hardware logic computing unit for data compaction processing. This system offloads compaction tasks in order to reduce the computational load on the CPU and improve the performance of the database system in processing common data access transactions. To solve the aforementioned problem and implement the FPGA-based RocksDB data compaction acceleration system, the major works of this thesis are as follows: (1) Studying the project architecture and source code of RocksDB, analyzing its shortcomings, and proposing ideas to solve these problems. (2) Designing the overall architecture of the FPGA-based RocksDB data compaction acceleration system, including the software architecture and the FPGA hardware logic structure. (3) Implementing the FPGA hardware logic of the RocksDB data compaction process. Thanks to the fast reconfigurability of FPGAs, we can quickly prototype and iterate on this system. In this thesis, we use a Xilinx Alveo U280 Data Center Accelerator Card for hardware development, and use the Vitis tools to realize the software based on the C/C++ high-level programming language and the OpenCL programming model. (4) Designing and implementing the FPGA driver to manage the data flow and task scheduling between the host and the FPGA.

At the end of this thesis, for the FPGA-based RocksDB data compaction acceleration system, we mainly use RocksDB's own performance test tool db_bench, the Vitis tools, and several test programs for experiments and data collection. The
experimental data indicate that the overall CPU computing resources consumed by this system are about 10% lower than those of native RocksDB when processing KV data sets with smaller key sizes, and the maximum CPU usage when processing computational tasks is about 50% lower than that of native RocksDB. The data compaction throughput of this FPGA-based system is about 40% higher than that of the CPU-based native RocksDB. The experimental results prove that this system can improve the performance of RocksDB through higher data compaction efficiency and lower CPU computing resource usage.