| In the information age, production and daily life involve increasingly complex business scenarios, and the volume of data they generate has grown dramatically. Streaming data spans a wide range of domains and touches everyone's daily life. Individual social activity and online shopping records, traffic and security surveillance data, transaction logs in financial markets, and sensor telemetry in industry are all streaming data. By archiving and analyzing streaming data, one can uncover objective patterns and make rational decisions. Unlike traditional data, streaming data usually takes the form of multi-dimensional records that keep accumulating over time. A record may contain both structured metrics that are easy to interpret and unstructured data such as images. These properties make it difficult to apply traditional file systems and databases, which are designed for bounded, fixed-format data, to streaming data. How to build a storage system for large-scale streaming data is therefore a problem worth studying. Based on a large-scale streaming data storage system that we designed and developed ourselves, this thesis describes in detail its design ideas, the underlying theory and techniques, and their implementation. The main contributions of this thesis are as follows: (1) It proposes processing the structured and unstructured data in a data stream separately while providing unified retrieval over both. (2) A structured data storage module is built on the key-value storage engine RocksDB. For this module, a hybrid method for mapping structured data to key-value pairs is proposed, and a context-based data compression algorithm is adopted to reduce the size of the key-value pairs. (3) An unstructured data storage module with a flat container hierarchy is designed and implemented based on a continuous (sequential) disk-write model. Exploiting the characteristics of streaming data, the indexes are compressed and stored in blocks, which reduces the number of index entries. In addition, an adaptive load-balancing algorithm for storage nodes is proposed to suit the characteristics of streaming data storage. Together, this work and these optimizations enable the storage system to handle large-scale streaming data with flexible formats while keeping the system scalable. |
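The abstract does not specify the key layout or the compression algorithm used in the structured data module. As a minimal sketch, assuming each record is identified by a stream ID and a millisecond timestamp, the Python below shows one possible hybrid mapping (identifying fields go into the key, the remaining metrics into the value) together with front coding, a common context-based scheme in which each key is stored relative to the previous key. The names `encode_key`, `encode_value`, `front_code`, and `front_decode`, as well as the key layout, are hypothetical, and the actual RocksDB write is omitted.

```python
import json
import struct
from typing import Dict, List, Tuple

def encode_key(stream_id: str, ts_ms: int) -> bytes:
    """Hypothetical key layout: stream ID, a 0x00 separator, 8-byte big-endian timestamp.

    Big-endian encoding makes byte-wise key order match time order,
    which suits an ordered key-value store such as RocksDB.
    """
    return stream_id.encode("utf-8") + b"\x00" + struct.pack(">Q", ts_ms)

def encode_value(metrics: Dict[str, float]) -> bytes:
    """Hypothetical value layout: the remaining fields as compact JSON."""
    return json.dumps(metrics, sort_keys=True, separators=(",", ":")).encode("utf-8")

def front_code(keys: List[bytes]) -> List[Tuple[int, bytes]]:
    """Encode each key as (length of prefix shared with the previous key, suffix)."""
    out: List[Tuple[int, bytes]] = []
    prev = b""
    for key in keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        out.append((shared, key[shared:]))
        prev = key
    return out

def front_decode(entries: List[Tuple[int, bytes]]) -> List[bytes]:
    """Rebuild the original keys by reusing the prefix of the previous key."""
    keys: List[bytes] = []
    prev = b""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys

# Usage: three consecutive readings from one (hypothetical) sensor stream.
keys = [encode_key("sensor-42", 1_700_000_000_000 + i * 1000) for i in range(3)]
coded = front_code(keys)
assert front_decode(coded) == keys
raw_bytes = sum(len(k) for k in keys)
suffix_bytes = sum(len(s) for _, s in coded)
print(raw_bytes, suffix_bytes)
```

Because consecutive keys from the same stream share the stream ID and the high-order timestamp bytes, most of each key collapses to a short suffix. RocksDB's block-based table format already applies a similar shared-prefix encoding inside its data blocks, so a layout that places the most repetitive fields first benefits from it directly.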
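The abstract states that indexes for unstructured data are compressed and stored in blocks to reduce the number of index entries. A minimal sketch of one way to obtain that effect, assuming records arrive in timestamp order and are written sequentially: group records into fixed-size blocks, keep a single sparse index entry (first timestamp, byte offset) per block, then binary-search the index and scan only within one block. `BlockLog`, `records_per_block`, and the record header layout are hypothetical, and a `bytearray` stands in for the on-disk file.

```python
import bisect
import struct
from typing import List, Optional

# Hypothetical per-record header: timestamp (u64) and payload length (u32), big-endian.
HEADER = struct.Struct(">QI")

class BlockLog:
    """Append-only record log with one index entry per block instead of per record."""

    def __init__(self, records_per_block: int = 4) -> None:
        self.data = bytearray()            # stands in for a sequentially written file
        self.records_per_block = records_per_block
        self.index_ts: List[int] = []      # first timestamp of each block
        self.index_off: List[int] = []     # byte offset where each block starts
        self.in_block = 0
        self.last_ts = -1

    def append(self, ts: int, payload: bytes) -> None:
        if ts < self.last_ts:
            raise ValueError("timestamps must be non-decreasing")
        if self.in_block == 0:
            # Starting a new block: index only its first record.
            self.index_ts.append(ts)
            self.index_off.append(len(self.data))
        self.data += HEADER.pack(ts, len(payload))
        self.data += payload
        self.in_block = (self.in_block + 1) % self.records_per_block
        self.last_ts = ts

    def get(self, ts: int) -> Optional[bytes]:
        # Last block whose first timestamp is <= ts.
        i = bisect.bisect_right(self.index_ts, ts) - 1
        if i < 0:
            return None
        pos = self.index_off[i]
        end = self.index_off[i + 1] if i + 1 < len(self.index_off) else len(self.data)
        # Linear scan inside the single candidate block.
        while pos < end:
            rec_ts, n = HEADER.unpack_from(self.data, pos)
            pos += HEADER.size
            if rec_ts == ts:
                return bytes(self.data[pos:pos + n])
            if rec_ts > ts:
                return None
            pos += n
        return None

# Usage: ten image-like payloads, four records per block.
log = BlockLog(records_per_block=4)
for t in range(10):
    log.append(1000 + t, f"frame-{t}".encode())
print(len(log.index_ts))   # 3 index entries for 10 records
print(log.get(1006))       # b'frame-6'
```

With N records per block the index shrinks by roughly a factor of N, at the cost of scanning at most N records per lookup; the block size is the knob that trades index memory against read latency.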