| RRDtool is a very popular file-based database for storing time-series data. However, an RRDtool-based storage system performs poorly when a large number of RRD files must be updated, because of the operating system’s readahead and buffer-cache behavior. This limits the scalability of the system to tens of thousands, or perhaps one hundred thousand, RRD files on a single system. Another challenge is making the system’s capacity flexible enough to store a rapidly increasing number of RRD files. Moreover, it is essential to keep the system highly available regardless of component or system failures. In this thesis, a storage system that combines mem-RRD and MooseFS for large-scale time-series data is investigated and implemented in response to the issues mentioned above. Mem-RRD is designed to replace the original RRDtool; it exploits user-level buffering and achieves better I/O performance. MooseFS is a distributed file system that guarantees high availability and flexible capacity. The system is built and deployed in a network measurement environment, and its effectiveness is demonstrated by detailed testing and observation. In brief, this large-scale time-series data storage system provides good I/O performance, availability, and scalability. |