| Migration to the cloud is an important way to keep pace with the development of the digital economy, and performance is the core competitiveness of cloud service providers. Optimizing the performance of public cloud storage systems is therefore crucial for these providers. Currently, public cloud storage systems face problems such as complex user performance requirements and poor utilization of hardware device performance. On the one hand, it is difficult for public cloud storage systems to identify the application types of their users. On the other hand, existing public cloud storage software cannot saturate the performance of newly introduced high-speed devices. Therefore, this paper conducts research from three perspectives: identifying user requirements, coordinating application requests, and software-hardware collaboration. · To enable the cloud storage system to perceive user performance requirements, this paper analyzes storage traces collected from a real public cloud storage environment, summarizes the storage signatures of applications, and proposes Sketcher, a cloud application identification approach based on machine learning. Sketcher handles noisy samples and the imbalanced distribution of samples through anomaly detection and oversampling, respectively. Experimental results show that Sketcher improves recognition accuracy by up to 52.4% compared with existing methods. · To reduce the high overhead of ensuring consistency in distributed scenarios, this paper proposes Aurogon, an all-phase reordering-resistant distributed transaction processing technique. This paper finds that the root cause of transaction aborts is request reordering. Aurogon therefore applies an adaptive request deferral mechanism and a "pre-attaching" mechanism in different phases of transaction processing to counter nonuniform data access latency and the late arrival of dependent requests, respectively. Experimental results show that Aurogon reduces the abort rate by 73% and improves throughput by 4.1 times compared with existing methods. · To address the poor scalability and low accuracy of clock synchronization, this paper proposes 2LClock, an RDMA-based clock synchronization mechanism with software-hardware co-design. 2LClock splits the clock synchronization process to reduce the impact of network stack fluctuations, and improves scalability by combining different RDMA transport modes so that the remote CPU is bypassed on the critical path of clock synchronization. Experimental results show that the average clock synchronization error of 2LClock is 41 nanoseconds, and that it reduces the CPU utilization of critical nodes by 97% compared with existing work. · To tackle the low performance of fault-tolerance mechanisms on ZNS (Zoned Namespace) SSDs, this paper proposes ZNSRAID, a redundant array mechanism adapted to ZNS. ZNSRAID uses the persistent memory region (PMR) to cache parity data and designs a dynamic request splitting mechanism to saturate the performance of ZNS SSDs. ZNSRAID also implements a data layout that yields low tail latency for requests. Experimental results show that ZNSRAID improves throughput by 2.97 times and reduces tail latency by 82% compared with existing work. |
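The Sketcher bullet names two preprocessing steps: anomaly detection for noisy samples and oversampling for an imbalanced class distribution. Sketcher's actual detector and sampler are not described here, so the following is only a minimal standard-library sketch of the general idea. It assumes each sample is a tuple of numeric trace features plus an application label. The functions `drop_outliers` and `random_oversample`, the per-class z-score detector, and the `z_max=3.0` threshold are hypothetical stand-ins.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def drop_outliers(samples, z_max=3.0):
    """Remove samples whose features lie more than z_max population standard
    deviations from their class mean (a simple per-class z-score detector)."""
    by_label = defaultdict(list)
    for feats, label in samples:
        by_label[label].append(feats)
    kept = []
    for label, rows in by_label.items():
        columns = list(zip(*rows))            # one tuple per feature dimension
        mus = [mean(col) for col in columns]
        sigmas = [pstdev(col) for col in columns]
        for feats in rows:
            # A zero std dev means the feature is constant, so it never flags.
            if all(s == 0 or abs(x - m) / s <= z_max
                   for x, m, s in zip(feats, mus, sigmas)):
                kept.append((feats, label))
    return kept


def random_oversample(samples, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    target = max(len(rows) for rows in by_label.values())
    balanced = []
    for rows in by_label.values():
        balanced.extend(rows)
        balanced.extend(rng.choice(rows) for _ in range(target - len(rows)))
    return balanced


# Usage: 20 "db" samples plus one noisy outlier, and only 3 "web" samples.
data = [((1.0,), "db")] * 20 + [((100.0,), "db")] + [
    ((5.0,), "web"), ((5.0,), "web"), ((6.0,), "web")]
clean = drop_outliers(data)               # the (100.0,) sample is dropped
balanced = random_oversample(clean)       # both classes now have 20 samples
```

Filtering runs before oversampling so that a noisy minority sample is not duplicated; random duplication is the simplest oversampler, and synthetic methods such as SMOTE are a common alternative.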
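One generic way to picture the request deferral mentioned in the Aurogon bullet is a hold-back queue. Each request carries a timestamp and is released only after the local clock passes that timestamp plus a deferral window, so a request that arrives late but carries an earlier timestamp can still be ordered first. This is a sketch under stated assumptions, not Aurogon's mechanism. It assumes integer timestamps on a shared time base. `DeferralQueue` and its adaptation rule (widen the window to the largest lateness observed) are hypothetical, and "pre-attaching" is not modeled.

```python
import heapq


class DeferralQueue:
    """Hold-back queue: a request stamped ts is released only once the local
    clock reaches ts + delay, giving late requests with earlier timestamps a
    chance to be ordered ahead of it."""

    def __init__(self, initial_delay=0):
        self.delay = initial_delay
        self._heap = []   # entries: (timestamp, arrival sequence, request)
        self._seq = 0     # tie-breaker so equal timestamps keep arrival order

    def arrive(self, ts, request, now):
        # Hypothetical adaptation rule: widen the window to the worst lateness seen.
        self.delay = max(self.delay, now - ts)
        heapq.heappush(self._heap, (ts, self._seq, request))
        self._seq += 1

    def release(self, now):
        """Pop, in timestamp order, every request whose deferral window has elapsed."""
        ready = []
        while self._heap and self._heap[0][0] + self.delay <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


q = DeferralQueue(initial_delay=5)
q.arrive(ts=10, request="B", now=11)
q.arrive(ts=8, request="A", now=12)   # arrives later but is stamped earlier
first = q.release(now=14)             # ['A']; B waits until 10 + 5 = 15
second = q.release(now=15)            # ['B']
```

A larger window tolerates more reordering but adds latency to every request, which is why an adaptive window is attractive compared with a fixed worst-case bound.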
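As background for the 2LClock bullet: the classic two-way timestamp exchange, as used by NTP, estimates the offset between two clocks from four timestamps. It computes offset = ((t2 - t1) + (t3 - t4)) / 2 and RTT = (t4 - t1) - (t3 - t2). Keeping the sample with the smallest RTT is a common way to suppress samples inflated by network stack jitter. The sketch below shows only this textbook estimator, assuming symmetric one-way delays. It does not reproduce 2LClock's split synchronization process or its combination of RDMA transport modes, and `Probe` and `estimate_offset` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One request/response exchange; times are in nanoseconds."""
    t1: int  # request sent (client clock)
    t2: int  # request received (server clock)
    t3: int  # response sent (server clock)
    t4: int  # response received (client clock)

    @property
    def rtt(self) -> int:
        # Round-trip time excluding the server's processing time.
        return (self.t4 - self.t1) - (self.t3 - self.t2)

    @property
    def offset(self) -> float:
        # Estimated (server - client) clock offset; exact only if the two
        # one-way delays are equal.
        return ((self.t2 - self.t1) + (self.t3 - self.t4)) / 2


def estimate_offset(probes):
    """Trust the probe with the smallest RTT, the one least inflated by jitter."""
    return min(probes, key=lambda p: p.rtt).offset


# Server clock runs 1000 ns ahead. Probe `a` sees 100 ns each way; probe `b`
# is delayed 900 ns on the return path by network stack jitter.
a = Probe(t1=0, t2=1100, t3=1110, t4=210)      # rtt 200, offset 1000.0
b = Probe(t1=1000, t2=2100, t3=2110, t4=2010)  # rtt 1000, offset 600.0
offset_ns = estimate_offset([b, a])            # 1000.0
```

The asymmetric probe `b` is off by 400 ns, half of its delay asymmetry; this is the kind of stack-induced error that motivates shortening or splitting the measured path.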
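The ZNSRAID bullet builds on RAID-style parity. To recall how XOR parity provides fault tolerance, the sketch below accumulates a stripe's parity incrementally as data chunks arrive, in any order (as split requests might complete), and then rebuilds a lost chunk from the survivors. The in-memory buffer only stands in for a parity cache such as the PMR. `StripeParity` and `rebuild` are hypothetical names. The sketch does not model zone write semantics, ZNSRAID's request splitting policy, or its data layout.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("chunks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


class StripeParity:
    """Accumulates one stripe's XOR parity as its data chunks arrive.

    The buffer stands in for a parity cache (e.g. the PMR in the text); once
    every data chunk has been XORed in, the parity can be written out.
    """

    def __init__(self, width: int, chunk_size: int):
        self.width = width                 # number of data chunks per stripe
        self.parity = bytes(chunk_size)    # all zeros: the XOR identity
        self.filled = set()

    def add(self, index: int, chunk: bytes) -> bool:
        """XOR a data chunk into the parity; return True when the stripe is complete."""
        if not 0 <= index < self.width or index in self.filled:
            raise ValueError(f"invalid or duplicate chunk index {index}")
        self.parity = xor_bytes(self.parity, chunk)
        self.filled.add(index)
        return len(self.filled) == self.width


def rebuild(surviving_chunks, parity: bytes) -> bytes:
    """Recover a single lost data chunk by XORing the parity with all survivors."""
    lost = parity
    for chunk in surviving_chunks:
        lost = xor_bytes(lost, chunk)
    return lost


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
stripe = StripeParity(width=3, chunk_size=2)
stripe.add(0, data[0])
stripe.add(2, data[2])                 # chunks may complete out of order
complete = stripe.add(1, data[1])      # True: parity is ready to flush
recovered = rebuild([data[0], data[2]], stripe.parity)   # equals data[1]
```

Because XOR is associative and commutative, the parity does not depend on the order in which chunks arrive. Keeping a partial parity in a fast persistent buffer therefore avoids rewriting parity on the sequential-write-only zones for every partial-stripe update.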