Recent years have witnessed the rapid deployment of large-scale datacenters that host multiple distributed applications, such as web search, data mining, and storage services. For many of these applications, network latency is critical to performance. Network latency can be divided into two main parts: host network stack latency and in-network latency. With RDMA (Remote Direct Memory Access) technology, which bypasses the kernel network stack by implementing the entire transport logic in hardware NICs, host latency is greatly reduced. However, even with RDMA enabled, in-network latency remains high and the performance of latency-sensitive applications remains inferior, for the following reasons: 1) the lack of fine-grained QoS (Quality of Service) causes latency-sensitive traffic to be queued behind throughput-intensive traffic, resulting in large queueing delay; 2) coarse-grained network load balancing induces traffic hot spots; and 3) the inefficient loss recovery mechanism of RDMA results in frequent RTOs (Retransmission Timeouts) under packet loss, which increases tail latency.

To tackle these problems and reduce in-network latency, many approaches have been proposed, but few of them take the hardware resource limitations of network devices into consideration. For example, network switches have a limited number of hardware priority queues, and network interface cards (NICs) usually have limited on-chip memory (several MBs). In this thesis, we argue that a practical, high-performance solution for low-latency networking must take these hardware resource constraints into account. Specifically, to tackle the lack of fine-grained QoS, previous work has proposed priority-based flow scheduling that gives latency-sensitive traffic higher priority, so that it avoids being queued behind throughput-intensive traffic. However, such work assumes an infinite number of priority queues in network switches, which is usually not the case in commodity datacenters: we found that typically only 2-3 priority queues are available for flow scheduling. Thus, how to provide fine-grained flow scheduling with limited hardware priority queue resources remains a challenge. There is also a rich literature proposing finer-grained load balancing that leverages the multiple paths in datacenter networks to eliminate traffic hot spots. However, most of this work focuses on TCP traffic rather than RDMA. Unlike TCP, the RDMA transport is implemented in NICs, which have very small on-chip memories. Thus, how to bring the benefits of the rich path diversity in datacenters to RDMA connections under the constraint of limited hardware on-chip memory is a challenge. Similarly, addressing the inefficiency of RDMA's loss recovery mechanism must also respect the limited on-chip memory size.

This thesis aims to provide low latency for datacenter networks in a hardware-resource-efficient fashion by developing several novel techniques. Specifically, we first achieve fine-grained flow scheduling with only one additional hardware priority queue in network switches, reducing the FCT (Flow Completion Time) of latency-sensitive flows. Secondly, we propose a multi-path transport for RDMA that provides better load balancing for RDMA traffic while minimizing the on-chip memory footprint. Thirdly, we design a memory-efficient loss recovery mechanism for RDMA that enhances its performance under packet loss. We have implemented and evaluated our solutions. The results show that with our flow scheduling scheme, traffic FCT is reduced by up to 60.5% compared with conventional TCP; our multi-path RDMA reduces FCT by up to 17.7% while improving network utilization by up to 47%; and our loss recovery mechanism achieves up to 14.02x throughput while reducing the 99th-percentile tail FCT by 3.11x under certain loss rates for RDMA.