| Many terminal devices today have multiple network interfaces, but the prevailing network architecture supports only single-path transport protocols: a terminal device can use only one network interface per connection, which wastes valuable bandwidth. Multipath TCP (MPTCP) is a multipath transport protocol proposed by the IETF that transmits data in parallel over multiple TCP subflows. Because MPTCP is a transport-layer extension of TCP, it is also compatible with the existing TCP/IP protocol architecture. The data scheduling algorithm is one of the decisive factors in MPTCP performance. The current default MPTCP scheduler is MinRTT: among the subflows that still have space in their congestion window, it selects the one with the smallest round-trip time (RTT). This algorithm does not account for the complexity of real networks, so an MPTCP connection can suffer from out-of-order packets, network congestion, and receiver congestion during transmission. This paper tests and verifies the performance of MPTCP and analyzes the causes of its performance degradation. Then, with low-latency transmission as the main goal, it designs two data scheduling algorithms: one based on the congestion window and round-trip time, and one based on reinforcement learning. The specific work is as follows: (1) To achieve low-latency data transmission, this paper proposes a scheduling algorithm based on the congestion window (cwnd) and RTT. The algorithm aims to synchronize the transmission completion times of all subflows within a decision cycle. It is divided into two stages: parallel subflow transmission and tail-delay processing. In the parallel subflow transmission stage, data is scheduled with a transmission-rate-priority strategy, which can make full use of
the bandwidth of each subflow and send data quickly. In the tail-delay processing stage, a greedy first-to-complete-first strategy is used for data scheduling so that a slow link does not delay the overall task completion time. The boundary between the two stages is determined by the amount of data in the MPTCP buffer. Together, the two stages reduce the transmission delay of a task. (2) In a dynamic network, heuristic algorithms obtain their parameters with a delay, so the scheduler cannot make correct decisions for the current network conditions. To solve this problem, this paper proposes a scheduling algorithm based on Q-learning, a reinforcement learning algorithm. The scheduler selects a subflow according to the current environment state and the policy function, then computes a reward or penalty value from the subflow's transmission completion time and uses it to update the Q-value table. By continuously optimizing the Q-value table, the scheduler learns the state of the whole network environment and can select the most appropriate subflow for each transmission task. |
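
To make the two selection rules described above more concrete, the following is a minimal Python sketch, not the paper's implementation. It shows the default MinRTT rule (lowest RTT among subflows with remaining congestion window) and a tabular Q-learning scheduler whose reward is derived from the subflow's transmission completion time. All names (`min_rtt_select`, `QLearningScheduler`, the dictionary keys) are hypothetical, and the state encoding, the hyperparameters `alpha`, `gamma`, `epsilon`, the epsilon-greedy policy, and the choice of reward as the negative completion time are assumptions made for illustration; the abstract does not specify them.

```python
import random
from collections import defaultdict


def min_rtt_select(subflows):
    """Default MinRTT rule: among subflows with free congestion window,
    return the one with the smallest RTT (None if all windows are full).
    Each subflow is a dict with hypothetical keys: rtt, cwnd, in_flight."""
    available = [s for s in subflows if s["in_flight"] < s["cwnd"]]
    return min(available, key=lambda s: s["rtt"], default=None)


class QLearningScheduler:
    """Tabular Q-learning sketch for choosing an MPTCP subflow.
    The state is any hashable description of the network (an assumption)."""

    def __init__(self, n_subflows, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.n_subflows = n_subflows
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)
        # Q-value table: state -> one value per subflow, initialised to 0.
        self.q = defaultdict(lambda: [0.0] * n_subflows)

    def select(self, state):
        """Epsilon-greedy policy: explore occasionally, otherwise pick
        the subflow with the highest Q-value for this state."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_subflows)
        values = self.q[state]
        return max(range(self.n_subflows), key=values.__getitem__)

    def update(self, state, action, completion_time, next_state):
        """Standard Q-learning update. The reward is assumed to be the
        negative completion time, so faster transmissions score higher."""
        reward = -completion_time
        td_target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])
```

In use, a caller would invoke `select(state)` before sending a batch of data, measure how long the chosen subflow took to finish, and pass that time to `update(...)`. Subflows that repeatedly finish slowly accumulate lower Q-values and are chosen less often.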