| In the age of the Internet of Everything, vehicles have become the third most connected type of device, and Internet of Vehicles technology has attracted wide attention from academia and industry. With information and communication technology at its core, the Internet of Vehicles integrates cutting-edge technologies such as communication, sensing, and positioning, and can significantly improve driving safety and efficiency. On the one hand, because user needs are diverse, networks of many different standards coexist in the Internet of Vehicles. On the other hand, because vehicle user nodes move at high speed and with strong randomness, a large number of handoffs occur between different networks, which degrades users' communication quality. How to reduce the number of unnecessary handoffs while guaranteeing the service quality of vehicle users has therefore become a key topic in vehicular network communication research. This paper first introduces Vehicular Ad Hoc Network (VANET) and Long-Term Evolution (LTE) networks and their related technologies. It then focuses on vertical handoff, covering its classification, its performance indicators, and the classical vertical handoff algorithms, and discusses the advantages and disadvantages of each classical algorithm. Next, this paper proposes a vertical handoff strategy based on the Q-learning algorithm. Q-learning is a reinforcement learning algorithm in which each vehicle user acts as an intelligent agent that learns by interacting with its environment, here the channel conditions, until it obtains a result that meets the target requirements. The trigger condition of this handoff strategy accounts for both the service quality guarantee and load balancing of vehicle users, with bandwidth allocated equally among the vehicle users. A multi-cell scenario is simulated in MATLAB. Compared with the classical SINR-based and rate-based handoff strategies, the proposed Q-learning algorithm reduces the number of unnecessary handoffs while maintaining system throughput.
Because the running time of the Q-learning algorithm is too long, this paper proposes a second reinforcement learning approach, a vertical handoff strategy based on the upper confidence bound (UCB) algorithm. This algorithm balances the exploitation of historical information against the exploration of new information, and it can be deployed in a distributed way. The UCB-based strategy uses the same handoff scenarios and trigger conditions designed above and is simulated in MATLAB, where it is compared with the two classical algorithms and the Q-learning algorithm. The simulation results show that the UCB-based handoff strategy achieves better performance with a shorter running time. |
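
To make the two learning rules named above concrete, the following is a minimal Python sketch of a tabular Q-learning update and of UCB1-style network selection. It is an illustration under stated assumptions, not the paper's MATLAB implementation: the state, action, and reward design of the paper is not given here, so the network names, the Bernoulli reward model, and all parameter values (`alpha`, `gamma`, `c`, the mean rewards) are hypothetical.

```python
import math
import random


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def ucb1_select(counts, means, t, c=2.0):
    """Return the network with the largest UCB1 index
    mean + sqrt(c * ln(t) / n); untried networks are tried first."""
    for net, n in counts.items():
        if n == 0:
            return net
    return max(
        counts,
        key=lambda net: means[net] + math.sqrt(c * math.log(t) / counts[net]),
    )


def ucb1_update(counts, means, net, reward):
    """Update the selection count and running mean reward of one network."""
    counts[net] += 1
    means[net] += (reward - means[net]) / counts[net]


def simulate(true_means, rounds=2000, seed=0):
    """Run UCB1 over hypothetical networks with Bernoulli rewards.

    Returns the per-network selection counts and the number of handoffs,
    where a handoff is any round whose choice differs from the previous one.
    """
    rng = random.Random(seed)
    counts = {net: 0 for net in true_means}
    means = {net: 0.0 for net in true_means}
    handoffs, current = 0, None
    for t in range(1, rounds + 1):
        net = ucb1_select(counts, means, t)
        if current is not None and net != current:
            handoffs += 1
        current = net
        # Assumed reward model: 1 with probability true_means[net], else 0.
        reward = 1.0 if rng.random() < true_means[net] else 0.0
        ucb1_update(counts, means, net, reward)
    return counts, handoffs


if __name__ == "__main__":
    # Hypothetical mean rewards; not taken from the paper's simulation.
    counts, handoffs = simulate({"VANET": 0.4, "LTE": 0.7})
    print(counts, handoffs)
```

The UCB1 rule needs only a count and a running mean per candidate network, so each vehicle can evaluate it locally, which is consistent with the distributed deployment mentioned above. The Q-learning rule instead maintains a value for every state-action pair, which is one plausible reason for its longer running time.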