With the development of wireless communication and multimedia technology, people are no longer satisfied with traditional audio-visual services, and multi-modal services that can be heard, seen and touched will become the mainstream of multimedia services in the B5G era. Because different modal signals differ significantly, simultaneously meeting the requirements of low latency, high reliability and high throughput has become the key to the development of multi-modal services. Device-to-Device (D2D) communication has great potential for realizing multi-modal communication because of its excellent performance in offloading network traffic and reducing transmission latency. In addition, inspired by the observation that hearing, vision and touch are highly correlated, this paper studies the key technologies of efficient stream transmission for multi-modal services, based on D2D communication and on cross-modal communication that leverages the intrinsic correlation of different modal signals. The innovation of this paper lies in solving three urgent problems in efficient stream transmission for multi-modal services: 1) the loss of rate-distortion performance caused by the heterogeneity gap between signals, 2) the decline of communication robustness caused by the dynamic diversity of the network, and 3) the reduction of network throughput caused by limited service resources. The research work of this paper mainly includes the following four parts.

(1) The theory and technology of cross-modal coding based on semantic association are studied. To address the rate-distortion performance loss caused by the signal heterogeneity gap, a cross-modal coding model is proposed that compresses multi-modal signals by exploiting the latent semantic correlation between modalities, and the feasibility of the coding model is verified from both theoretical and technical aspects. In theory, a new information measure, semantic entropy, is proposed to
characterize the randomness of a pair of multi-modal source symbols and the fuzziness of their semantic relationship, so that the total information of the source symbols and their semantic relationship can be measured and the minimum rate required by the source in cross-modal coding, together with the corresponding rate-distortion theory, can be determined. In terms of technology, guided by the obtained cross-modal coding theory, this paper designs a universal cross-modal codec that is highly compatible with existing video/haptic codecs, by means of cross-modal prediction and channel coding technology based on deep learning. Experimental results show that, compared with existing coding schemes, the proposed cross-modal coding achieves significant rate-distortion performance advantages.

(2) A D2D-assisted wireless cross-modal transmission mechanism is established. To address the degradation of communication robustness caused by the dynamic diversity of the network, this paper studies the universality of the network architecture, the adaptability to network dynamics, and the balance between delay and reliability. First, a universal and flexible wireless cross-modal transmission architecture based on Mobile Edge Computing (MEC) is designed. Second, a mobile social-aware caching strategy is designed, with the contact rate and interest similarity of D2D users as the main parameters, to reduce the impact on the cache hit rate of dynamic changes caused by the mobility and sociality of D2D users. Finally, a delay-reliability-aware routing scheme is proposed, in which route selection is modeled as a shortest path problem based on a delay-reliability-aware rate. Because the objective function contains random variables, two stochastic optimization algorithms are designed to solve the problem. Simulation results show that the proposed cache placement scheme achieves at least a 7.21% performance improvement and the routing scheme at least a 10.74% performance improvement compared with other alternatives.

(3) A cooperative
transmission content sharing scheme based on D2D is designed. To address the reduction of network throughput caused by service resource constraints, a green content sharing scheme based on many-to-many D2D cooperative transmission is proposed to reduce communication energy consumption. The solution consists of three key steps: supply and demand matching, data segmentation and file reconstruction. First, supply and demand matching is formulated as a classical maximum weighted matching problem, and a distributed matching algorithm is designed to solve it. Second, according to the link rate and energy consumption, a data partitioning algorithm based on the weighted round-robin strategy is designed to achieve load balancing. Finally, the file reconstruction problem is approximated as a shortest Hamiltonian path problem, and a distributed greedy algorithm is designed to find the shortest file reconstruction path. Simulation results show that the proposed scheme can effectively reduce energy consumption without increasing the transmission delay.

(4) A content-aware cross-modal resource scheduling strategy is proposed. To address the robustness degradation caused by service diversity and the network throughput reduction caused by resource constraints, a content-aware scheduling strategy is designed that uses content correlation to fuse haptic preemptive scheduling and video signal recovery. First, a flexible cross-modal bitstream transmission scheme is proposed, including video resource allocation, haptic preemptive scheduling, and video signal recovery. In this scheme, the flexible choice of single or double transmission and the queuing mechanism at the micro-slot level can meet communication needs at different levels. Second, an online content-aware resource allocation algorithm is designed to maximize video throughput under the constraint of haptic requirements by making full use of the content correlation between different types of bitstreams. Simulation results show
that the proposed resource scheduling scheme improves video throughput by about 11.7% compared with other mainstream schemes, while ensuring low latency and high reliability for the haptic stream.
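
As an illustration of the data partitioning step in part (3), the following minimal Python sketch splits the chunks of a file among several D2D providers with a smooth weighted round-robin rule. It is only a sketch under stated assumptions: the abstract does not specify how the weights are derived from link rate and energy consumption, so the integer weights, the provider names and the function `weighted_round_robin_split` are hypothetical and should not be read as the algorithm designed in this work.

```python
from typing import Dict, List


def weighted_round_robin_split(
    num_chunks: int, weights: Dict[str, int]
) -> Dict[str, List[int]]:
    """Assign chunk indices 0..num_chunks-1 to providers.

    Uses smooth weighted round robin: in every round each provider's
    running score grows by its weight, the provider with the highest
    score takes the chunk, and its score is reduced by the total weight.
    The weights are assumed (hypothetically) to summarise link rate and
    energy cost as positive integers; a larger weight means more chunks.
    """
    if num_chunks < 0:
        raise ValueError("num_chunks must be non-negative")
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty positive integers")

    total = sum(weights.values())
    score = {provider: 0 for provider in weights}
    assignment: Dict[str, List[int]] = {provider: [] for provider in weights}

    for chunk in range(num_chunks):
        for provider, weight in weights.items():
            score[provider] += weight
        # Ties are broken in favour of the provider listed first.
        chosen = max(score, key=score.get)
        score[chosen] -= total
        assignment[chosen].append(chunk)

    return assignment


if __name__ == "__main__":
    # Hypothetical providers: "a" has a link judged three times better than "b".
    plan = weighted_round_robin_split(8, {"a": 3, "b": 1})
    for provider, chunks in plan.items():
        print(provider, chunks)
```

With weights 3 and 1, provider "a" is assigned three of every four chunks, so the load follows the weights while the assignments stay interleaved instead of being sent in long bursts to one provider.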