| People increasingly rely on online communication to solve problems, and online video conferences have become routine; online classes were especially frequent during the epidemic. An unavoidable problem in video transmission is screen corruption (garbled or blocky frames) in weak network environments, which seriously degrades the user experience. In these scenarios, however, participants mostly share PPT slides or other documents. For this kind of video, the user experience is largely preserved as long as the quality of the text regions is given priority and the coding precision of other regions is reduced. This significantly lowers the required network bandwidth and mitigates screen corruption under weak networks. This paper therefore proposes a solution for video conferencing: when network bandwidth is insufficient, give priority to text quality. Specifically, text regions are first located by a text detection algorithm and then encoded more finely using region-of-interest (ROI) coding. Implementing this idea raises the following problems: (1) existing text detection algorithms do not meet the needs of this scenario in terms of real-time performance and accuracy; (2) ROI coding requires modifications inside the encoder, since no API is currently available for direct use, and it must be adjusted dynamically according to network conditions; (3) conference scenes have their own particularities, and frequent text detection wastes resources. To address these problems, this paper proposes algorithms that optimize the whole pipeline, as follows. First, a text detection algorithm based on knowledge distillation is proposed, which builds on DBNet and improves its real-time performance. With the more complex ResNet as the teacher model and MobileNetV3 as the student model, the network
is optimized through the proposed knowledge distillation algorithm, which includes intermediate-level feature learning, probability map knowledge distillation, and the relationships between pixels, which are used to enhance generalization ability. In addition, to better detect small text, a fusion factor is introduced to enhance the FPN module, yielding a model whose real-time performance and accuracy are satisfactory for this scenario. Second, a region-of-interest coding algorithm based on the H.264 encoder is proposed; it analyzes and modifies the internals of the H.264 encoder to achieve different coding precision in different regions. An algorithm that dynamically adjusts ROI coding according to network conditions is also proposed, so that the scheme adapts to different network environments. Furthermore, to save resources, an inter-frame difference detection algorithm is proposed so that the region of interest is updated only when necessary. Considering the dynamic transition effects of PPT, a window mechanism is proposed so that the region of interest is not updated repeatedly while slides are changing, which would waste resources. Third, a video conference system is built to simulate and verify the scheme in a realistic environment. The system integrates the proposed algorithm module, which combines the text detection algorithm with the region-of-interest algorithm, and testing shows that it achieves the desired effect. |
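
The inter-frame difference detection and window mechanism mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the class name `RoiUpdateGate`, the mean-absolute-difference metric, and the `diff_threshold` and `window` parameters are all assumptions made for illustration. Frames are modeled as flat sequences of grayscale pixel values. The gate asks for a new text detection pass only after the picture has changed and then stayed still for `window` consecutive frames, so a slide transition animation would trigger one ROI update rather than one per frame.

```python
from typing import Optional, Sequence


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


class RoiUpdateGate:
    """Hypothetical gate deciding when the ROI should be refreshed.

    A refresh is requested once the content has changed and has then
    remained stable for `window` consecutive frames.
    """

    def __init__(self, diff_threshold: float = 2.0, window: int = 3) -> None:
        self.diff_threshold = diff_threshold  # assumed tuning parameter
        self.window = window                  # assumed stability window, in frames
        self._prev: Optional[Sequence[int]] = None
        self._pending = False                 # a change is waiting to settle
        self._stable = 0                      # consecutive unchanged frames

    def should_update(self, frame: Sequence[int]) -> bool:
        changed = (
            self._prev is None
            or mean_abs_diff(self._prev, frame) > self.diff_threshold
        )
        self._prev = frame
        if changed:
            # Content is (still) changing: restart the stability window.
            self._pending = True
            self._stable = 0
            return False
        self._stable += 1
        if self._pending and self._stable >= self.window:
            # Content has settled: request one ROI update.
            self._pending = False
            return True
        return False


if __name__ == "__main__":
    slide_a = [0] * 16
    transition = [50] * 16   # intermediate frame of a page-turn animation
    slide_b = [100] * 16
    gate = RoiUpdateGate(diff_threshold=2.0, window=2)
    frames = [slide_a, slide_a, slide_a, transition,
              slide_b, slide_b, slide_b, slide_b]
    print([gate.should_update(f) for f in frames])
```

In this sketch a larger `window` tolerates longer transition animations at the cost of a later ROI refresh; the right value would depend on the frame rate and the slide effects in use.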