In recent years, millions of User Generated Content (UGC) videos have emerged with the rapid development of mobile camera technologies and social media platforms. However, due to limitations such as shooting devices, technology, and transmission bandwidth, videos inevitably suffer distortions during capturing, compression, transmission, reconstruction, and display. Additionally, unlike Professionally Generated Content (PGC) videos, which are shot by professional photographers, UGC videos are usually captured and uploaded by amateurs. Owing to the low barrier to producing UGC videos, their contents are highly disparate and they are susceptible to extremely diverse and complicated degradations, which brings new challenges to video quality assessment (VQA). Therefore, it is necessary to conduct in-depth research on UGC VQA, which can effectively guide the optimization of video processing and transmission, monitor the quality of videos provided by platforms, and give users the best perceptual experience.

To provide a better perceptual experience under limited transmission bandwidth, videos need to be preprocessed. To analyze the impact of preprocessing methods on video quality, this paper proposes the first Preprocessed and Transcoded Video Database (PTVD). The dataset selects 15 source videos with different contents from public platforms and processes them with various preprocessing methods and bitrates, generating a total of 570 test videos. We then design and conduct a subjective experiment according to international standards. The analysis shows that appropriate preprocessing algorithms can effectively improve video quality.

Subsequently, to measure the impact of preprocessing algorithms on video quality, this paper proposes a UGC VQA model based on saliency guidance and local feature embedding. For spatial feature extraction, the model combines saliency prediction with spatial Transformers to guide the model to focus on regions of interest when aggregating global information. In addition, gradient feature embeddings are introduced to provide local information to the spatial Transformer. In the temporal domain, we exploit the strong sequence-modeling ability of Transformers to model temporal features and improve model efficiency by introducing a temporal sampling strategy. The superior performance of the model on the PTVD dataset and public UGC VQA datasets demonstrates its effectiveness.

UGC videos exhibit complicated and varied contents and degradations, which prevents existing blind video quality assessment (BVQA) models from performing well, since they lack adaptability to diverse distortions and contents. To mitigate this, we propose a novel prior-augmented perceptual vision transformer (PriorFormer) for the BVQA of UGC, which boosts its adaptability and representation capability for divergent contents and distortions. Concretely, we introduce two powerful priors, i.e., the content and distortion priors, by extracting content and distortion embeddings from two pre-trained feature extractors. We then adopt these two embeddings as adaptive prior tokens, which are fed into the vision transformer backbone jointly with implicit quality features. Additionally, we design a temporal pooling module that leverages the temporal perception mechanism of the human visual system (HVS). Based on the above strategy, the proposed PriorFormer achieves state-of-the-art performance on three public UGC VQA datasets, including KoNViD-1K, LIVE-VQC, and YouTube-UGC.
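To make the saliency-guided spatial branch described above more concrete, the following is a minimal sketch of how a predicted saliency map could re-weight patch tokens before a spatial Transformer aggregates global information, with gradient features injected as local cues. All module names, shapes, and the weighting scheme are illustrative assumptions, not the actual implementation described in this work.

```python
# Hypothetical sketch: saliency-guided re-weighting of patch tokens plus
# gradient-feature injection before Transformer-based spatial aggregation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyGuidedSpatialBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.grad_proj = nn.Linear(d_model, d_model)  # embeds local gradient features

    def forward(self, patch_tokens, saliency, grad_tokens):
        # patch_tokens: (B, N, d), saliency: (B, N), grad_tokens: (B, N, d)
        weights = F.softmax(saliency, dim=-1).unsqueeze(-1)  # emphasize salient patches
        tokens = patch_tokens * (1.0 + weights)              # saliency-guided re-weighting
        tokens = tokens + self.grad_proj(grad_tokens)        # inject local gradient cues
        return self.encoder(tokens).mean(dim=1)              # aggregated spatial feature

# Toy usage with random tensors standing in for real patch/saliency/gradient features.
block = SaliencyGuidedSpatialBlock()
feat = block(torch.randn(2, 196, 256), torch.rand(2, 196), torch.randn(2, 196, 256))
print(feat.shape)  # torch.Size([2, 256])
```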
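Similarly, the prior-token idea behind PriorFormer can be sketched as follows: content and distortion embeddings from (placeholder) pre-trained extractors are projected into prior tokens and prepended to the patch tokens of a Transformer encoder. The dimensions, module names, and the single-score regression head are assumptions for illustration only, not the authors' implementation.

```python
# Hypothetical sketch: prepending content/distortion prior tokens to a
# Transformer backbone alongside implicit quality (patch) tokens.
import torch
import torch.nn as nn

class PriorTokenTransformer(nn.Module):
    def __init__(self, patch_dim=768, content_dim=512, distortion_dim=512,
                 d_model=384, n_layers=4, n_heads=6):
        super().__init__()
        self.patch_proj = nn.Linear(patch_dim, d_model)
        self.content_proj = nn.Linear(content_dim, d_model)        # content prior token
        self.distortion_proj = nn.Linear(distortion_dim, d_model)  # distortion prior token
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)  # frame-level quality score

    def forward(self, patch_tokens, content_emb, distortion_emb):
        # patch_tokens: (B, N, patch_dim); content_emb, distortion_emb: (B, dim)
        b = patch_tokens.size(0)
        tokens = torch.cat([
            self.cls_token.expand(b, -1, -1),
            self.content_proj(content_emb).unsqueeze(1),
            self.distortion_proj(distortion_emb).unsqueeze(1),
            self.patch_proj(patch_tokens),
        ], dim=1)
        out = self.encoder(tokens)
        return self.head(out[:, 0])  # read the quality score off the CLS token

# Toy usage with random tensors standing in for real features.
model = PriorTokenTransformer()
score = model(torch.randn(2, 196, 768), torch.randn(2, 512), torch.randn(2, 512))
print(score.shape)  # torch.Size([2, 1])
```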