| In recent years, with the rapid development of stereoscopic imaging technology, stereoscopic video has come to play an increasingly important role in entertainment and industry. However, the acquisition, storage, encoding, transmission, and display of stereoscopic video are limited by factors such as the state of the technology and the imaging equipment, which can introduce distortion and degrade perceived quality, resulting in a poor viewing experience. Therefore, assessing whether a stereoscopic video meets the perceptual quality expected by the human visual system has become one of the key research issues in this field. Such assessment not only measures the perceptual quality of stereoscopic video but also helps improve stereoscopic video production processes and technology. By studying the human visual perception mechanism and the information characteristics of stereoscopic video, this thesis proposes the following two stereoscopic video quality assessment methods based on convolutional neural networks (CNNs). First, this thesis proposes a stereoscopic video quality assessment method based on multi-scale and multi-level attention enhancement fusion. This method builds a three-branch convolutional neural network model. The left-view and right-view branches extract features from the left-view and right-view videos, respectively, and capture the rich scale-variation features in the stereoscopic video through a channel-wise multi-scale feature aggregation unit. At the same time, the model adopts a multi-level attention enhancement fusion strategy in the feature fusion branch to simulate the long-term, complex binocular information fusion and attention change process in the human visual pathway. In addition, this method preprocesses the data into video blocks and uses 3D convolution to extract temporal features effectively. Second, this thesis proposes a stereoscopic video quality assessment method based on disparity and competitive fusion. The model built by this method consists of a
left-view branch, a right-view branch, and a disparity compensation branch; the inputs of the three branches are left-view video blocks, right-view video blocks, and difference video blocks, respectively. To simulate the visual attention mechanism and the way the visual system captures multi-scale information, both the left-view and right-view branches use multi-scale cross-dimensional attention modules to perform multi-scale feature extraction and cross-dimensional attention guidance. The model also constructs a disparity branch with an enhancement unit that extracts disparity information and compensates the feature stream with it, which strengthens the overall model's perception of stereoscopic visual quality. In addition, based on the binocular rivalry phenomenon in the human visual system, the model includes a multi-level competition fusion structure, which fuses the features from the left and right views through three rounds of competition. Both proposed stereoscopic video quality assessment methods are experimentally verified on the symmetric-distortion stereoscopic video database NAMA3DS1-COSPAD1 and the asymmetric-distortion stereoscopic video database QI-SVQA. The results show that both methods perform well on symmetrically and asymmetrically distorted data, and that their predictions are consistent with human perception of quality. Both methods also show good adaptability. |
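
The preprocessing step mentioned above, in which the left-view and right-view videos are cut into video blocks and a difference video block is formed for the third branch, can be illustrated with a minimal NumPy sketch. The abstract does not give the block dimensions or the exact definition of the difference video, so the function names `to_blocks` and `difference_blocks`, the non-overlapping tiling, and the absolute pixel-wise difference are all assumptions made for illustration only.

```python
import numpy as np


def to_blocks(video, block_len, size):
    """Split a (T, H, W) grayscale video into non-overlapping blocks.

    Returns an array of shape (N, block_len, size, size).
    Frames or pixels that do not fill a whole block are dropped.
    The tiling scheme is an assumption, not taken from the thesis.
    """
    t, h, w = video.shape
    blocks = []
    for t0 in range(0, t - block_len + 1, block_len):
        for y0 in range(0, h - size + 1, size):
            for x0 in range(0, w - size + 1, size):
                blocks.append(
                    video[t0:t0 + block_len, y0:y0 + size, x0:x0 + size]
                )
    return np.stack(blocks)


def difference_blocks(left_blocks, right_blocks):
    """Absolute pixel-wise difference between matching left/right blocks.

    Assumed definition of the "difference video blocks"; the thesis
    may define them differently.
    """
    left = left_blocks.astype(np.float32)
    right = right_blocks.astype(np.float32)
    return np.abs(left - right)


# Hypothetical sizes: 8 frames of 64x64 pixels, blocks of 4 frames x 32x32.
left_video = np.zeros((8, 64, 64), dtype=np.uint8)
right_video = np.ones((8, 64, 64), dtype=np.uint8)

left_blocks = to_blocks(left_video, block_len=4, size=32)
right_blocks = to_blocks(right_video, block_len=4, size=32)
diff_blocks = difference_blocks(left_blocks, right_blocks)
```

Each resulting block keeps a temporal axis, which is what lets a 3D convolution extract temporal features from it; adding a channel axis and batching the blocks would be the next step before feeding them to such a network.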